std.algorithm.splitter on a string not always bidirectional

On Friday, 22 January 2021 at 17:29:08 UTC, Steven Schveighoffer wrote:
> On 1/22/21 11:57 AM, Jon Degenhardt wrote:
[...]
>> Another way to look at it: If split (eager) took a predicate, that 'xyz.splitter(args).back' and 'xyz.split(args).back' should produce the same result. But they will not with the example given.
>
> With what example given? The example you gave is incomplete (what are args?)
[...]

Here is a case for which iterating forwards yields a different sequence from iterating backwards (if we were to allow the latter):

"bbcbcba".splitter("bcb")

Iterating forwards gives us the subranges: "b", "cba".

Iterating backwards gives us: "a", "bbc".

So it cannot be a bidirectional range, at least not in the expected sense that iterating from the back ought to give us the same subranges as iterating from the front, only in a reverse order. Here iterating backwards yields a completely different decomposition.

--T
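For anyone who wants to check the decomposition above, here is a minimal sketch. The forward case uses the real `splitter` and `split`; the backward case is only simulated, since Phobos offers no backward iteration for a range delimiter. `splitFromBack` is a hypothetical helper written for this illustration, not a Phobos function, and it assumes a non-empty separator.

```d
import std.algorithm.iteration : splitter;
import std.array : array, split;
import std.string : lastIndexOf;

/// Hypothetical helper (not part of Phobos): split `s` on `sep`, but search
/// for the separator from the back. Pieces are returned in the order found,
/// so the last piece of the string comes first.
string[] splitFromBack(string s, string sep)
{
    assert(sep.length > 0, "separator must be non-empty");
    string[] pieces;
    while (true)
    {
        immutable idx = s.lastIndexOf(sep);
        if (idx < 0)
        {
            pieces ~= s; // no separator left: the rest is the final piece
            return pieces;
        }
        pieces ~= s[cast(size_t) idx + sep.length .. $];
        s = s[0 .. cast(size_t) idx];
    }
}

void main()
{
    // Lazy forward split: the separator is matched at index 1.
    assert("bbcbcba".splitter("bcb").array == ["b", "cba"]);

    // Eager split agrees with the forward decomposition.
    assert("bbcbcba".split("bcb") == ["b", "cba"]);

    // Searching from the back matches the separator at index 3 instead.
    assert(splitFromBack("bbcbcba", "bcb") == ["a", "bbc"]);
}
```

The two occurrences of "bcb" overlap (at indices 1 and 3), which is why the direction of the search changes which one is consumed as the delimiter.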

On 1/22/21 2:56 PM, H. S. Teoh wrote:
> On Friday, 22 January 2021 at 17:29:08 UTC, Steven Schveighoffer wrote:
>> On 1/22/21 11:57 AM, Jon Degenhardt wrote:
> [...]
>>> Another way to look at it: If split (eager) took a predicate, that 'xyz.splitter(args).back' and 'xyz.split(args).back' should produce the same result. But they will not with the example given.
>>
>> With what example given? The example you gave is incomplete (what are args?)
> [...]
>
> Here is a case for which iterating forwards yields a different sequence
> from iterating backwards (if we were to allow the latter):
>
> "bbcbcba".splitter("bcb")
>
> Iterating forwards gives us the subranges: "b", "cba".
>
> Iterating backwards gives us: "a", "bbc".
>
> So it cannot be a bidirectional range, at least not in the expected sense
> that iterating from the back ought to give us the same subranges as
> iterating from the front, only in a reverse order. Here iterating
> backwards yields a completely different decomposition.

Yes thank you! That makes sense, and I wasn't thinking of that.

I still believe that any splitter based on individual elements should be bidirectional.

-Steve

January 23, 2021

Re: std.algorithm.splitter on a string not always bidirectional

Posted by Steven Schveighoffer
in reply to Jon Degenhardt

On 1/22/21 2:13 PM, Jon Degenhardt wrote:
> On Friday, 22 January 2021 at 17:29:08 UTC, Steven Schveighoffer wrote:
>> On 1/22/21 11:57 AM, Jon Degenhardt wrote:
>>>
>>> I think the idea is that if a construct like 'xyz.splitter(args)' produces a range with the sequence of elements {"a", "bc", "def"}, then 'xyz.splitter(args).back' should produce "def". But, if finding the split points starting from the back results in something like {"f", "de", "abc"} then that relationship hasn't held, and the results are unexpected.
>>
>> But that is possible with all 3 splitter variants. Why is one allowed to be bidirectional and the others are not?
> 
> I'm not defending it, just explaining what I believe the thinking was based on the examination I did. It wasn't just looking at the code, there was a discussion somewhere. A forum discussion, PR discussion, bug or code comments. Something somewhere, but I don't remember exactly.
> 
> However, to answer your question - The relationship described is guaranteed if the basis for the split is a single element. If the range is a string, that's a single 'char'. If the range is composed of integers, then a single integer. Note that if the basis for the split is itself a range, then the relationship described is not guaranteed.
> 
> Personally, I can see a good argument that bidirectionality should not be supported in any of these cases, and instead force the user to choose between eager splitting or reversing the range via retro. For the common case of strings, the further argument could be made that the distinction between char and dchar is another point of inconsistency.

I would not want that. My use case is splitting a string on punctuation, and using the lazy result for testing equality of something. But I have some special suffix items that I want to handle first (and pop off).
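A rough sketch of that kind of use, under some assumptions: it splits on a single '.' character (an overload that already supports `back` and `popBack`) rather than on a punctuation predicate, and the helper name `matchesWithoutSuffix` and the sample inputs are made up for illustration.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : splitter;

/// Hypothetical helper: compare the '.'-separated components of `name`
/// against `expected`, after dropping one trailing `suffix` component
/// if it is present.
bool matchesWithoutSuffix(string name, string suffix, string[] expected)
{
    auto parts = name.splitter('.'); // lazy; no array of pieces is built
    if (!parts.empty && parts.back == suffix)
        parts.popBack(); // handle the special suffix first
    return parts.equal(expected);
}

void main()
{
    assert(matchesWithoutSuffix("foo.bar.bak", "bak", ["foo", "bar"]));
    assert(matchesWithoutSuffix("foo.bar", "bak", ["foo", "bar"]));
    assert(!matchesWithoutSuffix("foo.baz.bak", "bak", ["foo", "bar"]));
}
```

With an eager `split` the same check would allocate an array of pieces first; with a forward-only splitter the suffix could not be inspected before the rest is compared.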

dchar/char inconsistency isn't a problem, because they are both dchar ranges (and both are bidirectional).

> Regardless whether the choices made were the best choices, there was some thinking that went into it, and it is worth understanding the thinking when considering changes.

I believe there was that thinking. It's why I posted, because before I filed a bug, I wanted to make sure there wasn't a good reason.

As you say, it looks like there is NOT a good reason to prevent bidirectional access for single-element splitting. But there IS a good reason (thanks for the example, H. S. Teoh) to prevent it for multi-element delimiters.
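As a quick check of the single-element case, this sketch uses the existing element-delimiter overload of `splitter`, where walking from the back yields the forward pieces in reverse order. The sample inputs are arbitrary.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : splitter;
import std.range : retro;
import std.range.primitives : isBidirectionalRange;

void main()
{
    // A single-character delimiter on a string.
    auto parts = "a,bc,def".splitter(',');
    static assert(isBidirectionalRange!(typeof(parts)));
    assert(parts.equal(["a", "bc", "def"]));
    assert(parts.back == "def");
    assert(parts.retro.equal(["def", "bc", "a"]));

    // A single-integer delimiter on an array of integers.
    auto nums = [1, 0, 2, 3, 0, 4].splitter(0);
    assert(nums.equal([[1], [2, 3], [4]]));
    assert(nums.retro.equal([[4], [2, 3], [1]]));
}
```

A one-element delimiter cannot overlap with itself, so every match found from the back is also a match found from the front.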

-Steve
