Thread overview | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
January 21, 2021 std.algorithm.splitter on a string not always bidirectional | ||||
---|---|---|---|---|
| ||||
auto sp1 = "a|b|c".splitter('|'); writeln(sp1.back); // ok auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v)); writeln(sp2.back); // error, not bidirectional Why? is it an oversight, or is there a good reason for it? -Steve |
January 21, 2021 Re: std.algorithm.splitter on a string not always bidirectional | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Thu, Jan 21, 2021 at 05:43:37PM -0500, Steven Schveighoffer via Digitalmars-d-learn wrote: > auto sp1 = "a|b|c".splitter('|'); > > writeln(sp1.back); // ok > > auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v)); > > writeln(sp2.back); // error, not bidirectional > > Why? is it an oversight, or is there a good reason for it? [...] Likely an oversight. But I wouldn't be surprised if there was some surprising corner case for which this doesn't work / would have onerous characteristics. T -- MSDOS = MicroSoft's Denial Of Service |
January 21, 2021 Re: std.algorithm.splitter on a string not always bidirectional | ||||
---|---|---|---|---|
| ||||
Posted in reply to H. S. Teoh | On 1/21/21 4:51 PM, H. S. Teoh wrote: > But I wouldn't be surprised if there was some > surprising corner case for which this doesn't work / would have onerous > characteristics. Likely. :) Here is one for uniq.back, which includes a link at the bottom for zip.back: https://issues.dlang.org/show_bug.cgi?id=16588 Ali |
January 22, 2021 Re: std.algorithm.splitter on a string not always bidirectional | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Thursday, 21 January 2021 at 22:43:37 UTC, Steven Schveighoffer wrote:
> auto sp1 = "a|b|c".splitter('|');
>
> writeln(sp1.back); // ok
>
> auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));
>
> writeln(sp2.back); // error, not bidirectional
>
> Why? is it an oversight, or is there a good reason for it?
>
> -Steve
I believe the reason is two-fold. First, splitter is lazy. Second, the range splitting is defined in the forward direction, not the reverse direction. A bidirectional range is only supported if it is guaranteed that the splits will occur at the same points in the range when run in either direction. That's why the single element delimiter is supported. Its clearly the case for the predicate function in your example. If that's known to be always true then perhaps it would make sense to enhance splitter to generate bidirectional results in this case.
--Jon
|
January 22, 2021 Re: std.algorithm.splitter on a string not always bidirectional | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jon Degenhardt | On Friday, 22 January 2021 at 05:51:38 UTC, Jon Degenhardt wrote:
> On Thursday, 21 January 2021 at 22:43:37 UTC, Steven Schveighoffer wrote:
>> auto sp1 = "a|b|c".splitter('|');
>>
>> writeln(sp1.back); // ok
>>
>> auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));
>>
>> writeln(sp2.back); // error, not bidirectional
>>
>> Why? is it an oversight, or is there a good reason for it?
>>
>> -Steve
>
> I believe the reason is two-fold. First, splitter is lazy. Second, the range splitting is defined in the forward direction, not the reverse direction. A bidirectional range is only supported if it is guaranteed that the splits will occur at the same points in the range when run in either direction. That's why the single element delimiter is supported. Its clearly the case for the predicate function in your example. If that's known to be always true then perhaps it would make sense to enhance splitter to generate bidirectional results in this case.
>
> --Jon
Note that the predicate might use a random number generator to pick the split points. Even for same sequence of random numbers, the split points would be different if run from the front than if run from the back.
|
January 22, 2021 Re: std.algorithm.splitter on a string not always bidirectional | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jon Degenhardt | On 1/22/21 12:55 AM, Jon Degenhardt wrote:
> On Friday, 22 January 2021 at 05:51:38 UTC, Jon Degenhardt wrote:
>> On Thursday, 21 January 2021 at 22:43:37 UTC, Steven Schveighoffer wrote:
>>> auto sp1 = "a|b|c".splitter('|');
>>>
>>> writeln(sp1.back); // ok
>>>
>>> auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));
>>>
>>> writeln(sp2.back); // error, not bidirectional
>>>
>>> Why? is it an oversight, or is there a good reason for it?
>>>
>>
>> I believe the reason is two-fold. First, splitter is lazy. Second, the range splitting is defined in the forward direction, not the reverse direction. A bidirectional range is only supported if it is guaranteed that the splits will occur at the same points in the range when run in either direction. That's why the single element delimiter is supported. Its clearly the case for the predicate function in your example. If that's known to be always true then perhaps it would make sense to enhance splitter to generate bidirectional results in this case.
>>
>
> Note that the predicate might use a random number generator to pick the split points. Even for same sequence of random numbers, the split points would be different if run from the front than if run from the back.
I think this isn't a good explanation.
All forms of splitter accept a predicate (including the one which supports a bi-directional result). Many other phobos algorithms that accept a predicate provide bidirectional support. The splitter result is also a forward range (which makes no sense in the context of random splits).
Finally, I'd suggest that even if you split based on a subrange that is also bidirectional, it doesn't make sense that you couldn't split backwards based on that. Common sense says a range split on substrings is the same whether you split it forwards or backwards.
I can do this too (and in fact I will, because it works, even though it's horrifically ugly):
auto sp3 = "a.b|c".splitter!((c, unused) => !isAlphaNum(c))('?');
writeln(sp3.back); // ok
Looking at the code, it looks like the first form of spltter uses a different result struct than the other two (which have a common implementation). It just needs cleanup.
-Steve
|
January 22, 2021 Re: std.algorithm.splitter on a string not always bidirectional | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Friday, 22 January 2021 at 14:14:50 UTC, Steven Schveighoffer wrote:
> On 1/22/21 12:55 AM, Jon Degenhardt wrote:
>> On Friday, 22 January 2021 at 05:51:38 UTC, Jon Degenhardt wrote:
>>> On Thursday, 21 January 2021 at 22:43:37 UTC, Steven Schveighoffer wrote:
>>>> auto sp1 = "a|b|c".splitter('|');
>>>>
>>>> writeln(sp1.back); // ok
>>>>
>>>> auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));
>>>>
>>>> writeln(sp2.back); // error, not bidirectional
>>>>
>>>> Why? is it an oversight, or is there a good reason for it?
>>>>
>>>
>>> I believe the reason is two-fold. First, splitter is lazy. Second, the range splitting is defined in the forward direction, not the reverse direction. A bidirectional range is only supported if it is guaranteed that the splits will occur at the same points in the range when run in either direction. That's why the single element delimiter is supported. Its clearly the case for the predicate function in your example. If that's known to be always true then perhaps it would make sense to enhance splitter to generate bidirectional results in this case.
>>>
>>
>> Note that the predicate might use a random number generator to pick the split points. Even for same sequence of random numbers, the split points would be different if run from the front than if run from the back.
>
> I think this isn't a good explanation.
>
> All forms of splitter accept a predicate (including the one which supports a bi-directional result). Many other phobos algorithms that accept a predicate provide bidirectional support. The splitter result is also a forward range (which makes no sense in the context of random splits).
>
> Finally, I'd suggest that even if you split based on a subrange that is also bidirectional, it doesn't make sense that you couldn't split backwards based on that. Common sense says a range split on substrings is the same whether you split it forwards or backwards.
>
> I can do this too (and in fact I will, because it works, even though it's horrifically ugly):
>
> auto sp3 = "a.b|c".splitter!((c, unused) => !isAlphaNum(c))('?');
>
> writeln(sp3.back); // ok
>
> Looking at the code, it looks like the first form of spltter uses a different result struct than the other two (which have a common implementation). It just needs cleanup.
>
> -Steve
I think the idea is that if a construct like 'xyz.splitter(args)' produces a range with the sequence of elements {"a", "bc", "def"}, then 'xyz.splitter(args).back' should produce "def". But, if finding the split points starting from the back results in something like {"f", "de", "abc"} then that relationship hasn't held, and the results are unexpected.
Note that in the above example, 'xyz.retro.splitter(args)' might produce {"f", "ed", "cba"}, so again not the same.
Another way to look at it: If split (eager) took a predicate, that 'xyz.splitter(args).back' and 'xyz.split(args).back' should produce the same result. But they will not with the example given.
I believe these consistency issues are the reason why the bidirectional support is limited.
Note: I didn't design any of this, but I did redo the examples in the documentation at one point, which is why I looked at this.
--Jon
|
January 22, 2021 Re: std.algorithm.splitter on a string not always bidirectional | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jon Degenhardt | On 1/22/21 11:57 AM, Jon Degenhardt wrote: > > I think the idea is that if a construct like 'xyz.splitter(args)' produces a range with the sequence of elements {"a", "bc", "def"}, then 'xyz.splitter(args).back' should produce "def". But, if finding the split points starting from the back results in something like {"f", "de", "abc"} then that relationship hasn't held, and the results are unexpected. But that is possible with all 3 splitter variants. Why is one allowed to be bidirectional and the others are not? > > Another way to look at it: If split (eager) took a predicate, that 'xyz.splitter(args).back' and 'xyz.split(args).back' should produce the same result. But they will not with the example given. With what example given? The example you gave is incomplete (what are args?) > I believe these consistency issues are the reason why the bidirectional support is limited. > > Note: I didn't design any of this, but I did redo the examples in the documentation at one point, which is why I looked at this. We sometimes spend time justifying why the existing implementation is the way it is, when we should be questioning why it was designed that way in the first place. If splitter should be restricted based on possible edge cases, then it should be consistently restricted. My opinion is it should not be restricted in any case. All three cases provide equal possibility of bidirectional correctness. The one case that should be restricted is the splitter version that accepts a non-bi-directional delimiting range. -Steve |
January 22, 2021 Re: std.algorithm.splitter on a string not always bidirectional | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Friday, 22 January 2021 at 17:29:08 UTC, Steven Schveighoffer wrote:
> On 1/22/21 11:57 AM, Jon Degenhardt wrote:
>> [...]
>
> But that is possible with all 3 splitter variants. Why is one allowed to be bidirectional and the others are not?
>
> [...]
+1
|
January 22, 2021 Re: std.algorithm.splitter on a string not always bidirectional | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Friday, 22 January 2021 at 17:29:08 UTC, Steven Schveighoffer wrote:
> On 1/22/21 11:57 AM, Jon Degenhardt wrote:
>>
>> I think the idea is that if a construct like 'xyz.splitter(args)' produces a range with the sequence of elements {"a", "bc", "def"}, then 'xyz.splitter(args).back' should produce "def". But, if finding the split points starting from the back results in something like {"f", "de", "abc"} then that relationship hasn't held, and the results are unexpected.
>
> But that is possible with all 3 splitter variants. Why is one allowed to be bidirectional and the others are not?
I'm not defending it, just explaining what I believe the thinking was based on the examination I did. It wasn't just looking at the code, there was a discussion somewhere. A forum discussion, PR discussion, bug or code comments. Something somewhere, but I don't remember exactly.
However, to answer your question - The relationship described is guaranteed if the basis for the split is a single element. If the range is a string, that's a single 'char'. If the range is composed of integers, then a single integer. Note that if the basis for the split is itself a range, then the relationship described is not guaranteed.
Personally, I can see a good argument that bidirectionality should not be supported in any of these cases, and instead force the user to choose between eager splitting or reversing the range via retro. For the common case of strings, the further argument could be made that the distinction between char and dchar is another point of inconsistency.
Regardless whether the choices made were the best choices, there was some thinking that went into it, and it is worth understanding the thinking when considering changes.
--Jon
|
Copyright © 1999-2021 by the D Language Foundation