March 28, 2019
On Thursday, 28 March 2019 at 07:32:33 UTC, Andrei Alexandrescu wrote:
> On 3/28/19 1:36 AM, Meta wrote:
>> DirEntry states that it is a subtype of `string`
>
> No, please refer to the "Generality creep" thread.

Last time I checked, curt dismissal is not an argument. I maintain that this is not "generality creep", but working around a broken language feature.
March 28, 2019
On 3/28/19 5:38 AM, Andrei Alexandrescu wrote:
> BTW the intended title was "More generality creep in Phobos". Turns out generality creep begets complexity creep, too...

Creepy!
March 28, 2019
On 3/28/19 10:49 AM, Meta wrote:
> On Thursday, 28 March 2019 at 07:32:33 UTC, Andrei Alexandrescu wrote:
>> On 3/28/19 1:36 AM, Meta wrote:
>>> DirEntry states that it is a subtype of `string`
>>
>> No, please refer to the "Generality creep" thread.
> 
> Last time I checked, curt dismissal is not an argument.

Sorry. It was a pertinent response dressed as a curt dismissal. The opening post of the "Generality creep" is trivial to find and contains an explanation of exactly this matter. Walter posted a link to it, too (thanks). Here it is:

https://digitalmars.com/d/archives/digitalmars/D/Generality_creep_324867.html

What remains is that I copy and paste it for you, which I'll do at the end of this message.

I actually stopped reading your post following your assertion, because I inferred it would be increasingly wrong because it is based on a wrong assumption. Just now I went and read it, and indeed it is so.

> I maintain that this is not "generality creep", but working around a broken language feature.

You are wrong in a sense (factually that is not subtyping, it's coercion) and right in another (it's questionable to allow odd forms of alias this in the language in the first place). I wouldn't disagree either way. Depends on the angle.

Here's the text explaining why alias this to an rvalue is not subtyping:

> Walter and I were looking at indexOf in the standard library. It has several overloads, of which let's look at the first two:
> 
> https://dlang.org/library/std/string/index_of.html
> 
> Now, why couldn't they be merged into a single function with this constraint (their bodies are identical):
> 
> ptrdiff_t indexOf(Range) (
>   Range s,
>   dchar c,
>   CaseSensitive cs = Yes.caseSensitive
> )
> if (isInputRange!Range && isSomeChar!(ElementType!Range));
> 
> It makes sense to look for a character in any range of characters.
> 
> Attempting to build that fails because of this type fails to work:
> 
>     struct TestAliasedString
>     {
>         string get() @safe @nogc pure nothrow { return _s; }
>         alias get this;
>         @disable this(this);
>         string _s;
>     }
> 
> The intuition is that the function should be general enough to figure out that, hey, TestAliasedString is kinda sorta a subtype of string. So the call should work.
> 
> So let's see why it doesn't work - i.e. why is TestAliasedString an input range? The definition of isInputRange (in std.range.primitives) is:
> 
> enum bool isInputRange(R) =
>     is(typeof(R.init) == R)
>     && is(ReturnType!((R r) => r.empty) == bool)
>     && is(typeof((return ref R r) => r.front))
>     && !is(ReturnType!((R r) => r.front) == void)
>     && is(typeof((R r) => r.popFront));
> 
> Turns out the second clause fails. That takes us to the definition of empty in the same module:
> 
> @property bool empty(T)(auto ref scope const(T) a)
> if (is(typeof(a.length) : size_t))
> {
>     return !a.length;
> }
> 
> The intent is fairly clear - if a range defines empty as a size_t (somewhat oddly relaxed to "convertible to size_t"), then empty can be nicely defined in terms of length. Cool. But empty doesn't work with TestAliasedString due to an overlooked matter: the "const". A mutable TestAliasedString converts to a string, but a const or immutable TestAliasedString does NOT convert to a const string! So this fixes that matter:
> 
>     struct TestAliasedString
>     {
>         string get() @safe @nogc pure nothrow { return _s; }
>         const(string) get() @safe @nogc pure nothrow const { return _s; }
>         alias get this;
>         @disable this(this);
>         string _s;
>     }
> 
> That makes empty() work, but also raises a nagging question: what was the relationship of TestAliasedString to string before this change? Surely that wasn't subtyping. (My response would be: "Odd.") And why was Phobos under the obligation to cater for such a type and its tenuous relationship to a range?
> 
> But wait, there's more. Things still don't work because of popFront. Looking at its definition:
> 
> void popFront(C)(scope ref inout(C)[] str) @trusted pure nothrow
> if (isNarrowString!(C[]))
> { ... }
> 
> So, reasonably this function takes the range by reference so it can modify its internals. HOWEVER! The implementation of TestAliasedString.get() returns an rvalue, i.e. it's the equivalent of a conversion involving a temporary. Surely that's not to match, whether in the current language or the one after the rvalue DIP.
> 
> The change that does make the code work is:
> 
>     struct TestAliasedString
>     {
>         ref string get() @safe @nogc pure nothrow { return _s; }
>         ref const(string) get() @safe @nogc pure nothrow const { return _s; }
>         alias get this;
>         @disable this(this);
>         string _s;
>     }
> 
> This indeed does implement a subtyping relationship, and passes the isInputRange test.
> 
> What's the moral of the story here? Generality is good, but it seems in several places in phobos (of which this is just one example), a combination of vague specification and aiming for a nice ideal of "work with anything remotely reasonable" has backfired into a morass of inconsistently defined and supported corner cases.
> 
> For this case in particular - I don't think we should support all types that support some half-hearted form of subtyping, at the cost of reducing generality and deprecating working code.
March 28, 2019
On Thursday, 28 March 2019 at 16:12:34 UTC, Andrei Alexandrescu wrote:
> On 3/28/19 10:49 AM, Meta wrote:
>> On Thursday, 28 March 2019 at 07:32:33 UTC, Andrei Alexandrescu wrote:
>>> On 3/28/19 1:36 AM, Meta wrote:
>>>> DirEntry states that it is a subtype of `string`
>>>
>>> No, please refer to the "Generality creep" thread.
>> 
>> Last time I checked, curt dismissal is not an argument.
>
> Sorry. It was a pertinent response dressed as a curt dismissal. The opening post of the "Generality creep" is trivial to find and contains an explanation of exactly this matter. Walter posted a link to it, too (thanks). Here it is:
>
> https://digitalmars.com/d/archives/digitalmars/D/Generality_creep_324867.html
>
> What remains is that I copy and paste it for you, which I'll do at the end of this message.
>
> I actually stopped reading your post following your assertion, because I inferred it would be increasingly wrong because it is based on a wrong assumption. Just now I went and read it, and indeed it is so.

Thanks, stopped reading your post here. I'll leave the discussion to those more qualified.
March 28, 2019
On 3/28/19 12:23 PM, Meta wrote:
> On Thursday, 28 March 2019 at 16:12:34 UTC, Andrei Alexandrescu wrote:
>> On 3/28/19 10:49 AM, Meta wrote:
>>> On Thursday, 28 March 2019 at 07:32:33 UTC, Andrei Alexandrescu wrote:
>>>> On 3/28/19 1:36 AM, Meta wrote:
>>>>> DirEntry states that it is a subtype of `string`
>>>>
>>>> No, please refer to the "Generality creep" thread.
>>>
>>> Last time I checked, curt dismissal is not an argument.
>>
>> Sorry. It was a pertinent response dressed as a curt dismissal. The opening post of the "Generality creep" is trivial to find and contains an explanation of exactly this matter. Walter posted a link to it, too (thanks). Here it is:
>>
>> https://digitalmars.com/d/archives/digitalmars/D/Generality_creep_324867.html 
>>
>>
>> What remains is that I copy and paste it for you, which I'll do at the end of this message.
>>
>> I actually stopped reading your post following your assertion, because I inferred it would be increasingly wrong because it is based on a wrong assumption. Just now I went and read it, and indeed it is so.
> 
> Thanks, stopped reading your post here. I'll leave the discussion to those more qualified.

No need to convert this into an exchange of broadsides. The appropriate response is as simple as "oh ok so that's not subtyping, interesting" - which was my reaction as well. It's a subtle matter but once seen it is clear.
March 30, 2019
On Wednesday, March 27, 2019 8:17:35 PM MDT Andrei Alexandrescu via Digitalmars-d wrote:
> I started an alphabetical search through std modules, and as luck would have it I got as far as the first module: std/algorithm/comparison.d. In there, there's these overloads:
>
> size_t levenshteinDistance(alias equals = (a,b) => a == b, Range1, Range2)
> (Range1 s, Range2 t)
> if (isForwardRange!(Range1) && isForwardRange!(Range2));
>
> size_t levenshteinDistance(alias equals = (a,b) => a == b, Range1, Range2)
> (auto ref Range1 s, auto ref Range2 t)
> if (isConvertibleToString!Range1 || isConvertibleToString!Range2)
>
> (similar for levenshteinDistanceAndPath)
>
> What's with the second overload nonsense? The Levenshtein algorithm works on forward ranges. And that's about it, in a profound sense: Platonically what the algorithm needs is two forward ranges to operate. (We ought to be pretty proud of it, too: in all other languages I looked at, it's misimplemented to require random access.)
>
> The second overload comes from a warped sense of DWIM: well if this type converts to a string, we commit to support that too. Really. How about a struct that converts to an array? Should we put in a pull request to that, too? Where do we even stop?
>
> I hope there's not much more of this nonsense, because it all should be deprecated with fire.

This particular case is one that's a bit tricky. The correct solution with stuff like this IMHO is to require that you pass actual forward ranges and not types that convert to types that are forward ranges (like strings). The problem is that a lot of code like this was originally written to take strings and then generalized later, and when a function takes something like string or const(char)[], it's going to implicitly convert varous types to the target type at the call site. In contrast, you get no such implicit conversion with a template constraint.

The result is that a bunch of functions in Phobos now use isConvertibleToString on a separate overload to then convert the argument to a string and pass it to the primary function. This is very, very wrong even if it's not obvious. The problem is that the conversion then happens inside the function instead of at the call site, which means that you get fun stuff like when a static array is passed in, it's sliced, and if the function returns any portion of the range it's given, then you're returning a slice of the stack that is then invalid. The only way to make this work is to have two overloads - the primary one, and one that accepts arrays specifically. e.g.

auto foo(R)(R range)
    if(isForwardRange!R && ...)
{
    return fooImpl(range);
}

auto foo(C)(C[] str)
{
    return fooImpl(range);
}

private auto fooImpl(R)(R range)
{
}

That way, the conversion happens at the call site like it did when the function accepted string. Similar problems exist in general with accepting implicit conversions with a template constraint and is generally not something that we should be doing unless it's for something really simple like bool.

This is not at all a nice way to have to write this code, but it is required to avoid code breakage when making the function work on ranges rather than just strings. isConvertibleToString _could_ be used internally with static if to avoid the additional overload, but again, it runs into the problem that the implicit conversion is done internally. So, to support the implicit conversion, a second overload is required.

The way that these functions _should_ have been written was to have them operate on ranges in the first place, and then they wouldn't accept any implict conversions from stuff like enums, but many of these functions weren't. I don't know if that's quite the problem with levenshteinDistance, but it's definitely causing a lot of the ugly overloads in Phobos with anything string-related.

Unfortunately, I don't think that it's actually possible to just deprecate the overload that accepts implicit conversions, because that's also the overload that accepts strings. So, for many of these functions, I think that we're stuck with having two overloads unless we do something like std.v2 and have more of a hard breakage.

Either way, isConvertibleToString must go. Using it in a template constraint is just begging for bugs, because the implicit conversion won't happen at the call site like it needs to. I've done some work towards removing it from Phobos, but due to health issues, I haven't had enough time to finish the job.

- Jonathan M Davis



March 30, 2019
On 3/28/19 12:23 PM, Meta wrote:
> On Thursday, 28 March 2019 at 16:12:34 UTC, Andrei Alexandrescu wrote:
>> On 3/28/19 10:49 AM, Meta wrote:
>>> On Thursday, 28 March 2019 at 07:32:33 UTC, Andrei Alexandrescu wrote:
>>>> On 3/28/19 1:36 AM, Meta wrote:
>>>>> DirEntry states that it is a subtype of `string`
>>>>
>>>> No, please refer to the "Generality creep" thread.
>>>
>>> Last time I checked, curt dismissal is not an argument.
>>
>> Sorry. It was a pertinent response dressed as a curt dismissal. The opening post of the "Generality creep" is trivial to find and contains an explanation of exactly this matter. Walter posted a link to it, too (thanks). Here it is:
>>
>> https://digitalmars.com/d/archives/digitalmars/D/Generality_creep_324867.html 
>>
>>
>> What remains is that I copy and paste it for you, which I'll do at the end of this message.
>>
>> I actually stopped reading your post following your assertion, because I inferred it would be increasingly wrong because it is based on a wrong assumption. Just now I went and read it, and indeed it is so.
> 
> Thanks, stopped reading your post here. I'll leave the discussion to those more qualified.

The author of this post wrote me a very nice private note in lieu of an unpleasant public response to my nasty way of handling this exchange. He refrained from posting in the spirit of "write letters to people you hate, then burn them" (as the joke goes, "okay, then what do I do to the letters?")

Anyhow, I wanted to apologize to Meta in the same place where the offense take place. I am sorry for the unkind tone I took to.

An explanation (but in no way a justification) of this came from a friend: "The more sure you are of being right, the more insufferable you become."
March 30, 2019
On 3/30/19 3:22 PM, Jonathan M Davis wrote:
> Either way, isConvertibleToString must go.

Word.

We should have one rcstring type that is reference counted, uses UTF8, is not a range, offers ranges (bytes, codepoints, graphemes), and consolidate around it. Algorithms should only use ranges, and whenever someone wants to use an algorithm with a string, they choose the iteration mode that fits the application.
April 01, 2019
On Sat, Mar 30, 2019 at 06:31:05PM -0400, Andrei Alexandrescu via Digitalmars-d wrote:
> On 3/30/19 3:22 PM, Jonathan M Davis wrote:
> > Either way, isConvertibleToString must go.
> 
> Word.
> 
> We should have one rcstring type that is reference counted, uses UTF8, is not a range, offers ranges (bytes, codepoints, graphemes), and consolidate around it. Algorithms should only use ranges, and whenever someone wants to use an algorithm with a string, they choose the iteration mode that fits the application.

+1.  That's the correct approach.  If std.v2 ever happens, this is definitely the way to go.


T

-- 
Insanity is doing the same thing over and over again and expecting different results.
April 01, 2019
On Saturday, 30 March 2019 at 22:31:05 UTC, Andrei Alexandrescu wrote:
> We should have one rcstring type that is reference counted, uses UTF8, is not a range, offers ranges (bytes, codepoints, graphemes), and consolidate around it. Algorithms should only use ranges, and whenever someone wants to use an algorithm with a string, they choose the iteration mode that fits the application.

Why?

Strings have the advantage of being extremely simple constructs that represent exactly the right abstraction: they're a slice of chars, period. They can be scoped, sliced and concatenated just like any other range.

Getting rid of auto-decoding would be good, but adding a whole new layer of abstraction and special cases on top of it, with a dedicated allocation strategy C++ style, seems superfluous at best.

Honestly, it sounds like the kind of thing that will make reference-counting the new class, where 10 years from now people will be told "Yeah, it would be more convenient to do things this way, but back when this standard library was written we were using reference-counting everywhere, and now we're stuck with it".

tl;dr Keep strings and reference-counting separate.