Jump to page: 1 2
Thread overview
Re: ranges of characters and the overabundance of traits that go with them
Mar 13, 2017
H. S. Teoh
Mar 13, 2017
Dmitry Olshansky
Mar 13, 2017
Jonathan M Davis
Mar 14, 2017
H. S. Teoh
Mar 16, 2017
Dukc
Mar 16, 2017
H. S. Teoh
Mar 13, 2017
Jonathan M Davis
Mar 15, 2017
Jonathan M Davis
Mar 15, 2017
H. S. Teoh
Mar 16, 2017
Jonathan M Davis
Mar 16, 2017
Sebastiaan Koppe
Mar 16, 2017
Jonathan M Davis
Mar 16, 2017
Sebastiaan Koppe
Mar 16, 2017
Jonathan M Davis
Mar 17, 2017
Sebastiaan Koppe
Mar 17, 2017
Jonathan M Davis
Mar 16, 2017
H. S. Teoh
Mar 16, 2017
Jonathan M Davis
March 13, 2017
Ugh.  What a horrible mess!

I think, instead of wading through the specifics and losing sight of the forest for the myriad trees, we should take a step back and consider the big picture.

1) First of all, the user-facing API should be as simple as possible, and things like which overload gets which type are, apropos Walter's recent post, implementation details that ought not to be exposed to the user directly.  So instead of:

	auto func(R)(R r)
		if (isInputRange!R && ...) { ...}
	auto func(C)(C[] s)
		if (isSomeChar!C) { ... }
	auto func(R)(R r)
		if (isWhoKnowsWhat!R && !isNotWhoKnowsWhatElse!R) { ... }

we really should consolidate the whole overload set into a single template function:

	auto func(R)(R r)
		if (/* user-readable constraints */)
	{
		static if (isWhatever!R)
			return implA(r);
		else static if (isWhateverElse!R)
			return implB(r);
		else ...
		else static assert(0, niceErrorMessageHere);
	}

implA, implB, etc., can then be useful for deprecating stuff that's otherwise hard to deprecate, like no longer accepting implicitly convertible enums, alias this, or whatever.

2) The /* user-readable constraints */  ought to be one of a small number of self-describing templates that tell the user exactly what the *intent* of the function is (note, *intent*, as in, implementation limitations should not be a factor).  For example, is the function operating on r in a range-like way? Or is it primarily treating r as a more-or-less opaque blob that has string-like characteristics (e.g., pass it as a filename to the OS)?  I think there are only a very small number of such cases, and Phobos' user-facing API really should limit themselves to only these cases and nothing more.

The current mess of isSomeString, isConvertibleToString, isNarrowString, etc., really should be internal to Phobos, and should not be exposed to the user at all. At the most, I'd say just ONE template that works for all strings and string-like ranges (and whatever else we wish to include) ought to be exposed to the user.  What should be done within Phobos itself (in private std.* space) is another matter -- we do need to handle the dirty details like auto-decoding, narrow strings, etc., here. But the point is that this should be *internal* to Phobos, not exposed to the user like dirty laundry.


3) The purpose of a particular template function, concerning which Jonathan wrote:

> In general, when you templatize a function involving strings, you're doing one of
> 
> - create a generic function that works on any range (which just so
>   happens to include strings)
> - create a function that specifically operates on strings (but is
>   templated so that it can operate on different constness or operate
>   on different character types)
> - create a function that specifically operates on a range of
>   characters, which happens to include strings
> - create a function that operates on strings or ranges of characters
>   or operates on types that implicitly convert to a string type
> - templatize a function that used to take string (or took any dynamic
>   array of characters) and make it take any range of characters

I think the first case (generic function that works on any range) is the simplest to handle. It just accepts anything that's a range of the level of functionality (input, forward, bidi, etc.) required by the function, and that's good enough.  Everything else, like whether something is a narrow string, enum, alias this, or whatever we decide to support / not support, should be handled as implementation details. The user shouldn't have to care about this. Inside the user-facing function, we can either use static ifs or overloads of private implementation functions to do whatever needs to be done to handle all those different cases.

For the rest of the cases, my question is, is there any reason to *not* accept any arbitrary range of characters? Sure, the current implementation may not be able to handle all the different combinations, but again, that's implementation details.  In fact, I'd even say that as far as is possible, we should try to generalize functions into the previous category (works on any range).  If that's not possible, e.g., the function relies on the fact that the elements must be some kind of characters, then I'd say it should, if at all possible, accept *any* range of characters regardless of the specifics (it's a string / wstring / dstring, it's a user-defined range over char/wchar/dchar, etc.).

If there's any function that *cannot* be generalized into something that accepts any arbitrary range of characters, I'd like to know about it. I'm expecting that if there are any, they should be in the minority, so they shouldn't need special template constraints specially dedicated for their special use case -- we should just write out explicit sig constraints (in combination with the "standardized" constraints on ranges, etc.).

So basically, this leaves us really with just two general categories:

- Generic range function: use isInputRange, isForwardRange, etc., in the
  sig constraints.

- Function that needs to operate on some kind of characters: either have
  a general constraint, say isRangeOfChar (tentative name), or just use
  is{Input,Forward,...}Range!R along with is(ElementType!R : dchar) or
  isSomeChar!(ElementType!R).

Anything that doesn't fall into these 2 categories shouldn't have dedicated sig constraints (i.e., named public templates like isSomeString), IMO, but should just explicitly list their constraints.

I'm aware that there are currently functions that for whatever reason require the argument to be some kind of array of characters.  I question whether the argument being an array is a *necessary* requirement. Maybe there are a few functions that wouldn't make sense otherwise, I'm not sure, but IMO most of them ought to be generalizable to one of the above two generic categories (i.e., accept any range).  If the current implementation only works with actual arrays, that's an implementation detail; it should be handled thus:

	auto myFunc(R)(R r)
		if (isRangeOfChar!R)
	{
		static if (is(R : C[], C))
			/* current implementation *?
		else assert(0, "Not implemented yet");
	}

I.e., what the user sees should be the most generic API that makes sense relative to this function.  For cases where the implementation can't (yet) handle, a helpful error message is given, and the docs should also indicate the present limitations.


4) Implicit conversions:  this is a tricky one, esp. once you factor in alias this.  Am I right to assume that this is mostly coming from functions that used to take strings explicitly, but later were templatized, thus losing some of the original implicit conversions?  If so, I'm inclined to do this:

	auto myFunc(R)(R r)
		/* N.B. no sig constraints */
	{
		static if (isConvertibleTo!(R,x))
			...
		... /* handle dirty laundry cases internally here */
	}

then document in the ddocs exactly what is expected of the incoming type. I.e., accept everything, then sort out the different cases internall as an implementation detail.


5) What to do about the whole mess that is isSomeString, isAutodecodableString, isImplicitlyConvertibleToString, etc.: I'm tempted to say most of these variants ought to be private to Phobos, and users should not even see them.  In fact, I'd even go as far as saying Phobos internal code should, as much as possible, use explicit constraints (i.e., spell out is(ElementType!R : dchar) rather than hide it behind yet another similar-but-subtly-different name like isSomeString).  Phobos maintainers ought to be able to work with complex explicit constraints, one would hope.  But none of this should be visible to the user.


--T
March 13, 2017
On 3/13/17 8:08 PM, H. S. Teoh via Digitalmars-d wrote:
> Ugh.  What a horrible mess!
>
> I think, instead of wading through the specifics and losing sight of the
> forest for the myriad trees, we should take a step back and consider the
> big picture.
>
[snip excellent answer]

This is IMHO the right way forward. We (Phobos maintainers) created the mess, now it's time to cleanup but not at the expense of the user.
All the isSomeString, isSomeOtherString can just be a reminiscent of the old days that is internal to the implementation.

---
Dmitry Olshansky


March 13, 2017
On Monday, March 13, 2017 12:08:12 H. S. Teoh via Digitalmars-d wrote:
> 2) The /* user-readable constraints */  ought to be one of a small number of self-describing templates that tell the user exactly what the *intent* of the function is (note, *intent*, as in, implementation limitations should not be a factor).
>
> The current mess of isSomeString, isConvertibleToString, isNarrowString, etc., really should be internal to Phobos, and should not be exposed to the user at all. At the most, I'd say just ONE template that works for all strings and string-like ranges (and whatever else we wish to include) ought to be exposed to the user.  What should be done within Phobos itself (in private std.* space) is another matter -- we do need to handle the dirty details like auto-decoding, narrow strings, etc., here. But the point is that this should be *internal* to Phobos, not exposed to the user like dirty laundry.

Simplifying public-facing template constraints makes a lot of sense. However, that doesn't mean that all of the various traits should be private or hidden within Phobos - just that the public functions themselves should make their template constraints as general as possible. Remember that there's nothing special about what Phobos is doing. It's just that it happens to be the standard library, so it's used by a lot of folks. The rest of the D ecosystem has all of these same concerns, and so even if 3rd party libraries and programs also make their public-facing template constraints as simple as possible, they're still going to need to use all of these traits internally in the same way that Phobos does. These complexities can be hidden on some level at the API level, but anyone writing APIs or writing code that uses APIs but needs to do stuff that the APIs don't do for them is still going to have to worry about these complexities.

> For the rest of the cases, my question is, is there any reason to *not* accept any arbitrary range of characters?

I'm not convinced that it never makes sense to have a function just operates on arrays and not ranges. Certainly, lots of programs will be written that way, because it takes more effort to make something work with generic ranges than it does to work with arrays, and the extra effort often isn't worth it if you're not creating a library to distribute to other programmers.

For the most part, something like Phobos should be using ranges, not just arrays, but they're still a real thing for the D ecosystem at large - especially if the code is mixing range behavior with appending. In particular, that's not an uncommon thing to do in string code. So, while I do think that we should be very wary of adding code to Phobos that operates on arrays and not ranges, we should also remember that that's not always the best way to do things in actual programs that are written to do a specific thing - potentially on a tight schedule - rather than being generic and put out in the wild for everyone to use.

> 4) Implicit conversions:  this is a tricky one, esp. once you factor in alias this.  Am I right to assume that this is mostly coming from functions that used to take strings explicitly, but later were templatized, thus losing some of the original implicit conversions?  If so, I'm inclined to do this:
>
>   auto myFunc(R)(R r)
>       /* N.B. no sig constraints */
>   {
>       static if (isConvertibleTo!(R,x))
>           ...
>       ... /* handle dirty laundry cases internally here */
>   }
>
> then document in the ddocs exactly what is expected of the incoming type. I.e., accept everything, then sort out the different cases internall as an implementation detail.

As I pointed on in my original post (though with how large it is, it probably isn't hard to forget), this doesn't actually work properly with implicit conversions - at least not in the general case. In particular, if you have a function that takes a string or takes an array and is templated on character type, it will currently accept static arrays, enums, and user-defined types that implicitly convert to string. For static arrays, and user-defined types that convert to strings in a similar manner to static arrays (i.e. they're returning a slice of memory that will not be valid when the object being converted goes out of scope), you have a safety problem if do the implicit conversion inside the function. It needs to happen at the call site, and that means _not_ doing something like

auto foo(T)(T t)
    if(isConvertibleToString!T)
{...}

- at least not in the general case. The problem is when any portion of the original range is returned from the function. For instance, with this ridiculously simple example

auto foo(inout(char)[] str)
{
    return str;
}

char[10] sa;
auto bar = foo(sa);

you have no safety problem so long as no slice of bar is returned from the function that it's in. However, if you templatize foo

auto foo(R)(R range)
    if(isForwardRange!R && isSomeChar!(ElementType!R))
{
    static if(isConvertibleToString!R)
        return foo!(StringTypeOf!R)(range);
    else
        return str.save;
}

char[10] sa;
auto bar = foo(sa);

then sa is copied into foo and _then_ sliced, so bar ends up now referring to memory inside of foo - which is out of scope and therefore invalid. You can make it somewhat safer by doing

auto foo(R)(R range)
    if(!isConvertibleToString &&
       isForwardRange!R &&
       isSomeChar!(ElementType!R))
{
    return str.save;
}

auto foo(T)(auto ref T convertible)
    if(isConvertibleToString)
{
    return foo!(StringTypeOf!R)(range);
}

but as soon as the value being passed to be implicitly converted is an rvalue rather than an lvalue, you have the same problem. That's less likely with a static array (though still possible), but it's also quite possible with a user-defined type. e.g.

struct S
{
    char[20] str;
    const(char)[] func() { return str; }
    alias str func;
}

The only way for this to be safe is for the implicit conversion to take place at the call site like it did originally, and the only way that I can come up with to guarantee that the conversion takes place at the call site is to templatize on the character type (or element type if we're talking about arrays in general and not just strings), in which case we get something like

auto foo(R)(R range)
    if(!isSomeString!R &&
       isForwardRange!R &&
       isSomeChar!(ElementType!R))
{
    return _fooImpl(range);
}

auto foo(C)(C[] str)
    if(isSomeChar!C)
{
    return _fooImpl(range);
}

private auto _fooImpl(R range)
    if(isForwardRange!R && isSomeChar!(ElementType!R))
{
    return range.save;
}

Now, in this case, the code is so short that you might as well just duplicate the functionality rather than create a helper function, but in general, you'd want the helper function to avoid code duplication.

Regardless, this solution correctly and safely converts the original function from one that takes string to one that works with generic ranges of characters without introducing any safety problems. But it means that two overloads are required, and it means that we can't deprecate the implicit conversion functionality like Jack Stouffer suggested (because both strings and the types that convert to them take the same overload). This complication is a great example of why we do _not_ want to be writing new, generic functions that take take types that implicitly convert to work with the function. However, we're stuck if we want to templatize an existing function so that it works on generic ranges rather than just arrays - or generic ranges of characters rather than just strings.

- Jonathan M Davis

March 13, 2017
On Monday, March 13, 2017 23:40:55 Dmitry Olshansky via Digitalmars-d wrote:
> On 3/13/17 8:08 PM, H. S. Teoh via Digitalmars-d wrote:
> > Ugh.  What a horrible mess!
> >
> > I think, instead of wading through the specifics and losing sight of the forest for the myriad trees, we should take a step back and consider the big picture.
>
> [snip excellent answer]
>
> This is IMHO the right way forward. We (Phobos maintainers) created the
> mess, now it's time to cleanup but not at the expense of the user.
> All the isSomeString, isSomeOtherString can just be a reminiscent of the
> old days that is internal to the implementation.

That makes sense for the APIs of public-facing functions. However, at least some portion of programmers who are not writing Phobos functions are going to need to use all of those same traits to write their own code. In most cases, we can clean up the APIs so that they hide the specializations, but anyone who needs to write specializations in their own code will need the same traits that Phobos uses. So, we can reduce the complications and negative impact that come from having to deal with all of these traits in generic code, but we can't actually eliminate it. Best case, the folks that aren't going to bother writing any generic code of their own don't have to worry about it, but anyone writing their own generic code - especially if it involves ranges of characters - is still stuck with the problem in their own code. So, we really can't make the various traits internal - at least not without basically saying that everyone needs to rewrite them in their own code, which is just going to create a different set of problems.

Overall though, I completely agree that we should move towards simplifying the template constraints on the public-facing APIs so that the code that doesn't need to use all of the various traits in its public API doesn't.

- Jonathan M Davis

March 14, 2017
On Mon, Mar 13, 2017 at 04:00:57PM -0700, Jonathan M Davis via Digitalmars-d wrote:
> On Monday, March 13, 2017 23:40:55 Dmitry Olshansky via Digitalmars-d wrote:
[...]
> > This is IMHO the right way forward. We (Phobos maintainers) created the mess, now it's time to cleanup but not at the expense of the user.  All the isSomeString, isSomeOtherString can just be a reminiscent of the old days that is internal to the implementation.
> 
> That makes sense for the APIs of public-facing functions. However, at least some portion of programmers who are not writing Phobos functions are going to need to use all of those same traits to write their own code.

In which case even the more we need to clean up our act.  I would dread user code starting to use isSomeString, isImplicitlyConvertibleToString, and all of those messy same-but-different templates, introducing subtle corner cases that the user may not be fully aware of.


> In most cases, we can clean up the APIs so that they hide the specializations, but anyone who needs to write specializations in their own code will need the same traits that Phobos uses. So, we can reduce the complications and negative impact that come from having to deal with all of these traits in generic code, but we can't actually eliminate it. Best case, the folks that aren't going to bother writing any generic code of their own don't have to worry about it, but anyone writing their own generic code - especially if it involves ranges of characters - is still stuck with the problem in their own code.

To be quite honest, in *my* own code I avoid Phobos templates of the isSomeString category like the plague.  Part of this whole mess came from autodecoding (and here we again see it rear its ugly head), without which many of these templates wouldn't need to exist in the first place. In user code, I'd say either work directly with char[] and its variants, or byGrapheme if you care about visual "characters", and avoid wstring and dstring like the plague.  At the most I'd just say:

	auto myFunc(R)(R range)
		if (isInputRange!R && is(ElementType!R : dchar))
	...

and let it be.  This will handle any string, autodecoding or not, user-defined char/wchar/dchar ranges, and whatever else you throw at it. If it's truly generic code, it really shouldn't care what it is.  And on this point, I find Phobos code excessively convoluted -- arguably necessitated by the historical blunder of autodecoding -- supposedly generic code really *shouldn't* distinguish between a range of char that happens to be an array vs. a range of char that isn't.

Actually, come to think of it, the only time isNarrowString, et al are *necessary* is in the autodecoding parts of Phobos.  I can't think of any other places where generic code should even care whether something is a narrow string, or an array-based char range or a non-array-based char range.  Didn't we agree to move Phobos away from autodecoding, and only those parts that are truly necessary for backward compatibility should retain the autodecoding code?  If so, I'd hardly say user code should even *know* about isNarrowString and friends.


> So, we really can't make the various traits internal - at least not without basically saying that everyone needs to rewrite them in their own code, which is just going to create a different set of problems.
[...]

On the contrary, we need to make the *current* traits internal. Providing users with similar traits -- a much cleaner, orthogonal set of traits that has none of the current mess -- is a different question. In principle I agree with doing that, but I don't agree that users should be using any of the current almost-the-same-but-subtly-different traits.


T

-- 
MASM = Mana Ada Sistem, Man!
March 14, 2017
On Tuesday, March 14, 2017 11:38:21 H. S. Teoh via Digitalmars-d wrote:
> On Mon, Mar 13, 2017 at 04:00:57PM -0700, Jonathan M Davis via
Digitalmars-d wrote:
> > On Monday, March 13, 2017 23:40:55 Dmitry Olshansky via Digitalmars-d
wrote:
> [...]
>
> > > This is IMHO the right way forward. We (Phobos maintainers) created the mess, now it's time to cleanup but not at the expense of the user.  All the isSomeString, isSomeOtherString can just be a reminiscent of the old days that is internal to the implementation.
> >
> > That makes sense for the APIs of public-facing functions. However, at least some portion of programmers who are not writing Phobos functions are going to need to use all of those same traits to write their own code.
>
> In which case even the more we need to clean up our act.  I would dread user code starting to use isSomeString, isImplicitlyConvertibleToString, and all of those messy same-but-different templates, introducing subtle corner cases that the user may not be fully aware of.
>
> > In most cases, we can clean up the APIs so that they hide the specializations, but anyone who needs to write specializations in their own code will need the same traits that Phobos uses. So, we can reduce the complications and negative impact that come from having to deal with all of these traits in generic code, but we can't actually eliminate it. Best case, the folks that aren't going to bother writing any generic code of their own don't have to worry about it, but anyone writing their own generic code - especially if it involves ranges of characters - is still stuck with the problem in their own code.
>
> To be quite honest, in *my* own code I avoid Phobos templates of the isSomeString category like the plague.  Part of this whole mess came from autodecoding (and here we again see it rear its ugly head), without which many of these templates wouldn't need to exist in the first place. In user code, I'd say either work directly with char[] and its variants, or byGrapheme if you care about visual "characters", and avoid wstring and dstring like the plague.  At the most I'd just say:
>
>   auto myFunc(R)(R range)
>       if (isInputRange!R && is(ElementType!R : dchar))
>   ...
>
> and let it be.  This will handle any string, autodecoding or not, user-defined char/wchar/dchar ranges, and whatever else you throw at it. If it's truly generic code, it really shouldn't care what it is.  And on this point, I find Phobos code excessively convoluted -- arguably necessitated by the historical blunder of autodecoding -- supposedly generic code really *shouldn't* distinguish between a range of char that happens to be an array vs. a range of char that isn't.
>
> Actually, come to think of it, the only time isNarrowString, et al are *necessary* is in the autodecoding parts of Phobos.  I can't think of any other places where generic code should even care whether something is a narrow string, or an array-based char range or a non-array-based char range.  Didn't we agree to move Phobos away from autodecoding, and only those parts that are truly necessary for backward compatibility should retain the autodecoding code?  If so, I'd hardly say user code should even *know* about isNarrowString and friends.

The reality of the matter is that auto-decoding is here to stay, because we simply do not have a way to remove it without breaking a large percentage of the D programs currently in existence. We need to ensure that Phobos work well with arbitrary ranges of characters rather than assuming that they're going to be dealing with strings or assuming that any range of characters is going to be a range of dchar. But we're still stuck with the fact that narrow strings are auto-decoded by the range primitives.

In some cases, making a function work with arbitrary ranges of characters means not caring about narrow strings, because it's going to have to decode them anyway, but if the function doesn't need to do any decoding, and we don't specialize it on narrow strings, then it's going to incur a performance hit with strings, and string processing is a _very_ common programming task. Having byCodeUnit helps, but that results in a new type, which can be a problem, and it's a given that many folks are simply going to be using strings without something like byCodeUnit or byGrapheme. So, if we really care about efficiency (and we should with the standard library), then we're often going to have to specialize on narrow strings because of that.

It's also a concern, because not specializing on narrow strings will sometimes result in having a wrapper range rather than a slice of the original string, which can make incur a performance hit, depending on what you do with the result (e.g. if you need a string, and you get wrapper range, you're forced to copy it into a string, and if specilializng on narrow strings avoided that, then _not_ doing that specialization is causing a performance hit for code using the function). So, ignoring narrow strings is going to hurt us.

And yes, this is less of a concern for many D programs than it is for
Phobos, because if you're just writing something to get something specific
done and not writing a library for others to use, you can take shortcuts -
either by not making the code particularly generic or by not caring about
some of the performance issues unless they end up being a large enough
performance hit for your particular program that you can't ignore them.
So, to some extent, D programs can just ignore narrow strings, and Phobos at
least will be efficient with them. But some programs _will_ care, and anyone
writing a library to distribute for others to use (e.g. on code.dlang.org)
is in exactly the same positition as Phobos. If they don't take narrow
strings into account, then there's going to be a performance hit, and it
will affect other people's code.

By doing what we can to maximize how much the Phobos APIs work with arbitrary ranges of characters and minimizing how much auto-decoding is involved, we can reduce how much your average D program has to deal with these problems - and much of this can be hidden inside of the implementation with static if rather than put in the top-level constraints - but ultimately, third party code has all of the same problems as Phobos does if it cares about being generic or efficient. And we can't fix that or hide it.

> > So, we really can't make the various traits internal - at least not without basically saying that everyone needs to rewrite them in their own code, which is just going to create a different set of problems.
>
> [...]
>
> On the contrary, we need to make the *current* traits internal. Providing users with similar traits -- a much cleaner, orthogonal set of traits that has none of the current mess -- is a different question. In principle I agree with doing that, but I don't agree that users should be using any of the current almost-the-same-but-subtly-different traits.

I definitely think that isAutodecodableString needs to go, and if we can, I'd like to see isConvertibleToString go - though without something like it, you can't deprecate implicit conversions when templatizing a function that took string like Jack Stouffer wants to do. Still, that only works in cases where none of the original string was returned from the function, since AFAIK, the only way to templatize the function safely otherwise is to templatize on character type (in which case, deprecating it would mean deprecating the function for strings as well, which obviously is unacceptable). Overall though, I tend to favor just leaving the implicit conversions in place for those functions (though not adding them for new ones) and then just having the overload that templatizes on character type. And if we do that, we can get rid of isConvertibleToString.

As for isSomeString, the only problem I see with it is that it accepts enums. Otherwise, it seems like a perfectly sensible trait to have, and its name fits in with other traits in std.traits such that I don't know what we'd call it instead other than isExactSomeString (which std.conv does internally, but it's not exactly an ideal name). Regardless, I completely disagree about trying to get rid of it or hiding it. If it doesn't need to be in a template constraint, then it shouldn't be, but it's still a useful trait to have.

And as for isNarrowString, it's going to continue to be critical for any code that wants to efficiently deal with strings. Unless we can actually get rid of auto-decoding (and I don't see how without major breakage), as much as it sucks, we're forever stuck dealing with it. Some code can just operate on ranges of characters without caring about the efficiency cost - or even operate on strings without caring about the efficiency cost - but there _is_ an efficiency cost as well as problems related to the type changing, because you can't slice a narrow string in range-based code without specializing on it. So, it really doesn't make sense to ignore narrow strings or to try and get rid of or hide isNarrowString.

As such, beyond getting rid of isAutodecodableString or isConvertibleToString, I really don't see anything that makes sense to change with the string-related traits. It would be nice to fix isSomeString so that it doesn't accept enums, but that would break code, and it's been around for so long that adding a trait that didn't have that problem would simply mean having yet another trait (which arguably makes the problem worse). And isSomeString is just fine as it is in any code that doesn't accept enums anyway.

So, I'm in favor of getting rid of isAutodecodableString and isConvertibleToString, but I don't agree at all with the idea that we're going to be able to act like narrow strings and auto-decoding are just a Phobos problem or that isSomeString or isNarrowString is something that should only be used in Phobos.

Now, if someone by some miracle can come up with a way to remove auto-decoding without breaking tons of code, then I am all for it. And at that point, isNarrowString becomes unnecessary, and we can have much cleaner range-based code. But I don't see how that's possible. At best, we can reduce the impact that that mistake has. Fixing it just seems impossible, and IMHO, hiding the helper stuff that allows code to be efficient in spite of it is a mistake, much as the necessity of that helper stuff really sucks.

- Jonathan M Davis

March 15, 2017
On Tue, Mar 14, 2017 at 07:21:05PM -0700, Jonathan M Davis via Digitalmars-d wrote:
> On Tuesday, March 14, 2017 11:38:21 H. S. Teoh via Digitalmars-d wrote:
[...]
> > Actually, come to think of it, the only time isNarrowString, et al are *necessary* is in the autodecoding parts of Phobos.  I can't think of any other places where generic code should even care whether something is a narrow string, or an array-based char range or a non-array-based char range.  Didn't we agree to move Phobos away from autodecoding, and only those parts that are truly necessary for backward compatibility should retain the autodecoding code?  If so, I'd hardly say user code should even *know* about isNarrowString and friends.
> 
> The reality of the matter is that auto-decoding is here to stay, because we simply do not have a way to remove it without breaking a large percentage of the D programs currently in existence. We need to ensure that Phobos work well with arbitrary ranges of characters rather than assuming that they're going to be dealing with strings or assuming that any range of characters is going to be a range of dchar. But we're still stuck with the fact that narrow strings are auto-decoded by the range primitives.

None of that negates the fact that we don't need to expose traits specific to autodecoding to the user.  Obviously the autodecoding will still be there, but as far as the API is concerned, the user ought not to be concerned about it: Phobos should handle it as transparently as possible.  Internally, of course, we have to support it for the sake of backward compatibility.  But that's no reason to expose unnecessary details in the public API.


> In some cases, making a function work with arbitrary ranges of characters means not caring about narrow strings, because it's going to have to decode them anyway, but if the function doesn't need to do any decoding, and we don't specialize it on narrow strings, then it's going to incur a performance hit with strings, and string processing is a _very_ common programming task. Having byCodeUnit helps, but that results in a new type, which can be a problem, and it's a given that many folks are simply going to be using strings without something like byCodeUnit or byGrapheme. So, if we really care about efficiency (and we should with the standard library), then we're often going to have to specialize on narrow strings because of that.

Of course we will need to specialize on narrow strings in Phobos.  But that does not mean we must expose all the dirty details on the public API.  This specialization is an implementation detail that should not affect the public API (to the extent it's possible).


> It's also a concern, because not specializing on narrow strings will sometimes result in having a wrapper range rather than a slice of the original string, which can make incur a performance hit, depending on what you do with the result (e.g. if you need a string, and you get wrapper range, you're forced to copy it into a string, and if specilializng on narrow strings avoided that, then _not_ doing that specialization is causing a performance hit for code using the function). So, ignoring narrow strings is going to hurt us.

I think you misunderstood me.  I didn't mean that we should delete isNarrowString altogether -- obviously we don't want any performance regressions, or, for that matter, semantic regressions by no longer distinguishing between narrow strings.  What I mean is that isNarrowString should be a *private* symbol in Phobos.  It's part of what I call "implementation details" that the user should not need to worry about.  Especially so in documentation, because right now the unreadable overloads of user-facing functions is partly caused by the need to distinguish between isNarrowString and !isNarrowString, which doubles the number of overloads. But from the user's POV, all they really ought to know is that function F accepts a range of some kind. How Phobos will handle different specific range types is none of their business -- they don't *want* to know about it, and they shouldn't need to.


> And yes, this is less of a concern for many D programs than it is for Phobos, because if you're just writing something to get something specific done and not writing a library for others to use, you can take shortcuts - either by not making the code particularly generic or by not caring about some of the performance issues unless they end up being a large enough performance hit for your particular program that you can't ignore them.  So, to some extent, D programs can just ignore narrow strings, and Phobos at least will be efficient with them. But some programs _will_ care, and anyone writing a library to distribute for others to use (e.g. on code.dlang.org) is in exactly the same positition as Phobos. If they don't take narrow strings into account, then there's going to be a performance hit, and it will affect other people's code.

I'd argue that the same reasoning applies to all D libraries, not just Phobos: implementation details like optimizing for narrow strings ought to be hidden from the public API.  It's the library author's problem to figure out how to optimize his library -- Phobos code, after all, is publicly available for perusal and its license terms permit copying Phobos snippets for use in their own library code, if they find that they need, e.g., some form of isNarrowString.  But none of this should be a factor in the design of the user-facing API.  Phobos needs to set the example here.

And to counter the argument that library authors might want to use Phobos' version of isNarrowString: I'd say that it's better for them to copy the definition of isNarrowString into their own code, because the current mess of isNarrowString, isSomeString, et al, means that there's a high likelihood we'd want to revise their definitions, which in turn means that if they're being used in 3rd party library code, subtle breakages may be introduced into said libraries because the definition has changed from when the library author originally looked at it.

That's why, IMO, traits like isNarrowString that are only useful for library implementation details ought NOT to be exported by Phobos. Unless we are 100% sure that it has a rock-solid definition that's well-designed and fits well with other standard traits -- which excludes most (all?) of the current messy string-related traits.


[...]
> > On the contrary, we need to make the *current* traits internal. Providing users with similar traits -- a much cleaner, orthogonal set of traits that has none of the current mess -- is a different question. In principle I agree with doing that, but I don't agree that users should be using any of the current almost-the-same-but-subtly-different traits.
> 
> I definitely think that isAutodecodableString needs to go, and if we can, I'd like to see isConvertibleToString go - though without something like it, you can't deprecate implicit conversions when templatizing a function that took string like Jack Stouffer wants to do.
[...]

Is it possible to hide this stuff behind a "template wall"? I.e., rename the current overload set to xxxImpl() with whatever sig constraints are necessary to achieve what we want, and have the original xxx() be just a generic template that forwards the said overloads?  E.g.:

Change:

	auto xxx(R)(R r) if (isConvertibleToString!R) { ... }
	auto xxx(R)(R r) if (isAutodecodableString!R) { ... }
	auto xxx(R)(R r) if (/* etc. */) { ... }

To:

	auto xxxImpl(R)(R r) if (isConvertibleToString!R) { ... }
	auto xxxImpl(R)(R r) if (isAutodecodableString!R) { ... }
	auto xxxImpl(R)(R r) if (/* etc. */) { ... }

	auto xxx(R)(R r) if (isInputRange!R) { return xxxImpl(r); }

Or does that extra level of syntactic indirection cause problems with implicit conversions?


> As for isSomeString, the only problem I see with it is that it accepts enums. Otherwise, it seems like a perfectly sensible trait to have, and its name fits in with other traits in std.traits such that I don't know what we'd call it instead other than isExactSomeString (which std.conv does internally, but it's not exactly an ideal name). Regardless, I completely disagree about trying to get rid of it or hiding it. If it doesn't need to be in a template constraint, then it shouldn't be, but it's still a useful trait to have.

I don't like the name, because it's unclear what exactly it means and what exactly it includes. The fact that it accepts enums where other traits don't is a further code smell IMO.

I think what we really need to do is to sit down and work out a well-designed, orthogonal set of traits that might be useful for user code, independently of what the current traits are, and then (and only then) worry about how to map the current mess to them.

If I were to look at the current traits from a user's POV, I'd be going, "What, why does Phobos have isNarrowString, isSomeString, isAutodecodableString...  How many different kinds of strings does D have?! and how exactly do any of these traits map to string, wstring, dstring? WTH does "Some" in "isSomeString" refer to? Is it different from "isAnyString" and "isString"? What kind of crappy ad hoc trait design is this anyway?!" Not the kind of reaction we'd want newcomers to have with code that's supposed to set the standard for what D code should look like.


> And as for isNarrowString, it's going to continue to be critical for any code that wants to efficiently deal with strings. Unless we can actually get rid of auto-decoding (and I don't see how without major breakage), as much as it sucks, we're forever stuck dealing with it. Some code can just operate on ranges of characters without caring about the efficiency cost - or even operate on strings without caring about the efficiency cost - but there _is_ an efficiency cost as well as problems related to the type changing, because you can't slice a narrow string in range-based code without specializing on it. So, it really doesn't make sense to ignore narrow strings or to try and get rid of or hide isNarrowString.

We can't get rid of isNarrowString, but I argue that we should "hide" it. Or, at the very least, it should not be used in any user-facing Phobos API (though it can, and currently needs to, be used internally for all the reasons you stated).


[...]
> So, I'm in favor of getting rid of isAutodecodableString and isConvertibleToString, but I don't agree at all with the idea that we're going to be able to act like narrow strings and auto-decoding are just a Phobos problem or that isSomeString or isNarrowString is something that should only be used in Phobos.

The first step towards removing autodecoding (in the distant future) is to reduce dependence of user code on its details.  I argue that isNarrowString ought to be used only within Phobos itself, and should not be exposed to user code.  If user code is concerned about performance, there's already byCodeUnit to bypass autodecoding, and byCodePoint when the user wants to ensure he's dealing with dchar.


> Now, if someone by some miracle can come up with a way to remove auto-decoding without breaking tons of code, then I am all for it. And at that point, isNarrowString becomes unnecessary, and we can have much cleaner range-based code. But I don't see how that's possible. At best, we can reduce the impact that that mistake has. Fixing it just seems impossible, and IMHO, hiding the helper stuff that allows code to be efficient in spite of it is a mistake, much as the necessity of that helper stuff really sucks.
[...]

It's clear that autodecoding can't be *removed* outright, at least not at this point. But we *can* reduce dependence on it by making it a Phobos internal implementation detail rather than exposing it everywhere on the public API with traits like isNarrowString.

Let's not conflate cleaning up Phobos' public API with removing Phobos internal code or changing Phobos semantics (e.g., the "helper stuff"). The two ought to be more-or-less independent, and where they are not, it's a sign of weakness in the design of the API, and is where we need to change things to improve the situation.


T

-- 
Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
March 16, 2017
On Wednesday, March 15, 2017 10:17:41 H. S. Teoh via Digitalmars-d wrote:
> On Tue, Mar 14, 2017 at 07:21:05PM -0700, Jonathan M Davis via
Digitalmars-d wrote:
> I'd argue that the same reasoning applies to all D libraries, not just Phobos: implementation details like optimizing for narrow strings ought to be hidden from the public API.  It's the library author's problem to figure out how to optimize his library -- Phobos code, after all, is publicly available for perusal and its license terms permit copying Phobos snippets for use in their own library code, if they find that they need, e.g., some form of isNarrowString.  But none of this should be a factor in the design of the user-facing API.  Phobos needs to set the example here.
>
> And to counter the argument that library authors might want to use Phobos' version of isNarrowString: I'd say that it's better for them to copy the definition of isNarrowString into their own code, because the current mess of isNarrowString, isSomeString, et al, means that there's a high likelihood we'd want to revise their definitions, which in turn means that if they're being used in 3rd party library code, subtle breakages may be introduced into said libraries because the definition has changed from when the library author originally looked at it.
>
> That's why, IMO, traits like isNarrowString that are only useful for library implementation details ought NOT to be exported by Phobos. Unless we are 100% sure that it has a rock-solid definition that's well-designed and fits well with other standard traits -- which excludes most (all?) of the current messy string-related traits.

I completely agree that we should try and make the template constraints on public functions in Phobos - and other libraries be simple - and try and restrict a lot of the more complicated stuff to internals. However, I do not agree for a second that we should be making all of those traits private and telling folks to copy-paste them if they want to use them. If a trait is bad enough that we think that it should not exist, then we should look at deprecating it, but if it's truly useful in Phobos, then it's going to be useful in 3rd party code, and it should be publicly available - even more so if it's currently publicly available. Telling folks that we want to deprecate something to make it private and that they should copy-paste it if they want it just seems rude to me.

The reality of the matter though is that if you're talking about traits like isSomeString and isNarrowString, they've been around as long as ranges - if not longer - and I don't think that it's particularly reasonable to deprecate them at this point. There's going to be a _lot_ of code out there using them, and most of it is likely using them correctly. The only real problem with them is that they accept enums, so if you use them with non-range based code, and you're not careful, then you could have a problem if someone tries to use an enum with them. But while that's definitely a problem, for a lot of code, it simply won't matter. So, if we deprecated isSomeString, we'd be telling a lot of folks to change their code when it's perfectly fine as-is. I agree that it would be nice to fix isSomeString, but I just don't think that it's reasonable to deprecate it at this point.

isAutodecodableString and isConvertibleToString, on the other hand, are rather new. So, not much code is using them - either in Phobos or in 3rd party code - and we won't affect much if we remove them. They're also highly questionable in general (whereas aside from the enum issue, isSomeString and isNarrowString are fine). Using either of them is an @safety risk. So, those two should definitely go IMHO.

In general though, I don't think that Phobos is anything special beyond the fact that almost everyone is going to be using it and that ideally, it would be an example of how to write good D code. Any public D library is in a boat similar to Phobos with regards to what it needs to do with templates, and if a trait is useful in Phobos in general, it makes more sense to make it generally available than to hide it. And if we're talking about something that's already public, I don't think that it makes sense to make it private. If it's bad enough for that to be a real consideration, then not even Phobos should be using it.

> Is it possible to hide this stuff behind a "template wall"? I.e., rename the current overload set to xxxImpl() with whatever sig constraints are necessary to achieve what we want, and have the original xxx() be just a generic template that forwards the said overloads?  E.g.:
>
> Change:
>
>   auto xxx(R)(R r) if (isConvertibleToString!R) { ... }
>   auto xxx(R)(R r) if (isAutodecodableString!R) { ... }
>   auto xxx(R)(R r) if (/* etc. */) { ... }
>
> To:
>
>   auto xxxImpl(R)(R r) if (isConvertibleToString!R) { ... }
>   auto xxxImpl(R)(R r) if (isAutodecodableString!R) { ... }
>   auto xxxImpl(R)(R r) if (/* etc. */) { ... }
>
>   auto xxx(R)(R r) if (isInputRange!R) { return xxxImpl(r); }
>
> Or does that extra level of syntactic indirection cause problems with implicit conversions?

While it will work in some cases to let the conversion take place inside the function, in others, it's wrong code in comparison to the original version that took a string. In particular, if the function returns a slice of the original string, with the original code, if a static array was passed to it, then it would have been sliced at the call site, and the string that was returned from the function was a slice of the original static array. However, if you templatize it and do the conversion inside the function, then you end up slicing the copy inside the function and returning a string that refers to memory inside the function that just returned (and is thus invalid). Using auto ref reduces the problem but doesn't fix it. e.g.

auto foo(inout(char)[] str)
{
    return str[0 .. 5];
}

char[10] sa;
auto arr = foo(sa);

becomes wrong code if you do

auto foo(R)(R range)
    if(isRandomAccessRange!R && hasSlicing!R && isSomeChar!(ElementType!R))
{
    return range[0 .. 5];
}

auto foo(T)(T convertible)
    if(isConvertibleToString!T)
{
    // This will now return a slice of a variable local to this function
    // whereas before the conversion occured at the call site, and the
    // memory that was referred to was from outside this function.
    return foo!(StringTypeOf!T)(convertible);
}

char[10] sa;
auto arr = foo(sa);

auto ref reduces the problem, but you're just as screwed as soon as an rvalue is passed instead of an lvalue. The only way to have the same semantics and avoid the problem is to force the conversion at the call site, and the only way that I know how to do that is to templatize on the character type. e.g.

auto foo(R)(R range)
    if(!isSomeString!R &&
       isRandomAccessRange!R &&
       hasSlicing!R &&
       isSomeChar!(ElementType!R))
{
    return _fooImpl(range);
}

auto foo(C)(C[] str)
    if(isSomeChar!C)
{
    return _fooImpl(str);
}

private auto _fooImpl(R)(R range)
{
    return range[0 .. 5];
}

If you templatize on the full type like you normally would, then the conversion takes place inside the function, and it doesn't work. In order to fix that, we'd need some way to put a constraint on the argument type and yet have the parameter type be something different - e.g. require that the argument by convertible to string but have the actual parameter type be something like inout(char)[] or have its own template type so that it could be something like C[] and take an arbitrary character type. Regardless, templates simply don't work like that in D in general, let alone with IFTI.

All of this isn't a problem in the same way if the function being templatized does not return a slice of the original string, but without auto ref, it does still result in additional copies in comparison to the original, and it would be pretty easy to screw it up. I expect that Walter's @safety changes will make finding the @safetr problems with such implicit conversions easier, but the problem remains that in the general case, you really need to be doing the conversion at the call site in order to avoid @safety problems.

> > As for isSomeString, the only problem I see with it is that it accepts enums. Otherwise, it seems like a perfectly sensible trait to have, and its name fits in with other traits in std.traits such that I don't know what we'd call it instead other than isExactSomeString (which std.conv does internally, but it's not exactly an ideal name). Regardless, I completely disagree about trying to get rid of it or hiding it. If it doesn't need to be in a template constraint, then it shouldn't be, but it's still a useful trait to have.
>
> I don't like the name, because it's unclear what exactly it means and what exactly it includes. The fact that it accepts enums where other traits don't is a further code smell IMO.
>
> I think what we really need to do is to sit down and work out a well-designed, orthogonal set of traits that might be useful for user code, independently of what the current traits are, and then (and only then) worry about how to map the current mess to them.
>
> If I were to look at the current traits from a user's POV, I'd be going, "What, why does Phobos have isNarrowString, isSomeString, isAutodecodableString...  How many different kinds of strings does D have?! and how exactly do any of these traits map to string, wstring, dstring? WTH does "Some" in "isSomeString" refer to? Is it different from "isAnyString" and "isString"? What kind of crappy ad hoc trait design is this anyway?!" Not the kind of reaction we'd want newcomers to have with code that's supposed to set the standard for what D code should look like.

If it were just isSomeString and isNarrowString, I don't think that it would be that big a deal. While isNarrowString is a new concept, it's pretty integral to ranges at this point, and isSomeString really isn't a new concept. So, with just the two of them, I don't think that there's really a problem (which is where we used to be).

As for the name, is isSomeString is not the only trait in std.traits with
some in the name, and it makes sense given that isString would imply
is(T == string). isAnyString might have been fine too, but that's not what
we ended up with. If we want to replace isSomeString with a trait that
didn't accept enums, then isAnyString would make sense, but then it really
would be increasing the confusion, and it wouldn't fit with the current
naming scheme in std.traits. If anything, the issues with the number of
string-related traits implies that we should simply document the issue with
isSomeString and not introduce a replacement. And given how old it is, a lot
of code is likely to exist which uses it, so I think that it's quite
questionable that we can replace it.

The real problem is that we have isAutodecodableString and isConvertibleToString, and as I said before, I think that those should go. They're new enough that we won't break a lot of code when we deprecate them, and they're risky to keep. And if we get rid of them, we're down to only two string-specific traits again, and I don't think that that's all that bad, particularly when they're both useful and embody concepts that anyone serious about programming with ranges in D is going to need to understand. The reality of the matter is that even with stuff like byCodeUnit and byGrapheme, the fact that strings auto-decode by default requires that your average D programmer be at least somewhat familiar with those concepts, which then makes isSomeString and isNarrowString quite straightforward.

> It's clear that autodecoding can't be *removed* outright, at least not at this point. But we *can* reduce dependence on it by making it a Phobos internal implementation detail rather than exposing it everywhere on the public API with traits like isNarrowString.
>
> Let's not conflate cleaning up Phobos' public API with removing Phobos internal code or changing Phobos semantics (e.g., the "helper stuff"). The two ought to be more-or-less independent, and where they are not, it's a sign of weakness in the design of the API, and is where we need to change things to improve the situation.

I think that we should start off by

1. deprecating isAutodecodableString and isConvertibleToString and fixing
   Phobos so that it doesn't use them.

2. documenting isSomeString and isNarrowString to make it clear that the
   fact that they match enums is a problem.

3. one by one going through the various functions in Phobos and turning
   overloads into internal static ifs where appropriate.

Doing that won't outright fix the problems with the string traits, but it gets us going back to just two string traits while making the problems with them clear. And of course, we've already discussed how it would be better to turn the overloads into static ifs or overloads of internal functions in order to simplify the public facing constraints - regardless of what's going on with the string-specific constraints.

I don't agree at all though that any of the currently public-facing traits should be made private. Making it so that they're not used in public-facing template constraints when they don't need to be makes a lot of sense, but I see no reason to hamstring 3rd party libraries just because we don't want to put the traits in Phobos' public-facing template constraits if we can avoid it.

- Jonathan M Davis

March 16, 2017
On Monday, 13 March 2017 at 19:08:12 UTC, H. S. Teoh wrote:
> What the user sees should be the most generic API that makes sense relative to this function.  For cases where the implementation can't (yet) handle, a helpful error message is given, and the docs should also indicate the present limitations.

phobos each, and probably other functions too, is sometimes hard to use because template constrains make them disappear when you enter something faulty to them. Thus the error message only says something like: No function overload "each" for arguments (...) found.

I found that it is easier to use a home-made each, without constrains. It still tries to implement the code, thus telling why the argument does not work. Perhaps phobos each should do the same. Of course with an error message saying that the argument wasn't iterable, so that the template does not appear buggy. Is'nt that what you were suggesting anyway?

I also agree that static ifs inside the template are preferable to multiple constrained templates doing essentially the same thing in different ways.
March 16, 2017
On Thursday, 16 March 2017 at 07:44:59 UTC, Jonathan M Davis wrote:
> So, if we deprecated isSomeString, we'd be telling a lot of folks to change their code when it's perfectly fine as-is. I agree that it would be nice to fix isSomeString, but I just don't think that it's reasonable to deprecate it at this point.

If you can't deprecate, what about only deprecating the enum behaviour (e.g. insert a static assert if anybody uses an enum on it)?
« First   ‹ Prev
1 2