ranges of characters and the overabundance of traits that go with them - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » ranges of characters and the overabundance of traits that go with them

Thread overview

ranges of characters and the overabundance of traits that go with them
Mar 12, 2017 Jonathan M Davis
Mar 12, 2017 Jack Stouffer
Mar 12, 2017 Jonathan M Davis
Mar 12, 2017 Jack Stouffer
Mar 12, 2017 Jonathan M Davis
Mar 12, 2017 Jonathan M Davis
Mar 13, 2017 Andrea Fontana

March 11, 2017

ranges of characters and the overabundance of traits that go with them

Posted by Jonathan M Davis

Jonathan M Davis

Okay, I'm sorry, but this is a wall of text, and I don't know how to actually make it short. If you really want the TLDR version, look for the =='s at the bottom which mark off the conclusion, but you're not going to understand the reasoning if you skip the wall of text. Now, to the post...

Recently, there was a PR submitted for Phobos which tried to add isStringLike to std.traits. There was some debate about what the name should be, but ultimately, Walter pointed out that we really needed to step back and examine all of the various traits that we have for strings and related types in Phobos before we add anything like this, regardless of the name. So, I've been mulling over this and looking at the current situation and thought I should post about it.

The PR in question: https://github.com/dlang/phobos/pull/5137

First off, I think that the concept of anything "string-like" should be tabled for now. String-like implies that it acts like a string and is therefore at least attempting to be a drop-in replacement. Nothing in Phobos currently tries to do that, and I think that discussions on that sort of thing are better left for when we look at adding something like Andrei's RCString to druntime or Phobos. What we're concerned with at the moment is how strings, things that convert to strings, and ranges of characters are dealt with in templated code.

The first thing to notice when when looking at strings and traits in Phobos
is that we have too many traits that are very similiar but not quite the
same. Each of them was added over time to solve a particular problem that
the existing traits didn't solve well enough or simply because it was a
concept that the submitter thought was one that we'd need to reuse. The
result is a mish-mash of traits with varying - and often confusing uses. We
have isSomeString, isNarrowString, isAutodecodableString, and
isConvertibleToString. And of course, related to those, we have isSomeChar,
ElementType, ElementEncodingType, isDynamicArray, isInputRange,
isForwardRange, isBidirectionalRange, isRandomAccessRange, hasSlicing,
isDynamicArray, isAggregateType, isStaticArray, is(T == enum),
is(T : const char[]), etc. all of which can be used in combination with or
instead of the main string traits. Exactly which combination of all of those
traits get used depends on what the function in question is trying to do,
and they're often used inconsistently. But I think that the main concern is
the various is*String traits. So,

isSomeString: True for any dynamic array of any character type as well as any enum type whose base type is a dynamic array of any character type.

isNarrowString: The same as isSomeString except that arrays of dchar and enum types with a base type that's a dynamic array of dchar do not qualify as narrow strings.

isAutodecodableString: The same as isNarrowString except that it also accepts user-defined types which implicitly convert to a dynamic array of char or wchar.

isConvertibleToString: Any type that is _not_ a dynamic array of characters but implicitly converts to a dynamic array of characters - so enums with that as their base type, user-defined types which implicitly convert to such dynamic arrays, and static arrays which implicitly convert to such dynamic arrays.

Note that these are all similar but subtly different. Of particular note is how this affects implicit conversions. Implicit conversions in generic code are _bad_. It's trivial for a type to pass a template constraint due to an implicit conversion and then not actually compile with the template - or worse, to compile but have subtly wrong behavior. So, pretty much any time that an implicit conversion is allowed with a template constraint, that conversion needs to be forced. Once it's converted, then it can function as the targetted type, because it _is_ the targetted type.

Honestly, I'm inclined to think that in most cases that conversion should be done explicitly by the programmer rather than happening automatically, because that's far less error-prone (e.g. this is what we do when we require that you slice a static array to pass it to a range-based function rather than having the range-based function do it for you). Unfortunately, that's not always an option - e.g. when templatizing a function that previously accepted string, if you don't want to break code, then you have to accept anything that implicitly converts to string.

So, the first mistake here is that isSomeString accepts enums. Annoyingly,
that was actually my fault (Kenji actually fixed isSomeString years ago, and
at the time, I argued that it should accept enums, which in hindsight,
was dumb of me). :(

In some cases, having a templated function accept both strings and enums _can_ work, but in most cases, it's going to be wrong. You can examine an enum with a base type of string, but you can't mutate it or assign it without risking either not compiling or ending up with a variable of an enum type with an invalid value, and enums don't pass isInputRange even if their base types is a range, so unlike strings, you can't use range-based functions with them. So, having code like

auto foo(S)(S str)
    if(isSomeString!S)
{...}

is actually a recipe for disaster. It'll work with strings, but it likely won't work properly with enums, and most programmers will likely forget to test it with enums and won't catch the problem. Fortunately, I think that isSomeString is used primarily in range-based code to differentiate strings from other ranges, but it _is_ a concern that isSomeString accepts enums. As such, I think that code using isSomeString in a context where an enum would pass needs to either be changed so that the function accepts only strings and uses !is(S == enum) to prevent enums from passing _or_, it should be templatized in a manner which accepts anything which implicitly converts to a string type while simultaneously forcing the conversion. e.g.

auto foo(C)(C[] str)
    if(isSomeChar!C)
{...}

That gives you essentially the same semantics as

auto foo(string str)
{...}

except that it accepts any character type (as well as any constness for the character type), and the conversion is done cleanly without needing any overloads.

Ideally, isSomeString would be fixed to not accept enums, but that risks breaking code, and I'd argue that in most cases where the fact that a template accepts enums would be a problem, templatizing on the character type is a better solution anyway, because it works with more code and forces the conversion without needing any extra code.

Now, most of the code that would break if we changed isSomeString is probably buggy anyway, but we would be risking breaking code if we made the change. So, if we started from scratch, I would definitely do it, but I don't know that we can reasonably do so now. We could of course do what std.conv does internally and add isExactSomeString, which is what isSomeString _should_ be, but that would mean adding yet another trait, whereas ideally we'd be consolidating them. However, it's probably a better approach than changing isSomeString if we don't want to risk breaking code. Regardless, isSomeString's documentation should be updated to make this problem clearer.

isNarrowString has the same problem as isSomeString, but AFAIK, its only real use is in range-based code where the fact that it accepts enums isn't a problem. So, the risk of breaking code when making it not accept enums is probably much lower than doing that with isSomeString, but the benefit is also lower, since the risk of it being used in a buggy manner is less. Aside from the enum issue though, I think that isNarrowString is fine as-is.

However, I think that isAutodecodableString should definitely be deprecated, and I think that we should at least consider deprecating isConvertibleToString. But I'll go into that below, since to understand that, I think that we first need to look at what we're trying to accomplish with these traits.

In general, when you templatize a function involving strings, you're doing one of

- create a generic function that works on any range (which just so happens
  to include strings)
- create a function that specifically operates on strings (but is templated
  so that it can operate on different constness or operate on different
  character types)
- create a function that specifically operates on a range of characters,
  which happens to include strings
- create a function that operates on strings or ranges of characters or
  operates on types that implicitly convert to a string type
- templatize a function that used to take string (or took any dynamic array
  of characters) and make it take any range of characters

If you're creating a generic function, it may not need to do anything special for strings or arbitrary ranges of characters - this is especially true if byCodeUnit has been used, and no decoding is taking place - but sometimes, it may be necessary to specialize on narrow strings to avoid unnecessary decoding (and the function may or may not then do any decoding explicitly, depending on what it's trying to do). I think that that's largely covered by isNarrowString. The constraint usually makes sure that you're dealing with a range, and ideally, the overload dealing with isNarrowString would be inside a static if rather than at the top level. If it's at the top level, then there's some risk of an enum getting through, because the constraint assumed that isNarrowString was enough to guarantee that the type was a range, but that goes back to whether we should make isSomeString and isNarrowString accept enums, and switching to a simpler, top-level constraint and making the isNarrowString bit be internal with a static if largely solves the problem in this case. So, I don't think that this use case requires any changes to any of the string-related traits right now.

As for functions that specifically operate on strings. As I said above when discussing isSomeString, I think that the correct thing to do in their case is to just templatize on the character type

auto foo(C)(C[] str)
    if(isSomeChar!C)
{...}

That does mean that the function will accept types that implicit convert to strings in addition to strings, but if you're looking for it to specifically operate on strings, that should be just as okay as it would be if the function explictly took string instead of being templated. Any code that wants to avoid that though can do

auto foo(S)(S str)
    if(isSomeString!S && !(S == enum))
{...}

but there usually isn't any point in doing that instead of templatizing on the character type. It just makes the function work with less code.

As for functions that specifically operate on ranges of characters (and operate on strings as part of that), I think that they're pretty much in the same boat as generic range-based functions. They have isNarrowString to check if they want to specialize on narrow strings, but they also have byCodeUnit, which they can use without checking isNarrowString, completely avoiding having to check for narrow strings, though whether using byCodeUnit is appropriate or not depends on what the function is doing. Still, I don't think that anything special needs to be done with the existing traits for this use case.

The last two cases are essentially the same (in both cases, you end up with a function that accepts types that implicitly convert to a string type in addition to accepting arbitrary ranges of characters). The primary difference is whether you're creating a new function or templatizing an existing one. The reason that I separated them is because I'm inclined to argue that in most cases, when you're creating a range-based function, you simply shouldn't be making it accept implicit conversions. It's clearer and less error-prone for the caller to do the conversion. So, I don't think that we normally should be creating such functions. However, we _do_ need to templatize existing functions that take string, and what we need to do to make that work is pretty much the same thing that we'd need to do when creating a templated function that accepts types that implicitly convert to a string type in addition to accepting arbitrary ranges of characters.

isConvertibleToString was created for this case. Basically, you have a function that accepts string

auto foo(string str) {..}

and you want to templatize it, so you create an overload for ranges of characters and another for types that implicitly convert to string. e.g.

auto foo(R)(R range)
    is(isInputRange!R &&
       isSomeChar!(ElementType!R) &&
       !isConvertibleToString!S)
{...}

auto foo(T)(auto ref T convertible)
    is(isConvertibleToString!T)
{
    return foo!(StringTypeOf!T)(convertible);
}

Then the types that work as-is are accepted by the first overload, and those that need to be converted are converted by the second overload and then passed to the first.

Note the auto ref on the second overload. This is so that if a static array is passed as a variable, then it won't be copied. And since we don't want the auto ref on the overload taking ranges (since that would potentially break code), we're forced to have two separate functions rather having a single function that has an internal static if.

What I would _like_ to see done here is to have the convertible overload be something more like

auto foo(C)(C[] str)
    if(isSomeChar!C)
{...}

This makes the conversion automatic and negates the need for isConvertibleToString. However, this overload would also accept strings, which means that you either have to duplicate code between the functions, or you need a helper function. e.g.

auto foo(R)(R range)
    is(!isSomeString!R &&
       isInputRange!R &&
       isSomeChar!(ElementType!R))
{
    return _fooImpl(range);
}

auto foo(C)(C[] str)
    if(isSomeChar!C)
{
    return _fooImpl(str);
}

private auto _fooImpl(R)(R range)
    is(isInputRange!R &&
       isSomeChar!(ElementType!R))
{...}

And because one of the overloads is templated on character type, whereas the other is templated on the entire type, there's no way to combine them. If anything, it's harder to combine them than in the case where isConvertibleToString is used. So, unless someone sees something I'm missing here or comes up with a great idea that has escaped me (which is quite possible), I think that we're stuck with isConvertibleToString so that we can make functions that accept string accept arbitrary ranges of characters without breaking code.

Alternatively, we _could_ deprecate the overload which takes isConvertibleToString and ultimately make the function require explicit conversions to be used like we would normally do with a range-based function that we were writing from scratch, but you'd still need isConvertibleToString during the deprecation phase. And I'm not sure that it's worth going through that deprecation phase anyway.

Unfortunately, however, using this technique with a function that returns a portion of the range is going to be wrong when it's passed a static array by value (or if it's passed a user-defined type that implicitly converts to a string but does so in a manner similar to slicing static arrays such that the original variable can't go out of scope before the string does without having the string refer to memory that is no longer valid). In the case where you have

auto foo(C)(C[] str)
    if(isSomeChar!C)
{...}

the conversion takes place at the callsite and thus the return value is potentially usable before the original goes out of scope - which is the exact same situtation that we had when the function was not templated and explicitly took string, whereas with

auto foo(T)(auto ref T convertible)
    is(isConvertibleToString!T)
{
    return foo!(StringTypeOf!T)(convertible);
}

unless it's passed by reference, it's going to go out of scope too early. So, the safe thing to do would be to change it to using the three function solution discussed above:

auto foo(R)(R range)
    is(!isSomeString!R &&
       isInputRange!R &&
       isSomeChar!(ElementType!R))
{
    return _fooImpl(range);
}

auto foo(C)(C[] str)
    if(isSomeChar!C)
{
    return _fooImpl(str);
}

private auto _fooImpl(R)(R range)
    is(isInputRange!R &&
       isSomeChar!(ElementType!R))
{...}

in which case, the only reason to keep isConvertibleToString is that in the case where you know that the return value is not a piece of the original range, then the two function solution works safely. But given how easy it is to screw up with the two function solution (e.g. there's been confusion on github about std.file functions that use this idiom and why they have auto ref), I'm inclined to argue that the 3 function solution is what should become the "official" way to templatize a function that accepts string and make it accept arbitrary ranges of characters in addition to what it accepted before. And in that case, it would make sense to deprecate isConvertibleToString and thus reduce the number of stray string traits that programmers have to understand use correctly.

So, that leaves us with isAutodecodableString. Notice that no use cases have used it. The only use case I'm aware that makes any sense is byCodeUnit, which does not work with static arrays (presumably, because they're not ranges and slicing them for byCodeUnit is likely to be unsafe - though how unsafe it is exactly the same as what we've been discussing for templatizing functions that take string). And actually, I'm inclined to think that byCodeUnit shouldn't have accepted types that implicitly converted to string and should have required the conversion just like other templated code normally does, but changing it now risks breaking code, and in the case of something like an enum, it's probably handy, since byCodeUnit is the sort of function that's going to be used at the beginning of a chain of ranges. That being said, if byCodeUnit were to use the three function solution that I just talked about for templatizing string functions, it could use static arrays just as safely as any function that has a string parameter and returns a slice of that string. So, isAutodecodable wouldn't be needed.

And if we didn't want to make that change,

isNarrowString!T || (isConvertibleToString&T && !isStaticArray!T)

could be used instead (the main reason that isConvertibleToString wasn't used with byCodeUnit in the first place was that isConvertibleToString was added later), or if we do decide to get rid of isConvertibleToString but don't want byCodeUnit to accept static arrays, it would simple enough to make its template constraint do the right thing without either isAutodecodableString or isConvertibleToString.

In general though, I think that templated functions operating on ranges of characters should either be accepting only ranges of characters and not stuff that implicitly converts to them (as we typically do with range-based functions), or they should accept arbitrary ranges of characters plus everything that a function accepting strings explicitly would. So, I don't think that using isAutodecodableString with byCodeUnit ultimately is the right thing to do, particularly since it's used almost nowhere else. Aside from isAutodecodableString's unit tests and byCodeUnit, the only other place in Phobos that isAutodecodableString is used is a static if inside of std.algorithm.comparison.equal which could have just as easily used isNarrowString.

As such, I think that isAutodecodableString should definitely be deprecated, and while it's more questionable with isConvertibleToString, given the safety concerns, I think that we'd be better of deprecating it as well. And fortunately, neither of them is particularly old, so not much code outside of Phobos is going to be using them yet.

As for adding a new trait, which is what started this whole mess, the
issue that we have that a new trait would help solve is that any range-based
code that specifically operates on strings is going to end up with something
like

isInputRange!R && isSomeChar!(ElementType!R)

So, that could be repeated a fair bit in templated code. We _could_ shorten that to something like isSomeCharRange!R, but as soon as you need a forward range or greater, you're going to end up with something like

isForwardRange!R && isSomeCharRange!R

whereas right now it would be

isForwardRange!R && isSomeChar!(ElementType!R)

And that really doesn't add much value. Alternatively, it could also require finite ranges (which is what the original PR proposed), but to make that clear, you'd probably need something like isSomeFiniteCharRange rather than isSomeCharRange, which eats away at its value even further (since it's even longer), and it's often the case that we want to accept infinite ranges, because we generally try and make algorithms lazy as much as we can.

So, I'm not at all convinced that we want to add a new trait for arbitrary ranges of characters. And it's not like we're looking at adding something like isIntegralRange. That sort of thing just doesn't scale, and beyond code that only accepts basic input ranges, it doesn't even tend to make template constraints much shorter. Regardless, as stated before, I think that talking about a range of characters as being "string-like" is incorrect, because strings are a lot more than finite, basic input ranges of characters.

============================================================================

So, in conclusion, with regards to isSomeString, isNarrowString, isAutodecodableString, and isConvertibleToString, I propose that we

1. Deprecate isAutodecodableString and make the few places in Phobos that
   currently use it use isNarrowString or isConvertibleToString instead.

2. Strongly consider deprecating isConvertibleToString (given its safety
   concerns with static arrays) and instead use overloads that are
   templated on character type to accept implicit conversions to string
   rather than being templated on the entire type (the overload taking
   arbitrary ranges would of course still be templated on the entire type).

3. Consider changing isSomeString and isNarrowString so that they are false
   for enums - since they really should be - but we probably can't do that
   without breaking at least some code (albeit mostly code that's going to
   be buggy anyway). If we don't make that change, we should consider adding
   isExactSomeString to std.traits rather than just having it as internal to
   std.conv, and then for any non-range based code, we can tell folks to use
   isExactSomeString and avoid isSomeString if they really don't want to
   accept implicit conversions (whereas code accepting implicit conversions
   can just templatize on character type). Regardless, isSomeString and
   isNarrowString's documentation should be updated to make the problem
   with enums clear.

If it were entirely up to me, I would deprecate isAutodecodableString and isConvertibleToString as soon as Phobos were fixed to not use them, but I'm divided on whether I would fix isSomeString, since I don't know of a way to do that without risking breaking code - even if it is likely to be code that's buggy, and we'd prefer to not be breaking code without any kind of deprecation phase.

And with regards to adding a new trait specifically for ranges of characters, I don't think that it adds enough value to be worth it. It would shorten some constraints, but there are many it would not, and it even the ones that it would shorten wouldn't be shortened all that much.

- Jonathan M Davis

March 12, 2017

Re: ranges of characters and the overabundance of traits that go with them

Posted by Jack Stouffer
in reply to Jonathan M Davis

Jack Stouffer

Posted in reply to Jonathan M Davis

On Sunday, 12 March 2017 at 03:23:21 UTC, Jonathan M Davis wrote:
> ...

First off, I'd like to point out that creating specific overloads for alias this-ed structs is a bad idea, because you just have to ask the user to do this

MyCustomType { string b; alias b this; }

MyCustomType a;
foo!(string)(a);

instead of

MyCustomType a;
foo(a);

And I think making the conversion on the user is better, because once we go down the rabbit hole of offering custom template support for user defined alias this-ed types, there's almost no bottom. You have to do the same thing to every templated function that uses traits like isNumeric, isPointer, etc.

So with that in mind, your proposal for alias this overloads of

> auto foo(C)(C[] str)
>     if(isSomeChar!C)
> {...}

Isn't really helpful for either because it still clutters code with an extra overload. I think the problem we need to accept is that alias this and template will always be sort of oil and water.

I don't know if we should change the current overloads, as people complained with std.file was range-ified and their code using DirEntry broke. That's honestly what started this whole mess, and honestly in hindsight we should have marked the issue as won't fix and told them to use the above method. But people (including me) wanted the regression fixed which is totally understandable.

Maybe we should deprecate the isConvertibleToString overloads, maybe not. But we really shouldn't add more.

>As such, I think that code using isSomeString in a context where an
>enum would pass needs to either be changed so that the function
>accepts only strings and uses !is(S == enum) to prevent enums from passing

How much longer until we're at "Effective D: 55 things you shouldn't do because it's buggy and you don't know it".

*sigh* I know what my PRs are going to be for the next two weeks.

> So, that could be repeated a fair bit in templated code. We _could_
> shorten that to something like isSomeCharRange!R, but as soon as
> you need a forward range or greater, you're going to end up with
> something like

> isForwardRange!R && isSomeCharRange!R
> whereas right now it would be
> isForwardRange!R && isSomeChar!(ElementType!R)

Well, that kills that idea, doesn't it?

> Consider changing isSomeString and isNarrowString so that they are false
> for enums - since they really should be - but we probably can't do that
> without breaking at least some code (albeit mostly code that's going to
> be buggy anyway).

Can't agree here. We don't know how much code we'd be breaking. The move towards deprecation of old behavior is the best for users, and that's not really possible here. The best thing to do is make the docs very explicit.

> If we don't make that change, we should consider adding
> isExactSomeString to std.traits

Also disagree. Making more traits on top of an already confusing situation is a non-starter for me. Again, I think we just need to make the docs better here.

March 12, 2017

Re: ranges of characters and the overabundance of traits that go with them

Posted by Jonathan M Davis
in reply to Jack Stouffer

Jonathan M Davis

Posted in reply to Jack Stouffer

On Sunday, March 12, 2017 09:02:53 Jack Stouffer via Digitalmars-d wrote:
> On Sunday, 12 March 2017 at 03:23:21 UTC, Jonathan M Davis wrote:
> > ...
>
> First off, I'd like to point out that creating specific overloads for alias this-ed structs is a bad idea, because you just have to ask the user to do this
>
> MyCustomType { string b; alias b this; }
>
> MyCustomType a;
> foo!(string)(a);
>
> instead of
>
> MyCustomType a;
> foo(a);
>
> And I think making the conversion on the user is better, because once we go down the rabbit hole of offering custom template support for user defined alias this-ed types, there's almost no bottom. You have to do the same thing to every templated function that uses traits like isNumeric, isPointer, etc.
>
> So with that in mind, your proposal for alias this overloads of
>
> > auto foo(C)(C[] str)
> >     if(isSomeChar!C)
> > {...}
>
> Isn't really helpful for either because it still clutters code with an extra overload. I think the problem we need to accept is that alias this and template will always be sort of oil and water.
>
> I don't know if we should change the current overloads, as people complained with std.file was range-ified and their code using DirEntry broke. That's honestly what started this whole mess, and honestly in hindsight we should have marked the issue as won't fix and told them to use the above method. But people (including me) wanted the regression fixed which is totally understandable.
>
> Maybe we should deprecate the isConvertibleToString overloads, maybe not. But we really shouldn't add more.

I completely agree that in general, we should avoid implicit conversions with functions. I thought that I made that clear. Allowing implicit conversions with templates is error-prone and should be generally avoided.

The issue is when you templatize a function that already exist. If you take a function that accepts string, and you try and make it so that it accepts arbitrary ranges of characters, you have to still accept all of the implicit conversions that the original function did, or you break code. So, it becomes a question of the best way to write the templatized version of the function.

As it stands, using isConvertibleToString is the solution that seems to have been introduced to Phobos to fix that problem, and as I pointed out, that's a risky solution at best and an outright wrong solution at worst, because it does the conversion inside the function rather that at the call point, which creates safety issues. And in those cases, I think that we should be templatizing on character type like with

auto foo(R)(R range)
    if(!isSomeString!R &&
       isInputRange!R &&
       isSomeChar!(ElementType!R))
{
    return _fooImpl(range);
}

auto foo(C)(C[] str)
    if(isSomeChar!C)
{
    return _fooImpl(str);
}

private auto _fooImpl(R)(R range)
    if(isInputRange!R && isSomeChar!(ElementType!R))
{...}

because that's safer. If we're talking about creating a new function rather than templatizing an existing function, then I don't think that it makes sense to add that extra overload. Just make it accept ranges of characters and be done with it. But the problem of templatizing existing functions still exists, and in that case, we're stuck doing _something_ to accept the implicit conversions, or we break code.

- Jonathan M Davis

March 12, 2017

Re: ranges of characters and the overabundance of traits that go with them

Posted by Jonathan M Davis

Jonathan M Davis

On Sunday, March 12, 2017 01:26:08 Jonathan M Davis via Digitalmars-d wrote:
> I completely agree that in general, we should avoid implicit conversions with functions.

In case it wasn't clear from the context, I meant specifically with templated functions.

- Jonathan M Davis

March 12, 2017

Re: ranges of characters and the overabundance of traits that go with them

Posted by Jack Stouffer
in reply to Jonathan M Davis

Jack Stouffer

Posted in reply to Jonathan M Davis

On Sunday, 12 March 2017 at 09:26:08 UTC, Jonathan M Davis wrote:
> and in that case, we're stuck doing _something_ to accept the implicit conversions, or we break code.

Sorry wrote that at 4 in the morning. Should have been clearer.

I'm saying we should, at best, offer an overload for alias this-ed structs only temporarily as a deprecated function. As I mentioned, I don't like these hacks, and in reality Phobos requires a lot more of them in order to be regression free. They also clutter up the code and docs.

March 12, 2017

Re: ranges of characters and the overabundance of traits that go with them

Posted by Jonathan M Davis
in reply to Jack Stouffer

Jonathan M Davis

Posted in reply to Jack Stouffer

On Sunday, March 12, 2017 17:22:09 Jack Stouffer via Digitalmars-d wrote:
> On Sunday, 12 March 2017 at 09:26:08 UTC, Jonathan M Davis wrote:
> > and in that case, we're stuck doing _something_ to accept the implicit conversions, or we break code.
>
> Sorry wrote that at 4 in the morning. Should have been clearer.
>
> I'm saying we should, at best, offer an overload for alias this-ed structs only temporarily as a deprecated function. As I mentioned, I don't like these hacks, and in reality Phobos requires a lot more of them in order to be regression free. They also clutter up the code and docs.

Yes, they clutter up code, but I don't know how reasonable it is to keep deprecating behavior in Phobos like that every time that we get around to templatizing a string-based function so that it's range based. To make matters worse, I don't think that it's actually possible to deprecate the implicit conversions _and_ have it be safe in the case where a portion of the range is returned from the function.

If isConvertibleToString is used - or any template constraint that is templated on the whole type - then when the implicit conversion is done, it's done inside the function, which in the case of static arrays and some user-defined types would mean returning a reference to memory that is local to the function (and thus invalid), whereas if the function is templated on the character type, then the implict conversion occurs at the call site - so it acts as it did when the function wasn't templated, and it's safe. But if the function is templated on character type, then it's going to accept strings, not just the implicit conversions. So, you can't deprecate it. So, it looks to me like we're faced with either allowing deprecation and being unsafe or being safe and not being able to do any deprecation. Certainly, I can't think of any other way to accept an implicit conversion and have the conversion be done at the call site, and without that, you have a safety problem.

As such, even if the resulting breakage is acceptable, I don't see how we be safely templatizing these functions and then deprecating the overloads that deal with implicit conversions. In some cases, we could get away with it, because no portion of the original range is returned, but in many cases, it is returned, and then AFAIK, templatizing on the character type is the only way to make it safe.

This whole problem would be reduced if we could actually get the implicit conversion of static arrays to dynamic arrays deprecated and then removed from the language, but last time I checked, Walter didn't like that idea, and even if we did that, we'd still have the potential of something unsafe happening with user-defined types implicitly converting to string where they work correctly when the function does the conversion at the call site but do not work correctly if the conversion is done internally.

Honestly, this issue feels a lot to me like the issue with isSomeString. Ideally, we'd just fix it and deal with the broken code, as annoying as that would be. But if we did that, we'd be breaking code. We can make a choice that makes the language or library better in the long run, but it's going to cost us in the short term.

In _some_ cases we could go through a deprecation phase with the implicit conversions and avoid immediately breaking code, but it's likely to result in a number of complaints if we do, and in many cases, we simply can't do it safely - not unless someone has a bright idea about how we can force implicit conversions to take place at the call site while still having the function be templated on the entire type rather than just the character type.

- Jonathan M Davis

March 13, 2017

Re: ranges of characters and the overabundance of traits that go with them

Posted by Andrea Fontana
in reply to Jonathan M Davis

Andrea Fontana

Posted in reply to Jonathan M Davis

On Sunday, 12 March 2017 at 03:23:21 UTC, Jonathan M Davis wrote:
> Okay, I'm sorry, but this is a wall of text, and I don't know how to actually make it short. If you really want the TLDR version, look for the =='s at the bottom which mark off the conclusion, but you're not going to understand the reasoning if you skip the wall of text. Now, to the post...
> [...]

Very well-written and interesting post!

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation