November 30, 2018
On Thursday, 29 November 2018 at 14:58:49 UTC, Steven Schveighoffer wrote:
> On 11/29/18 9:11 AM, Nicholas Wilson wrote:
>> I think deprecating the auto decoding free functions with a message to use `.byChar`, `.byDchar`, `.representation` etc. is really the _only_ practical way forward on this. Yes, it's slow, but it won't cause any breakage. Certainly not silent breakage, deprecation being rather noisy.
>
> When I have free time, I'm going to do all this shit. I'm so sick of autodecoding, and I have a feeling it will break so little code*, that we are wringing our hands over nothing.
>
> -Steve
>
> * The big irony of autodecoding is that literally most of Phobos SPECIFICALLY avoids autodecoding carefully because it's such a performance problem. Removing autodecoding will still work with such workarounds.

Hooray!

I think adding a version like `Phobos_NoAutodecode` would help: it could be tested with https://git.ikeran.org/dhasenan/dubautotester and people could opt in to it, in parallel with deprecating the auto decoding `front`/`popFront`.
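For readers following along, here is a minimal sketch of what the suggested replacements give you today. It only uses existing Phobos functions (`byCodeUnit`, `byChar`, `byDchar`, `representation`); the helper names `codeUnitCount` and `codePointCount` are made up for illustration.

```d
import std.range.primitives : ElementType, walkLength;
import std.string : representation;
import std.traits : Unqual;
import std.utf : byChar, byCodeUnit, byDchar;

// A plain string is a range of dchar: this is autodecoding.
static assert(is(ElementType!string == dchar));

// The explicit adapters state which element type you want.
static assert(is(Unqual!(ElementType!(typeof("".byCodeUnit))) == char));
static assert(is(Unqual!(ElementType!(typeof("".byChar))) == char));
static assert(is(ElementType!(typeof("".byDchar)) == dchar));

// representation reinterprets the string as raw bytes.
static assert(is(typeof("".representation) == immutable(ubyte)[]));

// Hypothetical helper: counts UTF-8 code units, no decoding involved.
size_t codeUnitCount(string s)
{
    return s.byCodeUnit.length;
}

// Hypothetical helper: walkLength on a bare string decodes, so it counts code points.
size_t codePointCount(string s)
{
    return s.walkLength;
}
```

With a string such as "h\u00E9llo", the two helpers disagree, which is the kind of difference a deprecation message would force callers to decide about explicitly.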
November 30, 2018
On Thursday, 29 November 2018 at 11:11:19 UTC, Walter Bright wrote:
> I'd remove autodecode tomorrow if I could wave the magic wand. The problem, though, as I point out to you again, is removing it will silently break a great deal of code.
>
> You cannot have it be a deal-breaker both ways.
>
> I'll also emphasize that you can avoid autodecode in your own projects by using .byChar. You are not forced to suffer its depredations in your own code.

A number of people in the D community have made the argument to me that autodecoding difficulties have been largely mitigated by the availability of byChar and byCodeUnit.

My assessment is different. These are valuable primitives. I think they suffice if developing small applications, or applications which need high performance string handling only in limited parts of the code. But I don't think they are sufficient if building larger applications that generally need high performance string handling throughout.

Part of my viewpoint comes from development in team environments. Litmus test questions are things like "how much time will be spent in code review checking if autodecoding has been engaged/avoided correctly?"

A specific issue with the byChar/byCodeUnit approach is that the disabling of autodecoding gets dropped in common cases, for example, materializing the range as an array. (An aside, but I have wondered if having a character array type that obeyed/preserved the no-autodecode property would be a material help.)
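A small sketch of the "gets dropped" case, assuming current Phobos behaviour where `array` on a `byCodeUnit` range produces a character array. The helper names `countBeforeArray` and `countAfterArray` are hypothetical.

```d
import std.array : array;
import std.range.primitives : ElementType, walkLength;
import std.traits : isNarrowString, Unqual;
import std.utf : byCodeUnit;

alias NoDecode = typeof("".byCodeUnit);
alias Materialized = typeof("".byCodeUnit.array);

// While the wrapper is in place, elements are code units.
static assert(is(Unqual!(ElementType!NoDecode) == char));

// After materializing, the result is a narrow string again,
// so range algorithms see dchar elements once more.
static assert(isNarrowString!Materialized);
static assert(is(ElementType!Materialized == dchar));

// Hypothetical helper: counts elements while the wrapper is still applied.
size_t countBeforeArray(string s)
{
    return s.byCodeUnit.walkLength;
}

// Hypothetical helper: same pipeline, but with an intermediate array.
size_t countAfterArray(string s)
{
    return s.byCodeUnit.array.walkLength;
}
```

Nothing in the second helper looks like it opts back in to decoding, which is the sort of thing that has to be caught in code review.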

I'm not trying to suggest how to weigh and trade off backward compatibility vs improvements to or elimination of autodecoding. Just trying to shed some light on why some people may have a different assessment of the significance of autodecoding issues.

--Jon
November 30, 2018
On Friday, 30 November 2018 at 00:31:55 UTC, Nicholas Wilson wrote:
> I think adding a version like `Phobos_NoAutodecode` so it could be tested with https://git.ikeran.org/dhasenan/dubautotester and people can opt in to it..

Please don't! This will likely cause some third-party libraries to rely on the switch, while some still rely on autodecoding, making them incompatible!

On the other hand, if we can find a way to similarly mark a single module to not autodecode while others still could, then it's a brilliant idea.

Otherwise, I guess we just have to plainly deprecate using char arrays as ranges at all, or leave the whole thing as is.
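To make the concern concrete, here is a hedged sketch of how a global version switch changes what library code sees. `Phobos_NoAutodecode` is the identifier proposed above and does not exist in Phobos; `autodecodes` and `firstElement` are hypothetical names.

```d
import std.utf : decode;

// The switch is decided once per compiler invocation (-version=Phobos_NoAutodecode),
// not per module or per library.
version (Phobos_NoAutodecode)
    enum bool autodecodes = false;
else
    enum bool autodecodes = true;

// Hypothetical stand-in for a string range primitive whose
// return type depends on the switch.
auto firstElement(string s)
{
    static if (autodecodes)
    {
        size_t index = 0;
        return decode(s, index); // a full code point (dchar)
    }
    else
    {
        return s[0]; // a single UTF-8 code unit
    }
}
```

A library written against one setting and an application built with the other would disagree about the element type, and nothing in the library's own source says which setting it expects.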
November 30, 2018
On Friday, 30 November 2018 at 11:22:04 UTC, Dukc wrote:
> [snip]
>
> Please don't! This will likely cause some third-party libraries to rely on the switch, while some still rely on autodecoding, making them incompatible!
>
> On the other hand, if we can find a way to similarly mark a single module to not autodecode while others still could, then it's a brilliant idea.
>
> Otherwise, I guess we just have to plainly deprecate using char arrays as ranges at all, or leave the whole thing as is.

Fair point.

Perhaps UDAs? I can't seem to get it to work with static ifs (below). Same thing with template ifs. I tried to do two different versions of foo, one as void foo(T)(@noAutodecode T x) and the other with same signature as below. The problem is with overloads. I can't do void foo(T)(!@noAutodecode T x) and get the right behavior. Support for UDAs on function parameters might need to get improved to make it convenient, but it seems like a natural evolution of the recent change in 2.082.

import std.traits : hasUDA, isNarrowString;
import std.stdio : writeln;

struct noAutodecode {};

void foo(T)(T x)
    if (isNarrowString!T)
{
    static if (hasUDA!(x, noAutodecode)) {
        writeln(hasUDA!(x, noAutodecode));
    } else {
        writeln(hasUDA!(x, noAutodecode));
    }
}

void main() {
    string a = "xyz";
    @noAutodecode string b = "zyx";
    foo(a); //prints false
    foo(b); //prints false, expected true
}
November 30, 2018
On Friday, 30 November 2018 at 16:05:18 UTC, jmh530 wrote:
> import std.traits : hasUDA, isNarrowString;
> import std.stdio : writeln;
>
> struct noAutodecode {};
>
> void foo(T)(T x)
>     if (isNarrowString!T)
> {
>     static if (hasUDA!(x, noAutodecode)) {
>         writeln(hasUDA!(x, noAutodecode));
>     } else {
>         writeln(hasUDA!(x, noAutodecode));
>     }
> }
>
> void main() {
>     string a = "xyz";
>     @noAutodecode string b = "zyx";
>     foo(a); //prints false
>     foo(b); //prints false, expected true
> }

UDAs attach to symbols, not values, so x will never have a @noAutodecode attribute, no matter what arguments you pass to foo.
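A short sketch of the difference, using module-level variables to keep it simple. `isMarked` and `seenThroughValue` are hypothetical names; `hasUDA` is the real `std.traits` template.

```d
import std.traits : hasUDA;

struct noAutodecode {}

string plain = "xyz";
@noAutodecode string marked = "zyx";

// An alias parameter binds to the symbol itself, so its attributes are visible.
enum bool isMarked(alias sym) = hasUDA!(sym, noAutodecode);

static assert(!isMarked!plain);
static assert(isMarked!marked);

// A value parameter is a new symbol inside the function. The argument's
// attributes are not copied onto it, so this is false for every call.
bool seenThroughValue(T)(T x)
{
    return hasUDA!(x, noAutodecode);
}
```

So the attribute can be inspected where the symbol is named at compile time, but it does not travel with the string once it is passed by value.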
November 30, 2018
On Friday, 30 November 2018 at 11:22:04 UTC, Dukc wrote:
> On Friday, 30 November 2018 at 00:31:55 UTC, Nicholas Wilson wrote:
>> I think adding a version like `Phobos_NoAutodecode` so it could be tested with https://git.ikeran.org/dhasenan/dubautotester and people can opt in to it..
>
> Please don't! This will likely cause some third-party libraries to rely on the switch, while some still rely on autodecoding, making them incompatible!
>
> On the other hand, if we can find a way to similarly mark a single module to not autodecode while others still could, then it's a brilliant idea.
>
> Otherwise, I guess we just have to plainly deprecate using char arrays as ranges at all, or leave the whole thing as is.

Just thinking out loud: if the libraries are incompatible and you get an error while trying to use them together, that would be a good thing. There is no silent breakage. The third-party library authors can be contacted and asked to update their libraries.

Kind regards
Andre
November 30, 2018
On Friday, 30 November 2018 at 18:16:37 UTC, Andre Pany wrote:
> Just thinking out loud: if the libraries are incompatible and you get an error while trying to use them together, that would be a good thing. There is no silent breakage. The third-party library authors can be contacted and asked to update their libraries.

In that way, yes, a good thing. But I completely disagree with it being a good idea nonetheless. The problem isn't that the libraries need to be updated, the problem is that they need to do so immediately.

For library upkeepers, such global versioning would be effectively as bad as just removing autodecoding overnight without any deprecation period. Or perhaps even worse, since they would probably have to leave behind compatibility with the autodecoding version, as there are always libraries that won't migrate quickly enough.

Of course, we could say that the libraries, when they migrate, need to keep backwards compatibility, so late migrators will keep working. But I can hardly imagine that every library keeper will bother and remember to test both versions.
November 30, 2018
On 11/30/18 1:41 PM, Dukc wrote:
> On Friday, 30 November 2018 at 18:16:37 UTC, Andre Pany wrote:
>> Just thinking out loud: if the libraries are incompatible and you get an error while trying to use them together, that would be a good thing. There is no silent breakage. The third-party library authors can be contacted and asked to update their libraries.
> 
> In that way, yes, a good thing. But I completely disagree with it being a good idea nonetheless. The problem isn't that the libraries need to be updated, the problem is that they need to do so immediately.
> 
> For library upkeepers, such global versioning would be effectively as bad as just removing autodecoding overnight without any deprecation period. Or perhaps even worse, since they would probably have to leave behind compatibility with the autodecoding version, as there are always libraries that won't migrate quickly enough.
> 
> Of course, we could say that the libraries, when they migrate, need to keep backwards compatibility, so late migrators will keep working. But I can hardly imagine that every library keeper will bother and remember to test both versions.

There are going to be many cases, I think, where it just works, whether or not you care about auto-decoding.

For example, when searching for a string in a string, it doesn't matter whether the string uses auto-decoding or not.
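A minimal sketch of why substring search is one of the "just works" cases: in valid UTF-8, a valid needle can only match at code point boundaries, so searching the raw bytes finds the same position as searching the string. `searchAgrees` is a hypothetical helper; `find` and `representation` are existing Phobos functions.

```d
import std.algorithm.searching : find;
import std.string : representation;

// Hypothetical helper: compares a string-level search with a byte-level search.
// find returns the remainder of the haystack starting at the match
// (or an empty range when there is no match).
bool searchAgrees(string haystack, string needle)
{
    auto asStrings = haystack.find(needle);
    auto asBytes = haystack.representation.find(needle.representation);

    // Equal remaining lengths mean both searches stopped at the same offset.
    return asStrings.length == asBytes.length;
}
```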

For low-level code, you need to pick autodecoding or not autodecoding, and we need a deprecation period, like Nick suggested. This means that for a period of time, a string won't be a range(!); you have to select byCodeUnit or byCodePoint (or similar). I think to make things easier, we can provide convenience aliases (like bcp or bcu), so it's not as painful. We will likely have workarounds all throughout Phobos, which will then be removed once the deprecation period is over.

But painful, it will be. However, mostly for low-level code (i.e. code that uses string.front and string.popFront). May want low-level code conveniences too (frontCodeUnit, frontCodePoint?)
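A rough sketch of what such conveniences could look like. None of `bcu`, `bcp`, `frontCodeUnit` or `frontCodePoint` exist in Phobos; they are the names floated above. The sketch assumes `bcp` would map to the existing `std.utf.byDchar`, which is only one possible choice.

```d
import std.range.primitives : walkLength;
import std.utf : byCodeUnit, byDchar, decode;

// Hypothetical short aliases for the explicit adapters.
alias bcu = byCodeUnit; // iterate code units
alias bcp = byDchar;    // iterate code points (assumed mapping)

// Hypothetical low-level helper: first code unit, no decoding.
char frontCodeUnit(string s)
{
    return s[0];
}

// Hypothetical low-level helper: first code point, decoded explicitly.
dchar frontCodePoint(string s)
{
    size_t index = 0;
    return decode(s, index);
}
```

Usage would then read `str.bcu` or `str.bcp` at each iteration site, which keeps the choice visible without much typing.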

However, the end result is after the deprecation period, things can go back to being reasonable, and autodecoding-free.

We will see how bad it is, once it's tried. I'm hoping not very bad.

-Steve
November 30, 2018
On Fri, Nov 30, 2018 at 02:08:28PM -0500, Steven Schveighoffer via Digitalmars-d wrote: [...]
> There are going to be many cases, I think, where it just works, whether or not you care about auto-decoding.
> 
> For example, when searching for a string in a string, it doesn't matter whether the string uses auto-decoding or not.
> 
> For low-level code, you need to pick autodecoding or not autodecoding, and we need a deprecation period, like Nick suggested.
[...]
> However, the end result is after the deprecation period, things can go back to being reasonable, and autodecoding-free.
> 
> We will see how bad it is, once it's tried. I'm hoping not very bad.
[...]

There have been offers to run an experiment on various CI's and code.dlang.org, which will give us concrete data on just how bad this will be.  Care to throw together a Phobos PR that can be used as a yardstick?

Basically, I can see multiple runs of the experiment, i.e., PRs that do slightly different things, to probe various aspects of this:

1) Silent breakage: just turn off autodecoding silently, run the CI's, see how many unittests break. (Hopefully not zero! Otherwise that means most D code sux. :D)  Examine breakages to assess ease of fix. E.g., if we just have to add .byCodeUnit or .byCodePoint, it would count as an easy fix, whereas if the algorithm needs to be rewritten, then it's a complex fix.  Tally both kinds of fixes and see how the numbers compare.

2) Deprecation breakage: add a deprecation to all entry points to autodecoding. See how much code breaks. Examine breakages to assess ease of fix. Should be similar to silent breakage. If numbers are much bigger, that means most D code sux. :D

3) Make strings non-ranges: this will probably basically break *everything*.  Subtract the numbers from (1) and (2) to get an idea of how much extra work will be required during the deprecation period. Essentially, this tells us how much work it will be to add .byCodeUnit / .byCodePoint to every iteration over string.  Expect pretty bad numbers here. (But if we get a pleasant surprise here, this may be an indication that removing autodecoding isn't as fearful as we thought!)

Of course, all of the above depends on a base PR that fixes Phobos to be passing CI tests first.  Alternatively, we could perform the above steps on Phobos first, to assess how much work it will be to make Phobos autodecoding-free, before trying it on user code.
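For experiment (2), the mechanism is just D's `deprecated` attribute on whatever counts as an autodecoding entry point. A minimal sketch follows; `autodecodingFront` is a hypothetical stand-in, not the actual Phobos `front`, and the message text is made up.

```d
import std.utf : decode;

// Hypothetical stand-in for an autodecoding entry point.
// Any non-deprecated code that calls it gets a compile-time deprecation message.
deprecated("autodecoding is being phased out; use .byCodeUnit or .byDchar explicitly")
dchar autodecodingFront(string s)
{
    size_t index = 0;
    return decode(s, index);
}

// Hypothetical explicit replacement that carries no deprecation.
char codeUnitFront(string s)
{
    return s[0];
}
```

Building the dependent projects with deprecations turned into errors (`-de` in dmd) would then turn every remaining use into a countable failure.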


T

-- 
Life begins when you can spend your spare time programming instead of watching television. -- Cal Keegan
November 30, 2018
On Friday, 30 November 2018 at 19:08:28 UTC, Steven Schveighoffer wrote:
> For low-level code, you need to pick autodecoding or not autodecoding, and we need a deprecation period, like Nick suggested. This means that for a period of time, a string won't be a range(!).

If that's what he intended, better. Libraries still need to adapt, but at least won't become incompatible with each other overnight.

>
> But painful, it will be. However, mostly for low-level code (i.e. code that uses string.front and string.popFront). May want low-level code conveniences too (frontCodeUnit, frontCodePoint?)

I think it's not only low-level code. When I'm just coding away some user input handling, I don't always bother to add .byCodeUnit everywhere. For many, including me, the change will be mainly a good thing, since there will be no accidental decoding anymore, and I'll happily migrate. But there are, without doubt, codebases in production that have thousands of lines full of auto-decoding, maintained by people who hardly have time to update.

> We will see how bad it is, once it's tried. I'm hoping not very bad.

The good news is, I think, that we can try this without knowing in advance whether it can be done. We can deprecate autodecoding, but if it proves to be too tough to adapt, we simply won't remove it after the period. Of course, we should say in advance that we're only considering removing it, otherwise we might scare some users away.