The Case Against Autodecode (page 15)

On Monday, 30 May 2016 at 16:34:49 UTC, Jack Stouffer wrote:
> On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote:
>> D1 -> D2 was a vastly more disruptive change than getting rid of auto-decoding would be.
>
> Don't be so sure. All string handling code would become broken, even if it appears to work at first.

Assuming silent breakage is on the table, what would be broken, really?

Code that must intentionally count or otherwise operate on code points, sure. But how much of all string handling code is like that?

Perhaps it would be worth trying to silently remove autodecoding and seeing how much of Phobos breaks, as an experiment. Has this been tried before?

(Not saying this is a route we should take, but it doesn't seem to me that it will break "all string handling code" either.)

On 5/30/2016 8:34 AM, Marc Schütz wrote:
> In an ideal world, we'd also want to change the way `length` and `opIndex` work,

Why? Strings are arrays of code units. All the trouble comes from erratically pretending otherwise.

On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
> I don't agree on changing those. Indexing and slicing a char[] is really useful
> and actually not hard to do correctly (at least with regard to handling code
> units).

Yup. It isn't hard at all to use arrays of codeunits correctly.
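To make the point about code units concrete, here is a minimal sketch, assuming the common case of splitting on an ASCII delimiter. The function name `keyOf` is hypothetical. `std.string.indexOf` returns an offset in code units, so slicing at that offset cannot cut a multi-byte sequence in half.

```d
import std.string : indexOf;

// Hypothetical helper: returns the text before the first '=',
// or the whole line if there is no '='.
string keyOf(string line)
{
    // indexOf reports a code-unit offset, or -1 when nothing is found.
    immutable i = line.indexOf('=');
    // '=' is ASCII, so it never appears inside a multi-byte UTF-8
    // sequence; slicing here keeps the result valid UTF-8.
    return i < 0 ? line : line[0 .. i];
}

void main()
{
    assert(keyOf("größe=42") == "größe");
    assert(keyOf("no delimiter") == "no delimiter");
}
```

The sketch never decodes a code point; it relies only on the property that ASCII bytes are unambiguous in UTF-8.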

May 31, 2016

Re: The Case Against Autodecode

Posted by Marco Leise
in reply to ag0aep6g

On Fri, 27 May 2016 15:47:32 +0200,
ag0aep6g <anonymous@example.com> wrote:

> On 05/27/2016 03:32 PM, Andrei Alexandrescu wrote:
> >>> However the following do require autodecoding:
> >>>
> >>> s.walkLength
> >>> s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation
> >>> s.count!(c => c >= 32) // non-control characters
> >>>
> >>> Currently the standard library operates at code point level even though inside it may choose to use code units when admissible. Leaving such a decision to the library seems like a wise thing to do.
> >>
> >> But how is the user supposed to know without being a core contributor to Phobos?
> >
> > Misunderstanding. All examples work properly today because of autodecoding. -- Andrei
> 
> They only work "properly" if you define "properly" as "in terms of code points". But working in terms of code points is usually wrong. If you want to count "characters", you need to work with graphemes.
> 
> https://dpaste.dzfl.pl/817dec505fd2

1: Auto-decoding shall ALWAYS do the proper thing
2: Therefore humans shall read text in units of code points
3: OS X is an anomaly and must be purged from this planet
4: Indonesians shall be converted to a sane alphabet
5: He who useth combining diacritics shall burn in hell
6: We shall live in peace and harmony forevermore
Let's give this a rest.

-- 
Marco
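
For readers following the code point versus grapheme argument quoted above, the sketch below counts the same string three ways. It is an illustration only, not the code behind the dpaste link; the helper names are hypothetical, and the input is assumed to be "e" followed by U+0301 COMBINING ACUTE ACCENT.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

// Hypothetical helper names, chosen for this illustration only.
size_t codeUnits(string s)  { return s.byCodeUnit.walkLength; } // UTF-8 bytes, same as s.length
size_t codePoints(string s) { return s.walkLength; }            // autodecoded to dchar
size_t graphemes(string s)  { return s.byGrapheme.walkLength; } // user-perceived characters

void main()
{
    // 'e' (1 byte) followed by U+0301 (2 bytes in UTF-8).
    string s = "e\u0301";
    assert(codeUnits(s) == 3);
    assert(codePoints(s) == 2);
    assert(graphemes(s) == 1);
}
```

Under these assumptions, autodecoding gives 2, which is neither the storage size nor the number of characters a reader would see.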

On Monday, 30 May 2016 at 21:39:14 UTC, Vladimir Panteleev wrote:
> On Monday, 30 May 2016 at 16:34:49 UTC, Jack Stouffer wrote:
>> On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote:
>>> D1 -> D2 was a vastly more disruptive change than getting rid of auto-decoding would be.
>>
>> Don't be so sure. All string handling code would become broken, even if it appears to work at first.
>
> Assuming silent breakage is on the table, what would be broken, really?
>
> Code that must intentionally count or otherwise operate on code points, sure. But how much of all string handling code is like that?
>
> Perhaps it would be worth trying to silently remove autodecoding and seeing how much of Phobos breaks, as an experiment. Has this been tried before?
>
> (Not saying this is a route we should take, but it doesn't seem to me that it will break "all string handling code" either.)

132 lines in Phobos use auto-decoding - that should be fixable ;-)

See them: http://sprunge.us/hUCL
More details: https://github.com/dlang/phobos/pull/4384

On 05/30/2016 04:30 PM, Timon Gehr wrote:
>
> In D, enum does not mean enumeration, const does not mean constant, pure
> is not pure, lazy is not lazy, and char does not mean character.
>

My new favorite quote :)

On Monday, 30 May 2016 at 21:39:14 UTC, Vladimir Panteleev wrote:
> Perhaps it would be worth trying to silently remove autodecoding and seeing how much of Phobos breaks, as an experiment. Has this been tried before?

Did it. The result is that a large number of Phobos modules fail to compile because of template constraints that test for is(Unqual!(ElementType!S2) == dchar). As a result, anything that imports std.format or std.uni fails to compile.

Also, I see some errors caused by the fact that is(string.front == immutable) now.

It is hard to find specifics because D halts execution after one test failure.
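
The kind of constraint described above can be sketched as follows. The function `startsWithCodePoint` is hypothetical and stands in for the affected Phobos templates, which are not reproduced here. The static assertions reflect current behaviour with autodecoding in place.

```d
import std.range.primitives : ElementType, empty, front;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Hypothetical function, constrained to ranges whose element type is dchar.
bool startsWithCodePoint(S)(S s, dchar c)
if (is(Unqual!(ElementType!S) == dchar))
{
    return !s.empty && s.front == c;
}

// With autodecoding, a string is a range of dchar, so the constraint holds.
static assert(is(Unqual!(ElementType!string) == dchar));

// A code-unit view has element type char, so the constraint rejects it.
alias CodeUnits = typeof("".byCodeUnit());
static assert(is(Unqual!(ElementType!CodeUnits) == char));
static assert(!__traits(compiles, startsWithCodePoint("abc".byCodeUnit, 'a')));

void main()
{
    assert(startsWithCodePoint("\u00E4bc", '\u00E4'));
    assert(!startsWithCodePoint("", 'a'));
}
```

If strings stopped autodecoding, `ElementType!string` would presumably no longer be `dchar`, and a call such as `startsWithCodePoint("abc", 'a')` would fail the constraint in the same way the code-unit view does here.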

On 5/30/16 6:00 PM, Walter Bright wrote:
> On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
>> I don't agree on changing those. Indexing and slicing a char[] is
>> really useful
>> and actually not hard to do correctly (at least with regard to
>> handling code
>> units).
>
> Yup. It isn't hard at all to use arrays of codeunits correctly.

Trouble is, it isn't hard at all to use arrays of codeunits incorrectly, too. -- Andrei
