May 30, 2016 Re: The Case Against Autodecode
Posted in reply to Jack Stouffer
On Monday, 30 May 2016 at 16:34:49 UTC, Jack Stouffer wrote:
> On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote:
>> D1 -> D2 was a vastly more disruptive change than getting rid of auto-decoding would be.
>
> Don't be so sure. All string handling code would become broken, even if it appears to work at first.
Assuming silent breakage is on the table, what would be broken, really?
Code that must intentionally count or otherwise operate on code points, sure. But how much of all string handling code is like that?
Perhaps it would be worth trying to silently remove autodecoding and seeing how much of Phobos breaks, as an experiment. Has this been tried before?
(Not saying this is a route we should take, but it doesn't seem to me that it will break "all string handling code" either.)
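To make "what would break" concrete: code that only slices, indexes, or checks `.length` already works on code units and would not change, while code that reaches a string through range primitives would silently switch from counting code points to counting code units. A minimal sketch, assuming an illustrative sample string and using `std.utf.byCodeUnit` to stand in for the post-autodecoding behavior:

```d
import std.range : walkLength;
import std.utf : byCodeUnit;

void main()
{
    // 'é' (U+00E9) is two UTF-8 code units, so the counts below differ.
    string s = "h\u00E9llo";

    // Array primitives have always worked on code units; no change here.
    assert(s.length == 6);

    // Range algorithms currently see code points, thanks to autodecoding.
    assert(s.walkLength == 5);

    // byCodeUnit opts out explicitly: this is what the same range call
    // would return if strings stopped autodecoding.
    assert(s.byCodeUnit.walkLength == 6);
}
```

Only the middle assertion would change its answer; that is the kind of silent breakage under discussion.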
May 30, 2016 Re: The Case Against Autodecode
Posted in reply to Marc Schütz
On 5/30/2016 8:34 AM, Marc Schütz wrote:
> In an ideal world, we'd also want to change the way `length` and `opIndex` work,
Why? Strings are arrays of code units. All the trouble comes from erratically pretending otherwise.
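The split Walter describes is visible in the types: the array primitives (`length`, indexing) speak code units, while the range primitives in `std.range.primitives` decode to `dchar`. A small sketch of compile-time checks illustrating this, assuming a Phobos release where autodecoding is still in place:

```d
import std.range.primitives : ElementEncodingType, ElementType, front;

// `string` is literally an array of UTF-8 code units.
static assert(is(string == immutable(char)[]));

// Indexing the array yields a code unit...
static assert(is(typeof("abc"[0]) == immutable(char)));
// ...but the range primitive `front` decodes a whole code point.
static assert(is(typeof("abc".front) == dchar));

// The two views show up as two different "element types".
static assert(is(ElementEncodingType!string == immutable(char)));
static assert(is(ElementType!string == dchar));

void main()
{
    string s = "\u00E9t\u00E9";   // "été": 3 code points, 5 code units
    assert(s.length == 5);
    assert(s[0] == 0xC3);         // first UTF-8 byte of 'é'
    assert(s.front == '\u00E9');  // the whole decoded code point
}
```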
May 30, 2016 Re: The Case Against Autodecode
Posted in reply to Adam D. Ruppe
On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
> I don't agree on changing those. Indexing and slicing a char[] is really useful
> and actually not hard to do correctly (at least with regard to handling code
> units).
Yup. It isn't hard at all to use arrays of code units correctly.
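One pattern that stays correct at the code-unit level: search for an ASCII delimiter, then slice at the returned index. `splitPair` below is a hypothetical helper written for illustration, not a Phobos function; it relies on `std.string.indexOf` returning a code-unit index.

```d
import std.string : indexOf;
import std.typecons : Tuple, tuple;

// Hypothetical helper: split "key=value" at the first '='.
// indexOf returns a code-unit index, so the slices below use positions
// that are valid for the array. Because '=' is ASCII, and ASCII bytes never
// occur inside a multi-byte UTF-8 sequence, the cut cannot split a code point.
Tuple!(string, string) splitPair(string s)
{
    immutable i = s.indexOf('=');
    if (i < 0)
        return tuple(s, "");
    return tuple(s[0 .. i], s[i + 1 .. $]);
}

void main()
{
    auto kv = splitPair("cl\u00E9=caf\u00E9");
    assert(kv[0] == "cl\u00E9");
    assert(kv[1] == "caf\u00E9");
}
```

No decoding happens anywhere; the correctness argument rests entirely on the delimiter being ASCII.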
May 31, 2016 Re: The Case Against Autodecode
Posted in reply to ag0aep6g
On Fri, 27 May 2016 15:47:32 +0200, ag0aep6g <anonymous@example.com> wrote:
> On 05/27/2016 03:32 PM, Andrei Alexandrescu wrote:
> >>> However the following do require autodecoding:
> >>>
> >>> s.walkLength
> >>> s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation
> >>> s.count!(c => c >= 32) // non-control characters
> >>>
> >>> Currently the standard library operates at code point level even though inside it may choose to use code units when admissible. Leaving such a decision to the library seems like a wise thing to do.
> >>
> >> But how is the user supposed to know without being a core contributor to Phobos?
> >
> > Misunderstanding. All examples work properly today because of autodecoding. -- Andrei
>
> They only work "properly" if you define "properly" as "in terms of code points". But working in terms of code points is usually wrong. If you want to count "characters", you need to work with graphemes.
>
> https://dpaste.dzfl.pl/817dec505fd2
1: Auto-decoding shall ALWAYS do the proper thing
2: Therefore humans shall read text in units of code points
3: OS X is an anomaly and must be purged from this planet
4: Indonesians shall be converted to a sane alphabet
5: He who useth combining diacritics shall burn in hell
6: We shall live in peace and harmony forevermore
Let's give this a rest.
-- Marco
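The difference between code units, code points, and graphemes can be shown on a single word. A minimal sketch using a decomposed "noël" (chosen for illustration), where `walkLength` counts autodecoded code points and `std.uni.byGrapheme` counts user-perceived characters:

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // "noël" spelled with a combining diaeresis: 'e' followed by U+0308.
    string s = "noe\u0308l";

    assert(s.length == 6);                // UTF-8 code units
    assert(s.walkLength == 5);            // code points (autodecoded)
    assert(s.byGrapheme.walkLength == 4); // user-perceived characters
}
```

Autodecoding yields the middle number, which matches neither the storage size nor what a reader would count.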
May 31, 2016 Re: The Case Against Autodecode
Posted in reply to Marco Leise
> 4: Indonesians* shall be converted to a sane alphabet
*Correction: Koreans (2-4 Hangul jamo (code points) form each syllable)
-- Marco
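The Korean case can be made concrete with canonical decomposition: a precomposed Hangul syllable is one code point, but its NFD form is a sequence of conjoining jamo that a reader still sees as one character. A sketch, using 한 as an arbitrary example and assuming `std.uni` applies the Unicode Hangul decomposition and grapheme rules:

```d
import std.range : walkLength;
import std.uni : NormalizationForm, byGrapheme, normalize;

void main()
{
    // U+D55C HANGUL SYLLABLE HAN, a single precomposed code point.
    string precomposed = "\uD55C";
    // Its canonical decomposition (NFD) into conjoining jamo.
    string jamo = normalize!(NormalizationForm.NFD)(precomposed);

    // choseong HIEUH + jungseong A + jongseong NIEUN
    assert(jamo == "\u1112\u1161\u11AB");
    assert(precomposed.walkLength == 1); // one code point
    assert(jamo.walkLength == 3);        // three code points

    // Either way, the reader sees a single grapheme.
    assert(precomposed.byGrapheme.walkLength == 1);
    assert(jamo.byGrapheme.walkLength == 1);
}
```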
May 31, 2016 Re: The Case Against Autodecode
Posted in reply to Walter Bright
A relevant thread in the Rust bug tracker I remember from three years ago:
https://github.com/rust-lang/rust/issues/7043
May it be of inspiration.
-- Marco
May 30, 2016 Re: The Case Against Autodecode
Posted in reply to Vladimir Panteleev
On Monday, 30 May 2016 at 21:39:14 UTC, Vladimir Panteleev wrote:
> On Monday, 30 May 2016 at 16:34:49 UTC, Jack Stouffer wrote:
>> On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote:
>>> D1 -> D2 was a vastly more disruptive change than getting rid of auto-decoding would be.
>>
>> Don't be so sure. All string handling code would become broken, even if it appears to work at first.
>
> Assuming silent breakage is on the table, what would be broken, really?
>
> Code that must intentionally count or otherwise operate on code points, sure. But how much of all string handling code is like that?
>
> Perhaps it would be worth trying to silently remove autodecoding and seeing how much of Phobos breaks, as an experiment. Has this been tried before?
>
> (Not saying this is a route we should take, but it doesn't seem to me that it will break "all string handling code" either.)
132 lines in Phobos use auto-decoding; that should be fixable ;-)
See them: http://sprunge.us/hUCL
More details: https://github.com/dlang/phobos/pull/4384
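The linked listing and PR are not reproduced here. As a hedged illustration of one common way such call sites can be rewritten (not necessarily what the PR does), an explicit `byCodeUnit` can replace an implicitly decoding call when the needle is ASCII. Both function names are hypothetical:

```d
import std.algorithm.searching : count;
import std.utf : byCodeUnit;

// Counting an ASCII delimiter does not need decoding: in UTF-8 an ASCII
// byte is always a complete code point and never part of a longer sequence.
// The first version autodecodes every element to dchar before comparing;
// the second compares raw code units and gives the same answer.
size_t countCommasDecoded(string s) { return s.count(','); }
size_t countCommasRaw(string s)     { return s.byCodeUnit.count(','); }

void main()
{
    string csv = "a,\u00E9,c";
    assert(countCommasDecoded(csv) == 2);
    assert(countCommasRaw(csv) == 2);
}
```

The rewrite is only behavior-preserving because the needle is ASCII; a non-ASCII needle would need a different approach.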
May 30, 2016 Re: The Case Against Autodecode
Posted in reply to Timon Gehr
On 05/30/2016 04:30 PM, Timon Gehr wrote:
>
> In D, enum does not mean enumeration, const does not mean constant, pure
> is not pure, lazy is not lazy, and char does not mean character.
>
My new favorite quote :)
May 31, 2016 Re: The Case Against Autodecode
Posted in reply to Vladimir Panteleev
On Monday, 30 May 2016 at 21:39:14 UTC, Vladimir Panteleev wrote:
> Perhaps it would be worth trying to silently remove autodecoding and seeing how much of Phobos breaks, as an experiment. Has this been tried before?
I did. The result is that a large number of Phobos modules fail to compile because of template constraints that test for `is(Unqual!(ElementType!S2) == dchar)`. As a result, anything that imports std.format or std.uni fails to compile. I also see some errors caused by `string.front` now being immutable (`immutable(char)` instead of `dchar`).
It's hard to find specifics because D halts execution after the first test failure.
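The constraint failure described here can be reproduced without patching Phobos. A sketch, where `countLetters` is a hypothetical function constrained the way the post describes, and `byCodeUnit` stands in for what a string would look like to range code without autodecoding:

```d
import std.range.primitives : ElementEncodingType, ElementType, isInputRange;
import std.traits : Unqual;
import std.uni : isAlpha;
import std.utf : byCodeUnit;

// Hypothetical function, constrained like the Phobos templates mentioned:
// it only accepts ranges whose element type is dchar.
size_t countLetters(S)(S s)
    if (isInputRange!S && is(Unqual!(ElementType!S) == dchar))
{
    size_t n;
    foreach (dchar c; s)
        if (isAlpha(c))
            ++n;
    return n;
}

// With autodecoding, a string's range element type is dchar,
// while the array itself stores immutable(char) code units.
static assert(is(ElementType!string == dchar));
static assert(is(ElementEncodingType!string == immutable(char)));

// So a string passes the constraint only because it autodecodes...
static assert(__traits(compiles, countLetters("abc")));
// ...while a range of code units, which is what a string would become, fails it.
static assert(!__traits(compiles, countLetters("abc".byCodeUnit)));

void main()
{
    assert(countLetters("h\u00E9llo!") == 5);
}
```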
May 31, 2016 Re: The Case Against Autodecode
Posted in reply to Walter Bright
On 5/30/16 6:00 PM, Walter Bright wrote:
> On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
>> I don't agree on changing those. Indexing and slicing a char[] is really useful
>> and actually not hard to do correctly (at least with regard to handling code
>> units).
>
> Yup. It isn't hard at all to use arrays of code units correctly.
Trouble is, it isn't hard at all to use arrays of code units incorrectly, too. -- Andrei
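A short illustration of the failure mode: slicing at an arbitrary code-unit offset compiles and runs, but can cut a multi-byte sequence in half. `truncateUtf8` is a hypothetical helper, sketched under the assumption that its input is already valid UTF-8; it steps back over continuation bytes (`0b10xxxxxx`) until it reaches a code-point boundary.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical helper: truncate to at most maxBytes code units without
// cutting a UTF-8 sequence. Assumes s is valid UTF-8.
string truncateUtf8(string s, size_t maxBytes)
{
    if (s.length <= maxBytes)
        return s;
    size_t end = maxBytes;
    // Continuation bytes look like 0b10xxxxxx; back up until s[end]
    // is the first byte of a code point.
    while (end > 0 && (s[end] & 0xC0) == 0x80)
        --end;
    return s[0 .. end];
}

void main()
{
    string s = "caf\u00E9"; // 'é' is the two code units C3 A9

    // Naive slicing compiles fine, but 4 code units ends in the middle
    // of 'é' and leaves invalid UTF-8 behind.
    assertThrown!UTFException(validate(s[0 .. 4]));

    // Backing up to a code-point boundary keeps the result valid.
    assert(truncateUtf8(s, 4) == "caf");
    validate(truncateUtf8(s, 4));
}
```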
Copyright © 1999-2021 by the D Language Foundation