The Case Against Autodecode (page 32)

On 02.06.2016 22:07, Walter Bright wrote:
> On 6/2/2016 12:05 PM, Andrei Alexandrescu wrote:
>> * s.all!(c => c == 'ö') works only with autodecoding. It returns
>> always false
>> without.
>
> The o is inferred as a wchar. The lamda then is inferred to return a
> wchar.

No, the lambda returns a bool.

> The algorithm can check that the input is char[], and is being
> tested against a wchar. Therefore, the algorithm can specialize to do
> the decoding itself.
>
> No autodecoding necessary, and it does the right thing.

It still would not be the right thing. The lambda shouldn't compile. It is not meaningful to compare utf-8 and utf-16 code units directly.
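To make the code-unit mismatch concrete, here is a minimal sketch. It assumes a Phobos version in which narrow strings still auto-decode, and it uses escape sequences so the precomposed form U+00F6 is unambiguous. The variable names are illustrative only.

```d
import std.algorithm : all;
import std.utf : byCodeUnit;

void main()
{
    string s = "\u00F6";   // precomposed 'ö', stored as UTF-8
    wchar w = '\u00F6';    // the same code point as a single UTF-16 code unit

    // UTF-8 needs two code units for U+00F6: 0xC3 0xB6.
    assert(s.length == 2);
    assert(cast(ubyte) s[0] == 0xC3 && cast(ubyte) s[1] == 0xB6);

    // Comparing UTF-8 code units against a UTF-16 code unit compiles
    // today, but neither 0xC3 nor 0xB6 equals 0x00F6.
    assert(!s.byCodeUnit.all!(c => c == w));

    // With auto-decoding the range yields dchar code points, so this matches.
    assert(s.all!(c => c == w));
}
```

The third assertion is the comparison being objected to: it is accepted by the compiler but compares values from two different encodings.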

June 02, 2016

Re: The Case Against Autodecode

Posted by Andrei Alexandrescu
in reply to tsbockman


On 06/02/2016 03:34 PM, tsbockman wrote:
> On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
>> Pretty much everything. Consider s and s1 string variables with
>> possibly different encodings (UTF8/UTF16).
>> ...
>
> Your 'ö' examples will NOT work reliably with auto-decoded code points,
> and for nearly the same reason that they won't work with code units; you
> would have to use byGrapheme.

They do work per spec: find this code point. It would be surprising if 'ö' were found but the string were positioned at a different code point.

> The fact that you still don't get that, even after a dozen plus attempts
> by the community to explain the difference, makes you unfit to direct
> Phobos' Unicode support.

Well there's gotta be a reason why my basic comprehension is under constant scrutiny whereas yours is safe.

> Please, either go study Unicode until you
> really understand it, or delegate this issue to someone else.

Would be happy to. To whom would I delegate?


Andrei
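The disagreement above can be illustrated with the two common spellings of 'ö'. This is a sketch, assuming current auto-decoding behaviour and using `std.uni.normalize` as one possible remedy. Nothing here is taken from the posts themselves.

```d
import std.algorithm : canFind;
import std.range : walkLength;
import std.uni;

void main()
{
    string precomposed = "\u00F6";   // U+00F6
    string decomposed  = "o\u0308";  // U+006F followed by U+0308 (combining diaeresis)

    // Auto-decoding iterates code points, so the two spellings differ in length.
    assert(precomposed.walkLength == 1);
    assert(decomposed.walkLength == 2);

    dchar needle = '\u00F6';
    assert(precomposed.canFind(needle));
    // Looks identical to a reader, but the code point search does not find it.
    assert(!decomposed.canFind(needle));

    // Normalizing to NFC first makes the code point search succeed.
    assert(normalize!NFC(decomposed).canFind(needle));
}
```

Both positions show up here: the search does exactly what "find this code point" specifies, and it still misses text that a reader would consider equal.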

On 06/02/2016 03:34 PM, deadalnix wrote:
> On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
>> Pretty much everything. Consider s and s1 string variables with
>> possibly different encodings (UTF8/UTF16).
>>
>> * s.all!(c => c == 'ö') works only with autodecoding. It returns
>> always false without.
>>
>
> False.

True. "Are all code points equal to this one?" -- Andrei

On 02.06.2016 22:13, Andrei Alexandrescu wrote:
> On 06/02/2016 03:34 PM, deadalnix wrote:
>> On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
>>> Pretty much everything. Consider s and s1 string variables with
>>> possibly different encodings (UTF8/UTF16).
>>>
>>> * s.all!(c => c == 'ö') works only with autodecoding. It returns
>>> always false without.
>>>
>>
>> False.
>
> True. "Are all code points equal to this one?" -- Andrei
>

I.e. you are saying that 'works' means 'operates on code points'.

On Thursday, 2 June 2016 at 20:13:14 UTC, Andrei Alexandrescu wrote:
> On 06/02/2016 03:34 PM, tsbockman wrote:
>> [...]
>
> They do work per spec: find this code point. It would be surprising if 'ö' were found but the string were positioned at a different code point.
>
>> [...]
>
> Well there's gotta be a reason why my basic comprehension is under constant scrutiny whereas yours is safe.
>
>> [...]
>
> Would be happy to. To whom would I delegate?
>
>
> Andrei

If there were to be a unicode lieutenant, Dmitry seems to be the obvious choice (if he's interested).

On Thursday, 2 June 2016 at 20:13:52 UTC, Andrei Alexandrescu wrote:
> On 06/02/2016 03:34 PM, deadalnix wrote:
>> On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
>>> Pretty much everything. Consider s and s1 string variables with
>>> possibly different encodings (UTF8/UTF16).
>>>
>>> * s.all!(c => c == 'ö') works only with autodecoding. It returns
>>> always false without.
>>>
>>
>> False.
>
> True. "Are all code points equal to this one?" -- Andrei

A: “We should decode to code points”
B: “No, decoding to code points is a stupid idea.”
A: “No it's not!”
B: “Can you show a concrete example where it does something useful?”
A: “Sure, look at that!”
B: “This isn't working at all, look at all those counter-examples!”
A: “It may not work for your examples but look how easy it is to find code points!”

*Sigh*

On 06/02/2016 10:13 PM, Andrei Alexandrescu wrote:
> They do work per spec: find this code point. It would be surprising if
> 'ö' were found but the string were positioned at a different code point.

The "spec" here is how the range primitives for narrow strings are defined, right? I.e., the spec says auto-decode code units to code points.

The discussion is about whether the spec is good or bad. No one is arguing that there are bugs in the decoding to code points. People are arguing that auto-decoding to code points is not useful.
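For readers unfamiliar with what that "spec" amounts to in practice, the following sketch checks the element types involved. It assumes a Phobos version where narrow strings auto-decode, and it uses `std.utf.byCodeUnit` as the explicit opt-out.

```d
import std.range.primitives : ElementType, front;
import std.utf : byCodeUnit;

void main()
{
    string s = "\u00F6";

    // Range primitives on narrow strings decode: elements are dchar code points.
    static assert(is(ElementType!string == dchar));
    static assert(is(typeof(s.front) == dchar));

    // Indexing and .length, by contrast, operate on UTF-8 code units.
    static assert(is(typeof(s[0]) == immutable(char)));
    assert(s.length == 2);

    // Explicit opt-out: a range over code units, with no decoding.
    static assert(is(ElementType!(typeof(s.byCodeUnit)) : char));
}
```

So the same `string` presents two views: code units through indexing and slicing, and code points through `front`/`popFront`. The thread is about whether the second should be the default.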

On Thursday, 2 June 2016 at 20:01:54 UTC, Timon Gehr wrote:
>
> Doesn't work. Shouldn't compile. (char and wchar shouldn't be comparable.)
>

In Andrei's original post, he says that s is a string variable. He doesn't say it's a char. I find the weirder thing to be that t below is false, per deadalnix's point.

import std.algorithm : all;
import std.stdio : writeln;

void main()
{
    string s = "ö";
    auto t = s.all!(c => c == 'ö');
    writeln(t); //prints false
}

I could imagine getting frustrated that something like the code below throws errors.

import std.algorithm : all;
import std.stdio : writeln;

void main()
{
    import std.uni : byGrapheme;

    string s = "ö";
    auto s2 = s.byGrapheme;
    auto t2 = s2.all!(c => c == 'ö');
    writeln(t2);
}
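One way the byGrapheme variant can be made to compile is to compare each grapheme's code points against a string rather than a character literal. This is only a sketch, assuming `Grapheme` can be sliced with `[]` to obtain its code points. It uses the decomposed spelling explicitly, since `byGrapheme` does not normalize.

```d
import std.algorithm : all, equal;
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    string s = "o\u0308";   // decomposed 'ö': two code points, one grapheme

    // The grapheme range sees a single user-perceived character.
    assert(s.byGrapheme.walkLength == 1);

    // A Grapheme is a sequence of code points, so compare it to a string
    // element-wise instead of to a single character literal.
    assert(s.byGrapheme.all!(g => g[].equal("o\u0308")));

    // No normalization happens: the precomposed spelling is a different sequence.
    assert(!s.byGrapheme.all!(g => g[].equal("\u00F6")));
}
```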

On 06/02/2016 04:01 PM, Timon Gehr wrote:
> Doesn't work. Shouldn't compile. (char and wchar shouldn't be comparable.)

That would be another language design option, which we don't have the luxury to explore. -- Andrei

On 06/02/2016 04:01 PM, Timon Gehr wrote:
> assert("ö".all!(c => c == 'ö')); // fails

As expected. Different code units for different folks. That's a different matter than walking blindly through code units. -- Andrei
