The Case Against Autodecode (page 33) - D Programming Language Discussion Forum

June 02, 2016

Re: The Case Against Autodecode

Posted by Andrei Alexandrescu
in reply to Timon Gehr

On 06/02/2016 04:01 PM, Timon Gehr wrote:
> Basically all of those still don't work with UTF-32 (assuming your goal
> is to operate on characters).

The goal is to operate on code units. -- Andrei

June 02, 2016

Re: The Case Against Autodecode

Posted by Walter Bright
in reply to deadalnix

On 6/2/2016 12:34 PM, deadalnix wrote:
> On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
>> Pretty much everything. Consider s and s1 string variables with possibly
>> different encodings (UTF8/UTF16).
>>
>> * s.all!(c => c == 'ö') works only with autodecoding. It returns always false
>> without.
>>
>
> False. Many characters can be represented by different sequences of codepoints.
> For instance, ê can be ê as one codepoint or ^ as a modifier followed by e. ö is
> one such character.

There are 3 levels of Unicode support. What Andrei is talking about is Level 1.

http://unicode.org/reports/tr18/tr18-5.1.html

I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness.
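
The disagreement above can be made concrete with a small D sketch. It assumes the string literals below stand in for the two spellings of ö that deadalnix describes; `canonicallyEqual` is a hypothetical helper (not a Phobos function) that uses `std.uni.normalize` to compare up to canonical equivalence.

```d
import std.algorithm.searching : all;
import std.uni : normalize, NFC;

// Hypothetical helper: treat two strings as equal when their NFC forms match.
bool canonicallyEqual(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "\u00F6"; // ö as the single code point U+00F6
    string decomposed  = "o\u0308"; // 'o' followed by U+0308 COMBINING DIAERESIS

    // The two spellings differ code point by code point.
    assert(precomposed != decomposed);

    // A code-point comparison matches only the precomposed spelling.
    assert(precomposed.all!(c => c == 'ö'));
    assert(!decomposed.all!(c => c == 'ö'));

    // After normalization the two spellings compare equal.
    assert(canonicallyEqual(precomposed, decomposed));
}
```

Comparing code points is roughly the Level 1 behaviour Walter refers to; treating the two spellings as the same requires a normalization step like the one above.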

June 02, 2016

Re: The Case Against Autodecode

Posted by Andrei Alexandrescu
in reply to Walter Bright

On 06/02/2016 04:07 PM, Walter Bright wrote:
> On 6/2/2016 12:05 PM, Andrei Alexandrescu wrote:
>> * s.all!(c => c == 'ö') works only with autodecoding. It returns
>> always false
>> without.
>
> The o is inferred as a wchar. The lambda then is inferred to return a
> wchar.

The lambda returns bool. -- Andrei
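
Both statements can be checked at compile time. The following is a minimal sketch, assuming a UTF-8 encoded source file; `isOUmlaut` is a hypothetical name standing in for the lambda `c => c == 'ö'`.

```d
// Hypothetical named version of the predicate discussed in the thread.
bool isOUmlaut(dchar c)
{
    return c == 'ö';
}

void main()
{
    // An ASCII literal is typed as char; 'ö' does not fit and is typed as wchar.
    static assert(is(typeof('o') == char));
    static assert(is(typeof('ö') == wchar));

    // The comparison yields bool regardless of the character types involved.
    auto pred = (dchar c) => c == 'ö';
    static assert(is(typeof(pred('x')) == bool));

    assert(isOUmlaut('ö'));
    assert(!isOUmlaut('o'));
}
```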

June 02, 2016

Re: The Case Against Autodecode

Posted by Andrei Alexandrescu
in reply to Timon Gehr

On 06/02/2016 04:12 PM, Timon Gehr wrote:
> It is not meaningful to compare utf-8 and utf-16 code units directly.

But it is meaningful to compare Unicode code points. -- Andrei
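
A short sketch of the distinction, using `std.utf.byCodeUnit` to opt out of decoding. The helper names `sameCodeUnits` and `sameCodePoints` are hypothetical and exist only for this illustration.

```d
import std.algorithm.comparison : equal;
import std.utf : byCodeUnit;

// Hypothetical helper: compare raw code units, no decoding.
bool sameCodeUnits(string a, wstring b)
{
    return equal(a.byCodeUnit, b.byCodeUnit);
}

// Hypothetical helper: compare code points (both ranges yield dchar).
bool sameCodePoints(string a, wstring b)
{
    return equal(a, b);
}

void main()
{
    string  s8  = "ö";  // UTF-8: two code units, 0xC3 0xB6
    wstring s16 = "ö"w; // UTF-16: one code unit, 0x00F6

    assert(s8.length == 2);
    assert(s16.length == 1);

    assert(!sameCodeUnits(s8, s16)); // code units differ between encodings
    assert(sameCodePoints(s8, s16)); // the decoded code points agree
}
```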

June 02, 2016

Re: The Case Against Autodecode

Posted by Andrei Alexandrescu
in reply to Timon Gehr

On 06/02/2016 04:17 PM, Timon Gehr wrote:
> I.e. you are saying that 'works' means 'operates on code points'.

Affirmative. -- Andrei

June 02, 2016

Re: The Case Against Autodecode

Posted by Andrei Alexandrescu
in reply to cym13

On 06/02/2016 04:22 PM, cym13 wrote:
>
> A:“We should decode to code points”
> B:“No, decoding to code points is a stupid idea.”
> A:“No it's not!”
> B:“Can you show a concrete example where it does something useful?”
> A:“Sure, look at that!”
> B:“This isn't working at all, look at all those counter-examples!”
> A:“It may not work for your examples but look how easy it is to
>     find code points!”

With autodecoding, all of std.algorithm operates correctly on code points. Without it, all it does for strings is gibberish. -- Andrei
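
One way to see what this claim covers is to reverse a string both ways. This is a sketch under the assumption that "gibberish" refers to results such as split multi-byte sequences; `codePointCount` is a hypothetical helper.

```d
import std.algorithm.comparison : equal;
import std.range : retro, walkLength;
import std.utf : byCodeUnit;

// Hypothetical helper: count code points by walking the autodecoded range.
size_t codePointCount(string s)
{
    return s.walkLength;
}

void main()
{
    string s = "añb"; // 'ñ' occupies two UTF-8 code units

    assert(s.length == 4);           // code units
    assert(codePointCount(s) == 3);  // code points

    // Reversing by code point keeps 'ñ' intact.
    assert(s.retro.equal("bña"));

    // Reversing by code unit swaps the two bytes of 'ñ',
    // so the result no longer matches the UTF-8 encoding of "bña".
    assert(!s.byCodeUnit.retro.equal("bña".byCodeUnit));
}
```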

June 02, 2016

Re: The Case Against Autodecode

Posted by Andrei Alexandrescu
in reply to ag0aep6g

On 06/02/2016 04:23 PM, ag0aep6g wrote:
> People are arguing that auto-decoding to code points is not useful.

And want to return to the point where char[] is but an undifferentiated array, which would take std.algorithm back to the stone age. -- Andrei

June 02, 2016

Re: The Case Against Autodecode

Posted by Andrei Alexandrescu
in reply to Andrei Alexandrescu

On 06/02/2016 04:26 PM, Andrei Alexandrescu wrote:
> On 06/02/2016 04:01 PM, Timon Gehr wrote:
>> Basically all of those still don't work with UTF-32 (assuming your goal
>> is to operate on characters).
>
> The goal is to operate on code units. -- Andrei

s/units/points/
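
Timon's point about UTF-32 can be illustrated as follows: even when every array element is a full code point, a user-perceived character may span several of them. A minimal sketch using `std.uni.byGrapheme`; `graphemeCount` is a hypothetical helper.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helper: count grapheme clusters in a UTF-32 string.
size_t graphemeCount(dstring s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    dstring s = "o\u0308"d; // 'o' followed by U+0308 COMBINING DIAERESIS

    assert(s.length == 2);         // two code points
    assert(graphemeCount(s) == 1); // one user-perceived character
}
```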

June 02, 2016

Re: The Case Against Autodecode

Posted by Andrei Alexandrescu
in reply to Walter Bright

On 06/02/2016 04:27 PM, Walter Bright wrote:
> On 6/2/2016 12:34 PM, deadalnix wrote:
>> On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
>>> Pretty much everything. Consider s and s1 string variables with possibly
>>> different encodings (UTF8/UTF16).
>>>
>>> * s.all!(c => c == 'ö') works only with autodecoding. It returns
>>> always false
>>> without.
>>>
>>
>> False. Many characters can be represented by different sequences of
>> codepoints.
>> For instance, ê can be ê as one codepoint or ^ as a modifier followed
>> by e. ö is
>> one such character.
>
> There are 3 levels of Unicode support. What Andrei is talking about is
> Level 1.
>
> http://unicode.org/reports/tr18/tr18-5.1.html

Apparently I'm not the only idiot. -- Andrei

June 02, 2016

Re: The Case Against Autodecode

Posted by Walter Bright
in reply to Kagamin

On 6/2/2016 8:50 AM, Kagamin wrote:
> It outright deprecated popFront - that's not the first step in the migration.

That's right. It's going about things backwards.

The first step is to adjust Phobos implementations and documentation so they do not rely on autodecoding.

This will take some time and care, particularly with algorithms that support mixed codeunit argument types. (Or perhaps mixed codeunit argument types can be deprecated.)

This is not so simple, as they have to be dealt with one by one.
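
As an illustration only, here is one way an algorithm taking mixed code unit types might avoid decoding: transcode the needle once to the haystack's encoding, then search raw code units. `containsNoDecode` is a hypothetical helper, not a Phobos function, and this approach is an assumption rather than something proposed in the thread.

```d
import std.algorithm.searching : canFind;
import std.conv : to;
import std.string : representation;

// Hypothetical helper: search a UTF-8 haystack for a UTF-16 needle
// without decoding the haystack element by element.
bool containsNoDecode(string haystack, wstring needle)
{
    // Transcode the needle once to UTF-8.
    string needle8 = needle.to!string;

    // Search the raw bytes. UTF-8 is self-synchronizing, so a match of a
    // valid needle in a valid haystack starts on a code point boundary.
    return haystack.representation.canFind(needle8.representation);
}

void main()
{
    assert(containsNoDecode("schön", "ö"w));
    assert(!containsNoDecode("schon", "ö"w));
}
```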
