June 02, 2016
On 06/02/2016 04:01 PM, Timon Gehr wrote:
> Basically all of those still don't work with UTF-32 (assuming your goal
> is to operate on characters).

The goal is to operate on code units. -- Andrei
June 02, 2016
On 6/2/2016 12:34 PM, deadalnix wrote:
> On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
>> Pretty much everything. Consider s and s1 string variables with possibly
>> different encodings (UTF8/UTF16).
>>
>> * s.all!(c => c == 'ö') works only with autodecoding. Without it, it always
>> returns false.
>>
>
> False. Many characters can be represented by different sequences of codepoints.
> For instance, ê can be ê as one codepoint or ^ as a modifier followed by e. ö is
> one such character.

There are 3 levels of Unicode support. What Andrei is talking about is Level 1.

http://unicode.org/reports/tr18/tr18-5.1.html

I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness.
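
For illustration, a minimal sketch of the two spellings deadalnix mentions. It assumes a reasonably recent Phobos with std.uni.normalize; the helper names codePointCount and sameAfterNFC are made up for this example. Comparing code point by code point treats the precomposed and decomposed forms as different strings, and they only compare equal after normalization:

```d
import std.algorithm.comparison : equal;
import std.range : walkLength;
import std.uni : NFC, normalize;

// Hypothetical helper: number of code points (string iteration autodecodes).
size_t codePointCount(string s)
{
    return s.walkLength;
}

// Hypothetical helper: compare two strings after normalizing both to NFC.
bool sameAfterNFC(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "\u00F6";  // U+00F6, "ö" as a single code point
    string decomposed  = "o\u0308"; // 'o' followed by U+0308 COMBINING DIAERESIS

    assert(codePointCount(precomposed) == 1);
    assert(codePointCount(decomposed) == 2);

    // Code point comparison sees two different sequences.
    assert(!equal(precomposed, decomposed));

    // Normalization makes them agree.
    assert(sameAfterNFC(precomposed, decomposed));
}
```

Note that the combining mark follows the base letter in the decomposed form.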
June 02, 2016
On 06/02/2016 04:07 PM, Walter Bright wrote:
> On 6/2/2016 12:05 PM, Andrei Alexandrescu wrote:
>> * s.all!(c => c == 'ö') works only with autodecoding. Without it, it
>> always returns false.
>
> The o is inferred as a wchar. The lambda then is inferred to return a
> wchar.

The lambda returns bool. -- Andrei
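
A small sketch of the example under discussion, assuming std.utf.byCodeUnit as the way to iterate without autodecoding (the helper names allByCodePoint and allByCodeUnit are invented here). In both cases the lambda returns bool; what changes is the element type it receives:

```d
import std.algorithm.searching : all;
import std.utf : byCodeUnit;

// Hypothetical helper: iterate with autodecoding, so the lambda sees dchar.
bool allByCodePoint(string s, dchar target)
{
    return s.all!(c => c == target);
}

// Hypothetical helper: iterate raw UTF-8 code units, so the lambda sees char.
bool allByCodeUnit(string s, dchar target)
{
    return s.byCodeUnit.all!(c => c == target);
}

void main()
{
    // In UTF-8 each U+00F6 ("ö") is the two code units 0xC3 0xB6.
    string s = "\u00F6\u00F6\u00F6";

    assert(allByCodePoint(s, '\u00F6'));

    // No single UTF-8 code unit equals 0xF6, so this is false.
    assert(!allByCodeUnit(s, '\u00F6'));

    // For pure ASCII the two views agree.
    assert(allByCodeUnit("aaa", 'a'));
}
```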

June 02, 2016
On 06/02/2016 04:12 PM, Timon Gehr wrote:
> It is not meaningful to compare utf-8 and utf-16 code units directly.

But it is meaningful to compare Unicode code points. -- Andrei
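
To make the distinction concrete, here is a sketch comparing a UTF-8 string with a UTF-16 string both ways. The helpers sameCodeUnits and sameCodePoints are hypothetical names for this example, and byCodeUnit is assumed as the non-decoding view:

```d
import std.algorithm.comparison : equal;
import std.utf : byCodeUnit;

// Hypothetical helper: compare raw code units of the two encodings.
bool sameCodeUnits(string a, wstring b)
{
    return equal(a.byCodeUnit, b.byCodeUnit);
}

// Hypothetical helper: compare decoded code points (both sides autodecode).
bool sameCodePoints(string a, wstring b)
{
    return equal(a, b);
}

void main()
{
    string  s8  = "\u00F6";  // two UTF-8 code units: 0xC3 0xB6
    wstring s16 = "\u00F6"w; // one UTF-16 code unit: 0x00F6

    assert(s8.length == 2);
    assert(s16.length == 1);

    // Code units of different encodings do not line up.
    assert(!sameCodeUnits(s8, s16));

    // Code points do.
    assert(sameCodePoints(s8, s16));

    // ASCII text happens to match either way.
    assert(sameCodeUnits("abc", "abc"w));
}
```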

June 02, 2016
On 06/02/2016 04:17 PM, Timon Gehr wrote:
> I.e. you are saying that 'works' means 'operates on code points'.

Affirmative. -- Andrei
June 02, 2016
On 06/02/2016 04:22 PM, cym13 wrote:
>
> A:“We should decode to code points”
> B:“No, decoding to code points is a stupid idea.”
> A:“No it's not!”
> B:“Can you show a concrete example where it does something useful?”
> A:“Sure, look at that!”
> B:“This isn't working at all, look at all those counter-examples!”
> A:“It may not work for your examples but look how easy it is to
>     find code points!”

With autodecoding, all of std.algorithm operates correctly on code points. Without it, all it does for strings is gibberish. -- Andrei
June 02, 2016
On 06/02/2016 04:23 PM, ag0aep6g wrote:
> People are arguing that auto-decoding to code points is not useful.

And want to return to the point where char[] is but an undifferentiated array, which would take std.algorithm back to the stone age. -- Andrei

June 02, 2016
On 06/02/2016 04:26 PM, Andrei Alexandrescu wrote:
> On 06/02/2016 04:01 PM, Timon Gehr wrote:
>> Basically all of those still don't work with UTF-32 (assuming your goal
>> is to operate on characters).
>
> The goal is to operate on code units. -- Andrei

s/units/points/
June 02, 2016
On 06/02/2016 04:27 PM, Walter Bright wrote:
> On 6/2/2016 12:34 PM, deadalnix wrote:
>> On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
>>> Pretty much everything. Consider s and s1 string variables with possibly
>>> different encodings (UTF8/UTF16).
>>>
>>> * s.all!(c => c == 'ö') works only with autodecoding. Without it, it
>>> always returns false.
>>>
>>
>> False. Many characters can be represented by different sequences of codepoints.
>> For instance, ê can be ê as one codepoint or ^ as a modifier followed by e. ö is
>> one such character.
>
> There are 3 levels of Unicode support. What Andrei is talking about is
> Level 1.
>
> http://unicode.org/reports/tr18/tr18-5.1.html

Apparently I'm not the only idiot. -- Andrei

June 02, 2016
On 6/2/2016 8:50 AM, Kagamin wrote:
> It outright deprecated popFront - that's not the first step in the migration.

That's right. It's going about things backwards.

The first step is to adjust Phobos implementations and documentation so they do not rely on autodecoding.

This will take some time and care, particularly with algorithms that support mixed codeunit argument types. (Or perhaps mixed codeunit argument types can be deprecated.)

This is not so simple, as they have to be dealt with one by one.
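
As a rough sketch of what "not relying on autodecoding" could look like at a call site (not how Phobos itself is or would be implemented), the caller can state the iteration level explicitly with std.utf.byDchar or std.utf.byCodeUnit. The helper names countCodePoints and countCodeUnits are invented for this example:

```d
import std.algorithm.searching : count;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helper: decoding is requested explicitly via byDchar.
size_t countCodePoints(string s, dchar target)
{
    return s.byDchar.count(target);
}

// Hypothetical helper: no decoding at all, raw UTF-8 code units.
size_t countCodeUnits(string s, char target)
{
    return s.byCodeUnit.count(target);
}

void main()
{
    string s = "a\u00F6a"; // 'a', U+00F6, 'a': four UTF-8 code units in total

    assert(countCodePoints(s, '\u00F6') == 1);
    assert(countCodePoints(s, 'a') == 2);
    assert(countCodeUnits(s, 'a') == 2);
    assert(s.length == 4);
}
```

Written this way, the result does not depend on what plain string iteration yields.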