May 20, 2016 Re: D's Auto Decoding and You | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jack Stouffer | On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
Related discussion: https://trello.com/c/4XmFdcp6/163-rediscuss-redundant-utf-8-string-validation
|
June 02, 2016 Re: D's Auto Decoding and You | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jack Stouffer | On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
>
> If you think there should be any more information included in the article, please let me know so I can add it.
I was a little confused by something in the main autodecoding thread, so I read your article again. Unfortunately, I don't think my confusion is resolved. I was trying one of your examples (full code I used below). You claim it works, but I keep getting assertion failures. I'm just running it with rdmd on Windows 7.
import std.algorithm : canFind;

void main()
{
    string s = "cassé";

    assert(s.canFind!(x => x == 'é'));
}
|
June 02, 2016 Re: D's Auto Decoding and You | ||||
---|---|---|---|---|
| ||||
Posted in reply to jmh530 | On 6/2/16 5:21 PM, jmh530 wrote:
> On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
>>
>> If you think there should be any more information included in the
>> article, please let me know so I can add it.
>
> I was a little confused by something in the main autodecoding thread, so
> I read your article again. Unfortunately, I don't think my confusion is
> resolved. I was trying one of your examples (full code I used below).
> You claim it works, but I keep getting assertion failures. I'm just
> running it with rdmd on Windows 7.
>
>
> import std.algorithm : canFind;
>
> void main()
> {
> string s = "cassé";
>
> assert(s.canFind!(x => x == 'é'));
> }
If that é above is an e followed by a combining character, then you will get the error. This is because autodecoding does not auto normalize as well -- the code points have to match exactly.
-Steve
|
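A minimal sketch of the point above, assuming the failing string holds a decomposed é (U+0065 followed by U+0301) rather than the precomposed U+00E9. The helper name `hasEAcute` is hypothetical; `canFind` and `std.uni.normalize` are Phobos functions. The characters are written as escapes so the result does not depend on what an editor or browser did to the literal.

```d
import std.algorithm : canFind;
import std.uni : NFC, normalize;

// Hypothetical helper: true if any decoded code point equals U+00E9.
bool hasEAcute(string s)
{
    return s.canFind!(x => x == '\u00E9');
}

void main()
{
    string precomposed = "cass\u00E9";   // é as a single code point
    string decomposed  = "casse\u0301";  // 'e' followed by a combining acute accent

    assert(hasEAcute(precomposed));
    // Autodecoding yields 'e' and U+0301 as separate elements, so nothing matches.
    assert(!hasEAcute(decomposed));
    // Normalizing to NFC first composes the pair into U+00E9.
    assert(hasEAcute(normalize!NFC(decomposed)));
}
```

Normalizing both sides to the same form before comparing is one way to make the match independent of how the text was typed or pasted.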
June 02, 2016 Re: D's Auto Decoding and You | ||||
---|---|---|---|---|
| ||||
Posted in reply to jmh530 | On Thursday, 2 June 2016 at 21:21:50 UTC, jmh530 wrote:
> I was a little confused by something in the main autodecoding thread, so I read your article again. Unfortunately, I don't think my confusion is resolved. I was trying one of your examples (full code I used below). You claim it works, but I keep getting assertion failures. I'm just running it with rdmd on Windows 7.
>
>
> import std.algorithm : canFind;
>
> void main()
> {
> string s = "cassé";
>
> assert(s.canFind!(x => x == 'é'));
> }
Your browser is turning the é in the string into two code points via normalization whereas it should be one. Try using \u00E9 instead.
|
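To check which form a pasted literal actually has, it can help to print its code points. This is only a sketch; `dumpCodePoints` is a hypothetical helper, not something from the article or Phobos.

```d
import std.array : join;
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: lists the code points of s as "U+XXXX" strings.
string dumpCodePoints(string s)
{
    string[] parts;
    foreach (dchar c; s) // iterating a string as dchar decodes the UTF-8
        parts ~= format("U+%04X", cast(uint) c);
    return parts.join(" ");
}

void main()
{
    writeln(dumpCodePoints("cass\u00E9"));  // U+0063 U+0061 U+0073 U+0073 U+00E9
    writeln(dumpCodePoints("casse\u0301")); // U+0063 U+0061 U+0073 U+0073 U+0065 U+0301
}
```

If the pasted string ends in U+0065 U+0301 instead of U+00E9, the comparison against a precomposed 'é' cannot succeed.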
June 02, 2016 Re: D's Auto Decoding and You | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On 6/2/16 5:27 PM, Steven Schveighoffer wrote:
> On 6/2/16 5:21 PM, jmh530 wrote:
>> On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
>>>
>>> If you think there should be any more information included in the
>>> article, please let me know so I can add it.
>>
>> I was a little confused by something in the main autodecoding thread, so
>> I read your article again. Unfortunately, I don't think my confusion is
>> resolved. I was trying one of your examples (full code I used below).
>> You claim it works, but I keep getting assertion failures. I'm just
>> running it with rdmd on Windows 7.
>>
>>
>> import std.algorithm : canFind;
>>
>> void main()
>> {
>> string s = "cassé";
>>
>> assert(s.canFind!(x => x == 'é'));
>> }
>
> If that é above is an e followed by a combining character, then you will
> get the error. This is because autodecoding does not auto normalize as
> well -- the code points have to match exactly.
>
> -Steve
Indeed. FWIW I just copied OP's code from Thunderbird into Chrome (on OSX) and it worked: https://dpaste.dzfl.pl/09b9188d87a5
Should I assume some normalization occurred on the way?
Andrei
|
June 03, 2016 Re: D's Auto Decoding and You | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jack Stouffer | On Thursday, 2 June 2016 at 21:31:39 UTC, Jack Stouffer wrote:
> On Thursday, 2 June 2016 at 21:21:50 UTC, jmh530 wrote:
>> I was a little confused by something in the main autodecoding thread, so I read your article again. Unfortunately, I don't think my confusion is resolved. I was trying one of your examples (full code I used below). You claim it works, but I keep getting assertion failures. I'm just running it with rdmd on Windows 7.
>>
>>
>> import std.algorithm : canFind;
>>
>> void main()
>> {
>> string s = "cassé";
>>
>> assert(s.canFind!(x => x == 'é'));
>> }
>
> Your browser is turning the é in the string into two code points via normalization whereas it should be one. Try using \u00E9 instead.
That doesn't cause an assert to fail, but when I do writeln('\u00E9') I get é. So there might still be something wonky going on. I looked up \u00E9 online and I don't think there's an error with that.
|
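The thread does not say what causes the é output. One plausible explanation, offered here only as an assumption, is that writeln emits the UTF-8 bytes 0xC3 0xA9 and the console shows each byte as a separate character in a single-byte code page such as Windows-1252. The sketch below reproduces that misreading; `misreadAsLatin1` is a hypothetical helper.

```d
import std.string : representation;

// Hypothetical helper: reinterprets each UTF-8 byte of s as one Latin-1 character,
// which is roughly what a console using a single-byte code page would display.
dstring misreadAsLatin1(string s)
{
    dstring result;
    foreach (ubyte b; s.representation)
        result ~= cast(dchar) b;
    return result;
}

void main()
{
    string s = "\u00E9";
    // U+00E9 is stored as two UTF-8 code units.
    assert(s.representation == [0xC3, 0xA9]);
    // Read byte by byte, those become U+00C3 (Ã) and U+00A9 (©).
    assert(misreadAsLatin1(s) == "\u00C3\u00A9"d);
}
```

If that assumption holds, the string itself is fine and only the display is off; switching the console to UTF-8 (for example with `chcp 65001`) may be worth trying.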
June 03, 2016 Re: D's Auto Decoding and You | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Thursday, 2 June 2016 at 21:33:02 UTC, Andrei Alexandrescu wrote:
>
> Should I assume some normalization occurred on the way?
>
I'm just looking over std.uni's section on normalization and realizing that I had basically no idea what it is or what's going on. The Wikipedia page on Unicode equivalence is a bit clearer.
I'm definitely nowhere near qualified to have an opinion on these issues.
|
June 03, 2016 Re: D's Auto Decoding and You | ||||
---|---|---|---|---|
| ||||
Posted in reply to jmh530 | On Fri, Jun 3, 2016 at 5:16 AM, jmh530 via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:
> On Thursday, 2 June 2016 at 21:33:02 UTC, Andrei Alexandrescu wrote:
>>
>>
>> Should I assume some normalization occurred on the way?
>>
>
> I'm just looking over std.uni's section on normalization and realizing that I had basically no idea what it is or what's going on. The wikipedia page on unicode equivalence is a bit clearer.
>
> I'm definitely nowhere near qualified to have an opinion on these issues.
This dpaste shows a couple of issues with combining chars in D.
https://dpaste.dzfl.pl/4b006959c5c0
The compiler actually can't handle a combining character literal either. see line 10.
R
|
June 03, 2016 Re: D's Auto Decoding and You | ||||
---|---|---|---|---|
| ||||
Posted in reply to jmh530 | On Friday, 3 June 2016 at 03:16:33 UTC, jmh530 wrote:
> I'm just looking over std.uni's section on normalization and realizing that I had basically no idea what it is or what's going on. The wikipedia page on unicode equivalence is a bit clearer.
This might help a bit, as well:
https://dpaste.dzfl.pl/2ffb22b02842
|
June 03, 2016 Re: D's Auto Decoding and You | ||||
---|---|---|---|---|
| ||||
Posted in reply to Rory McGuire | On Friday, 3 June 2016 at 06:37:59 UTC, Rory McGuire wrote:
> This dpaste shows a couple of issues with combining chars in D.
>
> https://dpaste.dzfl.pl/4b006959c5c0
>
> The compiler actually can't handle a combining character literal either. see line 10.
Your paste behaves as expected: the "character" types in D are defined as single Unicode code units. By definition, the NFD form of "é" is not a single code unit. You would need to use a Grapheme or [w|d]string for that.
(Of course, one might reasonably question how useful our built-in character types actually are compared to ubyte/ushort/uint.)
|
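A small sketch of the distinction drawn above, counting the same text as code units, code points, and graphemes. The helper names `countCodePoints` and `countGraphemes` are hypothetical; `normalize`, `byGrapheme`, and `walkLength` are Phobos functions.

```d
import std.range : walkLength;
import std.uni : NFD, byGrapheme, normalize;

// Hypothetical helpers for the two higher-level counts.
size_t countCodePoints(string s) { return s.walkLength; }
size_t countGraphemes(string s)  { return s.byGrapheme.walkLength; }

void main()
{
    string nfc = "\u00E9";           // precomposed é
    string nfd = normalize!NFD(nfc); // 'e' followed by U+0301

    assert(nfd == "e\u0301");

    assert(nfc.length == 2);           // UTF-8 code units
    assert(countCodePoints(nfc) == 1); // fits in a single dchar
    assert(countGraphemes(nfc) == 1);

    assert(nfd.length == 3);           // 1 code unit for 'e' + 2 for U+0301
    assert(countCodePoints(nfd) == 2); // too many for any single character type
    assert(countGraphemes(nfd) == 1);  // still one user-perceived character
}
```

Only the grapheme count is the same for both forms, which is why a string or a Grapheme is needed to hold the NFD version.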
Copyright © 1999-2021 by the D Language Foundation