Thread overview | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
November 23, 2016 Switch ignores case (?) | ||||
---|---|---|---|---|
| ||||
Only one of the two cases is considered. What am I doing wrong? ` import std.array; import std.conv; import std.stdio; void main() { auto tokens = to!(dchar[][])(["D"d, "’"d, "Addario"d, "'"d]); // Or use this below: //~ dstring[] tokens = ["D"d, "’"d, "Addario"d, "'"d]; while (!tokens.empty) { switch (tokens[0]) { case "\u2019"d: writeln("Apostrophe smart " ~ tokens[0]); break; case "\u0027"d: writeln("Apostrophe straight " ~ tokens[0]); break; default: writeln("Other " ~ tokens[0]); break; } tokens = tokens[1..$]; } } ` Prints: Other D Apostrophe smart ’ Other Addario Other ' I would have expected: Other D Apostrophe smart ’ Other Addario Apostrophe straight ' <== expected |
November 23, 2016 Re: Switch ignores case (?) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Chris | On 11/23/16 11:28 AM, Chris wrote:
> Only one of the two cases is considered. What am I doing wrong?
>
> `
> import std.array;
> import std.conv;
> import std.stdio;
>
> void main()
> {
> auto tokens = to!(dchar[][])(["D"d, "’"d, "Addario"d, "'"d]);
> // Or use this below:
> //~ dstring[] tokens = ["D"d, "’"d, "Addario"d, "'"d];
> while (!tokens.empty)
> {
> switch (tokens[0])
> {
> case "\u2019"d:
> writeln("Apostrophe smart " ~ tokens[0]);
> break;
> case "\u0027"d:
> writeln("Apostrophe straight " ~ tokens[0]);
> break;
> default:
> writeln("Other " ~ tokens[0]);
> break;
> }
> tokens = tokens[1..$];
> }
> }
> `
> Prints:
>
> Other D
> Apostrophe smart ’
> Other Addario
> Other '
>
> I would have expected:
>
> Other D
> Apostrophe smart ’
> Other Addario
> Apostrophe straight ' <== expected
I tested this locally with different ideas. This definitely looks like a codegen bug.
I was able to reduce it to:
void main()
{
switch("'"d)
{
case "'"d:
writeln("a");
break;
case "’"d:
writeln("b");
break;
default:
writeln("default");
}
}
prints "default"
What seems to fix it is removing the case statement for the "smart apostrophe". So I'd look there for where the bug is triggering.
-Steve
|
November 23, 2016 Re: Switch ignores case (?) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Wednesday, 23 November 2016 at 17:33:04 UTC, Steven Schveighoffer wrote:
>
> I tested this locally with different ideas. This definitely looks like a codegen bug.
>
> I was able to reduce it to:
>
> void main()
> {
> switch("'"d)
> {
> case "'"d:
> writeln("a");
> break;
> case "’"d:
> writeln("b");
> break;
> default:
> writeln("default");
> }
> }
>
> prints "default"
>
> What seems to fix it is removing the case statement for the "smart apostrophe". So I'd look there for where the bug is triggering.
>
> -Steve
Yep, removing one of the two cases works. I tried it with different versions of dmd back to 2.070.0, and it always gives me the same (wrong) result. I haven't tried ldc yet.
|
November 23, 2016 Re: Switch ignores case (?) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Chris | On 11/23/16 1:03 PM, Chris wrote: > On Wednesday, 23 November 2016 at 17:33:04 UTC, Steven Schveighoffer wrote: > >> >> I tested this locally with different ideas. This definitely looks like >> a codegen bug. >> >> I was able to reduce it to: >> >> void main() >> { >> switch("'"d) >> { >> case "'"d: >> writeln("a"); >> break; >> case "’"d: >> writeln("b"); >> break; >> default: >> writeln("default"); >> } >> } >> >> prints "default" >> >> What seems to fix it is removing the case statement for the "smart >> apostrophe". So I'd look there for where the bug is triggering. >> > > Yep, removing one of the two cases works. I tried it with different > versions of dmd back to 2.070.0, and it always gives me the same (wrong) > result. I haven't tried ldc yet. > Please file here: https://issues.dlang.org/enter_bug.cgi I think this has been there forever. Happens in 2.040 too (the earliest dmd I have on my system). -Steve |
November 23, 2016 Re: Switch ignores case (?) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Wednesday, 23 November 2016 at 18:34:28 UTC, Steven Schveighoffer wrote: > > Please file here: https://issues.dlang.org/enter_bug.cgi > > I think this has been there forever. Happens in 2.040 too (the earliest dmd I have on my system). > > -Steve Here it is: https://issues.dlang.org/show_bug.cgi?id=16739 (I think I marked it as "P1" (default option), maybe it's "P2", didn't know what to choose) It has something to do with the smart quote, e.g.: import std.array; import std.conv; import std.stdio; void main() { auto tokens = to!(dchar[][])(["D"d, "’"d, "Addario"d, "'"d, "."d]); // Or use this below: //~ dstring[] tokens = ["D"d, "’"d, "Addario"d, "'"d]; while (!tokens.empty) { switch (tokens[0]) { case "\u2019"d: writeln("Apostrophe smart " ~ tokens[0]); break; case "\u0027"d: writeln("Apostrophe straight " ~ tokens[0]); break; case "D"d: writeln("Letter 'D'"); break; case "."d: writeln("fullstop " ~ tokens[0]); break; default: writeln("Other " ~ tokens[0]); break; } tokens = tokens[1..$]; } } prints: Letter 'D' Other ’ // <== not expected Other Addario Apostrophe straight ' fullstop . Both LDC and DMD produce the same error. I discovered this while writing a tokenizer. It is a bit upsetting, because it is really an essential thing. The workaround for now would be if (token[0] == "\u2019"d) goto Wherever; which works, by the way. |
November 23, 2016 Re: Switch ignores case (?) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Chris | On Wednesday, November 23, 2016 19:07:49 Chris via Digitalmars-d-learn wrote:
> (I think I marked it as "P1" (default option), maybe it's "P2",
> didn't know what to choose)
AFAIK, there are no devs that pay any attention to those. Some attention is paid to regression vs enhancement vs etc. But the priority field really doesn't mean anything for how D bugs get fixed or who fixes what when.
- Jonathan M Davis
|
November 23, 2016 Re: Switch ignores case (?) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Chris | On Wednesday, 23 November 2016 at 19:07:49 UTC, Chris wrote:
> It has something to do with the smart quote, e.g.:
it is wrong binary search in `_d_switch_string()`.
strings for switch are lexically sorted, and compiler calls `_d_switch_string()` to select one. the thing is that comparison in `_d_switch_string()` is done with `memcmp()`. still not clear? ok, let's see how cases are encoded:
body _d_switch_dstring()
'U0027' (ca)
table[0] = 1, 'U0027'
table[1] = 1, 'U2019'
or, in memory:
table[0] = 1, 0x27, 0x00
table[1] = 1, 0x19, 0x20
so, memcmp for `table[1]` returns... 1! 'cause 0x27 is greater than 0x19. and binsearch is broken from here on. the same is true for `_d_switch_ustring()`, of course.
this can be fixed either by using slow char-by-char comparisons in druntime, or by fixing codegen, so it would sort strings as byte arrays.
|
November 23, 2016 Re: Switch ignores case (?) | ||||
---|---|---|---|---|
| ||||
Posted in reply to ketmar | On Wednesday, 23 November 2016 at 19:30:01 UTC, ketmar wrote:
> On Wednesday, 23 November 2016 at 19:07:49 UTC, Chris wrote:
>> It has something to do with the smart quote, e.g.:
>
> it is wrong binary search in `_d_switch_string()`.
>
> strings for switch are lexically sorted, and compiler calls `_d_switch_string()` to select one. the thing is that comparison in `_d_switch_string()` is done with `memcmp()`. still not clear? ok, let's see how cases are encoded:
>
> body _d_switch_dstring()
> 'U0027' (ca)
> table[0] = 1, 'U0027'
> table[1] = 1, 'U2019'
>
> or, in memory:
>
> table[0] = 1, 0x27, 0x00
> table[1] = 1, 0x19, 0x20
>
> so, memcmp for `table[1]` returns... 1! 'cause 0x27 is greater than 0x19. and binsearch is broken from here on. the same is true for `_d_switch_ustring()`, of course.
>
> this can be fixed either by using slow char-by-char comparisons in druntime, or by fixing codegen, so it would sort strings as byte arrays.
Actually, I came across a compiler message that gave me something like \x19\x20 which I found odd. This sure needs fixing. After all, it's quite a basic feature. So it's back to the old `if` again (for now).
|
November 23, 2016 Re: Switch ignores case (?) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Chris | On Wednesday, 23 November 2016 at 20:49:01 UTC, Chris wrote:
> Actually, I came across a compiler message that gave me something like \x19\x20 which I found odd. This sure needs fixing. After all, it's quite a basic feature. So it's back to the old `if` again (for now).
yeah, until this is fixed, `switch` over wstring/dstring can be considered completely broken, and better be avoided. `switch` over normal string is unaffected, of course.
|
November 23, 2016 Re: Switch ignores case (?) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Chris | quickfix: diff --git a/src/rt/switch_.d b/src/rt/switch_.d index ec124e3..83572fe 100644 --- a/src/rt/switch_.d +++ b/src/rt/switch_.d @@ -27,6 +27,32 @@ private import core.stdc.string; extern (C): +import core.stdc.wchar_ : wchar_t, wmemcmp; + + +static if (wchar_t.sizeof == wchar.sizeof) { + alias memcmpw = wmemcmp; + int memcmpd (const(void)* s0, const(void)* s1, size_t len) { + while (len--) { + if (int r = *cast(const(int)*)s0-*cast(const(int)*)s1) return (r < 0 ? -1 : 1); + s0 += dchar.sizeof; + s1 += dchar.sizeof; + } + return 0; + } +} else static if (wchar_t.sizeof == dchar.sizeof) { + int memcmpw (const(void)* s0, const(void)* s1, size_t len) { + while (len--) { + if (int r = *cast(const(ushort)*)s0-*cast(const(ushort)*)s1) return (r < 0 ? -1 : 1); + s0 += wchar.sizeof; + s1 += wchar.sizeof; + } + return 0; + } + alias memcmpd = wmemcmp; +} else static assert(0, "wut?!"); + + int _d_switch_string(char[][] table, char[] ca) in { @@ -189,7 +215,7 @@ in { int c; - c = memcmp(table[j - 1].ptr, table[j].ptr, len1 * wchar.sizeof); + c = memcmpw(table[j - 1].ptr, table[j].ptr, len1); assert(c < 0); // c==0 means a duplicate } } @@ -205,7 +231,7 @@ out (result) for (auto i = 0u; i < table.length; i++) { if (table[i].length == ca.length) - { c = memcmp(table[i].ptr, ca.ptr, ca.length * wchar.sizeof); + { c = memcmpw(table[i].ptr, ca.ptr, ca.length); assert(c != 0); } } @@ -218,7 +244,7 @@ out (result) assert(i < table.length); if (table[i].length == ca.length) { - c = memcmp(table[i].ptr, ca.ptr, ca.length * wchar.sizeof); + c = memcmpw(table[i].ptr, ca.ptr, ca.length); if (c == 0) { assert(i == result); @@ -253,7 +279,7 @@ body auto c = cast(sizediff_t)(ca.length - pca.length); if (c == 0) { - c = memcmp(ca.ptr, pca.ptr, ca.length * wchar.sizeof); + c = memcmpw(ca.ptr, pca.ptr, ca.length); if (c == 0) { //printf("found %d\n", mid); return cast(int)mid; @@ -317,7 +343,7 @@ in assert(len1 <= len2); if (len1 == len2) { - auto c = memcmp(table[j - 1].ptr, table[j].ptr, len1 * dchar.sizeof); + auto c = memcmpd(table[j - 1].ptr, table[j].ptr, len1); assert(c < 0); // c==0 means a duplicate } } @@ -331,7 +357,7 @@ out (result) for (auto i = 0u; i < table.length; i++) { if (table[i].length == ca.length) - { auto c = memcmp(table[i].ptr, ca.ptr, ca.length * dchar.sizeof); + { auto c = memcmpd(table[i].ptr, ca.ptr, ca.length); assert(c != 0); } } @@ -344,7 +370,7 @@ out (result) assert(i < table.length); if (table[i].length == ca.length) { - auto c = memcmp(table[i].ptr, ca.ptr, ca.length * dchar.sizeof); + auto c = memcmpd(table[i].ptr, ca.ptr, ca.length); if (c == 0) { assert(i == result); @@ -379,7 +405,7 @@ body auto c = cast(sizediff_t)(ca.length - pca.length); if (c == 0) { - c = memcmp(ca.ptr, pca.ptr, ca.length * dchar.sizeof); + c = memcmpd(ca.ptr, pca.ptr, ca.length); if (c == 0) { //printf("found %d\n", mid); return cast(int)mid; -- 2.9.2 |
Copyright © 1999-2021 by the D Language Foundation