Switch ignores case (?) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Learn » Switch ignores case (?)

Thread overview

Switch ignores case (?)
Nov 23, 2016 Chris
Nov 23, 2016 Steven Schveighoffer
Nov 23, 2016 Chris
Nov 23, 2016 Steven Schveighoffer
Nov 23, 2016 Chris
Nov 23, 2016 Jonathan M Davis
Nov 23, 2016 ketmar
Nov 23, 2016 Chris
Nov 23, 2016 ketmar
Nov 23, 2016 ketmar
Nov 23, 2016 Steven Schveighoffer
Nov 23, 2016 Steven Schveighoffer
Nov 23, 2016 ketmar
Nov 23, 2016 Steven Schveighoffer
Nov 23, 2016 ketmar
Nov 24, 2016 Chris
Nov 24, 2016 ketmar
Nov 24, 2016 Chris
Nov 24, 2016 Antonio Corbi
Nov 24, 2016 ketmar
Nov 25, 2016 Steven Schveighoffer
Nov 23, 2016 ketmar

November 23, 2016

Switch ignores case (?)

Posted by Chris

Chris

Only one of the two cases is considered. What am I doing wrong?

`
import std.array;
import std.conv;
import std.stdio;

void main()
{
  auto tokens = to!(dchar[][])(["D"d, "’"d, "Addario"d, "'"d]);
  // Or use this below:
  //~ dstring[] tokens = ["D"d, "’"d, "Addario"d, "'"d];
  while (!tokens.empty)
  {
    switch (tokens[0])
    {
      case "\u2019"d:
        writeln("Apostrophe smart " ~ tokens[0]);
        break;
      case "\u0027"d:
        writeln("Apostrophe straight " ~ tokens[0]);
        break;
      default:
        writeln("Other " ~ tokens[0]);
        break;
    }
    tokens = tokens[1..$];
  }
}
`
Prints:

Other D
Apostrophe smart ’
Other Addario
Other '

I would have expected:

Other D
Apostrophe smart ’
Other Addario
Apostrophe straight '  <== expected

November 23, 2016

Re: Switch ignores case (?)

Posted by Steven Schveighoffer
in reply to Chris

Steven Schveighoffer

Posted in reply to Chris

On 11/23/16 11:28 AM, Chris wrote:
> Only one of the two cases is considered. What am I doing wrong?
>
> `
> import std.array;
> import std.conv;
> import std.stdio;
>
> void main()
> {
>   auto tokens = to!(dchar[][])(["D"d, "’"d, "Addario"d, "'"d]);
>   // Or use this below:
>   //~ dstring[] tokens = ["D"d, "’"d, "Addario"d, "'"d];
>   while (!tokens.empty)
>   {
>     switch (tokens[0])
>     {
>       case "\u2019"d:
>         writeln("Apostrophe smart " ~ tokens[0]);
>         break;
>       case "\u0027"d:
>         writeln("Apostrophe straight " ~ tokens[0]);
>         break;
>       default:
>         writeln("Other " ~ tokens[0]);
>         break;
>     }
>     tokens = tokens[1..$];
>   }
> }
> `
> Prints:
>
> Other D
> Apostrophe smart ’
> Other Addario
> Other '
>
> I would have expected:
>
> Other D
> Apostrophe smart ’
> Other Addario
> Apostrophe straight '  <== expected

I tested this locally with different ideas. This definitely looks like a codegen bug.

I was able to reduce it to:

void main()
{
    switch("'"d)
    {
    case "'"d:
        writeln("a");
        break;
    case "’"d:
        writeln("b");
        break;
    default:
        writeln("default");
    }
}

prints "default"

What seems to fix it is removing the case statement for the "smart apostrophe". So I'd look there for where the bug is triggering.

-Steve

November 23, 2016

Re: Switch ignores case (?)

Posted by Chris
in reply to Steven Schveighoffer

Chris

Posted in reply to Steven Schveighoffer

On Wednesday, 23 November 2016 at 17:33:04 UTC, Steven Schveighoffer wrote:

>
> I tested this locally with different ideas. This definitely looks like a codegen bug.
>
> I was able to reduce it to:
>
> void main()
> {
>     switch("'"d)
>     {
>     case "'"d:
>         writeln("a");
>         break;
>     case "’"d:
>         writeln("b");
>         break;
>     default:
>         writeln("default");
>     }
> }
>
> prints "default"
>
> What seems to fix it is removing the case statement for the "smart apostrophe". So I'd look there for where the bug is triggering.
>
> -Steve

Yep, removing one of the two cases works. I tried it with different versions of dmd back to 2.070.0, and it always gives me the same (wrong) result. I haven't tried ldc yet.

November 23, 2016

Re: Switch ignores case (?)

Posted by Steven Schveighoffer
in reply to Chris

Steven Schveighoffer

Posted in reply to Chris

On 11/23/16 1:03 PM, Chris wrote:
> On Wednesday, 23 November 2016 at 17:33:04 UTC, Steven Schveighoffer wrote:
>
>>
>> I tested this locally with different ideas. This definitely looks like
>> a codegen bug.
>>
>> I was able to reduce it to:
>>
>> void main()
>> {
>>     switch("'"d)
>>     {
>>     case "'"d:
>>         writeln("a");
>>         break;
>>     case "’"d:
>>         writeln("b");
>>         break;
>>     default:
>>         writeln("default");
>>     }
>> }
>>
>> prints "default"
>>
>> What seems to fix it is removing the case statement for the "smart
>> apostrophe". So I'd look there for where the bug is triggering.
>>
>
> Yep, removing one of the two cases works. I tried it with different
> versions of dmd back to 2.070.0, and it always gives me the same (wrong)
> result. I haven't tried ldc yet.
>

Please file here: https://issues.dlang.org/enter_bug.cgi

I think this has been there forever. Happens in 2.040 too (the earliest dmd I have on my system).

-Steve

November 23, 2016

Re: Switch ignores case (?)

Posted by Chris
in reply to Steven Schveighoffer

Chris

Posted in reply to Steven Schveighoffer

On Wednesday, 23 November 2016 at 18:34:28 UTC, Steven Schveighoffer wrote:

>
> Please file here: https://issues.dlang.org/enter_bug.cgi
>
> I think this has been there forever. Happens in 2.040 too (the earliest dmd I have on my system).
>
> -Steve

Here it is:

https://issues.dlang.org/show_bug.cgi?id=16739

(I think I marked it as "P1" (default option), maybe it's "P2", didn't know what to choose)

It has something to do with the smart quote, e.g.:

import std.array;
import std.conv;
import std.stdio;

void main()
{
  auto tokens = to!(dchar[][])(["D"d, "’"d, "Addario"d, "'"d, "."d]);
  // Or use this below:
  //~ dstring[] tokens = ["D"d, "’"d, "Addario"d, "'"d];
  while (!tokens.empty)
  {
    switch (tokens[0])
    {
      case "\u2019"d:
        writeln("Apostrophe smart " ~ tokens[0]);
        break;
      case "\u0027"d:
        writeln("Apostrophe straight " ~ tokens[0]);
        break;
      case "D"d:
        writeln("Letter 'D'");
        break;
      case "."d:
        writeln("fullstop " ~ tokens[0]);
        break;
      default:
        writeln("Other " ~ tokens[0]);
        break;
    }
    tokens = tokens[1..$];
  }
}

prints:

Letter 'D'
Other ’  // <== not expected
Other Addario
Apostrophe straight '
fullstop .

Both LDC and DMD produce the same error. I discovered this while writing a tokenizer. It is a bit upsetting, because it is really an essential thing. The workaround for now would be

if (token[0] == "\u2019"d)
 goto Wherever;

which works, by the way.

November 23, 2016

Re: Switch ignores case (?)

Posted by Jonathan M Davis
in reply to Chris

Jonathan M Davis

Posted in reply to Chris

On Wednesday, November 23, 2016 19:07:49 Chris via Digitalmars-d-learn wrote:
> (I think I marked it as "P1" (default option), maybe it's "P2",
> didn't know what to choose)

AFAIK, there are no devs that pay any attention to those. Some attention is paid to regression vs enhancement vs etc. But the priority field really doesn't mean anything for how D bugs get fixed or who fixes what when.

- Jonathan M Davis

November 23, 2016

Re: Switch ignores case (?)

Posted by ketmar
in reply to Chris

ketmar

Posted in reply to Chris

On Wednesday, 23 November 2016 at 19:07:49 UTC, Chris wrote:
> It has something to do with the smart quote, e.g.:

it is wrong binary search in `_d_switch_string()`.

strings for switch are lexically sorted, and compiler calls `_d_switch_string()` to select one. the thing is that comparison in `_d_switch_string()` is done with `memcmp()`. still not clear? ok, let's see how cases are encoded:

body _d_switch_dstring()
       'U0027' (ca)
table[0] = 1, 'U0027'
table[1] = 1, 'U2019'

or, in memory:

table[0] = 1, 0x27, 0x00
table[1] = 1, 0x19, 0x20

so, memcmp for `table[1]` returns... 1! 'cause 0x27 is greater than 0x19. and binsearch is broken from here on. the same is true for `_d_switch_ustring()`, of course.

this can be fixed either by using slow char-by-char comparisons in druntime, or by fixing codegen, so it would sort strings as byte arrays.

November 23, 2016

Re: Switch ignores case (?)

Posted by Chris
in reply to ketmar

Chris

Posted in reply to ketmar

On Wednesday, 23 November 2016 at 19:30:01 UTC, ketmar wrote:
> On Wednesday, 23 November 2016 at 19:07:49 UTC, Chris wrote:
>> It has something to do with the smart quote, e.g.:
>
> it is wrong binary search in `_d_switch_string()`.
>
> strings for switch are lexically sorted, and compiler calls `_d_switch_string()` to select one. the thing is that comparison in `_d_switch_string()` is done with `memcmp()`. still not clear? ok, let's see how cases are encoded:
>
> body _d_switch_dstring()
>        'U0027' (ca)
> table[0] = 1, 'U0027'
> table[1] = 1, 'U2019'
>
> or, in memory:
>
> table[0] = 1, 0x27, 0x00
> table[1] = 1, 0x19, 0x20
>
> so, memcmp for `table[1]` returns... 1! 'cause 0x27 is greater than 0x19. and binsearch is broken from here on. the same is true for `_d_switch_ustring()`, of course.
>
> this can be fixed either by using slow char-by-char comparisons in druntime, or by fixing codegen, so it would sort strings as byte arrays.

Actually, I came across a compiler message that gave me something like \x19\x20 which I found odd. This sure needs fixing. After all, it's quite a basic feature. So it's back to the old `if` again (for now).

November 23, 2016

Re: Switch ignores case (?)

Posted by ketmar
in reply to Chris

ketmar

Posted in reply to Chris

On Wednesday, 23 November 2016 at 20:49:01 UTC, Chris wrote:
> Actually, I came across a compiler message that gave me something like \x19\x20 which I found odd. This sure needs fixing. After all, it's quite a basic feature. So it's back to the old `if` again (for now).

yeah, until this is fixed, `switch` over wstring/dstring can be considered completely broken, and better be avoided. `switch` over normal string is unaffected, of course.

November 23, 2016

Re: Switch ignores case (?)

Posted by ketmar
in reply to Chris

ketmar

Posted in reply to Chris

quickfix:


diff --git a/src/rt/switch_.d b/src/rt/switch_.d
index ec124e3..83572fe 100644
--- a/src/rt/switch_.d
+++ b/src/rt/switch_.d
@@ -27,6 +27,32 @@ private import core.stdc.string;

 extern (C):

+import core.stdc.wchar_ : wchar_t, wmemcmp;
+
+
+static if (wchar_t.sizeof == wchar.sizeof) {
+  alias memcmpw = wmemcmp;
+  int memcmpd (const(void)* s0, const(void)* s1, size_t len) {
+    while (len--) {
+      if (int r = *cast(const(int)*)s0-*cast(const(int)*)s1) return (r < 0 ? -1 : 1);
+      s0 += dchar.sizeof;
+      s1 += dchar.sizeof;
+    }
+    return 0;
+  }
+} else static if (wchar_t.sizeof == dchar.sizeof) {
+  int memcmpw (const(void)* s0, const(void)* s1, size_t len) {
+    while (len--) {
+      if (int r = *cast(const(ushort)*)s0-*cast(const(ushort)*)s1) return (r < 0 ? -1 : 1);
+      s0 += wchar.sizeof;
+      s1 += wchar.sizeof;
+    }
+    return 0;
+  }
+  alias memcmpd = wmemcmp;
+} else static assert(0, "wut?!");
+
+
 int _d_switch_string(char[][] table, char[] ca)
 in
 {
@@ -189,7 +215,7 @@ in
         {
             int c;

-            c = memcmp(table[j - 1].ptr, table[j].ptr, len1 * wchar.sizeof);
+            c = memcmpw(table[j - 1].ptr, table[j].ptr, len1);
             assert(c < 0);  // c==0 means a duplicate
         }
     }
@@ -205,7 +231,7 @@ out (result)
         for (auto i = 0u; i < table.length; i++)
         {
             if (table[i].length == ca.length)
-            {   c = memcmp(table[i].ptr, ca.ptr, ca.length * wchar.sizeof);
+            {   c = memcmpw(table[i].ptr, ca.ptr, ca.length);
                 assert(c != 0);
             }
         }
@@ -218,7 +244,7 @@ out (result)
             assert(i < table.length);
             if (table[i].length == ca.length)
             {
-                c = memcmp(table[i].ptr, ca.ptr, ca.length * wchar.sizeof);
+                c = memcmpw(table[i].ptr, ca.ptr, ca.length);
                 if (c == 0)
                 {
                     assert(i == result);
@@ -253,7 +279,7 @@ body
         auto c = cast(sizediff_t)(ca.length - pca.length);
         if (c == 0)
         {
-            c = memcmp(ca.ptr, pca.ptr, ca.length * wchar.sizeof);
+            c = memcmpw(ca.ptr, pca.ptr, ca.length);
             if (c == 0)
             {   //printf("found %d\n", mid);
                 return cast(int)mid;
@@ -317,7 +343,7 @@ in
         assert(len1 <= len2);
         if (len1 == len2)
         {
-            auto c = memcmp(table[j - 1].ptr, table[j].ptr, len1 * dchar.sizeof);
+            auto c = memcmpd(table[j - 1].ptr, table[j].ptr, len1);
             assert(c < 0);  // c==0 means a duplicate
         }
     }
@@ -331,7 +357,7 @@ out (result)
         for (auto i = 0u; i < table.length; i++)
         {
             if (table[i].length == ca.length)
-            {   auto c = memcmp(table[i].ptr, ca.ptr, ca.length * dchar.sizeof);
+            {   auto c = memcmpd(table[i].ptr, ca.ptr, ca.length);
                 assert(c != 0);
             }
         }
@@ -344,7 +370,7 @@ out (result)
             assert(i < table.length);
             if (table[i].length == ca.length)
             {
-                auto c = memcmp(table[i].ptr, ca.ptr, ca.length * dchar.sizeof);
+                auto c = memcmpd(table[i].ptr, ca.ptr, ca.length);
                 if (c == 0)
                 {
                     assert(i == result);
@@ -379,7 +405,7 @@ body
         auto c = cast(sizediff_t)(ca.length - pca.length);
         if (c == 0)
         {
-            c = memcmp(ca.ptr, pca.ptr, ca.length * dchar.sizeof);
+            c = memcmpd(ca.ptr, pca.ptr, ca.length);
             if (c == 0)
             {   //printf("found %d\n", mid);
                 return cast(int)mid;
--
2.9.2

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation