Thread overview
Auto-casting in range based functions?
May 13, 2012
Andrew Stanton
May 13, 2012
Artur Skawina
May 13, 2012
Jonathan M Davis
May 13, 2012
I have been playing around with D as a scripting tool and have been running into the following issue:

-----------------------------------
import std.algorithm;

struct Delim {
    char delim;
    this(char d) {
        delim = d;
    }
}

void main() {
    char[] d = ['a', 'b', 'c'];
    auto delims = map!Delim(d);
}

/*
Compiling gives me the following error:
/usr/include/d/dmd/phobos/std/algorithm.d(382): Error: constructor test.Delim.this (char d) is not callable using argument types (dchar)
/usr/include/d/dmd/phobos/std/algorithm.d(382): Error: cannot implicitly convert expression ('\U0000ffff') of type dchar to char

*/

-----------------------------------

As someone who most of the time doesn't need to handle unicode, is there a way I can convince these functions to not upcast char to dchar?  I can't think of a way to make the code more explicit in its typing.
May 13, 2012
On 05/13/12 19:49, Andrew Stanton wrote:
> I have been playing around with D as a scripting tool and have been running into the following issue:
> 
> -----------------------------------
> import std.algorithm;
> 
> struct Delim {
>     char delim;
>     this(char d) {
>         delim = d;
>     }
> }
> 
> void main() {
>     char[] d = ['a', 'b', 'c'];
>     auto delims = map!Delim(d);
> }
> 
> /*
> Compiling gives me the following error:
> /usr/include/d/dmd/phobos/std/algorithm.d(382): Error: constructor test.Delim.this (char d) is not callable using argument types (dchar)
> /usr/include/d/dmd/phobos/std/algorithm.d(382): Error: cannot implicitly convert expression ('\U0000ffff') of type dchar to char
> 
> */
> 
> -----------------------------------
> 
> As someone who most of the time doesn't need to handle unicode, is there a way I can convince these functions to not upcast char to dchar?  I can't think of a way to make the code more explicit in its typing.

Well, if you don't want/need utf8 at all:

   alias ubyte ascii;

   int main() {
       ascii[] d = ['a', 'b', 'c'];
       auto delims = map!Delim(d);
       //...

and if you want to avoid utf8 just for this case (i.e. you "know" 'd[]' contains only ascii), something like this should work:

    char[] d = ['a', 'b', 'c'];
    auto delims = map!((c){assert(c<128); return Delim(cast(char)c);})(d);

(it's probably more efficient when written as

    auto delims = map!Delim(cast(ascii[])d);

but you lose the safety checks)
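
For comparison, here is a minimal, self-contained sketch of the two variants above side by side. The helper names `checkedDelims` and `rawDelims` are made up for illustration and are not from the original post; `Delim` is the struct from the question. Note that `assert` is compiled out with `-release`, so the "checked" variant only checks in non-release builds.

```d
import std.algorithm : map, equal;

struct Delim {
    char delim;
    this(char d) { delim = d; }
}

alias ascii = ubyte;

// Checked variant (hypothetical helper): the range hands out dchar,
// we assert it is ASCII and narrow it back to char.
auto checkedDelims(char[] d) {
    return d.map!((dchar c) {
        assert(c < 128, "non-ASCII code point");
        return Delim(cast(char) c);
    });
}

// Unchecked variant (hypothetical helper): reinterpret the bytes as
// ubyte[], so nothing is decoded and nothing is validated.
auto rawDelims(char[] d) {
    return (cast(ascii[]) d).map!(b => Delim(cast(char) b));
}

void main() {
    char[] d = ['a', 'b', 'c'];
    // For pure-ASCII input both variants produce the same elements.
    assert(equal(checkedDelims(d), rawDelims(d)));
}
```

For input that is not pure ASCII the two differ: the checked version trips the assert, while the raw version silently yields one Delim per UTF-8 code unit.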

artur
May 13, 2012
On Sunday, May 13, 2012 19:49:00 Andrew Stanton wrote:
> I have been playing around with D as a scripting tool and have been running into the following issue:
> 
> -----------------------------------
> import std.algorithm;
> 
> struct Delim {
>      char delim;
>      this(char d) {
>          delim = d;
>      }
> }
> 
> void main() {
>      char[] d = ['a', 'b', 'c'];
>      auto delims = map!Delim(d);
> }
> 
> /*
> Compiling gives me the following error:
> /usr/include/d/dmd/phobos/std/algorithm.d(382): Error:
> constructor test.Delim.this (char d) is not callable using
> argument types (dchar)
> /usr/include/d/dmd/phobos/std/algorithm.d(382): Error: cannot
> implicitly convert expression ('\U0000ffff') of type dchar to char
> 
> */
> 
> -----------------------------------
> 
> As someone who most of the time doesn't need to handle unicode, is there a way I can convince these functions to not upcast char to dchar?  I can't think of a way to make the code more explicit in its typing.

_All_ string types are considered ranges of dchar and treated as such. That means that narrow strings (e.g. arrays of char or wchar) are not random-access ranges and have no length property as far as range-based functions are concerned. So, you can _never_ have char[] treated as a range of char by any Phobos functions. char[] is UTF-8 by definition, and range-based functions in Phobos operate on code points, not code units.
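
The claims above can be checked at compile time with the range traits from std.range. This is only a sketch of how to inspect the behavior, assuming a Phobos version in which narrow strings are still auto-decoded (true at the time of this thread):

```d
import std.range : ElementType, hasLength, isRandomAccessRange, front;

// As ranges, narrow strings yield dchar (code points), not code units.
static assert(is(ElementType!(char[]) == dchar));
static assert(is(ElementType!(wchar[]) == dchar));

// ...and they are neither random-access nor "have length" as ranges.
static assert(!hasLength!(char[]));
static assert(!isRandomAccessRange!(char[]));

// ubyte[] gets no such special treatment.
static assert(is(ElementType!(ubyte[]) == ubyte));
static assert(hasLength!(ubyte[]));
static assert(isRandomAccessRange!(ubyte[]));

void main() {
    char[] d = ['a', 'b', 'c'];
    // The range primitive front decodes and returns a dchar...
    static assert(is(typeof(d.front) == dchar));
    // ...while the array itself still has its normal .length.
    assert(d.length == 3);
}
```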

If you want a char[] to be treated as a range of char, then you're going to have to use ubyte[] instead. e.g.

char[] d = ['a', 'b', 'c'];
auto delims = map!Delim(cast(ubyte[])d);
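
As an aside, later Phobos releases (newer than the one this thread was written against, so treat this as an assumption about your installed version) provide std.utf.byCodeUnit, which wraps a string in a range of code units without a cast. A minimal sketch using the Delim struct from the question:

```d
import std.algorithm : map, equal;
import std.range : ElementType;
import std.utf : byCodeUnit; // assumes a Phobos recent enough to have it

struct Delim {
    char delim;
    this(char d) { delim = d; }
}

void main() {
    char[] d = ['a', 'b', 'c'];

    // A range over the individual chars; no decoding to dchar happens.
    auto units = d.byCodeUnit;
    static assert(is(ElementType!(typeof(units)) : char));

    auto delims = units.map!(c => Delim(c));
    assert(equal(delims, [Delim('a'), Delim('b'), Delim('c')]));
}
```

Like the ubyte[] cast, this gives you one element per code unit, so a multi-byte character would show up as several Delim values.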

Now, personally, I would argue that you should just use dchar, not char, because regardless of what you are or aren't doing with unicode right now, the odds are that you'll end up processing unicode at some point, and if you're in the habit of using char, you're going to get all kinds of bugs. So, if you just did

struct Delim
{
    dchar delim;

    this(dchar d)
    {
        delim = d;
    }
}

void main()
{
    char[] d = ['a', 'b', 'c'];
    auto delims = map!Delim(d);
}

then it should work just fine. And if you really need a char instead of dchar for some reason, you can always just use std.conv.to - to!char(value) - which will then throw if you're trying to convert a code point that won't fit in a char.
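
A small sketch of the to!char route, in case it helps. The helper name `narrow` is made up for illustration. One caveat, as far as I can tell from the implementation: the check is a numeric range check against char.max (0xFF), so code points between 0x80 and 0xFF pass even though they are not valid single-byte UTF-8.

```d
import std.conv : to, ConvException;

// Hypothetical helper: narrow a code point to a char, throwing a
// ConvException (or a subclass) if the value does not fit.
char narrow(dchar c) {
    return to!char(c);
}

void main() {
    // ASCII fits in a char.
    assert(narrow('a') == 'a');

    // U+20AC (the euro sign) does not fit, so the conversion throws.
    bool threw = false;
    try {
        narrow('\u20AC');
    } catch (ConvException e) {
        threw = true;
    }
    assert(threw);
}
```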

In general, any code which declares a standalone variable of type char or wchar, rather than using those types as elements of an array, is a red flag which indicates a likely bug or bad design. In specific circumstances, you may need to do so, but in general, it's just asking for bugs. And you're going to have to be fighting Phobos all the time if you try to use ranges of code units rather than ranges of code points.

- Jonathan M Davis