November 22, 2011
char[] + utf-8 + canFind == bug?
I've some problems (again) with UTF-8. Try this code:

char[] chars = ['à','è','ì'];
chars.canFind('è');

It doesn't work:
std.utf.UTFException@std/utf.d(644): Invalid UTF-8 sequence (at index 1)

But this one works:

string[] chars = ["à","è","ì"];
chars.canFind("è");

I'm using dmd/druntime/phobos downloaded from github today.
November 22, 2011
Re: char[] + utf-8 + canFind == bug?
On 11/22/11 10:28 AM, Andrea Fontana wrote:
> I've some problems (again) with UTF-8. Try this code:
>
> char[] chars = ['à','è','ì'];

This will truncate the multi-byte characters. It should be a 
compile-time error.

Andrei
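
To make the truncation concrete, here is a minimal sketch, not code from the thread. It assumes a UTF-8 encoded source file; the helper names isValidUtf8 and truncatedChars are made up for illustration, and the explicit casts stand in for the implicit narrowing the compiler accepted at the time (a newer compiler may reject the original line outright).

```d
import std.utf : validate, UTFException;

// Hypothetical helper (not from the thread): true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

// Hypothetical helper: each literal forced into a single char.
// The casts are an assumption that mimics the implicit truncation.
char[] truncatedChars()
{
    return [cast(char) 'à', cast(char) 'è', cast(char) 'ì'];
}

void main()
{
    // 'è' is U+00E8, which does not fit in a char, so the literal is a wchar.
    static assert(is(typeof('è') == wchar));

    // In UTF-8 the same code point takes two code units: C3 A8.
    string s = "è";
    assert(s.length == 2 && s[0] == 0xC3 && s[1] == 0xA8);

    // Truncation keeps only the low byte 0xE8, which is not valid on its own.
    assert(truncatedChars()[1] == 0xE8);
    assert(!isValidUtf8(truncatedChars()));

    // The string literal is encoded properly and validates.
    assert(isValidUtf8("àèì"));
}
```

The array ends up holding three stray bytes rather than three encoded characters, which is why decoding it later throws a UTFException.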
November 22, 2011
Re: char[] + utf-8 + canFind == bug?
I guess I should use wchar instead of char. :)

On Tue, 22/11/2011 at 10:31 -0600, Andrei Alexandrescu wrote:

> On 11/22/11 10:28 AM, Andrea Fontana wrote:
> > I've some problems (again) with UTF-8. Try this code:
> >
> > char[] chars = ['à','è','ì'];
> 
> This will truncate the multi-byte characters. It should be a 
> compile-time error.
> 
> Andrei
November 22, 2011
Re: char[] + utf-8 + canFind == bug?
On Tuesday, November 22, 2011 17:38:36 Andrea Fontana wrote:
> I guess I should use wchar instead of char. :)

Individual characters really should be processed as dchars in the general 
case. There's a simple solution here though:

char[] chars = "àèì";

- Jonathan M Davis
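
A small sketch of both suggestions, assuming a UTF-8 source file. The helper hasCodePoint is a made-up name for illustration only. The needle is declared as a dchar so the comparison is always code point against code point.

```d
import std.algorithm : canFind;

// Hypothetical helper (not from the thread): search text for one code point.
bool hasCodePoint(const(char)[] text, dchar needle)
{
    // Phobos iterates a char array by decoded code point here, not by byte.
    return text.canFind(needle);
}

void main()
{
    dchar needle = 'è';

    // One dchar per character: no multi-byte encoding to get wrong.
    dchar[] chars = ['à', 'è', 'ì'];
    assert(chars.canFind(needle));

    // A string literal is valid UTF-8, so searching it works as well.
    string text = "àèì";
    assert(hasCodePoint(text, needle));
    assert(!hasCodePoint(text, 'ò'));
}
```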
November 22, 2011
Re: char[] + utf-8 + canFind == bug?
dchar works, but the simple solution doesn't.

code:

char[] chars = "òà";
chars.canFind('à');

It says:

Error: cannot implicitly convert expression ("\xc3\xb2\xc3\xa0") of type
string to char[]
Error: template std.algorithm.canFind(alias pred = "a == b",Range,V) if
(is(typeof(find!(pred)(range,value)))) does not match any function
template declaration
Error: template std.algorithm.canFind(alias pred = "a == b",Range,V) if
(is(typeof(find!(pred)(range,value)))) cannot deduce template function
from argument types !()(char[],wchar)

On Tue, 22/11/2011 at 08:49 -0800, Jonathan M Davis wrote:

> On Tuesday, November 22, 2011 17:38:36 Andrea Fontana wrote:
> > I guess I should use wchar instead of char. :)
> 
> Individual characters really should be processed as dchars in the general 
> case. There's a simple solution here though:
> 
> char[] chars = "àèì";
> 
> - Jonathan M Davis
November 22, 2011
Re: char[] + utf-8 + canFind == bug?
On 11/22/11 11:02 AM, Andrea Fontana wrote:
>
> char[] chars = "òà";

Use string/auto here, or .dup on the string.

I filed http://d.puremagic.com/issues/show_bug.cgi?id=6988 on your 
behalf. Thanks for sharing!


Andrei
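
For reference, a minimal sketch of the two options, assuming a UTF-8 source file. The helper name mutableCopy is hypothetical and only wraps .dup to make the conversion explicit.

```d
import std.algorithm : canFind;

// Hypothetical helper (not from the thread): mutable copy of a string.
char[] mutableCopy(string s)
{
    return s.dup;
}

void main()
{
    dchar needle = 'à';

    // Option 1: let the literal keep its own type, immutable(char)[].
    auto s = "òà";
    static assert(is(typeof(s) == string));
    assert(s.canFind(needle));

    // Option 2: copy the literal when a mutable char[] is really needed.
    char[] chars = mutableCopy("òà");
    assert(chars.length == 4); // two characters, two UTF-8 code units each
    assert(chars.canFind(needle));
}
```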
November 22, 2011
Re: char[] + utf-8 + canFind == bug?
On 2011-11-22 18:14, Andrei Alexandrescu wrote:
> On 11/22/11 11:02 AM, Andrea Fontana wrote:
>>
>> char[] chars = "òà";
>
> Use string/auto here, or .dup on the string.
>
> I filed http://d.puremagic.com/issues/show_bug.cgi?id=6988 on your
> behalf. Thanks for sharing!
>
>
> Andrei

Hasn't this already been reported?

-- 
/Jacob Carlborg
November 22, 2011
Re: char[] + utf-8 + canFind == bug?
On Tuesday, November 22, 2011 18:02:53 Andrea Fontana wrote:
> dchar works but simple solution doesn't.
> 
> code:
> 
> char[] chars = "òà";
> chars.canFind('à');
> 
> It says:
> 
> Error: cannot implicitly convert expression ("\xc3\xb2\xc3\xa0") of type
> string to char[]
> Error: template std.algorithm.canFind(alias pred = "a == b",Range,V) if
> (is(typeof(find!(pred)(range,value)))) does not match any function
> template declaration
> Error: template std.algorithm.canFind(alias pred = "a == b",Range,V) if
> (is(typeof(find!(pred)(range,value)))) cannot deduce template function
> from argument types !()(char[],wchar)

Ah. Yes. String literals are immutable (at least in Linux). So, you'd need to 
dup it if you want a mutable char[] instead of a string. The normal case is to 
use a string though, so unless you actually want to mutate the characters in 
the array (which is frequently an iffy thing to do with char[], since you have 
to worry about not screwing up the code points), you should use string.

- Jonathan M Davis
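
A short sketch of why mutating a char[] in place is risky, assuming a UTF-8 source file. The helper names isValidUtf8 and overwriteFirstUnit are hypothetical. Overwriting a single code unit of a two-unit character leaves its second unit dangling.

```d
import std.utf : validate, UTFException;

// Hypothetical helper (not from the thread): true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

// Hypothetical helper: copy s and overwrite its first code unit with c.
char[] overwriteFirstUnit(string s, char c)
{
    char[] copy = s.dup;
    copy[0] = c;
    return copy;
}

void main()
{
    // "òà" is C3 B2 C3 A0 in UTF-8: four code units for two characters.
    assert(isValidUtf8("òà"));

    // Replacing only the lead byte C3 leaves the continuation byte B2 orphaned.
    char[] broken = overwriteFirstUnit("òà", 'o');
    assert(!isValidUtf8(broken));

    // The same edit on ASCII text is harmless: one code unit per character.
    assert(isValidUtf8(overwriteFirstUnit("ab", 'o')));
}
```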