Weird error on char literal outside UTF-16 or UTF-32 range
Aug 10, 2004
Stewart Gordon
Aug 10, 2004
Arcane Jill
Aug 10, 2004
Walter
August 10, 2004
dchar qwert = '\U00110000';
----------
D:\My Documents\Programming\D\Tests\bugs\utf32overflow.d(1): invalid UTF character \U08x
----------

I don't know if it's intended behaviour to reject UTF-32 codes that are outside the range that's valid so far.  But that error message doesn't exactly make sense.

It's the exact same error for any value above '\U0010FFFF' AFAICT, and also for the 'permanently unassigned' codes ('\U0000FFFF', '\U0000FFFE', '\uFFFF', '\uFFFE')....
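The rule observed here can be written as a small predicate. This is a minimal sketch based only on the values reported in this post (anything above U+10FFFF, plus U+FFFE and U+FFFF). The function name `literalAccepted` is hypothetical, and the real compiler check may differ, for example by also rejecting surrogates, which this sketch does not test.

```d
import std.stdio : writefln;

// Hypothetical predicate reproducing only the behaviour reported above:
// a \U or \u escape is rejected if it lies past the last code point,
// or if it is one of the noncharacters U+FFFE / U+FFFF.
bool literalAccepted(uint cp)
{
    if (cp > 0x10FFFF)
        return false;               // beyond the Unicode code space
    if (cp == 0xFFFE || cp == 0xFFFF)
        return false;               // permanently reserved noncharacters
    return true;
}

void main()
{
    foreach (uint cp; [0x0041u, 0xFFFEu, 0xFFFFu, 0x10FFFFu, 0x110000u])
        writefln("U+%04X -> %s", cp, literalAccepted(cp) ? "accepted" : "rejected");
}
```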

Stewart.

August 10, 2004
In article <cfa6p8$1i5a$1@digitaldaemon.com>, Stewart Gordon says...
>
>dchar qwert = '\U00110000';
>----------
>D:\My Documents\Programming\D\Tests\bugs\utf32overflow.d(1): invalid UTF character \U08x
>----------

I'll leave Walter to comment on that error message.



>I don't know if it's intended behaviour to reject UTF-32 codes that are outside the range that's valid so far.

Yes and no. As I understand it, it goes like this:

#    dchar qwert = 0x00110000;    // should succeed
#    dchar qwert = '\U00110000';  // should fail

It's only because you put it inside a character literal that you got problems - and I think that's reasonable, because (as you know), there is no such character as U+110000, but there /is/ such a number as 0x110000.
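The distinction between a number and a character can be seen with Phobos's `std.utf.isValidDchar`. This is a minimal sketch written against present-day D rather than the 2004 compiler: a `dchar` variable holds 0x110000 once it is cast in, as in Walter's reply below, but the validity check reports that it is not a character. In current Phobos, `isValidDchar` rejects values above U+10FFFF and surrogates.

```d
import std.stdio : writefln;
import std.utf : isValidDchar;

void main()
{
    // The 32-bit dchar type can store the number 0x110000...
    dchar qwert = cast(dchar) 0x00110000;

    // ...but it is not a Unicode character, so validation rejects it.
    writefln("0x%X is a valid character: %s", cast(uint) qwert, isValidDchar(qwert));

    // The last code point, U+10FFFF, passes.
    dchar last = cast(dchar) 0x10FFFF;
    writefln("0x%X is a valid character: %s", cast(uint) last, isValidDchar(last));
}
```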

There are some fancy esoteric reasons why you might want to store noncharacters in a dchar, but only if you /really/ know what you're doing - and in such circumstances you would never pass such a value to a UTF conversion function, because you /know/ it's going to fail to validate.
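One such reason might be a sentinel: a reader can return a value that no real character can take to signal end of input. This is a minimal sketch in present-day D; the names `EOI`, `nextChar` and `toUtf8` are hypothetical. The point is that the sentinel is tested for before anything is handed to a UTF function such as `std.utf.encode`, which would reject it.

```d
import std.stdio : writeln;
import std.utf : encode;

// Hypothetical end-of-input marker: deliberately outside the Unicode range,
// so it can never collide with a real character.
enum dchar EOI = cast(dchar) 0x110000;

// Hypothetical reader: returns the next character, or EOI when exhausted.
dchar nextChar(const(dchar)[] input, ref size_t pos)
{
    return pos < input.length ? input[pos++] : EOI;
}

// Re-encode the input as UTF-8, stopping at the sentinel.
string toUtf8(const(dchar)[] input)
{
    string result;
    size_t pos = 0;
    for (dchar c = nextChar(input, pos); c != EOI; c = nextChar(input, pos))
    {
        char[4] buf;
        immutable n = encode(buf, c);   // the sentinel never reaches encode
        result ~= buf[0 .. n].idup;
    }
    return result;
}

void main()
{
    writeln(toUtf8("héllo"d));
}
```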


>But that error message doesn't exactly make sense.

I can't argue with that.

Arcane Jill


August 10, 2004
"Stewart Gordon" <smjg_1998@yahoo.com> wrote in message news:cfa6p8$1i5a$1@digitaldaemon.com...
> dchar qwert = '\U00110000';
> ----------
> D:\My Documents\Programming\D\Tests\bugs\utf32overflow.d(1): invalid UTF character \U08x
> ----------
>
> I don't know if it's intended behaviour to reject UTF-32 codes that are outside the range that's valid so far.  But that error message doesn't exactly make sense.

That's 'cuz the format is supposed to be \\U%08x, not \\U08x <g>
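Without the `%`, the code point is never substituted and `08x` is printed literally. Here is the intended format string in D's `std.format.format`, as a small sketch. The message text is copied from the compiler output above; the helper name `utfError` is hypothetical.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper building the diagnostic from the thread.
// "%08x" pads the code point to eight hex digits, matching the \U escape form.
string utfError(dchar c)
{
    return format("invalid UTF character \\U%08x", cast(uint) c);
}

void main()
{
    writeln(utfError(cast(dchar) 0x00110000)); // invalid UTF character \U00110000
}
```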

> It's the exact same error for any value above '\U0010FFFF' AFAICT, and also for the 'permanently unassigned' codes ('\U0000FFFF', '\U0000FFFE', '\uFFFF', '\uFFFE')....

If you want to use invalid UTF characters, you'll need to do it explicitly:

    dchar qwert = cast(dchar)0x00110000;

Also, all the Phobos library functions that deal with UTF strings are only defined to work with valid UTF characters.
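A sketch of what that means in practice with present-day Phobos: `std.utf.encode` accepts a valid `dchar` but throws a `UTFException` for an out-of-range value built with a cast. The wrapper `tryEncode` is hypothetical; checking with `std.utf.isValidDchar` first is an alternative to catching the exception.

```d
import std.stdio : writeln;
import std.utf : encode, UTFException;

// Hypothetical wrapper: returns the UTF-8 bytes of c, or null if Phobos rejects it.
char[] tryEncode(dchar c)
{
    char[4] buf;
    try
    {
        immutable n = encode(buf, c);
        return buf[0 .. n].dup;   // copy out of the local buffer
    }
    catch (UTFException e)
    {
        return null;   // e.g. cast(dchar) 0x00110000 is not a character
    }
}

void main()
{
    writeln(tryEncode('A'));                      // A
    writeln(tryEncode(cast(dchar) 0x00110000) is null);
}
```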