June 05, 2004
The following is correct behavior, and is implemented correctly. Nice one! The compiler correctly rejects the following line.

>       char c = 'ß';  // compile error - invalid UTF-8 sequence


However, we see a related bug in the following example:

>       char c = 0xC3; // first byte of a UTF-8 sequence
>       wchar w = c;

This auto-promotion should fail, throwing a runtime exception (because 0xC3 by itself is an invalid UTF-8 sequence). Current behavior is that the cast succeeds, as though c had contained an ISO-8859-1 character instead of a UTF-8 fragment.

Arcane Jill



June 05, 2004
"Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:c9s7nu$1255$1@digitaldaemon.com...
> The following is correct behavior, and is implemented correctly. Nice one! The
> compiler correctly rejects the following line.
>
> >       char c = 'ß';  // compile error - invalid UTF-8 sequence
>
>
> However, we see a related bug in the following example:
>
> >       char c = 0xC3; // first byte of a UTF-8 sequence
> >       wchar w = c;
>
> This auto-promotion should fail, throwing a runtime exception (because 0xC3 by
> itself is an invalid UTF-8 sequence). Current behavior is that the cast succeeds,
> as though c had contained an ISO-8859-1 character instead of a UTF-8
> fragment.

I see what you're saying. Doing so would require a runtime test; I'm not sure about the tradeoffs.
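
For a sense of what that runtime test might amount to, here is a rough sketch in C rather than D, so treat it as language-neutral illustration only. It assumes the rule is simply "a single UTF-8 code unit may be promoted only if it is a complete sequence on its own", which is true only for ASCII (0x00 to 0x7F). The name promote_utf8_unit is hypothetical and not part of any real compiler or runtime; where it returns false, the real thing would raise the runtime exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from any real runtime.
 *
 * A single UTF-8 code unit is a complete character only when it is
 * ASCII (0x00..0x7F). Any byte with the high bit set is a lead byte,
 * a continuation byte, or a byte that never occurs in UTF-8, so it
 * cannot be promoted on its own.
 *
 * Returns true and writes the 16-bit value to *out on success.
 * Returns false and leaves *out untouched on failure. */
bool promote_utf8_unit(unsigned char c, uint16_t *out)
{
    if (c >= 0x80) {
        return false;   /* a real runtime would raise the error here */
    }
    *out = (uint16_t)c;
    return true;
}
```

Under that assumption the cost is one compare and one branch per char-to-wchar conversion, so 0x41 would pass and 0xC3 from the example above would be rejected.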