June 05, 2004 UTF-8 bug
The following is correct behavior, and is implemented correctly. Nice one! The compiler correctly rejects the following line.

> char c = 'ß'; // compile error - invalid UTF-8 sequence

However, we see a related bug in the following example:

> char c = 0xC3; // first byte of a UTF-8 sequence
> wchar w = c;

This auto-promotion should fail, throwing a runtime exception (because 0xC3 by itself is an invalid UTF-8 sequence). Current behavior is that the cast succeeds, as though c had contained an ISO-8859-1 character instead of a UTF-8 fragment.

Arcane Jill
June 05, 2004 Re: UTF-8 bug
Posted in reply to Arcane Jill

"Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:c9s7nu$1255$1@digitaldaemon.com...
> The following is correct behavior, and is implemented correctly. Nice one! The compiler correctly rejects the following line.
>
> > char c = 'ß'; // compile error - invalid UTF-8 sequence
>
> However, we see a related bug in the following example:
>
> > char c = 0xC3; // first byte of a UTF-8 sequence
> > wchar w = c;
>
> This auto-promotion should fail, throwing a runtime exception (because 0xC3 by itself is an invalid UTF-8 sequence). Current behavior is that the cast succeeds, as though c had contained an ISO-8859-1 character instead of a UTF-8 fragment.

I see what you're saying. Doing so would require a runtime test; not sure about the tradeoffs.
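
For illustration, the runtime test discussed above might look something like the sketch below. `promoteChecked` is a hypothetical helper, not part of D or Phobos. It assumes the desired rule is that only code units 0x00 to 0x7F may be widened on their own, and it borrows `UTFException` from `std.utf` as the error type, which is also an assumption about what "a runtime exception" should mean here.

```d
import std.utf : UTFException;

/// Hypothetical helper (not part of Phobos): widen a single UTF-8 code unit
/// to a wchar, rejecting bytes that cannot stand alone as a code point.
wchar promoteChecked(char c)
{
    // Only 0x00..0x7F encode a complete code point in one UTF-8 code unit.
    // Anything above that is a lead or continuation byte (or invalid).
    if (c > 0x7F)
        throw new UTFException("lone UTF-8 code unit is not a complete sequence");
    return c; // implicit char -> wchar widening, value-preserving for ASCII
}

void main()
{
    import std.exception : assertThrown;

    // A plain ASCII code unit passes the check and keeps its value.
    char ok = 'A';
    assert(promoteChecked(ok) == 'A');

    // 0xC3 is the lead byte of a two-byte sequence, so the check rejects it.
    char fragment = 0xC3;
    assertThrown!UTFException(promoteChecked(fragment));

    // The unchecked promotion from the post copies the numeric value as-is,
    // which is what makes it look like an ISO-8859-1 conversion.
    wchar w = fragment;
    assert(w == 0x00C3);
}
```

The check itself is a single comparison and branch per promotion, so the tradeoff is that small cost on every `char` to `wchar` conversion versus silently accepting fragments.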
Copyright © 1999-2021 by the D Language Foundation