July 09, 2004 UTF-32 bug

It seems incredible, to me, that toUTF8(dchar[]) can ever /return/ invalid UTF-8. But it does, when given invalid input! The following code

# import std.utf;
#
# dchar[] s = [ 0xD800 ]; // Invalid UTF-32
#
# int main()
# {
#     printf("1\n");
#     char[] t = toUTF8(s);
#     printf("2\n");
#     dchar[] u = toUTF32(t);
#     printf("3\n");
#     return 0;
# }

compiles successfully. Output is

# 1
# 2
# Error: invalid UTF-8 sequence

The problem is that the output SHOULD be...

# 1
# Error: invalid UTF-32 sequence

Fortunately, the fix is very simple. All you have to do is modify toUTF8(dchar[]) to verify that every dchar in the input returns true from std.utf.isValidDchar(). (Observe that isValidDchar(0xD800) correctly returns false.)

Arcane Jill

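The proposed check can be sketched as a wrapper around the library call. This is only an illustration, written against present-day D2 Phobos rather than the 2004 library: `checkedToUTF8` is a hypothetical name, not a Phobos function, and the choice of a plain `Exception` with this message is an assumption. Only `std.utf.isValidDchar` and `std.utf.toUTF8` are taken from the library.

```d
import std.utf : isValidDchar, toUTF8;

/// Hypothetical wrapper (not part of Phobos): reject invalid UTF-32
/// up front, so that no UTF-8 is ever produced from bad input.
string checkedToUTF8(const(dchar)[] s)
{
    foreach (dchar c; s)
    {
        // isValidDchar is false for surrogates (0xD800..0xDFFF)
        // and for values above 0x10FFFF.
        if (!isValidDchar(c))
            throw new Exception("invalid UTF-32 sequence");
    }
    // Every code point passed the check, so encode as usual.
    return toUTF8(s);
}
```

With a check like this in place, the failure in the example above would be reported at the toUTF8 step, before "2" is printed, instead of surfacing later in toUTF32.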
Copyright © 1999-2021 by the D Language Foundation