Posted by Ruslan Nikolaev
Just one more addition: it is possible to have a built-in function that converts a multibyte (or multiword) char sequence (even though in my proposal the char can be of a different size) to a dchar (UTF-32) character. Again, my only point is that it would be nice to have something similar to TCHAR, so that all libraries can use it if they choose not to provide functions for all three types.
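To make the idea concrete, here is a rough sketch rather than a finished design. The alias name `tchar` and the helper `toCodePoints` are hypothetical (D does not define them); the only library call used is `std.utf.decode`, which reads one code point and advances the index.

```d
import std.utf : decode;

// Hypothetical platform-dependent character type, similar in spirit to TCHAR.
// The name `tchar` is an assumption for illustration; D has no such type.
version (Windows)
    alias tchar = wchar; // UTF-16 code units
else
    alias tchar = char;  // UTF-8 code units

// Hypothetical helper: decode a sequence of tchar code units into
// UTF-32 code points, whatever the width of tchar happens to be.
dchar[] toCodePoints(const(tchar)[] s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
        result ~= decode(s, i); // decode advances i past one whole code point
    return result;
}

void main()
{
    import std.stdio : writeln;

    // An untyped string literal converts implicitly to either width.
    immutable(tchar)[] text = "h\u00E9llo";

    writeln("code units:  ", text.length);               // depends on tchar
    writeln("code points: ", toCodePoints(text).length); // same on every platform
}
```

Library code written against `tchar` would only need this one conversion path when it has to look at individual characters.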
Yes, programmers often ignore surrogate pairs in UTF-16. But with an undetermined char size (1 or 2 bytes), they will have to use the special built-in conversion functions to dchar unless they want their code to be completely broken.
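A small illustration of why ignoring surrogate pairs breaks things, written as a sketch: the helper `countCodePoints` is a hypothetical name, while `foreach` over a string with a `dchar` loop variable and `std.utf.decode` are existing D features.

```d
import std.utf : decode;

// Hypothetical helper: count code points by letting foreach decode to dchar.
size_t countCodePoints(S)(S s)
{
    size_t n = 0;
    foreach (dchar c; s) // decodes UTF-8 or UTF-16 on the fly
        ++n;
    return n;
}

void main()
{
    import std.stdio : writeln;

    // U+1D11E (musical symbol G clef) lies outside the Basic Multilingual Plane.
    wstring w = "\U0001D11E"w;
    string  c = "\U0001D11E";

    writeln(w.length);           // 2: a UTF-16 surrogate pair
    writeln(c.length);           // 4: four UTF-8 code units
    writeln(countCodePoints(w)); // 1
    writeln(countCodePoints(c)); // 1

    // Decoding consumes the whole pair and yields the single code point.
    size_t i = 0;
    dchar d = decode(w, i);
    assert(d == 0x1D11E && i == 2);
}
```

Code that treats `w[0]` as a character gets half of a pair here, which is the kind of bug a mandatory conversion to dchar would rule out.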
--- On Tue, 6/8/10, Ruslan Nikolaev <firstname.lastname@example.org> wrote:
> From: Ruslan Nikolaev <email@example.com>
> Subject: Re: Wide characters support in D
> To: "digitalmars.D" <firstname.lastname@example.org>
> Date: Tuesday, June 8, 2010, 3:16 AM
> Ok, ok... that was just a suggestion... Thanks for the reply about the "Hello world" representation. Were the postfixes "w" and "d" added initially or just recently? I did not know about them. I thought D does automatic conversion for string literals.
> Yes, templates may help. However, that unnecessarily makes the code bigger (since we have to compile it for every char type). The other problem is that it allows the programmer to choose which one to use. He or she may just prefer char as UTF-8 (or wchar as UTF-16). That will be fine on a platform that supports this encoding natively (e.g. for file system operations, screen output, etc.), whereas it will cause conversion overhead on the others. It is not a big overhead, but it is an unnecessary one. Having said this, I do agree that there must be some flexibility (e.g. in Java char is always 2 bytes); however, I don't believe that this flexibility should be available to the application programmer.
> I don't think there is any problem with having different sizes of char. In fact, that would make programs better (since application programmers will have to think in terms of characters as opposed to bytes). System programmers (i.e. OS programmers) may choose to think as they expect it to be (since a char width option can be added to the compiler). TCHAR in Windows is a good example of this. Whenever you need to determine the size of an element (e.g. for allocation), you can use 'sizeof'. Again, it does not mean that you're deprived of the char/wchar/dchar capability. It can still be supported (e.g. via ubyte/ushort/uint) for the sake of interoperability or some special cases. Special string constants (e.g. ""b, ""w, ""d) can be supported, too. My only point is that it would be good to have a universal char type that depends on the platform. That, in turn, allows a unified char for all libraries on this platform.
> In addition, the commonly used constants '\n', '\r', '\t' will be the same regardless of char width.
> Anyway, that was just a suggestion. You may disagree with this if you wish.