Thread overview
std.string.toupper/tolower failed with mixture of English and Chinese characters
Nov 22, 2005
Shawn Liu
Nov 22, 2005
Kris
Nov 26, 2005
Thomas Kuehne
Nov 22, 2005
Derek Parnell
November 22, 2005
std.string.toupper() and std.string.tolower() give wrong results when dealing with
a mixture of upper/lower case English and Chinese characters, e.g.
char[] a = "AbCdÖÐeFgH";
char[] b = std.string.toupper(a);
char[] c = std.string.tolower(a);

The length of a is 11, but the length of both b and c is now 18.


November 22, 2005
"Shawn Liu" <Shawn_member@pathlink.com> wrote...
> std.string.toupper() and std.string.tolower() give a wrong result when
> deal with
> a mixture of upper/lower English and Chinese characters. e.g.
> char[] a = "AbCdÖÐeFgH";
> char[] b = std.string.toupper(a);
> char[] c = std.string.tolower(a);
>
> The length of a is 11, but the length of b,c is 18 now.

Phobos doesn't support non-ASCII conversions/comparisons at this time?


November 22, 2005
On Tue, 22 Nov 2005 02:19:50 +0000 (UTC), Shawn Liu wrote:

> std.string.toupper() and std.string.tolower() give a wrong result when deal with
> a mixture of upper/lower English and Chinese characters. e.g.
> char[] a = "AbCdÖÐeFgH";
> char[] b = std.string.toupper(a);
> char[] c = std.string.tolower(a);
> 
> The length of a is 11, but the length of b,c is 18 now.

If it isn't ASCII then DMD doesn't want to know about it. Try the Mango library for its ICU bindings; I think that might have it.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
22/11/2005 1:33:24 PM


November 26, 2005
[follow up set to: digitalmars.D.bugs]

Kris wrote on 2005-11-22:
> "Shawn Liu" <Shawn_member@pathlink.com> wrote...
>> std.string.toupper() and std.string.tolower() give a wrong result when
>> deal with
>> a mixture of upper/lower English and Chinese characters. e.g.
>> char[] a = "AbCdÖÐeFgH";
>> char[] b = std.string.toupper(a);
>> char[] c = std.string.tolower(a);
>>
>> The length of a is 11, but the length of b,c is 18 now.
>
> Phobos doesn't supports non-ascii conversions/comparisons at this time?
>

Phobos does, at least for the simple conversions. Whichever cases are handled, data that isn't handled shouldn't get corrupted.

The attached zipped string.d fixes toupper/tolower and extends the unittests. (Yes, I know, it isn't the fastest possible algorithm ...)

Thomas