std.string.toupper/tolower failed with a mixture of English and Chinese characters

Shawn Liu (Nov 22, 2005)
Kris (Nov 22, 2005)
Thomas Kuehne (Nov 26, 2005)
Derek Parnell (Nov 22, 2005)

std.string.toupper() and std.string.tolower() give a wrong result when deal with a mixture of upper/lower English and Chinese characters. e.g. char[] a = "AbCdÖÐeFgH"; char[] b = std.string.toupper(a); char[] c = std.string.tolower(a); The length of a is 11, but the length of b,c is 18 now.

"Shawn Liu" <Shawn_member@pathlink.com> wrote... > std.string.toupper() and std.string.tolower() give a wrong result when > deal with > a mixture of upper/lower English and Chinese characters. e.g. > char[] a = "AbCdÖÐeFgH"; > char[] b = std.string.toupper(a); > char[] c = std.string.tolower(a); > > The length of a is 11, but the length of b,c is 18 now. Phobos doesn't supports non-ascii conversions/comparisons at this time?

On Tue, 22 Nov 2005 02:19:50 +0000 (UTC), Shawn Liu wrote: > std.string.toupper() and std.string.tolower() give a wrong result when deal with > a mixture of upper/lower English and Chinese characters. e.g. > char[] a = "AbCdÖÐeFgH"; > char[] b = std.string.toupper(a); > char[] c = std.string.tolower(a); > > The length of a is 11, but the length of b,c is 18 now. If it isn't ASCII then DMD doesn't want to know about it. Try the Mango library for its ICU bindings, I think that might have it. -- Derek (skype: derek.j.parnell) Melbourne, Australia 22/11/2005 1:33:24 PM

November 26, 2005

Re: std.string.toupper/tolower failed with a mixture of English and Chinese characters

Posted by Thomas Kuehne in reply to Kris

[follow up set to: digitalmars.D.bugs]

Kris wrote on 2005-11-22:
> "Shawn Liu" <Shawn_member@pathlink.com> wrote...
>> std.string.toupper() and std.string.tolower() give a wrong result when
>> deal with
>> a mixture of upper/lower English and Chinese characters. e.g.
>> char[] a = "AbCdÖÐeFgH";
>> char[] b = std.string.toupper(a);
>> char[] c = std.string.tolower(a);
>>
>> The length of a is 11, but the length of b,c is 18 now.
>
> Phobos doesn't support non-ASCII conversions/comparisons at this time?
>

Phobos does, at least for the simple conversions. And no matter which cases are handled, characters that aren't handled must never be corrupted.

The attached zipped string.d fixes toupper/tolower and extends the unittests. (Yes, I know, it isn't the fastest possible algorithm...)

Thomas
