Thread overview
February 24, 2006 Submission: updated std.uni module
Attachments:
Attached is an updated std.uni module.

Publicly visible changes:
1) updated casing and isUniAlpha data to Unicode 5.0.0
2) added "dchar[] toUniLower(dchar[])" and "dchar[] toUniUpper(dchar[])" in order to handle cases like toUniUpper("\u00DF") -> "SS"

Internal changes:
3) use AAs instead of hardcoded IFs for upper and lower casing (I might expand the extractor to hardcode IFs if anybody experiences serious performance degradation.)

Thomas
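Change 2 exists because a full case mapping can turn one character into several (sharp s uppercases to "SS"), which a per-character dchar-to-dchar function cannot return. A minimal sketch in Python, not the std.uni code: a `str` stands in for D's `dchar[]`, dictionaries play the role of the associative arrays from change 3, and both tables (`SPECIAL_UPPER`, `SIMPLE_UPPER`) are hypothetical, tiny hand-copied excerpts of the Unicode data.

```python
# Sketch of full (one-to-many) uppercasing with table lookups.
# Both tables are hypothetical excerpts; a real extractor would generate
# them from UnicodeData.txt and SpecialCasing.txt.

# One-to-many mappings (excerpt in the style of SpecialCasing.txt).
SPECIAL_UPPER = {
    "\u00DF": "SS",       # LATIN SMALL LETTER SHARP S
    "\uFB00": "FF",       # LATIN SMALL LIGATURE FF
    "\u0149": "\u02BCN",  # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
}

# One-to-one mappings (tiny excerpt: ASCII a-z plus e with acute).
SIMPLE_UPPER = {chr(c): chr(c - 32) for c in range(ord("a"), ord("z") + 1)}
SIMPLE_UPPER["\u00E9"] = "\u00C9"  # e with acute -> E with acute


def to_uni_upper(text: str) -> str:
    """Uppercase text; the result may be longer than the input."""
    out = []
    for ch in text:
        if ch in SPECIAL_UPPER:
            # One input character may expand to several output characters.
            out.append(SPECIAL_UPPER[ch])
        else:
            # Fall back to the one-to-one table; unmapped characters pass through.
            out.append(SIMPLE_UPPER.get(ch, ch))
    return "".join(out)


print(to_uni_upper("stra\u00DFe"))  # prints STRASSE
```

The input has 6 characters and the output 7, which is why the new overloads take and return arrays rather than single characters.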
February 24, 2006 Re: Submission: updated std.uni module
Posted in reply to Thomas Kühne
On Fri, 24 Feb 2006 05:58:39 -0500, Thomas Kühne <thomas-dloop@kuehne.cn> wrote:
> Attached is an updated std.uni module.
>
I didn't even know std.uni existed; I could use the functions it provides.
February 24, 2006 Re: Submission: updated std.uni module
Posted in reply to Thomas Kühne
Attachments:
Thomas Kühne wrote on 2006-02-24:
> Attached is an updated std.uni module.
>
> publicly visible changes:
> 1) updated casing and isUniAlpha data to Unicode 5.0.0
>
> 2) added "dchar[] toUniLower(dchar[])" and "dchar[] toUniUpper(dchar[])"
> in order to handle cases like toUniUpper("\u00DF") -> "SS"
>
>
> internal changes:
> 3) use AAs instead of hardcoded IFs for upper and lower casing
> (I might expand the extractor to hardcode IFs, if anybody experiences
> serious performance degradation.)
Unicode sometimes seems to be a collection of special cases ;)
Forgot to add:
The following characters aren't mapped correctly.
format: character (condition)
GREEK CAPITAL LETTER SIGMA (Final_Sigma)
==Lithuanian locale==
COMBINING DOT ABOVE (After_Soft_Dotted)
LATIN CAPITAL LETTER I (More_Above)
LATIN CAPITAL LETTER J (More_Above)
LATIN CAPITAL LETTER I WITH OGONEK (More_Above)
LATIN CAPITAL LETTER I WITH GRAVE
LATIN CAPITAL LETTER I WITH ACUTE
LATIN CAPITAL LETTER I WITH TILDE
==Turkish and Azeri locale==
LATIN CAPITAL LETTER I WITH DOT ABOVE
COMBINING DOT ABOVE (After_I)
LATIN CAPITAL LETTER I (Not_Before_Dot)
LATIN SMALL LETTER I
Thomas
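The Final_Sigma condition above depends on context: GREEK CAPITAL LETTER SIGMA lowercases to final sigma (U+03C2) only when it is preceded by a cased letter (skipping case-ignorable characters) and not followed by one. A minimal Python sketch of that check, assuming simplified stand-ins for Unicode's Cased and Case_Ignorable properties (the real ones come from DerivedCoreProperties.txt); `is_final_sigma` and `lower_sigma` are hypothetical names, not std.uni functions.

```python
import unicodedata

CAPITAL_SIGMA = "\u03A3"
SMALL_SIGMA = "\u03C3"
FINAL_SIGMA = "\u03C2"


def is_cased(ch: str) -> bool:
    # Approximation: the real Cased property also includes
    # Other_Lowercase and Other_Uppercase characters.
    return unicodedata.category(ch) in ("Lu", "Ll", "Lt")


def is_case_ignorable(ch: str) -> bool:
    # Approximation of Case_Ignorable: marks, format characters, modifiers,
    # plus a few word-internal punctuation characters.
    return (unicodedata.category(ch) in ("Mn", "Me", "Cf", "Lm", "Sk")
            or ch in "'.:\u00B7\u2019")


def is_final_sigma(text: str, i: int) -> bool:
    # Before: a cased letter, then zero or more case-ignorable characters.
    j = i - 1
    while j >= 0 and is_case_ignorable(text[j]):
        j -= 1
    if j < 0 or not is_cased(text[j]):
        return False
    # After: NOT zero or more case-ignorable characters, then a cased letter.
    k = i + 1
    while k < len(text) and is_case_ignorable(text[k]):
        k += 1
    return not (k < len(text) and is_cased(text[k]))


def lower_sigma(text: str) -> str:
    """Lowercase text, choosing the sigma form from its context."""
    out = []
    for i, ch in enumerate(text):
        if ch == CAPITAL_SIGMA:
            out.append(FINAL_SIGMA if is_final_sigma(text, i) else SMALL_SIGMA)
        else:
            out.append(ch.lower())
    return "".join(out)
```

For example, the word-final sigma in "\u039F\u0394\u039F\u03A3" becomes U+03C2, while a sigma at the start of a word stays U+03C3. The locale-dependent entries in the list would additionally need the caller's language, which a context-free toUniLower/toUniUpper signature does not carry.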
Copyright © 1999-2021 by the D Language Foundation