June 09, 2004 Re: DMD 0.92 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stewart Gordon | Stewart Gordon wrote:
> Hauke Duden wrote:
>
> <snip>
>
>> In any case, in Unicode upper and lower case characters do not have a constant offset to each other. That is only true for the ASCII subset.
>
> Yes, you do have a point there. What's more, there isn't a 1:1 mapping between uppercase and lowercase characters.
You're wrong. The Unicode standard defines 1:1 case mappings (see http://www.unicode.org/Public/UNIDATA/UCD.html). There is also an additional "special casing" with one-to-many mappings, but only a handful of characters are affected. It would be nice to support that too, but for everyday work the 1:1 mappings are usually sufficient.
> And the mappings that there are aren't language independent.
Huh? Casing is not affected by locale. Maybe you are thinking about collation?
Hauke
|
June 09, 2004 Re: DMD 0.92 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Arcane Jill | In article <ca71c4$2b8l$1@digitaldaemon.com>, Arcane Jill says...
>
>In article <ca54is$2h2r$1@digitaldaemon.com>, David L. Davis says...
>
>> sStr[ iStrPos ] + 0x20
>
>Ah! Now these old ASCII habits really should be dropped. Hauke has written this magnificent charToUpper() routine. It should be used.
>
>> I feel like a young Skywalker in training, learning how to best use "The Force!"
>
>Other than that: Impressive - Obi Won has taught you well. (Hope I'm not too discouraging). :)
>
>Jill
>
Jill: Don't sweat it, all your advice has been encouraging! :) If I wasn't getting any feedback at all from anyone, now that would be "discouraging" in my mind...again thxs for your advice. After all, if these functions meet Walter and the "D" forum's approval, they just might become a part of the std.string for everyone to use.
After work, I'll check out Hauke's charToLower() function, and see what kind of requirements it has. And if it looks like a good fix, I'll ask Hauke if I may use it...giving him full credit for his work of course. :)
David
-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
|
June 09, 2004 Re: DMD 0.92 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Hauke Duden | Hauke Duden wrote:
> Stewart Gordon wrote:
<snip>
>> Yes, you do have a point there. What's more, there isn't a 1:1 mapping between uppercase and lowercase characters.
>
> You're wrong. The Unicode standard defines 1:1 case mappings (see http://www.unicode.org/Public/UNIDATA/UCD.html).
There seems to be a contradiction here. That file indicates that UnicodeData.txt only contains 1:1 mappings. But just as I wondered, there's a 2:1 mapping in 03C2 and 03C3.
> There is also an additional "special casing" with one-to-many mappings, but only a handful of characters are affected. It would be nice to support that too, but for everyday work the 1:1 mappings are usually sufficient.
So, which characters do the one-to-many mappings bring about?
>> And the mappings that there are aren't language independent.
>
> Huh? Casing is not affected by locale. Maybe you are thinking about collation?
What do you mean by that?
Stewart.
--
My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
|
June 09, 2004 Re: DMD 0.92 release (but actually about Unicode) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stewart Gordon | In article <ca7cgp$2svc$1@digitaldaemon.com>, Stewart Gordon says...
>
>There seems to be a contradiction here. That file indicates that UnicodeData.txt only contains 1:1 mappings. But just as I wondered, there's a 2:1 mapping in 03C2 and 03C3.
Look, it's perfectly simple. Everybody's right. And because everybody's right, everybody's accusing everybody else of being wrong.
THERE ARE TWO ANSWERS.
"Simple casing" is a one-to-one mapping from character to character, and is locale-independent.
"Full casing" is a one-to-many mapping from string to string, and is ALMOST locale-independent, but not quite.
Hauke's brilliant library supports simple casing, not full casing. That's why both the input and the output are characters, not strings.
>So, which characters do the one-to-many mappings bring about?
For example, the German character 'ß' uppercases to "SS" when using full casing, but it stays as 'ß' using simple casing.
>>> And the mappings that there are aren't language independent.
>>
>> Huh? Casing is not affected by locale. Maybe you are thinking about collation?
>
>What do you mean by that?
Full casing (but not simple casing) has localized exceptions ONLY for Turkish, Lithuanian and Azeri. In principle, other exceptions could be added in the future. Simple casing is completely locale-independent.
Collation is a different kettle of fish, and we currently have no libraries to support it.
Arcane Jill
|
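The difference between the two kinds of casing described above can be sketched in a few lines of D. This is only an illustration under stated assumptions: `simpleUpper` and `fullUpper` are hypothetical names (not the API of Hauke's library), and the tables are a tiny hand-written subset containing only the characters mentioned in this thread, not the real Unicode data.

```d
// Illustrative sketch only. simpleUpper and fullUpper are hypothetical names,
// and the mappings are a hand-written subset of the Unicode case data.

/// Simple casing: one character in, one character out, locale-independent.
dchar simpleUpper(dchar c)
{
    if (c >= 'a' && c <= 'z')
        return cast(dchar)(c - 0x20);   // the constant offset holds only for ASCII
    switch (c)
    {
        case '\u03C2':                  // GREEK SMALL LETTER FINAL SIGMA
        case '\u03C3':                  // GREEK SMALL LETTER SIGMA
            return '\u03A3';            // both map to GREEK CAPITAL LETTER SIGMA
        default:
            return c;                   // no simple mapping in this subset, e.g. U+00DF
    }
}

/// Full casing: string in, string out, so one character may become several.
string fullUpper(string s)
{
    string result;
    foreach (dchar c; s)                // decodes the UTF-8 input per code point
    {
        switch (c)
        {
            case '\u00DF':              // LATIN SMALL LETTER SHARP S
                result ~= "SS";
                break;
            case '\u1FB2':              // GREEK SMALL LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI
                result ~= "\u1FBA\u0399";
                break;
            default:
                result ~= simpleUpper(c);
                break;
        }
    }
    return result;
}

unittest
{
    assert(simpleUpper('\u00DF') == '\u00DF');      // simple casing leaves it alone
    assert(simpleUpper('\u03C2') == '\u03A3');
    assert(fullUpper("stra\u00DFe") == "STRASSE");  // full casing expands it
}
```

The signatures carry the point: the simple version can work on `dchar` because its result always fits in one character, while the full version has to take and return strings. The locale-specific exceptions are left out entirely.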
June 09, 2004 Re: DMD 0.92 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stewart Gordon | Stewart Gordon wrote:
> Hauke Duden wrote:
>
>> Stewart Gordon wrote:
>
> <snip>
>
>>> Yes, you do have a point there. What's more, there isn't a 1:1 mapping between uppercase and lowercase characters.
>>
>> You're wrong. The Unicode standard defines 1:1 case mappings (see http://www.unicode.org/Public/UNIDATA/UCD.html).
>
> There seems to be a contradiction here. That file indicates that UnicodeData.txt only contains 1:1 mappings. But just as I wondered, there's a 2:1 mapping in 03C2 and 03C3.
Where did you get that information? From the data file http://www.unicode.org/Public/UNIDATA/UnicodeData.txt:
03C2;GREEK SMALL LETTER FINAL SIGMA;Ll;0;L;;;;;N;;;03A3;;03A3
03C3;GREEK SMALL LETTER SIGMA;Ll;0;L;;;;;N;;;03A3;;03A3
The interesting entries are the last three. Their format is UPPER;LOWER;TITLE. So both letters have an upper and title mapping to 03A3 and no lower mapping. Two characters sharing one uppercase form is still a 1:1 mapping in the sense meant here: each character maps to exactly one character.
>> There is also an additional "special casing" with one-to-many mappings, but only a handful of characters are affected. It would be nice to support that too, but for everyday work the 1:1 mappings are usually sufficient.
>
> So, which characters do the one-to-many mappings bring about?
An example of a character with special casing is 1FB2 (GREEK SMALL LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI). Its upper case maps to 1FBA + 0399 (GREEK CAPITAL LETTER ALPHA WITH VARIA + GREEK CAPITAL LETTER IOTA).
>>> And the mappings that there are aren't language independent.
>>
>> Huh? Casing is not affected by locale. Maybe you are thinking about collation?
>
> What do you mean by that?
Collation is a locale-dependent comparison of strings, i.e. it defines the "phone book" ordering of strings in a particular language.
Hauke
|
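Reading the UPPER;LOWER;TITLE fields out of a UnicodeData.txt record, as done by eye above, can be sketched as follows. The `CaseFields` struct and the `parseCaseFields` helper are hypothetical names made up for this example, and using 0 to mean "field empty" is an assumption of the sketch, not part of the file format.

```d
import std.array : split;
import std.conv : to;

/// The three simple case mappings of one UnicodeData.txt record.
/// A value of 0 means the field was empty (no mapping); that sentinel
/// is a choice made for this sketch.
struct CaseFields
{
    uint upper, lower, title;
}

/// Hypothetical helper: extracts the last three fields of one record line.
CaseFields parseCaseFields(string line)
{
    auto fields = line.split(";");      // empty fields are kept as ""
    assert(fields.length == 15, "expected 15 semicolon-separated fields");

    static uint hexOrZero(string f)
    {
        return f.length ? to!uint(f, 16) : 0;
    }

    return CaseFields(hexOrZero(fields[12]),
                      hexOrZero(fields[13]),
                      hexOrZero(fields[14]));
}

unittest
{
    auto finalSigma = parseCaseFields(
        "03C2;GREEK SMALL LETTER FINAL SIGMA;Ll;0;L;;;;;N;;;03A3;;03A3");
    // Upper and title map to 03A3; there is no lower mapping.
    assert(finalSigma == CaseFields(0x03A3, 0, 0x03A3));
}
```

Each of the three fields holds at most one code point, which is why this file can only express character-to-character mappings.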
June 09, 2004 Re: DMD 0.92 release (but actually about Unicode) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Arcane Jill | Arcane Jill wrote:
> Full casing (but not simple casing) has localized exceptions ONLY for Turkish,
> Lithuanian and Azeri. In principle, other exceptions could be added in the
> future. Simple casing is completely locale independent.
Ouch. I didn't know that. Makes me feel happy that I stayed away from the special casings up to now ;).
Thanks for clearing up the misunderstanding!
Hauke
|
June 09, 2004 Re: DMD 0.92 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to David L. Davis | "David L. Davis" <SpottedTiger@yahoo.com> wrote in message news:ca69ie$1813$1@digitaldaemon.com...
> In article <ca5ct6$2vif$1@digitaldaemon.com>, Walter says...
> >
> >The function uppercases the input string. It shouldn't modify its inputs.
> >
>
> Walter: Third time around is normally the "Charm!" Anywayz, I've been hammering away at these two functions ifind() and irfind(), and I believe I've made them much better than before, thanks to both you and Jill for the advice.
>
Hello, I just wanted to let you know that I wrote those functions a while ago for a String class that can be found at www.dprogramming.com/stringclass.d. It contains all the free functions, and a few others such as findany(), endswith(), etc., plus case-insensitive versions. I haven't said much about it because it's completely based off Walter's code, so it belongs to him. The class can be stripped out to just use the functions. If the code isn't good enough, just ignore me; have fun!
|
June 09, 2004 Re: DMD 0.92 release | ||||
---|---|---|---|---|
| ||||
Posted in reply to David L. Davis | There's no need to .dup the strings. Just have a loop that looks like this:
    for (i = 0; i < string1.length; i++)
    {
        char c = toupper(string1[i]);
        if (c != toupper(string2[i]))
            goto nomatch;
    }
Note that it compares character by character without needing to allocate memory. In fact, just copy the logic in find() and rfind(), replacing memchr and memcmp with case insensitive loops, write some unit tests, and you'll be there.
|
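The loop above can be expanded into a complete allocation-free search along these lines. This is a sketch, not the std.string implementation: `ifindAscii` is a hypothetical name, it uses `std.ascii.toUpper`, and it is therefore ASCII-only, so it does not address the Unicode casing concerns raised elsewhere in the thread.

```d
import std.ascii : toUpper;

/// Hypothetical helper: returns the index of the first case-insensitive
/// (ASCII-only) occurrence of needle in haystack, or -1 if there is none.
/// Neither input is modified or copied.
ptrdiff_t ifindAscii(const(char)[] haystack, const(char)[] needle)
{
    if (needle.length > haystack.length)
        return -1;

    foreach (start; 0 .. haystack.length - needle.length + 1)
    {
        // Compare character by character, upper-casing on the fly.
        size_t i = 0;
        while (i < needle.length
               && toUpper(haystack[start + i]) == toUpper(needle[i]))
            ++i;
        if (i == needle.length)
            return cast(ptrdiff_t) start;
    }
    return -1;
}

unittest
{
    assert(ifindAscii("Hello World", "WORLD") == 6);
    assert(ifindAscii("Hello World", "xyz") == -1);
    assert(ifindAscii("abc", "abcd") == -1);
}
```

Because the inputs are taken as `const(char)[]`, the compiler rejects any attempt to upper-case them in place, which enforces the "shouldn't modify its inputs" requirement.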
June 09, 2004 Re: Bit array slices again | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stewart Gordon | Another option is to only allow bit slicing on byte boundaries, and only allow pointers to bits if they are in bit 0 of a byte. |
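The byte-boundary restriction can be illustrated with a packed bit array stored as `ubyte[]`. This is a sketch under assumptions: `sliceBits` is a hypothetical helper, the 8-bits-per-`ubyte` layout is assumed, and it is neither the compiler's bit[] implementation nor Jill's workaround.

```d
/// Hypothetical helper: slices a packed bit array (8 bits per ubyte)
/// without copying. Both ends must fall on a byte boundary, so the result
/// is an ordinary array slice whose first bit is bit 0 of a byte.
ubyte[] sliceBits(ubyte[] packed, size_t firstBit, size_t endBit)
{
    assert(firstBit % 8 == 0 && endBit % 8 == 0, "slice must be byte-aligned");
    assert(firstBit <= endBit && endBit <= packed.length * 8);
    return packed[firstBit / 8 .. endBit / 8];
}

unittest
{
    ubyte[] bits = [0xFF, 0x0F, 0xAA, 0x55];
    ubyte[] expected = [0x0F, 0xAA];

    auto mid = sliceBits(bits, 8, 24);  // bits 8..24 are bytes 1..3
    assert(mid == expected);

    mid[0] = 0;                         // the slice aliases the original storage
    assert(bits[1] == 0);
}
```

With this restriction a bit slice needs no extra bit offset next to its pointer and length; the cost is that slices starting in the middle of a byte are rejected.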
June 09, 2004 Re: Bit array slices again | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | In article <ca7qp5$hrl$2@digitaldaemon.com>, Walter says...
>
>Another option is to only allow bit slicing on byte boundaries, and only allow pointers to bits if they are in bit 0 of a byte.
>
That's EXACTLY what my workaround does. You can have the code for free if you want.
Jill
|
Copyright © 1999-2021 by the D Language Foundation