December 14, 2015 [Issue 15440] std.uni outputs \u0069\u0307 as the lower case of \u0130

https://issues.dlang.org/show_bug.cgi?id=15440

ag0aep6g@gmail.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |ag0aep6g@gmail.com

--- Comment #1 from ag0aep6g@gmail.com ---
Here are three Unicode documents and what they say about the lowercase of U+0130 (search for "LATIN CAPITAL LETTER I WITH DOT ABOVE"):

1) <http://www.unicode.org/charts/PDF/U0100.pdf> says: "lowercase is 0069 i".

2) <http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt> gives U+0069 as the lowercase, too, if I read it right.

3) <http://www.unicode.org/Public/UCD/latest/ucdxml/ucd.nounihan.grouped.zip> gives 'slc="0069" lc="0069 0307"'. I assume "slc" means "simple lowercase", and "lc" means "lowercase".

So it seems that the "simple lowercase" is 'i', but the proper(?) lowercase is "\u0069\u0307". That makes sense when it's supposed to be reversible without assuming a Turkish context. Uppercasing "\u0069\u0307" you get "\u0049\u0307" ('I' + combining dot) which is equivalent to "\u0130".

Seems to me that std.uni is playing by the book, and that there's a point in what the book says. But I don't know enough about Unicode to speak with certainty.
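
The reversibility argument in Comment #1 can be checked outside of D. What follows is a minimal sketch in Python, not std.uni: it assumes that Python's `str.lower` and `str.upper` apply the full Unicode case mappings, and it uses NFC normalization as a stand-in for "is equivalent to". The variable names are illustrative only.

```python
import unicodedata

capital_i_dot = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Full lowercase mapping: 'i' followed by COMBINING DOT ABOVE (U+0307).
lowered = capital_i_dot.lower()

# Uppercasing the two-code-point sequence gives 'I' + combining dot.
uppered = lowered.upper()

# 'I' + combining dot is canonically equivalent to U+0130,
# so NFC normalization composes it back into the single code point.
round_trip = unicodedata.normalize("NFC", uppered)

print([hex(ord(c)) for c in lowered])
print([hex(ord(c)) for c in uppered])
print(round_trip == capital_i_dot)
```

The round trip only closes after normalization; comparing `uppered` with "\u0130" code point by code point would report them as different.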
January 09, 2016 [Issue 15440] std.uni outputs \u0069\u0307 as the lower case of \u0130

https://issues.dlang.org/show_bug.cgi?id=15440

Jack Stouffer <jack@jackstouffer.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
January 09, 2016 [Issue 15440] std.uni outputs \u0069\u0307 as the lower case of \u0130

https://issues.dlang.org/show_bug.cgi?id=15440

ag0aep6g@gmail.com changed:

What |Removed |Added
----------------------------------------------------------------------------
Resolution|FIXED |INVALID

--- Comment #2 from ag0aep6g@gmail.com ---
Changing the resolution to INVALID. As far as I know, the changelog lists all FIXED issues, and this shouldn't be in the changelog, because we didn't actually change anything.
January 11, 2016 [Issue 15440] std.uni outputs \u0069\u0307 as the lower case of \u0130

https://issues.dlang.org/show_bug.cgi?id=15440

Ali Cehreli <acehreli@yahoo.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |acehreli@yahoo.com

--- Comment #3 from Ali Cehreli <acehreli@yahoo.com> ---
It looks like I am outdated on this issue because I had never heard of the 0069 0307 sequence before H. S. Teoh brought the following change to my attention:

https://github.com/D-Programming-Language/phobos/pull/3848

I've learned since then that the two-character sequence should be the default but TR locale should still use just 0069. According to the following quote, Java 7 behaves differently depending on locale:

http://grepalex.com/2013/02/14/java-7-and-the-dotted--and-dotless-i/

<quote>
CODE   LOWER       TITLE   UPPER   LANGUAGE
0130;  0069 0307;  0130;   0130;
0130;  0069;       0130;   0130;   tr;
0130;  0069;       0130;   0130;   az;

Entries with a language take precedence over those without, so in my JVM where the default locale is English, the first row of the mapping is used, which lines-up with the codepoints that we saw outputted in our Java 7 example.

Therefore to make Java do the right thing here for Turkish, we need to explicitly specify the Turkish locale (“tr” is the ISO 639 alpha-2 language code for Turkish) to the toLowerCase method
</quote>

Should std.uni be locale-aware?
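
To make the precedence rule in the quote concrete, here is a rough Python sketch of a locale-aware lowercase. It is not how std.uni or Java implement this. The table `SPECIAL_LOWER` and the function `to_lower` are hypothetical names, and the table holds only the three rows quoted above, so other Turkish rules (such as the dotless i) are deliberately left out.

```python
# Only the three rows quoted above, keyed by (character, language).
# A language of None marks the default row that has no language tag.
SPECIAL_LOWER = {
    ("\u0130", None): "\u0069\u0307",
    ("\u0130", "tr"): "\u0069",
    ("\u0130", "az"): "\u0069",
}


def to_lower(text, lang=None):
    """Hypothetical locale-aware lowercase: language rows win over the default row."""
    out = []
    for ch in text:
        if (ch, lang) in SPECIAL_LOWER:
            # A row tagged with the requested language takes precedence.
            out.append(SPECIAL_LOWER[(ch, lang)])
        elif (ch, None) in SPECIAL_LOWER:
            # Otherwise fall back to the untagged default row.
            out.append(SPECIAL_LOWER[(ch, None)])
        else:
            # Characters outside the table use the ordinary mapping.
            out.append(ch.lower())
    return "".join(out)


print(to_lower("\u0130STANBUL", "tr") == "istanbul")
print(to_lower("\u0130STANBUL") == "i\u0307stanbul")
```

A language with no row of its own, for example "de", falls through to the default row and gets the two-code-point result.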
January 12, 2016 [Issue 15440] std.uni outputs \u0069\u0307 as the lower case of \u0130

https://issues.dlang.org/show_bug.cgi?id=15440

--- Comment #4 from Jack Stouffer <jack@jackstouffer.com> ---
(In reply to Ali Cehreli from comment #3)
> Should std.uni be locale-aware?

Yes, though in what way it would achieve this is an interesting question. I think you should make a separate bug report for this.
Copyright © 1999-2021 by the D Language Foundation