December 09, 2011 [Issue 7084] New: Missing writeln Unicode normalization

http://d.puremagic.com/issues/show_bug.cgi?id=7084

Summary: Missing writeln Unicode normalization
Product: D
Version: D2
Platform: x86
OS/Version: Windows
Status: NEW
Severity: normal
Priority: P2
Component: Phobos
AssignedTo: nobody@puremagic.com
ReportedBy: bearophile_hugs@eml.cc

--- Comment #0 from bearophile_hugs@eml.cc 2011-12-09 01:12:59 PST ---

In this program the string 'txt1' contains two codepoints: LATIN CAPITAL LETTER A, and COMBINING DIAERESIS. I think a good printing function has to perform Unicode normalization and show a single \U000000C4 (LATIN CAPITAL LETTER A WITH DIAERESIS) glyph. But with DMD 2.057beta it shows two glyphs (on Windows), an 'A' followed by a diaeresis. writeln(txt2) shows what I think is the correct output for writeln(txt1) too:

    import std.stdio;
    void main() {
        dstring txt1 = "\U00000041\U00000308"d;
        writeln(txt1);
        dstring txt2 = "\U000000C4"d;
        writeln(txt2);
    }

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
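
The difference between the two strings in the report can be checked directly. The following is a minimal sketch, not part of the original report: it only inspects the code points of each `dstring` and performs no normalization. The helper name `codePointCount` is hypothetical and exists purely for illustration.

```d
import std.stdio : writefln;

/// Hypothetical helper: number of code points in a UTF-32 string.
/// For a dstring each element is one code point, so this is just .length.
size_t codePointCount(dstring s)
{
    return s.length;
}

void main()
{
    dstring txt1 = "\U00000041\U00000308"d; // 'A' followed by COMBINING DIAERESIS
    dstring txt2 = "\U000000C4"d;           // precomposed LATIN CAPITAL LETTER A WITH DIAERESIS

    // Print every code point of each string in U+XXXX notation.
    foreach (dchar c; txt1)
        writefln("txt1: U+%04X", cast(uint) c);
    foreach (dchar c; txt2)
        writefln("txt2: U+%04X", cast(uint) c);

    // The strings are canonically equivalent but not equal code point by code point.
    assert(codePointCount(txt1) == 2);
    assert(codePointCount(txt2) == 1);
    assert(txt1 != txt2);
}
```

Since writeln passes the code points through unchanged, whether txt1 appears as one glyph or two depends on how the output device renders the combining sequence.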
February 26, 2012 [Issue 7084] Missing writeln Unicode normalization

Posted in reply to bearophile_hugs@eml.cc

http://d.puremagic.com/issues/show_bug.cgi?id=7084

hsteoh@quickfur.ath.cx changed:

What | Removed | Added
CC | | hsteoh@quickfur.ath.cx

--- Comment #1 from hsteoh@quickfur.ath.cx 2012-02-25 17:57:14 PST ---

IMO this should be an enhancement request. As I understand, Unicode normalization is non-trivial, so we probably should think over how we want to do it.
February 26, 2012 [Issue 7084] Missing writeln Unicode normalization

Posted in reply to bearophile_hugs@eml.cc

http://d.puremagic.com/issues/show_bug.cgi?id=7084

bearophile_hugs@eml.cc changed:

What | Removed | Added
Severity | normal | enhancement

--- Comment #2 from bearophile_hugs@eml.cc 2012-02-26 14:59:46 PST ---

(In reply to comment #1)
> IMO this should be an enhancement request. As I understand, Unicode
> normalization is non-trivial, so we probably should think over how we want to
> do it.

OK, now it's an enhancement.
February 27, 2012 [Issue 7084] Missing writeln Unicode normalization

Posted in reply to bearophile_hugs@eml.cc

http://d.puremagic.com/issues/show_bug.cgi?id=7084

--- Comment #3 from hsteoh@quickfur.ath.cx 2012-02-26 22:22:24 PST ---

Here's a link to the relevant part of the Unicode standard for whoever wants to implement normalization: http://unicode.org/reports/tr15/

Note that there are several different normalizations, with NFC probably being the closest to what this bug requires.

After scanning through the standard, it seems to me that rather than putting this in std.stdio (or the prospective std.io), we really should put it in std.uni or std.utf, and have different algorithms available for programs to choose the normalization form. The algorithms involved are not trivial, and some people may not want std.stdio to automatically normalize to a particular form when they want specifically to use a different form or a non-normalized output for whatever reason.
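
The approach suggested in comment #3, where the caller picks a normalization form and writeln stays untouched, can be sketched as follows. This assumes a Phobos release whose std.uni provides `normalize` together with the `NFC` and `NFD` form constants; that API is newer than the compiler discussed in this thread. The wrappers `toNFC` and `toNFD` are hypothetical names used only for illustration.

```d
import std.stdio : writeln;
import std.uni : normalize, NFC, NFD;

/// Hypothetical wrapper: compose to Normalization Form C.
dstring toNFC(dstring s)
{
    return normalize!NFC(s);
}

/// Hypothetical wrapper: decompose to Normalization Form D.
dstring toNFD(dstring s)
{
    return normalize!NFD(s);
}

void main()
{
    dstring txt1 = "\U00000041\U00000308"d; // 'A' + COMBINING DIAERESIS
    dstring txt2 = "\U000000C4"d;           // precomposed form

    // NFC composes the pair into the single precomposed code point.
    assert(toNFC(txt1) == txt2);
    // NFD goes the other way and decomposes it again.
    assert(toNFD(txt2) == txt1);

    // The caller decides whether and how to normalize before printing.
    writeln(toNFC(txt1));
    writeln(txt1); // left as-is: no normalization applied
}
```

Keeping normalization as an explicit step leaves programs free to emit NFD, a compatibility form, or unnormalized text, which is the concern the comment raises about building it into std.stdio.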
Copyright © 1999-2021 by the D Language Foundation