August 09, 2011 [Issue 6458] New: Multibyte char literals shouldn't implicitly convert to char

http://d.puremagic.com/issues/show_bug.cgi?id=6458

           Summary: Multibyte char literals shouldn't implicitly convert to char
           Product: D
           Version: D2
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: clugdbug@yahoo.com.au

--- Comment #0 from Don <clugdbug@yahoo.com.au> 2011-08-08 21:43:38 PDT ---
The code below should either be rejected or work correctly. The particularly problematic case is s[0..2] = 'ä', which looks perfectly reasonable but creates garbage.

I'm a bit confused about non-ASCII char literals: although they are typed as 'char', they can't be stored in a char... This just seems wrong.

----
int bug6458()
{
    char[] s = "abcdef".dup;
    s[0..2] = 'ä';
    assert(s == "äcdef");
    return 34;
}

void main()
{
    bug6458();
}

Surely this has been reported before, but I can't find it.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
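For comparison, the result the report expects ("äcdef") can be produced without relying on any literal conversion: encode the code point to UTF-8 first, then copy the code units into a slice of exactly matching length. This is a minimal sketch assuming Phobos's std.utf.encode; putCodePoint is a hypothetical helper, not part of any library.

```d
import std.utf : encode;

/// Hypothetical helper (not a Phobos API): overwrites s[start .. start + n]
/// with the UTF-8 encoding of c, where n is the number of code units c needs.
char[] putCodePoint(char[] s, size_t start, dchar c)
{
    char[4] buf;
    immutable len = encode(buf, c);           // number of UTF-8 code units written
    s[start .. start + len] = buf[0 .. len];  // slice copy: both sides have length len
    return s;
}

unittest
{
    char[] s = "abcdef".dup;
    putCodePoint(s, 0, 'ä');  // 'ä' needs two UTF-8 code units, replacing "ab"
    assert(s == "äcdef");
}
```

Because the slice length is derived from the encoded length, the copy can never split a multibyte sequence.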
August 09, 2011 [Issue 6458] Multibyte char literals shouldn't implicitly convert to char

Posted in reply to Don

Jonathan M Davis <jmdavisProg@gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg@gmx.com

--- Comment #1 from Jonathan M Davis <jmdavisProg@gmx.com> 2011-08-08 21:53:05 PDT ---
Personally, I think that all character literals should be typed as dchar, since it's generally a _bad_ idea to operate on individual chars or wchars. Normally, the only places where chars or wchars should be used are ranges of chars or wchars (which would normally be arrays). But making character literals dchar by default might break too much code at this point. Then again, since it should be possible to use range propagation to verify whether a particular code point will fit in a particular code unit, the breakage might be minimal.

Regardless, I never would have expected s[0 .. 2] = 'ä' to work, since as far as the types go you're assigning one character to multiple characters, though I can see why you might think that it would work or why it arguably _should_ work. Obviously, if the compiler allows you to assign a code point to multiple code units like that, it should only compile if it can verify that the code point fits exactly in those code units, and if it does compile, it should work correctly rather than generate garbage. So there seem to be several issues at work here.
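The range-propagation check suggested above amounts to asking how many code units a code point needs in a given encoding. A minimal sketch, assuming Phobos's std.utf.codeLength and std.traits.isSomeChar; fitsInOneUnit is a hypothetical name, not a compiler or library API.

```d
import std.traits : isSomeChar;
import std.utf : codeLength;

/// Hypothetical helper: true if code point c fits in a single code unit of
/// character type C, the kind of check range propagation would have to make.
bool fitsInOneUnit(C)(dchar c)
    if (isSomeChar!C)
{
    return codeLength!C(c) == 1;
}

unittest
{
    assert(fitsInOneUnit!char('a'));             // ASCII: one UTF-8 code unit
    assert(!fitsInOneUnit!char('ä'));            // U+00E4: two UTF-8 code units
    assert(fitsInOneUnit!wchar('ä'));            // one UTF-16 code unit
    assert(!fitsInOneUnit!wchar('\U0001F600'));  // needs a UTF-16 surrogate pair
    assert(fitsInOneUnit!dchar('\U0001F600'));   // UTF-32 always uses one unit
}
```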
August 09, 2011 [Issue 6458] Multibyte char literals shouldn't implicitly convert to char

Posted in reply to Don

Don <clugdbug@yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |accepts-invalid

--- Comment #2 from Don <clugdbug@yahoo.com.au> 2011-08-08 22:27:32 PDT ---
(In reply to comment #1)
> Personally, I think that all character literals should be typed as dchar, since it's generally a _bad_ idea to operate on individual chars or wchars. Normally, the only places that chars or wchars should be used is in ranges of chars or wchars (which would normally be arrays). But making character literals dchar be default might break too much code at this point. Though, since it should be possible to use range propagation to verify whether a particular code point will fit in a particular code unit, the breakage might be minimal.

Oddly, this passes:

static assert('ä'.sizeof == 2);

So there's something a bit nonsensical about the whole thing.

> Regardless, I actually never would have expected s[0 .. 2] = 'ä' to work, since you're assigning a character to multiple characters as far as types go,

It's more subtle. This is block assignment: s[0..4] = 'a'; works, and creates "aaaa". s[0..4] = 'ä' is expected to fill the string with ä, creating "ää". Instead, it fills it with four copies of the first UTF-8 byte of ä, creating an invalid string.
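The block assignment described above can be emulated for a multibyte code point by repeating its UTF-8 encoding across the slice, which only makes sense when the slice length is a multiple of the encoded length. This sketch assumes Phobos's std.utf.encode and std.utf.validate; blockFill is a hypothetical helper. It also checks that filling every element with one lone byte does not yield valid UTF-8.

```d
import std.exception : assertThrown;
import std.utf : encode, validate, UTFException;

/// Hypothetical helper: fills s with repeated copies of c's UTF-8 encoding.
/// Returns false and leaves s untouched if s.length is not a multiple of
/// the number of code units c needs.
bool blockFill(char[] s, dchar c)
{
    char[4] buf;
    immutable len = encode(buf, c);
    if (s.length % len != 0)
        return false;
    for (size_t i = 0; i < s.length; i += len)
        s[i .. i + len] = buf[0 .. len];
    return true;
}

unittest
{
    char[] s = "abcd".dup;
    assert(blockFill(s, 'ä') && s == "ää");   // two copies of C3 A4

    char[] t = "abcd".dup;
    assert(blockFill(t, 'a') && t == "aaaa"); // same result as t[] = 'a'

    // Filling every element with one lone byte, here 0xE4 (the low 8 bits
    // of U+00E4), does not produce valid UTF-8.
    char[] bad = "abcd".dup;
    bad[] = cast(char) 0xE4;
    assertThrown!UTFException(validate(bad));
}
```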
August 09, 2011 [Issue 6458] Multibyte char literals shouldn't implicitly convert to char

Posted in reply to Don

--- Comment #3 from Jonathan M Davis <jmdavisProg@gmx.com> 2011-08-08 22:33:20 PDT ---
Ah, yes. I forgot that you could assign a single value to every element of an array like that. That being the case, it should simply fail to compile, given that the code point won't fit in each element of the array. But regardless, something odd is definitely going on here, given that 'ä'.sizeof == 2. It's probably an edge case that wasn't caught, since char and wchar are the only types where a single value can take up multiple elements like that.
August 09, 2011 [Issue 6458] Multibyte char literals shouldn't implicitly convert to char

Posted in reply to Don

changlon <changlon@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |changlon@gmail.com

--- Comment #4 from changlon <changlon@gmail.com> 2011-08-08 23:13:53 PDT ---
s[0..3] = 'a';

Should this raise an exception?
August 09, 2011 [Issue 6458] Multibyte char literals shouldn't implicitly convert to char

Posted in reply to Don

--- Comment #5 from changlon <changlon@gmail.com> 2011-08-08 23:14:35 PDT ---
(In reply to comment #4)
> s[0..3] = 'a';
>
> this should raise an exception ?

Sorry, I meant s[0..3] = 'ä';
August 09, 2011 [Issue 6458] Multibyte char literals shouldn't implicitly convert to char

Posted in reply to Don

--- Comment #6 from Jonathan M Davis <jmdavisProg@gmx.com> 2011-08-08 23:19:15 PDT ---
It shouldn't even compile, because the types don't match. Even with range propagation, the best you'll do with 'ä' is fit it in a wchar, so it won't fit in a char, and so you _can't_ assign it to each element of s[0 .. 3] like that. s[0 .. 2] = "ä"[] should work (the two code units of "ä" are copied into a two-element slice), but s[0 .. 3] = 'ä' definitely shouldn't.
August 09, 2011 [Issue 6458] Multibyte char literals shouldn't implicitly convert to char

Posted in reply to Don

Jacob Carlborg <doob@me.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |doob@me.com

--- Comment #7 from Jacob Carlborg <doob@me.com> 2011-08-08 23:44:22 PDT ---
As far as I can see, D uses the smallest type necessary to fit a character literal, so all non-ASCII character literals will be either wchar or dchar. Both of the following pass, as expected:

static assert(is(typeof('ä') == wchar));
static assert(is(typeof('a') == char));

But I don't know why the compiler allows assigning a wchar to a char array element. That doesn't seem right.
August 09, 2011 [Issue 6458] Multibyte char literals shouldn't implicitly convert to char

Posted in reply to Don

--- Comment #8 from Don <clugdbug@yahoo.com.au> 2011-08-09 00:09:02 PDT ---
(In reply to comment #7)
> As far as I can see, D uses the smallest type necessary to fit a character literal. So all non-ascii character literals will either be wchar or dchar. Both of the following passes, as expected.
>
> static assert(is(typeof('ä') == wchar));
> static assert(is(typeof('a') == char));

That's good news. It seems it's only a few cases where it behaves stupidly.

> But I don't know why the compiler allows to assign a wchar to a char array element. That doesn't seem right.

It's more general than that:

wchar w = 'ä';
char c = w;   // Error: cannot implicitly convert expression (w) of type wchar to char
char c = 'ä'; // passes!!!
January 31, 2012 [Issue 6458] Multibyte char literals shouldn't implicitly convert to char

Posted in reply to Don

yebblies <yebblies@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yebblies@gmail.com
           Platform|Other                       |All
         AssignedTo|nobody@puremagic.com        |yebblies@gmail.com
         OS/Version|Windows                     |All

--- Comment #10 from yebblies <yebblies@gmail.com> 2012-01-31 15:24:30 EST ---
(In reply to comment #9)
>
> The compiler complains about the code above, just as it should, because a long won't fit in an int. Don't know why character literals are treated differently.

They aren't. The problem is that 'ä' evaluates to 0x00E4, and a bug in integer range propagation thinks this is OK to convert back to a char.
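Comment #10 locates the bug in integer range propagation: the value 0x00E4 fits in 8 bits, so it was accepted as a char even though it is not a complete one-byte UTF-8 sequence. The sketch below contrasts that integer-width rule with a stricter code-point rule. ValueRange, fitsAsInteger and fitsAsCodePoint are hypothetical names, and the ASCII-only rule is an assumption about what a correct check needs, not a description of DMD's actual fix.

```d
/// Hypothetical: an inclusive value range, as range propagation tracks it.
struct ValueRange
{
    ulong lo, hi;  // assumes lo <= hi
}

/// Integer-width rule: the range fits if its maximum fits in `bits` bits
/// (assumes bits < 64). Under this rule 0xE4 fits in an 8-bit char.
bool fitsAsInteger(ValueRange r, uint bits)
{
    return r.hi <= (1UL << bits) - 1;
}

/// Stricter rule (an assumption, not DMD's actual fix): a wchar or dchar
/// value may narrow to char only if it is ASCII, because only code points
/// below 0x80 are complete one-byte UTF-8 sequences.
bool fitsAsCodePoint(ValueRange r)
{
    return r.hi < 0x80;
}

unittest
{
    immutable ae = ValueRange(0xE4, 0xE4);  // 'ä' evaluates to 0x00E4
    assert(fitsAsInteger(ae, 8));           // accepted by the integer rule
    assert(!fitsAsCodePoint(ae));           // rejected by the code-point rule

    immutable a = ValueRange('a', 'a');
    assert(fitsAsInteger(a, 8) && fitsAsCodePoint(a));
}
```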
Copyright © 1999-2021 by the D Language Foundation