October 02, 2006 [Issue 391] New: .sort and .reverse break utf8 encoding
http://d.puremagic.com/issues/show_bug.cgi?id=391

Summary: .sort and .reverse break utf8 encoding
Product: D
Version: unspecified
Platform: PC
OS/Version: All
Status: NEW
Severity: major
Priority: P2
Component: DMD
AssignedTo: bugzilla@digitalmars.com
ReportedBy: ddparnell@bigpond.com

import std.utf;
import std.stdio;
void main()
{
char[] a;
a = "\u3026\u2021\u3061\n";
writefln("plain"); validate(a);
writefln("sorted"); validate(a.sort); // fails
writefln("reversed"); validate(a.reverse); // fails
}

--
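For readers on a current D2 compiler, the following is a minimal sketch (not part of the original report) that reproduces the failure without the old built-in array properties. It assumes std.algorithm is used to sort or reverse, treats the test string as raw bytes, and checks the result with std.utf.validate. The helper isValidUtf8 is a hypothetical name. Each of the three characters in the test string is encoded as three bytes, so moving bytes independently leaves continuation bytes (0x80 to 0xBF) where a lead byte is expected.

```d
import std.algorithm.mutation : reverse;
import std.algorithm.sorting : sort;
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if the given bytes form well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    // Same test string as the report: three 3-byte UTF-8 sequences plus '\n'.
    string a = "\u3026\u2021\u3061\n";
    const(ubyte)[] original = cast(const(ubyte)[]) a;

    // Byte-by-byte reverse: each multi-byte sequence ends up backwards.
    ubyte[] reversed = original.dup;
    reverse(reversed);

    // Byte-by-byte sort: continuation bytes (0x80-0xBF) move ahead of
    // their lead bytes (0xE2, 0xE3).
    ubyte[] sorted = original.dup;
    sort(sorted);

    writeln("original valid: ", isValidUtf8(original)); // true
    writeln("reversed valid: ", isValidUtf8(reversed)); // false
    writeln("sorted valid:   ", isValidUtf8(sorted));   // false
}
```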
October 03, 2006 Re: [Issue 391] New: .sort and .reverse break utf8 encoding
Posted in reply to d-bugmail

d-bugmail@puremagic.com wrote:
<snip>
> import std.utf;
> import std.stdio;
> void main()
> {
> char[] a;
> a = "\u3026\u2021\u3061\n";
> writefln("plain"); validate(a);
> writefln("sorted"); validate(a.sort); // fails
> writefln("reversed"); validate(a.reverse); // fails
> }

AIUI sort and reverse are defined to sort/reverse the individual elements of the array, rather than the Unicode characters that make up a string. But hmm....

Stewart.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:-@ C++@ a->--- UB@ P+ L E@ W++@ N+++ o K-@ w++@ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
October 03, 2006 Re: [Issue 391] New: .sort and .reverse break utf8 encoding
Posted in reply to Stewart Gordon

On Tue, 03 Oct 2006 21:43:46 +0100, Stewart Gordon wrote:

> d-bugmail@puremagic.com wrote:
> <snip>
>> import std.utf;
>> import std.stdio;
>> void main()
>> {
>> char[] a;
>> a = "\u3026\u2021\u3061\n";
>> writefln("plain"); validate(a);
>> writefln("sorted"); validate(a.sort); // fails
>> writefln("reversed"); validate(a.reverse); // fails
>> }
>
> AIUI sort and reverse are defined to sort/reverse the individual elements of the array, rather than the Unicode characters that make up a string. But hmm....

Yes, I realize that but it makes Walter's statements that char[] is all we need and we do not need a 'string' a bit weaker.

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
October 04, 2006 Re: [Issue 391] New: .sort and .reverse break utf8 encoding
Posted in reply to Derek Parnell

Derek Parnell wrote:
> On Tue, 03 Oct 2006 21:43:46 +0100, Stewart Gordon wrote:
>
>> d-bugmail@puremagic.com wrote:
>>> writefln("sorted"); validate(a.sort); // fails
>>> writefln("reversed"); validate(a.reverse); // fails
>> AIUI sort and reverse are defined to sort/reverse the individual elements of the array, rather than the Unicode characters that make up a string. But hmm....
>
> Yes, I realize that but it makes Walter's statements that char[] is all we
> need and we do not need a 'string' a bit weaker.
.sort and .reverse should reverse the unicode characters. If you want to reverse/sort the individual bytes, you should cast it to a ubyte[] first.
Both behaviors will be fixed in the next update.
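A minimal sketch of the semantics described above, written against current D2 and Phobos rather than the D1 compiler in this thread, and not necessarily how the compiler or runtime implemented the fix: decode the string to dchar[] (one element per code point), sort or reverse that, then re-encode; for byte-level behaviour, copy and cast to ubyte[] first. The helpers reverseChars and sortChars are hypothetical names. Code points are not always user-perceived characters, so a base letter followed by a combining mark would still be separated by the reverse.

```d
import std.algorithm.mutation : reverse;
import std.algorithm.sorting : sort;
import std.array : array;
import std.conv : to;
import std.stdio : writeln;
import std.utf : validate;

/// Hypothetical helper: reverse a UTF-8 string by code point.
/// array() on a string decodes it into dchar[], one element per code point,
/// so reversing that array never splits a multi-byte sequence.
string reverseChars(string s)
{
    dchar[] points = s.array;
    reverse(points);
    return points.to!string; // re-encode as UTF-8
}

/// Hypothetical helper: sort a UTF-8 string by code point value.
string sortChars(string s)
{
    dchar[] points = s.array;
    sort(points);
    return points.to!string;
}

void main()
{
    string a = "\u3026\u2021\u3061\n";

    string r = reverseChars(a);
    string s = sortChars(a);
    validate(r); // would throw UTFException if r were malformed
    validate(s);

    // Byte-level behaviour by explicit opt-in: copy, cast to ubyte[],
    // then reverse (the bytes are no longer valid UTF-8).
    ubyte[] bytes = cast(ubyte[]) a.dup;
    reverse(bytes);

    writeln(r == "\n\u3061\u2021\u3026"); // true
    writeln(s == "\n\u2021\u3026\u3061"); // true
    writeln(bytes.length);                // 10
}
```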
October 04, 2006 Re: [Issue 391] New: .sort and .reverse break utf8 encoding
Posted in reply to d-bugmail

d-bugmail@puremagic.com wrote on 2006-10-02:
> http://d.puremagic.com/issues/show_bug.cgi?id=391

> import std.utf;
> import std.stdio;
> void main()
> {
> char[] a;
> a = "\u3026\u2021\u3061\n";
> writefln("plain"); validate(a);
> writefln("sorted"); validate(a.sort); // fails
> writefln("reversed"); validate(a.reverse); // fails
> }

Added to DStress as
http://dstress.kuehne.cn/run/r/reverse_08_A.d
http://dstress.kuehne.cn/run/r/reverse_08_B.d
http://dstress.kuehne.cn/run/r/reverse_08_C.d
http://dstress.kuehne.cn/run/s/sort_16_A.d
http://dstress.kuehne.cn/run/s/sort_16_B.d
http://dstress.kuehne.cn/run/s/sort_16_C.d

Thomas
October 04, 2006 .sort and .reverse break utf8 encoding
Posted in reply to Walter Bright

Walter Bright wrote:
> Derek Parnell wrote:
>> On Tue, 03 Oct 2006 21:43:46 +0100, Stewart Gordon wrote:
>>
>>> d-bugmail@puremagic.com wrote:
>>>> writefln("sorted"); validate(a.sort); // fails
>>>> writefln("reversed"); validate(a.reverse); // fails
>>> AIUI sort and reverse are defined to sort/reverse the individual elements of the array, rather than the Unicode characters that make up a string. But hmm....
>>
>> Yes, I realize that but it makes Walter's statements that char[] is all we
>> need and we do not need a 'string' a bit weaker.
>
> .sort and .reverse should reverse the unicode characters. If you want to reverse/sort the individual bytes, you should cast it to a ubyte[] first.
Changing the behavior of .reverse kind of makes sense, but I don't understand the reason for changing .sort aside from consistency. Personally, I've never had a reason to sort a char array in the first place unless the chars were intended to represent something other than their lexical meaning. And that aside, sorting chars in a string without a comparison predicate will do so using the char's binary value, which has no lexical significance beyond the 26 letters of the English alphabet (as represented in ASCII).

I'm starting to feel like people are harping on Unicode issues just for the sake of doing so rather than because these are actual problems. Can someone please explain what I'm missing?
Sean
October 05, 2006 Re: .sort and .reverse break utf8 encoding
Posted in reply to Sean Kelly

Sean Kelly wrote:
> Changing the behavior of .reverse kind of makes sense, but I don't understand the reason for changing .sort aside from consistency. Personally, I've never had a reason to sort a char array in the first place unless the chars were intended to represent something other than their lexical meaning. And that aside, sorting chars in a string without a comparison predicate will do so using the char's binary value, which has no lexical significance beyond the 26 letters of the English alphabet (as represented in ASCII). I'm starting to feel like people are harping on Unicode issues just for the sake of doing so rather than because these are actual problems. Can someone please explain what I'm missing?
Collecting character usage frequency statistics is one use for it. Read a text file into a buffer, sort the buffer, and dump the result!
I don't mind the harping on it. Getting the details right is important, even if the details themselves aren't. Besides, it's an easy fix.
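The frequency-statistics use could look roughly like this in current D2 and Phobos. This is a sketch that assumes the input is valid UTF-8 (readText throws otherwise): decode into code points, sort so that equal characters become adjacent, then count each run with std.algorithm's group. The helper charFrequencies and the fallback sample text are hypothetical.

```d
import std.algorithm.iteration : group;
import std.algorithm.sorting : sort;
import std.array : array;
import std.file : readText;
import std.stdio : writefln;

/// Hypothetical helper: count how often each code point occurs in text.
/// Sorting makes equal code points adjacent, so a single pass of group()
/// yields (code point, count) pairs in ascending code-point order.
auto charFrequencies(string text)
{
    dchar[] points = text.array; // decode UTF-8 into one dchar per code point
    sort(points);
    return points.group.array;   // array of (code point, count) tuples
}

void main(string[] args)
{
    // Read the file named on the command line, or fall back to a sample string.
    string text = args.length > 1 ? readText(args[1]) : "abracadabra \u3026\u3026";

    foreach (entry; charFrequencies(text))
        writefln("U+%04X  %s", cast(uint) entry[0], entry[1]);
}
```

Sorting the decoded code points, rather than the raw char buffer, is what keeps each multi-byte character counted as one unit.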
October 05, 2006 Re: .sort and .reverse break utf8 encoding
Posted in reply to Sean Kelly

Sean Kelly wrote:
> Walter Bright wrote:
>> Derek Parnell wrote:
>>> On Tue, 03 Oct 2006 21:43:46 +0100, Stewart Gordon wrote:
>>>
>>>> d-bugmail@puremagic.com wrote:
>>>>> writefln("sorted"); validate(a.sort); // fails
>>>>> writefln("reversed"); validate(a.reverse); // fails
>>>> AIUI sort and reverse are defined to sort/reverse the individual elements of the array, rather than the Unicode characters that make up a string. But hmm....
>>>
>>> Yes, I realize that but it makes Walter's statements that char[] is all we
>>> need and we do not need a 'string' a bit weaker.
>>
>> .sort and .reverse should reverse the unicode characters. If you want to reverse/sort the individual bytes, you should cast it to a ubyte[] first.
>
> Changing the behavior of .reverse kind of makes sense, but I don't understand the reason for changing .sort aside from consistency. Personally, I've never had a reason to sort a char array in the first place unless the chars were intended to represent something other than their lexical meaning. And that aside, sorting chars in a string without a comparison predicate will do so using the char's binary value, which has no lexical significance beyond the 26 letters of the English alphabet (as represented in ASCII).
What if you want to use a quick binary search look-up to see if a text contains a given character? ;)
Not that I've ever needed it, but it makes sense to just fix it.
How often do you .reverse a string, for that matter?
L.
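The binary-search idea, sketched in current D2 and Phobos under the assumption that the text is decoded to code points first: sort once, drop duplicates, and then each membership test is an O(log n) search through std.range.assumeSorted and SortedRange.contains. The helpers characterTable and containsChar are hypothetical names.

```d
import std.algorithm.iteration : uniq;
import std.algorithm.sorting : sort;
import std.array : array;
import std.range : assumeSorted;
import std.stdio : writeln;

/// Hypothetical helper: sorted table of the distinct code points in text.
dchar[] characterTable(string text)
{
    dchar[] points = text.array; // one dchar per code point
    sort(points);
    return points.uniq.array;    // drop duplicates; order stays sorted
}

/// Hypothetical helper: O(log n) binary-search lookup in a characterTable result.
bool containsChar(dchar[] table, dchar c)
{
    return table.assumeSorted.contains(c);
}

void main()
{
    dchar[] table = characterTable("\u3026\u2021\u3061\n hello");
    writeln(containsChar(table, '\u2021')); // true
    writeln(containsChar(table, 'z'));      // false
}
```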
October 10, 2006 [Issue 391] .sort and .reverse break utf8 encoding
Posted in reply to d-bugmail

http://d.puremagic.com/issues/show_bug.cgi?id=391

bugzilla@digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from bugzilla@digitalmars.com 2006-10-10 03:29 -------
Fixed DMD 0.169

--
December 23, 2006 [Issue 391] .sort and .reverse break utf8 encoding
Posted in reply to d-bugmail

http://d.puremagic.com/issues/show_bug.cgi?id=391

thomas-dloop@kuehne.cn changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

------- Comment #3 from thomas-dloop@kuehne.cn 2006-12-23 07:10 -------
Process terminating with default action of signal 11 (SIGSEGV)
 Bad permissions for mapped region at address 0x805A0EC
   at 0x80544A3: _D3std8typeinfo8ti_dchar10TypeInfo_w4swapMFPvPvZv (in run/s/sort_16_A.d.exe)
   by 0x8050ACD: _adSort (in run/s/sort_16_A.d.exe)
   by 0x804A0F4: _Dmain (in run/s/sort_16_A.d:17)
   by 0x804BBE6: main (in run/s/sort_16_A.d.exe)

--
Copyright © 1999-2021 by the D Language Foundation