May 14, 2014 Re: Why std.algorithm.sort can't be applied to char[]? | ||||
---|---|---|---|---|
| ||||
Posted in reply to monarch_dodra | On Wed, 14 May 2014 08:27:45 +0000 monarch_dodra via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote: > On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via Digitalmars-d-learn wrote: > > Sure, you can cast char[] to ubyte[] and sort that if you know > > that the array > > only holds pure ASCII. In fact, you can use > > std.string.representation to do it > > - e.g. > > > > auto ascii = str.representation; > > > > and if str were mutable, then you could sort it. But that will > > only work if > > the string only contains ASCII characters. Regardless, he > > wanted to know why > > he couldn't sort char[], and I explained why - all strings are > > treated as > > ranges of dchar, making it so that if their element type is > > char or wchar, so > > they're not random access and thus can't be sorted. > > Arguably, a smart enough implementation should know how to sort a "char[]", while still preserving codepoint integrity. I don't think that that can be done at the same algorithmic complexity though. So, I don't know if that would be acceptable or not from the standpoint of std.algorithm.sort. But even if it's a good idea, someone would have to special case sort for char[], and no one has done that. > As a matter of fact, the built in "sort" property does it. > > void main() > { > char[] s = "éöeèûà".dup; > s.sort; > writeln(s); > } > //prints: > eàèéöû I'm surprised. I thought that one of Bearophile's favorite complaints was that it didn't sort unicode properly (and hence one of the reasons that it should be removed). Regardless, I do think that it should be removed. - Jonathan M Davis |
May 15, 2014 Re: Why std.algorithm.sort can't be applied to char[]? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote: > On Wed, 14 May 2014 08:27:45 +0000 > monarch_dodra via Digitalmars-d-learn > <digitalmars-d-learn@puremagic.com> wrote: > >> On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via >> Digitalmars-d-learn wrote: >> > Sure, you can cast char[] to ubyte[] and sort that if you know >> > that the array >> > only holds pure ASCII. In fact, you can use >> > std.string.representation to do it >> > - e.g. >> > >> > auto ascii = str.representation; >> > >> > and if str were mutable, then you could sort it. But that will >> > only work if >> > the string only contains ASCII characters. Regardless, he >> > wanted to know why >> > he couldn't sort char[], and I explained why - all strings are >> > treated as >> > ranges of dchar, making it so that if their element type is >> > char or wchar, so >> > they're not random access and thus can't be sorted. >> >> Arguably, a smart enough implementation should know how to sort a >> "char[]", while still preserving codepoint integrity. > > I don't think that that can be done at the same algorithmic complexity though. > So, I don't know if that would be acceptable or not from the standpoint of > std.algorithm.sort. But even if it's a good idea, someone would have to > special case sort for char[], and no one has done that. I think it can be done at O(nlgn) complexity, but you must allocate a block of scratch space to do the sorting. You can't do it in-place because swapping isn't available. Might as well convert to dchar and back explicitly. Merge sort may allow more promising speedups, but I still think you will need to allocate extra space. > >> As a matter of fact, the built in "sort" property does it. >> >> void main() >> { >> char[] s = "éöeèûà".dup; >> s.sort; >> writeln(s); >> } >> //prints: >> eàèéöû > > I'm surprised. I thought that one of Bearophile's favorite complaints was that > it didn't sort unicode properly (and hence one of the reasons that it should > be removed). Regardless, I do think that it should be removed. I can't believe this worked. I want to say that it's a freak accident for that set of characters. Looking in druntime, I don't see where the special case is. -Steve |
May 15, 2014 Re: Why std.algorithm.sort can't be applied to char[]? | ||||
---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On Wednesday, 14 May 2014 at 09:01:23 UTC, John Colvin wrote: > Why would anyone ever want to sort code-points? Why not? To remove duplicate characters? > They might want to sort graphemes, but that's difficult to do in-place (needs O(n) memory, I think...). If out-of-place is good enough > > someStr.byGrapheme.array.sort(); The current "status quo" in D is that a dchar basically represents a character. |
May 15, 2014 Re: Why std.algorithm.sort can't be applied to char[]? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Thursday, 15 May 2014 at 13:26:45 UTC, Steven Schveighoffer wrote: > On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote: > >> On Wed, 14 May 2014 08:27:45 +0000 >> monarch_dodra via Digitalmars-d-learn >> <digitalmars-d-learn@puremagic.com> wrote: >>> As a matter of fact, the built in "sort" property does it. >>> >>> void main() >>> { >>> char[] s = "éöeèûà".dup; >>> s.sort; >>> writeln(s); >>> } >>> //prints: >>> eàèéöû >> >> I'm surprised. I thought that one of Bearophile's favorite complaints was that >> it didn't sort unicode properly (and hence one of the reasons that it should >> be removed). Regardless, I do think that it should be removed. > > I can't believe this worked. I want to say that it's a freak accident for that set of characters. Looking in druntime, I don't see where the special case is. > > -Steve Must be a hell of a freak accident ;) auto s = "é東öe京ûタèワà".dup; writeln(s.sort); => eàèéöûタワ京東 It's in rt/adi.d extern (C) char[] _adSortChar(char[] a) It's basically: string=>dstring=>sort=>dstring=>string. BTW, the built in reverse also works with char[], and so does std.algorithm.reverse (and it does it pretty cleverly too, might I say). As far as I'm concerned, if we *can* do it in n.log(n), and somebody provides the implementation, then there is no reason to not offer dchar sorting for char[]/wchar. |
May 15, 2014 Re: Why std.algorithm.sort can't be applied to char[]? | ||||
---|---|---|---|---|
| ||||
Posted in reply to monarch_dodra | On Thu, 15 May 2014 11:37:07 -0400, monarch_dodra <monarchdodra@gmail.com> wrote: > On Thursday, 15 May 2014 at 13:26:45 UTC, Steven Schveighoffer > wrote: >> On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote: >> >>> On Wed, 14 May 2014 08:27:45 +0000 >>> monarch_dodra via Digitalmars-d-learn >>> <digitalmars-d-learn@puremagic.com> wrote: >>>> As a matter of fact, the built in "sort" property does it. >>>> >>>> void main() >>>> { >>>> char[] s = "éöeèûà".dup; >>>> s.sort; >>>> writeln(s); >>>> } >>>> //prints: >>>> eàèéöû >>> >>> I'm surprised. I thought that one of Bearophile's favorite complaints was that >>> it didn't sort unicode properly (and hence one of the reasons that it should >>> be removed). Regardless, I do think that it should be removed. >> >> I can't believe this worked. I want to say that it's a freak accident for that set of characters. Looking in druntime, I don't see where the special case is. >> >> -Steve > > Must be a hell of a freak accident ;) > > auto s = "é東öe京ûタèワà".dup; > writeln(s.sort); > > => eàèéöûタワ京東 > > It's in rt/adi.d Seriously, I fucking hate the file names in druntime. I looked in qsort.d. > extern (C) char[] _adSortChar(char[] a) > > It's basically: string=>dstring=>sort=>dstring=>string. OK, that's what I would have expected. I don't think we should support this. > BTW, the built in reverse also works with char[], and so does std.algorithm.reverse (and it does it pretty cleverly too, might I say). This is a different algorithm. Sort requires random-access swapping. Reverse does not. > As far as I'm concerned, if we *can* do it in n.log(n), and somebody provides the implementation, then there is no reason to not offer dchar sorting for char[]/wchar. I think there is nothing wrong with requiring the steps to be explicit. We should not hide such bloat behind sort. -Steve |
May 15, 2014 Re: Why std.algorithm.sort can't be applied to char[]? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Thursday, 15 May 2014 at 17:46:52 UTC, Steven Schveighoffer wrote:
>> As far as I'm concerned, if we *can* do it in n.log(n), and somebody provides the implementation, then there is no reason to not offer dchar sorting for char[]/wchar.
>
> I think there is nothing wrong with requiring the steps to be explicit. We should not hide such bloat behind sort.
>
> -Steve
Of course, but I meant if we could find an algoirthm O(1) (or O(log(N)) space overhead.
If the only algorithm we are capable of providing just temporarily dstrings, then no.
|
Copyright © 1999-2021 by the D Language Foundation