May 14, 2014
On Wed, 14 May 2014 08:27:45 +0000
monarch_dodra via Digitalmars-d-learn
<digitalmars-d-learn@puremagic.com> wrote:

> On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:
> > Sure, you can cast char[] to ubyte[] and sort that if you know
> > that the array
> > only holds pure ASCII. In fact, you can use
> > std.string.representation to do it
> > - e.g.
> >
> > auto ascii = str.representation;
> >
> > and if str were mutable, then you could sort it. But that will
> > only work if
> > the string only contains ASCII characters. Regardless, he
> > wanted to know why
> > he couldn't sort char[], and I explained why - all strings are
> > treated as
> > ranges of dchar, making it so that if their element type is
> > char or wchar, so
> > they're not random access and thus can't be sorted.
>
> Arguably, a smart enough implementation should know how to sort a "char[]", while still preserving codepoint integrity.

I don't think that that can be done at the same algorithmic complexity though. So, I don't know if that would be acceptable or not from the standpoint of std.algorithm.sort. But even if it's a good idea, someone would have to special case sort for char[], and no one has done that.

> As a matter of fact, the built in "sort" property does it.
>
> void main()
> {
>      char[] s = "éöeèûà".dup;
>      s.sort;
>      writeln(s);
> }
> //prints:
> eàèéöû

I'm surprised. I thought that one of Bearophile's favorite complaints was that it didn't sort unicode properly (and hence one of the reasons that it should be removed). Regardless, I do think that it should be removed.

- Jonathan M Davis

May 15, 2014
On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:

> On Wed, 14 May 2014 08:27:45 +0000
> monarch_dodra via Digitalmars-d-learn
> <digitalmars-d-learn@puremagic.com> wrote:
>
>> On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via
>> Digitalmars-d-learn wrote:
>> > Sure, you can cast char[] to ubyte[] and sort that if you know
>> > that the array
>> > only holds pure ASCII. In fact, you can use
>> > std.string.representation to do it
>> > - e.g.
>> >
>> > auto ascii = str.representation;
>> >
>> > and if str were mutable, then you could sort it. But that will
>> > only work if
>> > the string only contains ASCII characters. Regardless, he
>> > wanted to know why
>> > he couldn't sort char[], and I explained why - all strings are
>> > treated as
>> > ranges of dchar, making it so that if their element type is
>> > char or wchar, so
>> > they're not random access and thus can't be sorted.
>>
>> Arguably, a smart enough implementation should know how to sort a
>> "char[]", while still preserving codepoint integrity.
>
> I don't think that that can be done at the same algorithmic complexity though.
> So, I don't know if that would be acceptable or not from the standpoint of
> std.algorithm.sort. But even if it's a good idea, someone would have to
> special case sort for char[], and no one has done that.

I think it can be done at O(nlgn) complexity, but you must allocate a block of scratch space to do the sorting. You can't do it in-place because swapping isn't available.

Might as well convert to dchar and back explicitly.

Merge sort may allow more promising speedups, but I still think you will need to allocate extra space.

>
>> As a matter of fact, the built in "sort" property does it.
>>
>> void main()
>> {
>>      char[] s = "éöeèûà".dup;
>>      s.sort;
>>      writeln(s);
>> }
>> //prints:
>> eàèéöû
>
> I'm surprised. I thought that one of Bearophile's favorite complaints was that
> it didn't sort unicode properly (and hence one of the reasons that it should
> be removed). Regardless, I do think that it should be removed.

I can't believe this worked. I want to say that it's a freak accident for that set of characters. Looking in druntime, I don't see where the special case is.

-Steve
May 15, 2014
On Wednesday, 14 May 2014 at 09:01:23 UTC, John Colvin wrote:
> Why would anyone ever want to sort code-points?

Why not? To remove duplicate characters?

> They might want to sort graphemes, but that's difficult to do in-place (needs O(n) memory, I think...). If out-of-place is good enough
>
> someStr.byGrapheme.array.sort();

The current "status quo" in D is that a dchar basically
represents a character.
May 15, 2014
On Thursday, 15 May 2014 at 13:26:45 UTC, Steven Schveighoffer
wrote:
> On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
>
>> On Wed, 14 May 2014 08:27:45 +0000
>> monarch_dodra via Digitalmars-d-learn
>> <digitalmars-d-learn@puremagic.com> wrote:
>>> As a matter of fact, the built in "sort" property does it.
>>>
>>> void main()
>>> {
>>>     char[] s = "éöeèûà".dup;
>>>     s.sort;
>>>     writeln(s);
>>> }
>>> //prints:
>>> eàèéöû
>>
>> I'm surprised. I thought that one of Bearophile's favorite complaints was that
>> it didn't sort unicode properly (and hence one of the reasons that it should
>> be removed). Regardless, I do think that it should be removed.
>
> I can't believe this worked. I want to say that it's a freak accident for that set of characters. Looking in druntime, I don't see where the special case is.
>
> -Steve

Must be a hell of a freak accident ;)

     auto s = "é東öe京ûタèワà".dup;
     writeln(s.sort);

=> eàèéöûタワ京東

It's in rt/adi.d

extern (C) char[] _adSortChar(char[] a)

It's basically: string=>dstring=>sort=>dstring=>string.



BTW, the built in reverse also works with char[], and so does std.algorithm.reverse (and it does it pretty cleverly too, might I say).

As far as I'm concerned, if we *can* do it in n.log(n), and somebody provides the implementation, then there is no reason to not offer dchar sorting for char[]/wchar.
May 15, 2014
On Thu, 15 May 2014 11:37:07 -0400, monarch_dodra <monarchdodra@gmail.com> wrote:

> On Thursday, 15 May 2014 at 13:26:45 UTC, Steven Schveighoffer
> wrote:
>> On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
>>
>>> On Wed, 14 May 2014 08:27:45 +0000
>>> monarch_dodra via Digitalmars-d-learn
>>> <digitalmars-d-learn@puremagic.com> wrote:
>>>> As a matter of fact, the built in "sort" property does it.
>>>>
>>>> void main()
>>>> {
>>>>     char[] s = "éöeèûà".dup;
>>>>     s.sort;
>>>>     writeln(s);
>>>> }
>>>> //prints:
>>>> eàèéöû
>>>
>>> I'm surprised. I thought that one of Bearophile's favorite complaints was that
>>> it didn't sort unicode properly (and hence one of the reasons that it should
>>> be removed). Regardless, I do think that it should be removed.
>>
>> I can't believe this worked. I want to say that it's a freak accident for that set of characters. Looking in druntime, I don't see where the special case is.
>>
>> -Steve
>
> Must be a hell of a freak accident ;)
>
>       auto s = "é東öe京ûタèワà".dup;
>       writeln(s.sort);
>
> => eàèéöûタワ京東
>
> It's in rt/adi.d

Seriously, I fucking hate the file names in druntime.

I looked in qsort.d.

> extern (C) char[] _adSortChar(char[] a)
>
> It's basically: string=>dstring=>sort=>dstring=>string.

OK, that's what I would have expected. I don't think we should support this.

> BTW, the built in reverse also works with char[], and so does std.algorithm.reverse (and it does it pretty cleverly too, might I say).

This is a different algorithm. Sort requires random-access swapping. Reverse does not.

> As far as I'm concerned, if we *can* do it in n.log(n), and somebody provides the implementation, then there is no reason to not offer dchar sorting for char[]/wchar.

I think there is nothing wrong with requiring the steps to be explicit. We should not hide such bloat behind sort.

-Steve
May 15, 2014
On Thursday, 15 May 2014 at 17:46:52 UTC, Steven Schveighoffer wrote:
>> As far as I'm concerned, if we *can* do it in n.log(n), and somebody provides the implementation, then there is no reason to not offer dchar sorting for char[]/wchar.
>
> I think there is nothing wrong with requiring the steps to be explicit. We should not hide such bloat behind sort.
>
> -Steve

Of course, but I meant if we could find an algoirthm O(1) (or O(log(N)) space overhead.

If the only algorithm we are capable of providing just temporarily dstrings, then no.
1 2
Next ›   Last »