Thread overview
Chars sorting and copies
Oct 21, 2011
bearophile
Oct 21, 2011
Jonathan M Davis
Oct 21, 2011
bearophile
Oct 23, 2011
bearophile
Oct 21, 2011
Kagamin
October 21, 2011
I have many strings and I want to use as associative array kay a sorted concat of two strings (it's a signature of the two strings):


import std.algorithm;
void main() {
    string a = "red";
    string b = "green";
    int[string] aa;
    //aa[(a ~ b).sort] = 1;
    //aa[(a ~ b).sort.idup] = 1;
    aa[(a.dup ~ b.dup).sort.idup] = 1;
}

I think the first way used to work, the second way was recently forbidden (and you can't use char[] as associative array key). The third way now works, but array.sort will go away.

So do you know what's a copy-minimizing (and possibly short) way to perform this, in future?

Bye and thank you,
bearophile
October 21, 2011
On Thursday, October 20, 2011 21:49:27 bearophile wrote:
> I have many strings and I want to use as associative array kay a sorted concat of two strings (it's a signature of the two strings):
> 
> 
> import std.algorithm;
> void main() {
>     string a = "red";
>     string b = "green";
>     int[string] aa;
>     //aa[(a ~ b).sort] = 1;
>     //aa[(a ~ b).sort.idup] = 1;
>     aa[(a.dup ~ b.dup).sort.idup] = 1;
> }
> 
> I think the first way used to work, the second way was recently forbidden (and you can't use char[] as associative array key). The third way now works, but array.sort will go away.
> 
> So do you know what's a copy-minimizing (and possibly short) way to perform
> this, in future?

The elements in a string are immutable. I don't see how the first twe sorts _ever_ worked, unless it was a bug. The reality of the matter is that if you want to sort an array, its elements must be mutable, but if you want to use it as a key in an AA, its elements must be immutable. So, you're either going to have to copy them or cast to immutable.

Also, you're going to have to use std.algorithm.sort eventually, and isn't going to work on chars or wchars (and what you're doing now is buggy unless you're restricting yourself to ASCII). So, you're either going to have to do one of

1. Use dchar[] to sort and then convert it to dstring for the key (either by casting or copying).

2. Use dchar[] to sort and then convert it to a string for the key (which results in a copy).

3. If you're using ASCII only, you could use char[], cast it to ubyte[] to sort it, and then cast it to string to use as the key (or idup it).

But sorting and and immutability are obviously at odds with one another, requiring either a copy or a cast, and sorting really isn't going to work with char[] or wchar[], unless you restrict yourself to values that fit in a UTF-8 or UTF-16 code unit.

- Jonathan M Davis
October 21, 2011
Jonathan M Davis:

> The elements in a string are immutable. I don't see how the first twe sorts _ever_ worked, unless it was a bug.

Sorry, I meant that it used to work in D1: http://codepad.org/HYdNktNd

Later I'll try some of your solutions.

Bye,
bearophile
October 21, 2011
On Thu, 20 Oct 2011 21:49:27 -0400, bearophile <bearophileHUGS@lycos.com> wrote:

> I have many strings and I want to use as associative array kay a sorted concat of two strings (it's a signature of the two strings):
>
>
> import std.algorithm;
> void main() {
>     string a = "red";
>     string b = "green";
>     int[string] aa;
>     //aa[(a ~ b).sort] = 1;
>     //aa[(a ~ b).sort.idup] = 1;
>     aa[(a.dup ~ b.dup).sort.idup] = 1;
> }
>
> I think the first way used to work, the second way was recently forbidden (and you can't use char[] as associative array key). The third way now works, but array.sort will go away.
>
> So do you know what's a copy-minimizing (and possibly short) way to perform this, in future?

a ~ b should technically be assignable to char[], since it's alread new memory.  We may yet get there with pure functions being able to implicit cast to immutable.

You could do this for now, but it's ugly:

char[] concatStr(string[] s...) pure
{
   auto len = 0;
   foreach(str; s) len += str.length;
   auto result = new char[len];
   auto slice = result;
   foreach(str; s)
   {
      slice[0..str.length] = str;
      slice = slice[str.length..$];
   }
   return result;
}

...

auto str = concatStr(a, b);
str.sort; // this is going to stop working, you'll have to cast to byte[]
aa[assumeUnique(str)] = 1;

There are also alternatives to AA (such as dcollections' HashMap) which are willing to take non-immutable keys.

-Steve
October 21, 2011
bearophile Wrote:

> I have many strings and I want to use as associative array kay a sorted concat of two strings (it's a signature of the two strings):
> 
> 
> import std.algorithm;
> void main() {
>     string a = "red";
>     string b = "green";
>     int[string] aa;
>     //aa[(a ~ b).sort] = 1;
>     //aa[(a ~ b).sort.idup] = 1;
>     aa[(a.dup ~ b.dup).sort.idup] = 1;
> }
> 
> I think the first way used to work, the second way was recently forbidden (and you can't use char[] as associative array key). The third way now works, but array.sort will go away.
> 
> So do you know what's a copy-minimizing (and possibly short) way to perform this, in future?
> 
> Bye and thank you,
> bearophile

http://www.d-programming-language.org/phobos/std_algorithm.html#setUnion ?
October 23, 2011
Thanks for all the answers.

Steven Schveighoffer:

> a ~ b should technically be assignable to char[], since it's alread new memory.  We may yet get there with pure functions being able to implicit cast to immutable.

Isn't that kind of the opposite?
Is this already in Bugzilla?

Some versions I have written, there is no very good solution, it seems:


import std.algorithm, std.exception;

void main() {
    string a = "red";
    string b = "green";
    int[string] aa;

    // Works in D1.
    // aa[(a ~ b).sort] = 1;

    // Works in D2, but will stop working.
    aa[(a.dup ~ b.dup).sort.idup] = 1;

    // Works in D2, but will stop working.
    char[] ab = (a.dup ~ b.dup).sort;
    aa[assumeUnique(ab)] = 2;

    // If keys don't actually need to be strings
    // the code gets a bit simpler.
    int[immutable(ubyte[])] aa2;

    // not good
    // aa2[assumeUnique(sort(cast(ubyte[])(a ~ b)))] = 3;

    // not good?
    // aa2[assumeUnique(sort(cast(ubyte[])(a ~ b)).release())] = 3;

    // just 1 copy, two visible casts, 3 lines
    auto ab4 = cast(ubyte[])(a ~ b);
    ab4.sort();
    aa2[cast(immutable(ubyte[]))ab4] = 3;

    // a visible cast and an hidden cast
    ubyte[] ab2 = cast(ubyte[])(a ~ b);
    ab2.sort();
    aa2[assumeUnique(ab2)] = 3;

    // 2 lines, more complex looking
    auto ab3 = sort(cast(ubyte[])(a ~ b)).release();
    aa2[assumeUnique(ab3)] = 3;
}


Bye,
bearophile
October 24, 2011
On Sat, 22 Oct 2011 23:01:17 -0400, bearophile <bearophileHUGS@lycos.com> wrote:

> Thanks for all the answers.
>
> Steven Schveighoffer:
>
>> a ~ b should technically be assignable to char[], since it's alread new
>> memory.  We may yet get there with pure functions being able to implicit
>> cast to immutable.
>
> Isn't that kind of the opposite?

No, if a ~ b would call a pure function that returns mutable when possible, then you could use it for mutable, immutable, or const.

> Is this already in Bugzilla?

Somewhat, although I like using the pure function avenue to ensure immutable does not violate const than the crazy rules we came up with here:
http://d.puremagic.com/issues/show_bug.cgi?id=1654

Note this was in the infancy of my D knowledge ;)

> Some versions I have written, there is no very good solution, it seems:
>
>
> import std.algorithm, std.exception;
>
> void main() {
>     string a = "red";
>     string b = "green";
>     int[string] aa;
>
>     // Works in D1.
>     // aa[(a ~ b).sort] = 1;
>
>     // Works in D2, but will stop working.
>     aa[(a.dup ~ b.dup).sort.idup] = 1;
>
>     // Works in D2, but will stop working.
>     char[] ab = (a.dup ~ b.dup).sort;
>     aa[assumeUnique(ab)] = 2;
>
>     // If keys don't actually need to be strings
>     // the code gets a bit simpler.
>     int[immutable(ubyte[])] aa2;
>
>     // not good
>     // aa2[assumeUnique(sort(cast(ubyte[])(a ~ b)))] = 3;
>
>     // not good?
>     // aa2[assumeUnique(sort(cast(ubyte[])(a ~ b)).release())] = 3;
>
>     // just 1 copy, two visible casts, 3 lines
>     auto ab4 = cast(ubyte[])(a ~ b);
>     ab4.sort();
>     aa2[cast(immutable(ubyte[]))ab4] = 3;
>
>     // a visible cast and an hidden cast
>     ubyte[] ab2 = cast(ubyte[])(a ~ b);
>     ab2.sort();
>     aa2[assumeUnique(ab2)] = 3;
>
>     // 2 lines, more complex looking
>     auto ab3 = sort(cast(ubyte[])(a ~ b)).release();
>     aa2[assumeUnique(ab3)] = 3;
> }

I honestly am not sure your specific problem is solvable without a lot of special cases.  It may be you have to write a sub-function that hides the mess.

-Steve