View mode: basic / threaded / horizontal-split · Log in · Help
October 21, 2011
Chars sorting and copies
I have many strings and I want to use as associative array kay a sorted concat of two strings (it's a signature of the two strings):


import std.algorithm;
void main() {
   string a = "red";
   string b = "green";
   int[string] aa;
   //aa[(a ~ b).sort] = 1;
   //aa[(a ~ b).sort.idup] = 1;
   aa[(a.dup ~ b.dup).sort.idup] = 1;
}

I think the first way used to work, the second way was recently forbidden (and you can't use char[] as associative array key). The third way now works, but array.sort will go away.

So do you know what's a copy-minimizing (and possibly short) way to perform this, in future?

Bye and thank you,
bearophile
October 21, 2011
Re: Chars sorting and copies
On Thursday, October 20, 2011 21:49:27 bearophile wrote:
> I have many strings and I want to use as associative array kay a sorted
> concat of two strings (it's a signature of the two strings):
> 
> 
> import std.algorithm;
> void main() {
>     string a = "red";
>     string b = "green";
>     int[string] aa;
>     //aa[(a ~ b).sort] = 1;
>     //aa[(a ~ b).sort.idup] = 1;
>     aa[(a.dup ~ b.dup).sort.idup] = 1;
> }
> 
> I think the first way used to work, the second way was recently forbidden
> (and you can't use char[] as associative array key). The third way now
> works, but array.sort will go away.
> 
> So do you know what's a copy-minimizing (and possibly short) way to perform
> this, in future?

The elements in a string are immutable. I don't see how the first twe sorts 
_ever_ worked, unless it was a bug. The reality of the matter is that if you 
want to sort an array, its elements must be mutable, but if you want to use it 
as a key in an AA, its elements must be immutable. So, you're either going to 
have to copy them or cast to immutable.

Also, you're going to have to use std.algorithm.sort eventually, and isn't 
going to work on chars or wchars (and what you're doing now is buggy unless 
you're restricting yourself to ASCII). So, you're either going to have to do 
one of

1. Use dchar[] to sort and then convert it to dstring for the key (either by 
casting or copying).

2. Use dchar[] to sort and then convert it to a string for the key (which 
results in a copy).

3. If you're using ASCII only, you could use char[], cast it to ubyte[] to 
sort it, and then cast it to string to use as the key (or idup it).

But sorting and and immutability are obviously at odds with one another, 
requiring either a copy or a cast, and sorting really isn't going to work with 
char[] or wchar[], unless you restrict yourself to values that fit in a UTF-8 
or UTF-16 code unit.

- Jonathan M Davis
October 21, 2011
Re: Chars sorting and copies
Jonathan M Davis:

> The elements in a string are immutable. I don't see how the first twe sorts 
> _ever_ worked, unless it was a bug.

Sorry, I meant that it used to work in D1:
http://codepad.org/HYdNktNd

Later I'll try some of your solutions.

Bye,
bearophile
October 21, 2011
Re: Chars sorting and copies
On Thu, 20 Oct 2011 21:49:27 -0400, bearophile <bearophileHUGS@lycos.com>  
wrote:

> I have many strings and I want to use as associative array kay a sorted  
> concat of two strings (it's a signature of the two strings):
>
>
> import std.algorithm;
> void main() {
>     string a = "red";
>     string b = "green";
>     int[string] aa;
>     //aa[(a ~ b).sort] = 1;
>     //aa[(a ~ b).sort.idup] = 1;
>     aa[(a.dup ~ b.dup).sort.idup] = 1;
> }
>
> I think the first way used to work, the second way was recently  
> forbidden (and you can't use char[] as associative array key). The third  
> way now works, but array.sort will go away.
>
> So do you know what's a copy-minimizing (and possibly short) way to  
> perform this, in future?

a ~ b should technically be assignable to char[], since it's alread new  
memory.  We may yet get there with pure functions being able to implicit  
cast to immutable.

You could do this for now, but it's ugly:

char[] concatStr(string[] s...) pure
{
   auto len = 0;
   foreach(str; s) len += str.length;
   auto result = new char[len];
   auto slice = result;
   foreach(str; s)
   {
      slice[0..str.length] = str;
      slice = slice[str.length..$];
   }
   return result;
}

...

auto str = concatStr(a, b);
str.sort; // this is going to stop working, you'll have to cast to byte[]
aa[assumeUnique(str)] = 1;

There are also alternatives to AA (such as dcollections' HashMap) which  
are willing to take non-immutable keys.

-Steve
October 21, 2011
Re: Chars sorting and copies
bearophile Wrote:

> I have many strings and I want to use as associative array kay a sorted concat of two strings (it's a signature of the two strings):
> 
> 
> import std.algorithm;
> void main() {
>     string a = "red";
>     string b = "green";
>     int[string] aa;
>     //aa[(a ~ b).sort] = 1;
>     //aa[(a ~ b).sort.idup] = 1;
>     aa[(a.dup ~ b.dup).sort.idup] = 1;
> }
> 
> I think the first way used to work, the second way was recently forbidden (and you can't use char[] as associative array key). The third way now works, but array.sort will go away.
> 
> So do you know what's a copy-minimizing (and possibly short) way to perform this, in future?
> 
> Bye and thank you,
> bearophile

http://www.d-programming-language.org/phobos/std_algorithm.html#setUnion ?
October 23, 2011
Re: Chars sorting and copies
Thanks for all the answers.

Steven Schveighoffer:

> a ~ b should technically be assignable to char[], since it's alread new  
> memory.  We may yet get there with pure functions being able to implicit  
> cast to immutable.

Isn't that kind of the opposite?
Is this already in Bugzilla?

Some versions I have written, there is no very good solution, it seems:


import std.algorithm, std.exception;

void main() {
   string a = "red";
   string b = "green";
   int[string] aa;

   // Works in D1.
   // aa[(a ~ b).sort] = 1;

   // Works in D2, but will stop working.
   aa[(a.dup ~ b.dup).sort.idup] = 1;

   // Works in D2, but will stop working.
   char[] ab = (a.dup ~ b.dup).sort;
   aa[assumeUnique(ab)] = 2;

   // If keys don't actually need to be strings
   // the code gets a bit simpler.
   int[immutable(ubyte[])] aa2;

   // not good
   // aa2[assumeUnique(sort(cast(ubyte[])(a ~ b)))] = 3;

   // not good?
   // aa2[assumeUnique(sort(cast(ubyte[])(a ~ b)).release())] = 3;

   // just 1 copy, two visible casts, 3 lines
   auto ab4 = cast(ubyte[])(a ~ b);
   ab4.sort();
   aa2[cast(immutable(ubyte[]))ab4] = 3;

   // a visible cast and an hidden cast
   ubyte[] ab2 = cast(ubyte[])(a ~ b);
   ab2.sort();
   aa2[assumeUnique(ab2)] = 3;

   // 2 lines, more complex looking
   auto ab3 = sort(cast(ubyte[])(a ~ b)).release();
   aa2[assumeUnique(ab3)] = 3;
}


Bye,
bearophile
October 24, 2011
Re: Chars sorting and copies
On Sat, 22 Oct 2011 23:01:17 -0400, bearophile <bearophileHUGS@lycos.com>  
wrote:

> Thanks for all the answers.
>
> Steven Schveighoffer:
>
>> a ~ b should technically be assignable to char[], since it's alread new
>> memory.  We may yet get there with pure functions being able to implicit
>> cast to immutable.
>
> Isn't that kind of the opposite?

No, if a ~ b would call a pure function that returns mutable when  
possible, then you could use it for mutable, immutable, or const.

> Is this already in Bugzilla?

Somewhat, although I like using the pure function avenue to ensure  
immutable does not violate const than the crazy rules we came up with here:
http://d.puremagic.com/issues/show_bug.cgi?id=1654

Note this was in the infancy of my D knowledge ;)

> Some versions I have written, there is no very good solution, it seems:
>
>
> import std.algorithm, std.exception;
>
> void main() {
>     string a = "red";
>     string b = "green";
>     int[string] aa;
>
>     // Works in D1.
>     // aa[(a ~ b).sort] = 1;
>
>     // Works in D2, but will stop working.
>     aa[(a.dup ~ b.dup).sort.idup] = 1;
>
>     // Works in D2, but will stop working.
>     char[] ab = (a.dup ~ b.dup).sort;
>     aa[assumeUnique(ab)] = 2;
>
>     // If keys don't actually need to be strings
>     // the code gets a bit simpler.
>     int[immutable(ubyte[])] aa2;
>
>     // not good
>     // aa2[assumeUnique(sort(cast(ubyte[])(a ~ b)))] = 3;
>
>     // not good?
>     // aa2[assumeUnique(sort(cast(ubyte[])(a ~ b)).release())] = 3;
>
>     // just 1 copy, two visible casts, 3 lines
>     auto ab4 = cast(ubyte[])(a ~ b);
>     ab4.sort();
>     aa2[cast(immutable(ubyte[]))ab4] = 3;
>
>     // a visible cast and an hidden cast
>     ubyte[] ab2 = cast(ubyte[])(a ~ b);
>     ab2.sort();
>     aa2[assumeUnique(ab2)] = 3;
>
>     // 2 lines, more complex looking
>     auto ab3 = sort(cast(ubyte[])(a ~ b)).release();
>     aa2[assumeUnique(ab3)] = 3;
> }

I honestly am not sure your specific problem is solvable without a lot of  
special cases.  It may be you have to write a sub-function that hides the  
mess.

-Steve
Top | Discussion index | About this forum | D home