September 14, 2006
Chris Nicholson-Sauls wrote:
> /***********************************************************************************
>  * Return an array composed of common elements of two arrays.
>  */
> T[] intersect (T) (inout T[] alpha, inout T[] beta)
> body {
>   T[] result;
> 
>   foreach (x; alpha) {
>     if (beta.contains(x)) {
>       result ~= x;
>     }
>   }
>   return result;
> }
> unittest {
>   writef("\t.intersect(T[]) ----> ");
>   auto foo = array!(int)(1, 2, 3, 4, 5      );
>   auto bar = array!(int)(      3, 4, 5, 6, 7);
>   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
>   writefln("Pass");
> }
> 
> /***********************************************************************************
>  * Remove duplicate values from an array.
>  */
> void unique (T) (inout T[] haystack)
> body {
>   size_t ridx ;
> 
>   for (int i = 0; i < haystack.length; i++) {
>     ridx = size_t.max;
>     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
>       haystack.removeIndex(ridx);
>     }
>   }
> }
> unittest {
>   writef("\t.unique() ----------> ");
>   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
>   foo.unique();
>   assert(foo == array!(int)(1, 2, 3, 4, 5));
>   writefln("Pass");
> }
> 

These algorithms can be implemented more efficiently.  The running time for the intersect is O(n * m) (where n,m are the length of the two arrays).  However, what about the case where alpha and beta are ordered?  Here is a simple algorithm for getting the intersect:

T[] intersect (T) (inout T[] alpha, inout T[] beta)
body
{

/* 	Assume alpha, beta are ordered such that alpha[i] < alpha[i+1]
 *	and beta[i] < beta[i+1] for all i.
 */

        T[] result;
        size_t a_pos = 0;
        size_t b_pos = 0;

        while(a_pos < alpha.length && b_pos < beta.length)
        {
                if(alpha[a_pos] == beta[b_pos])
                {
                        result ~= alpha[a_pos];
                        ++a_pos;
                        ++b_pos;
                }
                else if(alpha[a_pos] < beta[b_pos])
                        ++a_pos;
                else if(beta[b_pos] < alpha[a_pos])
                        ++b_pos;
        }

        return result;
}


The cost for this algorithm is only O(n + m).  However, ordering the two arrays is effectively an O(nlogn + mlogm) operation.  Therefore the running time for this modified algorithm is overall O(nlogn + mlogm).  A similar argument can be made for the unique function.
September 14, 2006
Mikola Lysenko wrote:
> Chris Nicholson-Sauls wrote:
> 
>> /*********************************************************************************** 
>>
>>  * Return an array composed of common elements of two arrays.
>>  */
>> T[] intersect (T) (inout T[] alpha, inout T[] beta)
>> body {
>>   T[] result;
>>
>>   foreach (x; alpha) {
>>     if (beta.contains(x)) {
>>       result ~= x;
>>     }
>>   }
>>   return result;
>> }
>> unittest {
>>   writef("\t.intersect(T[]) ----> ");
>>   auto foo = array!(int)(1, 2, 3, 4, 5      );
>>   auto bar = array!(int)(      3, 4, 5, 6, 7);
>>   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
>>   writefln("Pass");
>> }
>>
>> /*********************************************************************************** 
>>
>>  * Remove duplicate values from an array.
>>  */
>> void unique (T) (inout T[] haystack)
>> body {
>>   size_t ridx ;
>>
>>   for (int i = 0; i < haystack.length; i++) {
>>     ridx = size_t.max;
>>     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
>>       haystack.removeIndex(ridx);
>>     }
>>   }
>> }
>> unittest {
>>   writef("\t.unique() ----------> ");
>>   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
>>   foo.unique();
>>   assert(foo == array!(int)(1, 2, 3, 4, 5));
>>   writefln("Pass");
>> }
>>
> 
> These algorithms can be implemented more efficiently.  The running time for the intersect is O(n * m) (where n,m are the length of the two arrays).  However, what about the case where alpha and beta are ordered?  Here is a simple algorithm for getting the intersect:
> 
> T[] intersect (T) (inout T[] alpha, inout T[] beta)
> body
> {
> 
> /*     Assume alpha, beta are ordered such that alpha[i] < alpha[i+1]
>  *    and beta[i] < beta[i+1] for all i.
>  */
> 
>         T[] result;
>         size_t a_pos = 0;
>         size_t b_pos = 0;
> 
>         while(a_pos < alpha.length && b_pos < beta.length)
>         {
>                 if(alpha[a_pos] == beta[b_pos])
>                 {
>                         result ~= alpha[a_pos];
>                         ++a_pos;
>                         ++b_pos;
>                 }
>                 else if(alpha[a_pos] < beta[b_pos])
>                         ++a_pos;
>                 else if(beta[b_pos] < alpha[a_pos])
>                         ++b_pos;
>         }
> 
>         return result;
> }
> 
> 
> The cost for this algorithm is only O(n + m).  However, ordering the two arrays is effectively an O(nlogn + mlogm) operation.  Therefore the running time for this modified algorithm is overall O(nlogn + mlogm).  A similar argument can be made for the unique function.

True, and not a bad idea.  I'm not sure how it would be with arrays of classes/structs which have no defined order.  I'll try a rewrite and post it later.  :)  Maybe I should just ask for a DSource entry for Cashew.

-- Chris Nicholson-Sauls
September 14, 2006
Chris Nicholson-Sauls wrote:
> Mikola Lysenko wrote:
> 
>> Chris Nicholson-Sauls wrote:
>>
>>> /***********************************************************************************
>>>
>>>  * Return an array composed of common elements of two arrays.
>>>  */
>>> T[] intersect (T) (inout T[] alpha, inout T[] beta)
>>> body {
>>>   T[] result;
>>>
>>>   foreach (x; alpha) {
>>>     if (beta.contains(x)) {
>>>       result ~= x;
>>>     }
>>>   }
>>>   return result;
>>> }
>>> unittest {
>>>   writef("\t.intersect(T[]) ----> ");
>>>   auto foo = array!(int)(1, 2, 3, 4, 5      );
>>>   auto bar = array!(int)(      3, 4, 5, 6, 7);
>>>   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
>>>   writefln("Pass");
>>> }
>>>
>>> /***********************************************************************************
>>>
>>>  * Remove duplicate values from an array.
>>>  */
>>> void unique (T) (inout T[] haystack)
>>> body {
>>>   size_t ridx ;
>>>
>>>   for (int i = 0; i < haystack.length; i++) {
>>>     ridx = size_t.max;
>>>     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
>>>       haystack.removeIndex(ridx);
>>>     }
>>>   }
>>> }
>>> unittest {
>>>   writef("\t.unique() ----------> ");
>>>   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
>>>   foo.unique();
>>>   assert(foo == array!(int)(1, 2, 3, 4, 5));
>>>   writefln("Pass");
>>> }
>>>
>>
>> These algorithms can be implemented more efficiently.  The running time for the intersect is O(n * m) (where n,m are the length of the two arrays).  However, what about the case where alpha and beta are ordered?  Here is a simple algorithm for getting the intersect:
>>
>> T[] intersect (T) (inout T[] alpha, inout T[] beta)
>> body
>> {
>>
>> /*     Assume alpha, beta are ordered such that alpha[i] < alpha[i+1]
>>  *    and beta[i] < beta[i+1] for all i.
>>  */
>>
>>         T[] result;
>>         size_t a_pos = 0;
>>         size_t b_pos = 0;
>>
>>         while(a_pos < alpha.length && b_pos < beta.length)
>>         {
>>                 if(alpha[a_pos] == beta[b_pos])
>>                 {
>>                         result ~= alpha[a_pos];
>>                         ++a_pos;
>>                         ++b_pos;
>>                 }
>>                 else if(alpha[a_pos] < beta[b_pos])
>>                         ++a_pos;
>>                 else if(beta[b_pos] < alpha[a_pos])
>>                         ++b_pos;
>>         }
>>
>>         return result;
>> }
>>
>>
>> The cost for this algorithm is only O(n + m).  However, ordering the two arrays is effectively an O(nlogn + mlogm) operation.  Therefore the running time for this modified algorithm is overall O(nlogn + mlogm).  A similar argument can be made for the unique function.
> 
> 
> True, and not a bad idea.  I'm not sure how it would be with arrays of classes/structs which have no defined order.  I'll try a rewrite and post it later.  :)  Maybe I should just ask for a DSource entry for Cashew.
> 
> -- Chris Nicholson-Sauls

Well I just tried it and... woo, intersect is notably faster indeed.  I did a benchmark with arrays of 1000 elements (random values), reset and intersected 1000 times, and the numbers are striking:

<Benchmark old> 19.330000
<Benchmark new> 3.080000

I'll rewrite unique later, for now here's an attachment of the module with the rewritten intersect, which is actually almost identical to yours, so I'll be sure and include credit for it.  :)

-- Chris Nicholson-Sauls


September 14, 2006
Chris Nicholson-Sauls wrote:
> 
> 
> Reiner Pope wrote:
>> Chris Nicholson-Sauls wrote:
>>
>>> I've been pushing for some array utilities to get into Phobos, yes, but in the meantime I updated the array module in my personal Cashew lib.  Included in the updates are the rotl/rotr that I recall someone asking about.  In the process I've discovered two bugs, as well: the behavior of 'is' versus '==' is incompatable if the operands are arrays, and DDoc for abbreviated function templates is borked.  For an example of the latter, just look at Cashew's own docs.
>>>
>>> That said: if people think this is a decent collection, and Walter would take it, I would be willing to release it to public domain for Phobos' sake.
>>>
>>> The array module is attached, and the docs are at:
>>> http://www.codemeu.com:81/~pontiff/projects/cashew/doc/array.html
>>>
>>> -- Christopher Nicholson-Sauls
>>>
>> It looks good. Thanks for doing this.
>>
>> Some comments/questions:
>>
>> Why does indexOfSub have bale as 'inout', not simply 'in'?
>> Same question for fill
> 
> Because I'm mildly paranoid and wanted to avoid memory allocation where I could.  :)  I do suppose it could safely be 'in' though.
> 
Passing by 'inout' is probably less efficient and certainly doesn't save any memory allocation. 'inout' is equivalent to passing a pointer to the array reference, which itself contains a pointer to the data. The extra level of indirection can only add computation time.
>> Also, couldn't you avoid the temporary in removeAll by doing this:
>> void removeAll (T) (inout T[] haystack, T needle)
>> {
>>   size_t index = 0U;
>>   size_t indices;
>>
>>   while ((index = haystack.indexOf(needle, index)) != size_t.max) {
>>     haystack.removeIndex(index);
>>   }
>> }
> 
> That's exactly how it was written (once I'd modified .indexOf) but it doesn't act quite right for some reason.  Until I figure out what the bug is, I figured I could slap together a less pretty but working version.
This passes the unit tests, though.

Cheers,

Reiner
September 16, 2006
Reiner Pope wrote:
> Chris Nicholson-Sauls wrote:
> 
>>
>>
>> Reiner Pope wrote:
>>
>>> Chris Nicholson-Sauls wrote:
>>>
>>>> I've been pushing for some array utilities to get into Phobos, yes, but in the meantime I updated the array module in my personal Cashew lib.  Included in the updates are the rotl/rotr that I recall someone asking about.  In the process I've discovered two bugs, as well: the behavior of 'is' versus '==' is incompatable if the operands are arrays, and DDoc for abbreviated function templates is borked.  For an example of the latter, just look at Cashew's own docs.
>>>>
>>>> That said: if people think this is a decent collection, and Walter would take it, I would be willing to release it to public domain for Phobos' sake.
>>>>
>>>> The array module is attached, and the docs are at: http://www.codemeu.com:81/~pontiff/projects/cashew/doc/array.html
>>>>
>>>> -- Christopher Nicholson-Sauls
>>>>
>>> It looks good. Thanks for doing this.
>>>
>>> Some comments/questions:
>>>
>>> Why does indexOfSub have bale as 'inout', not simply 'in'? Same question for fill
>>
>>
>> Because I'm mildly paranoid and wanted to avoid memory allocation where I could.  :)  I do suppose it could safely be 'in' though.
>>
> Passing by 'inout' is probably less efficient and certainly doesn't save any memory allocation. 'inout' is equivalent to passing a pointer to the array reference, which itself contains a pointer to the data. The extra level of indirection can only add computation time.

Possibly true, so for now I took out some of the 'inout's.

>>> Also, couldn't you avoid the temporary in removeAll by doing this:
>>> void removeAll (T) (inout T[] haystack, T needle)
>>> {
>>>   size_t index = 0U;
>>>   size_t indices;
>>>
>>>   while ((index = haystack.indexOf(needle, index)) != size_t.max) {
>>>     haystack.removeIndex(index);
>>>   }
>>> }
>>
>>
>> That's exactly how it was written (once I'd modified .indexOf) but it doesn't act quite right for some reason.  Until I figure out what the bug is, I figured I could slap together a less pretty but working version.
> 
> This passes the unit tests, though.
> 
> Cheers,
> 
> Reiner

So it does... although I could swear that it didn't behave right before.  Ah well, it works fine now.

Also I've written a new version of .unique(), in the process of which I've added a .removeRange() function.  A quick note on .removeRange() though: it mimics the slicing logic of D arrays.  So, foo.removeRange(1, 4) actually removes [1, 2, 3], not [1, 2, 3, 4].

-- Chris Nicholson-Sauls


1 2
Next ›   Last »