October 31, 2006
Karen Lanrap wrote:
> Bill Baxter wrote:
> 
>> best answer in terms of efficiency
> 
> For me removing an element does not affect other elements.

Then I think you want a linked list, not an array.

--bb
October 31, 2006
Bill Baxter wrote:

> Then I think you want a linked list, not an array.

The diversity of answers in this thread shows that there is no canonical remove operation for arrays. One can define some operation to be called a "remove" operation, but that is by pure will, when the invariants that have to be maintained are unknown.
October 31, 2006
Bill Baxter wrote:
> David Medlock wrote:
> 
>> Bill Baxter wrote:
>>
>>> How do I remove an element from a dynamic array?
>>>
>>>    int[] a = [1,2,3,4,5];
>>>
>>> I tried every syntax I could think of:
>>>
>>>    a[3] = void;
>>>    a[3..4] = void;
>>>    a[3..4] = a[3..3];
>>>    a[3] = [];
>>>    a[3..4] = [];
>>>    delete a[3];
>>>    delete a[3..4];
>>>
>>> The last one compiles, but fills a[0] with garbage.
>>>
>>> I hope the answer isn't:
>>>
>>>    a = a[0..3] ~ a[4..length];
>>>
>>> Thanks!
>>> --bb
>>
>>
>>
>>  From my personal tools library...
>>
>> // remove an item from an array
>> template drop(T)
>> {
>>   T drop( inout T[] arr, int which )
>>   {
>>     debug if ( which>=arr.length)
>>         throw new Exception(str.format("Attempt to drop position %s from size %s",which,arr.length));
>>     T result = arr[which];
>>     int max = arr.length-1;
>>     for (; which < max; which++ ) arr[which]=arr[which+1];
>>     arr.length= max;
>>     return result;
>>   }
>> }

Something you might find interesting/indicative -- I wrote a little test program to time your function versus that found in Cashew, and with the slicing expression above.  Output:

# Array length: 10000
# Iterations removing middle element: 5000
#  ---------- Slice syntax: a[0 .. mid] ~ [mid + 1 .. a.length]
# <Benchmark array removals> Baseline 3.460000
#  ---------- CashewUtils: a.removeIndex(mid)
# <Benchmark array removals> Time 2.970000 & 1.164983 versus baseline
#  ---------- Drop: version 1
# <Benchmark array removals> Time 0.270000 & 12.814815 versus baseline
#  ---------- Drop: version 2
# <Benchmark array removals> Time 0.390000 & 8.871795 versus baseline

Drop v2 was an attempt of mine at rewriting your .drop() that, obviously, falls a little short of the original in performance.  I also find it odd that Cashew's .removeIndex() is actually timing a little master than the slice expression, because that's actually it is: a wrapper around that very same expression.  *confused*  Anyhow, clearly your .drop() is running very fast!  I actually expected the opposite, and am surprised/impressed.

Maybe you should release it into Cashew?  ;)  And maybe I should put more into benchmarking the entire Cashew array module.  There might be a few more placed it could be sped up.

>>
>> Even though it returns the array, it modifies it in place.
>>
>> -DavidM
> 
> 
> Thanks David, this seems like the best answer in terms of efficiency, although I suppose the
>     a = a[0..3] ~ a[4..length];
> could theoretically be pretty similar depending how the compiler implements it.
> 
> 
> There really should be syntax in the language or at least functions like the above in the standard library for this (and for removing slices). 

# import cashew.utils.array;
#
# int[] foo = ... ;
# foo.remove(3U);
# foo.removeRange(2U, 7U);
# foo.removeSub(foo[1 .. 5]);
etc.

That's why I maintain those.  Hopefully, eventually, it does spawn a standard library module.  (I'm always looking for ways to further optimize them, but I do want to keep them fairly simple.)

-- Chris Nicholson-Sauls
October 31, 2006
Chris Nicholson-Sauls wrote:
> 
> Anyhow, clearly your .drop() is running very fast!  I actually expected the opposite, and am surprised/impressed.
> 
> Maybe you should release it into Cashew?  ;)  And maybe I should put more into benchmarking the entire Cashew array module.  There might be a few more placed it could be sped up.

I'd be interested in seeing a test using C's memmove as well.  Though I'm not surprised drop beats the slice version, since that will allocate a new array for the data.


Sean
November 01, 2006
Chris Nicholson-Sauls wrote:
> Bill Baxter wrote:
> 
>> David Medlock wrote:
>>
>>> Bill Baxter wrote:
>>>
>>>> How do I remove an element from a dynamic array?
>>>>
>>>>    int[] a = [1,2,3,4,5];
>>>>
>>>> I tried every syntax I could think of:
>>>>
>>>>    a[3] = void;
>>>>    a[3..4] = void;
>>>>    a[3..4] = a[3..3];
>>>>    a[3] = [];
>>>>    a[3..4] = [];
>>>>    delete a[3];
>>>>    delete a[3..4];
>>>>
>>>> The last one compiles, but fills a[0] with garbage.
>>>>
>>>> I hope the answer isn't:
>>>>
>>>>    a = a[0..3] ~ a[4..length];
>>>>
>>>> Thanks!
>>>> --bb
>>>
>>>
>>>
>>>
>>>  From my personal tools library...
>>>
>>> // remove an item from an array
>>> template drop(T)
>>> {
>>>   T drop( inout T[] arr, int which )
>>>   {
>>>     debug if ( which>=arr.length)
>>>         throw new Exception(str.format("Attempt to drop position %s from size %s",which,arr.length));
>>>     T result = arr[which];
>>>     int max = arr.length-1;
>>>     for (; which < max; which++ ) arr[which]=arr[which+1];
>>>     arr.length= max;
>>>     return result;
>>>   }
>>> }
> 
> 
> Something you might find interesting/indicative -- I wrote a little test program to time your function versus that found in Cashew, and with the slicing expression above.  Output:
> 
> # Array length: 10000
> # Iterations removing middle element: 5000
> #  ---------- Slice syntax: a[0 .. mid] ~ [mid + 1 .. a.length]
> # <Benchmark array removals> Baseline 3.460000
> #  ---------- CashewUtils: a.removeIndex(mid)
> # <Benchmark array removals> Time 2.970000 & 1.164983 versus baseline
> #  ---------- Drop: version 1
> # <Benchmark array removals> Time 0.270000 & 12.814815 versus baseline
> #  ---------- Drop: version 2
> # <Benchmark array removals> Time 0.390000 & 8.871795 versus baseline
> 
> Drop v2 was an attempt of mine at rewriting your .drop() that, obviously, falls a little short of the original in performance.  I also find it odd that Cashew's .removeIndex() is actually timing a little master than the slice expression, because that's actually it is: a wrapper around that very same expression.  *confused*  Anyhow, clearly your .drop() is running very fast!  I actually expected the opposite, and am surprised/impressed.
> 
> Maybe you should release it into Cashew?  ;)  And maybe I should put more into benchmarking the entire Cashew array module.  There might be a few more placed it could be sped up.
> 
>>>
>>> Even though it returns the array, it modifies it in place.
>>>
>>> -DavidM
>>
>>
>>
>> Thanks David, this seems like the best answer in terms of efficiency, although I suppose the
>>     a = a[0..3] ~ a[4..length];
>> could theoretically be pretty similar depending how the compiler implements it.
>>
>>
>> There really should be syntax in the language or at least functions like the above in the standard library for this (and for removing slices). 
> 
> 
> # import cashew.utils.array;
> #
> # int[] foo = ... ;
> # foo.remove(3U);
> # foo.removeRange(2U, 7U);
> # foo.removeSub(foo[1 .. 5]);
> etc.
> 
> That's why I maintain those.  Hopefully, eventually, it does spawn a standard library module.  (I'm always looking for ways to further optimize them, but I do want to keep them fairly simple.)
> 
> -- Chris Nicholson-Sauls

Its public domain.  Free to use in any way you wish.

-DavidM

For giggles I posted the conditional version(also public domain):

// remove items based on a predicate. returns number of items removed
template drop_if(T)
{
  int drop_if( T[] arr, bool delegate(T)pred )
  {
    int count = 0, dest=0;
    foreach( int index, T val ; arr )
    {
      if ( pred(val) ) { count++; continue; }
      if ( dest !=index ) arr[index] = dest;
      dest++;
    }
    return count;
  }
}

November 01, 2006
David Medlock wrote:

> For giggles I posted the conditional version(also public domain):
> 
> // remove items based on a predicate. returns number of items removed
> template drop_if(T)
> {
>   int drop_if( T[] arr, bool delegate(T)pred )
>   {
>     int count = 0, dest=0;
>     foreach( int index, T val ; arr )
>     {
>       if ( pred(val) ) { count++; continue; }
>       if ( dest !=index ) arr[index] = dest;
>       dest++;
>     }
>     return count;
>   }
> }
> 

And here's the range-dropping version:

template drop_range(T)
{
  void drop_range( inout T[] arr, int start, int end )
  {
    debug if ( start>=arr.length && end > arr.length)
        throw new Exception(format("Attempt to drop range %s,%s from size %s",start,end,arr.length));
    if (start>=end) return;
    size_t len = end-start;
    size_t max = arr.length-len;
    for (; start < max; start++ ) arr[start]=arr[start+len];
    arr.length= max;
  }
}
November 01, 2006
Bill Baxter wrote:
> David Medlock wrote:
> 
>> For giggles I posted the conditional version(also public domain):
>>
>> // remove items based on a predicate. returns number of items removed
>> template drop_if(T)
>> {
>>   int drop_if( T[] arr, bool delegate(T)pred )
>>   {
>>     int count = 0, dest=0;
>>     foreach( int index, T val ; arr )
>>     {
>>       if ( pred(val) ) { count++; continue; }
>>       if ( dest !=index ) arr[index] = dest;
>>       dest++;
>>     }
>>     return count;
>>   }
>> }
>>
> 

And here's a better range-dropping version:

// Remove a range of elements from an array in place.
// It is not an error for the range to be empty or for start to be
// greater than end. If so, the array is not modified.
// Out of bounds checks performed only in debug mode.
// Returns: the array for convenience (but it is modified in-place).
// Note: This is an O(n) operation.
template drop_range(T)
{
  T[] drop_range( inout T[] arr, int start, int end )
  {
    debug if ( start>=arr.length || end > arr.length || start<0 || end<0)
        throw new Exception(format("Attempt to drop range %s,%s from size %s",start,end,arr.length));
    if (start>=end) return arr;
    size_t len = end-start;
    size_t max = arr.length-len;
    for (; start < max; start++ ) arr[start]=arr[start+len];
    arr.length= max;
    return arr;
  }
}

--bb
November 01, 2006
Chris Nicholson-Sauls skrev:
<snip>
> # import cashew.utils.array;
> #
> # int[] foo = ... ;
> # foo.remove(3U);
> # foo.removeRange(2U, 7U);
> # foo.removeSub(foo[1 .. 5]);
> etc.
> 

I like these.


And I yet again see a situation where range and set types are useful. If a range, or a set where a basic type (just as an integer, or imaginary float) then removing a range, or a set of elements would be more intuitive. And the slice operation would not be an array-hack, but a more general expression reusable everywhere.

Imagine we have: int foo[] = [1,2,3,4,5,6];
// Each example is now assumed unrelated to all others!
foo.remove(1); // foo == [1, 3, 4, 5, 6]
foo.remove(1..3); // foo = [1, 5, 6];
foo.remove(<1, 4..5>); // foo = [1, 3, 4]

or a "slice" using a set:
auto bar = foo[<1, 4..5>]; // bar == [2, 5, 6]

Naturally this would require a range's upper bound to be inclusive, or else set's would behave unintuitive. That is todays:
foo[5..5] would be a slice with one element, not empty as is.

Today passing around a range can only be done by passing around two values, or a struct. The same could be said about passing around imaginary numbers. Why imaginary numbers are valuable enough to warrant build in types, while ranges are not is beyond me. I use ranges way more than imaginary numbers.

Same goes for sets. Sure bitshifted enumerations, and boolean logic does the same trick. But adding it as a build in type would allow for far more expressive and intuitive code (Just how would you do a foreach over some "ored" enumeration constants?)


// Fredrik Olsson
November 01, 2006
Bill Baxter wrote:
> Bill Baxter wrote:
> 
>> David Medlock wrote:
>>
>>> For giggles I posted the conditional version(also public domain):
>>>
>>> // remove items based on a predicate. returns number of items removed
>>> template drop_if(T)
>>> {
>>>   int drop_if( T[] arr, bool delegate(T)pred )
>>>   {
>>>     int count = 0, dest=0;
>>>     foreach( int index, T val ; arr )
>>>     {
>>>       if ( pred(val) ) { count++; continue; }
>>>       if ( dest !=index ) arr[index] = dest;
>>>       dest++;
>>>     }
>>>     return count;
>>>   }
>>> }
>>>
>>
> 
> And here's a better range-dropping version:
> 
> // Remove a range of elements from an array in place.
> // It is not an error for the range to be empty or for start to be
> // greater than end. If so, the array is not modified.
> // Out of bounds checks performed only in debug mode.
> // Returns: the array for convenience (but it is modified in-place).
> // Note: This is an O(n) operation.
> template drop_range(T)
> {
>   T[] drop_range( inout T[] arr, int start, int end )
>   {
>     debug if ( start>=arr.length || end > arr.length || start<0 || end<0)
>         throw new Exception(format("Attempt to drop range %s,%s from size %s",start,end,arr.length));
>     if (start>=end) return arr;
>     size_t len = end-start;
>     size_t max = arr.length-len;
>     for (; start < max; start++ ) arr[start]=arr[start+len];
>     arr.length= max;
>     return arr;
>   }
> }
> 
> --bb

Lets combine all these with what Oskar has and get them into Phobos!

Since arrays are built into the language, these should come with the standard library.

-DavidM
November 01, 2006
Fredrik Olsson wrote:
> Chris Nicholson-Sauls skrev:
> <snip>
> 
>> # import cashew.utils.array;
>> #
>> # int[] foo = ... ;
>> # foo.remove(3U);
>> # foo.removeRange(2U, 7U);
>> # foo.removeSub(foo[1 .. 5]);
>> etc.
>>
> 
> I like these.
> 
> 
> And I yet again see a situation where range and set types are useful. If a range, or a set where a basic type (just as an integer, or imaginary float) then removing a range, or a set of elements would be more intuitive. And the slice operation would not be an array-hack, but a more general expression reusable everywhere.

Agreed.  I'd like to see 4..6 return some sort of range struct/object too.  Something as simple as
struct range {
   int lo;
   int hi;
};
That plus a few methods would pretty much do it.  Give it an opApply and an opApplyReverse and you've got foreach(i, 0..10) working too. (or rather than opApply/Reverse, you might want to make another special case for it in the compiler, like for arrays, because the general foreach mechanism is too inefficient.)

> Imagine we have: int foo[] = [1,2,3,4,5,6];
> // Each example is now assumed unrelated to all others!
> foo.remove(1); // foo == [1, 3, 4, 5, 6]
> foo.remove(1..3); // foo = [1, 5, 6];
> foo.remove(<1, 4..5>); // foo = [1, 3, 4]
> 
> or a "slice" using a set:
> auto bar = foo[<1, 4..5>]; // bar == [2, 5, 6]
> Naturally this would require a range's upper bound to be inclusive, or else set's would behave unintuitive. That is todays:
> foo[5..5] would be a slice with one element, not empty as is.

Not sure I agree there, or with what seems to be your assumption that sets and ranges should be the same concept.  If sets were a built-in type I would expect them to be able to represent sets of anything, not just sets of integers.

So, limiting the discussion to just ranges, including the upper bound vs not including the upper bound to me is pretty much a wash.  There are cases where each one looks good or bad, and perhaps ruby's solution of two constructs is the best bet -- 4..5 being exclusive and 4...5 being inclusive.  Use whichever works best for your case.  Or not.  I can already hear the cries of "it looks too similar! there'll be bugs!". Whatever.  It seems to work for ruby.

> Today passing around a range can only be done by passing around two values, or a struct. The same could be said about passing around imaginary numbers. Why imaginary numbers are valuable enough to warrant build in types, while ranges are not is beyond me. I use ranges way more than imaginary numbers.

Good point.  That's probably true for most non-sci folk and even a good chunk of the sci-folk.  Actually if the range object was extented to include any type of bounds (e.g. floats, doubles), it could maybe be used for interval arithmetic.  Interval arithmetic is pretty useful for continuous collision detection algorithms that are hot these days, and many other useful geometric/numerical/mathy things.  Just take the range above and parameterize with one type:
struct range(T)
{
   T lo;
   T hi;
}
Maybe there'd need to be a lot more to it to be useful for interval arithmetic, though.  I haven't worked interval arithmetic much myself, just heard talks from folks who used it.

> Same goes for sets. Sure bitshifted enumerations, and boolean logic does the same trick. But adding it as a build in type would allow for far more expressive and intuitive code (Just how would you do a foreach over some "ored" enumeration constants?)

By sets, once again, you apparently mean only sets of small integers. It seems like you could define a struct that hides bitshifting from you and can be foreach'ed without too much trouble.  You just need an appropiate opApply.  (But again it won't be as efficient as a handcoded for-loop unless it becomes a special case recognized by the compiler -- are we sensing a trend here yet?)

--

Finally on a related topic, I'd like to say that the current restriction on opIndex and opSlice methods to only take integers seems unnecessary and overly draconian.  Is there some technical reason for it?  If opIndex/opSlice could take any kind of object then it would be possible today to make things like a range object that can be used to slice or index a user defined-array.  This kind of thing is useful when you want to create a Matlab or Numpy-like array package (c.f. Norbert Nemec's N-d array proposal for D).  It's very useful for instance to be able to have a method like "find" that returns an index object that can be used to index another array.  In Numpy this sort of thing is refered to as "fancy indexing".  Numpy doesn't try to be too clever about compressing the representation of the index set returned.  It's just another array containing the index of every element that passed the 'find' test.  Not sure why it doesn't try to be smarter.  Probably because the logic to do that compression and decompression ends up being slower than just enumerating everything in many common cases.

Note that if 3..5 returned a range object, then there would be no real need for the opSlice methods.  They would be replaced by opIndex overloaed for the range type.  E.g opIndex(range r)



--bb