Jump to page: 1 2 3
Thread overview
Feature suggestion: in-place append to array
Mar 04, 2010
KF
Mar 04, 2010
grauzone
Mar 04, 2010
Clemens
Mar 31, 2010
Mike S
Mar 31, 2010
bearophile
Apr 01, 2010
Mike S
Apr 01, 2010
bearophile
Apr 01, 2010
Mike S
Apr 01, 2010
bearophile
Apr 01, 2010
Mike S
Small troubles [Was: Re: Feature...]
Apr 01, 2010
bearophile
Apr 01, 2010
bearophile
Apr 01, 2010
Mike S
Apr 01, 2010
Mike S
Mar 04, 2010
grauzone
Mar 04, 2010
grauzone
March 04, 2010
I hope this is the right place to post it.
In my work, I often need to add elements at the end of dynamic arrays and remove them from the end. This incremental changes would most
conveniently be performed by a~=e for addition of e at the end of a, and say a=a[0..$-1]. Unfortunately, ~= always creates a copy and
is thus too time consuming. What I would like to suggest is either a new operator, ~~=, in-place append that does not create a copy if
possible. Alternatively, new properties/methods could be defined:
- a.push(element or array) - add elements at the end of the array, without copying if possible
- a.pop(x) - remove x elements from the end of the array
- a.capacity - get or set the actual length allocated for the array
Currently, I am using appender(a).put from algorithm module, but it is very obscure and makes code hard to understand.
Cheers,
KF
March 04, 2010
On Thu, 04 Mar 2010 04:02:46 -0500, KF <kfleszar@gmail.com> wrote:

> I hope this is the right place to post it.
> In my work, I often need to add elements at the end of dynamic arrays and remove them from the end. This incremental changes would most
> conveniently be performed by a~=e for addition of e at the end of a, and say a=a[0..$-1]. Unfortunately, ~= always creates a copy and
> is thus too time consuming.

No, it does not create a copy.  a = a ~ e always creates a copy, a ~= e appends in place if possible.  This still will be slower than appender, but in the next release of DMD it will be much faster than the current release.

With the next release of DMD, a = a[0..$-1] will shrink the array, but the next append to a will reallocate.  There will be a function to get around this, but use it only if you know that the data removed from the end isn't referenced anywhere else.

-Steve
March 04, 2010
Steven Schveighoffer wrote:
> On Thu, 04 Mar 2010 04:02:46 -0500, KF <kfleszar@gmail.com> wrote:
> 
>> I hope this is the right place to post it.
>> In my work, I often need to add elements at the end of dynamic arrays and remove them from the end. This incremental changes would most
>> conveniently be performed by a~=e for addition of e at the end of a, and say a=a[0..$-1]. Unfortunately, ~= always creates a copy and
>> is thus too time consuming.
> 
> No, it does not create a copy.  a = a ~ e always creates a copy, a ~= e appends in place if possible.  This still will be slower than appender, but in the next release of DMD it will be much faster than the current release.
> 
> With the next release of DMD, a = a[0..$-1] will shrink the array, but the next append to a will reallocate.  There will be a function to get around this, but use it only if you know that the data removed from the end isn't referenced anywhere else.

Some sort of "resetAndReuse" function to clear an array, but enabling to reuse the old memory would be nice:

int[] a = data;
a = null;
a ~= 1; //reallocates (of course)
a.length = 0;
a ~= 1; //will reallocate (for safety), used not to reallocate
resetAndReuse(a);
assert(a.length == 0);
a ~= 1; //doesn't reallocate

This can be implemented by setting both the slice and the internal runtime length fields to  0.

Additionally, another function is necessary to replace the old preallocation trick:

//preallocate 1000 elements, but don't change actual slice length
auto len = a.length;
a.length = len + 1000;
a.length = len;

As I understood it, this won't work anymore after the change. This can be implemented by enlarging the array's memory block without touching any length fields.

I'm sure the function you had in mind does one of those things or both.

> -Steve
March 04, 2010
On Thu, 04 Mar 2010 11:43:27 -0500, grauzone <none@example.net> wrote:

> Some sort of "resetAndReuse" function to clear an array, but enabling to reuse the old memory would be nice:
>
> int[] a = data;
> a = null;
> a ~= 1; //reallocates (of course)
> a.length = 0;
> a ~= 1; //will reallocate (for safety), used not to reallocate
> resetAndReuse(a);
> assert(a.length == 0);
> a ~= 1; //doesn't reallocate
>
> This can be implemented by setting both the slice and the internal runtime length fields to  0.
>
> Additionally, another function is necessary to replace the old preallocation trick:
>
> //preallocate 1000 elements, but don't change actual slice length
> auto len = a.length;
> a.length = len + 1000;
> a.length = len;
>
> As I understood it, this won't work anymore after the change. This can be implemented by enlarging the array's memory block without touching any length fields.
>
> I'm sure the function you had in mind does one of those things or both.

proposed usage (as checked in a couple days ago):

int[] a;
a.setCapacity(10000); // pre-allocate at least 10000 elements.
foreach(i; 0..10000)
   a ~= i; // no reallocation
a.length = 100;
a.shrinkToFit(); // resize "allocated" length to 100 elements
a ~= 5; // no reallocation.

I will probably add a function to do the preallocation of a new array without having to write two statements.  Also, one cool thing about this that was not available before is that increasing the capacity will not initialize the new data as long as "no pointers" flag is set.  For pointer-containing elements, the contents must be initialized to zero to prevent false positives during collection.

Note that setCapacity is supposed to be a property (i.e. a.capacity = 10000), but the compiler doesn't support this, I added a bug (feature) request for this (3857).

There is also a readable capacity property to get the number of elements the array could grow to without reallocation, a feature that may be useful for tuning.

Also the name "shrinkToFit" may not be final :)  I wanted to call it minimize, but there were objections, and shrinkToFit is easy to search/replace later.

-Steve
March 04, 2010
Steven Schveighoffer Wrote:

> int[] a;
> a.setCapacity(10000); // pre-allocate at least 10000 elements.

I would prefer the name reserve(). It has precedent in the STL, and the method doesn't actually always set the capacity, only if the current capacity is less then the argument. Even then it may conceivably allocate more than was asked for.

In other words, the name setCapacity suggests this will always succeed:

a.setCapacity(n);
assert(a.capacity == n);

...which is not the case as far as I can tell.
March 04, 2010
Steven Schveighoffer wrote:
> On Thu, 04 Mar 2010 11:43:27 -0500, grauzone <none@example.net> wrote:
> 
>> Some sort of "resetAndReuse" function to clear an array, but enabling to reuse the old memory would be nice:
>>
>> int[] a = data;
>> a = null;
>> a ~= 1; //reallocates (of course)
>> a.length = 0;
>> a ~= 1; //will reallocate (for safety), used not to reallocate
>> resetAndReuse(a);
>> assert(a.length == 0);
>> a ~= 1; //doesn't reallocate
>>
>> This can be implemented by setting both the slice and the internal runtime length fields to  0.
>>
>> Additionally, another function is necessary to replace the old preallocation trick:
>>
>> //preallocate 1000 elements, but don't change actual slice length
>> auto len = a.length;
>> a.length = len + 1000;
>> a.length = len;
>>
>> As I understood it, this won't work anymore after the change. This can be implemented by enlarging the array's memory block without touching any length fields.
>>
>> I'm sure the function you had in mind does one of those things or both.
> 
> proposed usage (as checked in a couple days ago):
> 
> int[] a;
> a.setCapacity(10000); // pre-allocate at least 10000 elements.
> foreach(i; 0..10000)
>    a ~= i; // no reallocation
> a.length = 100;
> a.shrinkToFit(); // resize "allocated" length to 100 elements
> a ~= 5; // no reallocation.

What shrinkToFit() does is not really clear. Does it reallocate the memory block of the array such, that no space is wasted? Or does it provide (almost) the same functionality as my resetAndReuse(), and make the superfluous trailing memory available for appending without reallocation?

I think a resetAndReuse is really needed. I have found it can prevent "GC thrashing" in many cases. E.g. when caching frequently re-evaluated data in form of arrays (free previous array, then allocate array that isn't larger than the previous array).
March 04, 2010
On Thu, 04 Mar 2010 12:29:25 -0500, Clemens <eriatarka84@gmail.com> wrote:

> Steven Schveighoffer Wrote:
>
>> int[] a;
>> a.setCapacity(10000); // pre-allocate at least 10000 elements.
>
> I would prefer the name reserve(). It has precedent in the STL, and the method doesn't actually always set the capacity, only if the current capacity is less then the argument. Even then it may conceivably allocate more than was asked for.
>
> In other words, the name setCapacity suggests this will always succeed:
>
> a.setCapacity(n);
> assert(a.capacity == n);
>
> ...which is not the case as far as I can tell.

You are correct, setCapacity ensures that *at least* the given number of elements will be available for appending.

I planned on making the function a property (but a bug would not allow that), the original intended usage was:

a.capacity = 10000;

Reserve doesn't work in this context.  Can you come up with a name that does?

I'll bring up reserve (as a function) as an alternative on the phobos mailing list, and see what people say.  I kind of liked the setter/getter idea, but you make a good point.

-Steve
March 04, 2010
On Thu, 04 Mar 2010 12:43:32 -0500, grauzone <none@example.net> wrote:

> Steven Schveighoffer wrote:
>> On Thu, 04 Mar 2010 11:43:27 -0500, grauzone <none@example.net> wrote:
>>
>>> Some sort of "resetAndReuse" function to clear an array, but enabling to reuse the old memory would be nice:
>>>
>>> int[] a = data;
>>> a = null;
>>> a ~= 1; //reallocates (of course)
>>> a.length = 0;
>>> a ~= 1; //will reallocate (for safety), used not to reallocate
>>> resetAndReuse(a);
>>> assert(a.length == 0);
>>> a ~= 1; //doesn't reallocate
>>>
>>> This can be implemented by setting both the slice and the internal runtime length fields to  0.
>>>
>>> Additionally, another function is necessary to replace the old preallocation trick:
>>>
>>> //preallocate 1000 elements, but don't change actual slice length
>>> auto len = a.length;
>>> a.length = len + 1000;
>>> a.length = len;
>>>
>>> As I understood it, this won't work anymore after the change. This can be implemented by enlarging the array's memory block without touching any length fields.
>>>
>>> I'm sure the function you had in mind does one of those things or both.
>>  proposed usage (as checked in a couple days ago):
>>  int[] a;
>> a.setCapacity(10000); // pre-allocate at least 10000 elements.
>> foreach(i; 0..10000)
>>    a ~= i; // no reallocation
>> a.length = 100;
>> a.shrinkToFit(); // resize "allocated" length to 100 elements
>> a ~= 5; // no reallocation.
>
> What shrinkToFit() does is not really clear. Does it reallocate the memory block of the array such, that no space is wasted? Or does it provide (almost) the same functionality as my resetAndReuse(), and make the superfluous trailing memory available for appending without reallocation?

Sorry, should have added:

assert(a.length == 101);

Basically, shrinkToFit shrinks the "allocated" space to the length of the array. To put it another way, you could write your resetAndReuse function as follows:

void resetAndReuse(T)(ref T[] arr)
{
  arr.length = 0;
  arr.shrinkToFit();
}

I want to avoid assuming that shrinking the length to 0 is the only usable idiom.

>
> I think a resetAndReuse is really needed. I have found it can prevent "GC thrashing" in many cases. E.g. when caching frequently re-evaluated data in form of arrays (free previous array, then allocate array that isn't larger than the previous array).

Yes, it is useful in such cases.  The only questionable part about it is that it allows for stomping in cases like:

auto str = "hello".idup;
auto str2 = str[3..4];
str2.shrinkToFit();
str2 ~= "a";
assert(str == "hella");

So it probably should be marked as unsafe.

Again, the name shrinkToFit isn't my favorite, ideas welcome.

-Steve
March 04, 2010
Steven Schveighoffer wrote:
> On Thu, 04 Mar 2010 12:43:32 -0500, grauzone <none@example.net> wrote:
> 
>> Steven Schveighoffer wrote:
>>> On Thu, 04 Mar 2010 11:43:27 -0500, grauzone <none@example.net> wrote:
>>>
>>>> Some sort of "resetAndReuse" function to clear an array, but enabling to reuse the old memory would be nice:
>>>>
>>>> int[] a = data;
>>>> a = null;
>>>> a ~= 1; //reallocates (of course)
>>>> a.length = 0;
>>>> a ~= 1; //will reallocate (for safety), used not to reallocate
>>>> resetAndReuse(a);
>>>> assert(a.length == 0);
>>>> a ~= 1; //doesn't reallocate
>>>>
>>>> This can be implemented by setting both the slice and the internal runtime length fields to  0.
>>>>
>>>> Additionally, another function is necessary to replace the old preallocation trick:
>>>>
>>>> //preallocate 1000 elements, but don't change actual slice length
>>>> auto len = a.length;
>>>> a.length = len + 1000;
>>>> a.length = len;
>>>>
>>>> As I understood it, this won't work anymore after the change. This can be implemented by enlarging the array's memory block without touching any length fields.
>>>>
>>>> I'm sure the function you had in mind does one of those things or both.
>>>  proposed usage (as checked in a couple days ago):
>>>  int[] a;
>>> a.setCapacity(10000); // pre-allocate at least 10000 elements.
>>> foreach(i; 0..10000)
>>>    a ~= i; // no reallocation
>>> a.length = 100;
>>> a.shrinkToFit(); // resize "allocated" length to 100 elements
>>> a ~= 5; // no reallocation.
>>
>> What shrinkToFit() does is not really clear. Does it reallocate the memory block of the array such, that no space is wasted? Or does it provide (almost) the same functionality as my resetAndReuse(), and make the superfluous trailing memory available for appending without reallocation?
> 
> Sorry, should have added:
> 
> assert(a.length == 101);
> 
> Basically, shrinkToFit shrinks the "allocated" space to the length of the array. To put it another way, you could write your resetAndReuse function as follows:
> 
> void resetAndReuse(T)(ref T[] arr)
> {
>   arr.length = 0;
>   arr.shrinkToFit();
> }
> 
> I want to avoid assuming that shrinking the length to 0 is the only usable idiom.

Ah, great.

>>
>> I think a resetAndReuse is really needed. I have found it can prevent "GC thrashing" in many cases. E.g. when caching frequently re-evaluated data in form of arrays (free previous array, then allocate array that isn't larger than the previous array).
> 
> Yes, it is useful in such cases.  The only questionable part about it is that it allows for stomping in cases like:
> 
> auto str = "hello".idup;
> auto str2 = str[3..4];
> str2.shrinkToFit();
> str2 ~= "a";
> assert(str == "hella");
> 
> So it probably should be marked as unsafe.

Doesn't it conform to Andrei's ideas about memory safety? It can stomp over other data, but it can't be used to subvert the type system. Also, it's not the default behavior: if you don't use this function, stomping can never happen.

But it must be disabled for arrays of immutable types (i.e. strings).

> Again, the name shrinkToFit isn't my favorite, ideas welcome.

stompOnAppend()? uniqueSlice()? trashRemainder()?

I don't know either, all sound a bit silly.

PS: I'm glad to hear that your patch is supposed to be included in the next release. Still waiting for dsimcha's one.

> -Steve
March 04, 2010
On Thu, 04 Mar 2010 13:38:30 -0500, grauzone <none@example.net> wrote:

> Steven Schveighoffer wrote:
>
>>  So it probably should be marked as unsafe.
>
> Doesn't it conform to Andrei's ideas about memory safety? It can stomp over other data, but it can't be used to subvert the type system.

I think the idea is that anything that *could* result in undefined behavior is marked as unsafe.  It just means it's not available from safe functions.  It still can be called from safe functions via a trusted wrapper.  I'd call it undefined behavior, because the memory is effectively released.  Future GCs may make assumptions about that unallocated space that would cause problems.  I feel like safeD has a slight lean towards sacrificing performance for the sake of memory safety.  This is not a bad thing, and in most cases inconsequential.

> Also, it's not the default behavior: if you don't use this function, stomping can never happen.

What's nice about capturing it into a function is you *can* classify it as unsafe, versus the current method which is harder to detect.  I'd categorize it like casting.

>
> But it must be disabled for arrays of immutable types (i.e. strings).

This is tricky, you could have partially immutable types.  The function would have to examine the type of all the contained items and make sure there were no immutable bytes in the element.

Plus, it's not truly unsafe to use on immutable types if the given array is the only reference to that data.  You could potentially implement a stack using builtin array appending and shrinkToFit, even on immutable elements (although I'd advise against it :)

>
>> Again, the name shrinkToFit isn't my favorite, ideas welcome.
>
> stompOnAppend()? uniqueSlice()? trashRemainder()?
>
> I don't know either, all sound a bit silly.

I think I like shrinkToFit better than all those :)  I still like minimize the best, but the objection was that it has mathematical connotations.

-Steve
« First   ‹ Prev
1 2 3