Thread overview
proposal: capacity variable for dynamic arrays
Jul 10, 2006
Ameer Armaly
Jul 10, 2006
Oskar Linde
Jul 10, 2006
Ameer Armaly
Jul 10, 2006
Oskar Linde
Jul 10, 2006
Chris Miller
Jul 10, 2006
Lionello Lunesu
Jul 10, 2006
Ameer Armaly
Jul 18, 2006
Kevin Bealer
Jul 10, 2006
Derek Parnell
July 10, 2006
Right now dynamic arrays use length to resize and handle memmory allocation. This is a bit limiting as far as the old memmory  reservation problem as well as a few other things.  What I propose is adding a new variable called capacity, which handles the actual memmory allocation and using length for the accepted size of the array.  Here are a few code fragments to show how it would work:

char[] string = new char[1024];
assert(string.length==1024 && string.capacity=1024);
string.length=0; // we've reserved memmory, but haven't freed it
version(break) writefln(string[1..4]); // throws a bounds exception, since
length is zero
string.capacity = 1300; // traditional resize with a different variable,
will be set to whatever the true capacity is
string.length = 2048; // would still work, since length is greater than
capacity so it acts the same way.  However, capacity would be the actual
amount of memmory reserved by the malloc algorithm, in this case doubling
the size of the array.
string.capacity = 0; // freed

In short, this  solves the problem of reserving memmory with little cost, while maintaining backwards compatibility.  It could also be used in the stream and socket code, where the actual capacity doesn't change but the length does, removing the need for a separate size variable, something like this:

char[] buf = new char[1024];
buf.length = 0;
Socket s;
// set up the socket
s.read(buf); // Length is set to the actual data, but capacity is the same.
This way the array is truely dynamic.

Thoughts?
-- 


Ameer
---
Life is either tragedy or comedy. Usually it's your choice. You can whine or
you can laugh.
--Animorphs


July 10, 2006
Ameer Armaly skrev:
> Right now dynamic arrays use length to resize and handle memmory allocation. This is a bit limiting as far as the old memmory  reservation problem as well as a few other things.  What I propose is adding a new variable called capacity, which handles the actual memmory allocation and using length for the accepted size of the array.  Here are a few code fragments to show how it would work:
> 
> char[] string = new char[1024];
> assert(string.length==1024 && string.capacity=1024);
> string.length=0; // we've reserved memmory, but haven't freed it
> version(break) writefln(string[1..4]); // throws a bounds exception, since length is zero
> string.capacity = 1300; // traditional resize with a different variable, will be set to whatever the true capacity is
> string.length = 2048; // would still work, since length is greater than capacity so it acts the same way.  However, capacity would be the actual amount of memmory reserved by the malloc algorithm, in this case doubling the size of the array.
> string.capacity = 0; // freed
> 
> In short, this  solves the problem of reserving memmory with little cost, while maintaining backwards compatibility.  It could also be used in the stream and socket code, where the actual capacity doesn't change but the length does, removing the need for a separate size variable, something like this:
> 
> char[] buf = new char[1024];
> buf.length = 0;
> Socket s;
> // set up the socket
> s.read(buf); // Length is set to the actual data, but capacity is the same. This way the array is truely dynamic.
> 
> Thoughts?

Where would the capacity value be stored? Today the capacity is (as far as I understand) stored by the GC in correspondance to the the allocated  chunk of memory.

What could be done is having a virtual .capacity property that when set to a value makes sure the capacity is /at least/ the value set and returns the actual capacity when read. So for example after doing
arr.capacity = 1024;
arr.capacity could be something at least 1024, like probably 2047.

/Oskar
July 10, 2006
I'd like such a capacity property. The implementation need not even do anything and it merely be a request. A good thing about capacity is that it doesn't need to pre-initialize this extra memory since it's not even accessible yet.
July 10, 2006
"Oskar Linde" <oskar.lindeREM@OVEgmail.com> wrote in message news:e8tvq5$1i3d$1@digitaldaemon.com...
> Ameer Armaly skrev:
>> Right now dynamic arrays use length to resize and handle memmory allocation. This is a bit limiting as far as the old memmory  reservation problem as well as a few other things.  What I propose is adding a new variable called capacity, which handles the actual memmory allocation and using length for the accepted size of the array.  Here are a few code fragments to show how it would work:
>>
>> char[] string = new char[1024];
>> assert(string.length==1024 && string.capacity=1024);
>> string.length=0; // we've reserved memmory, but haven't freed it
>> version(break) writefln(string[1..4]); // throws a bounds exception,
>> since length is zero
>> string.capacity = 1300; // traditional resize with a different variable,
>> will be set to whatever the true capacity is
>> string.length = 2048; // would still work, since length is greater than
>> capacity so it acts the same way.  However, capacity would be the actual
>> amount of memmory reserved by the malloc algorithm, in this case doubling
>> the size of the array.
>> string.capacity = 0; // freed
>>
>> In short, this  solves the problem of reserving memmory with little cost, while maintaining backwards compatibility.  It could also be used in the stream and socket code, where the actual capacity doesn't change but the length does, removing the need for a separate size variable, something like this:
>>
>> char[] buf = new char[1024];
>> buf.length = 0;
>> Socket s;
>> // set up the socket
>> s.read(buf); // Length is set to the actual data, but capacity is the
>> same. This way the array is truely dynamic.
>>
>> Thoughts?
>
> Where would the capacity value be stored? Today the capacity is (as far as I understand) stored by the GC in correspondance to the the allocated chunk of memory.
>
> What could be done is having a virtual .capacity property that when set to
> a value makes sure the capacity is /at least/ the value set and returns
> the actual capacity when read. So for example after doing
> arr.capacity = 1024;
> arr.capacity could be something at least 1024, like probably 2047.
>
Sounds good to me, especially since that means no redundant variables.  What I see capacity as holding is the maximum amount of elements I can put in this array _without_ any new allocations.
> /Oskar


July 10, 2006
Ameer Armaly skrev:
> "Oskar Linde" <oskar.lindeREM@OVEgmail.com> wrote in message news:e8tvq5$1i3d$1@digitaldaemon.com...
>> Ameer Armaly skrev:
>>> Right now dynamic arrays use length to resize and handle memmory allocation. This is a bit limiting as far as the old memmory  reservation problem as well as a few other things.  What I propose is adding a new variable called capacity, which handles the actual memmory allocation and using length for the accepted size of the array.  Here are a few code fragments to show how it would work:
>>>
>>> char[] string = new char[1024];
>>> assert(string.length==1024 && string.capacity=1024);
>>> string.length=0; // we've reserved memmory, but haven't freed it
>>> version(break) writefln(string[1..4]); // throws a bounds exception, since length is zero
>>> string.capacity = 1300; // traditional resize with a different variable, will be set to whatever the true capacity is
>>> string.length = 2048; // would still work, since length is greater than capacity so it acts the same way.  However, capacity would be the actual amount of memmory reserved by the malloc algorithm, in this case doubling the size of the array.
>>> string.capacity = 0; // freed
>>>
>>> In short, this  solves the problem of reserving memmory with little cost, while maintaining backwards compatibility.  It could also be used in the stream and socket code, where the actual capacity doesn't change but the length does, removing the need for a separate size variable, something like this:
>>>
>>> char[] buf = new char[1024];
>>> buf.length = 0;
>>> Socket s;
>>> // set up the socket
>>> s.read(buf); // Length is set to the actual data, but capacity is the same. This way the array is truely dynamic.
>>>
>>> Thoughts?
>> Where would the capacity value be stored? Today the capacity is (as far as I understand) stored by the GC in correspondance to the the allocated chunk of memory.
>>
>> What could be done is having a virtual .capacity property that when set to a value makes sure the capacity is /at least/ the value set and returns the actual capacity when read. So for example after doing
>> arr.capacity = 1024;
>> arr.capacity could be something at least 1024, like probably 2047.
>>
> Sounds good to me, especially since that means no redundant variables.  What I see capacity as holding is the maximum amount of elements I can put in this array _without_ any new allocations.

Here is to play with. (You probably need to add ~/dmd/src/phobos/internal/gc to module search path). No warranties: :)


--- arrcapacity.d
import std.gc;
import internal.gc.gcx; // :)

int capacity(T)(T x) {
        gc_t gc = cast(gc_t)getGCHandle();
        uint gccap = gc.capacity(x);
        if (gccap == 0)
                return x.length;
        return (gccap-1)/(typeof(x[0]).sizeof);
}

void setCapacity(T)(inout T x, int capacity) {
        int oldLength = x.length;
        x.length = capacity;
        x = x[0..oldLength];
}
---

usage:


if (arr.capacity() < 100)
	arr.setCapacity(200);

It's a shame implicit injected array member functions don't support the property syntax: arr.capacity; arr.capacity=200;

NOTE: I have no idea if the above code is even close to correct...
The code would also be nicer if Phobos was altered slightly. Especially if _gc.capacity() was exposed by std.gc.

/Oskar
July 10, 2006
What's the capacity of a slice?

Futhermore, the capacity can't be stored in the memory block, since you don't know the start and end of the block. Could be stored in the array variables itself (as are length and ptr), but that would make the sizeof(array) and strange 12, probably being padded to 16, resulting in twice the mem transfer for array arguments, return values.

L.


July 10, 2006
On Tue, 11 Jul 2006 02:22:30 +1000, Ameer Armaly <ameer_armaly@hotmail.com> wrote:

> Right now dynamic arrays use length to resize and handle memmory allocation.
> This is a bit limiting as far as the old memmory  reservation problem as
> well as a few other things.

If the behaviour in D changed (dare I say 'fixed') such that assigning an array length to zero would *not* also set the pointer to null, then the capacity idea can be easily implemented as ...

  int[] arr;
  . . .
  arr.length = capacity;
  arr.length = 0;

This would assign RAM to the required size and by setting length back to zero, it would be kept and 'resused' as the array elements where added.

-- 
Derek Parnell
Melbourne, Australia
July 10, 2006
"Lionello Lunesu" <lionello@lunesu.remove.com> wrote in message news:e8u973$21uo$1@digitaldaemon.com...
> What's the capacity of a slice?
>
The capacity of a slice is strictly the number of elements in that slice; any expansion in capacity would result in a new memmory block.
> Futhermore, the capacity can't be stored in the memory block, since you don't know the start and end of the block. Could be stored in the array variables itself (as are length and ptr), but that would make the sizeof(array) and strange 12, probably being padded to 16, resulting in twice the mem transfer for array arguments, return values.
>
> L.
My favorite solution is the one mentioned previously in which capacity is a virtual property that obtains the block size from the gc itself in some manner.
>
> 


July 18, 2006
In article <e8udip$2c0r$1@digitaldaemon.com>, Ameer Armaly says...
>
>
>"Lionello Lunesu" <lionello@lunesu.remove.com> wrote in message news:e8u973$21uo$1@digitaldaemon.com...
>> What's the capacity of a slice?
>>
>The capacity of a slice is strictly the number of elements in that slice; any expansion in capacity would result in a new memmory block.

An alternate solution would be to use 0 for the capacity of slices; then allow people to downsize slice objects, but as soon as they increase them, reallocation is triggered.

My rationale is the following scenario:

char[] b = /* something */;
char[] a = b[0..10]; // 10 bytes

a.resize(5);
a.resize(10);

If slice capacity is the same as length, the slice gets smaller, and when it gets larger, a[5..10] is initialized, BUT so is b[5..10].  If the capacity were left at zero for the slice, the second resize would reallocate rather than corrupting the contents of B.

This also provides a handy way of checking if an array is a slice.

>> Futhermore, the capacity can't be stored in the memory block, since you don't know the start and end of the block. Could be stored in the array variables itself (as are length and ptr), but that would make the sizeof(array) and strange 12, probably being padded to 16, resulting in twice the mem transfer for array arguments, return values.
>>
>> L.

Since most systems will be 64 bit soon, I think this padding could be avoided most of the time -- you rarely want padding to more than 8 bytes.  Of course, an array of arrays has the unfortunate "multiply by 24" when indexing (i.e. multiply by a non-power of two, which is slower than shifting.)

Kevin