proposal: capacity variable for dynamic arrays - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » proposal: capacity variable for dynamic arrays

Thread overview

proposal: capacity variable for dynamic arrays
Jul 10, 2006 Ameer Armaly
Jul 10, 2006 Oskar Linde
Jul 10, 2006 Ameer Armaly
Jul 10, 2006 Oskar Linde
Jul 10, 2006 Chris Miller
Jul 10, 2006 Lionello Lunesu
Jul 10, 2006 Ameer Armaly
Jul 18, 2006 Kevin Bealer
Jul 10, 2006 Derek Parnell

July 10, 2006

proposal: capacity variable for dynamic arrays

Posted by Ameer Armaly

Ameer Armaly

Right now dynamic arrays use length to resize and handle memmory allocation. This is a bit limiting as far as the old memmory  reservation problem as well as a few other things.  What I propose is adding a new variable called capacity, which handles the actual memmory allocation and using length for the accepted size of the array.  Here are a few code fragments to show how it would work:

char[] string = new char[1024];
assert(string.length==1024 && string.capacity=1024);
string.length=0; // we've reserved memmory, but haven't freed it
version(break) writefln(string[1..4]); // throws a bounds exception, since
length is zero
string.capacity = 1300; // traditional resize with a different variable,
will be set to whatever the true capacity is
string.length = 2048; // would still work, since length is greater than
capacity so it acts the same way.  However, capacity would be the actual
amount of memmory reserved by the malloc algorithm, in this case doubling
the size of the array.
string.capacity = 0; // freed

In short, this  solves the problem of reserving memmory with little cost, while maintaining backwards compatibility.  It could also be used in the stream and socket code, where the actual capacity doesn't change but the length does, removing the need for a separate size variable, something like this:

char[] buf = new char[1024];
buf.length = 0;
Socket s;
// set up the socket
s.read(buf); // Length is set to the actual data, but capacity is the same.
This way the array is truely dynamic.

Thoughts?
-- 


Ameer
---
Life is either tragedy or comedy. Usually it's your choice. You can whine or
you can laugh.
--Animorphs

July 10, 2006

Re: proposal: capacity variable for dynamic arrays

Posted by Oskar Linde
in reply to Ameer Armaly

Oskar Linde

Posted in reply to Ameer Armaly

Ameer Armaly skrev:
> Right now dynamic arrays use length to resize and handle memmory allocation. This is a bit limiting as far as the old memmory  reservation problem as well as a few other things.  What I propose is adding a new variable called capacity, which handles the actual memmory allocation and using length for the accepted size of the array.  Here are a few code fragments to show how it would work:
> 
> char[] string = new char[1024];
> assert(string.length==1024 && string.capacity=1024);
> string.length=0; // we've reserved memmory, but haven't freed it
> version(break) writefln(string[1..4]); // throws a bounds exception, since length is zero
> string.capacity = 1300; // traditional resize with a different variable, will be set to whatever the true capacity is
> string.length = 2048; // would still work, since length is greater than capacity so it acts the same way.  However, capacity would be the actual amount of memmory reserved by the malloc algorithm, in this case doubling the size of the array.
> string.capacity = 0; // freed
> 
> In short, this  solves the problem of reserving memmory with little cost, while maintaining backwards compatibility.  It could also be used in the stream and socket code, where the actual capacity doesn't change but the length does, removing the need for a separate size variable, something like this:
> 
> char[] buf = new char[1024];
> buf.length = 0;
> Socket s;
> // set up the socket
> s.read(buf); // Length is set to the actual data, but capacity is the same. This way the array is truely dynamic.
> 
> Thoughts?

Where would the capacity value be stored? Today the capacity is (as far as I understand) stored by the GC in correspondance to the the allocated  chunk of memory.

What could be done is having a virtual .capacity property that when set to a value makes sure the capacity is /at least/ the value set and returns the actual capacity when read. So for example after doing
arr.capacity = 1024;
arr.capacity could be something at least 1024, like probably 2047.

/Oskar

July 10, 2006

Re: proposal: capacity variable for dynamic arrays

Posted by Chris Miller
in reply to Ameer Armaly

Chris Miller

Posted in reply to Ameer Armaly

I'd like such a capacity property. The implementation need not even do anything and it merely be a request. A good thing about capacity is that it doesn't need to pre-initialize this extra memory since it's not even accessible yet.

July 10, 2006

Re: proposal: capacity variable for dynamic arrays

Posted by Ameer Armaly
in reply to Oskar Linde

Ameer Armaly

Posted in reply to Oskar Linde

"Oskar Linde" <oskar.lindeREM@OVEgmail.com> wrote in message news:e8tvq5$1i3d$1@digitaldaemon.com...
> Ameer Armaly skrev:
>> Right now dynamic arrays use length to resize and handle memmory allocation. This is a bit limiting as far as the old memmory  reservation problem as well as a few other things.  What I propose is adding a new variable called capacity, which handles the actual memmory allocation and using length for the accepted size of the array.  Here are a few code fragments to show how it would work:
>>
>> char[] string = new char[1024];
>> assert(string.length==1024 && string.capacity=1024);
>> string.length=0; // we've reserved memmory, but haven't freed it
>> version(break) writefln(string[1..4]); // throws a bounds exception,
>> since length is zero
>> string.capacity = 1300; // traditional resize with a different variable,
>> will be set to whatever the true capacity is
>> string.length = 2048; // would still work, since length is greater than
>> capacity so it acts the same way.  However, capacity would be the actual
>> amount of memmory reserved by the malloc algorithm, in this case doubling
>> the size of the array.
>> string.capacity = 0; // freed
>>
>> In short, this  solves the problem of reserving memmory with little cost, while maintaining backwards compatibility.  It could also be used in the stream and socket code, where the actual capacity doesn't change but the length does, removing the need for a separate size variable, something like this:
>>
>> char[] buf = new char[1024];
>> buf.length = 0;
>> Socket s;
>> // set up the socket
>> s.read(buf); // Length is set to the actual data, but capacity is the
>> same. This way the array is truely dynamic.
>>
>> Thoughts?
>
> Where would the capacity value be stored? Today the capacity is (as far as I understand) stored by the GC in correspondance to the the allocated chunk of memory.
>
> What could be done is having a virtual .capacity property that when set to
> a value makes sure the capacity is /at least/ the value set and returns
> the actual capacity when read. So for example after doing
> arr.capacity = 1024;
> arr.capacity could be something at least 1024, like probably 2047.
>
Sounds good to me, especially since that means no redundant variables.  What I see capacity as holding is the maximum amount of elements I can put in this array _without_ any new allocations.
> /Oskar

July 10, 2006

Re: proposal: capacity variable for dynamic arrays

Posted by Oskar Linde
in reply to Ameer Armaly

Oskar Linde

Posted in reply to Ameer Armaly

Ameer Armaly skrev:
> "Oskar Linde" <oskar.lindeREM@OVEgmail.com> wrote in message news:e8tvq5$1i3d$1@digitaldaemon.com...
>> Ameer Armaly skrev:
>>> Right now dynamic arrays use length to resize and handle memmory allocation. This is a bit limiting as far as the old memmory  reservation problem as well as a few other things.  What I propose is adding a new variable called capacity, which handles the actual memmory allocation and using length for the accepted size of the array.  Here are a few code fragments to show how it would work:
>>>
>>> char[] string = new char[1024];
>>> assert(string.length==1024 && string.capacity=1024);
>>> string.length=0; // we've reserved memmory, but haven't freed it
>>> version(break) writefln(string[1..4]); // throws a bounds exception, since length is zero
>>> string.capacity = 1300; // traditional resize with a different variable, will be set to whatever the true capacity is
>>> string.length = 2048; // would still work, since length is greater than capacity so it acts the same way.  However, capacity would be the actual amount of memmory reserved by the malloc algorithm, in this case doubling the size of the array.
>>> string.capacity = 0; // freed
>>>
>>> In short, this  solves the problem of reserving memmory with little cost, while maintaining backwards compatibility.  It could also be used in the stream and socket code, where the actual capacity doesn't change but the length does, removing the need for a separate size variable, something like this:
>>>
>>> char[] buf = new char[1024];
>>> buf.length = 0;
>>> Socket s;
>>> // set up the socket
>>> s.read(buf); // Length is set to the actual data, but capacity is the same. This way the array is truely dynamic.
>>>
>>> Thoughts?
>> Where would the capacity value be stored? Today the capacity is (as far as I understand) stored by the GC in correspondance to the the allocated chunk of memory.
>>
>> What could be done is having a virtual .capacity property that when set to a value makes sure the capacity is /at least/ the value set and returns the actual capacity when read. So for example after doing
>> arr.capacity = 1024;
>> arr.capacity could be something at least 1024, like probably 2047.
>>
> Sounds good to me, especially since that means no redundant variables.  What I see capacity as holding is the maximum amount of elements I can put in this array _without_ any new allocations.

Here is to play with. (You probably need to add ~/dmd/src/phobos/internal/gc to module search path). No warranties: :)


--- arrcapacity.d
import std.gc;
import internal.gc.gcx; // :)

int capacity(T)(T x) {
        gc_t gc = cast(gc_t)getGCHandle();
        uint gccap = gc.capacity(x);
        if (gccap == 0)
                return x.length;
        return (gccap-1)/(typeof(x[0]).sizeof);
}

void setCapacity(T)(inout T x, int capacity) {
        int oldLength = x.length;
        x.length = capacity;
        x = x[0..oldLength];
}
---

usage:


if (arr.capacity() < 100)
	arr.setCapacity(200);

It's a shame implicit injected array member functions don't support the property syntax: arr.capacity; arr.capacity=200;

NOTE: I have no idea if the above code is even close to correct...
The code would also be nicer if Phobos was altered slightly. Especially if _gc.capacity() was exposed by std.gc.

/Oskar

July 10, 2006

Re: proposal: capacity variable for dynamic arrays

Posted by Lionello Lunesu
in reply to Ameer Armaly

Lionello Lunesu

Posted in reply to Ameer Armaly

What's the capacity of a slice?

Futhermore, the capacity can't be stored in the memory block, since you don't know the start and end of the block. Could be stored in the array variables itself (as are length and ptr), but that would make the sizeof(array) and strange 12, probably being padded to 16, resulting in twice the mem transfer for array arguments, return values.

L.

July 10, 2006

Re: proposal: capacity variable for dynamic arrays

Posted by Derek Parnell
in reply to Ameer Armaly

Derek Parnell

Posted in reply to Ameer Armaly

On Tue, 11 Jul 2006 02:22:30 +1000, Ameer Armaly <ameer_armaly@hotmail.com> wrote:

> Right now dynamic arrays use length to resize and handle memmory allocation.
> This is a bit limiting as far as the old memmory  reservation problem as
> well as a few other things.

If the behaviour in D changed (dare I say 'fixed') such that assigning an array length to zero would *not* also set the pointer to null, then the capacity idea can be easily implemented as ...

  int[] arr;
  . . .
  arr.length = capacity;
  arr.length = 0;

This would assign RAM to the required size and by setting length back to zero, it would be kept and 'resused' as the array elements where added.

-- 
Derek Parnell
Melbourne, Australia

July 10, 2006

Re: proposal: capacity variable for dynamic arrays

Posted by Ameer Armaly
in reply to Lionello Lunesu

Ameer Armaly

Posted in reply to Lionello Lunesu

"Lionello Lunesu" <lionello@lunesu.remove.com> wrote in message news:e8u973$21uo$1@digitaldaemon.com...
> What's the capacity of a slice?
>
The capacity of a slice is strictly the number of elements in that slice; any expansion in capacity would result in a new memmory block.
> Futhermore, the capacity can't be stored in the memory block, since you don't know the start and end of the block. Could be stored in the array variables itself (as are length and ptr), but that would make the sizeof(array) and strange 12, probably being padded to 16, resulting in twice the mem transfer for array arguments, return values.
>
> L.
My favorite solution is the one mentioned previously in which capacity is a virtual property that obtains the block size from the gc itself in some manner.
>
>

July 18, 2006

Re: proposal: capacity variable for dynamic arrays

Posted by Kevin Bealer
in reply to Ameer Armaly

Kevin Bealer

Posted in reply to Ameer Armaly

In article <e8udip$2c0r$1@digitaldaemon.com>, Ameer Armaly says...
>
>
>"Lionello Lunesu" <lionello@lunesu.remove.com> wrote in message news:e8u973$21uo$1@digitaldaemon.com...
>> What's the capacity of a slice?
>>
>The capacity of a slice is strictly the number of elements in that slice; any expansion in capacity would result in a new memmory block.

An alternate solution would be to use 0 for the capacity of slices; then allow people to downsize slice objects, but as soon as they increase them, reallocation is triggered.

My rationale is the following scenario:

char[] b = /* something */;
char[] a = b[0..10]; // 10 bytes

a.resize(5);
a.resize(10);

If slice capacity is the same as length, the slice gets smaller, and when it gets larger, a[5..10] is initialized, BUT so is b[5..10].  If the capacity were left at zero for the slice, the second resize would reallocate rather than corrupting the contents of B.

This also provides a handy way of checking if an array is a slice.

>> Futhermore, the capacity can't be stored in the memory block, since you don't know the start and end of the block. Could be stored in the array variables itself (as are length and ptr), but that would make the sizeof(array) and strange 12, probably being padded to 16, resulting in twice the mem transfer for array arguments, return values.
>>
>> L.

Since most systems will be 64 bit soon, I think this padding could be avoided most of the time -- you rarely want padding to more than 8 bytes.  Of course, an array of arrays has the unfortunate "multiply by 24" when indexing (i.e. multiply by a non-power of two, which is slower than shifting.)

Kevin

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation