Jump to page: 1 2
Thread overview
[phobos] Array preallocate function
February 18, 2010
Now that the append patch is pretty much a lock in for the next version of phobos/druntime, we need to address losing preallocate functionality.

Specifically, the following code does not preallocate as it previously did:

auto buffer = new int[10000];
buffer.length = 0;

buffer will reallocate on the first append because this can be considered a case of stomping, thereby wasting the first allocation completely.

Therefore, we need a way to implement this behavior.  I think the best way to do it is a druntime function (also TBD is where it goes!) which needs to call a function in the low-level rt/lifetime.d module.  Here is a strawman to discuss, anyone has a better idea, please present it.  I have no emotional attachment to anything here:

T[] preAlloc(T)(size_t nElements);

as for a module for it to belong in, currently druntime has the following public modules:
bitop.d
cpuid.d
exception.d
memory.d
runtime.d
thread.d
vararg.d

And of course, there's object.di

My first intuition is to use memory.d, but that contains only one public item -- the GC struct.  However, this function is currently not a GC function (it is purposely not because then it can be GC agnostic, but I'm open to discussion on this).  Therefore, it probably should be outside the GC struct.

The only other place that makes sense to me is object.di.  And of course, it could live in its own module, but that might be overkill.

-------------------------

A further feature is to "reuse" a buffer.  For example:

auto buffer = new int[10000];
loop
{
   buffer.length = 0;
   // use append as necessary
}

Should we implement this as a function or should we refer users to the Appender in std.array (does it support this feature)?  With preallocation, we can guarantee no stomping, but with reusing a buffer, we cannot because we rely on the user to ensure nothing else has references to data in the buffer.

My intuition is that we should not supply this function, and refer users to the Appender struct (and ensuring it can do this).

-Steve




February 19, 2010
I very much dislike the idiom with allocating a large array (and initializing its contents!), to then set its length to zero. It's completely assuming, and indeed the best proof is that that assumption is now invalid.

I think we could define a preAlloc() pseudo-method for arrays. Maybe we go as far as defining a @property capacity that can be read and written.

The best place is something that gets automatically included, like object.di. Then we can could on .capacity being as good as built-in.


Andrei

Steve Schveighoffer wrote:
> Now that the append patch is pretty much a lock in for the next version of phobos/druntime, we need to address losing preallocate functionality.
> 
> Specifically, the following code does not preallocate as it previously did:
> 
> auto buffer = new int[10000];
> buffer.length = 0;
> 
> buffer will reallocate on the first append because this can be considered a case of stomping, thereby wasting the first allocation completely.
> 
> Therefore, we need a way to implement this behavior.  I think the best way to do it is a druntime function (also TBD is where it goes!) which needs to call a function in the low-level rt/lifetime.d module.  Here is a strawman to discuss, anyone has a better idea, please present it.  I have no emotional attachment to anything here:
> 
> T[] preAlloc(T)(size_t nElements);
> 
> as for a module for it to belong in, currently druntime has the following public modules:
> bitop.d
> cpuid.d
> exception.d
> memory.d
> runtime.d
> thread.d
> vararg.d
> 
> And of course, there's object.di
> 
> My first intuition is to use memory.d, but that contains only one public item -- the GC struct.  However, this function is currently not a GC function (it is purposely not because then it can be GC agnostic, but I'm open to discussion on this).  Therefore, it probably should be outside the GC struct.
> 
> The only other place that makes sense to me is object.di.  And of course, it could live in its own module, but that might be overkill.
> 
> -------------------------
> 
> A further feature is to "reuse" a buffer.  For example:
> 
> auto buffer = new int[10000];
> loop
> {
>    buffer.length = 0;
>    // use append as necessary
> }
> 
> Should we implement this as a function or should we refer users to the Appender in std.array (does it support this feature)?  With preallocation, we can guarantee no stomping, but with reusing a buffer, we cannot because we rely on the user to ensure nothing else has references to data in the buffer.
> 
> My intuition is that we should not supply this function, and refer users to the Appender struct (and ensuring it can do this).
> 
> -Steve
> 
> 
> 
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
February 20, 2010
This is a good idea.? I still think a function to do this should be available, so you can do the allocation in one step instead of declaring an array and then setting its capacity.

What about the issue of shrinking the allocated size as in the loop I showed?? Should this be allowed via builtin arrays or reserved to a wrapper?

-Steve



----- Original Message ----
> From: Andrei Alexandrescu <andrei at erdani.com>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Sent: Fri, February 19, 2010 6:20:25 PM
> Subject: Re: [phobos] Array preallocate function
> 
> I very much dislike the idiom with allocating a large array (and initializing its contents!), to then set its length to zero. It's completely assuming, and indeed the best proof is that that assumption is now invalid.

I think we
> could define a preAlloc() pseudo-method for arrays. Maybe we go as far as defining a @property capacity that can be read and written.

The best
> place is something that gets automatically included, like object.di. Then we can could on .capacity being as good as built-in.


Andrei

Steve
> Schveighoffer wrote:
> Now that the append patch is pretty much a lock in
> for the next version of phobos/druntime, we need to address losing preallocate
> functionality.
> 
> Specifically, the following code does not preallocate as it previously did:
> 
> auto buffer = new
> int[10000];
> buffer.length = 0;
> 
> buffer will reallocate on the first append because this can be considered a case of stomping, thereby wasting the first allocation completely.
> 
> Therefore, we need a way to implement this behavior.? I think the best way to do it is a druntime function (also TBD is where it goes!) which needs to call a function in the low-level rt/lifetime.d module.? Here is a strawman to discuss, anyone has a better idea, please present it.? I have no emotional attachment to anything here:
> 
> T[] preAlloc(T)(size_t nElements);
> 
> 
> as for a module for it to belong in, currently druntime has the
> following public modules:
> bitop.d
> cpuid.d
> 
> exception.d
> memory.d
> runtime.d
> thread.d
> 
> vararg.d
> 
> And of course, there's object.di
> 
> My first intuition is to use memory.d, but that contains only one public item -- the GC struct.? However, this function is currently not a GC function (it is purposely not because then it can be GC agnostic, but I'm open to discussion on this).? Therefore, it probably should be outside the GC struct.
> 
> 
> The only other place that makes sense to me is object.di.? And of course, it could live in its own module, but that might be overkill.
> 
> 
> -------------------------
> 
> A further feature is to "reuse" a buffer.? For example:
> 
> auto buffer = new
> int[10000];
> loop
> {
>? ? buffer.length =
> 0;
>? ? // use append as necessary
> }
> 
> 
> Should we implement this as a function or should we refer users to the Appender in std.array (does it support this feature)?? With preallocation, we can guarantee no stomping, but with reusing a buffer, we cannot because we rely on the user to ensure nothing else has references to data in the buffer.
> 
> 
> My intuition is that we should not supply this function, and refer users to the Appender struct (and ensuring it can do this).
> 
> 
> -Steve
> 
> 
> 
>? ? ?
> _______________________________________________
> phobos mailing
> list
> > href="mailto:phobos at puremagic.com">phobos at puremagic.com
> 
> http://lists.puremagic.com/mailman/listinfo/phobos
_______________________________________________
phobos
> mailing list
> href="mailto:phobos at puremagic.com">phobos at puremagic.com
> href="http://lists.puremagic.com/mailman/listinfo/phobos" target=_blank
> >http://lists.puremagic.com/mailman/listinfo/phobos



February 20, 2010
Steve Schveighoffer wrote:
> This is a good idea.  I still think a function to do this should be available, so you can do the allocation in one step instead of declaring an array and then setting its capacity.
> 
> What about the issue of shrinking the allocated size as in the loop I showed?  Should this be allowed via builtin arrays or reserved to a wrapper?

I think regardless of what we do, there will be people who will say buffer.length = 0 if they want to reuse a buffer. I think we should do something sensible in such cases. The case I don't think we want to endorse as an idiom is "for efficiency, initialize needlessly a large buffer and then assign its length to zero".

Andrei

February 23, 2010
----- Original Message ----

> From: Andrei Alexandrescu <andrei at erdani.com>
> 
> Steve Schveighoffer wrote:
> > This is a good idea.  I still think a function to do this should be available, so you can do the allocation in one step instead of declaring an array and then setting its capacity.
> > 
> > What about the issue of shrinking the allocated size as in the loop I showed?  Should this be allowed via builtin arrays or reserved to a wrapper?
> 
> I think regardless of what we do, there will be people who will say buffer.length = 0 if they want to reuse a buffer. I think we should do something sensible in such cases. The case I don't think we want to endorse as an idiom is "for efficiency, initialize needlessly a large buffer and then assign its length to zero".
> 

Discouraging that case doesn't go very far when we allow it to do what it did before.  I think by disallowing it, you discourage it more than adding any new functions.

If someone has an array, and they want to trim one element off the end, it would be feasible they may write:

array.length -= 1;

vs the more verbose

array = array[0..$-1];

Letting these two statements do something different is a mistake in my mind.

What about a "minimize" function, which simply truncates any "allocated" length after an array.  So you would reset an array via:

arr.length = 0;
arr.minimize();

The advantage here is the array's length is not affected, just the allocated length is reduced to match the array's length.  There are less invalid cases to worry about (i.e. "shrinking" to something larger doesn't make any sense).

-Steve




February 23, 2010
Steve Schveighoffer wrote:
> If someone has an array, and they want to trim one element off the end, it would be feasible they may write:
> 
> array.length -= 1;
> 
> vs the more verbose
> 
> array = array[0..$-1];
> 
> Letting these two statements do something different is a mistake in my mind.

Agreed.

> What about a "minimize" function, which simply truncates any "allocated" length after an array.  So you would reset an array via:
> 
> arr.length = 0; arr.minimize();
> 
> The advantage here is the array's length is not affected, just the allocated length is reduced to match the array's length.  There are less invalid cases to worry about (i.e. "shrinking" to something larger doesn't make any sense).

Sounds good. I'd choose a more specific name, e.g. shrinkToFit. Minimize has me think of optimization functions.

Andrei
February 23, 2010
I understand your objections to minimize.  I don't really have a preference, but I don't really like shrinkToFit.  I'll use that for now, since it's probably easy to search/replace later.

-Steve



----- Original Message ----
> From: Andrei Alexandrescu <andrei at erdani.com>
> > What about a "minimize" function, which simply truncates any "allocated" length after an array.  So you would reset an array via:
> > 
> > arr.length = 0; arr.minimize();
> > 
> > The advantage here is the array's length is not affected, just the allocated length is reduced to match the array's length.  There are less invalid cases to worry about (i.e. "shrinking" to something larger doesn't make any sense).
> 
> Sounds good. I'd choose a more specific name, e.g. shrinkToFit. Minimize has me think of optimization functions.
> 
> Andrei
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos




February 26, 2010
Ran into an issue with array properties:

http://d.puremagic.com/issues/show_bug.cgi?id=3857

I'm going to make the function not a property for now (at least for the "set capacity" part).  Should I wait for this bug to be fixed, or should we discuss a different scheme?

My plans were for this to work:

int[] x;

x.capacity = 10000; // set the capacity to 10000 elements
assert(x.capacity >= 10000); // get the current capacity (elements allocated + available)
int *ptr = x.ptr;
while(x.length < 10000)
   x ~= 1;
assert(x.ptr == ptr); // no reallocation
x.length = 0;
x.shrinkToFit(); // name up for debate
x ~= 1;
assert(x.ptr == ptr); // resized the allocation length, so it can be reused as a buffer.

I'll have to change the write property 'capacity' to a function 'setCapacity' for now.

-Steve




----- Original Message ----
> From: Steve Schveighoffer <schveiguy at yahoo.com>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Sent: Tue, February 23, 2010 1:47:56 PM
> Subject: Re: [phobos] Array preallocate function
> 
> I understand your objections to minimize.  I don't really have a preference, but I don't really like shrinkToFit.  I'll use that for now, since it's probably easy to search/replace later.
> 
> -Steve
> 
> 
> 
> ----- Original Message ----
> > From: Andrei Alexandrescu
> > > What about a "minimize" function, which simply truncates any "allocated" length after an array.  So you would reset an array via:
> > > 
> > > arr.length = 0; arr.minimize();
> > > 
> > > The advantage here is the array's length is not affected, just the allocated length is reduced to match the array's length.  There are less invalid cases to worry about (i.e. "shrinking" to something larger doesn't make any sense).
> > 
> > Sounds good. I'd choose a more specific name, e.g. shrinkToFit. Minimize has
> me
> > think of optimization functions.
> > 
> > Andrei
> > _______________________________________________
> > phobos mailing list
> > phobos at puremagic.com
> > http://lists.puremagic.com/mailman/listinfo/phobos
> 
> 
> 
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos




March 01, 2010
In changeset 254 (http://www.dsource.org/projects/druntime/changeset/254), I added three functions.  A capacity property, a "setCapacity" function that should really be a property except for the bug mentioned below, and a shrinkToFit function as discussed below.

Note that these functions are all template wrappers in object.di to runtime functions in rt/lifetime.d, they would be easy to switch to compiler builtins *hint hint*.  However, before that should happen, we should think of a good name for shrinkToFit.  The function shrinks the allocated space of a block so it ends at the end of the given array.  Essentially, it's there so you can re-use a buffer.  Usage looks like this:

buf.length = 0;
buf.shrinkToFit();

Please kick tires as desired.

-Steve



----- Original Message ----
> From: Steve Schveighoffer <schveiguy at yahoo.com>
>
> Ran into an issue with array properties:
> 
> http://d.puremagic.com/issues/show_bug.cgi?id=3857
> 
> I'm going to make the function not a property for now (at least for the "set capacity" part).  Should I wait for this bug to be fixed, or should we discuss a different scheme?
> 
> My plans were for this to work:
> 
> int[] x;
> 
> x.capacity = 10000; // set the capacity to 10000 elements
> assert(x.capacity >= 10000); // get the current capacity (elements allocated +
> available)
> int *ptr = x.ptr;
> while(x.length < 10000)
>    x ~= 1;
> assert(x.ptr == ptr); // no reallocation
> x.length = 0;
> x.shrinkToFit(); // name up for debate
> x ~= 1;
> assert(x.ptr == ptr); // resized the allocation length, so it can be reused as a
> buffer.
> 
> I'll have to change the write property 'capacity' to a function 'setCapacity' for now.
> 
> -Steve
> 
> 
> 
> 
> ----- Original Message ----
> > From: Steve Schveighoffer
> > To: Discuss the phobos library for D
> > Sent: Tue, February 23, 2010 1:47:56 PM
> > Subject: Re: [phobos] Array preallocate function
> > 
> > I understand your objections to minimize.  I don't really have a preference,
> but
> > I don't really like shrinkToFit.  I'll use that for now, since it's probably easy to search/replace later.
> > 
> > -Steve
> > 
> > 
> > 
> > ----- Original Message ----
> > > From: Andrei Alexandrescu
> > > > What about a "minimize" function, which simply truncates any "allocated" length after an array.  So you would reset an array via:
> > > > 
> > > > arr.length = 0; arr.minimize();
> > > > 
> > > > The advantage here is the array's length is not affected, just the allocated length is reduced to match the array's length.  There are less invalid cases to worry about (i.e. "shrinking" to something larger doesn't make any sense).
> > > 
> > > Sounds good. I'd choose a more specific name, e.g. shrinkToFit. Minimize has
> 
> > me
> > > think of optimization functions.
> > > 
> > > Andrei




March 04, 2010
I outlined the ideas below on the newsgroup, and one poster had a valid point.

With x.setCapacity(...) or x.capacity = ..., it implies that the exact capacity is being set.  In other words, shrinking the allocated size may be allowed, and it would not allocate more than requested.  It also implies the following:

a.capacity = x;
assert(a.capacity == x);

Which is not true.  The capacity request is only the *minimum* size the GC should allocate.  It is allowed to allocate more.

The poster (Clemens) suggested to name it reserve(), after the STL's equivalent member functions.  It does sound more correct, but I kind of liked the idea of having it be a property (which reserve doesn't make sense as).

I would still use the name capacity as a property for the getter.

Does anyone else have any opinions?

-Steve



----- Original Message ----
> From: Steve Schveighoffer <schveiguy at yahoo.com>
> 
> Ran into an issue with array properties:
> 
> http://d.puremagic.com/issues/show_bug.cgi?id=3857
> 
> I'm going to make the function not a property for now (at least for the "set capacity" part).  Should I wait for this bug to be fixed, or should we discuss a different scheme?
> 
> My plans were for this to work:
> 
> int[] x;
> 
> x.capacity = 10000; // set the capacity to 10000 elements
> assert(x.capacity >= 10000); // get the current capacity (elements allocated +
> available)
> int *ptr = x.ptr;
> while(x.length < 10000)
>    x ~= 1;
> assert(x.ptr == ptr); // no reallocation
> x.length = 0;
> x.shrinkToFit(); // name up for debate
> x ~= 1;
> assert(x.ptr == ptr); // resized the allocation length, so it can be reused as a
> buffer.
> 
> I'll have to change the write property 'capacity' to a function 'setCapacity' for now.
> 
> -Steve
> 
> 
> 
> 
> ----- Original Message ----
> > From: Steve Schveighoffer
> > To: Discuss the phobos library for D
> > Sent: Tue, February 23, 2010 1:47:56 PM
> > Subject: Re: [phobos] Array preallocate function
> > 
> > I understand your objections to minimize.  I don't really have a preference,
> but
> > I don't really like shrinkToFit.  I'll use that for now, since it's probably easy to search/replace later.
> > 
> > -Steve
> > 
> > 
> > 
> > ----- Original Message ----
> > > From: Andrei Alexandrescu
> > > > What about a "minimize" function, which simply truncates any "allocated" length after an array.  So you would reset an array via:
> > > > 
> > > > arr.length = 0; arr.minimize();
> > > > 
> > > > The advantage here is the array's length is not affected, just the allocated length is reduced to match the array's length.  There are less invalid cases to worry about (i.e. "shrinking" to something larger doesn't make any sense).
> > > 
> > > Sounds good. I'd choose a more specific name, e.g. shrinkToFit. Minimize has
> 
> > me
> > > think of optimization functions.
> > > 
> > > Andrei
> > > _______________________________________________
> > > phobos mailing list
> > > phobos at puremagic.com
> > > http://lists.puremagic.com/mailman/listinfo/phobos
> > 
> > 
> > 
> > 
> > _______________________________________________
> > phobos mailing list
> > phobos at puremagic.com
> > http://lists.puremagic.com/mailman/listinfo/phobos
> 
> 
> 
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos




« First   ‹ Prev
1 2