Feature Request: Linked Arrays
May 02, 2008
Since one of D's weaknesses is bad speed when appending to dynamic arrays, I've been thinking about a way to fix this.

Linked arrays are a hybrid between flat arrays and linked list: a single linked list of array chunks.

Basically, a linked array (syntax: T[>], new T[50>]) looks like this:

struct LinkedArray {
  void* start;                       // start of this block's data
  size_t my_length;                  // length of this block
  size_t length();                   // length of the whole array
  LinkedArray* nextBlock, lastBlock; // next block, and the tail for appending
}

When appending to a LinkedArray, it is first checked whether the current block can be grown to accommodate the new member(s) *without* reallocating.

If it can't, instead of reallocating and copying, a new LinkedArray is tacked on the end.
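
Roughly, in code, as a minimal sketch only (the IntChunk struct, chunk size, and free function below are illustrative; the "can the block grow in place" check is approximated here by spare capacity inside the chunk):

struct IntChunk {
  int[] data;          // storage owned by this chunk
  size_t used;         // elements filled so far
  IntChunk* nextBlock; // next chunk, or null if this is the tail
}

// Append into the tail chunk while it has spare room; otherwise link a
// fresh chunk instead of reallocating and copying everything.
void append(ref IntChunk* tail, int value) {
  if (tail.used < tail.data.length) {
    tail.data[tail.used++] = value;       // grows in place, no allocation
  } else {
    auto blk = new IntChunk;
    blk.data = new int[tail.data.length]; // same chunk size as before
    blk.data[blk.used++] = value;
    tail.nextBlock = blk;
    tail = blk;                           // caller keeps a pointer to the tail
  }
}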

Because of this, there is never redundancy as with current arrays, and the load on the GC can be vastly reduced.

Appending is also much closer to linear than with current arrays. This makes Linked Arrays highly suitable for queues and buffers.

On the other hand, they're completely incompatible with C libraries, don't expose a .ptr property (not flat), and because of the complexities of their layout, all slices have to be constant (value, not reference).

But these disadvantages are, imho, more than made up by the benefits they'd offer as an alternative to current arrays.

Also, they don't need new keywords :)

Whaddya think?

 --downs
May 02, 2008
downs wrote:
> Since one of D's weaknesses is bad speed when appending to dynamic arrays, I've been thinking about a way to fix this.
> 
> Linked arrays are a hybrid between flat arrays and linked list: a single linked list of array chunks.
> 
> Basically, a linked array (syntax: T[>], new T[50>]) looks like this:
> 
> struct LinkedArray {
>   void* start;
>   size_t my_length;
>   size_t length();
>   LinkedArray* nextBlock, lastBlock;
> }
> 
> When appending to a LinkedArray, it is first checked whether the current block can be grown to accommodate the new member(s) *without* reallocating.
> 
> If it can't, instead of reallocating and copying, a new LinkedArray is tacked on the end.
> 
> Because of this, there is never redundancy as with current arrays, and the load on the GC can be vastly reduced.
> 
> Appending is also much closer to linear than with current arrays. This makes Linked Arrays highly suitable for queues and buffers.
> 
> On the other hand, they're completely incompatible with C libraries, don't expose a .ptr property (not flat), and because of the complexities of their layout, all slices have to be constant (value, not reference).
> 
> But these disadvantages are, imho, more than made up by the benefits they'd offer as an alternative to current arrays.
> 
> Also, they don't need new keywords :)
> 
> Whaddya think?
> 
>  --downs

I'd rather ask the GC to expose a canExpand() method and have it implemented in the library.
May 02, 2008
Robert Fraser wrote:
> downs wrote:
>> Since one of D's weaknesses is bad speed when appending to dynamic arrays, I've been thinking about a way to fix this.
>>
>> Linked arrays are a hybrid between flat arrays and linked list: a single linked list of array chunks.
>>
>> Basically, a linked array (syntax: T[>], new T[50>]) looks like this:
>>
>> struct LinkedArray {
>>   void* start;
>>   size_t my_length;
>>   size_t length();
>>   LinkedArray* nextBlock, lastBlock;
>> }
>>
>> When appending to a LinkedArray, it is first checked whether the current block can be grown to accommodate the new member(s) *without* reallocating.
>>
>> If it can't, instead of reallocating and copying, a new LinkedArray is tacked on the end.
>>
>> Because of this, there is never redundancy as with current arrays, and the load on the GC can be vastly reduced.
>>
>> Appending is also much closer to linear than with current arrays. This makes Linked Arrays highly suitable for queues and buffers.
>>
>> On the other hand, they're completely incompatible with C libraries, don't expose a .ptr property (not flat), and because of the complexities of their layout, all slices have to be constant (value, not reference).
>>
>> But these disadvantages are, imho, more than made up by the benefits they'd offer as an alternative to current arrays.
>>
>> Also, they don't need new keywords :)
>>
>> Whaddya think?
>>
>>  --downs
> 
> I'd rather ask the GC to expose a canExpand() method and have it implemented in the library.

I agree.

BTW, to state the obvious:

"Linked arrays" have several disadvantages when compared to arrays. Namely, after a while of lots of insertions and deletions they become very fragmented and performance slows down. The main thing you have to watch out for is traversal. Often I've seen code slow down because people write a data structure to speed up insertion, not realizing that most of the time was spent in traversal. By using links, the work of the pre-fetcher, code optimizer and cache is increased.

Granted, lots of long small arrays are better than a linked list; however, it has nothing on single-array traversal. So you really have to figure out how your data will be used: how many times will a node be visited versus how many times you'll insert something? As a game programmer, typically array nodes have the potential to be visited several times (maybe every frame), whereas things are usually inserted only once (and normally at the end of the array).

One thing that would help is if you could get the next free aligned block in memory from the previous block. That way the array would be slightly more cache friendly.

I'm not saying this sort of data structure doesn't have its uses; sometimes it's useful for minimizing memory manager fragmentation. In fact, if you have a huge array, sometimes the only way to get around fragmentation is to split it up.

-Joel
May 02, 2008
janderson wrote:
> Robert Fraser wrote:
>> I'd rather ask the GC to expose a canExpand() method and have it
>> implemented in the library.
> 
> I agree.

So do I, actually. That would be best.
> 
> BTW, to state the obvious:
> 
> "Linked arrays" have several disadvantages when compared to arrays. Namely, after a while of lots of insertions and deletions they become very fragmented and performance slows down.

I don't see how that could happen, if you're only appending at the end.

> The main thing you have to
> watch out for is traversal. Often I've seen code slow down because
> people write a data structure to speed up insertion, not realizing that
> most of the time was spent in traversal.  By using links the work of the
> pre-fetcher, code optimizer and cache is increased.

It depends on the size of the elements.

If it's, say, an int[>], about 1k of them could fit into a single page of memory. At that point, and especially if the next array pointer is stored at the _end_ of the memory range, and considering that linear traversal can easily be enhanced with prefetch instructions, I think the traversal speed difference to flat arrays shrinks considerably.
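
As a sketch of the kind of layout I mean (the 4 KiB page size, the 32-bit pointer width, and the field names are assumptions for illustration only):

struct IntPageChunk {
  int[(4096 - size_t.sizeof) / int.sizeof] data; // about 1023 ints on 32-bit
  IntPageChunk* nextBlock;                       // the link sits at the very end
}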

I don't have the benchs to demonstrate that though. Sorry.

> One thing that would help is if you could get the next free aligned block in memory from the previous block.  That way the array would be slightly more cache friendly.
> 

I don't understand.

> I'm not saying this sort of data structure doesn't have its uses; sometimes it's useful for minimizing memory manager fragmentation.  In fact, if you have a huge array, sometimes the only way to get around fragmentation is to split it up.

That's the idea :) Currently, appending to an array is almost embarrassingly slow. This forces the use of preallocating, which is unintuitive.

Imnsho :)

> 
> -Joel

 --downs
May 02, 2008
downs Wrote:
> Whaddya think?

In my libs I'll add a Deque class with that structure; that module's code is currently in the debugging phase. So I think they can be useful.

Bye,
bearophile
May 02, 2008
downs wrote:
> janderson wrote:
>> Robert Fraser wrote:
>>> I'd rather ask the GC to expose a canExpand() method and have it
>>> implemented in the library.
>> I agree.
> 
> So do I, actually. That would be best.
>> BTW, to state the obvious:
>>
>> "Linked arrays" have several disadvantages when compared to arrays.
>> Namely, after a while of lots of insertions and deletions they become very
>> fragmented and performance slows down.
> 
> I don't see how that could happen, if you're only appending at the end.

If you're only appending at the end, why use a linked array? Simply so your array does not need to be contiguous?

>> The main thing you have to
>> watch out for is traversal. Often I've seen code slow down because
>> people write a data structure to speed up insertion, not realizing that
>> most of the time was spent in traversal.  By using links the work of the
>> pre-fetcher, code optimizer and cache is increased.
> 
> It depends on the size of the elements.
> 
> If it's, say, an int[>], about 1k of them could fit into a single page of memory. At that point, and especially if the next array pointer is stored at the _end_ of the memory range, and considering that linear traversal can easily be enhanced with prefetch instructions, I think the traversal speed difference to flat arrays shrinks considerably.
> 
> I don't have the benchs to demonstrate that though. Sorry.

Data locality is a strong advantage, too. It depends, I think, on how often you are planning on appending to the array.

I think most people usually use small arrays most of the time, but I don't have data for this.

>> One thing that would help is if you could get the next free aligned
>> block in memory from the previous block.  That way the array would be
>> slightly more cache friendly.
>>
> 
> I don't understand.

When you access memory, the OS pulls in at least one page to cache. Often it'll pull in a bit more, since you're likely to access memory with nearby addresses. So if you plan your data layout around this, you get better performance.

>> I'm not saying this sort of data structure doesn't have its uses;
>> sometimes it's useful for minimizing memory manager fragmentation.  In
>> fact, if you have a huge array, sometimes the only way to get around
>> fragmentation is to split it up.
> 
> That's the idea :) Currently, appending to an array is almost embarrassingly slow. This forces the use of preallocating, which is unintuitive.
> 
> Imnsho :)

The method is, I think, but not the idea.
May 02, 2008
Robert Fraser wrote:
> 
> I'd rather ask the GC to expose a canExpand() method and have it implemented in the library.

Easy enough, use GC.sizeOf in Tango or std.gc.capacity in Phobos.
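
For example, a library append could check that first; a rough sketch against the old Phobos std.gc (whether the spare bytes can safely be claimed this way in all cases is an assumption):

import std.gc;  // old (D1) Phobos; the Tango equivalent query is GC.sizeOf

// Only reallocate when the backing GC block has no spare room left.
void appendInPlace(ref int[] arr, int value) {
  size_t cap = std.gc.capacity(arr.ptr);  // bytes in the block, 0 if unknown
  if (cap >= (arr.length + 1) * int.sizeof) {
    arr = arr.ptr[0 .. arr.length + 1];   // extend the slice into the spare space
    arr[$ - 1] = value;
  } else {
    arr ~= value;                         // fall back to a normal append
  }
}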


Sean
May 02, 2008
"downs" wrote
> janderson wrote:
>> The main thing you have to
>> watch out for is traversal. Often I've seen code slow down because
>> people write a data structure to speed up insertion, not realizing that
>> most of the time was spent in traversal.  By using links the work of the
>> pre-fetcher, code optimizer and cache is increased.
>
> It depends on the size of the elements.
>
> If it's, say, an int[>], about 1k of them could fit into a single page of memory. At that point, and especially if the next array pointer is stored at the _end_ of the memory range, and considering that linear traversal can easily be enhanced with prefetch instructions, I think the traversal speed difference to flat arrays shrinks considerably.

If each array chunk is the same size, you could store the array in an array of arrays.  Then lookup time becomes O(1), and appending data is not going to cause a copy of the other array chunks that you already have allocated.

If you add a huge piece that is bigger than one chunk, just allocate that piece, then slice the piece up into chunks.

It could actually be an array of pointers, assuming all your chunks are the same size.

I'd prefer that to a linked list of arrays, where traversal is O(n) (with a small constant).
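
A rough sketch of that, where chunkSize and the separate length counter are just illustrative assumptions:

const size_t chunkSize = 256; // illustrative chunk size

// O(1) lookup: pick the chunk, then the slot inside it.
int lookup(int[][] chunks, size_t i) {
  return chunks[i / chunkSize][i % chunkSize];
}

// Appending only ever grows the small table of chunk slices; the chunks
// already allocated are never copied.
void append(ref int[][] chunks, ref size_t len, int value) {
  if (len % chunkSize == 0)
    chunks ~= new int[chunkSize];
  chunks[len / chunkSize][len % chunkSize] = value;
  ++len;
}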

-Steve


May 03, 2008
Steven Schveighoffer wrote:
> "downs" wrote
>> janderson wrote:
>>> The main thing you have to
>>> watch out for is traversal. Often I've seen code slow down because
>>> people write a data structure to speed up insertion, not realizing that
>>> most of the time was spent in traversal.  By using links the work of the
>>> pre-fetcher, code optimizer and cache is increased.
>> It depends on the size of the elements.
>>
>> If it's, say, an int[>], about 1k of them could fit into a single page of memory. At that point, and especially if the next array pointer is stored at the _end_ of the memory range, and considering that linear traversal can easily be enhanced with prefetch instructions, I think the traversal speed difference to flat arrays shrinks considerably.
> 
> If each array chunk is the same size, you could store the array in an array of arrays.  Then lookup time becomes O(1), and appending data is not going to cause a copy of the other array chunks that you already have allocated.
> 
> If you add a huge piece that is bigger than one chunk, just allocate that piece, then slice the piece up into chunks.
> 
> It could actually be an array of pointers, assuming all your chunks are the same size.
> 
> I'd prefer that to a linked list of arrays, where traversal is O(n) (with a small constant)
> 
> -Steve 
> 
> 

Yes this is often better, although it depends on what you are doing.

-Joel
May 03, 2008
downs wrote:
> janderson wrote:
>> Robert Fraser wrote:
>>> I'd rather ask the GC to expose a canExpand() method and have it
>>> implemented in the library.
>> I agree.
> 
> So do I, actually. That would be best.
>> BTW, to state the obvious:
>>
>> "Linked arrays" have several disadvantages when compared to arrays.
>> Namely, after a while of lots of insertions and deletions they become very
>> fragmented and performance slows down.
> 
> I don't see how that could happen, if you're only appending at the end.
> 
>> The main thing you have to
>> watch out for is traversal. Often I've seen code slow down because
>> people write a data structure to speed up insertion, not realizing that
>> most of the time was spent in traversal.  By using links the work of the
>> pre-fetcher, code optimizer and cache is increased.
> 
> It depends on the size of the elements.
> 
> If it's, say, an int[>], about 1k of them could fit into a single page of memory. At that point, and especially if the next array pointer is stored at the _end_ of the memory range, and considering that linear traversal can easily be enhanced with prefetch instructions, I think the traversal speed difference to flat arrays shrinks considerably.
> 
> I don't have the benchs to demonstrate that though. Sorry.
> 
>> One thing that would help is if you could get the next free aligned
>> block in memory from the previous block.  That way the array would be
>> slightly more cache friendly.
>>
> 
> I don't understand.
> 
>> I'm not saying this sort of data structure doesn't have its uses;
>> sometimes it's useful for minimizing memory manager fragmentation.  In
>> fact, if you have a huge array, sometimes the only way to get around
>> fragmentation is to split it up.
> 
> That's the idea :) Currently, appending to an array is almost embarrassingly slow. This forces the use of preallocating, which is unintuitive.
> 
> Imnsho :)
> 
>> -Joel
> 
>  --downs


This may help:

http://video.google.com/videosearch?hl=en&safe=off&pwst=1&resnum=0&q=herb%20sutter&um=1&ie=UTF-8&sa=N&tab=wv

Note that the smallest cache lines could be as small as 64 bits.

-Joel