August 06, 2008
> You can find a very efficient stack in the new Tango containers.

This raises the questions:
a. How does the structures from dcollection and Tango compare, in terms of integration to tango, speed, memory efficiency, richness of API?
b. Why aren't the efforts of both libraries combined ? Dcollection is maybe not mature enough ? Are there plans to merge them if dcollections happen to be more efficient ?
August 06, 2008
maelp wrote:

>> You can find a very efficient stack in the new Tango containers.
> 
> This raises the questions:
> a. How does the structures from dcollection and Tango compare, in terms of
> integration to tango, speed, memory efficiency, richness of API? b. Why
> aren't the efforts of both libraries combined ? Dcollection is maybe not
> mature enough ? Are there plans to merge them if dcollections happen to be
> more efficient ?

Implementation wise, the approaches are quite different, thus it is not likely that it is easy to merge much. It is possible that dcollections has a richer API (I don't know), but Tango's is implemented to be extemely fast and memory efficient (and have a fairly rich API). In terms of age, dcollections is actually slightly older than tango.util.container, the tango.util.collection package in Tango will be deprecated.

Steven is probably best suited to answer further questions, but he _has_ contributed to the new Tango containers.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
August 06, 2008
Lars Ivar Igesund:
> You can find a very efficient stack in the new Tango containers.

I see... I'll try to port that code to Phobos then, thank you (but I'll keep my deque, because sometimes I need more than a stack. I'll add the deque to my libs when I can).

Bye,
bearophile
August 06, 2008
"bearophile" wrote
> Steven Schveighoffer:
>> The only thing is that just because
>> the library 'builds' doesn't mean it all works :)  The classes/structs
>> are
>> all templates and so won't really 'compile' until you use them.
>
> But D unittests exists to test all templates too!
>
> Is an efficient deque too included? I often find the need for it (or just a stack). I have partially written a deque data structure (implemented as dynamic array of pointers to fixed size arrays, it's not an unrolled linked list), I may finish it to add it to your collections...

I'm assuming you are talking about STL's deque?  I don't have anything exactly.  You can implement a stack with the LinkList class.  If you end up finishing your deque, I'll gladly take a look at adding it to dcollections.

The ArrayMultiset is implemented as a linked list of dynamic arrays, a similar approach would probably work for a deque.

-Steve


August 06, 2008
"maelp" wrote
>> You can find a very efficient stack in the new Tango containers.
>
> This raises the questions:
> a. How does the structures from dcollection and Tango compare, in terms of
> integration to tango, speed, memory efficiency, richness of API?

At the moment, Tango is more efficient for speed when using keys or values that do not have GC pointers.  When using keys or values that do have GC pointers, dcollections is more efficient, however, i have submitted a ticket for Tango to include the allocator I use in dcollections to make this more efficient, and then I think Tango will have the edge in performance. However, I plan at some point to implement the same allocators in dcollections that Tango uses for the non-GC pointer types.  At this time, Tango probably still will have the edge in performance because of the design of it.

> b. Why aren't the efforts of both libraries combined ? Dcollection is maybe not mature enough ? Are there plans to merge them if dcollections happen to be more efficient ?

The two libraries have different design goals.  My design is much closer to C++'s containers, and I made sure all of dcollection's containers can be iterated in both directions, and the cursors were the main mode of operation when operating on the containers.  Tango's containers focus on speed and efficiency, while ease of iteration is a secondary goal (I'm only guessing here, as I didn't design it, but it appears that way).

However, as a contributor to Tango, I'll always try to contribute any new ideas I have with dcollections as long as it fits within the design of Tango's containers.

I doubt that dcollections will ever be more efficient than Tango's containers, but I still plan on maintaining it because I prefer the interface.  Complexity-wise, they should be about the same.

-Steve


August 06, 2008
"Steven Schveighoffer"
> "bearophile" wrote
>> Steven Schveighoffer:
>>> The only thing is that just because
>>> the library 'builds' doesn't mean it all works :)  The classes/structs
>>> are
>>> all templates and so won't really 'compile' until you use them.
>>
>> But D unittests exists to test all templates too!
>>
>> Is an efficient deque too included? I often find the need for it (or just a stack). I have partially written a deque data structure (implemented as dynamic array of pointers to fixed size arrays, it's not an unrolled linked list), I may finish it to add it to your collections...
>
> I'm assuming you are talking about STL's deque?  I don't have anything exactly.  You can implement a stack with the LinkList class.  If you end up finishing your deque, I'll gladly take a look at adding it to dcollections.
>
> The ArrayMultiset is implemented as a linked list of dynamic arrays, a similar approach would probably work for a deque.

Actually, it wouldn't work to provide O(1) lookup.  I think you need probably 2 arrays of arrays to provide the O(1) prepend, one that goes 'down' instead of 'up'.

I'll think about how this could be implemented.  Maybe I'll take a look at the design of gcc's deque.

-Steve


August 06, 2008
"bearophile" wrote
> Steven Schveighoffer:
>> The only thing is that just because
>> the library 'builds' doesn't mean it all works :)  The classes/structs
>> are
>> all templates and so won't really 'compile' until you use them.
>
> But D unittests exists to test all templates too!

Yeah, dcollections is pretty lacking in unittests :(

That's one of my TODOs

-Steve


August 06, 2008
Steven Schveighoffer:
> I'm assuming you are talking about STL's deque?

I am talking about Python collections.deque:
http://docs.python.org/lib/deque-objects.html
The purpose of my libs is yet another, (often but not always) to mimic the ease of use of Python ;-)


> I don't have anything exactly.  You can implement a stack with the LinkList class.

A normal linked list may be too much slow for this because it allocates too many small structures, and iterating on it can be slow for low space efficiency and low cache coherence.


> If you end up finishing your deque, I'll gladly take a look at adding it to dcollections.

Very good :-) But I don't know how much well its style can fit in. For example I have already written an hash set into my libs, but its API is modeled around the Python sets API...


> Actually, it wouldn't work to provide O(1) lookup.  I think you need
> probably 2 arrays of arrays to provide the O(1) prepend, one that goes
> 'down' instead of 'up'.
> I'll think about how this could be implemented.

I think there are two good ways to implement a mutable deque data structure:

1) Dynamic array of pointers to fixed-size arrays that I call blocks.
2) Unrolled linked list (double linked, usually), that is a chain of fixed-size arrays, where you add/remove items only at the tail/head and not in the middle.

Image of the first implementation: http://pages.cpsc.ucalgary.ca/~kremer/STL/1024x768/deque.jpeg

(If you are using immutable data structures you need much more complex stuff, like finger trees).

Python deque is implemented as the second one, the deque of the STL is often the first one.

They have different advantages and disadvantages, so ideally you may create a collection that includes both implementations (but I presume no one does this). And both can be tuned changing the block size (a smart data structure can keep few runtime statistics of its usage and adapt its block size to the specific usage as time goes on, I'll think about this).

They are both fast in iteration, probably just about 2-2.5 times slower than iterating over a normal array.

The second one is probably a bit simpler to implement. The bigger difference is that in the second one the access time to random items isn't O(1), but if the blocks are large enough this isn't a big problem.

The first implementation may be a bit slower for the normal deque usage, because once in a while you have to reset/tidy the main dynamic array (see below).

In both implementations you have to keep two pointers or two lengths that keep how much items are contained into the first and last fixed-size arrays (in a generic unrolled linked list implementation you can have holes everywhere, so you have to store how many items are present in each block, and you have to merge/split adjacent blocks too much empty/filled up, this makes the management less easy. A deque makes things simpler).

In the first implementation the dynamic array can be managed as a circular deque, so you may need a modulus operation each time you access an item randomly, this requires a bit of time more than accessing items sequentially in the second implementation.

When the circular deque is too much empty/filled up, you have to create a new one and copy data, but this generally requires little time because this dynamic arrays is 300/1000/10000 times smaller than the total number of items inserted into the deque. So there's probably no need to invent more complex management schemes for this, like you say.

Bye,
bearophile
August 06, 2008
On Wed, Aug 6, 2008 at 11:04 PM, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> "Steven Schveighoffer"
>> "bearophile" wrote
>>> Steven Schveighoffer:
>>>> The only thing is that just because
>>>> the library 'builds' doesn't mean it all works :)  The classes/structs
>>>> are
>>>> all templates and so won't really 'compile' until you use them.
>>>
>>> But D unittests exists to test all templates too!
>>>
>>> Is an efficient deque too included? I often find the need for it (or just a stack). I have partially written a deque data structure (implemented as dynamic array of pointers to fixed size arrays, it's not an unrolled linked list), I may finish it to add it to your collections...
>>
>> I'm assuming you are talking about STL's deque?  I don't have anything exactly.  You can implement a stack with the LinkList class.  If you end up finishing your deque, I'll gladly take a look at adding it to dcollections.
>>
>> The ArrayMultiset is implemented as a linked list of dynamic arrays, a similar approach would probably work for a deque.
>
> Actually, it wouldn't work to provide O(1) lookup.  I think you need
> probably 2 arrays of arrays to provide the O(1) prepend, one that goes
> 'down' instead of 'up'.
>
> I'll think about how this could be implemented.  Maybe I'll take a look at the design of gcc's deque.

I think the wikipedia article on deque implementations is actually pretty good.
One way to implement is to use a circular buffer internally.  You can
tack on elements at either end of the empty space.  It also works very
efficiently for the case where you are hovering around a constant
number of elements.   No re-allocs needed in that case.  And all your
extra capacity can always be used for either end of the queue.   When
you are forced to realloc you can shift elements at the same time so
they're all at the front or back end of the new buffer.

Anyway there are various ways to do it, but that was the implementation that sounded coolest to me.  Most of the others don't do well with the case of stead flow in and steady flow out.  They force either copying or reallocations.

--bb
1 2
Next ›   Last »