October 21, 2015
On Wednesday, 21 October 2015 at 22:12:32 UTC, Zz wrote:
> On Wednesday, 21 October 2015 at 19:17:52 UTC, Andrei Alexandrescu wrote:
>> On 10/21/2015 02:18 PM, Zz wrote:
>>>
>>> While looking at containers take a look at Jiri Soukup's work some good
>>> ideas could come from there.
>>>
>>> http://www.amazon.ca/Serialization-Persistent-Objects-Structures-Efficient/dp/3642393225/ref=sr_1_1?ie=UTF8&qid=1386946808&sr=8-1&keywords=SERIALIZATION+AND+PERSISTENT+OBJECTS#productPromotions%22%20target=%22_blank
>>>
>>>
>>> Intrusive Data Structures.
>>> http://www.codefarms.com/home
>>> http://www.codefarms.com/dol
>>> http://www.codefarms.com/ppf
>>> http://www.codefarms.com/ptl
>>
>> Ah, Codefarms is still around. Nice. I remember I've been reading Soukup's articles back in the days of the C++ Report magazine. I couldn't make sense of them, nor could any of my friends. -- Andrei
>
> The following links give an out line of DOL (The main library), most of the articles talked about the newer PTL (at that time).
>
> For DOL
> http://www.codefarms.com/dolclasses
> http://www.codefarms.com/docs/dol/index.htm for the list of available organisations.
>
> As for PTL the following gives an outline.
> http://www.codefarms.com/docs/ptl/chap4.htm
>
> DOL was Macro based while PTL used templates.
>
> Zz

Sorry the list of available organisations is here:
http://www.codefarms.com/docs/dol/ch11aorg.htm

Zz
October 21, 2015
On 10/21/2015 06:04 PM, Timon Gehr wrote:
> Good point, but might it not be the case that alternative
> implementations of some operations are faster for a small number of
> elements?

A container could take advantage of the knowledge for both layout and primitives implementation. As I like to say in one of my seminar: "Few elements mean structure". -- Andrei
October 22, 2015
I needed containers, so I implemented a few:
https://github.com/bitwise-github/d-collections

Currently included:

List(T) - contiguous list like std::vector
LinkedList(T) - doubly linked list like std::list
Queue(T) - double-ended queue/contiguous circular buffer
Stack(T) - contiguous fifo stack

I would like to eventually add some kind(s) of hash table(s), but it's a huge topic, and I don't immediately need one.

I wrote all of the containers from scratch, so they are an accurate/current representation of what I would like to see.

The highlights:

1) both value and reference usage is possible.

example: <list.d>
struct ListBase { ... }   // value type list
alias RefCounted!ListBase List;   // ref counted list

List!int  list1;             // list1 is ref counted
ListBase!int  list2;    // list2 is a value type

I initially had this reversed, where List(T) was a value type, and ListRef(T) was a reference type, but I flipped it the other way to avoid unneeded copying for naive usage. I'm still not sure this is the right direction though. Optimization is a job of it's own, and it's a job for a pro. I believe the amount of effort spent to idiot-proof(for lack of a better term) the containers should be limited. Especially with the power of D's ranges, I think containers should generally(and in my code, usually do) stay put. IMO, If someone wants to pass something around, they should pass a range, not the container itself. My solution, however, does support both(ref/value), if needed.

What's currently being done in std.container.Array(T) is very limiting. With my approach, you have the option of value type or reference type. You do not have this option with the current "reference type" containers. Also, while you're okay if you get a range to the container, there is extra indirection every time you access the container through opIndex(), back(), front(), etc.. I shouldn't have to pay for this extra indirection/allocation, especially when the containers are about to be designed from scratch with this issue fully known. By embedding a reference count in the container, you limit the maximum achievable performance for what should be(IMO) an obscure use case(passing a container around). Again though, my solution allows both usages.

2) Cursors(iterators)

My Cursors are similar to C++ iterators, but have several key differences.

a) There is no begin/end/rbegin/rend/cbegin/cend/crbegin/crend... The cursor knows it limits, and is considered "alive" until it has been advanced(by one) beyond either end of the container. Similar to an empty range, an invalidated Cursor is gone for good, but can be copied ahead of time, before iteration.

b) cursors can be joined to form a range. Unlike C++ however, both cursors must point to elements in the container since there is no begin/end, and the join is inclusive. There is no "past the end of the container" type of Cursor to refer to. Joining two cursors can, at the minimum, result in a Range with a length of 1.

example:

List!int a = [1, 2, 3];
auto cur1 = a.first;
auto cur2 = a.last;
auto rng1 = cur1 ~ cur2; // join two cursors
assert(rng1.equal([1, 2, 3]));
auto rng2 = ~cur1; // convert cursor to range
assert(rng2.equal([1]));
assert(cur1 && cur1.alive);
--cur1;
assert(!cur1 && !cur1.alive);

for(auto c = a.first; c; ++c)
	writeln(*c);

My containers' find() methods return cursors, not ranges. The result is that code like this actually removes the element you intended to remove, and nothing else:

a.remove( a.find(1) );

find() should not return elements that do not match the query the programmer supplied as an argument, which is currently the case with range-based find().

I'm still considering whether the traditional C++ approach(begin/end/etc..) is a better idea, but haven't reach a conclusion. I do know, however, that ranges alone are not enough.

3) Gravy.

My containers contain, by default, most of what you'll need to use them. The counter argument is code-reuse, but my containers are lightyears away from prohibitively complex, so I think the way I've done things is fine. Including core functionality into containers allows for more efficient implementations in some cases. For example, a LinkedList(T) sort. This approach is also much friendlier to documentation and intellisense, and allows a programmer to find everything they need in one place.

--

There is still work to do, of course.


On Wednesday, 21 October 2015 at 11:05:12 UTC, Andrei Alexandrescu wrote:
>Design by Introspection.

I prefer simple, predictable constructs.

> 2. Reference containers.
> 3. Eager value containers.

Both.


     Bit

October 22, 2015
On Thursday, 22 October 2015 at 02:13:12 UTC, bitwise wrote:
> I needed containers, so I implemented a few:
> https://github.com/bitwise-github/d-collections
>
> Currently included:
>
> List(T) - contiguous list like std::vector
> LinkedList(T) - doubly linked list like std::list
> Queue(T) - double-ended queue/contiguous circular buffer
> Stack(T) - contiguous fifo stack
>

You got such a good start with the first 2 ones, but somehow missed on consistency. You should name the 2 other ones QueueList and StackList.

> I initially had this reversed, where List(T) was a value type, and ListRef(T) was a reference type, but I flipped it the other way to avoid unneeded copying for naive usage. I'm still not sure this is the right direction though. Optimization is a job of it's own, and it's a job for a pro. I believe the amount of effort spent to idiot-proof(for lack of a better term) the containers should be limited. Especially with the power of D's ranges, I think containers should generally(and in my code, usually do) stay put. IMO, If someone wants to pass something around, they should pass a range, not the container itself. My solution, however, does support both(ref/value), if needed.
>

Reference types or COW are definitively the way to go for the naive road. Think about it, 40% to 50% of chrome memory is std::string . Eager copy is not a healthy default.

> There is still work to do, of course.
>

The elephant in the room: make the template parameter's type qualifier transitive with the collection's qualifier.

October 22, 2015
On 21-Oct-2015 19:13, Ola Fosheim Grøstad wrote:
> 5. Lock-free data structures.
>

More generally - concurrent. It may have fine-grained locking, it may be obstruction-free or even wait-free.

-- 
Dmitry Olshansky
October 22, 2015
Am 22.10.2015 um 00:15 schrieb Zz:
> On Wednesday, 21 October 2015 at 22:12:32 UTC, Zz wrote:
>> On Wednesday, 21 October 2015 at 19:17:52 UTC, Andrei Alexandrescu wrote:
>>> On 10/21/2015 02:18 PM, Zz wrote:
>>>>
>>>> While looking at containers take a look at Jiri Soukup's work some good
>>>> ideas could come from there.
>>>>
>>>> http://www.amazon.ca/Serialization-Persistent-Objects-Structures-Efficient/dp/3642393225/ref=sr_1_1?ie=UTF8&amp;qid=1386946808&amp;sr=8-1&amp;keywords=SERIALIZATION+AND+PERSISTENT+OBJECTS#productPromotions%22%20target=%22_blank
>>>>
>>>>
>>>>
>>>> Intrusive Data Structures.
>>>> http://www.codefarms.com/home
>>>> http://www.codefarms.com/dol
>>>> http://www.codefarms.com/ppf
>>>> http://www.codefarms.com/ptl
>>>
>>> Ah, Codefarms is still around. Nice. I remember I've been reading
>>> Soukup's articles back in the days of the C++ Report magazine. I
>>> couldn't make sense of them, nor could any of my friends. -- Andrei
>>
>> The following links give an out line of DOL (The main library), most
>> of the articles talked about the newer PTL (at that time).
>>
>> For DOL
>> http://www.codefarms.com/dolclasses
>> http://www.codefarms.com/docs/dol/index.htm for the list of available
>> organisations.
>>
>> As for PTL the following gives an outline.
>> http://www.codefarms.com/docs/ptl/chap4.htm
>>
>> DOL was Macro based while PTL used templates.
>>
>> Zz
>
> Sorry the list of available organisations is here:
> http://www.codefarms.com/docs/dol/ch11aorg.htm
>
> Zz

Intrusive data structures have their strengths especially when nodes are
part of several containers.
I implemented some of the intrusive containers back in D1 times.
See

http://dsource.org/projects/nova/browser/trunk/nova/ds/intrusive

KlausO

October 22, 2015
On Wednesday, 21 October 2015 at 11:05:12 UTC, Andrei Alexandrescu wrote:
> Andrei

Exiting, times ahead!

One thing that has caught my attention lately:

I believe one way of making `std.experimental.allocator` usage (more) automatic is to add subtypes of containers for specific limited access patterns.

For instance, some graph algorithms, such a calculating subgraphs,

for instance: https://github.com/nordlow/justd/blob/master/knet/setops.d#L10

internally uses hashsets (currently implemented as a bool[Value])

for instance: https://github.com/nordlow/justd/blob/master/knet/setops.d#L15

only require the two primitives:

- bool insert(Value value): returns true if value was stored, not already existed
- bool contains(Value value)

In this case (when no references are leaked), `HashSet` could be sub-typed to, say `ExpandableConstantHashSet`, which can @safely and automatically make use of a `RegionAllocator` for super-performance.

In this way, even complete beginners in D, could @safely use Phobos containers to beat C++ performance.

Destroy!
October 22, 2015
On Wednesday, 21 October 2015 at 11:05:12 UTC, Andrei Alexandrescu wrote:
> Internally, functional containers take advantage of common substructure and immutability to share actual data. The classic resource for defining and implementing functional containers is http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504.

Online at: https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf
October 22, 2015
On Wednesday, 21 October 2015 at 16:25:19 UTC, Jonathan M Davis wrote:
> My experience with immutable containers is that their performance is trash precisely because you can't mutate them.

Well, in context of concurrency immutable containers compete with concurrent containers, not with mutable single-threaded ones.

On Wednesday, 21 October 2015 at 14:06:43 UTC, Jonathan M Davis wrote:
>> 3. Eager value containers.
>
> _Maybe_ this makes sense for specific container types, but in general, it's a horrible idea IMHO. Almost all containers involve allocating something on the heap internally, so the fact that they're treated as values doesn't avoid heap allocations, and reference-counting reference containers solves the problem of the containers living past the end of the lifetime of whatever owns them. And having value type containers is incredibly error-prone - both from an efficiency standpoint and correctness standpoint.

I suppose stl containers are value types because everything is value type in C++, they're just consistent. And then you have 3 or so ways to create a value type and 6 ways to pass it around by reference, and you choose what you want.
October 22, 2015
On Wed, 2015-10-21 at 14:49 -0400, Andrei Alexandrescu via Digitalmars- d wrote:
> 
[…]
> That's actually the experience in the Scala community. Over and again people start with immutable containers all over the place because they're cool, and end up with mutable containers because they work. -
> 

For some applications, I am sure this true. For the majority no. I have no doubt that for some people fashion and trendy are decision making criteria when clearly they should be thinking about efficacy and efficiency. However it is important to remember that persistent/functional data structures are generally both efficacious and efficient in a fine-grain parallel context. Sadly a lot of this is advocacy research because there is no systematic, statistically significant research that is credible. Scala might have been a nice context to try some experimentation but they still do not have really good data structure implementations. Hence the recent notification of another rewrite.

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder