On "A New Collections Framework for the Standard Library" - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » On "A New Collections Framework for the Standard Library"

Thread overview

On "A New Collections Framework for the Standard Library"
May 18, 2017 Jack Stouffer
May 18, 2017 Jonathan M Davis
May 18, 2017 Jack Stouffer
May 18, 2017 Jack Stouffer
May 18, 2017 Andrei Alexandrescu
May 18, 2017 Andrei Alexandrescu
May 19, 2017 Jack Stouffer
May 19, 2017 Jonathan M Davis
Nov 13, 2017 Mark
Nov 13, 2017 Andrei Alexandrescu
May 19, 2017 Eugene Wissner

May 18, 2017

On "A New Collections Framework for the Standard Library"

Posted by Jack Stouffer

Jack Stouffer

I just got around to watching Eduard Staniloiu's talk at DConf [1] about the collections library he was working on. One thing seemed odd, in that Eduard seems to be saying that the container and the range over the container's elements are one in the same and the range primitives are on the container itself.

Is this accurate?

If so, then this is completely different than the currently accepted design of ranges and containers. Currently, a container allows the removal and insertion of elements, and to get a range of the elements you have to call a function or slice the container. This is how std.containers and Brain's EMSI container library. In this new library, what happens when I iterate the elements of one of these containers using foreach? Do the range primitives empty the container? How do I ask for a new, full range of my data in the collection to use in a range algorithm after I've already iterated over the elements?

As Andrei originally laid out in this talk introducing ranges [2], ranges only shrink and never grow. Put another way, Output Ranges and Input Ranges are two separate things. Andrei states durning that talk (at 1:20:20)

"Can ranges ever grow? ... No, never, because it would be fundamentally unsafe."

I'm not sure what Andrei meant by this, but he believes that growing ranges is also a bad idea.

[1] https://www.youtube.com/watch?v=k88QSIC-Na0
[2] https://archive.org/details/AndreiAlexandrescuKeynoteBoostcon2009

May 18, 2017

Re: On "A New Collections Framework for the Standard Library"

Posted by Jonathan M Davis
in reply to Jack Stouffer

Jonathan M Davis

Posted in reply to Jack Stouffer

On Thursday, May 18, 2017 15:18:00 Jack Stouffer via Digitalmars-d wrote:
> I just got around to watching Eduard Staniloiu's talk at DConf [1] about the collections library he was working on. One thing seemed odd, in that Eduard seems to be saying that the container and the range over the container's elements are one in the same and the range primitives are on the container itself.
>
> Is this accurate?
>
> If so, then this is completely different than the currently accepted design of ranges and containers. Currently, a container allows the removal and insertion of elements, and to get a range of the elements you have to call a function or slice the container. This is how std.containers and Brain's EMSI container library. In this new library, what happens when I iterate the elements of one of these containers using foreach? Do the range primitives empty the container? How do I ask for a new, full range of my data in the collection to use in a range algorithm after I've already iterated over the elements?
>
> As Andrei originally laid out in this talk introducing ranges [2], ranges only shrink and never grow. Put another way, Output Ranges and Input Ranges are two separate things. Andrei states durning that talk (at 1:20:20)
>
> "Can ranges ever grow? ... No, never, because it would be fundamentally unsafe."
>
> I'm not sure what Andrei meant by this, but he believes that growing ranges is also a bad idea.
>
> [1] https://www.youtube.com/watch?v=k88QSIC-Na0
> [2]
> https://archive.org/details/AndreiAlexandrescuKeynoteBoostcon2009

That point concerned me as well. Dynamic arrays in D are very strange beasts indeed, and while it works for them to function as both ranges and (sort of) containers, it's also created a fair bit of confusion, and it really a fair bit of what is done with dynamic arrays are _not_ range-based functions (e.g. appending), making the whole situation that much more confusing when range-based functions and array-specific functions are mixed (which is definitely going to happen in stuff like string code). And a container as a range really makes no sense to me. It makes slightly more sense than an iterator being a container (since it is a range of elements not just one), but it still goes against how containers normally work. A big part of what's different with dynamic arrays is that they don't actually own or manage their own memory - rather they refer to memory from elsewhere, and how that memory is managed depends on where that memory comes from. Yes, after appending, it can be guaranteed to be the GC, but even then, it's the GC that manages what goes on with the elements. No one dynamic array owns them, since you can have multiple of them referring to the same memory. So, they're quite schizophrenic.

Now, I could see it working to treat a container as a range if you're talking about _immutable_ containers - especially if their memory is owned by the GC (and immutable containers are definitely part of what's being worked on) - since you don't really add to those (at most, you create a new container with the same elements plus more), but for mutable containers, that just doesn't seem right.

So, yes, that bit about modeling containers after dynamic arrays greatly concerns me, but I figured that I'd wait and see the actual API before complaining about it. And it may very well turn out that what they're doing actually makes a lot of sense once you actually see the details. I don't know.

- Jonathan M Davis

May 18, 2017

Re: On "A New Collections Framework for the Standard Library"

Posted by Jack Stouffer
in reply to Jonathan M Davis

Jack Stouffer

Posted in reply to Jonathan M Davis

On Thursday, 18 May 2017 at 15:38:39 UTC, Jonathan M Davis wrote:
> That point concerned me as well. Dynamic arrays in D are very strange beasts indeed, and while it works for them to function as both ranges and (sort of) containers, it's also created a fair bit of confusion, and it really a fair bit of what is done with dynamic arrays are _not_ range-based functions (e.g. appending), making the whole situation that much more confusing when range-based functions and array-specific functions are mixed (which is definitely going to happen in stuff like string code).

Yes, adding in the free function versions of the range primitives in std.range, while convenient initially, seems to have had large negative effects down the line.

May 18, 2017

Re: On "A New Collections Framework for the Standard Library"

Posted by Jack Stouffer
in reply to Jack Stouffer

Jack Stouffer

Posted in reply to Jack Stouffer

On Thursday, 18 May 2017 at 15:18:00 UTC, Jack Stouffer wrote:
> ...

Also, shared allocators raise another problem entirely. Let's say for the sake of clarity that these future containers will use a separate range type.

Let's say you have a container with five elements, and then you give a range of those elements to a function that caches the length somewhere because it's available (very common as ranges have never been expected to change before). What happens when you're iterating over the range and then another thread calls removeBack before the loop is completed? All sorts of functions in Phobos will blow up in interesting and different ways.

I think you'd have to make all ranges created from mutable containers input ranges with no extra primitives in order to avoid this problem.

Plus, you also have a problem with things like SList where removeFront can be called after a different thread creates a range. All of a sudden your range `front` is pointing to invalid data. So for that, you need to either

1. Make a thread safe way to invalidate all old ranges (possibly by marking them empty immediately)
2. Allow the program to blow-up and mark all remove functions as @system

May 18, 2017

Re: On "A New Collections Framework for the Standard Library"

Posted by Andrei Alexandrescu
in reply to Jack Stouffer

Andrei Alexandrescu

Posted in reply to Jack Stouffer

On 05/18/2017 11:18 AM, Jack Stouffer wrote:
> I just got around to watching Eduard Staniloiu's talk at DConf [1]
> about the collections library he was working on. One thing seemed
> odd, in that Eduard seems to be saying that the container and the
> range over the container's elements are one in the same and the
> range primitives are on the container itself.
>
> Is this accurate?

That is correct. The fundamental equation is:

Ranges + Optional Primitives = Containers

> If so, then this is completely different than the currently accepted
>  design of ranges and containers. Currently, a container allows the
> removal and insertion of elements, and to get a range of the elements
> you have to call a function or slice the container. This is how
> std.containers and Brain's EMSI container library. In this new
> library, what happens when I iterate the elements of one of these
> containers using foreach?

Iterating over a container using e.g. foreach won't consume the
container same as iterating over int[] won't consume the slice.

> Do the range primitives empty the container? How do I ask for a new,
> full range of my data in the collection to use in a range algorithm
> after I've already iterated over the elements?

There is no way to reclaim the original container. If you create a container, you get a range positioned to "see" the entire container. Once you popFront from that range, you lose all access to the first element in the container, unless you have a separate copy of the range. This is not new and not different from:

auto r = new int[10];
r.popFront;

What happens here is an array of 10 elements gets created. But you don't get a type "array of 10 integers". You get "a slice of integers, incidentally initialized to refer to the entire array of 10 integers that was just created". Next, you decide you don't care about the first element in the array. Once you call r.popFront, access to that element is lost forever.

As an aside, we briefly experimented with a type called "new T[]" which meant to refer to that initial array. I tried to redo a bunch of code with it. It was a failure - figuring out what's an array and what's a slice in the array was a continuous hassle offering no redeeming insight. It took me years to figure actually there's no need for containers and ranges - all you need is ranges.

> As Andrei originally laid out in this talk introducing ranges [2],
> ranges only shrink and never grow. Put another way, Output Ranges and
> Input Ranges are two separate things. Andrei states durning that talk
> (at 1:20:20)
>
> "Can ranges ever grow? ... No, never, because it would be
> fundamentally unsafe."
>
> I'm not sure what Andrei meant by this, but he believes that growing
>  ranges is also a bad idea.

Yes, you're not supposed to grow a range on its existing support without knowing whether you have room. That's a distinct matter. You can always grow a range by means of safe primitives that may duplicate it if needed, such as ~ and ~=.

Andrei

May 18, 2017

Re: On "A New Collections Framework for the Standard Library"

Posted by Andrei Alexandrescu
in reply to Jack Stouffer

Andrei Alexandrescu

Posted in reply to Jack Stouffer

On 05/18/2017 02:17 PM, Jack Stouffer wrote:
> On Thursday, 18 May 2017 at 15:18:00 UTC, Jack Stouffer wrote:
>> ...
>
> Also, shared allocators raise another problem entirely. Let's say for
> the sake of clarity that these future containers will use a separate
> range type.
>
> Let's say you have a container with five elements, and then you give a
> range of those elements to a function that caches the length somewhere
> because it's available (very common as ranges have never been expected
> to change before). What happens when you're iterating over the range and
> then another thread calls removeBack before the loop is completed? All
> sorts of functions in Phobos will blow up in interesting and different
> ways.

If two or more threads have access to a container, that would be shared. Many traditional containers aren't usable gainfully as shared, so we plan to implement a few specialized ones. Anyhow, for now you won't be able to use containers that are shared. -- Andrei

May 19, 2017

Re: On "A New Collections Framework for the Standard Library"

Posted by Jack Stouffer
in reply to Andrei Alexandrescu

Jack Stouffer

Posted in reply to Andrei Alexandrescu

On Thursday, 18 May 2017 at 18:27:22 UTC, Andrei Alexandrescu wrote:
> Iterating over a container using e.g. foreach won't consume the
> container same as iterating over int[] won't consume the slice.

I don't understand why you're mapping the behavior of ranges/slices, which theoretically are shrinking views of data, onto containers of data which can grow or shrink. They seem antithetical to me.

The only reason foreach over a dynamic array doesn't consume the slice is because foreach uses index based iteration on static and dynamic arrays and not the std.range.primitives functions.

It's not evident to me that the non-consuming behavior makes sense to map onto something completely different like containers that won't even necessarily have index based access.

It seems odd to me that slices would be the design basis when slices

1. Are intentionally limited in scope design wise to be a shrinking view of data
2. Only make sense behavior wise for array containers
3. Their ability to grow is a GC oddity which may end up copying a whole lot of data depending on the origin of the slice

> There is no way to reclaim the original container. If you create a container, you get a range positioned to "see" the entire container. Once you popFront from that range, you lose all access to the first element in the container, unless you have a separate copy of the range. This is not new and not different from:
>
> auto r = new int[10];
> r.popFront;
>
> What happens here is an array of 10 elements gets created. But you don't get a type "array of 10 integers". You get "a slice of integers, incidentally initialized to refer to the entire array of 10 integers that was just created". Next, you decide you don't care about the first element in the array. Once you call r.popFront, access to that element is lost forever.

First off, how are you going to do something like a map over a immutable container then, as map uses the range primitives and not foreach? There's no reason in principal that that should cause an issue. But with that design popFront is mutating the underlying data, and therefore should not be allowed. But this works with the std.container design because popFront is mutating the range view which has no reason to be immutable.

Secondly, if I want to say, perform a map over the elements of a container for some output, and I want to do things with the elements later in the code, then I have to create a copy of the container and pass the original to map? If that's the case then why not just go with the std.container behavior of getting a range from a primitive if you end up having to create range copies? I mean it's fundamentally the same, but that way you don't have to worry about the copies changing the data of the container out from under you.

What happens to the copy of a container when the original modifies the underlying data? For example, what should this code print?

    auto container = Array!int(MAllocator.instance);
    container.put(iota(10));

    auto container_copy = container;
    container.popBackN(5);
    container.insertBack(1);

    container_copy.each!(a => write(a, ", "));

In my estimation, a properly designed containers library should print

    0, 1, 2, 3, 4, 1, 5, 6, 7, 8, 9

But from what you're describing, that the containers act like slices, the code with print

    0, 1, 2, 3, 4, 1, 6, 7, 8, 9

May 19, 2017

Re: On "A New Collections Framework for the Standard Library"

Posted by Eugene Wissner
in reply to Jack Stouffer

Eugene Wissner

Posted in reply to Jack Stouffer

On Thursday, 18 May 2017 at 15:18:00 UTC, Jack Stouffer wrote:
> I just got around to watching Eduard Staniloiu's talk at DConf [1] about the collections library he was working on. One thing seemed odd, in that Eduard seems to be saying that the container and the range over the container's elements are one in the same and the range primitives are on the container itself.
>
> Is this accurate?
>
> If so, then this is completely different than the currently accepted design of ranges and containers. Currently, a container allows the removal and insertion of elements, and to get a range of the elements you have to call a function or slice the container. This is how std.containers and Brain's EMSI container library. In this new library, what happens when I iterate the elements of one of these containers using foreach? Do the range primitives empty the container? How do I ask for a new, full range of my data in the collection to use in a range algorithm after I've already iterated over the elements?
>
> As Andrei originally laid out in this talk introducing ranges [2], ranges only shrink and never grow. Put another way, Output Ranges and Input Ranges are two separate things. Andrei states durning that talk (at 1:20:20)
>
> "Can ranges ever grow? ... No, never, because it would be fundamentally unsafe."
>
> I'm not sure what Andrei meant by this, but he believes that growing ranges is also a bad idea.
>
> [1] https://www.youtube.com/watch?v=k88QSIC-Na0
> [2] https://archive.org/details/AndreiAlexandrescuKeynoteBoostcon2009

I didn't look the talk yet, but about a month ago I tried to implement a similar concept and wanted to apply it to all containers as well.
My idea was to make the containers/slices reference counted and copy on write (or to be more precise, on size changing, similar how D built-in slices behave).

1) All containers have a pointer to the allocated memory and its length.
2) All containers have a pointer to the beginning and the end of the range they refer to (so popFront, popBack can be implemented without changing the real size of the allocated memory)
3) If you do something like insertBack or insertFront and the reference counter is greater than 1, then the counter decreases and the container is copied.

But at the end I couldn't deal with the constness. Slice!(ubyte) and Slice!(const ubyte) are different types - that isn't nice. But for "const Slice!ubyte" you can't do popFront and popBack. It could be solved with some casting or other hacks but I don't know how it would work in a multithreaded envrionment. So I stopped the experiment for now.

May 18, 2017

Re: On "A New Collections Framework for the Standard Library"

Posted by Jonathan M Davis
in reply to Jack Stouffer

Jonathan M Davis

Posted in reply to Jack Stouffer

On Friday, May 19, 2017 1:22:50 AM PDT Jack Stouffer via Digitalmars-d wrote:
> First off, how are you going to do something like a map over a immutable container then, as map uses the range primitives and not foreach? There's no reason in principal that that should cause an issue. But with that design popFront is mutating the underlying data, and therefore should not be allowed. But this works with the std.container design because popFront is mutating the range view which has no reason to be immutable.

Andrei was talking at dconf about this and said that we'd be adding a new range primitive called tail, and we'd basically end up with car and cdr for ranges. This makes sense for immutable ranges, and it's simple to wrap existing input ranges to makes them work with such primitives, but I'm not sure that it plays at all nicely once you start considering higher order ranges - and it has the serious problem that it would require revamping all range-based code to support it. That _might_ make sense for Phobos, since it's the standard library, but I expect that its DOA pretty much everywhere else (especially when it's for immutable containers which can be very useful but are usually very niche), and even for Phobos, it's going to be a lot of churn. Personally, I'm not convinced that immutable containers are worth enough to merit such a change, but Andrei clearly thinks that they are. Interesingly, he doesn't seem to think that it's worth solving the tail-const issue with ranges, and without a solution for that, const and ranges basically so not work together at all. And _that_ is a problem that keeps coming up, whereas most folks don't seem to be clammering for immutable containers.

Now, if you're willing to add a layer of indirection, then it's trivial to make existing ranges over immutable containers work, because you can have a mutable range refer to an immutable object, but that is either going to be unsafe or require that the range be allocated on the heap - which is presumably why Andrei didn't go with that solution, but I don't know.

But overall, it sounds like once again Andrei is getting very experimental with what he's doing (or getting his mentee to do). So if it works out well, we could end up with something pretty interesting and innovative, but it also risks being a bit of a disaster, and since it's experimental, there's really no way to know how it's going to go (even less so when all we know about it is what was in the talk and whatever tidbits Andrei has discussed in the newsgroup or at dconf). So, I guess that we'll have to wait and see - particularly since we haven't even seen the actual API yet.

- Jonathan M Davis

November 13, 2017

Re: On "A New Collections Framework for the Standard Library"

Posted by Mark
in reply to Andrei Alexandrescu

Mark

Posted in reply to Andrei Alexandrescu

On Thursday, 18 May 2017 at 18:27:22 UTC, Andrei Alexandrescu wrote:
> On 05/18/2017 11:18 AM, Jack Stouffer wrote:
>> I just got around to watching Eduard Staniloiu's talk at DConf [1]
>> about the collections library he was working on. One thing seemed
>> odd, in that Eduard seems to be saying that the container and the
>> range over the container's elements are one in the same and the
>> range primitives are on the container itself.
>>
>> Is this accurate?
>
> That is correct. The fundamental equation is:
>
> Ranges + Optional Primitives = Containers
>
> Andrei

Is the code for the new collections available on github or somewhere else? Even if it's a work in progress, I'm curious to see what it looks like.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation