August 31, 2021

On 8/31/21 1:21 PM, deadalnix wrote:

>

I want to start working on a container library. Interestingly enough, I don't think we have what we need on that front.

I'll describe the design I intend to go for and the rationale behind it, but first, there is an unsolved problem that I don't know how to work around, and it is somewhat of a blocker. If any of you have an idea how to solve it, I'm all ears. To be honest, this is probably the most important problem that needs to be solved for D right now. That problem is turtling down type qualifiers.

For instance, if we have a Vector!T, then we want it to convert to a Vector!(const T) and we want const(Vector!T) == const(Vector!(const T)), all of it implicitly. Same goes for other qualifiers.

This is the crux of the tail-const problem. The language allows tail-const for only 2 types, T[] and T*. Everything else is fully const or fully mutable.

>

To put it bluntly, I have no idea how to achieve this. And I don't think we'll have a good set of collections before this is sorted out.

I had an idea a long time ago, but ashamedly never brought it to fruition. Long story short, I think we need a way to specify "tail" for qualifiers, and allow implicit conversion of the head. Then you'd have tailconst(Vector!int), and you're good. I actually had a decent syntax for it, but I'm not sure it would fly anyway.

>

Now onto the design:
 - Value types. It is very easy to turn a value type into a reference type, not so much the other way around. Reference types also come with a bag of edge cases, like the collection being null vs. empty, that cause confusion and problems in practice.

Null vs. empty isn't a problem for non-class reference types. Of course, without tail-const, you are kinda screwed.

>

 - COW. Value-type collections can be very expensive to work with if they eagerly copy everything. On the other hand, COW ensures copies are cheap. There are also very common use cases that are well served by COW collections, for instance when one wants value semantics for collections that are infrequently changed - but you don't know, or can't prove, that they won't be.

I don't agree that COW is a good idea for generic collections. IIRC, C++ tried this with std::string and it was a disaster.

>

 - Small collection features. As every engineer, but no mathematician, will tell you, most numbers are small. The same is true of collection sizes. To speed things up for collections that are expected to be small, one should be able to specify Vector!(T, 8) to reserve 8 slots for elements in the base collection, and only allocate on the heap if the collection grows past this. LLVM has a collection library doing exactly that, and it is the most useful thing in the world (see https://www.youtube.com/watch?v=vElZc6zSIXM for more details).

I think this is a great idea.

>

 - Not exposing much of the internals. Things like iterators being raw pointers (or the result of using the in operator), for instance, impose constraints on the internals that almost always end up costing a ton of performance as new techniques are discovered (recent examples include open addressing with SSE probing). All of this needs to be wrapped.

Feel free to copy any of my ideas from dcollections for this. See https://github.com/schveiguy/dcollections/blob/master/concepts.txt

Also note that dcollections containers were strictly reference types, but had struct-based implementations (which could actually be swapped out with a different implementation, if anyone had ever developed any). This came in handy in some cases; for instance, the hash bucket collision linked list was just the implementation behind the standard linked list.

>

I have a good idea how to solve all these problems. I have no idea how to solve the type qualifier one. I need help.

Feel free to ping, maybe on slack. We can discuss the tail-const problem.

-Steve

September 01, 2021

On Tuesday, 31 August 2021 at 23:56:51 UTC, Steven Schveighoffer wrote:

> >

To put it bluntly, I have no idea how to achieve this. And I don't think we'll have a good set of collections before this is sorted out.

I had an idea a long time ago, but ashamedly never brought it to fruition. Long story short, I think we need a way to specify "tail" for qualifiers, and allow implicit conversion of the head. Then you'd have tailconst(Vector!int), and you're good. I actually had a decent syntax for it, but I'm not sure it would fly anyway.

One problem I can foresee with this approach is that, from the compiler's point of view, there is not necessarily any predictable relationship between qualifier(Template!T) and Template!(qualifier(T)). For all it knows, you could have written code like the following:

struct Vector(T)
{
    static if (is(T == const)) {
        // do one thing
    } else {
        // do something completely different
    }
}

Maybe in some restricted cases it could examine the uninstantiated template, determine that there's no funny business going on, and conclude that the implicit conversion is valid--but this kind of heuristic-based approach tends to be very brittle and compose poorly (exhibit A: issue 1807).

In general, if we want the compiler to know that a conversion from one template instantiation to another is valid, we are going to have to prove it by construction--which is to say, by writing a function (like a constructor or an opCast overload) that actually performs the conversion.

September 01, 2021

On Tuesday, 31 August 2021 at 23:56:51 UTC, Steven Schveighoffer wrote:

>

I don't agree that COW is a good idea for generic collections. IIRC, C++ tried this with std::string and it was a disaster.

You seem to be confused. The C++11 standard prohibits std::string from using COW. Existing C++ implementations were using COW, and getting out of it is what was a disaster (notably, it required breaking ABI changes, which the C++ ecosystem is not well suited to handle).

September 01, 2021

On Wednesday, 1 September 2021 at 00:26:33 UTC, Paul Backus wrote:

>

[snip]

One problem I can foresee with this approach is that, from the compiler's point of view, there is not necessarily any predictable relationship between qualifier(Template!T) and Template!(qualifier(T)). For all it knows, you could have written code like the following:

struct Vector(T)
{
    static if (is(T == const)) {
        // do one thing
    } else {
        // do something completely different
    }
}

Maybe in some restricted cases it could examine the uninstantiated template, determine that there's no funny business going on, and conclude that the implicit conversion is valid--but this kind of heuristic-based approach tends to be very brittle and compose poorly (exhibit A: issue 1807).

In general, if we want the compiler to know that a conversion from one template instantiation to another is valid, we are going to have to prove it by construction--which is to say, by writing a function (like a constructor or an opCast overload) that actually performs the conversion.

I think you make some good points here.

My recollection is that opHeadMutable [1] was intended as a way to inform the compiler that the conversion is valid.

On issue 1807, as much as I want that enhancement, I just hope that there is a general solution to resolve it and not just a heuristic-based one.

[1] https://forum.dlang.org/post/cpxfgdmklgusodqouqdr@forum.dlang.org

September 01, 2021

On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:

>

I want to start working on a container library. Interestingly enough, I don't think we have what we need on that front.

I'll describe the design I intend to go for and the rationale behind it, but first, there is an unsolved problem that I don't know how to work around, and it is somewhat of a blocker. If any of you have an idea how to solve it, I'm all ears. To be honest, this is probably the most important problem that needs to be solved for D right now. That problem is turtling down type qualifiers.

For instance, if we have a Vector!T, then we want it to convert to a Vector!(const T) and we want const(Vector!T) == const(Vector!(const T)), all of it implicitly. Same goes for other qualifiers.

To put it bluntly, I have no idea how to achieve this. And I don't think we'll have a good set of collections before this is sorted out.

Now onto the design:

  • Value types. It is very easy to turn a value type into a reference type, not so much the other way around. Reference types also come with a bag of edge cases, like the collection being null vs. empty, that cause confusion and problems in practice.
  • COW. Value-type collections can be very expensive to work with if they eagerly copy everything. On the other hand, COW ensures copies are cheap. There are also very common use cases that are well served by COW collections, for instance when one wants value semantics for collections that are infrequently changed - but you don't know, or can't prove, that they won't be.
  • Small collection features. As every engineer, but no mathematician, will tell you, most numbers are small. The same is true of collection sizes. To speed things up for collections that are expected to be small, one should be able to specify Vector!(T, 8) to reserve 8 slots for elements in the base collection, and only allocate on the heap if the collection grows past this. LLVM has a collection library doing exactly that, and it is the most useful thing in the world (see https://www.youtube.com/watch?v=vElZc6zSIXM for more details).
  • Not exposing much of the internals. Things like iterators being raw pointers (or the result of using the in operator), for instance, impose constraints on the internals that almost always end up costing a ton of performance as new techniques are discovered (recent examples include open addressing with SSE probing). All of this needs to be wrapped.

I have a good idea how to solve all these problems. I have no idea how to solve the type qualifier one. I need help.

It's good that you are doing this, Amaury. Thanks.

We need better containers to compete with other languages. On top of that, there are patterns that other languages cannot do. One thing that comes to mind is that D is one of only a few languages (languages with macros or similar can probably do it too) that can easily allow one to write containers which lie about their layout. The most profitable example of that I can think of is automatically switching arrays of POD types to a struct-of-arrays representation internally. This transformation makes random access to elements slower; however, in loops where access to a subset of the fields dominates, it is much faster due to improved cache efficiency.

September 01, 2021

On Wednesday, 1 September 2021 at 01:38:49 UTC, jmh530 wrote:

>

My recollection is that opHeadMutable [1] was intended as a way to inform the compiler that the conversion is valid.

[1] https://forum.dlang.org/post/cpxfgdmklgusodqouqdr@forum.dlang.org

It looks like opHeadMutable is basically a restricted form of "implicit opCast"--that is, rather than allowing the programmer to define arbitrary implicit conversions, it lets them define implicit conversions of the form qual(Template!(Args)) -> Template!(qual(Args)).

IMO it is a compromise that both sides would find unsatisfying. Those who want convenient implicit conversions will not be happy with it because there are important cases it does not cover (e.g. qual(Template!(qual(T))) -> qual(Template!T)), but it is still flexible enough that anyone opposed to user-defined implicit conversions in general will likely find it hard to stomach.

September 01, 2021

On Tuesday, 31 August 2021 at 23:24:57 UTC, deadalnix wrote:

>

There is a ton of good work in there!

Yes, and I would really like it to come of use upstream. :)

>

Do you agree with the direction explicated?

Yes. No objections afaict. All good points.

  • CoW: sounds like the future direction in other languages such as Swift, so no objections on that matter:

    • Should we enforce CoW or make it opt-in via storages fed as template parameters?
    • Will you be supporting
  • Will you be using ranges or iterators?

  • What about allocators? If you use them, can you please use std.experimental.allocator instead of stdx.allocator (used by emsi-containers, libdparse, and dsymbol), which is more than two years old now? std.experimental.allocator has received much love since, for instance allocateZeroed for the allocators that support it.

  • An interesting emsi container that was new to me is UnrolledList. It's used extensively in dsymbol.

>

I expected to focus on Vector/Set/Map initially. This should serve most use case in practice.

You should probably use robin-hood hashing if you pick open addressing. I'm not using it (yet) in my own hash sets/maps. It has very good asymptotic complexity. You should probably have a look at the state-of-the-art robin-hood implementations in C++ as well. To run my own C++ benchmark for containers, run ./test under benchmarks/containers-cxx.

September 01, 2021

On Wednesday, 1 September 2021 at 02:08:40 UTC, max haughton wrote:

>

layout. The most profitable example of that I can think of is automatically switching arrays of POD types to a struct-of-arrays representation internally. This transformation makes random access to elements slower; however, in loops where access to a subset of the fields dominates, it is much faster due to improved cache efficiency.

Speaking of which, here's my D implementation of struct of arrays (SoA):

https://github.com/nordlow/phobos-next/blob/master/src/nxt/soa.d

Petar Kirov has an alternative implementation providing some benefits. I've copied it into

https://github.com/nordlow/phobos-next/blob/master/src/nxt/soa_petar_kirov.d

The two implementations should be united.

Note that Petar's SoA-implementation has its capacity-property functionally deduced from the length property as

/** Returns the current capacity - the current length rounded up to a power
    of two. This assumption allows us to save on having a separate field.
*/
size_t capacity() const pure @safe
{
    return _length.roundUpToPowerOf2;
}

thereby saving a word of (stack) space. That behaviour could be made opt-in as well via a template parameter.

September 01, 2021

On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:

>

I want to start working on a container library. Interestingly enough, I don't think we have what we need on that front.

You should give Razvan and Andrei a ping as well. They have done deep work specifically on trying to make RC-backed containers/collections @safe pure nothrow @nogc. One of Andrei's conclusions was that a run-time flag is needed to reveal whether a const parameter is fed an immutable argument or not.

September 01, 2021

On Wednesday, 1 September 2021 at 10:12:23 UTC, Per Nordlöw wrote:

>
  • CoW: sounds like the future direction in other languages such as Swift so no objections on that matter:
    • Should we enforce CoW or make it opt-in via storages fed as template parameters?

I think COW should be the default. It is not difficult to build reference containers from COW ones, so we probably don't need it to be optional; we can simply provide wrappers around the COW containers to provide reference ones.

>
  • Will you be supporting

I'm not sure what you mean here. If it is bug fixes and such, I don't have much of a problem with that. Containers are fairly straightforward, so I don't expect the workload to be very high anyway.

>
  • Will you be using ranges or iterators?

This is D, so we must provide ranges. But the in operator and similar things force us to do both.

>
  • What about allocators? If you use them, can you please use std.experimental.allocator instead of stdx.allocator (used by emsi-containers, libdparse, and dsymbol), which is more than two years old now? std.experimental.allocator has received much love since, for instance allocateZeroed for the allocators that support it.

I have to admit I have not put too much thought into this.

>
  • An interesting emsi container that was new to me is UnrolledList. It's used extensively in dsymbol.

I have no idea what that is.

> >

I expected to focus on Vector/Set/Map initially. This should serve most use case in practice.

You should probably use robin-hood hashing if you pick open-addressing.

Yes. That being said, I would like to not expose the implementation. PHP and Python were able to move from a tree-based approach to open addressing for dictionaries, to great benefit. They could do so because they didn't expose the innards of the container.

There are very interesting approaches leveraging the vector capabilities of modern CPUs, such as F14 (https://engineering.fb.com/2019/04/25/developer-tools/f14/), and exposing the innards would prevent moving to them later on.