When D feels unfinished: union initialization and NRVO
Mar 18, 2020
Mathias Lang
March 18, 2020
So I've been toying around for a bit with writing a deserializer in D. It essentially converts types to an array of `ubyte[]` in a very simple way. Handles value types automatically, and pointers / arrays. Nothing too fancy, but I wanted to write something *correct*, and compatible with the types I'm dealing with.

The issue I was faced with is how to handle qualifiers. For example, deserializing `const` or `immutable` data. Originally, my deserializer accepted a `ref T` as parameter and would deserialize into it. I changed it to return an element of type `T`.
Deserialization should be composable, so if an aggregate defines the `fromBinary` static method, it is used instead of whatever the default for this type would otherwise be, and that method can forward to other `deserialize` calls to deserialize its members.

Now here's the catch: Some of the things being deserialized are C++ types, and may include `std::vector`, so I wanted to avoid any unnecessary copy.

This set of requirements led me to a few simple observations:
- I cannot use a temporary and `cast`. Aside from the fact that most casts are an admission that the type system is insufficient, it would force me to pass the type by `ref` when composing, which would expose the `cast` to user code, hence not `@safe`;
- In order to avoid unnecessary copies while returning by value, I need to rely heavily on NRVO (the `cast` approach would also conflict with this);
- Hence, I need to be able to return literals of everything.

Approaching this for simple value types (int, float, etc...) is trivial. When it comes to aggregates, things get a bit more complicated. An aggregate can be made of other arbitrarily complex aggregates. The solution I have so far is to require a default-like constructor and have:
```
T deserialize (T) (scope DeserializeDg dg, scope const ref Options opts)
{
    // Loads of code
    else static if (is(T == struct))
    {
        Target convert (Target) ()
        {
            // `dg` is a delegate returning a `ubyte[]`, and `opts` are options to drive deserialization
            return deserialize!Target(dg, opts);
        }
        return T(staticMap!(convert, Fields!T));
    }
}
```
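
For readers who want something they can compile, here is a minimal sketch of the same `staticMap` idea. It is not the code from the gist: the `DeserializeDg` signature (taking a byte count), the little-endian encoding, the `fromBytes` helper and the `Inner` / `Outer` types are all assumptions made for illustration.

```d
import std.meta : staticMap;
import std.traits : Fields;

/// Hypothetical byte source: returns the next `n` bytes of the input.
alias DeserializeDg = const(ubyte)[] delegate(size_t n);

T deserialize(T)(scope DeserializeDg dg)
{
    static if (is(T == struct))
    {
        // One instantiation per field type; each call consumes that field's bytes.
        Target convert(Target)() { return deserialize!Target(dg); }
        // Arguments are evaluated left to right, i.e. in field order,
        // and the struct literal is returned directly as an rvalue.
        return T(staticMap!(convert, Fields!T));
    }
    else static if (is(T == ubyte))
        return dg(1)[0];
    else static if (is(T == ushort))
    {
        auto b = dg(2);
        return cast(ushort)(b[0] | (b[1] << 8)); // little-endian, by assumption
    }
    else
        static assert(0, "Unsupported type: " ~ T.stringof);
}

/// Hypothetical convenience wrapper that feeds a byte slice to `deserialize`.
T fromBytes(T)(const(ubyte)[] input)
{
    const(ubyte)[] next(size_t n)
    {
        auto chunk = input[0 .. n];
        input = input[n .. $];
        return chunk;
    }
    return deserialize!T(&next);
}

struct Inner { ubyte a; ushort b; }
struct Outer { Inner inner; ubyte c; @disable this(this); } // non-copyable

void main()
{
    auto o = fromBytes!Outer([1, 2, 3, 4]);
    assert(o.inner.a == 1 && o.inner.b == 0x0302 && o.c == 4);
}
```

Since `Outer` disables its postblit, this only compiles if the value is never copied on its way out of `deserialize`.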

As any D user should be, I was slightly skeptical, so I made a reduced test case: https://gist.github.com/Geod24/61ef0d8c57c3916cd3dd7611eac8234e
It works as expected, which makes sense, as we want to be consistent with the C++ standard, which requires copy elision on `return` when the operand is a prvalue (guaranteed since C++17; strictly speaking this is RVO rather than NRVO).

However, not all structs are created equal, and some are not under my control (remember, C++ bindings). And yes, this is where the rant begins.

How do you initialize the following?
```
struct Statement
{
    StatementType type;     // This is an enum to discriminate which field is active
    union {                               // Oh no
        _prepare_t prepare_;  // Each of those are complex structs with std::array, std::vector, etc...
        _confirm_t confirm_;
    }
}
```

Of course my deserializer can't know about our custom tagged union, but luckily we have a hook, so (pseudo code again):
```
struct Statement
{
    /* Above definitions */
    static QT deserializeHook (QT) (scope DeserializeDg dg, scope const ref Options opts)
    {
        // deserialize `type`
        // then use a `final switch` and `return QT(type, deserialize!ActiveType(...))`
    }
}
```

Side note: `QT` is required here, because there's no way to know if `deserializeHook` was called via an `immutable(T)`, `const(shared(T))`, or just `T`.

The problem you face when you write this code is calling the `QT` constructor. Because the `union` is anonymous, `Statement.tupleof.length` is 3, not 2 as one would expect. And while calling `QT(type, _prepare_t.init)` works, calling `QT(type, _confirm_t.init)` will complain about a mismatched type, because we are trying to initialize the second member, a `_prepare_t`, with a `_confirm_t`. And using `QT(type, _prepare_t.init, _confirm_t.init)` won't work either, because then the compiler complains about overlapping initialization!
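
The three cases can be checked at compile time with a reduced sketch. The `Prepare` and `Confirm` structs below are stand-ins for the real `_prepare_t` / `_confirm_t` C++ types, and `makeConfirm` is a hypothetical helper; only the shape of `Statement` follows the post.

```d
struct Prepare { int ballot; }    // stand-in for _prepare_t
struct Confirm { long counter; }  // stand-in for _confirm_t
enum StatementType { prepare, confirm }

struct Statement
{
    StatementType type;
    union
    {
        Prepare prepare_;
        Confirm confirm_;
    }
}

// The anonymous union's members are counted as fields of `Statement`.
static assert(Statement.tupleof.length == 3);

// Positional initialization reaches the first union member only.
static assert( __traits(compiles, Statement(StatementType.prepare, Prepare.init)));
// The second positional argument maps to `prepare_`, so a `Confirm` is rejected.
static assert(!__traits(compiles, Statement(StatementType.confirm, Confirm.init)));
// Supplying both members is rejected as an overlapping initialization.
static assert(!__traits(compiles,
    Statement(StatementType.confirm, Prepare.init, Confirm.init)));

/// The named initializer syntax can select the second member,
/// but it is only accepted in a variable declaration.
Statement makeConfirm(long counter)
{
    Statement s = { type: StatementType.confirm, confirm_: Confirm(counter) };
    return s;
}

void main()
{
    auto s = makeConfirm(42);
    assert(s.type == StatementType.confirm && s.confirm_.counter == 42);
}
```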

There's a small feature that would be amazing here: struct literals! Unfortunately, they can *only* be used in variable declarations, nowhere else.
But is it really a problem? Can't we just do the following:
```
QT ret = { type: type, confirm_: deserialize!_confirm_t(dg, opts) };
return ret;
```
Well no, because then, NRVO is not performed anymore.

I've been toying around with this problem for a few weeks, on and off. I really couldn't find a way to make it work. Using a named union just moves the problem to the union literal (which is a struct literal, under the hood). Guaranteeing NRVO could have a negative impact on C/C++ interop, so the only thing that could help is to extend struct literals. Changing struct constructors to account for `union` is not possible either, because a `union` can have multiple fields of the same type.

Note that this is just the tip of the iceberg. Has anyone ever tried to make an array literal of a non-copyable structure in one go? Thanks to tuples, one can use the `staticMap` approach if the length is known at compile time, but what happens if it's only known at runtime? `iota + map + array` does not work with `@disable this(this)`. And let's not even mention AA literals.

We've had quite a few new features making their way into the language over the past few years, but many of the old features are left unfinished. We have a new contract syntax, but contracts are still quite broken (quite a few bugs, as well as usability issues, e.g. one can't call the parent's contract). We are interfacing more and more with C++, but don't have the ability to control copies, and the compiler and Phobos alike assume things are copyable (you can't foreach over a range which has `@disable this(this)`). We want to make the language `@safe` by default, but we lack the language constructs to build libraries that work with both `@system` and `@safe`. Our default setup for `assert` is still not on par with what C does with a macro, and `-checkaction=context` is far from being ready (mostly due to the issues mentioned previously). We are piling up `-transition` switches 10 times faster than we are removing them.

This could go on for a while, but the point I wanted to make is: can we focus on the last 20%, please?
March 18, 2020
On Wednesday, 18 March 2020 at 06:55:24 UTC, Mathias Lang wrote:
> So I've been toying around for a bit with writing a deserializer in D. It essentially converts types to an array of `ubyte[]` in a very simple way. Handles value types automatically, and pointers / arrays. Nothing too fancy, but I wanted to write something *correct*, and compatible with the types I'm dealing with.
>
> The issue I was faced with is how to handle qualifiers. For example, deserializing `const` or `immutable` data. Originally, my deserializer accepted a `ref T` as parameter and would deserialize into it. I changed it to return an element of type `T`.

IMHO serialization of language level types is the same kind of deceptive goal as ORM, "all is object", etc.

If you move to the higher level - serialize objects (in terms of your software, not just OOP objects) that you are modeling - this problem will be gone.

March 18, 2020
On Wednesday, 18 March 2020 at 06:55:24 UTC, Mathias Lang wrote:
> So I've been toying around for a bit with writing a deserializer in D. It essentially converts types to an array of `ubyte[]` in a very simple way. Handles value types automatically, and pointers / arrays. Nothing too fancy, but I wanted to write something *correct*, and compatible with the types I'm dealing with.
>
> The issue I was faced with is how to handle qualifiers. For example, deserializing `const` or `immutable` data. Originally, my deserializer accepted a `ref T` as parameter and would deserialize into it. I changed it to return an element of type `T`.

IMHO serialization of language level types/objects/other abstractions is the same kind of deceptive goal as ORM, "all is object", etc.

If you move to the higher level - serialize objects (in terms of your software, not just OOP objects) that you are modeling - this problem will be gone.

March 18, 2020
On Wednesday, 18 March 2020 at 07:17:05 UTC, Denis Feklushkin wrote:
> On Wednesday, 18 March 2020 at 06:55:24 UTC, Mathias Lang wrote:
>> So I've been toying around for a bit with writing a deserializer in D. It essentially converts types to an array of `ubyte[]` in a very simple way. Handles value types automatically, and pointers / arrays. Nothing too fancy, but I wanted to write something *correct*, and compatible with the types I'm dealing with.
>>
>> The issue I was faced with is how to handle qualifiers. For example, deserializing `const` or `immutable` data. Originally, my deserializer accepted a `ref T` as parameter and would deserialize into it. I changed it to return an element of type `T`.
>
> IMHO serialization of language level types/objects/another_abstractions is the same kind of deceiving goal as ORM, "all is object", etc.
>
> If you move to the higher level - serialize objects (in terms of your software, not just OOP objects) that you are modeling - this problem will gone.

Strongly disagree. Serialize objects, sure, events and entities and aggregates, all that good shit, but at the end of the day you'll still need a ground-level way to serialize domain values, i.e. arrays, structs, strings, ints, floats, dates, Options, Nullables... basically anything that can be easily represented in JSON, stuff that can't be high-level serialized because it is *itself* the low level primitives that your high level semantics are built on. 90% of the LOC effort of serialization is in those glue types, and they're very feasible to handle automatically in D. (Thank goodness.)

And yes, immutable is just an unending headache with that, though something like boilerplate's autogenerated builder types helps a lot ime, because you don't need to mixin constructor calls but can just assign to fields (immutable or not) and do the construction in one go at the end.

March 18, 2020
On Wednesday, 18 March 2020 at 06:55:24 UTC, Mathias Lang wrote:

> We've had quite a few new feature making their way in the language over the past few years, but many of the old features are left unfinished. We have a new contract syntax, but contract are still quite broken (quite a few bugs, as well as usability issues, e.g. one can't call the parent's contract). We are interfacing more and more with C++, but don't have the ability to control copies, and the compiler and Phobos alike assume things are copiable (you can't foreach over a range which has `@disable this(this)`). We want to make the language `@safe` by default, but we lack the language constructs to build libraries that works with both `@system` and `@safe`. Our default setup for `assert` is still not on par with what C does with a macro, and `-checkaction=context` is far from being ready (mostly due to the issues mentioned previous). We are piling up `-transition` switches 10 times faster than we are removing them.
>
> This could go on for a while, but the point I wanted to make is: can we focus on the last 20%, please?

As usual, my +1 for that: to be honest, years of +1 on that ...
March 18, 2020
On Wednesday, 18 March 2020 at 09:12:31 UTC, FeepingCreature wrote:
> On Wednesday, 18 March 2020 at 07:17:05 UTC, Denis Feklushkin wrote:
>> On Wednesday, 18 March 2020 at 06:55:24 UTC, Mathias Lang wrote:
>>> So I've been toying around for a bit with writing a deserializer in D. It essentially converts types to an array of `ubyte[]` in a very simple way. Handles value types automatically, and pointers / arrays. Nothing too fancy, but I wanted to write something *correct*, and compatible with the types I'm dealing with.
>>>
>>> The issue I was faced with is how to handle qualifiers. For example, deserializing `const` or `immutable` data. Originally, my deserializer accepted a `ref T` as parameter and would deserialize into it. I changed it to return an element of type `T`.
>>
>> IMHO serialization of language level types/objects/another_abstractions is the same kind of deceiving goal as ORM, "all is object", etc.
>>
>> If you move to the higher level - serialize objects (in terms of your software, not just OOP objects) that you are modeling - this problem will gone.
>
> Strongly disagree. Serialize objects, sure, events and entities and aggregates, all that good shit, but at the end of the day you'll still need a ground-level way to serialize domain values, ie. arrays, structs, strings, ints, floats, dates, Options, Nullables...

I prefer not to mix types and arrays of typed values with the other entries in this list, like structs or objects, which contain "raw" typed values.

Some D language constructs (aggregates) just cannot be serialized automatically by design and you need to manually reinvent some constructors or special functions for this purpose.

This situation is no different from the fact that you will still need to explain to the serializer how to correctly serialize some complex object so that no duplication or data loss occurs.

March 18, 2020
On Wednesday, 18 March 2020 at 09:52:35 UTC, Denis Feklushkin wrote:
> On Wednesday, 18 March 2020 at 09:12:31 UTC, FeepingCreature wrote:
>> Strongly disagree. Serialize objects, sure, events and entities and aggregates, all that good shit, but at the end of the day you'll still need a ground-level way to serialize domain values, ie. arrays, structs, strings, ints, floats, dates, Options, Nullables...
>
> I prefer do not mix types and arrays of typed values here, and other like structs or objects in this list, which include "raw" typed values.
>
> Some D language constructs (aggregates) just cannot be serialized automatically by design and you need to manually reinvent some constructors or special functions for this purpose.
>
> This situation is no different from that you will still need to explain to the serializer how to correctly serialize a some complex object so that no duplication or data loss occurs.

Sure, but the limit is "anything with 'uses'-relations" - anything that contains a non-exclusive reference is hard to serialize. My point is that excluding these objects still leaves 80% of the typesystem, including structs, arrays, hashmaps, sets... The reference to "things that can be easily encoded as json" was not accidental - the common trait here is exactly that JSON doesn't support pointer types, so you can't get a reference cycle or in fact nonlocal reference at all. So pointers are out, objects are largely out; structs, hashmaps and arrays (of similarly simple types) are very much in, because they are types that generally own their subtypes. Limiting to those types lets you do easy, performant one-pass serialization.

If you're looking at a type that creates another class in its constructor, calls a method on another value, or has fields that should be excluded from serialization, then that type is probably too complex to be automatically serialized. On the other hand, stuff like structs that are just a bundle of public/read-only fields with little internal logic are both easy and very common.
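
To make the "types that own their subtypes" point concrete, here is a rough sketch of a one-pass encoder restricted to integers, strings, arrays and plain structs. The names `encode`, `encodeToString`, `Point` and `Path` are made up for this example, the output is only JSON-like (no string escaping), and pointers and classes are deliberately rejected at compile time.

```d
import std.array : Appender, appender;
import std.conv : to;
import std.traits : isArray, isIntegral, isSomeString;

/// Writes `value` to `sink` in a single recursive pass.
void encode(T)(ref Appender!string sink, const T value)
{
    static if (isIntegral!T)
        sink.put(value.to!string);
    else static if (isSomeString!T)
    {
        sink.put('"');
        sink.put(value[]); // no escaping: a simplification for this sketch
        sink.put('"');
    }
    else static if (isArray!T)
    {
        sink.put('[');
        foreach (i, ref elem; value)
        {
            if (i > 0) sink.put(',');
            encode(sink, elem);
        }
        sink.put(']');
    }
    else static if (is(T == struct))
    {
        sink.put('{');
        // Compile-time loop over the fields, in declaration order.
        foreach (i, ref field; value.tupleof)
        {
            static if (i > 0) sink.put(',');
            sink.put('"');
            sink.put(__traits(identifier, T.tupleof[i]));
            sink.put("\":");
            encode(sink, field);
        }
        sink.put('}');
    }
    else
        static assert(0, "Not an owning value type: " ~ T.stringof);
}

string encodeToString(T)(const T value)
{
    auto sink = appender!string();
    encode(sink, value);
    return sink.data;
}

struct Point { int x; int y; }
struct Path { string name; Point[] points; }

void main()
{
    auto text = encodeToString(Path("p", [Point(1, 2), Point(3, 4)]));
    assert(text == `{"name":"p","points":[{"x":1,"y":2},{"x":3,"y":4}]}`);
}
```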
March 18, 2020
On 2020-03-18 07:55, Mathias Lang wrote:

> This set of requirement led me to a few simple observations:
> - I cannot use a temporary and `cast`. Aside from the fact that most casts are an admission that the type system is insufficient, it would force me to pass the type by `ref` when composing, which would expose the `cast` to user code, hence not `@safe`;

How would the `cast` be exposed to the user code?

> But is it really a problem ? Can't we just do the following:
> ```
> QT ret = { type: type, _confirm_t: deserialize!_confirm_t(dg, opts) };
> return ret;
> ```
> Well no, because then, NRVO is not performed anymore.

I haven't looked at the generated code, but this compiles at least:

```
struct QT
{
    int type;
    int _confirm_t;

    @disable this(this);
    @disable ref QT opAssign () (auto ref QT other);
}

QT foo()
{
    QT ret = { type: 1, _confirm_t: 2 };
    ret.type = 4;
    return ret;
}

void main()
{
    auto qt = foo();
}
```

As long as you return the same variable in all branches it compiles at least. If you start to return a literal in one branch and a variable in a different branch it will fail to compile.

-- 
/Jacob Carlborg
March 19, 2020
On Wednesday, 18 March 2020 at 19:09:12 UTC, Jacob Carlborg wrote:
> On 2020-03-18 07:55, Mathias Lang wrote:
>
>> This set of requirement led me to a few simple observations:
>> - I cannot use a temporary and `cast`. Aside from the fact that most casts are an admission that the type system is insufficient, it would force me to pass the type by `ref` when composing, which would expose the `cast` to user code, hence not `@safe`;
>
> How would the `cast` be exposed to the user code?

Since there is a way to hook into the deserialization, if I create a temporary variable which contains an elaborate type (e.g. which defines `opAssign` / postblit, etc..), it would get called at least once, while users usually expect construction / deserialization to be "in one go".
In order to avoid it being called, I did explore making `deserialize` a method of the aggregate / take a `ref` to the place it should write into, but then we run into other problems. If it's a method, contracts / invariants are called, and if it takes a `ref`, you don't know what the hook will do with the already-deserialized data that you just aliased to mutable.

>> But is it really a problem ? Can't we just do the following:
>> ```
>> QT ret = { type: type, _confirm_t: deserialize!_confirm_t(dg, opts) };
>> return ret;
>> ```
>> Well no, because then, NRVO is not performed anymore.
>
> I haven't looked at the generated code, but this compiles at least:
>
> struct QT
> {
>     int type;
>     int _confirm_t;
>
>     @disable this(this);
>     @disable ref QT opAssign () (auto ref QT other);
> }
>
> QT foo()
> {
>     QT ret = { type: 1, _confirm_t: 2 };
>     ret.type = 4;
>     return ret;
> }
>
> void main()
> {
>     auto qt = foo();
> }
>
> As long as you return the same variable in all branches it compiles at least. If you start to return a literal in one branch and a variable in a different branch it will fail to compile.

Ah, thanks! So this is consistent with what C++ does as well. DMD is just being a bit conservative here, but as often, the solution is to turn a runtime parameter into a compile time one and to add another level of indirection.

This is how I solved the problem: https://gist.github.com/Geod24/61ef0d8c57c3916cd3dd7611eac8234e#file-nrvo_switch-d

If you turn the `version(none)` into `version(all)` you'll see:
```
nrvo.d(54): Error: struct nrvo.Foo is not copyable because it is annotated with @disable
nrvo.d(57): Error: struct nrvo.Foo is not copyable because it is annotated with @disable
nrvo.d(60): Error: struct nrvo.Foo is not copyable because it is annotated with @disable
nrvo.d(63): Error: struct nrvo.Foo is not copyable because it is annotated with @disable
```

I guess I could raise an issue for this (it's a frontend issue so LDC and GDC also suffer from it).
March 19, 2020
On Thursday, 19 March 2020 at 10:17:20 UTC, Mathias Lang wrote:
> If you turn the `version(none)` into `version(all)` you'll see:
> ```
> nrvo.d(54): Error: struct nrvo.Foo is not copyable because it is annotated with @disable
> nrvo.d(57): Error: struct nrvo.Foo is not copyable because it is annotated with @disable
> nrvo.d(60): Error: struct nrvo.Foo is not copyable because it is annotated with @disable
> nrvo.d(63): Error: struct nrvo.Foo is not copyable because it is annotated with @disable
> ```

Another simple workaround:

```
import core.lifetime : move;

...
case Good.Baguette:
    Foo f = { type_: type, f1: typeof(Foo.f1)("Hello World") };
    return move(f);
...
```
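
A self-contained variant of this workaround might look like the sketch below. The `Kind` enum, the field types of `Foo` and the `make` function are assumptions for illustration and do not match the gist; the point is only that each branch declares its own variable and hands it out through `move`, so no copy of the non-copyable struct is requested.

```d
import core.lifetime : move;

enum Kind { number, text }

struct Foo
{
    Kind type_;
    union
    {
        int f1;
        string f2;
    }
    @disable this(this); // non-copyable
}

Foo make(Kind type)
{
    final switch (type)
    {
        case Kind.number:
        {
            // The named initializer can target either union member.
            Foo a = { type_: type, f1: 42 };
            return move(a); // returns an rvalue, so no postblit is needed
        }
        case Kind.text:
        {
            Foo b = { type_: type, f2: "Hello World" };
            return move(b);
        }
    }
}

void main()
{
    auto n = make(Kind.number);
    auto t = make(Kind.text);
    assert(n.type_ == Kind.number && n.f1 == 42);
    assert(t.type_ == Kind.text && t.f2 == "Hello World");
}
```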
