Initializing an Immutable Field with Magic: The "Fake Placement New" Technique
FeepingCreature
July 26, 2019
How would you initialize an immutable field outside the constructor?

For instance, assume you're trying to implement a tagged union, and you want to switch it to a new type - but it so happens that the type you're trying to switch to is an immutable struct.

...
immutable struct S { int i; }
union
{
  ...
  S s;
}

For instance, you might, like me, decide that std.conv.emplace does what you want:

...
  emplace(&s, S(5));
...

You would then get a strange compiler error saying that the return value of a "side effect free" function cannot be silently thrown away; and if you changed the call to, as DMD recommends, `cast(void) emplace(&s, S(5));`, you would discover with some astonishment that the call is silently removed.

Emplace does not emplace!

What's happening here? From the perspective of the compiler, it makes perfect sense.

Emplace is defined as a pure function, meaning that it cannot have any effects other than on its parameters. However, its parameters are, in order, a pointer to an immutable struct (which "can't" change anything in the caller, since the target is immutable) and a value parameter, S, which also can't change the caller. And we told DMD to throw away the return value.
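
A minimal sketch of the distinction this reasoning relies on, with hypothetical functions `setTo` and `readFrom` (neither is part of std.conv): a pure function that takes a mutable pointer may still write through it, while one that only takes immutable data and values has nothing left but its return value, so a call that discards the result is a candidate for removal.

```d
// Weakly pure: the only effect is through the mutable pointer parameter.
void setTo(int* target, int value) pure nothrow @safe
{
    *target = value;
}

// Strongly pure: immutable data and values in, a value out.
// The return value is the only way the caller can observe the call.
int readFrom(immutable(int)* source, int offset) pure nothrow @safe
{
    return *source + offset;
}

void main()
{
    int x = 1;
    setTo(&x, 5);               // must be kept: it writes to x
    assert(x == 5);

    immutable int y = 10;
    assert(readFrom(&y, 1) == 11);

    // Result explicitly discarded: nothing observable remains,
    // so the compiler is free to drop this call.
    cast(void) readFrom(&y, 1);
}
```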

So DMD ends up, rather reasonably, convinced that this emplace is a no-op. We would need to convince DMD of something like, "it looks like I'm giving it an S*, but it's actually a ~magical type~ with the same layout as S but not immutable". This is not possible.

Instead, we have to use the same magic hack that emplace also uses internally: fake placement new!

See, there is *exactly one* construct in the language that is allowed to assign a new value to an immutable field, and it's the constructor. So we have to make DMD believe that our variable is a field in a type we control (hack 1), and then explicitly call that type's constructor with our new value (hack 2).

struct Wrapper
{
  S s;

  this(S s)
  {
    this.s = s; // the one operation allowed to assign immutable fields
  }
}

`Wrapper` has the same layout as `S`, because it basically *is* `S`.

Then we call the constructor as if we were currently constructing Wrapper at a location that "just happens" to overlap with our field.

Wrapper* wrapper = cast(Wrapper*) &s;
wrapper.__ctor(S(5)); // fake placement new
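
Assembled into one program, the pieces above might look like the sketch below. The `Tagged` struct, its `holdsS` and `other` members, and the `switchToS` helper are hypothetical names filled in for illustration and are not taken from the linked demo. It relies on calling `__ctor` directly, which the spec does not document, so treat it as an illustration of the hack rather than an endorsed pattern.

```d
import std.stdio : writeln;

immutable struct S { int i; }

// Same layout as S; exists only so that we own a constructor
// that is allowed to initialize the immutable field.
struct Wrapper
{
    S s;

    this(S s)
    {
        this.s = s;
    }
}

// Hypothetical tagged union with one mutable and one immutable member.
struct Tagged
{
    bool holdsS;
    union
    {
        int other;
        S s;
    }
}

// Switches a fresh Tagged from `other` to `s` and returns the stored value.
int switchToS(int newValue)
{
    Tagged value;
    value.other = 1;                            // currently holds the mutable member

    Wrapper* wrapper = cast(Wrapper*) &value.s; // hack 1: pretend s is a Wrapper
    wrapper.__ctor(S(newValue));                // hack 2: fake placement new
    value.holdsS = true;

    return value.s.i;
}

void main()
{
    writeln(switchToS(5));
}
```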

What a mess. Works though.

Demo: https://run.dlang.io/is/kg7j3f
July 26, 2019
On 26.07.19 12:11, FeepingCreature wrote:
> How would you initialize an immutable field outside the constructor?

Not, I guess.

[...]
> What a mess. Works though.
> 
> Demo: https://run.dlang.io/is/kg7j3f

That looks like a complicated way of casting away immutable.
`cast(int) value.s.i = 5;` also "works", but has undefined behavior, of course. Surely, calling `__ctor` on an existing immutable instance also has undefined behavior.
July 26, 2019
On Friday, 26 July 2019 at 10:25:06 UTC, ag0aep6g wrote:
> That looks like a complicated way of casting away immutable.
> `cast(int) value.s.i = 5;` also "works", but has undefined behavior, of course. Surely, calling `__ctor` on an existing immutable instance also has undefined behavior.

Sure, in this example you can do that, but in a generic function you have no idea what's inside S.

July 26, 2019
On 26.07.19 12:40, FeepingCreature wrote:
> On Friday, 26 July 2019 at 10:25:06 UTC, ag0aep6g wrote:
>> That looks like a complicated way of casting away immutable.
>> `cast(int) value.s.i = 5;` also "works", but has undefined behavior, of course. Surely, calling `__ctor` on an existing immutable instance also has undefined behavior.
> 
> Sure, in this example you can do that, but in a generic function you have no idea what's inside S.

My point is that you can't do either. You can't mutate immutable data. Doesn't matter whether you try it with a `cast` or with `__ctor`. Both ways are not allowed.
July 26, 2019
On Friday, 26 July 2019 at 10:53:32 UTC, ag0aep6g wrote:
> My point is that you can't do either. You can't mutate immutable data. Doesn't matter whether you try it with a `cast` or with `__ctor`. Both ways are not allowed.

Sure you can. Look at the link, you're doing it :)

More specifically, immutable is kind of awkward. You have to differentiate between immutable types and immutable memory. Those are *often* the same, but not always.

The thing you cannot do is mutate memory that was *allocated* immutable - i.e. that came out of new T or T() where T was marked immutable, or had immutable fields. But that doesn't happen with immutable fields inside a union, because unions screen off all that stuff; they can't not, because immutable fields and mutable fields may freely overlap. So instead of forbidding mutable-immutable overlap in unions, the language basically just throws up its hands and goes "yeah, whatever."

So when you're switching a tagged union to an immutable member, you're not dealing with "immutable memory", you're, effectively, dealing with an uninitialized field. And you can always set an uninitialized field to a new value, whether it's immutable or not, because that's *how the constructor hack works in the first place*. If abusing a constructor like this was broken, the constructor would *itself* be broken.
July 26, 2019
On 26.07.19 13:14, FeepingCreature wrote:
> On Friday, 26 July 2019 at 10:53:32 UTC, ag0aep6g wrote:
>> My point is that you can't do either. You can't mutate immutable data. Doesn't matter whether you try it with a `cast` or with `__ctor`. Both ways are not allowed.
> 
> Sure you can. Look at the link, you're doing it :)

What you can do is write invalid programs that seem to behave as you want. But they're invalid. They might explode any time.

> More specifically, immutable is kind of awkward. You have to differentiate between immutable types and immutable memory. Those are *often* the same, but not always.

As far as I understand, they're the same to the language. Consequently, they're the same to me. D doesn't have C++'s const where it matters how the data was declared originally.

> The thing you cannot do is mutate memory that was *allocated* immutable - i.e. that came out of new T or T() where T was marked immutable, or had immutable fields. But that doesn't happen with immutable fields inside a union, because unions screen off all that stuff; they can't not, because immutable fields and mutable fields may freely overlap. So instead of forbidding mutable-immutable overlap in unions, the language basically just throws up its hands and goes "yeah, whatever."

Do we have it in the spec somewhere that unions defeat immutable? I'm skeptical if that can be sound.

As far as I know, we usually say that this function:

    void f(immutable int* p)
    {
        /* ... do something with *p ... */
        g();
        /* ... do more stuff with *p ... */
    }

can assume that `*p` is the same before and after calling `g`. But if unions have the power to defeat immutable, that assumption is invalid.
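
As a sketch of that assumption in a program nobody disputes is valid (the names `readTwice`, `g`'s body, and `counter` are made up for illustration): `g` is impure and touches global state, yet nothing it can legally do affects the immutable target.

```d
int counter;

// Impure: mutates global state, but has no legal way to change
// data that is typed immutable.
void g()
{
    ++counter;
}

// Reads *p before and after the call to g and returns the sum.
int readTwice(immutable int* p)
{
    const before = *p;
    g();
    const after = *p;
    assert(before == after); // the guarantee the compiler may rely on
    return before + after;
}

void main()
{
    immutable int x = 21;
    assert(readTwice(&x) == 42);
    assert(counter == 1);
}
```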

Or maybe we can only use that super power of unions if we take care that no other code can observe what we're doing? Can that be specified without undermining the assumption above?

> So when you're switching a tagged union to an immutable member, you're not dealing with "immutable memory", you're, effectively, dealing with an uninitialized field. And you can always set an uninitialized field to a new value, whether it's immutable or not, because that's *how the constructor hack works in the first place*. If abusing a constructor like this was broken, the constructor would *itself* be broken.

That might make sense, but it's at odds with the current spec and implementation. If an immutable union field is considered uninitialized until written to, then the language should forbid accessing it before that (in @safe code). We can't have `immutable` data change its observable value.
July 26, 2019
On Friday, 26 July 2019 at 12:12:19 UTC, ag0aep6g wrote:
> As far as I know, we usually say that this function:
>
>     void f(immutable int* p)
>     {
>         /* ... do something with *p ... */
>         g();
>         /* ... do more stuff with *p ... */
>     }
>
> can assume that `*p` is the same before and after calling `g`. But if unions have the power to defeat immutable, that assumption is invalid.

This is not correct, though it seems correct. This example hits the key of the problem though, so well spotted.

What if `g()` manually freed `p`, then allocated some new memory, and that new memory just so happened to exist at the same address? You would have observed a change in the value of `*p`, even though it was marked immutable.

Now, this is invalid behavior, but it's not invalid behavior *of f*; the entire program is just written in a way that you were able to keep one pointer alive past the lifespan of the data it referenced.

Nullable and Algebraic, two types that run into such issues (Nullable uses the union hack internally!), let you control the lifespan of their members via `nullify` or assigning a different type, respectively. As such, if you take a reference to Nullable.get or an Algebraic member, and then nullify or reassign it, you have broken your program. It is up to the user, not the compiler, to ensure that this does not happen.

July 26, 2019
On 26.07.19 14:36, FeepingCreature wrote:
> On Friday, 26 July 2019 at 12:12:19 UTC, ag0aep6g wrote:
>> As far as I know, we usually say that this function:
>>
>>     void f(immutable int* p)
>>     {
>>         /* ... do something with *p ... */
>>         g();
>>         /* ... do more stuff with *p ... */
>>     }
>>
>> can assume that `*p` is the same before and after calling `g`. But if unions have the power to defeat immutable, that assumption is invalid.
> 
> This is not correct, though it seems correct. This example hits the key of the problem though, so well spotted.
> 
> > What if `g()` manually freed `p`, then allocated some new memory, and that new memory just so happened to exist at the same address? You would have observed a change in the value of `*p`, even though it was marked immutable.
> 
> Now, this is invalid behavior, but it's not invalid behavior *of f*; the entire program is just written in a way that you were able to keep one pointer alive past the lifespan of the data it referenced.

It's invalid, yes. So we don't need to consider it. If the only way to break the assumption is to rely on undefined behavior, then there is no way to break the assumption.

`free`ing p and then dereferencing it has undefined behavior. It doesn't matter that the address happens to be reused by another allocation.

The interesting part is whether you're relying on undefined behavior with your union/__ctor stuff. If yes, then your code is just invalid. If no, then you can break the immutability assumption in a seemingly valid way. That would be interesting, but I'm not convinced that your code is valid.

The pain points:
1) The spec doesn't say clearly when union fields are considered initialized.
2) DMD allows @safe access of (uninitialized) immutable union fields.
3) __ctor can be called on an existing instance in @safe code. That's clearly a bug.

July 26, 2019
On Friday, 26 July 2019 at 14:19:11 UTC, ag0aep6g wrote:
> The interesting part is whether you're relying on undefined behavior with your union/__ctor stuff. If yes, then your code is just invalid. If no, then you can break the immutability assumption in a seemingly valid way. That would be interesting, but I'm not convinced that your code is valid.
>
> The pain points:
> 1) The spec doesn't say clearly when union fields are considered initialized.
> 2) DMD allows @safe access of (uninitialized) immutable union fields.
> 3) __ctor can be called on an existing instance in @safe code. That's clearly a bug.

I think you are just seriously overestimating the D spec.

Note that undefined behavior is a term of art arising from C/C++, referring to behavior explicitly called out as open to the compiler implementation. __ctor is not undefined behavior; I'd call it "unofficial behavior". The spec doesn't mention it.

It so happens that defining a constructor, which validly initializes an immutable field, also defines a magical added function __ctor, on which the spec says nothing, but which happens to have the same effect as the constructor. Such a function could not be written normally, but it appears anyways.

In any case, this is frontend business, and there is only one frontend, and there is unlikely ever to be another, especially an incompatible one. So as unofficial business goes, it's probably pretty reliable. It might be changed, but if so, it'll probably be marked deprecated; even if not, some other technique will appear in its place. (emplaceRef still has to be implemented *somehow*.)


July 26, 2019
On Friday, 26 July 2019 at 14:19:11 UTC, ag0aep6g wrote:
> The pain points:
> 1) The spec doesn't say clearly when union fields are considered initialized.
> 2) DMD allows @safe access of (uninitialized) immutable union fields.
> 3) __ctor can be called on an existing instance in @safe code. That's clearly a bug.

I forgot to mention: none of this is @safe, of course. Manual lifetime management is almost inherently unsafe. Which is why Nullable is peppered with @trusted...
