=void in struct definition

On Wednesday, April 11, 2018 10:45:40 Shachar Shemesh via Digitalmars-d wrote:
> On 09/04/18 14:22, Jonathan M Davis wrote:
> > On Monday, April 09, 2018 14:06:50 Shachar Shemesh via Digitalmars-d wrote:
> >> struct S {
> >>     int a;
> >>     int[5000] arr = void;
> >> }
> >>
> >> void func() {
> >>     S s;
> >> }
> >>
> >> During s's initialization, the entire "S" area is initialized, including the member arr, which we asked to be = void.
> >>
> >> Is this a bug?
> >
> > It looks like Andrei created an issue about it as an enhancement request several years ago:
> >
> > https://issues.dlang.org/show_bug.cgi?id=11331
> >
> > - Jonathan M Davis
>
> Except that issue talks about default constructed objects. My problem happens also with objects constructed with a constructor:
>
>
> extern(C) void func(ref S s);
>
> struct S {
>      uint a;
>      int[5000] arr = void;
>
>      this(uint val) {
>          a = val;
>      }
> }
>
> void main() {
>      auto s = S(12);
>
>      // To prevent the optimizer from optimizing s away
>      func(s);
> }
>
> $ ldc2 -c -O3 -g test.d
> $ objdump -S -r test.o | ddemangle > test.s
>
> 0000000000000000 <_Dmain>:
>      }
> }
>
> void main() {
>     0:    48 81 ec 28 4e 00 00    sub    $0x4e28,%rsp
>     7:    48 8d 7c 24 04          lea    0x4(%rsp),%rdi
>      auto s = S(12);
>     c:    31 f6                   xor    %esi,%esi
>     e:    ba 20 4e 00 00          mov    $0x4e20,%edx
>    13:    e8 00 00 00 00          callq  18 <_Dmain+0x18>
>           14: R_X86_64_PLT32  memset-0x4
>          a = val;
>    18:    c7 04 24 0c 00 00 00    movl   $0xc,(%rsp)
>    1f:    48 89 e7                mov    %rsp,%rdi
>
>      // To prevent the optimizer from optimizing s away
>      func(s);
>    22:    e8 00 00 00 00          callq  27 <_Dmain+0x27>
>           23: R_X86_64_PLT32  func-0x4
> }
>    27:    31 c0                   xor    %eax,%eax
>    29:    48 81 c4 28 4e 00 00    add    $0x4e28,%rsp
>    30:    c3                      retq
>
>
> Notice the call to memset.
>
> Shachar

All objects are initialized with their init values prior to the constructor being called. So, whether an object is simply default-initialized or whether the constructor is called, you're going to get the same behavior except for the fact that the constructor would normally do further initialization beyond the init value. As such, if there's a problem with the default-initialized value, you're almost certainly going to get the same problem when you call a constructor.
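A minimal sketch of that ordering, with made-up field values for illustration: inside the constructor body, the fields already hold whatever S.init specifies, and the constructor's own assignments then overwrite them. This is consistent with the memset that shows up in the ldc output above even though a constructor is used.

```d
import std.stdio;

struct S
{
    uint a = 7;      // field with an explicit default value
    int[4] arr;      // default-initialized to zeros via S.init

    this(uint val)
    {
        // When the constructor body starts, the object already holds S.init,
        // so both fields still have their default values here.
        assert(a == 7);
        assert(arr == [0, 0, 0, 0]);
        a = val;     // overwrites the value that was copied from S.init
    }
}

void main()
{
    auto s = S(12);
    writeln(s.a, " ", s.arr); // prints "12 [0, 0, 0, 0]"
}
```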

- Jonathan M Davis

April 11, 2018

Re: =void in struct definition

Posted by Jonathan M Davis in reply to Shachar Shemesh


On Wednesday, April 11, 2018 11:31:16 Shachar Shemesh via Digitalmars-d wrote:
> On 11/04/18 10:58, Jonathan M Davis wrote:
> > All objects are initialized with their init values prior to the
> > constructor being called. So, whether an object is simply
> > default-initialized or whether the constructor is called, you're going
> > to get the same behavior except for the fact that the constructor would
> > normally do further initialization beyond the init value. As such, if
> > there's a problem with the default-initialized value, you're almost
> > certainly going to get the same problem when you call a constructor.
> >
> > - Jonathan M Davis
>
> That's horrible!
>
> That means that constructor initialized objects, regardless of size, get initialized twice.

Well, only the stuff you initialize in the constructor gets initialized twice, but yeah, it could result in effectively initializing everything twice if you initialize everything in the constructor. It's one of those design choices that's geared towards correctness, since it ensures that you never deal with an object containing garbage, and the fact that you can do stuff like

import std.stdio;

struct S
{
    int _i;

    this(int i)
    {
        foo();   // reads _i before the constructor has assigned it
        _i = i;
    }

    void foo()
    {
        writeln(_i);
    }
}

means that if the object weren't initialized with its init value first, you would get undefined behavior, because _i would be garbage when foo() reads it (which isn't necessarily a big deal with an int but could really matter if it were something like a pointer). It also factors into how classes are guaranteed to be fully initialized to the correct type _before_ any constructors are run (avoiding the problems that you get in C++ when calling virtual functions in constructors or destructors). Unfortunately, because you're allowed to call arbitrary functions before initializing members, it's also possible to violate the type system with regard to const or immutable, e.g.

import std.stdio;

struct S
{
    immutable int _i;

    this(int i)
    {
        foo();   // reads the immutable _i before it has been initialized
        _i = i;
    }

    void foo()
    {
        writeln(_i);
    }
}

reads _i before it's initialized, so its value isn't the same every time it's accessed, as it's supposed to be for an immutable member. However, because the object is default-initialized first, you never end up reading garbage, and the behavior is completely deterministic even if it arguably violates the type system. I don't know what the correct solution to that particular problem is (probably at least disallowing calls to member functions before any immutable or const members are initialized), but the fact that the object is default-initialized first reduces the severity of the problem.
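On the point about classes being fully initialized to the correct type before any constructor runs, here is a small sketch (the class and member names are hypothetical). A virtual call made from a base-class constructor dispatches to the derived override, and the derived field it reads already holds its default value rather than garbage. In C++, the same call would go to the base-class version, because the object's dynamic type is still the base type during base construction.

```d
class Base
{
    this()
    {
        // Virtual call from the base constructor: in D this dispatches to
        // Derived.describe, even though Derived's constructor body hasn't run.
        describe();
    }

    string describe() { return "Base"; }
}

class Derived : Base
{
    int x = 5;       // already set from Derived's init state
    string seen;

    this()
    {
        super();     // Base's constructor runs first and calls describe()
        x = 10;      // only now does x get its constructor-assigned value
    }

    override string describe()
    {
        import std.conv : to;
        seen = "Derived saw x = " ~ x.to!string;
        return seen;
    }
}

void main()
{
    import std.stdio : writeln;
    auto d = new Derived;
    writeln(d.seen); // prints "Derived saw x = 5"
}
```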

And while you can end up with portions of an object effectively being initialized twice, for your average struct, I doubt that it matters much. It's when you start doing stuff like having large static arrays that it really becomes a problem. It also wouldn't surprise me if ldc optimized out some of the double-initializations at least some of the time, but I very much doubt that dmd's optimizer is ever that smart. Depending on the implementation of the constructor though, I would think that it would be possible for the compiler to determine that it doesn't actually need to default-initialize the struct first (or that it can just default-initialize pieces of it), because it can guarantee that a member variable isn't read before it's initialized by the constructor. So, at least in theory, the front end should be able to do some optimizations there. However, I have no idea if it ever does.

I think that, in theory, the idea is that we want initialization to be as correct as possible, so there should be no garbage or undefined behavior involved, and in the case of classes, the object should already be fully the type that it's supposed to be when its constructor is called, so that you don't get bad behavior from virtual functions. We then have = void so that specific variables can avoid that extra initialization cost when profiling or whatnot shows that it's important. So, if you have something like

struct S
{
    int _a;
    int[5000] _b;

    this(int a)
    {
        _a = a;
    }
}

then it's going to behave well as far as correctness goes, and then if the initialization is too expensive, you do

S s = void;
s._a = 42;

I think that the problem is that void initialization was intended specifically for local variables, and the idea of = void for member variables was not really thought through. So, you can easily do something like

S s = void;
s._a = 42;

right now and avoid the default-initialization, but you can't cleanly do

struct S
{
    int _a;
    int[5000] _b = void;

    this(int a)
    {
        _a = a;
    }
}

So, the process is completely manual, which obviously sucks if it's something that you _always_ want to do with the type.
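One way to avoid repeating the manual pattern at every use site is to wrap it in a static factory function. This is only a sketch under assumptions: the name make is hypothetical rather than an established convention, and it carries the same caveats as any = void use. _b holds garbage until you write to it, whether the returned value is built in place or copied depends on the compiler applying NRVO, and plain `S s;` or `new S` still copy S.init.

```d
import std.stdio;

struct S
{
    int _a;
    int[5000] _b;

    // Hypothetical factory wrapping the manual `= void` pattern.
    static S make(int a)
    {
        S s = void;  // skips copying S.init; every field starts as garbage
        s._a = a;    // only _a is set; _b stays uninitialized
        return s;    // may be constructed in place via NRVO, or copied
    }
}

void main()
{
    auto s = S.make(42);
    writeln(s._a); // prints "42"
}
```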

In general, D favors correctness over performance, with the idea that it gives you backdoors to get around the correctness guarantees in order to get more performance when it matters, but in this case, the backdoor arguably needs some improvement.

- Jonathan M Davis
