Thread overview
Incorrect struct and union size. Possible bug?
Nov 20, 2004
MicroWizard
Nov 22, 2004
Russ Lewis
Nov 22, 2004
Russ Lewis
Nov 22, 2004
Sean Kelly
Dec 04, 2004
Walter
Dec 05, 2004
Sean Kelly
November 20, 2004
I wrote a program to access the Win32 character-based console from D, but got strange screen addressing problems. When I dug deeper, I found this.

In the following program, when I mix structs and unions, the compiler does not include the last struct element(s) when it calculates the size.

alias char CHAR;
alias wchar WCHAR;
alias ushort WORD;

struct TST {
union Char{
WCHAR UnicodeChar;
CHAR   AsciiChar;
};
WORD Attributes;
};

struct TST2 {
WCHAR UnicodeChar;
WORD Attributes;
};

struct TST3 {
union Char{
WCHAR UnicodeChar;
CHAR   AsciiChar;
};
union Char2{
WCHAR UnicodeChar;
CHAR   AsciiChar;
};
};

void main(char[][] arg)
{
printf("TST.sizeof=%d,TST2.sizeof=%d,TST3.sizeof=%d\n",
TST.sizeof,   TST2.sizeof,   TST3.sizeof);
}

The result is:
TST.sizeof=2,TST2.sizeof=4,TST3.sizeof=1

And what I expect:
TST.sizeof=4,TST2.sizeof=4,TST3.sizeof=4

Or have I missed something completely?

Best regards,
Tamas Nagy


November 20, 2004
MicroWizard wrote:

> struct TST {
> union Char{
> WCHAR UnicodeChar;
> CHAR   AsciiChar;
> };
> WORD Attributes;
> };
> 
> struct TST2 {
> WCHAR UnicodeChar;
> WORD Attributes;
> };
> 
> struct TST3 {
> union Char{
> WCHAR UnicodeChar;
> CHAR   AsciiChar;
> };
> union Char2{
> WCHAR UnicodeChar;
> CHAR   AsciiChar;
> };
> };
> 
> void main(char[][] arg)
> {
> printf("TST.sizeof=%d,TST2.sizeof=%d,TST3.sizeof=%d\n",
> TST.sizeof,   TST2.sizeof,   TST3.sizeof);
> }
> 
> The result is:
> TST.sizeof=2,TST2.sizeof=4,TST3.sizeof=1
> 
> And what I expect:
> TST.sizeof=4,TST2.sizeof=4,TST3.sizeof=4
> 
> Or have I missed something completely?

In D, declaring a union does *not* mean allocating a variable...
(even an empty structure: { } has a .sizeof of 1, a little quirk)

This works:

> union CharU{
> WCHAR UnicodeChar;
> CHAR   AsciiChar;
> }
> union Char2U{
> WCHAR UnicodeChar;
> CHAR   AsciiChar;
> }
> 
> struct TST {
> CharU Char;
> WORD Attributes;
> }
> struct TST2 {
> WCHAR UnicodeChar;
> WORD Attributes;
> }
> struct TST3 {
> CharU Char;
> Char2U Char2;
> }
(note: there is no semicolon after union or struct definitions)

Outputs:
TST.sizeof=4,TST2.sizeof=4,TST3.sizeof=4

--anders
November 22, 2004
Anders F Björklund wrote:
> In D, declaring a union does *not* mean allocating a variable...
> (even an empty structure: { } has a .sizeof of 1, a little quirk)

Actually, you aren't quite correct here.  The syntax that MicroWizard used was correct, and is known as an "anonymous union."  The documentation mentions that they are supported, provided that they are part of a struct.  See http://digitalmars.com/d/struct.html

I tested this program on Linux (0.106), and it works as expected:

> struct foo {
>   align(1):
>   union {
>     char c;
>     short s;
>   }
>   int i;
> }
> import std.stdio;
> void main() {
>   foo f[2];
>   foo *ptr = cast(foo*)0;
>   writef("%x %x\n"
>          "%d %d %d\n"
>          "%d %d %d\n"
>          "%d %d %d\n",
>          cast(int)&f[0], cast(int)&f[1],
>          f.sizeof,f[0].sizeof,foo.sizeof,
>          foo.c.sizeof, foo.s.sizeof, foo.i.sizeof,
>          cast(int)&ptr.c, cast(int)&ptr.s, cast(int)&ptr.i);
> }


It outputs the following:

> ffffffffbfee0710 ffffffffbfee0716
> 12 6 6
> 1 2 4
> 0 0 2

November 22, 2004
Russ Lewis wrote:

>> In D, declaring a union does *not* mean allocating a variable...
> 
> Actually, you aren't quite correct here.  The syntax that MicroWizard used was correct, and is known as an "anonymous union."  The documentation mentions that they are supported, provided that they are part of a struct.  See http://digitalmars.com/d/struct.html

True, but that wasn't the syntax used in the previous example... :-P

They were all named:

> struct TST {
> union Char{
>   WCHAR UnicodeChar;
>   CHAR   AsciiChar;
> }
> WORD Attributes;
> }
> 
> struct TST2 {
> WCHAR UnicodeChar;
> WORD Attributes;
> }
> 
> struct TST3 {
> union Char{
>   WCHAR UnicodeChar;
>   CHAR   AsciiChar;
> }
> union Char2{
>   WCHAR UnicodeChar;
>   CHAR   AsciiChar;
> }
> }

TST.sizeof=2,TST2.sizeof=4,TST3.sizeof=1
(small sizes due to no Char fields present)

They could be anonymized:

> struct TST {
> union {
>   WCHAR UnicodeChar;
>   CHAR   AsciiChar;
> }
> WORD Attributes;
> }
> 
> struct TST2 {
> WCHAR UnicodeChar;
> WORD Attributes;
> }
> 
> struct TST3 {
> union {
>   WCHAR UnicodeChar;
>   CHAR   AsciiChar;
> }
> union {
>   WCHAR UnicodeChar2;
>   CHAR   AsciiChar2;
> }
> }

TST.sizeof=4,TST2.sizeof=4,TST3.sizeof=4
(and losing one layer of indirection too)
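
A minimal sketch of the three variants side by side, assuming a current D compiler rather than the 2004 one. The names `Named`, `Anonymous`, `WithField` and `CharU` are made up for illustration. A named union inside a struct only declares a nested type; storage is added by an anonymous union or by an explicit field of the named type:

```d
import std.stdio;

// Hypothetical names, for illustration only.
struct Named
{
    // Declares the nested type Named.Char; adds no field to Named.
    union Char
    {
        wchar UnicodeChar;
        char  AsciiChar;
    }
    ushort Attributes;
}

struct Anonymous
{
    // Anonymous union: both members become overlapping fields of the struct.
    union
    {
        wchar UnicodeChar;
        char  AsciiChar;
    }
    ushort Attributes;
}

struct WithField
{
    union CharU
    {
        wchar UnicodeChar;
        char  AsciiChar;
    }
    CharU Char;        // an actual field of the nested union type
    ushort Attributes;
}

void main()
{
    writefln("Named=%s Anonymous=%s WithField=%s",
             Named.sizeof, Anonymous.sizeof, WithField.sizeof);
}
```

This should print `Named=2 Anonymous=4 WithField=4`, matching the sizes reported above: the named variant contains only the `ushort`.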


Thanks for pointing that out, the D spec is kinda terse sometimes...

--anders

PS. This is still a little quirky:

> struct NULL
> {
> }
> import std.stdio;
> void main()
> {
>   writefln("%d", NULL.sizeof);
> }

(Hint: it does not print zero bytes)
November 22, 2004
Eek!  You're right, sorry...

Anders F Björklund wrote:
> Russ Lewis wrote:
> 
>>> In D, declaring a union does *not* mean allocating a variable...
>>
>>
>> Actually, you aren't quite correct here.  The syntax that MicroWizard used was correct, and is known as an "anonymous union."  The documentation mentions that they are supported, provided that they are part of a struct.  See http://digitalmars.com/d/struct.html
> 
> 
> True, but that wasn't the syntax used in the previous example... :-P

November 22, 2004
In article <cnt11q$2qra$1@digitaldaemon.com>, Anders F Björklund says...
>
>PS. This is still a little quirky:
>
>> struct NULL
>> {
>> }
>> import std.stdio;
>> void main()
>> {
>>   writefln("%d", NULL.sizeof);
>> }
>
>(Hint: it does not print zero bytes)

It should print 1 byte (or 4 bytes, I'm not sure how big empty classes are in D).  The reason is that for a class to be uniquely addressable it has to occupy space in memory.  C++ is the same way.  C++ does have something called the "empty base class optimization", however, that allows empty base classes to have zero size in derived classes, i.e.

class Base {}
class Derived : Base {}

printf( "base size: %u\nderived size: %u\n", Base.sizeof, Derived.sizeof );

This should print:

base size: 4
derived size: 4

(it looks like the size of an empty class is int.sizeof after all)
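
One possible reading of the 4, sketched under the assumption of current D semantics (not verified against the 2004 compiler): `.sizeof` on a class type reports the size of the class reference, which is pointer-sized, not the size of the instance. `__traits(classInstanceSize, T)` is a feature of later D versions and was not available when this thread was written.

```d
import std.stdio;

class Base {}
class Derived : Base {}

void main()
{
    // .sizeof on a class type is the size of the reference (a pointer).
    writefln("reference size: %s %s", Base.sizeof, Derived.sizeof);

    // The size of the object itself is available through a trait.
    writefln("instance size:  %s %s",
             __traits(classInstanceSize, Base),
             __traits(classInstanceSize, Derived));
}
```

On a 32-bit target the first line would show 4 for both classes, which happens to equal `int.sizeof`; on a 64-bit target it would show 8.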


Sean


November 22, 2004
Sean Kelly wrote:

> It should print 1 byte (or 4 bytes, I'm not sure how big empty classes are in
> D).  The reason is that for a class to be uniquely addressable it has to occupy
> space in memory.  C++ is the same way.  C++ does have something called the "base
> class optimization" however, that allows empty base classes to have zero size in
> derived classes.  ie.

(structs are 1 byte when compiled with gdc,
and both the base and derived class are 4)

Interesting! The following little C snippet:

> #include <stdio.h>
> 
> struct empty
> {
> };
> 
> int main(void)
> {
>   printf("%zu\n", sizeof(struct empty));
>   return 0;
> }

Prints "0" compiled with gcc, and "1" with g++.
(since structs are classes in C++, no doubt...)

Just a little weird, not that it's actually used. :-)

--anders

PS. But gcc -Wall -pedantic croaks:
    "warning: struct has no members"
December 04, 2004
"Anders F Björklund" <afb@algonet.se> wrote in message news:cnt11q$2qra$1@digitaldaemon.com...
> PS. This is still a little quirky:
>
> > struct NULL
> > {
> > }
> > import std.stdio;
> > void main()
> > {
> >   writefln("%d", NULL.sizeof);
> > }
>
> (Hint: it does not print zero bytes)

It should print 1. The reason is to be compatible with C.
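
A small sketch of what the one byte buys, assuming a current D compiler (the names `Empty` and `pair` are illustrative): each element of an array of empty structs still gets its own address.

```d
import std.stdio;

struct Empty {}

void main()
{
    Empty[2] pair;

    // Each element occupies one byte, so the array is two bytes long.
    writefln("Empty.sizeof=%s pair.sizeof=%s", Empty.sizeof, pair.sizeof);

    // The two elements therefore have different addresses.
    writefln("distinct addresses: %s", &pair[0] !is &pair[1]);
}
```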


December 05, 2004
Walter wrote:

>>>struct NULL
>>>{
>>>}
>>
>>(Hint: it does not print zero bytes)
> 
> It should print 1. The reason is to be compatible with C.

And I understood the rationale there was just so that it
would allocate *something*, to be able to address it later on?


null structures and classes are probably quite rare in reality,
I just found it interesting that I got different {} results:

gdc:
> struct: 4
> class:  1
g++:
> struct: 1
> class:  4

--anders
December 05, 2004
In article <coukni$1kke$1@digitaldaemon.com>, Anders F Björklund says...
>
>Walter wrote:
>
>>>>struct NULL
>>>>{
>>>>}
>>>
>>>(Hint: it does not print zero bytes)
>> 
>> It should print 1. The reason is to be compatible with C.
>
>And I understood the rationale there was just so that it would allocate *something*, to be able to address it later on?

Yup.  C++ requires that all objects have a unique address, and thus they all occupy at least one byte.  The empty base class optimization tends to help keep derived class sizes down, however.

>null structures and classes are probably quite rare in reality, I just found it interesting that I got different {} results:
>
>gdc:
>> struct: 4
>> class:  1
>g++:
>> struct: 1
>> class:  4

That's odd.  g++ produces different sized objects just from switching 'struct' with 'class'?  Good to know, I suppose.


Sean