Jump to page: 1 2
Thread overview
Holes in structs and opEquals
Mar 07, 2010
bearophile
Mar 07, 2010
Fawzi Mohamed
Mar 07, 2010
bearophile
Mar 07, 2010
bearophile
Mar 07, 2010
Walter Bright
Mar 08, 2010
yigal chripun
Mar 08, 2010
Walter Bright
Mar 08, 2010
bearophile
Mar 08, 2010
Don
Mar 07, 2010
Don
Mar 07, 2010
Walter Bright
Mar 08, 2010
bearophile
March 07, 2010
This comes from a small thread that is going on in digitalmars.D.learn.

This program asserts:


import std.c.string;
struct S { // 16 bytes, with a hole
    ushort s;
    double d;
}
void main() {
    S s1, s2;
    memset(&s1, ubyte.min, S.sizeof);
    memset(&s2, ubyte.max, S.sizeof);
    s1.s = s2.s = 0;
    s1.d = s2.d = 0;
    assert(s1 == s2);
}


But a correctly implemented opEquals (and opCmp) among structs has to ignore the contents of the holes, because they can be filled with anything, for example if the structs where not initialized (with =void).

A correct opEquals has to work recursively (so if the struct contains a string, it has to control the string equality too).

And a built-in recursive opCmp/opHash for structs is about as useful as having a correct opEquals.

A correct opEquals can be a little slower, but correctness come first. In the uncommon situations where I really need max speed and I don't care of correctness I can use a memcmp(&s1, &s2, S.sizeof). (And the compiler is free to use just memcmp when a struct has no holes and doesn't contain references, associative arrays and dynamic arrays).

Correctness of basic struct operators is not an optional feature, like properties or even enums. If the opEquals among structs is wrong then it's better to not have it at all.

Bye,
bearophile
March 07, 2010
On 2010-03-07 15:09:53 +0100, bearophile <bearophileHUGS@lycos.com> said:

> This comes from a small thread that is going on in digitalmars.D.learn.
> 
> This program asserts:
> 
> 
> import std.c.string;
> struct S { // 16 bytes, with a hole
>     ushort s;
>     double d;
> }
> void main() {
>     S s1, s2;
>     memset(&s1, ubyte.min, S.sizeof);
>     memset(&s2, ubyte.max, S.sizeof);
>     s1.s = s2.s = 0;
>     s1.d = s2.d = 0;
>     assert(s1 == s2);
> }
> 
> 
> But a correctly implemented opEquals (and opCmp) among structs has to ignore the contents of the holes, because they can be filled with anything, for example if the structs where not initialized (with =void).
> 
> A correct opEquals has to work recursively (so if the struct contains a string, it has to control the string equality too).
> 
> And a built-in recursive opCmp/opHash for structs is about as useful as having a correct opEquals.
> 
> A correct opEquals can be a little slower, but correctness come first. In the uncommon situations where I really need max speed and I don't care of correctness I can use a memcmp(&s1, &s2, S.sizeof). (And the compiler is free to use just memcmp when a struct has no holes and doesn't contain references, associative arrays and dynamic arrays).
> 
> Correctness of basic struct operators is not an optional feature, like properties or even enums. If the opEquals among structs is wrong then it's better to not have it at all.

one could argue that the unsafe operation is memset.
The compiler always initializes a struct, so that what you describe should never happen in a safe program.
Still as you say the following example that might indeed considered a bug:

S s1=void,s2=void;
s1.s=0; s1.d=0;
s2.s=0; s2.d=0;
assert(s1 == s2);

here the assert might fail depending on the previous content of the memory.
I am not sure of what is the best solution, I am not sure that defining a special comparison operation by default for each struct is the correct solution (it can be quite some bloat), please note that a user defined comparison function will not have these problems.
Still I agree that traking down a bug due to this might be very ugly...

March 07, 2010
Fawzi Mohamed:

>I am not sure of what is the best solution,<

This is true for me too, that's why I have created this thread, to find a way to solve this problem.


>I am not sure that defining a special comparison operation by default for each struct is the correct solution (it can be quite some bloat), please note that a user defined comparison function will not have these problems.<

Note that if you write a opEquals for each struct then you have the same code bloat. The difference is that you have had to write lot of boring code, that can contain bugs, and your program is longer.

In a program you don't use == and != (or hashing) on all the structs, so in theory the compiler can add such opEquals only to the structs that are tested for equality in the program. I guess the usual modular compilation requirement of D code asks too all structs to have such opEquals. In this case the Link-Time Optimization of LLVM comes to the rescue and removed unused functions from the whole program/whole compilation block.

Another possible solution is to have a single opEquals in the runtime, that works like the array sort, using run time knowledge about the types. But this is of probably slower.

Bye,
bearophile
March 07, 2010
There are other solutions.
For example remove the current opEquals from structs, so doing == among two structs become a syntax error. And then add a property like @equable that when present beside the struct name adds a specific and correct and recursive opEquals to it.

@equable stuct Foo {
  int x;
}

Probably instead of @equable it's better to have an attribute that adds opHash, opEquals and opCmp to a struct.
This is safe, avoids most code bloat, and keeps code fast. The disadvantage is a that it adds a new attribute to the language.

If the user can create new attributes, then it's surely possible to define this attribute with D code and some compile-time reflection, keeping the language simple enough. There is one or more Java libs that use attributes for this specific purpose.

Bye,
bearophile
March 07, 2010
bearophile wrote:
> This comes from a small thread that is going on in digitalmars.D.learn.
> 
> This program asserts:
> 
> 
> import std.c.string;
> struct S { // 16 bytes, with a hole
>     ushort s;
>     double d;
> }
> void main() {
>     S s1, s2;
>     memset(&s1, ubyte.min, S.sizeof);
>     memset(&s2, ubyte.max, S.sizeof);
>     s1.s = s2.s = 0;
>     s1.d = s2.d = 0;
>     assert(s1 == s2);
> }
> 
> 
> But a correctly implemented opEquals (and opCmp) among structs has to ignore the contents of the holes, because they can be filled with anything, for example if the structs where not initialized (with =void).
> Correctness of basic struct operators is not an optional feature, like properties or even enums. If the opEquals among structs is wrong then it's better to not have it at all.

Yes. Most of the problems with struct opEquals were fixed in bug 3433. You've found an annoying case which wasn't covered.
March 07, 2010
bearophile wrote:
> But a correctly implemented opEquals (and opCmp) among structs has to
> ignore the contents of the holes, because they can be filled with
> anything,


The "holes" are defined to be filled with 0, and are when initialized by the compiler. This is specifically so that memcmp can be done to compare the struct contents.


> for example if the structs where not initialized (with =void).

If you use =void, or allocate with malloc(), it is your responsibility to deal with the holes.
March 07, 2010
Fawzi Mohamed wrote:
> one could argue that the unsafe operation is memset.

That's right.


> The compiler always initializes a struct, so that what you describe should never happen in a safe program.

Right.


> Still as you say the following example that might indeed considered a bug:
> 
> S s1=void,s2=void;
> s1.s=0; s1.d=0;
> s2.s=0; s2.d=0;
> assert(s1 == s2);
> 
> here the assert might fail depending on the previous content of the memory.
> I am not sure of what is the best solution, I am not sure that defining a special comparison operation by default for each struct is the correct solution (it can be quite some bloat), please note that a user defined comparison function will not have these problems.

No, it's not a bug in the language, it's a bug in the user code. Using =void is an advanced feature, and it requires the user to know what he's doing with it. That's why it isn't allowed in safe mode.


> Still I agree that traking down a bug due to this might be very ugly...

The idea with =void initializations is that they are findable using grep, and so can be singled out for special attention when there is a problem.
March 08, 2010
Walter Bright:
> The "holes" are defined to be filled with 0, and are when initialized by the compiler. This is specifically so that memcmp can be done to compare the struct contents.

Thank you for your answer. I didn't read this important detail in the online documentation, I must have missed it, sorry. It's a detail that D programmers must keep in mind well.

A future lint program for D will need to try to warn the programmer that using the default opEquals among uninitialized structs is wrong.

Bye,
bearophile
March 08, 2010
Walter Bright Wrote:

> Fawzi Mohamed wrote:
> > one could argue that the unsafe operation is memset.
> 
> That's right.
> 
> 
> > The compiler always initializes a struct, so that what you describe should never happen in a safe program.
> 
> Right.
> 
> 
> > Still as you say the following example that might indeed considered a bug:
> > 
> > S s1=void,s2=void;
> > s1.s=0; s1.d=0;
> > s2.s=0; s2.d=0;
> > assert(s1 == s2);
> > 
> > here the assert might fail depending on the previous content of the memory.
> > I am not sure of what is the best solution, I am not sure that defining
> > a special comparison operation by default for each struct is the correct
> > solution (it can be quite some bloat), please note that a user defined
> > comparison function will not have these problems.
> 
> No, it's not a bug in the language, it's a bug in the user code. Using =void is an advanced feature, and it requires the user to know what he's doing with it. That's why it isn't allowed in safe mode.
> 
> 
> > Still I agree that traking down a bug due to this might be very ugly...
> 
> The idea with =void initializations is that they are findable using grep, and so can be singled out for special attention when there is a problem.

The compiler knows at compile-time what variables are initialized with "=void". The compiler than can add a compilation error when such a struct variable is used in an equals expression. this doesn't cover use of malloc() which must be the user's responsebility.

e.g.
 S s1=void,s2=void;
 s1.s=0; s1.d=0;
 s2.s=0; s2.d=0;
 assert(s1 == s2); // <- this line should not compile

March 08, 2010
yigal chripun wrote:
> The compiler knows at compile-time what variables are initialized with "=void". The compiler than can add a compilation error when such a struct variable is used in an equals expression. this doesn't cover use of malloc() which must be the user's responsebility. 
> 
> e.g.
>  S s1=void,s2=void;
>  s1.s=0; s1.d=0;
>  s2.s=0; s2.d=0;
>  assert(s1 == s2); // <- this line should not compile
> 

It can, but I don't agree that it should. For an =void initialization, it's the user's responsibility to initialize it properly. The use of =void implies the user knows what he's doing with it, and will take care to initialize the 'holes' as necessary.

Trying to disable == for such structs is a losing battle, anyway, as the compiler could only detect the most obvious cases. Pass a reference to it to a function, store it in a data structure, etc., and all that goes away.
« First   ‹ Prev
1 2