February 02, 2011
spir:

> Do you know more about why/how the above fails?

It's simple. A string (or array) is a 2-words long struct that contains a pointer to the data and a size_t length. Default struct equality just compares the bits of those two fields. In the above example I have created f1 and f2 using two strings that have the same contents and lengths, but the pointers are different, because they are generated at run-time (normally the compiler uses a pool of shared string literals), so the equality fails.
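
To make the difference concrete, here is a minimal sketch (not taken from the code discussed above; the helper names sameContents and sameHead are made up for illustration). It only uses '==' and 'is' on the arrays themselves, so it does not depend on how a given compiler version implements default struct equality:

```d
import std.stdio;

// Value equality: same length and same characters.
bool sameContents(string a, string b) { return a == b; }

// Identity: same pointer and same length, i.e. the same two words.
bool sameHead(string a, string b) { return a is b; }

void main()
{
    string a = "hello";
    string b = a.idup;      // run-time copy: equal contents, new memory
    string c = a[0 .. $];   // slice of everything: same pointer and length

    assert(sameContents(a, b) && !sameHead(a, b));
    assert(sameContents(a, c) && sameHead(a, c));
    assert(a.ptr !is b.ptr && a.length == b.length);

    writeln("a == b: ", a == b, "   a is b: ", a is b);
}
```

A bitwise comparison of two structs holding a and b sees what sameHead sees, not what sameContents sees.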

I have asked Walter to fix this problem with strings and arrays probably three years ago or more, it's not a new problem :-)

Bye,
bearophile
February 02, 2011
On 02/02/2011 07:41 PM, spir wrote:
> On 02/02/2011 07:05 PM, bearophile wrote:
>> spir:
>>
>>> * The issue reported is about '==' on structs not using member opEquals when
>>> defined, instead performing bitwise comparison. This is not my case: Lexeme
>>> members are plain strings and an uint. They should just be compared as is.
>>> Bitwise comparison should just work fine.
>>> Also, this issue is marked solved for dmd 2.037 (I use 2.051).
>>
>> Lars is right, the == among structs is broken still:
>>
>> struct Foo { string s; }
>> void main() {
>> string s1 = "he";
>> string s2 = "llo";
>> string s3 = "hel";
>> string s4 = "lo";
>> auto f1 = Foo(s1 ~ s2);
>> auto f2 = Foo(s3 ~ s4);
>> assert((s1 ~ s2) == (s3 ~ s4));
>> assert(f1 == f2);
>> }
>
> Thank you, this helps much. I don't get the details yet, but think some similar
> issue is playing a role in my case. String members of the compared Lexeme
> structs are not concatenated, but one of them is sliced from the scanned source.
> If I dup'ed instead of slicing, this would create brand new strings; thus '=='
> performing bitwise comp should run fine, don't you think? I'll try in a short
> while.

No! idup does not help; opEquals is still needed. See also this example case:

struct S {string s;}
unittest {
    // concat
    string s1 = "he"; string s2 = "llo";
    string s3 = "hel"; string s4 = "lo";
    assert ( S(s1 ~ s2) != S(s3 ~ s4) );
    // slice
    string s = "hello";
    assert ( S(s[1..$-1]) != S("ell") );
    // idup'ed
    assert ( S(s[1..$-1].idup) != S("ell") );
    s2 = s[1..$-1].idup;
    assert ( S(s2) != S("ell") );
}

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

February 02, 2011
On 02/02/2011 08:20 PM, bearophile wrote:
> spir:
>
>> Do you know more about why/how the above fails?
>
> It's simple. A string (or array) is a 2-words long struct that contains a pointer to the data and a size_t length. Default struct equality just compares the bits of those two fields. In the above example I have created f1 and f2 using two strings that have the same contents and lengths, but the pointers are different, because they are generated at run-time (normally the compiler uses a pool of shared string literals), so the equality fails.
>
> I have asked Walter to fix this problem with strings and arrays probably three years ago or more, it's not a new problem :-)

All right, you mean string literals are interned? Explaining why the case below works...

struct S {string s;}
unittest {
    // plainly equal members
    string s01 = "hello"; string s02 = "hello";
    assert ( S(s01) == S(s02) );
}

... because s01 & s02 are actually the same, unique, piece of data in memory (thus pointers are equal indeed)?

I'm OK with writing another bug report, as you asked. But since you have already asked for this, and bug #3433 on a very similar topic is supposedly closed as well, I fear it's useless, don't you?
And if we fix string, then the case of regular arrays becomes inconsistent.
The core issue for clear semantics, I guess, is that the case above works *due to* an implementation detail. The rest is just annoying (one needs to write opEquals to get the expected semantics in probably 99% of cases), but /not/ inconsistent.

Denis

February 02, 2011
spir:

> And if we fix string, then the case of regular arrays becomes inconsistent.

The bug report is about arrays too, of course. I will write this bug report.

Bye,
bearophile
February 02, 2011
> The bug report is about arrays too, of course. I will write this bug report.

http://d.puremagic.com/issues/show_bug.cgi?id=5519

Bye,
bearophile
February 03, 2011
On Wed, 02 Feb 2011 17:35:50 +0100, spir wrote:

> On 02/02/2011 04:20 PM, Lars T. Kyllingstad wrote:
>> On Wed, 02 Feb 2011 15:55:53 +0100, spir wrote:
>>
>>> Hello,
>>>
>>> What are the default semantics for '==' on structs?
>>>
>>> I ask this because I was forced to write opEquals on a struct to get expected behaviour. This struct is basically:
>>>
>>> struct Lexeme {
>>>       string tag;
>>>       string slice;
>>>       Ordinal index;
>>> }
>>>
>>> Equal Lexeme's compare unequal using default '=='. When I add:
>>>
>>>       const bool opEquals (ref const(Lexeme) l) {
>>>           return (
>>>                  this.tag   == l.tag
>>>               && this.slice == l.slice
>>>               && this.index == l.index
>>>           );
>>>       }
>>>
>>> then all works fine. What do I miss?
>>
>> I think the compiler does a bitwise comparison in this case, meaning that it compares the arrays' pointers instead of their data.  Related bug report:
>>
>>    http://d.puremagic.com/issues/show_bug.cgi?id=3433
>>
>> -Lars
> 
> Thank you, Lars.
> In fact, I do not really understand what you mean. But it helped me
> think further :-)
> Two points:
> 
> * The issue reported is about '==' on structs not using member opEquals when defined, instead performing bitwise comparison. This is not my case: Lexeme members are plain strings and an uint. They should just be compared as is. Bitwise comparison should just work fine. Also, this issue is marked solved for dmd 2.037 (I use 2.051).

Yeah, but I would say it isn't really fixed.  It seems that the final decision was that members which define opEquals() are compared using opEquals(), while all other members are compared bitwisely.  But built-in dynamic arrays can also be compared in two ways, using '==' (equality) or 'is' (identity, i.e. bitwise equality).  Struct members which are dynamic arrays should, IMO, be compared using '==', but apparently they are not.
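
Until that changes, the workaround is the one you already found. A minimal sketch of it, assuming Ordinal is just an alias for uint (my guess, based on your second listing) and passing the other operand by value so that it also accepts struct literals:

```d
struct Lexeme
{
    string tag;
    string slice;
    uint index;   // assumption: the thread's Ordinal is a uint

    // Compare array members with '==' (contents), not bit by bit.
    bool opEquals(const Lexeme other) const
    {
        return tag == other.tag
            && slice == other.slice
            && index == other.index;
    }
}

void main()
{
    string source = "let x";
    auto a = Lexeme("kw", source[0 .. 3], 0);    // sliced from the source
    auto b = Lexeme("kw".idup, "let".idup, 0);   // run-time copies

    assert(a.slice !is b.slice);   // different memory...
    assert(a == b);                // ...but equal through opEquals
    assert(a != Lexeme("kw", "le", 0));
}
```

With this in place the result no longer depends on where the strings happen to live in memory.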


> * The following works as expected:
> 
> struct Floats {float f1, f2;}
> struct Strings {string s1, s2;}
> struct Lexeme {
>      string tag;
>      string slice;
>      uint index;
> }
> 
> unittest {
>      assert ( Floats(1.1,2.2)  == Floats(1.1,2.2) );
>      assert ( Strings("a","b") == Strings("a","b") );
>      assert ( Lexeme("a","b",1) == Lexeme("a","b",1) );
> }
> 
> This shows, if I'm right:
> 1. Array (string) members are compared by value, not by ref/pointer.
> 2. Comparing Lexeme's works in this test case.

Nope, it doesn't show that, because you are assigning literals to your strings, and DMD is smart enough to detect duplicate literals.

    string s1 = "foo";
    string s2 = "foo";
    assert (s1.ptr == s2.ptr);

That is actually pretty cool, by the way. :)

Here's an example to demonstrate my point:

    import std.stdio;

    struct T { string s; }

    void main(string[] args)
    {
        auto s1 = args[1];
        auto s2 = args[2];
        auto t1 = T(s1);
        auto t2 = T(s2);

        if (s1 == s2) writeln("Arrays are equal");
        else writeln("Arrays are different");

        if (t1 == t2) writeln("Structs are equal");
        else writeln("Structs are different");
    }

If run with the arguments "foo bar" it prints:

    Arrays are different
    Structs are different

If run with the arguments "foo foo" it prints:

    Arrays are equal
    Structs are different

-Lars
February 03, 2011
On 02/03/2011 09:09 AM, Lars T. Kyllingstad wrote:
> On Wed, 02 Feb 2011 17:35:50 +0100, spir wrote:
>
>> On 02/02/2011 04:20 PM, Lars T. Kyllingstad wrote:
>>> On Wed, 02 Feb 2011 15:55:53 +0100, spir wrote:
>>>
>>>> Hello,
>>>>
>>>> What are the default semantics for '==' on structs?
>>>>
>>>> I ask this because I was forced to write opEquals on a struct to get
>>>> expected behaviour. This struct is basically:
>>>>
>>>> struct Lexeme {
>>>>        string tag;
>>>>        string slice;
>>>>        Ordinal index;
>>>> }
>>>>
>>>> Equal Lexeme's compare unequal using default '=='. When I add:
>>>>
>>>>        const bool opEquals (ref const(Lexeme) l) {
>>>>            return (
>>>>                   this.tag   == l.tag
>>>>                &&  this.slice == l.slice
>>>>                &&  this.index == l.index
>>>>            );
>>>>        }
>>>>
>>>> then all works fine. What do I miss?
>>>
>>> I think the compiler does a bitwise comparison in this case, meaning
>>> that it compares the arrays' pointers instead of their data.  Related
>>> bug report:
>>>
>>>     http://d.puremagic.com/issues/show_bug.cgi?id=3433
>>>
>>> -Lars
>>
>> Thank you, Lars.
>> In fact, I do not really understand what you mean. But it helped me
>> think further :-)
>> Two points:
>>
>> * The issue reported is about '==' on structs not using member opEquals
>> when defined, instead performing bitwise comparison. This is not my
>> case: Lexeme members are plain strings and an uint. They should just be
>> compared as is. Bitwise comparison should just work fine. Also, this
>> issue is marked solved for dmd 2.037 (I use 2.051).
>
> Yeah, but I would say it isn't really fixed.  It seems that the final
> decision was that members which define opEquals() are compared using
> opEquals(), while all other members are compared bitwisely.  But built-in
> dynamic arrays can also be compared in two ways, using '==' (equality) or
> 'is' (identity, i.e. bitwise equality).  Struct members which are dynamic
> arrays should, IMO, be compared using '==', but apparently they are not.
>
>
>> * The following works as expected:
>>
>> struct Floats {float f1, f2;}
>> struct Strings {string s1, s2;}
>> struct Lexeme {
>>       string tag;
>>       string slice;
>>       uint index;
>> }
>>
>> unittest {
>>       assert ( Floats(1.1,2.2)  == Floats(1.1,2.2) );
>>       assert ( Strings("a","b") == Strings("a","b") );
>>       assert ( Lexeme("a","b",1) == Lexeme("a","b",1) );
>> }
>>
>> This shows, if I'm right:
>> 1. Array (string) members are compared by value, not by ref/pointer.
>> 2. Comparing Lexeme's works in this test case.
>
> Nope, it doesn't show that, because you are assigning literals to your
> strings, and DMD is smart enough to detect duplicate literals.
>
>      string s1 = "foo";
>      string s2 = "foo";
>      assert (s1.ptr == s2.ptr);
>
> That is actually pretty cool, by the way. :)
>
> Here's an example to demonstrate my point:
>
>      import std.stdio;
>
>      struct T { string s; }
>
>      void main(string[] args)
>      {
>          auto s1 = args[1];
>          auto s2 = args[2];
>          auto t1 = T(s1);
>          auto t2 = T(s2);
>
>          if (s1 == s2) writeln("Arrays are equal");
>          else writeln("Arrays are different");
>
>          if (t1 == t2) writeln("Structs are equal");
>          else writeln("Structs are different");
>      }
>
> If run with the arguments "foo bar" it prints:
>
>      Arrays are different
>      Structs are different
>
> If run with the arguments "foo foo" it prints:
>
>      Arrays are equal
>      Structs are different
>
> -Lars

Thank you again, Lars: I was wrong and you are right. The key point is that string literals are interned, which interacted with my issue.

Denis

February 03, 2011
On Wed, 02 Feb 2011 11:35:50 -0500, spir <denis.spir@gmail.com> wrote:

> On 02/02/2011 04:20 PM, Lars T. Kyllingstad wrote:

>> I think the compiler does a bitwise comparison in this case, meaning that
>> it compares the arrays' pointers instead of their data.  Related bug
>> report:
>
> Thank you, Lars.
> In fact, I do not really understand what you mean. But it helped me think further :-)

I couldn't get from all your posts that you understand the issue.  A bitwise comparison compares ONLY the bits in the struct, NOT what the struct points to.

Comparing two arrays compares the data they point to.  So what is happening is essentially, the struct default comparison is comparing that both strings are equal in the identity sense, i.e. they both point to the exact same data with the exact same length.

If you analyze a string array, it looks like this (switch to mono-spaced font now :) :


+----------------------+
|size_t length         |
|immutable(char) *ptr -|------> "hello world"
+----------------------+

The pointer points to the data, it is not contained within the array "head".  The bitwise comparison only compares the head (what's in the box).
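
If you want to see the box for yourself, here is a rough sketch. The helper headWord is made up for this post, and it relies on the layout the D ABI describes for dynamic arrays (length first, then pointer), so treat it as a peek at an implementation detail rather than something to build on:

```d
// Reads word i (0 or 1) of the array head. Assumes the D ABI layout
// for dynamic arrays: length at offset 0, pointer right after it.
size_t headWord(ref const string s, size_t i)
{
    return (cast(const(size_t)*) &s)[i];
}

void main()
{
    string s = "hello world";

    // The head is two machine words, whatever the string's length.
    static assert(string.sizeof == 2 * size_t.sizeof);

    assert(headWord(s, 0) == s.length);            // first word: length
    assert(headWord(s, 1) == cast(size_t) s.ptr);  // second word: pointer
}
```

Those two words are all a bitwise comparison ever looks at.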

Apologies if you already understood this, but I wanted to be sure that you "got it."

-Steve
February 03, 2011
On 02/03/2011 02:27 PM, Steven Schveighoffer wrote:
> On Wed, 02 Feb 2011 11:35:50 -0500, spir <denis.spir@gmail.com> wrote:
>
>> On 02/02/2011 04:20 PM, Lars T. Kyllingstad wrote:
>
>>> I think the compiler does a bitwise comparison in this case, meaning that
>>> it compares the arrays' pointers instead of their data. Related bug
>>> report:
>>
>> Thank you, Lars.
>> In fact, I do not really understand what you mean. But it helped me think
>> further :-)
>
> I couldn't get from all your posts that you understand the issue. A bitwise
> comparison compares ONLY the bits in the struct, NOT what the struct points to.
>
> Comparing two arrays compares the data they point to. So what is happening is
> essentially, the struct default comparison is comparing that both strings are
> equal in the identity sense, i.e. they both point to the exact same data with
> the exact same length.
>
> If you analyze a string array, it looks like this (switch to mono-spaced font
> now :) :
>
>
> +----------------------+
> |size_t length         |
> |immutable(char) *ptr -|------> "hello world"
> +----------------------+
>
> The pointer points to the data, it is not contained within the array "head".
> The bitwise comparison only compares the head (what's in the box).
>
> Apologies if you already understood this, but I wanted to be sure that you "got
> it."

Thank you very much, Steven, for taking the time to explain this, and so clearly. Actually, I had understood this, but was misled by another fact interacting with the issue discussed here: D interns string literals, so that two string struct members that happen to be literals /look like/ they are being compared by value:

struct S {string s;}
unittest {
    // literals
    string s01 = "hello"; string s02 = "hello";
    assert ( S(s01) == S(s02) );
    assert (s01 is s02);	// additional info
    // concat
    string s1 = "he"; string s2 = "llo";
    string s3 = "hel"; string s4 = "lo";
    assert ( S(s1 ~ s2) != S(s3 ~ s4) );
    // slice
    string s = "hello";
    assert ( S(s[1..$-1]) != S("ell") );
    // idup'ed
    assert ( S(s[1..$-1].idup) != S("ell") );
    auto s5 = s[1..$-1].idup;
    assert ( S(s5) != S("ell") );
}

The case of literals passes as expected. Actually, s01 & s02 are the same piece of data in memory:
    assert (s01 is s02);
but if one doesn't know that dmd interns string literals, everything looks as if they were compared by value. (Hope I'm clear.)

Side-questions: is it written somewhere dmd interns string literals? If yes, where? Is this supposed to be part of D's spec or an implementation aspect of dmd?

Denis

February 03, 2011
On Thu, 03 Feb 2011 12:52:28 -0500, spir <denis.spir@gmail.com> wrote:

>
> Side-questions: is it written somewhere dmd interns string literals? If yes, where? Is this supposed to be part of D's spec or an implementation aspect of dmd?

String literals are immutable, which means the compiler is free to re-use them wherever it wants without repercussions (you can't change immutable data).

It's not documented, but it fits within the requirements.

One thing that *is* documented is that string literals always have an implicit 0 character appended to the end of them, to allow easy interaction with C.
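
A small sketch of what that means in practice (the helper cLength is made up for illustration). Note the terminator is only promised for literals; for strings built at run time, std.string.toStringz gives you a terminated copy or pointer:

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Length as C sees it, after making sure there is a terminator.
size_t cLength(string s) { return strlen(s.toStringz); }

void main()
{
    string lit = "hello";

    // The terminator sits just past the end and is not counted in .length.
    assert(lit.length == 5);
    assert(lit.ptr[5] == '\0');
    assert(strlen(lit.ptr) == 5);

    // A string built at run time carries no such guarantee, hence toStringz.
    string built = lit[0 .. 2] ~ "y";
    assert(cLength(built) == 3);
}
```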

-Steve