November 28, 2012
On Wednesday, 28 November 2012 at 13:09:36 UTC, Dan wrote:
> On Monday, 26 November 2012 at 15:44:42 UTC, Joseph Rushton Wakeling wrote:
>> Hello all,
>>
>> I'm writing some code which is meant to represent a network of linked nodes.
> [snip]
>
> Ok, another follow up. I can reproduce your segfault using your posted code, it is included below. But the interesting thing is, I was doing the "testing" out of your Network!Node2 in a unittest section. By whittling down, the smallest crash case I could come up with is this:
>
> import std.typecons;
> import std.stdio;
> alias RefCounted!(int) Foo;
> unittest {
>   Foo[int] map;
>   map[1] = Foo();
> }
>
> When I change that unittest block to a 'void main()' it works just fine. I tried the same change of unittest to 'void main()' on your code and found the same results - no crash. I think the crash issue might not be with map (even though Maxim found some troubling stuff with uninitialized structs being destructed when inserting a key that is not present). Or maybe it is just a map problem and by switching to main I am just getting lucky in not getting a crash.
>

Actually bug is still there - changing unittest to main() does not fix program, even if it seems to run correctly. The problem with memory corruption is that it may happen with no observable segfaults just because erroneous operation was performed on memory which was permitted to be read/written.

Valgrind is tool which may show absence of memory corruption errors. Giving your example:

import std.typecons;
import std.stdio;
alias RefCounted!(int) Foo;

void main() { // main instead of unittest
  Foo[int] map;
  map[1] = Foo();
}

valgrind outputs:
==2562== Conditional jump or move depends on uninitialised value(s)
==2562==    at 0x41A424: _D3std8typecons18__T10RefCountedTiZ10RefCounted6__dtorMFZv
==2562==    by 0x41A4D5: _D3std8typecons18__T10RefCountedTiZ10RefCounted8opAssignMFS3std8typecons18__T10RefCountedTiZ10RefCountedZv
==2562==    by 0x41A238: _Dmain
==2562==    by 0x41C42B: _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZi7runMainMFZv
==2562==    by 0x41BCCD: _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZi7tryExecMFMDFZvZv
==2562==    by 0x41C472: _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZi6runAllMFZv
==2562==    by 0x41BCCD: _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZi7tryExecMFMDFZvZv
==2562==    by 0x41BC89: _d_run_main
==2562==    by 0x41BACA: main

which is exactly the same problem with unittest version: uninitialized object and bogus this pointer - now "this" just points to some allocated memory. Probably this happens because between running unittest and main function druntime has allocated more memory - and more memory is valid to be read.

Actually, I have also found this and considered to be not significant (bug still exists) and thus have not reported - next time I will be more verbose.
November 28, 2012
On 11/28/2012 02:09 PM, Dan wrote:
> It would be interesting to know if Joseph was doing his "testing" out in
> unittest or in a main.

In main -- the code you saw attached to my earlier email is the code I was running.

Thanks very much for your efforts at chasing down the bug!
November 28, 2012
On Wednesday, 28 November 2012 at 13:43:04 UTC, Maxim Fomin wrote:
> On Wednesday, 28 November 2012 at 13:09:36 UTC, Dan wrote:

> Actually bug is still there - changing unittest to main() does not fix program, even if it seems to run correctly. The problem with memory corruption is that it may happen with no observable segfaults just because erroneous operation was performed on memory which was permitted to be read/written.
>
> Valgrind is tool which may show absence of memory corruption errors. Giving your example:

Thanks! I see what you are saying in valgrind. However, the following shows no problem in valgrind. Same code, only using S instead of RefCounted!(int).
How could that be explained?
Note that both RefCount!() and your posted S have opAssign, whereas the one below does not. Perhaps it is a problem only when there exists a custom opAssign?


Thanks
Dan

---------
import std.typecons;
import std.stdio;
import pprint.pp;

struct S {
  int x;
  char[] c;
}
//alias RefCounted!(int) Foo;
alias S Foo;
Foo[int] map;

Foo f;

unittest {
  int i=3;
  map[i] = Foo();
}
November 28, 2012
On Wednesday, 28 November 2012 at 18:08:59 UTC, Dan wrote:
> Thanks! I see what you are saying in valgrind. However, the following shows no problem in valgrind. Same code, only using S instead of RefCounted!(int).
> How could that be explained?
> Note that both RefCount!() and your posted S have opAssign, whereas the one below does not. Perhaps it is a problem only when there exists a custom opAssign?
>
>
> Thanks
> Dan
>
> <skipped>

Because code does not access memory expected to be corrupted.

import core.stdc.stdio : printf;

struct S {
  long x = 42;
  version(bug) {
    void opAssign(S rhs)
    {
      printf("%d\n", this.x);
    }
  }
}

//alias RefCounted!(int) Foo;
alias S Foo;
Foo[int] map;

void main() {
  map[3] = Foo();
  printf("%d\n", map[3].x);
}

This code with version=bug produces garbage because of opAssign. It seems that opAssign is actually called before accessing map:

push   %rbp
mov    %rsp,%rbp
sub    $0x28,%rsp
push   %rbx
movabs $0x2a,%rax
mov    %rax,-0x10(%rbp)
mov    %rax,%rsi
lea    -0x18(%rbp),%rdi
callq  0x418c70 <_D3aux1S8opAssignMFS3aux1SZv>
mov    -0x18(%rbp),%rcx
mov    %rcx,-0x28(%rbp)
mov    $0x3,%edx
mov    %edx,-0x20(%rbp)
lea    -0x20(%rbp),%rcx
mov    $0x8,%dl
movabs $0x430db0,%rsi
mov    %fs:0x0,%rdi
add    0x21d2c1(%rip),%rdi        # 0x635fb0
callq  0x419048 <_aaGetX>


November 28, 2012
On Wednesday, 28 November 2012 at 20:30:41 UTC, Maxim Fomin wrote:
> On Wednesday, 28 November 2012 at 18:08:59 UTC, Dan wrote:
>
> This code with version=bug produces garbage because of opAssign. It seems that opAssign is actually called before accessing map:
>

Maxim, thanks for looking more at this. This bug is not affecting my work in any way - I'm just trying to learn more about D and how to debug, and your responses are helpful.

I've now built druntime and phobos with debug to see if I can see what is going on more.

For me the relevant assembly of your code looks like below. I see the value in the assoc array get created and memset initialized to 0 inside aaA.d function _aaGetX with this line:

    memset(ptail + aligntsize(keytitsize), 0, valuesize); // zero value

Then between +62 and +102 it goes off the rails and upon entry to opAssign *this* is likely garbage. I can't be certain because the values I see are 0, which would be consistent with 0 initialization - but could also just be luck. At this point I wish I knew some assembly. Anyway, I don't know if it is a problem with associative array code, per se, or the code generated by the compiler when opAssign and/or postblit are defined for the value type. I do know that if you have no postblit and no opAssign there are no issues - even with a destructor. This is consistent with the examples you have shown, they have postblit and/or opAssign as well as the RefCount code which has postblit and dtor. In both your example crash and the RefCount crash the actual seg fault is in the dtor accessing bogus data because as you pointed out it was not correctly initialized.

Thanks
Dan

   0x000000000041a630 <+0>:	push   %rbp
   0x000000000041a631 <+1>:	mov    %rsp,%rbp
   0x000000000041a634 <+4>:	sub    $0x38,%rsp
   0x000000000041a638 <+8>:	push   %rbx
   0x000000000041a639 <+9>:	mov    $0x3,%eax
   0x000000000041a63e <+14>:	mov    %eax,-0x30(%rbp)
   0x000000000041a641 <+17>:	lea    -0x30(%rbp),%rcx
   0x000000000041a645 <+21>:	movabs $0x8,%rdx
   0x000000000041a64f <+31>:	movabs $0x43cc10,%rsi
   0x000000000041a659 <+41>:	mov    %fs:0x0,%rdi
   0x000000000041a662 <+50>:	add    0x229957(%rip),%rdi        # 0x643fc0
   0x000000000041a669 <+57>:	callq  0x41ae84 <_aaGetX>
   0x000000000041a66e <+62>:	mov    %rax,-0x28(%rbp)
   0x000000000041a672 <+66>:	test   %rax,%rax
   0x000000000041a675 <+69>:	jne    0x41a681 <_Dmain+81>
   0x000000000041a677 <+71>:	mov    $0x12,%edi
   0x000000000041a67c <+76>:	callq  0x41a710 <_D5again7__arrayZ>
=> 0x000000000041a681 <+81>:	movabs $0x2a,%rax
   0x000000000041a68b <+91>:	mov    %rax,-0x18(%rbp)
   0x000000000041a68f <+95>:	mov    %rax,%rsi
   0x000000000041a692 <+98>:	lea    -0x20(%rbp),%rdi
   0x000000000041a696 <+102>:	callq  0x41a5c8 <_D5again1S8opAssignMFS5again1SZv>

November 29, 2012
On Wednesday, 28 November 2012 at 22:13:05 UTC, Dan wrote:
> On Wednesday, 28 November 2012 at 20:30:41 UTC, Maxim Fomin wrote:
>> On Wednesday, 28 November 2012 at 18:08:59 UTC, Dan wrote:
>>
>> This code with version=bug produces garbage because of opAssign. It seems that opAssign is actually called before accessing map:
>>
>
> Maxim, thanks for looking more at this. This bug is not affecting my work in any way - I'm just trying to learn more about D and how to debug, and your responses are helpful.
>
> I've now built druntime and phobos with debug to see if I can see what is going on more.
>
> For me the relevant assembly of your code looks like below. I see the value in the assoc array get created and memset initialized to 0 inside aaA.d function _aaGetX with this line:
>
>     memset(ptail + aligntsize(keytitsize), 0, valuesize); // zero value
>
> Then between +62 and +102 it goes off the rails and upon entry to opAssign *this* is likely garbage. I can't be certain because the values I see are 0, which would be consistent with 0 initialization - but could also just be luck. At this point I wish I knew some assembly. Anyway, I don't know if it is a problem with associative array code, per se, or the code generated by the compiler when opAssign and/or postblit are defined for the value type. I do know that if you have no postblit and no opAssign there are no issues - even with a destructor. This is consistent with the examples you have shown, they have postblit and/or opAssign as well as the RefCount code which has postblit and dtor. In both your example crash and the RefCount crash the actual seg fault is in the dtor accessing bogus data because as you pointed out it was not correctly initialized.
>
> Thanks
> Dan
>
>  <skipped>

This doesn't look like assembly for previous source. Please provide the source for which you have assembly and tell which dmd options do you use.
November 29, 2012
On Thursday, 29 November 2012 at 07:59:02 UTC, Maxim Fomin wrote:

> This doesn't look like assembly for previous source. Please provide the source for which you have assembly and tell which dmd options do you use.

Well, I'm using the latest dmd (from the trunk), phobos, druntime, so I could build and step through.

rdmd -version=bug --force --build-only -g -w -property -I/home/dbdavidson/stage/vibe.d/source -I/home/dbdavidson/plusauri/dlang -L-levent_pthreads -L-levent -L-lssl -L-lcrypto /home/dbdavidson/tmp/again.d

dmd -v
DMD v2.061 DEBUG
DMD64 D Compiler v2.061

-----------------------
import core.stdc.stdio : printf;

struct S {
  long x = 42;
  version(bug) {
    void opAssign(S rhs)
    {
      printf("Assign %d\n", this.x);
    }
  }
}

//alias RefCounted!(int) Foo;
alias S Foo;
Foo[int] map;

void main() {
  map[3] = Foo();
  printf("%d\n", map[3].x);
}

------------------------

(gdb) disass
Dump of assembler code for function _Dmain:
   0x000000000041a630 <+0>:	push   %rbp
   0x000000000041a631 <+1>:	mov    %rsp,%rbp
   0x000000000041a634 <+4>:	sub    $0x38,%rsp
   0x000000000041a638 <+8>:	push   %rbx
=> 0x000000000041a639 <+9>:	mov    $0x3,%eax
   0x000000000041a63e <+14>:	mov    %eax,-0x30(%rbp)
   0x000000000041a641 <+17>:	lea    -0x30(%rbp),%rcx
   0x000000000041a645 <+21>:	movabs $0x8,%rdx
   0x000000000041a64f <+31>:	movabs $0x43cc10,%rsi
   0x000000000041a659 <+41>:	mov    %fs:0x0,%rdi
   0x000000000041a662 <+50>:	add    0x229957(%rip),%rdi        # 0x643fc0
   0x000000000041a669 <+57>:	callq  0x41ae84 <_aaGetX>
   0x000000000041a66e <+62>:	mov    %rax,-0x28(%rbp)
   0x000000000041a672 <+66>:	test   %rax,%rax
   0x000000000041a675 <+69>:	jne    0x41a681 <_Dmain+81>
   0x000000000041a677 <+71>:	mov    $0x12,%edi
   0x000000000041a67c <+76>:	callq  0x41a710 <_D5again7__arrayZ>
   0x000000000041a681 <+81>:	movabs $0x2a,%rax
   0x000000000041a68b <+91>:	mov    %rax,-0x18(%rbp)
   0x000000000041a68f <+95>:	mov    %rax,%rsi
   0x000000000041a692 <+98>:	lea    -0x20(%rbp),%rdi
   0x000000000041a696 <+102>:	callq  0x41a5c8 <_D5again1S8opAssignMFS5again1SZv>
   0x000000000041a69b <+107>:	mov    -0x20(%rbp),%rcx
   0x000000000041a69f <+111>:	mov    -0x28(%rbp),%rdx
   0x000000000041a6a3 <+115>:	mov    %rcx,(%rdx)

November 29, 2012
On Thursday, 29 November 2012 at 12:38:03 UTC, Dan wrote:
> On Thursday, 29 November 2012 at 07:59:02 UTC, Maxim Fomin wrote:
>
>> This doesn't look like assembly for previous source. Please provide the source for which you have assembly and tell which dmd options do you use.
>
> Well, I'm using the latest dmd (from the trunk), phobos, druntime, so I could build and step through.
>

This is version generated by rdmd -version=bug --force --build-only -g -w -property. The only difference between this and yours is in <+71>.

<+00>:	push   %rbp
<+01>:	mov    %rsp,%rbp
<+04>:	sub    $0x38,%rsp
<+08>:	push   %rbx
// push 3 as a pkey, -0x30(%rbp) is 3
<+09>:	mov    $0x3,%eax
<+14>:	mov    %eax,-0x30(%rbp)
<+17>:	lea    -0x30(%rbp),%rcx
// push valuesize
<+21>:	movabs $0x8,%rdx
// push keyti
<+31>:	movabs $0x430f50,%rsi
// push map address, it is tls object
<+41>:	mov    %fs:0x0,%rdi
<+50>:	add    0x21e22f(%rip),%rdi        # 0x636fb0
// call, keyti=8, aa=&map, valuesize=8, pkey=3
<+57>:	callq  0x419140 <_aaGetX>
// store return value in -0x28(%rbp)
<+62>:	mov    %rax,-0x28(%rbp)
// check array bounds
<+66>:	test   %rax,%rax
<+69>:	jne    0x418d99 <_Dmain+81>
<+71>:	mov    $0x13,%edi
<+76>:	callq  0x418e28 <_D4main7__arrayZ>
// constructing S rhs for opAssign
<+81>:	movabs $0x2a,%rax
<+91>:	mov    %rax,-0x18(%rbp) // rhs
<+95>:	mov    %rax,%rsi
// making this from -0x20(%rbp), but it was stored in -0x28(%rbp)
<+98>:	lea    -0x20(%rbp),%rdi
// opAssign loads this from %rdi
<+102>:	callq  0x418ce0 <_D4main1S8opAssignMFS4main1SZv>
// replace content in -0x28(%rbp) by -0x20(%rbp)
<+107>:	mov    -0x20(%rbp),%rcx
<+111>:	mov    -0x28(%rbp),%rdx
<+115>:	mov    %rcx,(%rdx)
// load previously created object
<+118>:	movl   $0x3,-0x10(%rbp)
<+125>:	lea    -0x10(%rbp),%rcx
<+129>:	movabs $0x8,%rdx
<+139>:	movabs $0x430f50,%rsi
<+149>:	mov    %fs:0x0,%rax
<+158>:	mov    0x21e1c3(%rip),%rbx        # 0x636fb0
<+165>:	mov    (%rax,%rbx,1),%rdi
<+169>:	callq  0x4192b8 <_aaGetRvalueX>
// store pointer to existed object
<+174>:	mov    %rax,-0x8(%rbp)
// check array bounds
<+178>:	test   %rax,%rax
<+181>:	jne    0x418e09 <_Dmain+193>
<+183>:	mov    $0x14,%edi
<+188>:	callq  0x418e28 <_D4main7__arrayZ>
// load x member
<+193>:	mov    -0x8(%rbp),%rcx
<+197>:	mov    (%rcx),%rsi
<+200>:	movabs $0x430ac0,%rdi
<+210>:	xor    %eax,%eax
<+212>:	callq  0x4188e0 <printf@plt>
<+217>:	xor    %eax,%eax
<+219>:	pop    %rbx
<+220>:	leaveq
<+221>:	retq
End of assembler dump.

Main stack frame looks follows:
-0x30(%rbp):-0x28(%rbp) 3 as a pkey arg to _aaGetX
-0x28(%rbp):-0x20(%rbp) pointer to S allocated by _aaGetX
-0x20(%rbp):-0x18(%rbp) uninitialized struct - "this" ptr
-0x18(%rbp):-0x10(%rbp) constructed struct passed as rhs to opAssign
-0x10(%rbp):-0x08(%rbp) 3 as a pkey arg to _aaGetRvalueX
-0x08(%rbp):-0x00(%rbp) pointer to S returned from _aaGetRvalueX

It seems that dmd allocates space for two structs, one non-initialized, second initialized, issues call to druntime, then places a call to opAssign with non-initialized as "this" object with initialized as rhs argument, then writes content on non-initialized to allocated by druntime. So, this looks like a codegen bug.

By the way, if the code is compiled with command "dmd main -g -release -noboundscheck -version=bug", opAssign call is placed before calling _aaGetX.
November 29, 2012
On Thursday, 29 November 2012 at 15:06:07 UTC, Maxim Fomin wrote:
> On Thursday, 29 November 2012 at 12:38:03 UTC, Dan wrote:
>> On Thursday, 29 November 2012 at 07:59:02 UTC, Maxim Fomin wrote:
>>
>>> This doesn't look like assembly for previous source. Please provide the source for which you have assembly and tell which dmd options do you use.
>>
>> Well, I'm using the latest dmd (from the trunk), phobos, druntime, so I could build and step through.
>>
>
> This is version generated by rdmd -version=bug --force --build-only -g -w -property. The only difference between this and yours is in <+71>.
>
> <+00>:	push   %rbp
> <+01>:	mov    %rsp,%rbp
> <+04>:	sub    $0x38,%rsp
> <+08>:	push   %rbx
> // push 3 as a pkey, -0x30(%rbp) is 3
> <+09>:	mov    $0x3,%eax
> <+14>:	mov    %eax,-0x30(%rbp)
> <+17>:	lea    -0x30(%rbp),%rcx
> // push valuesize
> <+21>:	movabs $0x8,%rdx
> // push keyti
> <+31>:	movabs $0x430f50,%rsi
> // push map address, it is tls object
> <+41>:	mov    %fs:0x0,%rdi
> <+50>:	add    0x21e22f(%rip),%rdi        # 0x636fb0
> // call, keyti=8, aa=&map, valuesize=8, pkey=3
> <+57>:	callq  0x419140 <_aaGetX>
> // store return value in -0x28(%rbp)
> <+62>:	mov    %rax,-0x28(%rbp)
> // check array bounds
> <+66>:	test   %rax,%rax
> <+69>:	jne    0x418d99 <_Dmain+81>
> <+71>:	mov    $0x13,%edi
> <+76>:	callq  0x418e28 <_D4main7__arrayZ>
> // constructing S rhs for opAssign
> <+81>:	movabs $0x2a,%rax
> <+91>:	mov    %rax,-0x18(%rbp) // rhs
> <+95>:	mov    %rax,%rsi
> // making this from -0x20(%rbp), but it was stored in -0x28(%rbp)
> <+98>:	lea    -0x20(%rbp),%rdi
> // opAssign loads this from %rdi
> <+102>:	callq  0x418ce0 <_D4main1S8opAssignMFS4main1SZv>
> // replace content in -0x28(%rbp) by -0x20(%rbp)
> <+107>:	mov    -0x20(%rbp),%rcx
> <+111>:	mov    -0x28(%rbp),%rdx
> <+115>:	mov    %rcx,(%rdx)
> // load previously created object
> <+118>:	movl   $0x3,-0x10(%rbp)
> <+125>:	lea    -0x10(%rbp),%rcx
> <+129>:	movabs $0x8,%rdx
> <+139>:	movabs $0x430f50,%rsi
> <+149>:	mov    %fs:0x0,%rax
> <+158>:	mov    0x21e1c3(%rip),%rbx        # 0x636fb0
> <+165>:	mov    (%rax,%rbx,1),%rdi
> <+169>:	callq  0x4192b8 <_aaGetRvalueX>
> // store pointer to existed object
> <+174>:	mov    %rax,-0x8(%rbp)
> // check array bounds
> <+178>:	test   %rax,%rax
> <+181>:	jne    0x418e09 <_Dmain+193>
> <+183>:	mov    $0x14,%edi
> <+188>:	callq  0x418e28 <_D4main7__arrayZ>
> // load x member
> <+193>:	mov    -0x8(%rbp),%rcx
> <+197>:	mov    (%rcx),%rsi
> <+200>:	movabs $0x430ac0,%rdi
> <+210>:	xor    %eax,%eax
> <+212>:	callq  0x4188e0 <printf@plt>
> <+217>:	xor    %eax,%eax
> <+219>:	pop    %rbx
> <+220>:	leaveq
> <+221>:	retq
> End of assembler dump.
>
> Main stack frame looks follows:
> -0x30(%rbp):-0x28(%rbp) 3 as a pkey arg to _aaGetX
> -0x28(%rbp):-0x20(%rbp) pointer to S allocated by _aaGetX
> -0x20(%rbp):-0x18(%rbp) uninitialized struct - "this" ptr
> -0x18(%rbp):-0x10(%rbp) constructed struct passed as rhs to opAssign
> -0x10(%rbp):-0x08(%rbp) 3 as a pkey arg to _aaGetRvalueX
> -0x08(%rbp):-0x00(%rbp) pointer to S returned from _aaGetRvalueX
>
> It seems that dmd allocates space for two structs, one non-initialized, second initialized, issues call to druntime, then places a call to opAssign with non-initialized as "this" object with initialized as rhs argument, then writes content on non-initialized to allocated by druntime. So, this looks like a codegen bug.
>
> By the way, if the code is compiled with command "dmd main -g -release -noboundscheck -version=bug", opAssign call is placed before calling _aaGetX.

Very educational - thanks!

1 2
Next ›   Last »