December 10, 2014
On Wed, 10 Dec 2014 12:52:22 +0000
Ruslan Mullakhmetov via Digitalmars-d-learn
<digitalmars-d-learn@puremagic.com> wrote:

> On Wednesday, 10 December 2014 at 08:46:12 UTC, Ruslan Mullakhmetov wrote:
> > yes. that was the mistake. also after fixing bug in Blk Attributes printing i got more reasonable attrs
> >
> > for object blk: FINALIZE
> > for array of objects blk: NO_SCAN APPENDABLE
> >
> > this is sound good except for NO_SCAN.
> >
> > ...
> > the other question why this happens... try to debug more.
> 
> I've done more dubugging.
> 
> what i've found:
> 
> initially array blk has only attrs APPENDABLE, but after some time this blk is shrinked and reallocated (moved) and then NO_SCAN attr appears.
> 
> 
> here the output of my extended logs:
> 
> --------
> before tag: 1 len: 2 ptr: 103DD9058 root: 103DD8000:8192 attr:
> APPENDABLE
> after tag: 1 len: 3 ptr: 103A21DD0 root: 103A21DC0:64 attr:
> NO_SCAN APPENDABLE
> --------
> 
> this is produced by the following code
> 
> http://dpaste.dzfl.pl/0c6dc16270a1
> 
> so in a nutshell after appending to array via ~= operator blk attrs changed from APPENDABLE to NO_SCAN APPENDABLE which cause the problem.
> 
> why and how this happens? can anybody explain it to me?
> 
can you give us a minified code that causes this behavior?


December 10, 2014
On Wednesday, 10 December 2014 at 12:52:24 UTC, Ruslan Mullakhmetov wrote:
>
> why and how this happens? can anybody explain it to me?

I tried to extract this and saw NO NO_SCAN attrs after moving blk:


the following piece of output produced by http://dpaste.dzfl.pl/6f773e17de92

len: 99996 ptr: 109DF0010 root: 109DF0000:1048576 attr: APPENDABLE
len: 99997 ptr: 109DF0010 root: 109DF0000:1048576 attr: APPENDABLE
len: 99998 ptr: 109DF0010 root: 109DF0000:1048576 attr: APPENDABLE
len: 99999 ptr: 109DF0010 root: 109DF0000:1048576 attr: APPENDABLE
len: 100000 ptr: 109DF0010 root: 109DF0000:1048576 attr: APPENDABLE
--- shrinked --
len: 1 ptr: 109EB3508 root: 109DF0000:1048576 attr: APPENDABLE
len: 2 ptr: 109EB3508 root: 109DF0000:1048576 attr: APPENDABLE
len: 3 ptr: 109EB3508 root: 109DF0000:1048576 attr: APPENDABLE
len: 4 ptr: 109EB3508 root: 109DF0000:1048576 attr: APPENDABLE
len: 5 ptr: 109EB3508 root: 109DF0000:1048576 attr: APPENDABLE
len: 6 ptr: 109F60640 root: 109F60640:64 attr: APPENDABLE
len: 7 ptr: 109F60640 root: 109F60640:64 attr: APPENDABLE
December 10, 2014
On Wednesday, 10 December 2014 at 13:00:45 UTC, ketmar via Digitalmars-d-learn wrote:
> can you give us a minified code that causes this behavior?

see previous post. the problem vanish if i try to extract it.

December 10, 2014
On Wed, 10 Dec 2014 13:03:21 +0000
Ruslan Mullakhmetov via Digitalmars-d-learn
<digitalmars-d-learn@puremagic.com> wrote:

> On Wednesday, 10 December 2014 at 12:52:24 UTC, Ruslan Mullakhmetov wrote:
> >
> > why and how this happens? can anybody explain it to me?
> 
> I tried to extract this and saw NO NO_SCAN attrs after moving blk:
i strongly believe that you have some strange casts buried somewhere in the depth of the complex code, or something similar.

maybe trackallocs.d from https://bitbucket.org/infognition/dstuff can help to track (re)allocations. you can modify source to dump the flags.


December 11, 2014
On 12/10/14 7:52 AM, Ruslan Mullakhmetov wrote:
> On Wednesday, 10 December 2014 at 08:46:12 UTC, Ruslan Mullakhmetov wrote:
>> yes. that was the mistake. also after fixing bug in Blk Attributes
>> printing i got more reasonable attrs
>>
>> for object blk: FINALIZE
>> for array of objects blk: NO_SCAN APPENDABLE
>>
>> this is sound good except for NO_SCAN.
>>
>> ...
>> the other question why this happens... try to debug more.
>
> I've done more dubugging.
>
> what i've found:
>
> initially array blk has only attrs APPENDABLE, but after some time this
> blk is shrinked and reallocated (moved) and then NO_SCAN attr appears.
>
>
> here the output of my extended logs:
>
> --------
> before tag: 1 len: 2 ptr: 103DD9058 root: 103DD8000:8192 attr: APPENDABLE
> after tag: 1 len: 3 ptr: 103A21DD0 root: 103A21DC0:64 attr: NO_SCAN
> APPENDABLE
> --------
>
> this is produced by the following code
>
> http://dpaste.dzfl.pl/0c6dc16270a1
>
> so in a nutshell after appending to array via ~= operator blk attrs
> changed from APPENDABLE to NO_SCAN APPENDABLE which cause the problem.
>
> why and how this happens? can anybody explain it to me?

I have an idea of what is happening, I will do some testing. Thanks for debugging this so far, this is useful info.

This is *definitely* a bug, if the code you gave is what caused that output. Appending should not add the NO_SCAN tag.

-Steve
December 11, 2014
On 12/10/14 7:52 AM, Ruslan Mullakhmetov wrote:
> On Wednesday, 10 December 2014 at 08:46:12 UTC, Ruslan Mullakhmetov wrote:
>> yes. that was the mistake. also after fixing bug in Blk Attributes
>> printing i got more reasonable attrs
>>
>> for object blk: FINALIZE
>> for array of objects blk: NO_SCAN APPENDABLE
>>
>> this is sound good except for NO_SCAN.
>>
>> ...
>> the other question why this happens... try to debug more.
>
> I've done more dubugging.
>
> what i've found:
>
> initially array blk has only attrs APPENDABLE, but after some time this
> blk is shrinked and reallocated (moved) and then NO_SCAN attr appears.
>
>
> here the output of my extended logs:
>
> --------
> before tag: 1 len: 2 ptr: 103DD9058 root: 103DD8000:8192 attr: APPENDABLE
> after tag: 1 len: 3 ptr: 103A21DD0 root: 103A21DC0:64 attr: NO_SCAN
> APPENDABLE
> --------

My analysis so far:

1. The before/after makes sense except for the attribute and the offset. A realloc into a 64-byte block should NOT cause an offset of 16 bytes. I have found why it's happening, which is a bug, but not one that should cause the problem of setting the noscan bit (will file an issue on that).
2. In the array append code, the block attributes are obtained via GC.query, which has this code for getting the attributes:

https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L1792

Quoting from that function:

// reset the offset to the base pointer, otherwise the bits
// are the bits for the pointer, which may be garbage
offset = cast(size_t)(info.base - pool.baseAddr);
info.attr = getBits(pool, cast(size_t)(offset >> pool.shiftBy));

Which should get the correct bits. I suspected there was an issue with getting the wrong bits, but this code looks correct.

3. The runtime caches the block info for thread local data for append speed. A potential issue is that the attributes are cached from a previous use for that block, but the GC (and the runtime itself) SHOULD clear that cache entry when that block is freed, avoiding this issue. A potential way to check this is to assert in a debug build of druntime that the cached block info always equals the actual block info. Are you able to build a debug version of druntime to test this? I can give you the changes you should make. This would explain the great difficulty in reproducing the issue.

4. If your code is multi-threaded, but using __gshared, it can make the cache incorrect. Are you doing this?

But the cache is really the only possible place I can see where the bits are set incorrectly, given that you just verified the bits are correct before the append.

Can you just list the version of the compiler you are using? I want to make sure this isn't an issue that has already been fixed.

-Steve
December 12, 2014
On Thursday, 11 December 2014 at 18:36:59 UTC, Steven Schveighoffer wrote:
> My analysis so far:
>
> 2. In the array append code, the block attributes are obtained via GC.query, which has this code for getting the attributes:
>
> https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L1792
>
> Quoting from that function:
>
> // reset the offset to the base pointer, otherwise the bits
> // are the bits for the pointer, which may be garbage
> offset = cast(size_t)(info.base - pool.baseAddr);
> info.attr = getBits(pool, cast(size_t)(offset >> pool.shiftBy));
>
> Which should get the correct bits. I suspected there was an issue with getting the wrong bits, but this code looks correct.
>
> 3. The runtime caches the block info for thread local data for append speed. A potential issue is that the attributes are cached from a previous use for that block, but the GC (and the runtime itself) SHOULD clear that cache entry when that block is freed, avoiding this issue. A potential way to check this is to assert in a debug build of druntime that the cached block info always equals the actual block info. Are you able to build a debug version of druntime to test this? I can give you the changes you should make. This would explain the great difficulty in reproducing the issue.

I will try to build debug version of dmd compiler and check the issue.

>
> 4. If your code is multi-threaded, but using __gshared, it can make the cache incorrect. Are you doing this?
>

the app is multi-threaded via std.concurrency.

there is only one known to me place where __gshared is used: logging library (checked by searching through whole source tree). make stub for this lib and try, so identify whether cache invalidated by _gshared or not.

> But the cache is really the only possible place I can see where the bits are set incorrectly, given that you just verified the bits are correct before the append.
>
> Can you just list the version of the compiler you are using? I want to make sure this isn't an issue that has already been fixed.

the last. first of all i updated whole toolchain (dmd, dub).

$ dmd
DMD64 D Compiler v2.066.1

> -Steve

I started looking druntime and dmd source code myself before i checked the thread (thsnks for your help and feedback) and i have some questions. could you explain to me something?

i_m looking here https://github.com/D-Programming-Language/druntime/blob/v2.066.1/src/rt/lifetime.d#L591

-------
line #603
auto size = ti.next.tsize;

why `next`? it can be even null if this is last TypeInfo in the linked list.

---------

btw, i used suggested trackallocs.d and GC defenetely receives NO_SCAN

before tag: 1 len: 2 ptr: 103A78058 root: 103A77000:8192 attr: APPENDABLE
gc_qalloc(41, NO_SCAN APPENDABLE ) cc: 29106 asz: 10152603, ti: null ret: BlkInfo_(104423800, 64, 10)
after tag: 1 len: 3 ptr: 104423810 root: 104423800:64 attr: NO_SCAN APPENDABLE
December 12, 2014
On Friday, 12 December 2014 at 12:53:00 UTC, Ruslan Mullakhmetov wrote:
> On Thursday, 11 December 2014 at 18:36:59 UTC, Steven Schveighoffer wrote:
>> My analysis so far:
>>
>> 4. If your code is multi-threaded, but using __gshared, it can make the cache incorrect. Are you doing this?
>>
>
> the app is multi-threaded via std.concurrency.
>
> there is only one known to me place where __gshared is used: logging library (checked by searching through whole source tree). make stub for this lib and try, so identify whether cache invalidated by _gshared or not.
>


removing __gshared seems does not helped.
December 12, 2014
On 12/12/14 7:52 AM, Ruslan Mullakhmetov wrote:
> On Thursday, 11 December 2014 at 18:36:59 UTC, Steven Schveighoffer wrote:
>> My analysis so far:
>>
>> 2. In the array append code, the block attributes are obtained via
>> GC.query, which has this code for getting the attributes:
>>
>> https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L1792
>>
>>
>> Quoting from that function:
>>
>> // reset the offset to the base pointer, otherwise the bits
>> // are the bits for the pointer, which may be garbage
>> offset = cast(size_t)(info.base - pool.baseAddr);
>> info.attr = getBits(pool, cast(size_t)(offset >> pool.shiftBy));
>>
>> Which should get the correct bits. I suspected there was an issue with
>> getting the wrong bits, but this code looks correct.
>>
>> 3. The runtime caches the block info for thread local data for append
>> speed. A potential issue is that the attributes are cached from a
>> previous use for that block, but the GC (and the runtime itself)
>> SHOULD clear that cache entry when that block is freed, avoiding this
>> issue. A potential way to check this is to assert in a debug build of
>> druntime that the cached block info always equals the actual block
>> info. Are you able to build a debug version of druntime to test this?
>> I can give you the changes you should make. This would explain the
>> great difficulty in reproducing the issue.
>
> I will try to build debug version of dmd compiler and check the issue.

A debug version of compiler is not necessary, not even a debug version of phobos, just druntime. But it's not going to matter yet, because I need to give you the asserts to put in there. I just wanted to know if you needed help doing it.

>
>>
>> 4. If your code is multi-threaded, but using __gshared, it can make
>> the cache incorrect. Are you doing this?
>>
>
> the app is multi-threaded via std.concurrency.

This should be OK, you should not be able to share data that is not marked as shared.

> there is only one known to me place where __gshared is used: logging
> library (checked by searching through whole source tree). make stub for
> this lib and try, so identify whether cache invalidated by _gshared or not.

Here is where it might occur:

1. Due to shared data having typeinfo attached to it that it is actually shared, the runtime takes advantage of that. We can use a lock-free cache that is thread-local for anything not marked as shared, because nothing outside the thread can access that data.
2. __gshared gets around this because it is not marked as shared by the compiler. This means, if you, for instance, appended to a __gshared array, the runtime would treat it like a thread-local array. If you did this from multiple threads, the cache may be invalid in one or more of them.
3. Actual 'shared' arrays are not permitted to use the cache, so they should not have this issue.

I see that you removed the only instance of __gshared and it did not help. That at least rules that out.

>> But the cache is really the only possible place I can see where the
>> bits are set incorrectly, given that you just verified the bits are
>> correct before the append.
>>
>> Can you just list the version of the compiler you are using? I want to
>> make sure this isn't an issue that has already been fixed.
>
> the last. first of all i updated whole toolchain (dmd, dub).
>
> $ dmd
> DMD64 D Compiler v2.066.1

Thanks, this at least gives me a baseline to know what to test and debug with. I do not believe the code has had any significant fixes that would help with this issue since then.

>
> I started looking druntime and dmd source code myself before i checked
> the thread (thsnks for your help and feedback) and i have some
> questions. could you explain to me something?
>
> i_m looking here
> https://github.com/D-Programming-Language/druntime/blob/v2.066.1/src/rt/lifetime.d#L591
>
>
> -------
> line #603
> auto size = ti.next.tsize;
>
> why `next`? it can be even null if this is last TypeInfo in the linked
> list.

This is the way the compiler constructs the type info. The first TypeInfo is always TypeInfo_Array (or TypeInfo_Shared, or const or whatever), and the .next is the typeinfo for the element type. all this does is get the size of an element. Since we know we are dealing with an array, we know next is always valid.


> btw, i used suggested trackallocs.d and GC defenetely receives NO_SCAN
>
> before tag: 1 len: 2 ptr: 103A78058 root: 103A77000:8192 attr: APPENDABLE
> gc_qalloc(41, NO_SCAN APPENDABLE ) cc: 29106 asz: 10152603, ti: null
> ret: BlkInfo_(104423800, 64, 10)
> after tag: 1 len: 3 ptr: 104423810 root: 104423800:64 attr: NO_SCAN
> APPENDABLE

This is good information, thanks. I will get back to you with a druntime branch to try. Can I email you at this address? If not, email me at the address from my post to let me know your contact, no reason to work through building issues on the public forum :)

-Steve
December 12, 2014
On Friday, 12 December 2014 at 15:50:26 UTC, Steven Schveighoffer wrote:

> Can I email you at this address? If not, email me at the address from my post to let me know your contact, no reason to work through building issues on the public forum :)
>
> -Steve

reach me at theambient [] me__com