Jump to page: 1 27  
Page
Thread overview
Either I'm confused or the gc is
Oct 21, 2020
donallen
Oct 21, 2020
H. S. Teoh
Oct 21, 2020
donallen
Oct 21, 2020
donallen
Oct 21, 2020
H. S. Teoh
Oct 21, 2020
donallen
Oct 21, 2020
Max Haughton
Oct 21, 2020
Max Haughton
Oct 21, 2020
Imperatorn
Oct 21, 2020
donallen
Oct 22, 2020
Imperatorn
Oct 21, 2020
ikod
Oct 22, 2020
rikki cattermole
Oct 22, 2020
donallen
Oct 22, 2020
rikki cattermole
Oct 22, 2020
tsbockman
Oct 22, 2020
donallen
Oct 22, 2020
jmh530
Oct 22, 2020
donallen
Oct 23, 2020
Ali Çehreli
Oct 22, 2020
user1234
Oct 22, 2020
H. S. Teoh
Oct 22, 2020
user1234
Oct 22, 2020
donallen
Oct 23, 2020
user1234
Oct 22, 2020
ketmar
Oct 22, 2020
donallen
Oct 22, 2020
donallen
Oct 22, 2020
donallen
Oct 22, 2020
donallen
Oct 22, 2020
Jerry
Oct 22, 2020
rikki cattermole
Oct 23, 2020
donallen
Oct 23, 2020
H. S. Teoh
Oct 23, 2020
donallen
Oct 23, 2020
donallen
Oct 23, 2020
matheus
Oct 23, 2020
H. S. Teoh
Oct 23, 2020
rikki cattermole
Oct 24, 2020
donallen
Oct 24, 2020
Imperatorn
Oct 24, 2020
donallen
Oct 24, 2020
Imperatorn
Oct 25, 2020
donallen
Oct 26, 2020
donallen
Oct 26, 2020
rikki cattermole
Oct 26, 2020
donallen
Nov 15, 2020
donallen
Nov 15, 2020
H. S. Teoh
Nov 15, 2020
Ali Çehreli
Nov 16, 2020
H. S. Teoh
Nov 16, 2020
Ali Çehreli
Nov 15, 2020
mw
Nov 15, 2020
rikki cattermole
Oct 24, 2020
H. S. Teoh
Oct 22, 2020
donallen
Oct 22, 2020
mw
Oct 22, 2020
mw
October 21, 2020
I'm new to D, but not to programming (I wrote my first line of code 60 years ago and am retired from a long career as a developer and project manager).

I have a suite of personal finance applications that I wrote in C. I'd like to port them to D. The financial data is stored in a sqlite database. I spent yesterday re-writing the verifier application in D and have run into a perplexing issue.

The verifier does a recursive descent of the tree of accounts, checking a bunch of things at each node. After the checks are done, a query is done to discover how many children the current account has, at which point it allocates, with 'new', a dynamic array of Account structs. It then goes into a loop, collecting information from the db about each of the children, which gets stored in one of the Account structs in the array. Finally, it does a
foreach through the array, doing a recursive call on the tree walker.

Here's the relevant code:
````
        // Now do the children of this account
        // First determine how many there are
        bind_text(count_children, 1, account.guid);
        int n_children = one_row(count_children, &get_int).integer_value;
        if (n_children > 0)
        {
            Account[] children = new Account[n_children];
            bind_text(find_children, 1, account.guid);
            int i = 0;
            while (next_row_available_p(find_children, &reset_stmt))
            {
                if (i > n_children-1)
                    panic("walk_account_tree: child index exceeds n_children");
                children[i].name = fromStringz(sqlite3_column_text(find_children, 0)).dup;
                children[i].path = format("%s:%s", account.path, children[i].name);
                children[i].guid = fromStringz(sqlite3_column_text(find_children, 1)).dup; // guid
                children[i].commodity_guid = fromStringz(sqlite3_column_text(find_children, 2)).dup; // commodity_guid
                children[i].flags = sqlite3_column_int(find_children,
                        3) | account.flags & (account_flag_descendents_are_assets
                        | account_flag_descendents_are_liabilities | account_flag_descendents_are_income
                        | account_flag_descendents_are_expenses | account_flag_descendents_are_marketable
                        | account_flag_self_and_descendents_are_tax_related
                        | account_flag_descendents_need_commodity_link); // flags
                i = i + 1;
            }
            foreach (k, ref child; children)
                walk_account_tree(&child, ancestor_flags | account.flags);
        }
````

The problem is that at some point, the verifier starts spewing bogus error messages about what it is seeing in the tree. Oddly, putting in debugging writelns results in the error messages not occurring -- a Heisenbug.  But working with gdb, I found that the account structs after the error messages start are zeroed. Turning on gc profiling tells me that 3 gcs have occurred. Disabling the gc results in the program running correctly -- no error messages (and I know the database has no errors because the C version, which has been around for awhile, confirms that the db is well formed).

I could post a PR, but I'm not sure that this is a bug. It could easily be a misunderstanding by me of D and its memory management. So I thought I'd try this post first, hoping that one of you who knows the language better than I do could point out a problem with my code. I do need to resolve this or abandon this project.

Thanks in advance for any help.
October 21, 2020
On Wed, Oct 21, 2020 at 04:24:41PM +0000, donallen via Digitalmars-d wrote:
> I'm new to D, but not to programming (I wrote my first line of code 60 years ago and am retired from a long career as a developer and project manager).

Welcome aboard! ;-)


[...]
> ````
>         // Now do the children of this account
>         // First determine how many there are
>         bind_text(count_children, 1, account.guid);
>         int n_children = one_row(count_children, &get_int).integer_value;
>         if (n_children > 0)
>         {
>             Account[] children = new Account[n_children];
>             bind_text(find_children, 1, account.guid);
>             int i = 0;
>             while (next_row_available_p(find_children, &reset_stmt))
>             {
>                 if (i > n_children-1)
>                     panic("walk_account_tree: child index exceeds
> n_children");
>                 children[i].name =
> fromStringz(sqlite3_column_text(find_children, 0)).dup;
>                 children[i].path = format("%s:%s", account.path,
> children[i].name);
>                 children[i].guid =
> fromStringz(sqlite3_column_text(find_children, 1)).dup; // guid
>                 children[i].commodity_guid =
> fromStringz(sqlite3_column_text(find_children, 2)).dup; // commodity_guid
>                 children[i].flags = sqlite3_column_int(find_children,
>                         3) | account.flags &
> (account_flag_descendents_are_assets
>                         | account_flag_descendents_are_liabilities |
> account_flag_descendents_are_income
>                         | account_flag_descendents_are_expenses |
> account_flag_descendents_are_marketable
>                         | account_flag_self_and_descendents_are_tax_related
>                         | account_flag_descendents_need_commodity_link); //
> flags
>                 i = i + 1;
>             }
>             foreach (k, ref child; children)
>                 walk_account_tree(&child, ancestor_flags | account.flags);
>         }
> ````
> 
> The problem is that at some point, the verifier starts spewing bogus error messages about what it is seeing in the tree. Oddly, putting in debugging writelns results in the error messages not occurring -- a Heisenbug.  But working with gdb, I found that the account structs after the error messages start are zeroed. Turning on gc profiling tells me that 3 gcs have occurred.  Disabling the gc results in the program running correctly -- no error messages (and I know the database has no errors because the C version, which has been around for awhile, confirms that the db is well formed).

Without seeing the rest of your code, or a minimal (runnable) failing test case, it's hard to say for certain what the problem is.  But if I were to guess, it looks like the GC is prematurely collecting your arrays, though I can't imagine why it would if you still retain references to it.

One potential gotcha, since you mention that this is being ported from C, is that if you pass pointers to GC-allocated objects to C code which stores it somewhere, you need to make sure you retain a reference somewhere on the D side of things, or else inform the GC of the reference using core.memory.GC.addRoot.[1]  Otherwise, since the GC may not be aware of where the C code stores the pointer, it may fail to find it and wrongly believe that the object is dead, and thereby collect it prematurely.

[1] See: https://dlang.org/library/core/memory/gc.add_root.html

Another potential problem is if your C code (or C-style code ported to D) obscures pointers, e.g., using the XOR trick to implement a doubly-linked list with only a single pointer field, the GC will not be able to discover the reference, and thus may wrongly mark the referenced object as dead.

Another thing that stuck out to me while glancing over your code snippet, is that the `children` array doesn't seem to be stored anywhere; this means it will go out of scope at the end of the scope and possibly be collected, unless you retain at least one reference to an array element somewhere.  I can't tell if this is important without knowing the rest of your code, but it's something to look into.  (Though it puzzles me why this would be a problem, since obviously your code still has a reference to *something* in that array in order for the verifier code to check it. But it's something to look into if there are no other clues.)


> I could post a PR, but I'm not sure that this is a bug. It could easily be a misunderstanding by me of D and its memory management. So I thought I'd try this post first, hoping that one of you who knows the language better than I do could point out a problem with my code. I do need to resolve this or abandon this project.
[...]

Have you identified what's causing the problem? I.e., what would you put in your PR?


T

-- 
He who does not appreciate the beauty of language is not worthy to bemoan its flaws.
October 21, 2020
On Wednesday, 21 October 2020 at 16:24:41 UTC, donallen wrote:
> I'm new to D, but not to programming (I wrote my first line of code 60 years ago and am retired from a long career as a developer and project manager).
>
> [...]

What's your i and n_children on panic?
October 21, 2020
On Wednesday, 21 October 2020 at 16:24:41 UTC, donallen wrote:
> I'm new to D, but not to programming (I wrote my first line of code 60 years ago and am retired from a long career as a developer and project manager).

Very cool!

> Thanks in advance for any help.

Most likely GC clears prematurely some of your data (as H.S.Teoh pointed out).
You can try to disable GC for a single program run with --DRT-gcopt="disable:1" option. Just add it to your command line args and see what happens.

Another investigation can be done using memory checker like 'valgrind'.

Regards,
October 21, 2020
On Wednesday, 21 October 2020 at 18:55:50 UTC, H. S. Teoh wrote:
> On Wed, Oct 21, 2020 at 04:24:41PM +0000, donallen via Digitalmars-d wrote:
>> [...]
>
> Welcome aboard! ;-)
>
> [...]

Essentially what I wrote in my post, if I became convinced by you folks that the problem was not mine.

>
> [...]

October 21, 2020
On Wednesday, 21 October 2020 at 19:30:40 UTC, Imperatorn wrote:
> On Wednesday, 21 October 2020 at 16:24:41 UTC, donallen wrote:
>> I'm new to D, but not to programming (I wrote my first line of code 60 years ago and am retired from a long career as a developer and project manager).
>>
>> [...]
>
> What's your i and n_children on panic?

There is no panic. The program starts complaining about the account data that it is checking, spewing bogus error messages. This is because the Account structures have been zeroed at some point in the recursive descent.

But I did set a breakpoint in gdb at the place where the error messages are issued, and i was something like 230 of an n_children of something like 350. I don't think either number is particularly significant, other than that this was an account that had a lot of children and so was likely to trigger a gc.
October 21, 2020
On Wednesday, 21 October 2020 at 18:55:50 UTC, H. S. Teoh wrote:
> On Wed, Oct 21, 2020 at 04:24:41PM +0000, donallen via Digitalmars-d wrote:
>> [...]
>
> Welcome aboard! ;-)
>
>
> [...]
>> [...]
>
> Without seeing the rest of your code, or a minimal (runnable) failing test case, it's hard to say for certain what the problem is.  But if I were to guess, it looks like the GC is prematurely collecting your arrays, though I can't imagine why it would if you still retain references to it.
>
> One potential gotcha, since you mention that this is being ported from C, is that if you pass pointers to GC-allocated objects to C code which stores it somewhere, you need to make sure you retain a reference somewhere on the D side of things, or else inform the GC of the reference using core.memory.GC.addRoot.[1]  Otherwise, since the GC may not be aware of where the C code stores the pointer, it may fail to find it and wrongly believe that the object is dead, and thereby collect it prematurely.

I am not passing pointers to GC-allocated objects to C code. The only C code involved here is libsqlite3 and there the pointers I pass to it are those I obtained from it.

>
> [1] See: https://dlang.org/library/core/memory/gc.add_root.html
>
> Another potential problem is if your C code (or C-style code ported to D) obscures pointers, e.g., using the XOR trick to implement a doubly-linked list with only a single pointer field, the GC will not be able to discover the reference, and thus may wrongly mark the referenced object as dead.

Nope.

>
> Another thing that stuck out to me while glancing over your code snippet, is that the `children` array doesn't seem to be stored anywhere; this means it will go out of scope at the end of the scope and possibly be collected, unless you retain at least one reference to an array element somewhere.  I can't tell if this is important without knowing the rest of your code, but it's something to look into.  (Though it puzzles me why this would be a problem, since obviously your code still has a reference to *something* in that array in order for the verifier code to check it. But it's something to look into if there are no other clues.)

As I understand it from the documentation, dynamic arrays are represented as fat pointers that contain the array size and a pointer to the actual data in the heap. I assume, though the documentation doesn't say, that the size/pointer structure is on the stack. If so, every one of the children arrays has a pointer to it on the stack as the descent through the tree proceeds and therefore should not be garbage collectible until a particular call to the tree walker (of which I've provided the last bit of code, which deals with descending into the child accounts) returns to its caller.

It does look to me like those arrays are being collected prematurely and incorrectly.
October 21, 2020
On Wed, Oct 21, 2020 at 09:35:43PM +0000, donallen via Digitalmars-d wrote: [...]
> As I understand it from the documentation, dynamic arrays are represented as fat pointers that contain the array size and a pointer to the actual data in the heap. I assume, though the documentation doesn't say, that the size/pointer structure is on the stack.

That is correct, since you declared the `children` array as a local variable.


> If so, every one of the children arrays has a pointer to it on the stack as the descent through the tree proceeds and therefore should not be garbage collectible until a particular call to the tree walker (of which I've provided the last bit of code, which deals with descending into the child accounts) returns to its caller.

Also correct. Barring a bug in the GC.


> It does look to me like those arrays are being collected prematurely and incorrectly.

It would appear to be so, but it's really hard to say without being able to reproduce the problem locally.  Another possibility is memory corruption caused by incorrect translation of C code to D. Generally, this should be pretty rare: Walter designed D syntax to be similar to C syntax such that if it compiles, it should work exactly as it does in C, otherwise there should be a compile error.  But there *are* corner cases where this may inadvertently happen.

There are also some gotchas with interfacing D code with C libraries (sqlite, as you mentioned), which may appear to work at first but may introduce subtle memory problems if not used correctly.

Of course, there *is* the possibility that there's a bug in the GC that causes it to collect prematurely, but IMO the chances of that are pretty low, because D arrays are one of its most-used core features, and if something so basic doesn't work correctly, we would have seen (and fixed) the problem a long time ago.  At least, it would have surfaced in some of the large D projects out there.  But I haven't heard of any such problem recently, except in connection with code that interfaces with C, where sometimes mistakes can happen and memory corruption results.

But on the off-chance there *is* a bug in the GC: we'd greatly appreciate it if you could somehow reduce the problem to a minimal test case that we can reproduce locally so that we can investigate.


T

-- 
Не дорог подарок, дорога любовь.
October 21, 2020
On Wednesday, 21 October 2020 at 22:28:48 UTC, H. S. Teoh wrote:
> On Wed, Oct 21, 2020 at 09:35:43PM +0000, donallen via Digitalmars-d wrote: [...]
>> [...]
>
> That is correct, since you declared the `children` array as a local variable.
>
>
>> [...]
>
> Also correct. Barring a bug in the GC.
>
>
>> [...]
>
> It would appear to be so, but it's really hard to say without being able to reproduce the problem locally.  Another possibility is memory corruption caused by incorrect translation of C code to D. Generally, this should be pretty rare: Walter designed D syntax to be similar to C syntax such that if it compiles, it should work exactly as it does in C, otherwise there should be a compile error.  But there *are* corner cases where this may inadvertently happen.
>
> There are also some gotchas with interfacing D code with C libraries (sqlite, as you mentioned), which may appear to work at first but may introduce subtle memory problems if not used correctly.
>
> Of course, there *is* the possibility that there's a bug in the GC that causes it to collect prematurely, but IMO the chances of that are pretty low, because D arrays are one of its most-used core features, and if something so basic doesn't work correctly, we would have seen (and fixed) the problem a long time ago.  At least, it would have surfaced in some of the large D projects out there.  But I haven't heard of any such problem recently, except in connection with code that interfaces with C, where sometimes mistakes can happen and memory corruption results.
>
> But on the off-chance there *is* a bug in the GC: we'd greatly appreciate it if you could somehow reduce the problem to a minimal test case that we can reproduce locally so that we can investigate.

I will try.

My suspicion that there's a gc problem largely rests on the fact that when I run the program with the gc disabled, the problem goes away -- it behaves as it should.

Do you know if there's any way to configure the gc so it tells you what it's doing? It would be great if it could be told to announce when it's running and what it's collecting. I did a little looking in the documentation yesterday for that and did not find anything. If you can't help in that area, I'll have a look at the gc source code.

>
>
> T

October 21, 2020
On Wednesday, 21 October 2020 at 18:55:50 UTC, H. S. Teoh wrote:
> On Wed, Oct 21, 2020 at 04:24:41PM +0000, donallen via Digitalmars-d wrote:
>> [...]
>
> Welcome aboard! ;-)
>
> [...]

I have some D bolted into unreal engine, to avoid upsetting the beast I malloc and copy every GC'd (not performance code so the GC is fine) buffer into the games allocator.
« First   ‹ Prev
1 2 3 4 5 6 7