April 14, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #90 from bearophile_hugs@eml.cc 2011-04-14 03:58:52 PDT ---
(In reply to comment #89)
> (In reply to comment #88)
> 
> In order to support C compatibility, untagged unions must be supported by the type system and the GC.

Right, but this doesn't touch much what I have said.
Many programs that use unions contain some mean at runtime to tell what's
inside the union. This information is what onGC() uses.
Where the onGC() union method is not present, or the union contents are
otherwise unknown, the GC may fall back to being not precise and conservative.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 14, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3463


Steven Schveighoffer <schveiguy@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy@yahoo.com


--- Comment #91 from Steven Schveighoffer <schveiguy@yahoo.com> 2011-04-14 05:16:41 PDT ---
(In reply to comment #87)
> The idea, as I understand it, is to supply a bit mask of where the pointers are. For me, the difficulties are:
> 
> 1. distinguishing real pointers from might-be-a-pointer (such as you might get from union { int a; void* p; }).

This is only a problem for a moving GC.  In practice, unions are not very common, so the conservative scanning for unions should be very small compared to the other cases.

> 
> 2. large static arrays, large structs/classes, structs with large static array members, etc.
> 
> The amount of static data dedicated to such bit arrays could get very large.

Yes and no.  Consider right now (although I think David fixed this), we
allocate a bit for every 16 bytes of a page, even if the whole page is a block.
 Even for that, the ratio is very small (1 bit per 16 bytes == 1/128).

I think in reality, the only common case for potentially large static data is arrays.  I think the code needs some way of saying "here is an array of X elements with this bitmask".  I think that should cover most cases.

> I see two solutions:
> 
> 1. if it is case (1) or (2), give up for that type and revert to the current
> method of scanning that object

This is exactly the opposite of what you want -- large arrays and structs are the biggest problem for the conservative GC.  Whether this translates to static arrays, I'm not sure.

> 2. devise a 'state machine' instead that the gc executes for a type. The state machine has instructions like "advance n bytes to the next pointer" and "the next pointer is ambiguous" and "execute the following sequence n times."

This could be a viable solution, but I think all we need would be one descriptor type:

offset x has bitmask Y repeated N times.

where the offset is optional, and describes the offset from the previous bitmask.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 14, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #92 from David Simcha <dsimcha@yahoo.com> 2011-04-14 06:04:04 PDT ---
(In reply to comment #91)
> Yes and no.  Consider right now (although I think David fixed this), we allocate a bit for every 16 bytes of a page, even if the whole page is a block.

Yes, I did fix this.  Now it's every 16 bytes for small (<1 page) allocations
and every page for large (>= 1 page) allocations.

> > 2. devise a 'state machine' instead that the gc executes for a type. The state machine has instructions like "advance n bytes to the next pointer" and "the next pointer is ambiguous" and "execute the following sequence n times."
> 
> This could be a viable solution, but I think all we need would be one descriptor type:
> 
> offset x has bitmask Y repeated N times.
> 
> where the offset is optional, and describes the offset from the previous bitmask.

This is roughly what I did in my initial patches.  The comments at the top of the file describe it in more detail.  See http://d.puremagic.com/issues/attachment.cgi?id=488&action=edit

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 14, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #93 from Leandro Lucarella <llucax@gmail.com> 2011-04-14 06:21:29 PDT ---
You can take a look at my concurrent D GC (CDGC), which is also precise. It is
based on the work done by nfxjfg@gmail.com (which is based on the work done by
David Simcha).

Here is the very modest (and not so useful) project page:
http://www.llucax.com.ar/proj/dgc/

Remember Druntime use to have a branch with CDGC merged: http://www.dsource.org/projects/druntime/browser/branches/CDGC

And here is a mini HOWTO to try CDGC (in Tango), but now that is merged, it's outdated if you use Tango trunk (in which case you just have to call bob with -g=cdgc), but in that post you can find the compiler patches too (which I think are the exact same nfxjfg@gmail.com did): http://llucax.com.ar/blog/blog/post/-4d1fefb4

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 14, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #94 from Walter Bright <bugzilla@digitalmars.com> 2011-04-14 11:54:00 PDT ---
(In reply to comment #92)
> This is roughly what I did in my initial patches.  The comments at the top of the file describe it in more detail.  See http://d.puremagic.com/issues/attachment.cgi?id=488&action=edit

I think that covers things, except for handling ambiguous pointers. But I think it still consumes a lot of static storage. I suggest a state machine, which will consume roughly 1 byte per pointer in the type.

The state machine is a list of instructions, one byte each:

00 end of state machine
01 pointer at this offset, advance offset by size_t
02 advance offset by size_t, pointer here, advance by size_t
03 advance offset by 2*size_t, pointer here, advance by size_t
..
0F advance offset by 0F*size_t, pointer here, advance by size_t
10 advance offset by the following two bytes
11 ambiguous pointer at this offset, advance offset by size_t
12 execute loop nn times, where nn is the next byte. loop starts after nn
13 same as 12, but nn is the value in the next 2 bytes
14 end of loop (loops can nest)
15 the following size_t bytes are a pointer to another state machine
to call like a function

Or something like that. I think the state machine would be very compact, and would be fast to execute. A function can be added to TypeInfo to return the state machine for that type. The compiler would generate the data for the state machine.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 14, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #95 from Steven Schveighoffer <schveiguy@yahoo.com> 2011-04-14 12:25:48 PDT ---
(In reply to comment #94)
> 
> I think that covers things, except for handling ambiguous pointers.

Can you explain why we care about ambiguous pointers?  That is, shouldn't we just always consider that an ambiguous pointer is a pointer?  Why do we need a separate designation?

> But I think
> it still consumes a lot of static storage. I suggest a state machine, which
> will consume roughly 1 byte per pointer in the type.
> 

I think a state machine would work.  I especially like the ability to call another state machine program, that helps tremendously where types include other types, and inheritance.

To optimize the state machine a bit, it might be good to include a "compression" feature where if you have just a dense bunch of pointer/value types, you give the state machine a bitmask.

probably the worst case is an array of array references (which alternate
between pointer and length).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 14, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #96 from Leandro Lucarella <llucax@gmail.com> 2011-04-14 12:32:49 PDT ---
(In reply to comment #95)
> (In reply to comment #94)
> > 
> > I think that covers things, except for handling ambiguous pointers.
> 
> Can you explain why we care about ambiguous pointers?  That is, shouldn't we just always consider that an ambiguous pointer is a pointer?  Why do we need a separate designation?

To allow moving collectors (you can't update a pointer that's not really a
pointer).

Anyway, this problem is taking care of in the implementation by nfxjfg, which Walter seem to keep ignoring, as he always did with this bug report and all the useful insight it have. I'm not saying the solution is the best, but it's completely ignored for no (publicly know reason).

I'll unsubscribe now from this bug report, as it really makes me sick =)

If you want to contact me, your know where to find me...

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 14, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #97 from Andrei Alexandrescu <andrei@metalanguage.com> 2011-04-14 13:00:36 PDT ---
(In reply to comment #96)
> (In reply to comment #95)
> > (In reply to comment #94)
> > > 
> > > I think that covers things, except for handling ambiguous pointers.
> > 
> > Can you explain why we care about ambiguous pointers?  That is, shouldn't we just always consider that an ambiguous pointer is a pointer?  Why do we need a separate designation?
> 
> To allow moving collectors (you can't update a pointer that's not really a
> pointer).
> 
> Anyway, this problem is taking care of in the implementation by nfxjfg, which Walter seem to keep ignoring, as he always did with this bug report and all the useful insight it have. I'm not saying the solution is the best, but it's completely ignored for no (publicly know reason).
> 
> I'll unsubscribe now from this bug report, as it really makes me sick =)
> 
> If you want to contact me, your know where to find me...

That's a fairly random thing to say, particularly now that this bug report is again attracting interest. Walter has expressed a concern about the proposed implementation, which implies he's definitely not ignoring it. If you wanted to release some pent-up frustration accumulated from the past, fine. Otherwise, the reaction is rather irrational. It would be in everybody's interest (and it would definitely make you feel better) if you continued adding value to this discussion and code.

Generally, after having accumulated more information on conservative vs. precise collectors, I think D must aim at being as precise as possible - no ifs and buts. So this report and patch must be taken to completion. I'll get back with some more comments.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 14, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #98 from Andrei Alexandrescu <andrei@metalanguage.com> 2011-04-14 13:45:23 PDT ---
Though I understand its attractiveness, I oppose the state machine approach because it is hopelessly specific. After taking to completion this nontrivial task we'll have (a) one DSL to worry about, (b) a significant piece of infrastructure that only does one thing.

The ideal approach would be to improve introspection. At the end of the day each type (including array types) has a specific topology, which should be accessible to the garbage collector and not only. It would be great if the garbage collector would get the TypeInfo for each chunk of memory and use its introspection information for navigating pointers.

The work on improving introspection should be done anyway.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 14, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #99 from Walter Bright <bugzilla@digitalmars.com> 2011-04-14 13:50:21 PDT ---
(In reply to comment #98)
> The work on improving introspection should be done anyway.

The trouble with using runtime introspection for this is it'll be slow, and the scanning of data needs to be really fast.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------