July 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #50 from Leandro Lucarella <llucax@gmail.com> 2010-07-25 17:30:20 PDT ---
(In reply to comment #48)
> It looks like findPool() might be used much more often than before? For
> example, I noticed the mixin code calls findPool() very early, so maybe it's
> being called in some situations where it's not necessary. Also, with the patch
> sizeOf() needs to call findPool(), but I don't think that should make a lot of
> difference (unless is used by the runtime very often for array manipulation).

Seeing the callgraph, 95% of the findPool() calls are being made directly from
mark(), 4.5% from fullcollectshell().

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #51 from Leandro Lucarella <llucax@gmail.com> 2010-07-25 18:10:01 PDT ---
(In reply to comment #48)
> Here are the profiling results (using perf[1]):
> Non-precise scanning: http://pastebin.com/zRKyc9mW
> Precise scanning:     http://pastebin.com/jPJZLL8p
> 
> It looks like findPool() might be used much more often than before? For
> example, I noticed the mixin code calls findPool() very early, so maybe it's
> being called in some situations where it's not necessary. Also, with the patch
> sizeOf() needs to call findPool(), but I don't think that should make a lot of
> difference (unless is used by the runtime very often for array manipulation).

OTOH, note that all 3 functions have a *lot* of more samples in the precise case, they are doing a lot more of work. It doesn't look like the code added for precise scanning should make *that much* extra work, so that made me think, what if the overhead for storing the type information was significant enough to make the program eat more memory and those functions were doing more work because there is more memory to scan and not because scanning takes longer? And it looks like that might be the problem:

Precise scanning:
Maximum resident set size, in Kilobytes = 1289984
Elapsed real time, in seconds           = 68.56

Non precise scanning:
Maximum resident set size, in Kilobytes = 950400
Elapsed real time, in seconds           = 49.74

Note this pattern:
memory: 950400/1289984 = 0.7367533
time:   49.74/68.56    = 0.7254959

Scary close :)

Maybe dil uses a lot of small objects, and/or a lot of 16/32/etc objects, that doubles the memory needed to store them because of the bitmask pointer.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #52 from Leandro Lucarella <llucax@gmail.com> 2010-07-25 18:45:10 PDT ---
OK, here is the other side of the coin, a small fabricated benchmark (actually stolen from the NG, sightly modified by me to make false pointers much more likely):

// Written by Babele Dunnit <babele.dunnit@gmail.com>
// Found at
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=54084
// Sightly modified by Leandro Lucarella <llucax@gmail.com>
// (some readability improvements and output removed)

const IT = 300;
const N1 = 20_000;
const N2 = 40_000;

class Individual
{
        Individual[20] children;
        int[100] data = void;
        this() {
                foreach (i; data)
                        data[i] += (cast(int)&children) - i;
        }
}

class Population
{

        void grow()
        {
                foreach(inout individual; individuals)
                {
                        individual = new Individual;
                }
        }

        Individual[N1] individuals;
}

version = loseMemory;

int main(char[][] args)
{

        Population testPop1 = new Population;
        Population testPop2 = new Population;

        Individual[] indi = new Individual[N1];

        for(int i = 0; i < IT; i++)
        {
                testPop1.grow();
                testPop2.grow();

                version (loseMemory){
                        indi[] = testPop1.individuals ~ testPop2.individuals;
                }

                version (everythingOk){
                        indi[0..N1] = testPop1.individuals;
                        indi[N1..N2] = testPop2.individuals;
                }
        }

        return 0;
}

Here are the results:

Precise:
Maximum resident set size, in Kilobytes = 160192
Elapsed real time, in seconds           = 31.72
(seeing the memory consumption, it was very stable all the time)

Non-precise:
Maximum resident set size, in Kilobytes = 2202400
Elapsed real time, in seconds           = 97.02
(I have to kill it because it was eating up all my memory)

So maybe precise scanning should be configurable (but when not enabled, it shouldn't store the bitmask in the memory block to avoid the extra heap space)...

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3463


nfxjfg@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #695 is|0                           |1
           obsolete|                            |


--- Comment #53 from nfxjfg@gmail.com 2010-07-25 19:08:48 PDT ---
Created an attachment (id=698)
D1 - patch for dmd for creating pointer bitmasks

Some changes to make the patch behave under 64 bit mode. Under 32 bits, the compiler should produce exactly the same as the patch before. Under 64 bit, both native 64 bit mode and cross compiling to 32 bit should work. The trick is that PTRSIZE is a variable, not a #define.


@Leandro:

(to comment #52)
I'm glad that this patch actually seems to help!
What do you think, is compile time configurability enough, or should it be
possible to chose at runtime?

(to comment #51)
I'm not sure what to do about this. One could try to store the bitmap inside
the memory block with just 1-2 bytes to save space. But in most cases, size
alignment would make that useless. Also possible: use the same bitmask for each
page that is subdivided into bins. It's well possible that due to additional
fragmentation this would use much more memory than it saves, though.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #54 from Leandro Lucarella <llucax@gmail.com> 2010-07-25 19:47:26 PDT ---
(In reply to comment #53)
> Created an attachment (id=698) [details]
> D1 - patch for dmd for creating pointer bitmasks
> 
> Some changes to make the patch behave under 64 bit mode. Under 32 bits, the compiler should produce exactly the same as the patch before. Under 64 bit, both native 64 bit mode and cross compiling to 32 bit should work. The trick is that PTRSIZE is a variable, not a #define.

Nice! I really hope the DMD patch is accepted :)

> @Leandro:
> 
> (to comment #52)
> I'm glad that this patch actually seems to help!
> What do you think, is compile time configurability enough, or should it be
> possible to chose at runtime?

I think something in between is the best choice. In the GC I'm working on, I've implemented runtime configurability but at startup-time, using environment variables. I find that being a good tradeoff, since you can make assumptions that would be true for all the program's lifetime and you don't have to recompile (the runtime!) to play with GC settings. But I thinks that doesn't belong to this patch, it can be added in the future :)

For now, I think that compile-time configurability is acceptable, but it would be better to not store type information at all if precise scanning is not enabled (otherwise I think it would hurt more than not disabling it).

> (to comment #51)
> I'm not sure what to do about this. One could try to store the bitmap inside
> the memory block with just 1-2 bytes to save space. But in most cases, size
> alignment would make that useless. Also possible: use the same bitmask for each
> page that is subdivided into bins. It's well possible that due to additional
> fragmentation this would use much more memory than it saves, though.

Yes, I think it's hard to improve things, at least under the current GC design, but the important thing (for me) is the compiler patch, not the runtime, because it open the doors for serious GC research. I think the runtime can be rethought and a good design takes advantage of this extra information should be possible in the future.

Thanks a lot for the good patches!

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #55 from Sean Kelly <sean@invisibleduck.org> 2010-07-25 20:45:32 PDT ---
If/when this makes it into D2 I'm going to store the info in a parallel array kind of like the bits are.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #56 from Leandro Lucarella <llucax@gmail.com> 2010-07-26 06:40:21 PDT ---
(In reply to comment #55)
> If/when this makes it into D2 I'm going to store the info in a parallel array kind of like the bits are.

Will you store a pointer to the pointer bitmask? In that case you'll have an overhead of 4096/16 = 256, 256*4 = 1KiB (32 bits; 2KiB for 64) for each allocated page, even when a page might have all its data marked as NO_SCAN. That's 25% overhead (50% for 64 bits). Even if that overhead is non-GC'ed memory, so it won't get scanned or anything, it looks like too much. If you plan to store it only when necessary and being dependent on the bin size, it would be very hard to address, I think.

I'll try storing the bits directly for small bins (16..128 for 32 bits, ..256 for 64) first, it seems easier and it looks like it could help.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 28, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #57 from Leandro Lucarella <llucax@gmail.com> 2010-07-27 21:19:01 PDT ---
I think there is a not-so-important bug in the DMD patch, the bits.length value looks like it needs to be divided by size_t.sizeof (which is odd, since in the patch it looks like it's already done in setSize()).

See this test program:

---
extern (C) int printf(char*, ...);

struct Test
{
        int x;
        void* p;
        char[15] s;
        Test* t;
        long l;
        union U {
                int* pi;
                int  i;
        }
        U u;
}

void main()
{
        Test* t = new Test;
        printf("sizeof = %zu\n", Test.sizeof);
        auto pm = typeid(Test).pointermap.bits;
        printf("PointerMap: ptr = %p, length = %zu, T words = %zu, scan bits =
%zx, ptr bits = %zx\n",
                        pm.ptr, pm.length, pm[0], pm[1], pm[2]);
}
---

The output is:

PointerMap: ptr = 0x805ab1c, length = 12, T words = 10, scan bits = 242, ptr bits = 42

All looks nice except for the length value, which should be 3 instead of 12.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 28, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #58 from Leandro Lucarella <llucax@gmail.com> 2010-07-27 21:23:34 PDT ---
And another small comment about the Tango runtime patch, you add a binSize() function, but there is already a binsize[] array for the same purpose, you can remove the binSize(bin) function and replace it with binsize[bin].

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 28, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3463


nfxjfg@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #698 is|0                           |1
           obsolete|                            |


--- Comment #59 from nfxjfg@gmail.com 2010-07-27 22:49:51 PDT ---
Created an attachment (id=700)
D1 - patch for dmd for creating pointer bitmasks

You're right, I stupidly write the byte length, not the array length.
This patch fixes it and also eliminates two warnings about void* arithmetic.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------