[Issue 4650] New: Static data that must be scanned by the GC should be grouped
August 15, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4650

           Summary: Static data that must be scanned by the GC should be
                    grouped
           Product: D
           Version: D1 & D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: llucax@gmail.com


--- Comment #0 from Leandro Lucarella <llucax@gmail.com> 2010-08-15 16:56:49 PDT ---
Currently the GC scans all of the program's static data, since it uses the libc variables __data_start and _end to find its limits.
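
As a rough illustration of what that range looks like from C, here is a minimal sketch. It assumes a typical Linux/glibc toolchain, where __data_start and _end are visible as external symbols; the helper names static_data_words and in_static_data are hypothetical and are not part of any GC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Symbols bounding the static data on typical Linux/glibc toolchains.
   Assumption: other platforms may not provide them under these names. */
extern char __data_start;
extern char _end;

/* Hypothetical helper: number of pointer-sized words a conservative
   scan of the whole static data range would have to visit. */
size_t static_data_words(void)
{
    uintptr_t lo = (uintptr_t)&__data_start;
    uintptr_t hi = (uintptr_t)&_end;
    assert(lo <= hi);
    return (hi - lo) / sizeof(void *);
}

/* Hypothetical helper: does addr fall inside the range that would be
   scanned? Nothing here distinguishes D data from C library data. */
int in_static_data(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    return a >= (uintptr_t)&__data_start && a < (uintptr_t)&_end;
}
```

Any static variable linked into the executable, whatever library it comes from, should satisfy in_static_data, which is why everything between the two symbols ends up being scanned.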

There is a lot of stuff in the static data that doesn't need to be scanned, most notably the TypeInfos[*], which make up a large portion of the static data. C libraries' static data, for example, is scanned too, even though it makes no sense to do so.

I saw a 20% increase in the total static data of a small program[1] just by adding about 5 more types to the GC implementation, which translates into an appreciable loss in performance because of the extra scanning and probably the extra false pointers.

It would be nice if the compiler could group together all the static data that really must be scanned (the program's static variables) and make its limits available to the GC. It would be even nicer to leave static variables that have no pointers out of that group, and nicer still to create a pointer map like the one in the patch from bug 3463 to allow precise heap scanning. That way the only memory in the program that would have to be scanned conservatively would be the stack.
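
To make the difference between conservative and map-driven scanning concrete, here is a minimal sketch in C. It is not the format used by the patch in bug 3463: the scan_group layout, the one-bit-per-word map, and all function names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one group of static data:
   a run of words plus a bitmap with one bit per word. */
typedef struct {
    const uintptr_t *words;
    size_t nwords;
    const uint8_t *ptr_bits; /* bit i set => word i holds a pointer */
} scan_group;

/* Callback invoked for every word treated as a possible pointer. */
typedef void (*mark_fn)(uintptr_t candidate, void *ctx);

/* Example callback: just counts candidates (ctx points to a size_t). */
void count_mark(uintptr_t candidate, void *ctx)
{
    (void)candidate;
    (*(size_t *)ctx)++;
}

/* Conservative scan: every word is a candidate pointer. */
size_t scan_conservative(const scan_group *g, mark_fn mark, void *ctx)
{
    assert(g != NULL && mark != NULL);
    for (size_t i = 0; i < g->nwords; i++)
        mark(g->words[i], ctx);
    return g->nwords;
}

/* Precise scan: only words flagged in the map are visited. */
size_t scan_precise(const scan_group *g, mark_fn mark, void *ctx)
{
    size_t visited = 0;
    assert(g != NULL && mark != NULL);
    for (size_t i = 0; i < g->nwords; i++) {
        if (g->ptr_bits[i / 8] & (1u << (i % 8))) {
            mark(g->words[i], ctx);
            visited++;
        }
    }
    return visited;
}
```

With a map, integers that merely look like addresses are never passed to the callback, so they cannot act as false pointers.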

[*] This is not entirely true, since IIRC the TypeInfo stores the .init property, which can be overwritten by the user, storing a pointer to the GC heap there. But I think that is a rare enough case to be ignored; I don't think imposing that limitation would be a problem in real-life programs.

[1] http://codepad.org/xGDCS3KO

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 16, 2010



--- Comment #1 from Leandro Lucarella <llucax@gmail.com> 2010-08-15 17:30:32 PDT ---
I think you can omit the [*] entirely. I think .init being writable was, fortunately, just a product of my imagination...

August 16, 2010


nfxjfg@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nfxjfg@gmail.com


--- Comment #2 from nfxjfg@gmail.com 2010-08-15 18:56:24 PDT ---
Generally the GC should only scan data for which at least hasPointers() returns
true and which isn't logically constant (TypeInfo instances, for example, are
logically constant even though they can contain pointers/references).

The simplest implementation might be to add a pointer range to ModuleInfo that tells the GC exactly what should be scanned. Ideally, static variables for which hasPointers() is false would not be included in this range. This should drastically reduce the amount of data that needs to be scanned by the GC, because the C data segment is not included.
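
A minimal sketch of the idea, written in C for illustration. The module_info struct below is a hypothetical stand-in, not the real ModuleInfo layout, and the field and function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a ModuleInfo extended with a scan range. */
typedef struct {
    const char *name;
    const void *scan_begin; /* first byte of statics that may hold GC pointers */
    const void *scan_end;   /* one past the last byte; equals begin if none */
} module_info;

/* Total number of bytes the GC would scan across all modules. */
size_t bytes_to_scan(const module_info *mods, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t b = (uintptr_t)mods[i].scan_begin;
        uintptr_t e = (uintptr_t)mods[i].scan_end;
        assert(b <= e);
        total += (size_t)(e - b);
    }
    return total;
}

/* Returns 1 if addr lies inside some module's scan range, else 0. */
int is_scanned(const module_info *mods, size_t n, const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < n; i++) {
        if (a >= (uintptr_t)mods[i].scan_begin &&
            a < (uintptr_t)mods[i].scan_end)
            return 1;
    }
    return 0;
}
```

A module whose statics contain no pointers would simply publish an empty range, and data belonging to C libraries would never appear in the list at all.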

An extended implementation could accompany the pointer range with a precise pointer map.
