April 15, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #120 from Vladimir <thecybershadow@gmail.com> 2011-04-14 19:50:08 PDT ---
(In reply to comment #118)
> > I hope it is as you say it is, but without benchmarks it's hard to say anything, and this talk of state machines etc. is disconcerting.
> 
> Why?

It "sounds" slower. This is subjective and unscientific. That's why I said only a benchmark will show the real results.

> Also, even if the compiler emits the tables necessary for more precise gc, the gc implementation can ignore them and do it the old way. Emitting the tables makes it possible for people to experiment with various kinds of gc strategies.

I would like to avoid bloating executables any further, and to avoid giving reverse engineers more data about my code. (I know executable size doesn't affect performance, at least in theory, but it does matter and can't be completely neglected either.)

> True, and it works tolerably well. To do a moving gc, however, you need more precise information.

I don't want a moving GC. I want a fast GC.

("I" in this context means D users with the same requirements, mainly video
game developers.)

I understand the advantages of a moving GC - heap compaction allowing for an overall smaller managed heap etc., but I hope you understand that sacrificing speed for these goals is not a unilateral improvement for everyone.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 15, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3463



--- Comment #121 from David Simcha <dsimcha@yahoo.com> 2011-04-14 19:59:28 PDT ---
(In reply to comment #120)
> I understand the advantages of a moving GC - heap compaction allowing for an overall smaller managed heap etc., but I hope you understand that sacrificing speed for these goals is not a unilateral improvement for everyone.

Since we don't have any hard benchmarks conveniently available, let's assume for the sake of argument that a precise/moving/etc. GC would be slower.  Your case is a niche case and calls for a niche garbage collector implementation. D's GC is designed (IIRC) to allow selecting an implementation at link time. Eventually, someone could write a decent low-latency GC optimized for small heaps for game programmers.  In the meantime, we could fork the current one. The stock GC, though, should handle the common cases, not a few game programmers' corner case.

Leandro Lucarella <llucax@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |llucax@gmail.com


--- Comment #122 from Leandro Lucarella <llucax@gmail.com> 2011-04-14 20:06:11 PDT ---
(In reply to comment #121)
> (In reply to comment #120)
> > I understand the advantages of a moving GC - heap compaction allowing for an overall smaller managed heap etc., but I hope you understand that sacrificing speed for these goals is not a unilateral improvement for everyone.
> 
> Since we don't have any hard benchmarks conveniently available, let's assume for the sake of argument that a precise/moving/etc. GC would be slower.  Your case is a niche case and calls for a niche garbage collector implementation. D's GC is designed (IIRC) to allow selecting an implementation at link time. Eventually, someone could write a decent low-latency GC optimized for small heaps for game programmers.  In the meantime, we could fork the current one. The stock GC, though, should handle the common cases, not a few game programmers' corner case.

You mean like this one (except it's not optimized for small heaps, just for small pauses, and it works)?
http://llucax.com.ar/blog/blog/post/-1a4bdfba

PS: Yeah, for some reason I still get the e-mails even though I removed myself from the CC list =/

--- Comment #123 from Vladimir <thecybershadow@gmail.com> 2011-04-14 20:09:13 PDT ---
(In reply to comment #121)
> Your case is a niche case and calls for a niche garbage collector implementation.

I would like to ask you to reconsider that opinion. Please take into consideration the size of the video game industry, including that for consoles and mobile devices. Also, keep in mind that a good amount of hype regarding D was generated by video games written in it - starting with Kenta Cho's old OpenGL ones, as well as that commercial D strategy game I can't find at the moment.

--- Comment #124 from Walter Bright <bugzilla@digitalmars.com> 2011-04-14 20:26:14 PDT ---
(In reply to comment #122)
> PS: Yeah, for some reason I still get the e-mails even though I removed myself from the CC list =/

Just when I thought I was out... they pull me back in!
   -- Michael Coreclone, The D Father Part III

--- Comment #125 from Walter Bright <bugzilla@digitalmars.com> 2011-04-14 20:33:11 PDT ---
(In reply to comment #120)
> I understand the advantages of a moving GC - heap compaction allowing for an overall smaller managed heap etc., but I hope you understand that sacrificing speed for these goals is not a unilateral improvement for everyone.

In a gc, speed really is the paramount consideration. (The problem with excessive memory consumption is speed, after all.)

Unfortunately, the speed of any gc strategy cannot be determined in advance. One has to try it, profile it, and tune it to see if it is an overall improvement.

The theoretical speed improvement of more precise gc comes from:

1. the collector doing less work because it knows where the actual pointers are rather than having to scan for them

2. not following and scanning memory that is falsely pointed to by an integer

3. reducing the working set of memory, meaning less work for scanning

The theoretical speed decrease comes from:

1. more work to read the pointer tables

2. possibly a large hit from using a virtual function

How this will play out will require actually trying it.
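
The two lists above can be made concrete with a toy model. This is a sketch in Python, not D, and not how druntime's collector works: `ToyHeap`, `scan_conservative`, and `scan_precise` are hypothetical names, and the per-type pointer bitmap stands in for the tables a compiler would emit. It scans a single root without tracing transitively, which is just enough to show an integer keeping a block alive under conservative scanning (improvement point 2) and the bitmap lookup a precise scan pays for (decrease point 1).

```python
from dataclasses import dataclass, field


@dataclass
class ToyHeap:
    """Hypothetical heap of equal-sized blocks at consecutive fake addresses."""
    base: int = 0x1000
    block_size: int = 16
    num_blocks: int = 4
    marked: set = field(default_factory=set)

    def block_of(self, word):
        """Return the index of the block that `word` points into, or None."""
        offset = word - self.base
        if 0 <= offset < self.block_size * self.num_blocks:
            return offset // self.block_size
        return None


def scan_conservative(heap, words):
    # No type information: every word that looks like a heap address is followed.
    for word in words:
        block = heap.block_of(word)
        if block is not None:
            heap.marked.add(block)


def scan_precise(heap, words, pointer_bitmap):
    # Bit i of pointer_bitmap says whether word i is a pointer; consulting
    # this table is the extra per-word work a precise scan pays for.
    for i, word in enumerate(words):
        if (pointer_bitmap >> i) & 1:
            block = heap.block_of(word)
            if block is not None:
                heap.marked.add(block)


# Word 0 is a real pointer into block 0; word 1 is an integer (say, a hash)
# whose value happens to fall inside block 2.
root = [0x1000, 0x1025]

conservative = ToyHeap()
scan_conservative(conservative, root)

precise = ToyHeap()
scan_precise(precise, root, pointer_bitmap=0b01)

print(sorted(conservative.marked), sorted(precise.marked))  # [0, 2] [0]
```

With type information the false reference to block 2 is ignored, so that block could be reclaimed; whether the saved scanning outweighs the table lookups is exactly what has to be measured.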

Jacob Carlborg <doob@me.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |doob@me.com


--- Comment #126 from Jacob Carlborg <doob@me.com> 2011-04-15 00:27:53 PDT ---
Why not just add an additional garbage collector with this new implementation, leave the old one as it is, and let developers choose which one to use at link time?

Sean Cavanaugh <WorksOnMyMachine@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |WorksOnMyMachine@gmail.com


--- Comment #127 from Sean Cavanaugh <WorksOnMyMachine@gmail.com> 2011-04-15 01:40:35 PDT ---
(In reply to comment #120)
> (In reply to comment #118)
> > True, and it works tolerably well. To do a moving gc, however, you need more precise information.
> 
> I don't want a moving GC. I want a fast GC.
> 
> ("I" in this context means D users with the same requirements, mainly video
> game developers.)
> 
> I understand the advantages of a moving GC - heap compaction allowing for an overall smaller managed heap etc., but I hope you understand that sacrificing speed for these goals is not a unilateral improvement for everyone.

I am a game developer, and this thread is fairly fascinating to me, as memory management and good support for Intel SSE2 (and AVX) or PowerPC VMX are two of the biggest issues for me when considering alternative languages, or the question of 'will this language be suitable in the future'.  The SSE problem seems workable by putting the heavy math in C++ DLLs exposed through extern "C", which leaves the GC as a big 'what does this mean' when evaluating the landscape.

The reality is that a lot of game engines allocate a surprising amount of memory at run time.  The speed of malloc itself is rarely an issue, as most searches take a reasonably similar amount of time.  The real problems with heavy use of malloc are thread lock contention in the allocator, and fragmentation.  Fragmentation causes two problems: large allocations failing when memory is low (say, a 1 MB allocation failing when 30 MB is 'free'), and virtual pages that cannot be reclaimed because of a stray allocation or two within the page.
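
The "1 MB fails while 30 MB is free" case is plain arithmetic once the free space is scattered. A minimal Python sketch, assuming hypothetical 64 KB pages and one stray allocation pinning every 16th page of a 32 MB region; `fragmentation_report` is an illustrative name, not an existing API:

```python
PAGE_KB = 64  # hypothetical page size


def fragmentation_report(page_in_use):
    """Return (free pages, longest run of contiguous free pages)."""
    free = page_in_use.count(False)
    longest = run = 0
    for in_use in page_in_use:
        run = 0 if in_use else run + 1
        longest = max(longest, run)
    return free, longest


# 32 MB region: every 16th page is pinned by a single stray allocation.
pages = ([False] * 15 + [True]) * 32
free, longest = fragmentation_report(pages)

print(free * PAGE_KB // 1024, "MB free")                 # 30 MB free
print(longest * PAGE_KB, "KB largest contiguous")        # 960 KB largest contiguous
print("1 MB request fits:", longest * PAGE_KB >= 1024)   # False
```

The same 32 stray allocations also keep 32 pages resident, which is the second problem described above.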

Lock contention is solved by making separate heaps.  Fragmentation is also fought by separating the heaps, and by organizing the allocations coherently, either time-wise or by allocation type, with like-sized objects pooled into a special pool for objects of that size.  As a bonus, fixed-size object pools have constant-time allocation, except when the pool has to grow, but we try really hard to pre-size these to the worst-case values.  On my last project we had about 8 dlmalloc-based heaps and 15 fixed-size allocator pools to solve these problems.
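
A fixed-size pool gets its constant-time allocation from a free list threaded through the unused slots. A minimal Python sketch, with a hypothetical `FixedPool` that hands out slot indices rather than real memory; it doubles when exhausted, which is the slow path that pre-sizing is meant to avoid:

```python
class FixedPool:
    """Hypothetical pool of equal-sized slots handed out by index.

    alloc() and free() are O(1) via an intrusive free list; only growing
    the pool (when it runs out) costs time proportional to its size.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # next_free[i] is the slot after i on the free list; -1 ends the list.
        self.next_free = list(range(1, capacity)) + [-1]
        self.head = 0
        self.grows = 0

    def alloc(self):
        if self.head == -1:
            self._grow()  # the non-constant-time case
        slot = self.head
        self.head = self.next_free[slot]
        return slot

    def free(self, slot):
        self.next_free[slot] = self.head  # push the slot back onto the free list
        self.head = slot

    def _grow(self):
        old = len(self.next_free)
        new = old * 2
        # Link the new slots old..new-1 into a fresh free list.
        self.next_free.extend(list(range(old + 1, new)) + [-1])
        self.head = old
        self.grows += 1


pool = FixedPool(capacity=2)
a = pool.alloc()  # 0
b = pool.alloc()  # 1
pool.free(a)
c = pool.alloc()  # 0 again: the most recently freed slot is reused first
d = pool.alloc()  # pool exhausted, so it doubles to 4 slots and returns 2
```

Reusing the most recently freed slot also tends to keep like-sized objects packed together, which is part of why such pools help against fragmentation.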

I would greatly prefer a GC that compacts the heap to keep peak memory down, because in embedded (console) environments memory is a constant but time is fungible.  VM might be available on these environments, but it isn't going to be backed by disk.  Instead, the idea of the VM is that it is a tool to fight fragmentation of the underlying physical pages, and to help you get contiguous space to work with.  There is also pressure to use larger pages (64 KB, 1 MB, 4 MB) to keep TLB lookups fast, which makes fragmentation hurt even more. Tiny allocations holding onto these big pages prevent them from being reclaimed, which makes moving those allocations somewhere better pretty important.
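
Compaction releases those pinned pages by sliding live objects together. A minimal Python sketch of only the planning step, assuming hypothetical 64 KB pages and (address, size) pairs for live objects; `plan_sliding_compaction` and `pages_touched` are illustrative names. A real moving collector must also rewrite every pointer to each moved object, which is why it needs precise pointer information:

```python
PAGE_SIZE = 64 * 1024  # hypothetical 64 KB pages


def pages_touched(objects):
    """Set of page numbers overlapped by (address, size) spans."""
    return {
        page
        for addr, size in objects
        for page in range(addr // PAGE_SIZE, (addr + size - 1) // PAGE_SIZE + 1)
    }


def plan_sliding_compaction(objects):
    """Assign each live object a new address, packed from address 0 in
    address order. Returns {old_address: new_address}."""
    forwarding = {}
    free_ptr = 0
    for addr, size in sorted(objects):
        forwarding[addr] = free_ptr
        free_ptr += size
    return forwarding


# Four tiny 64-byte objects, each stranded alone on its own page.
live = [(0, 64), (PAGE_SIZE, 64), (2 * PAGE_SIZE, 64), (3 * PAGE_SIZE, 64)]
forwarding = plan_sliding_compaction(live)
moved = [(forwarding[addr], size) for addr, size in live]

print(len(pages_touched(live)), "->", len(pages_touched(moved)))  # 4 -> 1
```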

Now the good news is that a large share of a game's resources do not need to be allocated in a garbage-collected space.  For the most part, any data you send to the GPU is far better off being written into its memory system and left alone.  Physics data and audio data behave similarly for the most part, and can be allocated through malloc or aligned forms of malloc (for SSE-friendly data).

So from a game developer's point of view, I need to know when the GC will run, either by configuration or by manually driving it.  Both allow me to run a frame with most of the AI and physics disabled to give more of the time to the collector.  A panic GC pass that I wasn't expecting is acceptable, provided I get notified of it, as I would expect this to be an indicator that memory is getting tight, to the point that an out-of-memory condition is imminent.  A panic GC is a QA problem, as we can tell them where and how often they are occurring, and they can in turn tell the designers making the art/levels that they need to start trimming the memory usage a bit.

Ideally the GC would be able to run in less time than a single frame (say 10-15ms for a 30fps game).  Taking away some amount of time every frame is also acceptable.  For example spending 1ms of every frame to do 1ms worth of data movement or analysis for compacting would be a reasonable thing to allow, even if it were in addition to multi-millisecond spikes at some interval (every 30 frames, every 30 seconds, whatever).  Making the whole thing friendly to having lots of CPU cores wouldn't hurt either.
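
Spending a fixed slice of every frame on collection amounts to incremental marking. A minimal Python sketch, assuming a hypothetical `IncrementalMarker` over a dictionary object graph, with a step count standing in for the 1 ms budget (a real version would check a clock such as `time.perf_counter()`). It assumes the graph does not change between slices; a real incremental collector needs write barriers to stay correct while the game keeps mutating objects:

```python
from collections import deque


class IncrementalMarker:
    """Hypothetical incremental mark phase: a bounded slice of work per frame.

    Assumes the graph is not mutated between slices.
    """

    def __init__(self, graph, roots):
        self.graph = graph              # object -> list of objects it references
        self.marked = set(roots)        # reachable so far
        self.worklist = deque(roots)    # marked, but children not yet scanned

    def step(self, max_steps):
        """Scan at most max_steps objects; return True once marking is done."""
        for _ in range(max_steps):
            if not self.worklist:
                return True
            obj = self.worklist.popleft()
            for child in self.graph.get(obj, ()):
                if child not in self.marked:
                    self.marked.add(child)
                    self.worklist.append(child)
        return not self.worklist


graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": ["a"]}
marker = IncrementalMarker(graph, roots=["a"])

frames = 0
done = False
while not done:
    done = marker.step(max_steps=1)  # one object per frame in this toy
    frames += 1

garbage = set(graph) - marker.marked
print(frames, sorted(marker.marked), sorted(garbage))  # 4 ['a', 'b', 'c', 'd'] ['e']
```

Only once marking finishes is it safe to treat the unmarked objects ("e" here) as garbage.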

--- Comment #128 from Vladimir <thecybershadow@gmail.com> 2011-04-15 02:14:17 PDT ---
(In reply to comment #127)

Thank you for your insight!

> So from a game developer's point of view, I need to know when the GC will run, either by configuration or by manually driving it.

You can disable automatic garbage collection and manually invoke a collection right now.
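
In D this corresponds to `GC.disable()` and `GC.collect()` from `core.memory`; as far as I know, the documentation notes that collections may still happen when the runtime deems them necessary, for example when memory runs out. Keeping to one language for these examples, here is the same scheduling pattern sketched with Python's `gc` module. That module only governs Python's cycle collector (reference counting still frees most objects immediately), so this is an analogy rather than a D recipe; `run_frames` and `collect_every` are hypothetical names:

```python
import gc


def run_frames(num_frames, collect_every, simulate_frame):
    """Run a frame loop with automatic collection off, collecting only at
    frame boundaries the game chooses. Returns the number of collections."""
    was_enabled = gc.isenabled()
    gc.disable()  # no automatic collections in the middle of a frame
    collections = 0
    try:
        for frame in range(num_frames):
            simulate_frame(frame)
            if (frame + 1) % collect_every == 0:
                gc.collect()  # explicit collection at a point of our choosing
                collections += 1
    finally:
        if was_enabled:
            gc.enable()  # restore the previous setting
    return collections


print(run_frames(90, collect_every=30, simulate_frame=lambda frame: None))  # 3
```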

> Both allow me to run a frame with most of the AI and physics disabled to give more of the time to the collector.

This won't work for multiplayer games where the game state must be kept in sync on all sides.

> Ideally the GC would be able to run in less time than a single frame (say
> 10-15ms for a 30fps game).

Moving GCs are bound to be slower than the current one, but heap compaction probably doesn't need to happen as often as a simple GC run to reclaim memory.

> Taking away some amount of time every frame is also acceptable.
> For example spending 1ms of every frame to do 1ms worth of data
> movement or analysis for compacting would be a reasonable thing to allow,

The current GC doesn't support incremental runs. Jeremie Pelletier wrote a garbage collector some time ago that can do a shallow scan and only collect objects with no immediate references: http://pastebin.com/f7a3b4c4a

July 18, 2011


Trass3r <mrmocool@gmx.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mrmocool@gmx.de


--- Comment #129 from Trass3r <mrmocool@gmx.de> 2011-07-18 05:15:44 PDT ---
What's the status of this? Why is every patch marked obsolete?
