June 09, 2006
Walter Bright wrote:
> Jeff Parsons wrote:
>> My _only_ significant beef with D so far as been the unbound time for a garbage collection; therefore, if an incremental garbage collector built into D could guarantee me a set maximum time would be taken for each garbage collection (which I would manually induce as often as possible) then unless the efficiency hit is TERRIBLE I would be all over it.
> 
> If you're looking for a *guarantee*, then you cannot even use malloc(). Malloc has unbounded time, and is not predictable.

It seems that a fairly common desire is to use D for writing games. Games don't typically have hard real-time requirements, but it also doesn't look good if your screen freezes for a second or two in the middle of play.

This is an area where an incremental collector or some kind of reference counting solution would be handy, since (especially in large programs) it would be difficult to avoid using garbage collected code.  I realize that the programmer has the responsibility to manage memory intelligently, but it would be nice not to have to avoid GC code entirely.  As others have stated, it would even be worth giving up some overall performance to gain a smooth playing experience.
June 09, 2006
David Gileadi wrote:
> Walter Bright wrote:
>> Jeff Parsons wrote:
>>> My _only_ significant beef with D so far as been the unbound time for a garbage collection; therefore, if an incremental garbage collector built into D could guarantee me a set maximum time would be taken for each garbage collection (which I would manually induce as often as possible) then unless the efficiency hit is TERRIBLE I would be all over it.
>>
>> If you're looking for a *guarantee*, then you cannot even use malloc(). Malloc has unbounded time, and is not predictable.
> 
> It seems that a fairly common desire is to use D for writing games. Games don't typically have hard real-time requirements, but it also doesn't look good if your screen freezes for a second or two in the middle of play.
> 
> This is an area where an incremental collector or some kind of reference counting solution would be handy, since (especially in large programs) it would be difficult to avoid using garbage collected code.  I realize that the programmer has the responsibility to manage memory intelligently, but it would be nice to not avoid using GC code entirely.  As others have stated, it would even be worth giving up some overall performance to gain a smooth playing experience.


Didn't Walter point out that, in the current gc implementation, you just have to be conscientious about when you call 'new' (which appears to initiate a collect) if you want to avoid gc-initiated pauses?

So in game programming, I assume it's just a matter of good organization and planning.  You make sure you allocate sufficient memory ahead of time and avoid using 'new' anywhere it would cause an ugly pause in the game.  Furthermore, you might choose appropriate times to call a full collect manually.
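For instance (just a sketch, assuming the std.gc interface in current Phobos, i.e. fullCollect() and friends; the function name is made up):

  import std.gc;   // Phobos GC control

  // Called from a loading screen or between levels, where a pause is
  // acceptable.  During play we simply avoid 'new', so nothing will
  // trigger a collection implicitly.
  void betweenLevels()
  {
      std.gc.fullCollect();   // take the pause here, on our own terms
  }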

As I see it, the obligation for careful design doesn't disappear when a gc is in place.  It certainly makes life easier in many ways.  But one still has to be aware of the gc shortcomings and adapt to the situation.  I still see it as a tremendous improvement over having to clean up after every memory allocation you make (C/C++).

In short, using a gc doesn't mean you don't have to do your homework. :)

-JJR
June 09, 2006
John Reimer wrote:
> David Gileadi wrote:
>> Walter Bright wrote:
>>> Jeff Parsons wrote:
>>>> My _only_ significant beef with D so far as been the unbound time for a garbage collection; therefore, if an incremental garbage collector built into D could guarantee me a set maximum time would be taken for each garbage collection (which I would manually induce as often as possible) then unless the efficiency hit is TERRIBLE I would be all over it.
>>>
>>> If you're looking for a *guarantee*, then you cannot even use malloc(). Malloc has unbounded time, and is not predictable.
>>
>> It seems that a fairly common desire is to use D for writing games. Games don't typically have hard real-time requirements, but it also doesn't look good if your screen freezes for a second or two in the middle of play.
>>
>> This is an area where an incremental collector or some kind of reference counting solution would be handy, since (especially in large programs) it would be difficult to avoid using garbage collected code.  I realize that the programmer has the responsibility to manage memory intelligently, but it would be nice to not avoid using GC code entirely.  As others have stated, it would even be worth giving up some overall performance to gain a smooth playing experience.
> 
> 
> Didn't Walter point out that that, in the current gc implementation, you just have to be conscientious about when you call 'new' (which appears to initiate a collect) if you want to avoid the gc initiated pauses.
> 

Yeah, in the current GC, allocation is the only thing that may implicitly initiate a collection.

> So in game programming, I assume it's just a matter of good organization and planning.  You make sure you allocate sufficient memory ahead of time and avoid using 'new' in any location that would cause an ugly pause in the game.  Furthermore, you might choose appropriate times for when you want to call a full collect manually.
> 

And D's easy array slicing, along with array operations (like re-initializing an array with 'arr[] = 0;'), makes code that reuses pre-allocated buffers a lot easier to write too.
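Something along these lines (just a sketch; the names are made up):

  int[] buffer;   // filled once, up front

  void init()
  {
      buffer = new int[4096];   // the only 'new', outside the game loop
  }

  void perFrame(size_t count)
  {
      int[] scratch = buffer[0 .. count];   // slicing: no allocation
      scratch[] = 0;                        // re-initialize in place
      // ... fill and use scratch ...
  }

No allocation in the frame loop means no chance of the GC kicking in mid-frame.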

> As I see it, the obligation for careful design doesn't disappear when a gc is in place.  It certainly makes life easier in many ways.  But one still has to be aware of the gc shortcomings and adapt to the situation.  I still see it as a tremendous improvement over having to clean up after every memory allocation you make (C/C++).
> 
> In short, using a gc doesn't mean you don't have to do your homework. :)
> 
> -JJR
June 10, 2006
Sean Kelly wrote:
> Bruno Medeiros wrote:
>> Walter Bright wrote:
>>>> Only if the reference is stored,
>>>> the ref counter has to be incremented. This is only possible if this is
>>>> done by the compiler. No smart pointer can do this :)
>>>>
>>>>> 3) in a multithreaded app, the incs and decs must be synchronized
>>>>
>>>> Isn't a atomic inc/dec enough?
>>>
>>> That requires synchronizing.
>>>
>>
>> Are you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that:
>> "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ?
> 
> These are synchronizing operations.
> 
> 
> Sean

By "That requires synchronizing." I thought Walter meant stuff with mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware synchronizing operations be called "atomic inc/dec" ?


-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
June 10, 2006
Bruno Medeiros wrote:
> Sean Kelly wrote:
>> Bruno Medeiros wrote:
>>> Walter Bright wrote:
>>>>> Only if the reference is stored,
>>>>> the ref counter has to be incremented. This is only possible if this is
>>>>> done by the compiler. No smart pointer can do this :)
>>>>>
>>>>>> 3) in a multithreaded app, the incs and decs must be synchronized
>>>>>
>>>>> Isn't a atomic inc/dec enough?
>>>>
>>>> That requires synchronizing.
>>>>
>>>
>>> Are you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that:
>>> "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ?
>>
>> These are synchronizing operations.
> 
> By "That requires synchronizing." I thought Walter meant stuff with mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware synchronizing operations be called "atomic inc/dec" ?

For the most part.  There are really two issues here.  First, the operation must be atomic with respect to other threads.  The x86 actually guarantees this for all load/store operations on aligned data <= the bus size.  Second, the operations must be observed in the proper order by other threads--i.e., the compiler and all involved CPUs (reader and writer) are not allowed to reorder the operations or do any other fancy tricks that may result in a load/store order that isn't the same as program order.  For the compiler, this is where volatile comes in, and at the hardware level, a synchronizing operation may be required depending on the situation and on the hardware involved.  Again, with x86 you can get away without a LOCK a lot of the time, but not always.  So you really must assume that any competing operation has a worst-case cost of... well, a lot.
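For example, the inc side of a ref count might look something like this on x86-32 with DMD's inline assembler (just a sketch; the LOCK prefix makes the read-modify-write atomic and also acts as a full barrier on x86):

  // Atomically increments *count and returns the new value.
  uint atomicIncrement(uint* count)
  {
      asm
      {
          mov ECX, count;    // pointer to the counter
          mov EAX, 1;
          lock;
          xadd [ECX], EAX;   // memory gets old+1, EAX gets the old value
          inc EAX;           // new value, returned in EAX
      }
  }

Cheap when uncontended, but that LOCK still costs a lot more than a plain INC, and it gets worse the more CPUs are fighting over the same cache line.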


Sean
June 12, 2006
David Gileadi wrote:
> 
> It seems that a fairly common desire is to use D for writing games. Games don't typically have hard real-time requirements, but it also doesn't look good if your screen freezes for a second or two in the middle of play.
> 
> This is an area where an incremental collector or some kind of reference counting solution would be handy, since (especially in large programs) it would be difficult to avoid using garbage collected code.  I realize that the programmer has the responsibility to manage memory intelligently, but it would be nice to not avoid using GC code entirely.  As others have stated, it would even be worth giving up some overall performance to gain a smooth playing experience.

This is what I'm screaming.

It is becoming popular nowadays to allow people to modify games in various ways, one of which is modding the game's code.  I don't think you will see much of this happening inside a 3d engine, but the game logic that sits next to the 3d engine and dictates gameplay may be open.  This allows people to change things like how the AI behaves, how much damage you take when hit with a certain weapon, how that certain weapon behaves, how an economy behaves, etc.

Modders may or may not be experienced programmers, so it's probably best to assume they don't have significant experience.  That's why plenty of games use scripting languages for modding: C++ is tough to learn and not well suited to this stuff.  I think D could do it, though, and it would kick butt, because the mods would run much faster than the interpreted code that's probably going to be used otherwise, even with something like 50% reference-counting overhead.

Automatic memory management is a huge boon to the kind of easy code writing that lets inexperienced coders get in on the action and do constructive stuff.  Even without the modding argument, being able to use automatic memory management in some parts of the game would save a lot of development time.  It seems to me, though, that if you want a non-pausing game in D today, you just have to fall back on C-style memory management.  Bummer.

As David Gileadi said, games don't have strict latency requirements.  For games, garbage collection doesn't need to be deterministic, and it doesn't need to finish in under 100 microseconds; it just needs to not be noticeable.  For a few years I've messed with RunUO, which is, afaik, written entirely in C#, and I never noticed the pauses (though the worldsaves were annoying!).  This is doable.

Also, think of all the new D programmers there would be if someone were to release a popular game that exposed D as a scripting language :)
June 12, 2006
Sean Kelly wrote:
> Bruno Medeiros wrote:
>> Sean Kelly wrote:
>>> Bruno Medeiros wrote:
>>>> Walter Bright wrote:
>>>>>> Only if the reference is stored,
>>>>>> the ref counter has to be incremented. This is only possible if this is
>>>>>> done by the compiler. No smart pointer can do this :)
>>>>>>
>>>>>>> 3) in a multithreaded app, the incs and decs must be synchronized
>>>>>>
>>>>>> Isn't a atomic inc/dec enough?
>>>>>
>>>>> That requires synchronizing.
>>>>>
>>>>
>>>> Are you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that:
>>>> "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ?
>>>
>>> These are synchronizing operations.
>>
>> By "That requires synchronizing." I thought Walter meant stuff with mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware synchronizing operations be called "atomic inc/dec" ?
> 
> For the most part.  There are really two issues here.  First, the operation must be atomic with respect to other threads.  The x86 actually guarantees this for all load/store operations on aligned data <= the bus size.  Second, the operations must be observed in the proper order by other threads--ie. the compiler and all involved CPUs (reader and writer) are not allowed to reorder the operations or do any other fancy tricks that may result in a load/store order that isn't the same as program order.  For the compiler, this is where volatile comes in, and at the hardware level, a synchronizing operation may be required depending on the situation and on the hardware involved.  Again, with x86 you can get away without a LOCK a lot of the time, but not always. So you really must assume that any competing operation has a worst case cost of... well, a lot.
> 
> 
> Sean

Ah, I see. I had recently read about out-of-order execution problems in the Memory Barrier Wikipedia entry, and I got the impression (out of nowhere) that the hardware took care of that transparently (i.e., without program intervention), but that is not the case then, or at least not always?

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
June 14, 2006
Bruno Medeiros wrote:
> 
> Ah, I see. I had recently read about relative out-of-order execution problems in the Memory Barrier wikipedia entry, and I got the impression (out of nowhere) that the hardware took care of that transparently (i.e., without program intervention), but then, that is not the case, not allways at least?

A single CPU is allowed to do whatever it wants so long as it can fool the user into thinking it's executing instructions in a purely sequential manner.  However, the only information other CPUs in the system have is what they observe on the system bus.  Obviously, so long as CPUs aren't sharing data there aren't any problems.  But things get sticky when this isn't the case.  The memory model defines observable behavior allowed for a given architecture, as well as any methods offered for affecting that behavior.

Say a specific architecture can operate much more quickly if it is allowed to perform writes to memory (from cache) in order of ascending address rather than in the order they were issued in code.  There's no way to lie to other CPUs about the order in which writes occur and still have the optimization have any effect, so the designers state in the memory model spec that this architecture is allowed to reorder writes and then typically provide some means for overriding this behavior should it prove necessary.

Now let's suppose you have two threads doing something like this:

  thread/CPU 1:

    A = 1;
    B = 2;

  thread/CPU 2:

    if( A == 1 )
    {
      assert( B == 2 );
    }

Given the order in which A and B were declared--and therefore their addresses, which on this hypothetical architecture determine the order in which the writes reach memory--this assert may or may not fail.

Enter memory barriers.  Memory barriers are simply a way for the programmer to tell the CPU "I don't care if your way is faster, A simply must be written to memory before B or thread 2 will explode."  So the CPU behind thread 1 does as it's told at great expense to performance and thread 2 doesn't melt down.
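In D terms the thread 1 side might look something like this (just a sketch, x86-32 assumed; the 'volatile' statement handles the compiler half and the LOCK'ed no-op the hardware half):

  int A, B;   // shared with thread 2, both initially 0

  void thread1()
  {
      volatile { A = 1; }                    // no compiler reordering across this
      asm { lock; or dword ptr [ESP], 0; }   // poor man's store barrier on x86
      volatile { B = 2; }
  }

(This only constrains ordering, of course--thread 2 can still see A == 0 if it simply runs first.)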

The sticky part is that hardware designers don't agree with one another on how things should work and they never take the advice of the software people, so all architectures have different sets of observable behavior and different methods for working around it when necessary.  However, the common concept is that memory barriers all constrain the order in which memory accesses may occur with respect to each other.  Think of it as an assertion that "X may not occur before Y" or "X may not occur after Y" at the instruction level.

The x86 is actually a bit weird in this regard as it has no formal memory barriers for normal operations (though it has the FENCE instructions for SSE use).  I think this is largely for historical reasons--x86 PCs couldn't do SMP at all until fairly recently so none of this mattered, and the memory model has always been fairly strict (it was actually sequential until not terribly long ago).  Also, the LOCK instruction acts as a heavy-handed sort of memory barrier as well, so there has been little motivation to add new instructions for finer-grained control.


Sean
June 14, 2006

Sean Kelly wrote:
> Bruno Medeiros wrote:
>>
>> Ah, I see. I had recently read about relative out-of-order execution problems in the Memory Barrier wikipedia entry, and I got the impression (out of nowhere) that the hardware took care of that transparently (i.e., without program intervention), but then, that is not the case, not allways at least?
> 
> A single CPU is allowed to do whatever it wants so long as it can fool the user into thinking it's executing instructions in a purely sequential manner.  However, the only information other CPUs in the system have is what they observe on the system bus.  Obviously, so long as CPUs aren't sharing data there aren't any problems.  But things get sticky when this isn't the case.  The memory model defines observable behavior allowed for a given architecture, as well as any methods offered for affecting that behavior.
> 
> Say a specific architecture can operate much more quickly if it is allowed to perform writes to memory (from cache) in order of ascending address rather than in the order they were issued in code.  There's no way to lie to other CPUs about the order in which writes occur and still have the optimization have any effect, so the designers state in the memory model spec that this architecture is allowed to reorder writes and then typically provide some means for overriding this behavior should it prove necessary.
> 
> Now let's suppose you have two threads doing something like this:
> 
>   thread/CPU 1:
> 
>     A = 1;
>     B = 2;
> 
>   thread/CPU 2:
> 
>     if( A == 1 )
>     {
>       assert( B == 2 );
>     }
> 
> Given the order in which a and b were declared, and therefore the order in which the writes occur, this assert may or may not fail.
> 
> Enter memory barriers.  Memory barriers are simply a way for the programmer to tell the CPU "I don't care if your way is faster, A simply must be written to memory before B or thread 2 will explode."  So the CPU behind thread 1 does as it's told at great expense to performance and thread 2 doesn't melt down.
> 
> The sticky part is that hardware designers don't agree with one another on how things should work and they never take the advice of the software people, so all architectures have different sets of observable behavior and different methods for working around it when necessary.  However, the common concept is that memory barriers all constrain the order in which memory accesses may occur with respect to each other.  Think of it as an assertion that "X may not occur before Y" or "X may not occur after Y" at the instruction level.
> 
> The x86 is actually a bit weird in this regard as it has no formal memory barriers for normal operations (though it has the FENCE instructions for SSE use).  I think this is largely for historical reasons--x86 PCs couldn't do SMP at all until fairly recently so none of this mattered, and the memory model has always been fairly strict (it was actually sequential until not terribly long ago).  Also, the LOCK instruction acts as a heavy-handed sort of memory barrier as well, so there has been little motivation to add new instructions for finer-grained control.
> 
> 
> Sean

Cool; learn something new every day.  Thanks for the informative post.

	-- Daniel

-- 
Unlike Knuth, I have neither proven nor tried the above; it may not even make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
June 14, 2006
Sean Kelly wrote:
> Bruno Medeiros wrote:
>>
>> Ah, I see. I had recently read about relative out-of-order execution problems in the Memory Barrier wikipedia entry, and I got the impression (out of nowhere) that the hardware took care of that transparently (i.e., without program intervention), but then, that is not the case, not allways at least?
> 
> A single CPU is allowed to do whatever it wants so long as it can fool the user into thinking it's executing instructions in a purely sequential manner.  However, the only information other CPUs in the system have is what they observe on the system bus.  Obviously, so long as CPUs aren't sharing data there aren't any problems.  But things get sticky when this isn't the case.  The memory model defines observable behavior allowed for a given architecture, as well as any methods offered for affecting that behavior.
> 
> Say a specific architecture can operate much more quickly if it is allowed to perform writes to memory (from cache) in order of ascending address rather than in the order they were issued in code.  There's no way to lie to other CPUs about the order in which writes occur and still have the optimization have any effect, so the designers state in the memory model spec that this architecture is allowed to reorder writes and then typically provide some means for overriding this behavior should it prove necessary.
> 
> Now let's suppose you have two threads doing something like this:
> 
>   thread/CPU 1:
> 
>     A = 1;
>     B = 2;
> 
>   thread/CPU 2:
> 
>     if( A == 1 )
>     {
>       assert( B == 2 );
>     }
> 
> Given the order in which a and b were declared, and therefore the order in which the writes occur, this assert may or may not fail.
> 
> Enter memory barriers.  Memory barriers are simply a way for the programmer to tell the CPU "I don't care if your way is faster, A simply must be written to memory before B or thread 2 will explode."  So the CPU behind thread 1 does as it's told at great expense to performance and thread 2 doesn't melt down.
> 
> The sticky part is that hardware designers don't agree with one another on how things should work and they never take the advice of the software people, so all architectures have different sets of observable behavior and different methods for working around it when necessary.  However, the common concept is that memory barriers all constrain the order in which memory accesses may occur with respect to each other.  Think of it as an assertion that "X may not occur before Y" or "X may not occur after Y" at the instruction level.
> 
> The x86 is actually a bit weird in this regard as it has no formal memory barriers for normal operations (though it has the FENCE instructions for SSE use).  I think this is largely for historical reasons--x86 PCs couldn't do SMP at all until fairly recently so none of this mattered, and the memory model has always been fairly strict (it was actually sequential until not terribly long ago).  Also, the LOCK instruction acts as a heavy-handed sort of memory barrier as well, so there has been little motivation to add new instructions for finer-grained control.
> 
> 
> Sean

Nice post!
Makes me think: how does one keep up with all this? I mean, someone who isn't (and doesn't wish to be) a hardware expert, but who wants to follow the general developments in this area and keep an overview of it.

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D