June 09, 2006
Re: GC, the simple solution
Walter Bright wrote:
> Jeff Parsons wrote:
>> My _only_ significant beef with D so far has been the unbounded time for 
>> a garbage collection; therefore, if an incremental garbage collector 
>> built into D could guarantee me a set maximum time would be taken for 
>> each garbage collection (which I would manually induce as often as 
>> possible) then unless the efficiency hit is TERRIBLE I would be all 
>> over it.
> 
> If you're looking for a *guarantee*, then you cannot even use malloc(). 
> Malloc has unbounded time, and is not predictable.

It seems that a fairly common desire is to use D for writing games. 
Games don't typically have hard real-time requirements, but it also 
doesn't look good if your screen freezes for a second or two in the 
middle of play.

This is an area where an incremental collector or some kind of reference 
counting solution would be handy, since (especially in large programs) 
it would be difficult to avoid using garbage collected code.  I realize 
that the programmer has the responsibility to manage memory 
intelligently, but it would be nice not to have to avoid using GC code 
entirely.  As others have stated, it would even be worth giving up some 
overall performance to gain a smooth playing experience.
June 09, 2006
Re: GC, the simple solution
David Gileadi wrote:
> Walter Bright wrote:
>> Jeff Parsons wrote:
>>> My _only_ significant beef with D so far has been the unbounded time for 
>>> a garbage collection; therefore, if an incremental garbage collector 
>>> built into D could guarantee me a set maximum time would be taken for 
>>> each garbage collection (which I would manually induce as often as 
>>> possible) then unless the efficiency hit is TERRIBLE I would be all 
>>> over it.
>>
>> If you're looking for a *guarantee*, then you cannot even use 
>> malloc(). Malloc has unbounded time, and is not predictable.
> 
> It seems that a fairly common desire is to use D for writing games. 
> Games don't typically have hard real-time requirements, but it also 
> doesn't look good if your screen freezes for a second or two in the 
> middle of play.
> 
> This is an area where an incremental collector or some kind of reference 
> counting solution would be handy, since (especially in large programs) 
> it would be difficult to avoid using garbage collected code.  I realize 
> that the programmer has the responsibility to manage memory 
> intelligently, but it would be nice not to have to avoid using GC code 
> entirely.  As others have stated, it would even be worth giving up some 
> overall performance to gain a smooth playing experience.


Didn't Walter point out that, in the current gc implementation, you 
just have to be conscientious about when you call 'new' (which appears 
to be what initiates a collect) if you want to avoid gc-initiated pauses?

So in game programming, I assume it's just a matter of good organization 
and planning.  You make sure you allocate sufficient memory ahead of 
time and avoid using 'new' in any location that would cause an ugly 
pause in the game.  Furthermore, you might choose appropriate times for 
when you want to call a full collect manually.
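For instance, something like this (just a sketch; the std.gc calls are 
what I'd reach for in current Phobos, and the pool type, sizes, and 
function names are made up):

  import std.gc;

  struct Particle { float x, y, dx, dy; }

  Particle[] pool;

  void setupPool()
  {
      pool = new Particle[4096];   // allocate everything up front
      std.gc.disable();            // no implicit collects mid-game
  }

  void betweenLevels()
  {
      std.gc.enable();
      std.gc.fullCollect();        // collect when nobody will notice
      std.gc.disable();
  }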

As I see it, the obligation for careful design doesn't disappear when 
a gc is in place.  It certainly makes life easier in many ways, but 
one still has to be aware of the gc's shortcomings and adapt to the 
situation.  I still see it as a tremendous improvement over having to 
clean up after every memory allocation you make (C/C++).

In short, using a gc doesn't mean you don't have to do your homework. :)

-JJR
June 09, 2006
Re: GC, the simple solution
John Reimer wrote:
> David Gileadi wrote:
>> Walter Bright wrote:
>>> Jeff Parsons wrote:
>>>> My _only_ significant beef with D so far has been the unbounded time 
>>>> for a garbage collection; therefore, if an incremental garbage 
>>>> collector built into D could guarantee me a set maximum time would 
>>>> be taken for each garbage collection (which I would manually induce 
>>>> as often as possible) then unless the efficiency hit is TERRIBLE I 
>>>> would be all over it.
>>>
>>> If you're looking for a *guarantee*, then you cannot even use 
>>> malloc(). Malloc has unbounded time, and is not predictable.
>>
>> It seems that a fairly common desire is to use D for writing games. 
>> Games don't typically have hard real-time requirements, but it also 
>> doesn't look good if your screen freezes for a second or two in the 
>> middle of play.
>>
>> This is an area where an incremental collector or some kind of 
>> reference counting solution would be handy, since (especially in large 
>> programs) it would be difficult to avoid using garbage collected 
>> code.  I realize that the programmer has the responsibility to manage 
>> memory intelligently, but it would be nice not to have to avoid using 
>> GC code entirely.  As others have stated, it would even be worth giving up 
>> some overall performance to gain a smooth playing experience.
> 
> 
> Didn't Walter point out that, in the current gc implementation, you 
> just have to be conscientious about when you call 'new' (which appears 
> to be what initiates a collect) if you want to avoid gc-initiated pauses?
> 

Yeah, in the current GC, allocation is the only thing that may implicitly 
initiate a collection.

> So in game programming, I assume it's just a matter of good organization 
> and planning.  You make sure you allocate sufficient memory ahead of 
> time and avoid using 'new' in any location that would cause an ugly 
> pause in the game.  Furthermore, you might choose appropriate times for 
> when you want to call a full collect manually.
> 

And D's easy array slicing, along with array operations (like 
re-initializing an array with 'arr[] = 0;'), makes code that uses 
pre-allocated buffers a lot easier to write, too.
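Something like this, say (untested, and the names are made up):

  float[] buf;                      // allocated once, up front

  void setup()
  {
      buf = new float[1024];
  }

  void frame(size_t n)
  {
      buf[] = 0;                    // re-initialize in place; no allocation
      float[] work = buf[0 .. n];   // slice off what this frame needs;
                                    // still no allocation
      // ... fill and use work ...
  }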

> As I see it, the obligation for careful design doesn't disappear when 
> a gc is in place.  It certainly makes life easier in many ways, but 
> one still has to be aware of the gc's shortcomings and adapt to the 
> situation.  I still see it as a tremendous improvement over having to 
> clean up after every memory allocation you make (C/C++).
> 
> In short, using a gc doesn't mean you don't have to do your homework. :)
> 
> -JJR
June 10, 2006
Re: GC, the simple solution
Sean Kelly wrote:
> Bruno Medeiros wrote:
>> Walter Bright wrote:
>>>> Only if the reference is stored does the ref counter have to be
>>>> incremented. This is only possible if it is done by the compiler.
>>>> No smart pointer can do this :)
>>>>
>>>>> 3) in a multithreaded app, the incs and decs must be synchronized
>>>>
>>>> Isn't an atomic inc/dec enough?
>>>
>>> That requires synchronizing.
>>>
>>
>> Are you sure? It says in 
>> http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that:
>> "In a multi-threaded system, if threads are preemptively scheduled, 
>> reference counting requires additional care to ensure that the updates 
>> to reference counts are atomic. This is straightforward using hardware 
>> primitives such as fetch-and-add, compare-and-swap, or 
>> load-linked/store-conditional, but it is definitely necessary to pay 
>> attention to this detail." ?
> 
> These are synchronizing operations.
> 
> 
> Sean

By "That requires synchronizing." I thought Walter meant stuff with 
mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware 
synchronizing operations be called "atomic inc/dec" ?


-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
June 10, 2006
Re: GC, the simple solution
Bruno Medeiros wrote:
> Sean Kelly wrote:
>> Bruno Medeiros wrote:
>>> Walter Bright wrote:
>>>>> Only if the reference is stored does the ref counter have to be
>>>>> incremented. This is only possible if it is done by the compiler.
>>>>> No smart pointer can do this :)
>>>>>
>>>>>> 3) in a multithreaded app, the incs and decs must be synchronized
>>>>>
>>>>> Isn't an atomic inc/dec enough?
>>>>
>>>> That requires synchronizing.
>>>>
>>>
>>> Are you sure? It says in 
>>> http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that:
>>> "In a multi-threaded system, if threads are preemptively scheduled, 
>>> reference counting requires additional care to ensure that the 
>>> updates to reference counts are atomic. This is straightforward using 
>>> hardware primitives such as fetch-and-add, compare-and-swap, or 
>>> load-linked/store-conditional, but it is definitely necessary to pay 
>>> attention to this detail." ?
>>
>> These are synchronizing operations.
> 
> By "That requires synchronizing." I thought Walter meant stuff with 
> mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware 
> synchronizing operations be called "atomic inc/dec" ?

For the most part.  There are really two issues here.  First, the 
operation must be atomic with respect to other threads.  The x86 
actually guarantees this for all load/store operations on aligned data 
<= the bus size.  Second, the operations must be observed in the proper 
order by other threads--i.e., the compiler and all involved CPUs (reader 
and writer) are not allowed to reorder the operations or do any other 
fancy tricks that may result in a load/store order that isn't the same 
as program order.  For the compiler, this is where volatile comes in, 
and at the hardware level, a synchronizing operation may be required 
depending on the situation and on the hardware involved.  Again, with 
x86 you can get away without a LOCK a lot of the time, but not always. 
So you really must assume that any competing operation has a worst case 
cost of... well, a lot.
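For the curious, an atomic increment on x86 boils down to something 
like this in DMD's inline assembler (a sketch from memory, x86 only, 
so double-check before relying on it):

  void atomicIncrement( int* p )
  {
      asm
      {
          mov EAX, p;           // EAX = address of the counter
          lock;                 // makes the following inc atomic
          inc int ptr [EAX];
      }
  }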


Sean
June 12, 2006
Re: GC, the simple solution
David Gileadi wrote:
> 
> It seems that a fairly common desire is to use D for writing games. 
> Games don't typically have hard real-time requirements, but it also 
> doesn't look good if your screen freezes for a second or two in the 
> middle of play.
> 
> This is an area where an incremental collector or some kind of reference 
> counting solution would be handy, since (especially in large programs) 
> it would be difficult to avoid using garbage collected code.  I realize 
> that the programmer has the responsibility to manage memory 
> intelligently, but it would be nice not to have to avoid using GC code 
> entirely.  As others have stated, it would even be worth giving up some 
> overall performance to gain a smooth playing experience.

This is what I'm screaming.

It is becoming popular nowadays to allow people to modify games in 
various ways, one of which is modding the game's code.  I don't think 
you will see much of this happening inside a 3d engine, but the game 
logic that sits next to the 3d engine and dictates gameplay may be open. 
 This allows people to change things like how the AI behaves, how much 
damage you take when hit with a certain weapon, how that certain weapon 
behaves, how an economy behaves, etc.

Modders may or may not be experienced programmers, so it's probably 
best to assume they don't have significant experience.  This has led 
plenty of games to use scripting languages for modding, since C++ is 
tough to learn and not well suited to this stuff.  I think D could do 
it, though.  It would kick butt if D could, because the mods would run 
much faster than the interpreted code that would otherwise be used, 
even if you had something like 50% reference-counting overhead.

Automatic memory management is a huge boon to the sort of easy code 
writing that lets inexperienced coders get in on the action and do 
constructive stuff.  Even without the modding argument, being able to 
use automatic memory management in some parts of the game would save a 
lot of development time.  It seems to me, though, that if you want a 
game that never pauses in D, you just use C-style memory management. 
Bummer.

As David Gileadi said, games don't have strict latency requirements. 
For games, garbage collection doesn't need to be deterministic, and it 
doesn't need to finish in under 100 microseconds; it just needs to not 
be noticeable.  For a few years I've messed with RunUO, which is, 
afaik, written entirely in C#, and I never noticed the pauses (though 
the worldsaves were annoying!).  This is doable.

Also, think of all the new D programmers there would be if someone were 
to release a popular game that exposed D as a scripting language :)
June 12, 2006
Re: GC, the simple solution
Sean Kelly wrote:
> Bruno Medeiros wrote:
>> Sean Kelly wrote:
>>> Bruno Medeiros wrote:
>>>> Walter Bright wrote:
>>>>>> Only if the reference is stored does the ref counter have to be
>>>>>> incremented. This is only possible if it is done by the compiler.
>>>>>> No smart pointer can do this :)
>>>>>>
>>>>>>> 3) in a multithreaded app, the incs and decs must be synchronized
>>>>>>
>>>>>> Isn't an atomic inc/dec enough?
>>>>>
>>>>> That requires synchronizing.
>>>>>
>>>>
>>>> Are you sure? It says in 
>>>> http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting 
>>>> that:
>>>> "In a multi-threaded system, if threads are preemptively scheduled, 
>>>> reference counting requires additional care to ensure that the 
>>>> updates to reference counts are atomic. This is straightforward 
>>>> using hardware primitives such as fetch-and-add, compare-and-swap, 
>>>> or load-linked/store-conditional, but it is definitely necessary to 
>>>> pay attention to this detail." ?
>>>
>>> These are synchronizing operations.
>>
>> By "That requires synchronizing." I thought Walter meant stuff with 
>> mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware 
>> synchronizing operations be called "atomic inc/dec" ?
> 
> For the most part.  There are really two issues here.  First, the 
> operation must be atomic with respect to other threads.  The x86 
> actually guarantees this for all load/store operations on aligned data 
> <= the bus size.  Second, the operations must be observed in the proper 
> order by other threads--i.e., the compiler and all involved CPUs (reader 
> and writer) are not allowed to reorder the operations or do any other 
> fancy tricks that may result in a load/store order that isn't the same 
> as program order.  For the compiler, this is where volatile comes in, 
> and at the hardware level, a synchronizing operation may be required 
> depending on the situation and on the hardware involved.  Again, with 
> x86 you can get away without a LOCK a lot of the time, but not always. 
> So you really must assume that any competing operation has a worst case 
> cost of... well, a lot.
> 
> 
> Sean

Ah, I see. I had recently read about these out-of-order execution 
problems in the Memory Barrier Wikipedia entry, and I got the 
impression (out of nowhere) that the hardware took care of them 
transparently (i.e., without program intervention).  But that is not 
the case then, or at least not always?

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
June 14, 2006
Re: GC, the simple solution
Bruno Medeiros wrote:
> 
> Ah, I see. I had recently read about these out-of-order execution 
> problems in the Memory Barrier Wikipedia entry, and I got the 
> impression (out of nowhere) that the hardware took care of them 
> transparently (i.e., without program intervention).  But that is not 
> the case then, or at least not always?

A single CPU is allowed to do whatever it wants so long as it can fool 
the user into thinking it's executing instructions in a purely 
sequential manner.  However, the only information other CPUs in the 
system have is what they observe on the system bus.  Obviously, so long 
as CPUs aren't sharing data there aren't any problems.  But things get 
sticky when this isn't the case.  The memory model defines observable 
behavior allowed for a given architecture, as well as any methods 
offered for affecting that behavior.

Say a specific architecture can operate much more quickly if it is 
allowed to perform writes to memory (from cache) in order of ascending 
address rather than in the order they were issued in code.  There's no 
way to lie to other CPUs about the order in which writes occur and still 
have the optimization have any effect, so the designers state in the 
memory model spec that this architecture is allowed to reorder writes 
and then typically provide some means for overriding this behavior 
should it prove necessary.

Now let's suppose you have two threads doing something like this:

  thread/CPU 1:

    A = 1;
    B = 2;

  thread/CPU 2:

    if( A == 1 )
    {
      assert( B == 2 );
    }

Given the order in which A and B are assigned, and therefore the order 
in which the writes are issued, this assert may or may not fail.

Enter memory barriers.  Memory barriers are simply a way for the 
programmer to tell the CPU "I don't care if your way is faster, A simply 
must be written to memory before B or thread 2 will explode."  So the 
CPU behind thread 1 does as it's told at great expense to performance 
and thread 2 doesn't melt down.
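In terms of the example above, the fix looks like this (pseudocode; 
writeBarrier() stands in for whatever primitive your platform offers):

  thread/CPU 1:

    A = 1;
    writeBarrier();  // A must be visible before B
    B = 2;

With that in place, the assert in thread 2 is safe (on hardware where 
loads aren't also reordered; otherwise thread 2 needs a read barrier 
of its own).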

The sticky part is that hardware designers don't agree with one another 
on how things should work and they never take the advice of the software 
people, so all architectures have different sets of observable behavior 
and different methods for working around it when necessary.  However, 
the common concept is that memory barriers all constrain the order in 
which memory accesses may occur with respect to each other.  Think of it 
as an assertion that "X may not occur before Y" or "X may not occur 
after Y" at the instruction level.

The x86 is actually a bit weird in this regard as it has no formal 
memory barriers for normal operations (though it has the FENCE 
instructions for SSE use).  I think this is largely for historical 
reasons--x86 PCs couldn't do SMP at all until fairly recently so none of 
this mattered, and the memory model has always been fairly strict (it 
was actually sequential until not terribly long ago).  Also, the LOCK 
instruction acts as a heavy-handed sort of memory barrier as well, so 
there has been little motivation to add new instructions for 
finer-grained control.
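Concretely, the toolbox looks something like this (DMD inline 
assembler; I believe it accepts the SSE/SSE2 fence opcodes, but treat 
this as a sketch):

  asm { sfence; }                      // stores before it complete first (SSE)
  asm { lfence; }                      // likewise for loads (SSE2)
  asm { mfence; }                      // full barrier, loads and stores (SSE2)
  asm { lock; add int ptr [ESP], 0; }  // the classic LOCKed no-op;
                                       // also a full barrier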


Sean
June 14, 2006
Re: GC, the simple solution
Sean Kelly wrote:
> Bruno Medeiros wrote:
>>
>> Ah, I see. I had recently read about these out-of-order execution
>> problems in the Memory Barrier Wikipedia entry, and I got the
>> impression (out of nowhere) that the hardware took care of them
>> transparently (i.e., without program intervention).  But that is
>> not the case then, or at least not always?
> 
> A single CPU is allowed to do whatever it wants so long as it can fool
> the user into thinking it's executing instructions in a purely
> sequential manner.  However, the only information other CPUs in the
> system have is what they observe on the system bus.  Obviously, so long
> as CPUs aren't sharing data there aren't any problems.  But things get
> sticky when this isn't the case.  The memory model defines observable
> behavior allowed for a given architecture, as well as any methods
> offered for affecting that behavior.
> 
> Say a specific architecture can operate much more quickly if it is
> allowed to perform writes to memory (from cache) in order of ascending
> address rather than in the order they were issued in code.  There's no
> way to lie to other CPUs about the order in which writes occur and still
> have the optimization have any effect, so the designers state in the
> memory model spec that this architecture is allowed to reorder writes
> and then typically provide some means for overriding this behavior
> should it prove necessary.
> 
> Now let's suppose you have two threads doing something like this:
> 
>   thread/CPU 1:
> 
>     A = 1;
>     B = 2;
> 
>   thread/CPU 2:
> 
>     if( A == 1 )
>     {
>       assert( B == 2 );
>     }
> 
> Given the order in which A and B are assigned, and therefore the order
> in which the writes are issued, this assert may or may not fail.
> 
> Enter memory barriers.  Memory barriers are simply a way for the
> programmer to tell the CPU "I don't care if your way is faster, A simply
> must be written to memory before B or thread 2 will explode."  So the
> CPU behind thread 1 does as it's told at great expense to performance
> and thread 2 doesn't melt down.
> 
> The sticky part is that hardware designers don't agree with one another
> on how things should work and they never take the advice of the software
> people, so all architectures have different sets of observable behavior
> and different methods for working around it when necessary.  However,
> the common concept is that memory barriers all constrain the order in
> which memory accesses may occur with respect to each other.  Think of it
> as an assertion that "X may not occur before Y" or "X may not occur
> after Y" at the instruction level.
> 
> The x86 is actually a bit weird in this regard as it has no formal
> memory barriers for normal operations (though it has the FENCE
> instructions for SSE use).  I think this is largely for historical
> reasons--x86 PCs couldn't do SMP at all until fairly recently so none of
> this mattered, and the memory model has always been fairly strict (it
> was actually sequential until not terribly long ago).  Also, the LOCK
> instruction acts as a heavy-handed sort of memory barrier as well, so
> there has been little motivation to add new instructions for
> finer-grained control.
> 
> 
> Sean

Cool; learn something new every day.  Thanks for the informative post.

	-- Daniel

-- 
Unlike Knuth, I have neither proven nor tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
June 14, 2006
Re: GC, the simple solution
Sean Kelly wrote:
> Bruno Medeiros wrote:
>>
>> Ah, I see. I had recently read about these out-of-order execution 
>> problems in the Memory Barrier Wikipedia entry, and I got the 
>> impression (out of nowhere) that the hardware took care of them 
>> transparently (i.e., without program intervention).  But that is 
>> not the case then, or at least not always?
> 
> A single CPU is allowed to do whatever it wants so long as it can fool 
> the user into thinking it's executing instructions in a purely 
> sequential manner.  However, the only information other CPUs in the 
> system have is what they observe on the system bus.  Obviously, so long 
> as CPUs aren't sharing data there aren't any problems.  But things get 
> sticky when this isn't the case.  The memory model defines observable 
> behavior allowed for a given architecture, as well as any methods 
> offered for affecting that behavior.
> 
> Say a specific architecture can operate much more quickly if it is 
> allowed to perform writes to memory (from cache) in order of ascending 
> address rather than in the order they were issued in code.  There's no 
> way to lie to other CPUs about the order in which writes occur and still 
> have the optimization have any effect, so the designers state in the 
> memory model spec that this architecture is allowed to reorder writes 
> and then typically provide some means for overriding this behavior 
> should it prove necessary.
> 
> Now let's suppose you have two threads doing something like this:
> 
>   thread/CPU 1:
> 
>     A = 1;
>     B = 2;
> 
>   thread/CPU 2:
> 
>     if( A == 1 )
>     {
>       assert( B == 2 );
>     }
> 
> Given the order in which A and B are assigned, and therefore the order 
> in which the writes are issued, this assert may or may not fail.
> 
> Enter memory barriers.  Memory barriers are simply a way for the 
> programmer to tell the CPU "I don't care if your way is faster, A simply 
> must be written to memory before B or thread 2 will explode."  So the 
> CPU behind thread 1 does as it's told at great expense to performance 
> and thread 2 doesn't melt down.
> 
> The sticky part is that hardware designers don't agree with one another 
> on how things should work and they never take the advice of the software 
> people, so all architectures have different sets of observable behavior 
> and different methods for working around it when necessary.  However, 
> the common concept is that memory barriers all constrain the order in 
> which memory accesses may occur with respect to each other.  Think of it 
> as an assertion that "X may not occur before Y" or "X may not occur 
> after Y" at the instruction level.
> 
> The x86 is actually a bit weird in this regard as it has no formal 
> memory barriers for normal operations (though it has the FENCE 
> instructions for SSE use).  I think this is largely for historical 
> reasons--x86 PCs couldn't do SMP at all until fairly recently so none of 
> this mattered, and the memory model has always been fairly strict (it 
> was actually sequential until not terribly long ago).  Also, the LOCK 
> instruction acts as a heavy-handed sort of memory barrier as well, so 
> there has been little motivation to add new instructions for 
> finer-grained control.
> 
> 
> Sean

Nice post!
Makes me think: how does one keep up with this? I mean, someone who 
isn't (and doesn't wish to be) a hardware expert, but who wants to 
follow the general developments in this area and maintain an overview of it.

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D