February 25, 2012
On 2/25/2012 4:08 PM, Paulo Pinto wrote:
> On 25.02.2012 21:26, Peter Alexander wrote:
>> On Saturday, 25 February 2012 at 20:13:42 UTC, so wrote:
>>> On Saturday, 25 February 2012 at 18:47:12 UTC, Nick Sabalausky wrote:
>>>
>>>> Interesting. I wish he'd elaborate on why it's not an option for his
>>>> daily
>>>> work.
>>>
>>> Not the design but the implementation, memory management would be the
>>> first.
>>
>> Memory management is not a problem. You can manage memory just as easily
>> in D as you can in C or C++. Just don't use global new, which they'll
>> already be doing.
>
> I couldn't agree more.
>
> The GC issue comes around often, but I personally think that the main
> issue is that the GC needs to be optimized, not that manual memory
> management is required.
>
> Most standard compiler malloc()/free() implementations are actually
> slower than most advanced GC algorithms.
>

Games do basically everything in 33.3 or 16.7 ms intervals (30 or 60 fps respectively). 20 fps and lower is doable, but the input gets extra laggy very easily, and it is visually choppy.

Ideally GC needs to run in a real-time manner, say periodically every 10 or 20 seconds and taking at most 10ms.  Continuously would be better, something like 1-2ms of overhead spread out over 16 or 32 ms.  Also, a periodic GC that freezes everything needs to run at a predictable/controllable time, so you can do things like skip AI updates for that frame and keep the frame from being 48ms or worse.

These time constraints are going to limit the size of a GC heap to whatever can be collected within that budget, bounded by the slower of memory speed and computation speed, until the GC can be made into some variety of real-time collector. This is less of a problem for games, because you can always allocate non-GC memory with malloc/free or store your textures and meshes exclusively in video memory as d3d/opengl resources.

The fact that malloc/free and refcounting take longer in total is largely meaningless, because the cost is spread out. If the perf of malloc/free is a problem you can always make more heaps, as the main cost is usually lock contention.

The STL containers are pretty much unusable due to how much memory they waste, how many allocations they require, and the inability to replace their allocators in any meaningful way that allows you to use fixed-size block allocators. Hashes, for instance, require multiple different kinds of allocations, but they are all forced to go through the same allocator. Also, the STL containers tend to allocate huge amounts of slack that is hard to get rid of.
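
For readers who have not seen one, here is a minimal sketch of the kind of fixed-size block allocator meant above. The class name `FixedBlockPool` and its interface are made up for illustration and are not taken from any particular engine or library; it assumes single-threaded use and one block size per pool.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool (illustrative only, not thread-safe).
// All memory is reserved up front; allocate() and release() are O(1)
// free-list operations that never call malloc/free.
class FixedBlockPool {
public:
    FixedBlockPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          storage_(block_size_ * block_count),
          free_list_(nullptr),
          free_count_(block_count) {
        // Thread every block onto an intrusive singly linked free list,
        // back to front, so the first allocate() returns block 0.
        for (std::size_t i = block_count; i-- > 0;) {
            free_list_ = new (storage_.data() + i * block_size_) Node{free_list_};
        }
    }

    FixedBlockPool(const FixedBlockPool&) = delete;
    FixedBlockPool& operator=(const FixedBlockPool&) = delete;

    // Returns one block, or nullptr when the pool is exhausted.
    void* allocate() {
        if (free_list_ == nullptr) return nullptr;
        Node* node = free_list_;
        free_list_ = node->next;
        --free_count_;
        return node;
    }

    // Returns a block to the pool; p must come from allocate() on this pool.
    void release(void* p) {
        assert(owns(p));
        free_list_ = new (p) Node{free_list_};
        ++free_count_;
    }

    std::size_t free_blocks() const { return free_count_; }

private:
    struct Node { Node* next; };

    // Blocks must be able to hold a Node and stay suitably aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    // True if p points at the start of a block inside this pool.
    bool owns(const void* p) const {
        const unsigned char* b = static_cast<const unsigned char*>(p);
        if (b < storage_.data() || b >= storage_.data() + storage_.size()) return false;
        return static_cast<std::size_t>(b - storage_.data()) % block_size_ == 0;
    }

    std::size_t block_size_;
    std::vector<unsigned char> storage_;
    Node* free_list_;
    std::size_t free_count_;
};
```

A hash table built this way would keep one pool for its nodes and a separate allocation for its bucket array, which is exactly the split that a single container-wide allocator makes awkward.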

Type traits and algorithms are about the only usable parts of the STL.

February 25, 2012
On Sat, Feb 25, 2012 at 4:29 PM, Paulo Pinto <pjmlp@progtools.org> wrote:
> On 25.02.2012 23:17, Peter Alexander wrote:
>
>> On Saturday, 25 February 2012 at 22:08:31 UTC, Paulo Pinto wrote:
>>>
>>> On 25.02.2012 21:26, Peter Alexander wrote:
>>>>
>>>> On Saturday, 25 February 2012 at 20:13:42 UTC, so wrote:
>>>>>
>>>>> On Saturday, 25 February 2012 at 18:47:12 UTC, Nick Sabalausky wrote:
>>>>>
>>>>>> Interesting. I wish he'd elaborate on why it's not an option for his
>>>>>> daily
>>>>>> work.
>>>>>
>>>>>
>>>>> Not the design but the implementation, memory management would be the first.
>>>>
>>>>
>>>> Memory management is not a problem. You can manage memory just as easily in D as you can in C or C++. Just don't use global new, which they'll already be doing.
>>>
>>>
>>> I couldn't agree more.
>>>
>>> The GC issue comes around often, but I personally think that the main issue is that the GC needs to be optimized, not that manual memory management is required.
>>>
>>> Most standard compiler malloc()/free() implementations are actually
>>> slower than most advanced GC algorithms.
>>
>>
>> If you require realtime performance then you don't use either the GC or malloc/free. You allocate blocks up front and use those when you need consistent high performance.
>>
>> It doesn't matter how optimised the GC is. The eventual collection is inevitable and if it takes anything more than a small fraction of a second then it will be too slow for realtime use.
>>
>
> There are GC realtime algorithms, which are actually in use, in systems like the French Ground Master 400 missile radar system.
>
> There is no more realtime than that. I surely would not like that such systems had a pause the world GC.
>

Can you give any description of how that is done (or any relevant
papers), and how it can be made to function reasonably on low end
consumer hardware and standard operating systems? Without that, your
example is irrelevant.
Azul has already shown that realtime non-pause GC is certainly
possible, but only with massive servers, lots of CPUs, and large
kernel modifications.

And, as far as I'm aware, that still didn't solve the generally memory-hungry behaviors of the JVM.
February 25, 2012
On 2/25/2012 2:08 PM, Paulo Pinto wrote:
> Most standard compiler malloc()/free() implementations are actually slower than
> most advanced GC algorithms.

Most straight up GC vs malloc/free benchmarks miss something crucial. A GC allows one to do substantially *fewer* allocations. It's a lot faster to not allocate than to allocate.

Consider C strings. You need to keep track of who owns each one. That often means creating extra copies, rather than sharing a single copy.

Enter C++'s shared_ptr. But that works by, for each object, allocating a *second* chunk of memory to hold the reference count. Right off the bat, you've got twice as many allocations & frees with shared_ptr as a GC would have.
February 25, 2012
On 25.02.2012 23:40, Andrew Wiley wrote:
> On Sat, Feb 25, 2012 at 4:29 PM, Paulo Pinto<pjmlp@progtools.org>  wrote:
>> On 25.02.2012 23:17, Peter Alexander wrote:
>>
>>> On Saturday, 25 February 2012 at 22:08:31 UTC, Paulo Pinto wrote:
>>>>
>>>> On 25.02.2012 21:26, Peter Alexander wrote:
>>>>>
>>>>> On Saturday, 25 February 2012 at 20:13:42 UTC, so wrote:
>>>>>>
>>>>>> On Saturday, 25 February 2012 at 18:47:12 UTC, Nick Sabalausky wrote:
>>>>>>
>>>>>>> Interesting. I wish he'd elaborate on why it's not an option for his
>>>>>>> daily
>>>>>>> work.
>>>>>>
>>>>>>
>>>>>> Not the design but the implementation, memory management would be the
>>>>>> first.
>>>>>
>>>>>
>>>>> Memory management is not a problem. You can manage memory just as easily
>>>>> in D as you can in C or C++. Just don't use global new, which they'll
>>>>> already be doing.
>>>>
>>>>
>>>> I couldn't agree more.
>>>>
>>>> The GC issue comes around often, but I personally think that the main
>>>> issue is that the GC needs to be optimized, not that manual memory
>>>> management is required.
>>>>
>>>> Most standard compiler malloc()/free() implementations are actually
>>>> slower than most advanced GC algorithms.
>>>
>>>
>>> If you require realtime performance then you don't use either the GC or
>>> malloc/free. You allocate blocks up front and use those when you need
>>> consistent high performance.
>>>
>>> It doesn't matter how optimised the GC is. The eventual collection is
>>> inevitable and if it takes anything more than a small fraction of a
>>> second then it will be too slow for realtime use.
>>>
>>
>> There are GC realtime algorithms, which are actually in use, in systems
>> like the French Ground Master 400 missile radar system.
>>
>> There is no more realtime than that. I surely would not like that such
>> systems had a pause the world GC.
>>
>
> Can you give any description of how that is done (or any relevant
> papers), and how it can be made to function reasonably on low end
> consumer hardware and standard operating systems? Without that, your
> example is irrelevant.
> Azul has already shown that realtime non-pause GC is certainly
> possible, but only with massive servers, lots of CPUs, and large
> kernel modifications.
>
> And, as far as I'm aware, that still didn't solve the generally
> memory-hungry behaviors of the JVM.

Sure.

http://www.militaryaerospace.com/articles/2009/03/thales-chooses-aonix-perc-virtual-machine-software-for-ballistic-missile-radar.html

http://www.atego.com/products/aonix-perc-raven/

--
Paulo
February 25, 2012
On Sat, Feb 25, 2012 at 4:08 PM, Paulo Pinto <pjmlp@progtools.org> wrote:
> On 25.02.2012 21:26, Peter Alexander wrote:
>
>> On Saturday, 25 February 2012 at 20:13:42 UTC, so wrote:
>>>
>>> On Saturday, 25 February 2012 at 18:47:12 UTC, Nick Sabalausky wrote:
>>>
>>>> Interesting. I wish he'd elaborate on why it's not an option for his
>>>> daily
>>>> work.
>>>
>>>
>>> Not the design but the implementation, memory management would be the first.
>>
>>
>> Memory management is not a problem. You can manage memory just as easily in D as you can in C or C++. Just don't use global new, which they'll already be doing.
>
>
> I couldn't agree more.
>
> The GC issue comes around often, but I personally think that the main issue is that the GC needs to be optimized, not that manual memory management is required.
>
> Most standard compiler malloc()/free() implementations are actually slower
> than most advanced GC algorithms.

That's not the issue here. The issue is that when your game is required to render at 60fps, you've got 16.67ms for each frame and no time for a 100ms+ GC cycle. In this environment, it's mostly irrelevant that you'll spend more total time in malloc than you would have spent in the GC, because you can only spare the time in small chunks, not large ones.
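
To put rough numbers on that, here is a small sketch of the arithmetic. The function names are hypothetical, and the model is deliberately crude: it assumes the pause starts exactly at a frame boundary and that nothing else runs while it lasts.

```cpp
#include <cassert>
#include <cmath>

// Time available per frame, in milliseconds, at a given frame rate.
double frame_budget_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Number of frame intervals covered by one uninterrupted pause that starts
// at a frame boundary. Computed as pause * fps / 1000 to avoid dividing by
// an already-rounded budget.
int frames_spanned_by_pause(double pause_ms, double fps) {
    assert(pause_ms >= 0.0 && fps > 0.0);
    return static_cast<int>(std::ceil(pause_ms * fps / 1000.0));
}
```

Under those assumptions, `frames_spanned_by_pause(100.0, 60.0)` is 6: a single 100ms collection covers six consecutive frame intervals at 60fps, while the same 100ms spread evenly over sixty frames costs well under 2ms per frame.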

One simple solution is to avoid all dynamic allocation, but as a few mostly unanswered NG posts have shown, the compiler is currently implicitly generating dynamic allocations in a few places, and there's no simple way to track them down or do anything about them.
February 25, 2012
On Sat, Feb 25, 2012 at 5:01 PM, Paulo Pinto <pjmlp@progtools.org> wrote:
> On 25.02.2012 23:40, Andrew Wiley wrote:
>>
>> On Sat, Feb 25, 2012 at 4:29 PM, Paulo Pinto<pjmlp@progtools.org>  wrote:
>>>
>>> On 25.02.2012 23:17, Peter Alexander wrote:
>>>
>>>
>>>> On Saturday, 25 February 2012 at 22:08:31 UTC, Paulo Pinto wrote:
>>>>>
>>>>>
>>>>> On 25.02.2012 21:26, Peter Alexander wrote:
>>>>>>
>>>>>>
>>>>>> On Saturday, 25 February 2012 at 20:13:42 UTC, so wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Saturday, 25 February 2012 at 18:47:12 UTC, Nick Sabalausky wrote:
>>>>>>>
>>>>>>>> Interesting. I wish he'd elaborate on why it's not an option for his
>>>>>>>> daily
>>>>>>>> work.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Not the design but the implementation, memory management would be the first.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Memory management is not a problem. You can manage memory just as
>>>>>> easily
>>>>>> in D as you can in C or C++. Just don't use global new, which they'll
>>>>>> already be doing.
>>>>>
>>>>>
>>>>>
>>>>> I couldn't agree more.
>>>>>
>>>>> The GC issue comes around often, but I personally think that the main issue is that the GC needs to be optimized, not that manual memory management is required.
>>>>>
>>>>> Most standard compiler malloc()/free() implementations are actually
>>>>> slower than most advanced GC algorithms.
>>>>
>>>>
>>>>
>>>> If you require realtime performance then you don't use either the GC or malloc/free. You allocate blocks up front and use those when you need consistent high performance.
>>>>
>>>> It doesn't matter how optimised the GC is. The eventual collection is inevitable and if it takes anything more than a small fraction of a second then it will be too slow for realtime use.
>>>>
>>>
>>> There are GC realtime algorithms, which are actually in use, in systems like the French Ground Master 400 missile radar system.
>>>
>>> There is no more realtime than that. I surely would not like that such systems had a pause the world GC.
>>>
>>
>> Can you give any description of how that is done (or any relevant
>> papers), and how it can be made to function reasonably on low end
>> consumer hardware and standard operating systems? Without that, your
>> example is irrelevant.
>> Azul has already shown that realtime non-pause GC is certainly
>> possible, but only with massive servers, lots of CPUs, and large
>> kernel modifications.
>>
>> And, as far as I'm aware, that still didn't solve the generally memory-hungry behaviors of the JVM.
>
>
> Sure.
>
> http://www.militaryaerospace.com/articles/2009/03/thales-chooses-aonix-perc-virtual-machine-software-for-ballistic-missile-radar.html
>
> http://www.atego.com/products/aonix-perc-raven/

Neither of those links has any information on how this actually works. In fact, the docs on Atego's site pretty much state that their JVM is highly specialized and requires programmers to follow very different rules from typical Java, which makes this technology look less and less viable for general usage.

I don't see how this example is relevant for D. I can't find any details on the system you're mentioning, but assuming they developed something similar to Azul, the fundamental problem is that D has to target platforms in general use, not highly specialized server environments with modified kernels and highly parallel hardware. Until such environments come into general use (assuming they do at all; Azul seems to be having trouble getting their virtual memory manipulation techniques merged into the Linux kernel), D can't make use of them, and we're right back to saying that GCs have unacceptably long pause times for realtime applications.
February 26, 2012
On 25/02/2012 22:55, Walter Bright wrote:
> On 2/25/2012 2:08 PM, Paulo Pinto wrote:
>> Most standard compiler malloc()/free() implementations are actually
>> slower than
>> most advanced GC algorithms.
>
> Most straight up GC vs malloc/free benchmarks miss something crucial. A
> GC allows one to do substantially *fewer* allocations. It's a lot faster
> to not allocate than to allocate.
>
> Consider C strings. You need to keep track of ownership of it. That
> often means creating extra copies, rather than sharing a single copy.
>
> Enter C++'s shared_ptr. But that works by, for each object, allocating a
> *second* chunk of memory to hold the reference count. Right off the bat,
> you've got twice as many allocations & frees with shared_ptr than a GC
> would have.

http://www.boost.org/doc/libs/1_43_0/libs/smart_ptr/make_shared.html

so you don't have to have twice as many allocations.
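
A rough way to check this on a given toolchain is to count calls to the global operator new. The sketch below is only that: `Widget`, the counter, and the two helper functions are made up for illustration, and the counts depend on the standard library in use (the mainstream ones are expected to give 2 for the raw-pointer constructor and 1 for std::make_shared, but that is an implementation property, not a guarantee).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Counter bumped by the replacement global operator new below.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct Widget { int value = 0; };

// Keeps the most recent object alive so the allocations being counted
// are observable and cannot be optimized away.
static std::shared_ptr<Widget> g_keep_alive;

// Object and control block are allocated separately.
std::size_t allocations_with_new() {
    const std::size_t before = g_allocations;
    g_keep_alive = std::shared_ptr<Widget>(new Widget());
    return g_allocations - before;
}

// Object and control block are typically placed in a single allocation.
std::size_t allocations_with_make_shared() {
    const std::size_t before = g_allocations;
    g_keep_alive = std::make_shared<Widget>();
    return g_allocations - before;
}
```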

-- 
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk
February 26, 2012
On 2/25/2012 4:01 PM, Simon wrote:
> On 25/02/2012 22:55, Walter Bright wrote:
>> Enter C++'s shared_ptr. But that works by, for each object, allocating a
>> *second* chunk of memory to hold the reference count. Right off the bat,
>> you've got twice as many allocations & frees with shared_ptr than a GC
>> would have.
>
> http://www.boost.org/doc/libs/1_43_0/libs/smart_ptr/make_shared.html
>
> so you don't have to have twice as many allocations.

There are many ways to do shared pointers, including one where the reference count is part of the object being allocated. But the C++11 standard shared_ptr, when constructed from a raw pointer, does an extra allocation.
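
For illustration, here is a minimal sketch of the variant where the count is part of the object. `RefCounted`, `IntrusivePtr`, and `Texture` are hypothetical names, not any library's API, and the count is not atomic, so this version assumes single-threaded use.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical base class: the reference count lives inside the object,
// so no separate control block is ever allocated.
class RefCounted {
public:
    RefCounted() : refs_(0) {}
    // A copied object starts with its own count of zero.
    RefCounted(const RefCounted&) : refs_(0) {}
    RefCounted& operator=(const RefCounted&) { return *this; }
    std::size_t use_count() const { return refs_; }

protected:
    virtual ~RefCounted() {}

private:
    template <typename T> friend class IntrusivePtr;
    std::size_t refs_;
};

// Smart pointer that bumps the embedded count; T must derive from RefCounted.
template <typename T>
class IntrusivePtr {
public:
    IntrusivePtr() : ptr_(nullptr) {}
    explicit IntrusivePtr(T* p) : ptr_(p) { acquire(); }
    IntrusivePtr(const IntrusivePtr& other) : ptr_(other.ptr_) { acquire(); }
    IntrusivePtr(IntrusivePtr&& other) noexcept : ptr_(other.ptr_) { other.ptr_ = nullptr; }
    // Copy-and-swap: the old pointee is released when `other` is destroyed.
    IntrusivePtr& operator=(IntrusivePtr other) noexcept {
        std::swap(ptr_, other.ptr_);
        return *this;
    }
    ~IntrusivePtr() { release(); }

    T* get() const { return ptr_; }
    T* operator->() const { assert(ptr_ != nullptr); return ptr_; }

private:
    void acquire() { if (ptr_) ++ptr_->refs_; }
    void release() { if (ptr_ && --ptr_->refs_ == 0) delete ptr_; }
    T* ptr_;
};

// Example payload that tracks how many instances are currently alive.
struct Texture : RefCounted {
    static int live;
    Texture() { ++live; }
    ~Texture() { --live; }
};
int Texture::live = 0;
```

Creating a `Texture` this way costs exactly one allocation, the `new Texture()` itself; the price is that the type has to opt in by deriving from the counting base.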
February 26, 2012
On 26.02.2012 00:45, Andrew Wiley wrote:
> On Sat, Feb 25, 2012 at 5:01 PM, Paulo Pinto<pjmlp@progtools.org>  wrote:
>> On 25.02.2012 23:40, Andrew Wiley wrote:
>>>
>>> On Sat, Feb 25, 2012 at 4:29 PM, Paulo Pinto<pjmlp@progtools.org>    wrote:
>>>>
>>>> On 25.02.2012 23:17, Peter Alexander wrote:
>>>>
>>>>
>>>>> On Saturday, 25 February 2012 at 22:08:31 UTC, Paulo Pinto wrote:
>>>>>>
>>>>>>
>>>>>> On 25.02.2012 21:26, Peter Alexander wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Saturday, 25 February 2012 at 20:13:42 UTC, so wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Saturday, 25 February 2012 at 18:47:12 UTC, Nick Sabalausky wrote:
>>>>>>>>
>>>>>>>>> Interesting. I wish he'd elaborate on why it's not an option for his
>>>>>>>>> daily
>>>>>>>>> work.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Not the design but the implementation, memory management would be the
>>>>>>>> first.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Memory management is not a problem. You can manage memory just as
>>>>>>> easily
>>>>>>> in D as you can in C or C++. Just don't use global new, which they'll
>>>>>>> already be doing.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I couldn't agree more.
>>>>>>
>>>>>> The GC issue comes around often, but I personally think that the main
>>>>>> issue is that the GC needs to be optimized, not that manual memory
>>>>>> management is required.
>>>>>>
>>>>>> Most standard compiler malloc()/free() implementations are actually
>>>>>> slower than most advanced GC algorithms.
>>>>>
>>>>>
>>>>>
>>>>> If you require realtime performance then you don't use either the GC or
>>>>> malloc/free. You allocate blocks up front and use those when you need
>>>>> consistent high performance.
>>>>>
>>>>> It doesn't matter how optimised the GC is. The eventual collection is
>>>>> inevitable and if it takes anything more than a small fraction of a
>>>>> second then it will be too slow for realtime use.
>>>>>
>>>>
>>>> There are GC realtime algorithms, which are actually in use, in systems
>>>> like the French Ground Master 400 missile radar system.
>>>>
>>>> There is no more realtime than that. I surely would not like that such
>>>> systems had a pause the world GC.
>>>>
>>>
>>> Can you give any description of how that is done (or any relevant
>>> papers), and how it can be made to function reasonably on low end
>>> consumer hardware and standard operating systems? Without that, your
>>> example is irrelevant.
>>> Azul has already shown that realtime non-pause GC is certainly
>>> possible, but only with massive servers, lots of CPUs, and large
>>> kernel modifications.
>>>
>>> And, as far as I'm aware, that still didn't solve the generally
>>> memory-hungry behaviors of the JVM.
>>
>>
>> Sure.
>>
>> http://www.militaryaerospace.com/articles/2009/03/thales-chooses-aonix-perc-virtual-machine-software-for-ballistic-missile-radar.html
>>
>> http://www.atego.com/products/aonix-perc-raven/
>
> Neither of those links have any information on how this actually
> works. In fact, the docs on Atego's site pretty much state that their
> JVM is highly specialized and requires programmers to follow very
> different rules from typical Java, which makes this technology look
> less and less viable for general usage.
>
> I don't see how this example is relevant for D. I can't find any
> details on the system you're mentioning, but assuming they developed
> something similar to Azul, the fundamental problem is that D has to
> target platforms in general use, not highly specialized server
> environments with modified kernels and highly parallel hardware. Until
> such environments come into general use (assuming they do at all; Azul
> seems to be having trouble getting their virtual memory manipulation
> techniques merged into the Linux kernel), D can't make use of them,
> and we're right back to saying that GCs have unacceptably long pause
> times for realtime applications.

In Java's case they are following the Java specification for real-time
applications.

http://java.sun.com/javase/technologies/realtime/index.jsp

I did not mention any specific algorithm, because like most companies, I
am sure Atego patents most of it.

Still a quick search in Google reveals a few papers:

http://research.microsoft.com/apps/video/dl.aspx?id=103698&l=i

http://www.cs.cmu.edu/~spoons/gc/vee05.pdf

http://domino.research.ibm.com/comm/research_people.nsf/pages/bacon.presentations.html/$FILE/Bacon05BravelyTalk.ppt

http://www.cs.technion.ac.il/~erez/Papers/real-time-pldi.pdf

http://www.cs.purdue.edu/homes/lziarek/pldi10.pdf

I know GC use is a bit of a religious debate, but C++ was the very last
systems programming language without automatic memory management. And
even C++ has got some form of it in C++11.

At least in the desktop area, a decade from now, systems programming on
desktop OSes will most likely either make use of reference counting
(WinRT or ARC) or use a GC (similar to Spin, Inferno, Singularity, Oberon).

This is how I see the trend going, but hey, I am just a simple person
and I get to be wrong lots of times.

--
Paulo
February 26, 2012
On 02/25/2012 02:29 PM, Paulo Pinto wrote:

> There are GC realtime algorithms, which are actually in use, in systems
> like the French Ground Master 400 missile radar system.

I just can't resist... :) I hope they are not going to keep that software in the French Ground Master 500. ;)

Ali

[*] http://archive.eiffel.com/doc/manuals/technology/contract/ariane/

<quote>
Ariane 5 launcher crashed [due to] a reuse error. The SRI horizontal bias module was reused from a 10-year-old software, the software from Ariane 4.
</quote>