April 07, 2012
It shouldn't be hard to retrofit the Boehm collector into D if anyone cares to make a comparison.

On Apr 6, 2012, at 11:36 AM, bearophile <bearophileHUGS@lycos.com> wrote:

> Andrei Alexandrescu:
> 
>> A few more samples of people's perception of the two languages:
>> 
>> http://news.ycombinator.com/item?id=3805302
> 
> Some of the comments in that ycombinator thread are a bit unnerving.
> 
> There is a similar thread on Reddit too: http://www.reddit.com/r/programming/comments/rvwj0/go_severe_memory_problems_on_32bit_systems/
> 
> Two quotes from the Reddit page:
> 
>> The Boehm GC attempts to mitigate this by detecting false references to free blocks and blacklisting before they become references to live blocks, restricting their use to low-impact situations.
> Also Boehm supports typed allocation(gc_typed.h/GC_malloc_explicitly_typed) where you tell GC that pointers are located only at specific offsets and everything other should be ignored.<
> 
>> tag data as no-pointers and allocate in separate section. The garbage collector can avoid scanning this section, which reduces collection time as well as the number of false positives.<
> 
> Bye,
> bearophile
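
The "tag data as no-pointers" idea from the second quote can be illustrated with a toy scan loop. This is only a minimal sketch in C under stated assumptions, not the code of Boehm, Go, or the D runtime: the `Block` struct, its `no_scan` flag, and `count_candidates` are hypothetical names made up for the illustration. It counts the words a conservative scan would treat as heap references and shows that a block tagged as pointer-free contributes none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block descriptor for a toy conservative collector. */
typedef struct Block {
    const uintptr_t *words; /* block contents viewed as machine words */
    size_t nwords;          /* number of words in the block */
    int no_scan;            /* nonzero: block is tagged as pointer-free */
} Block;

/* Count words that a conservative scan would treat as heap references:
   any word whose value lies in [heap_lo, heap_hi). Blocks tagged no_scan
   are skipped entirely, so their contents can never pin other blocks. */
static size_t count_candidates(const Block *blocks, size_t nblocks,
                               uintptr_t heap_lo, uintptr_t heap_hi) {
    assert(heap_lo <= heap_hi);
    size_t hits = 0;
    for (size_t i = 0; i < nblocks; i++) {
        if (blocks[i].no_scan)
            continue; /* pointer-free data: never scanned */
        for (size_t j = 0; j < blocks[i].nwords; j++) {
            uintptr_t w = blocks[i].words[j];
            if (w >= heap_lo && w < heap_hi)
                hits++; /* looks like a pointer, may be plain data */
        }
    }
    return hits;
}
```

With a block of plain integer data whose values happen to fall inside the heap range, the count is nonzero until `no_scan` is set, which is the false-positive reduction the quote describes.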
April 07, 2012
In the Reddit thread they have also linked this paper, "Precise Garbage Collection for C", by Jon Rafkind, Adam Wick, John Regehr and Matthew Flatt:

http://www.cs.utah.edu/~regehr/papers/ismm15-rafkind.pdf

It contains some ideas (and it seems my idea of a standard optional onGC() method for unions/structs/classes is not so bad).

Bye,
bearophile
April 07, 2012
On 07.04.2012 2:08, Rainer Schuetze wrote:
>
>
> On 4/6/2012 8:01 PM, Walter Bright wrote:
>> On 4/6/2012 10:37 AM, Rainer Schuetze wrote:
>>> I hope there is something wrong with my reasoning, and that you could
>>> give me
>>> some hints to avoid the memory bloat and the application stalls.
>>
>> A couple of things you can try (they are workarounds, not solutions):
>>
>> 1. Actively delete memory you no longer need, rather than relying on the
>> gc to catch it. Yes, this is as unsafe as using C's free().
>
> Actually, having to deal with lifetime issues myself takes away the
> biggest plus of the GC, so I am a bit reluctant to do this.
>

How about this:
http://blog.thecybershadow.net/2010/07/15/data-d-unmanaged-memory-wrapper-for-d/

Or you can wrap-up something similar along the same lines.

-- 
Dmitry Olshansky
April 07, 2012
On 2012-04-06 19:37, Rainer Schuetze wrote:
>
> GC issues like this are currently blocking development of Visual D (a
> Win32 project): when just adding spaces to a file, parsing the new file
> every other second often needs 10 or more parsings until an equal amount
> of memory is collected compared to the allocated memory. AFAICT Visual D
> just keeps a reference to the root of the most recent AST of a source file.
>
> What's even worse: when the allocated memory gets larger (say > 200MB),
> the garbage collection itself takes more than a second stalling the
> application, which is a real pain if it happens while you are typing
> source text (it does happen quite often).

Can you pause the GC when the user is typing? When you're finished with the processing you can start it again.

-- 
/Jacob Carlborg
April 07, 2012

On 4/7/2012 12:44 AM, Manu wrote:
> On 7 April 2012 01:08, Rainer Schuetze <r.sagitario@gmx.de> wrote:
>
>     I don't think there are many places in the code where these hints
>     might apply. Are there known ways of hunting down false references?
>
>     Still, my main concern are the slow collections that stall the
>     application when a decent amount of memory is used. Removing false
>     pointers won't change that, just make it happen a little later.
>
>
> An obvious best-practise is to allocate in fewer-larger blocks. Ie, more
> compounds and aggregates where possible.
> I also expect you are doing a lot of string processing. Using D strings
> directly? I presume you have a stack-string class? Put as many working
> strings on the stack as possible...

There isn't a lot of string processing involved: tokens take a slice of the original text, and nodes of the AST seldom save more than the identifier, which is just the same slice. So the full text always remains in memory, but this is only a small part of the actual footprint; the AST is a lot bigger.

The nodes have child and parent references, so the whole AST is kept alive once there is a false pointer to any node. I could try breaking up these dependencies when I think the AST is no longer used, but that brings me back to manual memory management and thread synchronization (parsing uses std.parallelism).
April 07, 2012

On 4/7/2012 8:24 AM, Dmitry Olshansky wrote:
> On 07.04.2012 2:08, Rainer Schuetze wrote:
>>
>>
>> On 4/6/2012 8:01 PM, Walter Bright wrote:
>>> On 4/6/2012 10:37 AM, Rainer Schuetze wrote:
>>>> I hope there is something wrong with my reasoning, and that you could
>>>> give me
>>>> some hints to avoid the memory bloat and the application stalls.
>>>
>>> A couple of things you can try (they are workarounds, not solutions):
>>>
>>> 1. Actively delete memory you no longer need, rather than relying on the
>>> gc to catch it. Yes, this is as unsafe as using C's free().
>>
>> Actually, having to deal with lifetime issues myself takes away the
>> biggest plus of the GC, so I am a bit reluctant to do this.
>>
>
> How about this:
> http://blog.thecybershadow.net/2010/07/15/data-d-unmanaged-memory-wrapper-for-d/
>
>
> Or you can wrap-up something similar along the same lines.
>

Thanks for your and others' hints on reducing garbage collected memory, but I find it hard to isolate larger blocks of memory for manual management. Most of the structs and classes are rather small.

I'm rather unhappy to sell D with the hint "Go back to manual memory management if you need more than 64MB of memory and want your application to be responsive."
April 07, 2012
On 7 April 2012 17:03, Jacob Carlborg <doob@me.com> wrote:

> On 2012-04-06 19:37, Rainer Schuetze wrote:
>
>>
>> GC issues like this are currently blocking development of Visual D (a Win32 project): when just adding spaces to a file, parsing the new file every other second often needs 10 or more parsings until an equal amount of memory is collected compared to the allocated memory. AFAICT Visual D just keeps a reference to the root of the most recent AST of a source file.
>>
>> What's even worse: when the allocated memory gets larger (say > 200MB), the garbage collection itself takes more than a second stalling the application, which is a real pain if it happens while you are typing source text (it does happen quite often).
>>
>
> Can you pause the GC when the user is typing? When you're finished with the processing you can start it again.


There's a bit of a problem there though, when you're coding, when are you
NOT typing? :)
I don't ever stop and sit there patiently for a few seconds for no reason.


April 07, 2012
On 07.04.2012 18:43, Rainer Schuetze wrote:
>
>
> On 4/7/2012 8:24 AM, Dmitry Olshansky wrote:
>> On 07.04.2012 2:08, Rainer Schuetze wrote:
>>>
>>>
>>> On 4/6/2012 8:01 PM, Walter Bright wrote:
>>>> On 4/6/2012 10:37 AM, Rainer Schuetze wrote:
>>>>> I hope there is something wrong with my reasoning, and that you could
>>>>> give me
>>>>> some hints to avoid the memory bloat and the application stalls.
>>>>
>>>> A couple of things you can try (they are workarounds, not solutions):
>>>>
>>>> 1. Actively delete memory you no longer need, rather than relying on
>>>> the
>>>> gc to catch it. Yes, this is as unsafe as using C's free().
>>>
>>> Actually, having to deal with lifetime issues myself takes away the
>>> biggest plus of the GC, so I am a bit reluctant to do this.
>>>
>>
>> How about this:
>> http://blog.thecybershadow.net/2010/07/15/data-d-unmanaged-memory-wrapper-for-d/
>>
>>
>>
>> Or you can wrap-up something similar along the same lines.
>>
>
> Thanks for your and other's hints on reducing garbage collected memory,
> but I find it hard to isolate larger blocks of memory for manual
> management. Most of the structs and classes are rather small.
>

Then clearly you need a custom allocator/allocation scheme. Most likely a mark/release or a free list kind of thing. Like say TempAlloc by David. As standard allocator design is still in motion you'd have to do your own thing ATM.

Parsers and lexers are notable examples where doing custom allocation pays off nicely.
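
To make the mark/release idea concrete, here is a minimal sketch in C. It is not TempAlloc and not any planned standard allocator; `Region`, `region_alloc`, `region_mark` and `region_release` are hypothetical names chosen for the illustration. Memory is handed out by bumping an offset into one buffer, and everything allocated since a mark (for example, all nodes of one parse) is discarded in a single step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mark/release region; all names are illustrative. */
typedef struct Region {
    unsigned char *base;  /* start of the backing buffer */
    size_t capacity;      /* total bytes in the buffer */
    size_t used;          /* bytes handed out so far */
} Region;

/* Acquire one backing buffer up front; returns 0 on failure. */
static int region_init(Region *r, size_t capacity) {
    r->base = malloc(capacity);
    r->capacity = capacity;
    r->used = 0;
    return r->base != NULL;
}

/* Bump-allocate size bytes. align must be a power of two no larger than
   malloc's own alignment, because offsets are aligned relative to base.
   Returns NULL when the region is exhausted. */
static void *region_alloc(Region *r, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (r->used + align - 1) & ~(align - 1);
    if (start > r->capacity || size > r->capacity - start)
        return NULL;
    r->used = start + size;
    return r->base + start;
}

/* Remember the current high-water mark. */
static size_t region_mark(const Region *r) { return r->used; }

/* Discard everything allocated since the mark in one step. */
static void region_release(Region *r, size_t mark) {
    assert(mark <= r->used);
    r->used = mark;
}

/* Return the backing buffer to the system. */
static void region_destroy(Region *r) {
    free(r->base);
    r->base = NULL;
    r->capacity = r->used = 0;
}
```

A parser would take a mark before building an AST and release back to it once that AST is replaced. The trade-off is the one discussed in this thread: nothing allocated after the mark may be referenced once it is released.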

> I'm rather unhappy to sell D with the hint "Go back to manual memory
> management if you need more than 64MB of memory and want your
> application to be responsive."

I totally understand this sentiment, and unless the GC improves by an order of magnitude it is not going to work well with medium to large-scale apps, particularly long-running ones. I once ran VisualD for about 16 hours straight (back in the days of GSOC 2011) ;)

-- 
Dmitry Olshansky
April 07, 2012

On 4/6/2012 6:20 PM, deadalnix wrote:
> Le 06/04/2012 18:07, Andrei Alexandrescu a écrit :
>> A few more samples of people's perception of the two languages:
>>
>> http://news.ycombinator.com/item?id=3805302
>>
>>
>> Andrei
>
> I did some measurement on that point for D lately :
> http://www.deadalnix.me/2012/03/05/impact-of-64bits-vs-32bits-when-using-non-precise-gc/
>

I studied the GC a bit more and noticed a possible issue:

- memory allocations are aligned up to a power of 2 <= page size
- the memory area beyond the actually requested size is left untouched when allocating
- when the memory is collected, it is also untouched
- the marking of references during collection does not know the requested size, so it scans the full memory block

Result: When a collected memory block is reused by a smaller allocation, there might still be false pointers in the unused area.

When I clear this data, my first impression is that it has improved the situation, but not enough. I'll have to create some non-interactive test to verify.
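
The effect described in the list above can be sketched in a few lines of C. This is only an illustration under assumptions, not the actual druntime code: `PAGE_SIZE`, `MIN_BLOCK`, `size_class` and `clear_slack` are made-up names, and the 16-byte minimum block is an assumed value. The first function rounds a request up to its power-of-two block size; the second zeroes the unused tail of a recycled block so stale words from the previous occupant cannot be scanned as pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u   /* assumed page size */
#define MIN_BLOCK 16u     /* assumed smallest size class */

/* Round a request up to the next power-of-two size class. */
static size_t size_class(size_t requested) {
    assert(requested > 0 && requested <= PAGE_SIZE);
    size_t block = MIN_BLOCK;
    while (block < requested)
        block *= 2;
    return block;
}

/* Prepare a recycled block for a new, possibly smaller, allocation:
   zero bytes [requested, block size) so that stale words left by the
   previous occupant cannot be mistaken for pointers when the whole
   block is scanned. Returns the number of bytes cleared. */
static size_t clear_slack(void *block, size_t requested) {
    size_t block_size = size_class(requested);
    size_t slack = block_size - requested;
    memset((unsigned char *)block + requested, 0, slack);
    return slack;
}
```

For example, a 40-byte request lands in a 64-byte block, leaving 24 bytes that the scan would still visit; clearing them costs one `memset` per allocation.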

Rainer
April 07, 2012
On 7 April 2012 19:04, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

> On 07.04.2012 18:43, Rainer Schuetze wrote:
>
>>
>>
>> On 4/7/2012 8:24 AM, Dmitry Olshansky wrote:
>>
>>> On 07.04.2012 2:08, Rainer Schuetze wrote:
>>>
>>>>
>>>>
>>>> On 4/6/2012 8:01 PM, Walter Bright wrote:
>>>>
>>>>> On 4/6/2012 10:37 AM, Rainer Schuetze wrote:
>>>>>
>>>>>> I hope there is something wrong with my reasoning, and that you could
>>>>>> give me
>>>>>> some hints to avoid the memory bloat and the application stalls.
>>>>>>
>>>>>
>>>>> A couple of things you can try (they are workarounds, not solutions):
>>>>>
>>>>> 1. Actively delete memory you no longer need, rather than relying on
>>>>> the
>>>>> gc to catch it. Yes, this is as unsafe as using C's free().
>>>>>
>>>>
>>>> Actually, having to deal with lifetime issues myself takes away the biggest plus of the GC, so I am a bit reluctant to do this.
>>>>
>>>>
>>> How about this:
>>> http://blog.thecybershadow.net/2010/07/15/data-d-unmanaged-memory-wrapper-for-d/
>>>
>>>
>>>
>>> Or you can wrap-up something similar along the same lines.
>>>
>>>
>> Thanks for your and other's hints on reducing garbage collected memory, but I find it hard to isolate larger blocks of memory for manual management. Most of the structs and classes are rather small.
>>
>>
> Then clearly you need a custom allocator/allocation scheme. Most likely a mark/release or a free list kind of thing. Like say TempAlloc by David. As standard allocator design is still in motion you'd have to do your own thing ATM.
>
> Parsers and lexers are notable examples where doing custom allocation pays off nicely.
>
>
>> I'm rather unhappy to sell D with the hint "Go back to manual memory
>> management if you need more than 64MB of memory and want your application to be responsive."
>>
>
> I totally understand this sentiment, and unless GC improves by an order of magnitude it is not going to work well with large to medium-scale apps. Particularly long running ones, I once had been running VisualD for about 16 hours straight (back in the days of GSOC 2011) ;)


Yeeesss.. I run VisualD for days at a time, and it just leaks memory until
my computer chokes and crashes.
It hovers between 1gb and 2gb usage under 'normal' usage for me, on a
relatively small project (only 20-ish files).
I am now in the habit of killing and restarting it regularly, but that's
clearly not a good sign for real-world D apps...