GC question (page 2)

On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:
> - Automatic but conservative. Can leak at any time.

All GCs are prone to leak, including precise ones. The point of garbage collection is not to prevent leaks, but rather to prevent use-after-free bugs.

Granted, the D 32 bit GC is more prone to leak than most others (including D 64 bit), but this isn't as horrible as you're believing: it does its *main* job pretty well, just at the cost of higher memory consumption, which we can often afford.

And if you can't, manual management of large arrays tends to be relatively simple anyway. For example, my png.d used to leak something nasty in 32 bit because it used GC-allocated large temporary buffers while decompressing images. But, since they were temporary buffers, it was really easy to just `scope(exit) free(buffer);` after allocating to let them be freed at the end of the function. That cut the memory consumption in half.
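A minimal sketch of the `scope(exit) free(buffer);` pattern described above, assuming the buffer comes from C's `malloc` rather than the GC heap. The function name `processWithTempBuffer` and the byte-summing "work" are hypothetical stand-ins, not code from png.d.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical helper: copies the input through a temporary buffer
// and returns the sum of its bytes.
size_t processWithTempBuffer(const(ubyte)[] src)
{
    if (src.length == 0)
        return 0;

    // Scratch buffer allocated outside the GC heap.
    auto p = cast(ubyte*) malloc(src.length);
    assert(p !is null, "malloc failed");
    // Runs when the function exits, whether by return or by exception.
    scope(exit) free(p);

    ubyte[] buffer = p[0 .. src.length];
    buffer[] = src[]; // stand-in for the real decompression work

    size_t sum;
    foreach (b; buffer)
        sum += b;
    return sum;
}

void main()
{
    ubyte[] data = [1, 2, 3, 4];
    assert(processWithTempBuffer(data) == 10);
}
```

Because the buffer never lives on the GC heap, a conservative scan cannot keep it alive by accident, and it is released deterministically at scope exit.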

> All GCs are prone to leak, including precise ones. The point of garbage collection is not to prevent leaks, but rather to prevent use-after-free bugs.

Of course I can have leaks in a GC environment, but having non-deterministic leaks is another thing, and I'd rather make sure to delete my references to let GC do its thing than to pray and hope some random number on my stack won't be in the range of my heap.

I don't agree that the point is just preventing use-after-free, which can be guaranteed statically even in a non-GC language (see e.g. Rust).

On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:
> - Automatic but conservative. Can leak at any time. You have to implement manual management (managed heaps) to avoid leaks. Leaks are hard to find as any heap value may be causing it.

By "managed heap" I just meant the GC heap, the one used by the "new" operator. Besides it, there are already other allocators and container libraries available; you don't need to implement this stuff manually.

> the worst of both worlds.

It may look so from a distance. But in my experience it's not that bad. In most software I did in D it did not really matter (it's either 64-bit or short-lived programs), and the control D gives to choose how to deal with everything makes it all quite manageable. I can decide what to take from both worlds and hence pick the best, not the worst.
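As one illustration of the "other allocators" mentioned above, here is a small sketch using Phobos' `std.experimental.allocator` with `Mallocator`. The post does not say which allocator or container library it has in mind, so this choice is an assumption, and `sumOfDoubled` is a hypothetical function name.

```d
import std.experimental.allocator : makeArray, dispose;
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical example: fill a malloc-backed array and sum it.
int sumOfDoubled(size_t n)
{
    // The array is backed by malloc, so the GC neither scans nor owns it.
    int[] data = Mallocator.instance.makeArray!int(n);
    // Return the memory to the allocator when the scope ends.
    scope(exit) Mallocator.instance.dispose(data);

    foreach (i, ref v; data)
        v = cast(int) i * 2;

    int sum;
    foreach (v; data)
        sum += v;
    return sum;
}

void main()
{
    assert(sumOfDoubled(4) == 12); // 0 + 2 + 4 + 6
}
```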

On 05/02/2017 5:02 PM, thedeemon wrote:

snip

> It may look so from a distance. But in my experience it's not that bad.
> In most software I did in D it did not matter really (it's either 64-bit
> or short lived programs) and the control D gives to choose how to deal
> with everything makes it all quite manageable, I can decide what to take
> from both worlds and hence pick the best, not the worst.

The best of both worlds can be done quite simply.

Instead of a chain of input ranges like:

int[] data = input.filter!"a != 7".map!"a * 2".array;

Use:

int[] data;
data.length = input.length;

size_t i;
foreach(v; input.filter!"a != 7".map!"a * 2") {
	data[i] = v;
	i++;
}

data.length = i;

Of course this is a dirt simple example, but instead look at it for e.g. a csv parser with some complex data structure creation + manipulation.

I have some real world code here[0] that uses it. Not only are there fewer allocations and less use of the GC, but it also ends up being significantly faster!

[0] https://gist.github.com/rikkimax/42c3dfa6500155c5e441cbb1437142ea#file-reports-d-L124
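A related way to get the same "allocate once, then fill" effect while keeping the range chain is `std.array.appender` with `reserve`. This is a sketch of an alternative, not the approach from the linked gist, and the function name `collect` is hypothetical.

```d
import std.algorithm : filter, map;
import std.array : appender;

// Hypothetical helper: same filter/map pipeline, one up-front reservation.
int[] collect(int[] input)
{
    auto app = appender!(int[])();
    // filter can only drop elements, so input.length is an upper bound.
    app.reserve(input.length);
    app.put(input.filter!(a => a != 7).map!(a => a * 2));
    return app.data;
}

void main()
{
    assert(collect([1, 2, 7, 3]) == [2, 4, 6]);
}
```

The result still lives on the GC heap; the saving is in avoiding repeated reallocation while the array grows.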

February 05, 2017

Re: GC question

Posted by Cym13
in reply to rikki cattermole

On Sunday, 5 February 2017 at 04:22:30 UTC, rikki cattermole wrote:
> On 05/02/2017 5:02 PM, thedeemon wrote:
>
> snip
>
>> It may look so from a distance. But in my experience it's not that bad.
>> In most software I did in D it did not matter really (it's either 64-bit
>> or short lived programs) and the control D gives to choose how to deal
>> with everything makes it all quite manageable, I can decide what to take
>> from both worlds and hence pick the best, not the worst.
>
> The best of both worlds can be done quite simply.
>
> Instead of a chain of input ranges like:
>
> int[] data = input.filter!"a != 7".map!"a * 2".array;
>
> Use:
>
> int[] data;
> data.length = input.length;
>
> size_t i;
> foreach(v; input.filter!"a != 7".map!"a * 2") {
> 	data[i] = v;
> 	i++;
> }
>
> data.length = i;
>
> Of course this is a dirt simple example, but instead look at it for e.g. a csv parser with some complex data structure creation + manipulation.
>
> I have some real world code here[0] that uses it. Not only are there fewer allocations and less use of the GC, but it also ends up being significantly faster!
>
> [0] https://gist.github.com/rikkimax/42c3dfa6500155c5e441cbb1437142ea#file-reports-d-L124

Here is some data to weigh that, comparing different memory management strategies on that simple case:

#!/usr/bin/env rdmd

import std.conv;
import std.stdio;
import std.array;
import std.range;
import std.algorithm;

auto input = [1, 2, 7, 3, 7, 8, 8, 9, 7, 1, 0];


void naive() {
    int[] data = input.filter!(a => a!= 7).map!(a => a*2).array;
    assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void maxReallocs() {
    int[] data;

    size_t i;
    foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) {
        data ~= v;
    }

    assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void betterOfTwoWorlds() {
    int[] data;
    data.length = input.length;

    size_t i;
    foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) {
        data[i] = v;
        i++;
    }
    data.length = i;

    assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void explicitNew() {
    int[] data = new int[input.length];
    scope(exit) delete data;

    size_t i;
    foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) {
        data[i] = v;
        i++;
    }
    data.length = i;

    assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void cStyle() @nogc {
    import core.stdc.stdlib; // malloc, free (replaces the deprecated std.c.stdlib)

    int* data = cast(int*)malloc(input.length * int.sizeof);
    scope(exit) free(data);

    size_t i;
    foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) {
        data[i++] = v;
    }

    debug assert(data[0..i] == [2, 4, 6, 16, 16, 18, 2, 0], data[0..i].to!string);
}

void onTheStack() @nogc {
    int[100] data;

    size_t i;
    foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) {
        data[i++] = v;
    }

    debug assert(data[0..i] == [2, 4, 6, 16, 16, 18, 2, 0], data[0..i].to!string);
}

void main(string[] args) {
    import std.datetime;
    benchmark!(
        naive,
        maxReallocs,
        betterOfTwoWorlds,
        explicitNew,
        cStyle,
        onTheStack
    )(100000).each!writeln;
}

/* Results:

Compiled with dmd -profile=gc test.d
====================================

TickDuration(385731143)  // naive,
TickDuration(575673615)  // maxReallocs,
TickDuration(255928562)  // betterOfTwoWorlds,
TickDuration(270497154)  // explicitNew,
TickDuration(97596363)   // cStyle,
TickDuration(96467459)   // onTheStack

GC usage:

bytes allocated, allocations, type, function, file:line
       17600000          100000 int[] test.explicitNew test.d:43
        4400000          100000 int[] test.betterOfTwoWorlds test.d:30
        3200000          800000 int[] test.maxReallocs test.d:22
        3200000          100000 int[] test.maxReallocs test.d:25
        3200000          100000 int[] test.explicitNew test.d:51
        3200000          100000 int[] test.explicitNew test.d:53
        3200000          100000 int[] test.betterOfTwoWorlds test.d:37
        3200000          100000 int[] test.betterOfTwoWorlds test.d:39
        3200000          100000 std.array.Appender!(int[]).Appender.Data std.array.Appender!(int[]).Appender.this /usr/include/dlang/dmd/std/array.d:2675
        3200000          100000 int[] test.naive test.d:14

Compiled with dmd -O -inline test.d
===================================

TickDuration(159383005)  // naive,
TickDuration(187192137)  // maxReallocs,
TickDuration(94094585)   // betterOfTwoWorlds,
TickDuration(102374657)  // explicitNew,
TickDuration(41801695)   // cStyle,
TickDuration(45613954)   // onTheStack

Compiled with dmd -O -inline -release -boundscheck=off test.d
=============================================================

TickDuration(152151439)  // naive,
TickDuration(140870515)  // maxReallocs,
TickDuration(46740440)   // betterOfTwoWorlds,
TickDuration(59089016)   // explicitNew,
TickDuration(26038060)   // cStyle,
TickDuration(25984371)   // onTheStack

*/
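For anyone re-running this on a recent compiler: the `benchmark` function in `std.datetime` and the `TickDuration` type it returned have since been deprecated in favour of `std.datetime.stopwatch`, whose `benchmark` returns `Duration` values. Below is a cut-down sketch of the same kind of measurement with that API. The two functions `appendEach` and `preallocate` are simplified hypothetical stand-ins for `maxReallocs` and `betterOfTwoWorlds`, and no timings are claimed for them.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

int[] input = [1, 2, 7, 3, 7, 8, 8, 9, 7, 1, 0];

// Grows the array one element at a time.
void appendEach()
{
    int[] data;
    foreach (v; input)
        if (v != 7)
            data ~= v * 2;
}

// Allocates the upper bound once, then trims with a slice.
void preallocate()
{
    auto data = new int[input.length];
    size_t i;
    foreach (v; input)
        if (v != 7)
            data[i++] = v * 2;
    data = data[0 .. i];
}

void main()
{
    // One Duration per benchmarked function, in argument order.
    auto results = benchmark!(appendEach, preallocate)(100_000);
    foreach (i, d; results)
        writefln("function %s: %s ms", i, d.total!"msecs");
}
```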

On Saturday, 4 February 2017 at 15:23:53 UTC, Adam D. Ruppe wrote:
> On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:
>> - Automatic but conservative. Can leak at any time.
>
> All GCs are prone to leak, including precise ones. The point of garbage collection is not to prevent leaks, but rather to prevent use-after-free bugs.

No, the main point of GC is to prevent leaks in the case where you have circular references. Precise GCs don't leak, by definition. If the object is reachable then it isn't a leak.

Now, you might claim that objects that provably won't be touched again should be classified as dead and freed, and that this is a bug that exhibits the same behaviour as a leak (running out of memory). But it's really nothing like the leaks you experience with manual memory management (e.g. circular references preventing memory from being released in a reference counting management scheme).
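To make the circular-reference case concrete, here is a small sketch of hand-rolled reference counting in which a two-node cycle is never freed. All names (`Node`, `make`, `retain`, `release`, `liveNodes`) are hypothetical and exist only for this illustration.

```d
import core.stdc.stdlib : malloc, free;

struct Node
{
    int refs;    // manual reference count
    Node* other; // counted reference to another node
}

int liveNodes;   // counts nodes that have not been freed yet

Node* make()
{
    auto n = cast(Node*) malloc(Node.sizeof);
    assert(n !is null, "malloc failed");
    *n = Node(1, null);
    ++liveNodes;
    return n;
}

Node* retain(Node* n) { ++n.refs; return n; }

void release(Node* n)
{
    if (n is null || --n.refs > 0)
        return;
    release(n.other); // drop the reference this node was holding
    free(n);
    --liveNodes;
}

// a -> b: releasing both handles frees both nodes.
int leakedAcyclic()
{
    immutable before = liveNodes;
    auto a = make();
    auto b = make();
    a.other = retain(b);
    release(b);
    release(a);
    return liveNodes - before;
}

// c <-> d: each node keeps the other's count above zero.
int leakedCyclic()
{
    immutable before = liveNodes;
    auto c = make();
    auto d = make();
    c.other = retain(d);
    d.other = retain(c);
    release(c);
    release(d);
    return liveNodes - before;
}

void main()
{
    assert(leakedAcyclic() == 0);
    assert(leakedCyclic() == 2);
}
```

A tracing collector would reclaim the cycle once nothing outside it points in, since it frees whatever is unreachable rather than whatever has a zero count.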
