Jump to page: 1 27  
Page
Thread overview
Memory leak with dynamic array
Apr 10, 2010
Joseph Wakeling
Apr 10, 2010
bearophile
Apr 10, 2010
bearophile
Apr 10, 2010
bearophile
Apr 10, 2010
bearophile
Apr 11, 2010
bearophile
Apr 11, 2010
bearophile
Apr 11, 2010
Joseph Wakeling
Apr 12, 2010
bearophile
Apr 12, 2010
Ellery Newcomer
Apr 12, 2010
bearophile
Apr 12, 2010
bearophile
Apr 12, 2010
Ellery Newcomer
Apr 12, 2010
bearophile
Apr 12, 2010
bearophile
Apr 12, 2010
Joseph Wakeling
Apr 12, 2010
Joseph Wakeling
Apr 12, 2010
Joseph Wakeling
Apr 13, 2010
Joseph Wakeling
Apr 13, 2010
Joseph Wakeling
Apr 13, 2010
Joseph Wakeling
Apr 13, 2010
Don
Code speed [was: Re: Memory leak with dynamic array]
Apr 13, 2010
Joseph Wakeling
Apr 14, 2010
bearophile
Re: Code speed
Apr 14, 2010
bearophile
Apr 14, 2010
bearophile
Apr 14, 2010
bearophile
Apr 14, 2010
bearophile
Re: Code speed (and back to the memory leaks...)
Apr 15, 2010
Joseph Wakeling
Apr 15, 2010
bearophile
Apr 15, 2010
bearophile
Re: Code speed and C++
Apr 17, 2010
Joseph Wakeling
Apr 16, 2010
Joseph Wakeling
Apr 14, 2010
bearophile
Apr 14, 2010
Don
Apr 14, 2010
Don
^^ implementation [Was: Re: Code speed]
Apr 14, 2010
bearophile
Apr 14, 2010
Don
Apr 14, 2010
Don
Apr 14, 2010
Robert Clipsham
Apr 14, 2010
bearophile
Tango2 and Phobos2 [was: Re: Code speed]
Apr 15, 2010
Joseph Wakeling
Apr 15, 2010
Ary Borenszweig
Apr 15, 2010
Don
Apr 15, 2010
Ary Borenszweig
Apr 15, 2010
bearophile
Apr 16, 2010
Ary Borenszweig
Apr 16, 2010
Don
Apr 16, 2010
Ary Borenszweig
Apr 14, 2010
bearophile
Apr 14, 2010
bearophile
Apr 12, 2010
Brad Roberts
Apr 12, 2010
Don
April 10, 2010
Hello all,

This one is another 'oh God it doesn't work like C++' query ... :-)

I've been having a problem with a memory leak that I've tied down to a dynamic array that is repeatedly resized to zero and then extended within a loop.  The following code gives a minimal example:

////////////////////////////////////////////////////////////////////
import std.array;
import std.stdio;

void main()
{
    real[] x;

    x.length = 5000000;

    foreach(uint i;0..100) {
        x.length = 0;

        foreach(uint j;0..5000000)
            x ~= j;

        writefln("At iteration %u, x has %u elements.",i,x.length);
    }
}
////////////////////////////////////////////////////////////////////

If I run this on my system, the memory usage explodes within a very short time, despite the fact that the length of the array is set early on, so the memory should all be reserved.  The writefln() command confirms that the actual number of elements always stays within the original bounds set.

I'm guessing this is down to concatenation ~ working differently from C++
vectors' push_back() command.  I have also tried using the Appender and its
put() command, but the same result happens.

I even tried moving the array declaration within the first foreach() loop, and putting an explicit delete at the end, but to no avail.

Can anyone advise? :-)

Thanks & best wishes,

    -- Joe
April 10, 2010
Joseph Wakeling:

> Can anyone advise? :-)

Dynamic arrays in D are not black boxes, they are an abstraction that leaks a lot, so you must know how dynamic arrays are implemented. Recently the implementation of dynamic arrays being changed, has someone a link to a description of how they work now?

Few notes on your code: there is no need to import std.array into that little program. Often there is no need to add an "uint" inside the foreach. I suggest to put a space after every comma. I also suggest you to write big numbers like this: 5_000_000 because they can avoid some mistakes.

From the little I know about the new arrays this doesn't work anymore, zeroing the length produces a deallocation:
x.length = 5_000_000;
x.length = 0;

So a basic strategy to face your problem is to not let the GC work:


import std.stdio: writefln;

void main() {
    auto x = new real[5_000_000];

    foreach (i; 0 .. 1_000) {
        int x_len = 0;

        foreach (j; 0 .. 5_000_000) {
            x[x_len] = j;
            x_len++;
        }

        writefln("At iteration %u, x has %u elements.", i, x_len);
    }
}


When the GC is not good enough one thing you can do is to manage the memory in a less automatic way...

Bye,
bearophile
April 10, 2010
Joseph Wakeling:

> Can anyone advise? :-)

My precedent answer was ignorant. Read about the capacity, reserve and assumeSafeAppend here: http://www.digitalmars.com/d/2.0/phobos/object.html

I think a short page about the design of the new dynamic arrays can be added to the D site.

This is a version of the program that doesn't blow up the memory (it uses more and more memory but just a bit):

import std.stdio: writeln, writefln;

void main() {
    enum int N = 5_000_000;
    real[] arr;
    writeln("A) arr.capacity: ", arr.capacity);
    arr.reserve(N); // optional
    writeln("B) arr.capacity: ", arr.capacity);

    foreach (i; 0 .. 100) {
        arr.length = 0;
        writeln("1) arr.capacity: ", arr.capacity);
        assumeSafeAppend(arr); // not optional
        writeln("2) arr.capacity: ", arr.capacity);

        foreach (j; 0 .. N)
            arr ~= j;

        writefln("At iteration %u, arr has %u elements.", i, arr.length);
    }
}

Bye,
bearophile
April 10, 2010
Few dynamic array benchmarks that can show you more differences between C++ and D2.

Timings dmd, N=5_000_000, NLOOPS=15, T=double, seconds:
  C++ v1: 0.67
  C++ v1: 0.72
  D v1:   6.47 (uses >1 GB RAM)
  D v2:   0.76
  D v3:   5.25

dmd v2.043, dmd -O -release -inline
g++ 4.4.1, -O3 -Wall -s

-----------------

// D v.1
import std.c.stdio: printf;

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 15;

    T[] arr;
    arr.length = N;

    foreach (i; 0 .. NLOOPS) {
        arr.length = 0;

        foreach (j; 0 .. N)
            arr ~= j;

        printf("At iteration %d, arr has %u elements.\n", i, arr.length);
    }
}

-----------------

// D v.2
import std.c.stdio: printf;

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 15;

    auto arr = new T[N];

    foreach (i; 0 .. NLOOPS) {
        uint arr_len = 0;

        foreach (j; 0 .. N) {
            arr[arr_len] = j;
            arr_len++;
        }

        printf("At iteration %d, arr has %u elements.\n", i, arr_len);
    }
}

-----------------

// D v.3
import std.c.stdio: printf;

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 15;

    T[] arr;
    arr.reserve(N);

    foreach (i; 0 .. NLOOPS) {
        arr.length = 0;
        arr.assumeSafeAppend;

        foreach (j; 0 .. N)
            arr ~= j;

        printf("At iteration %d, arr has %u elements.\n", i, arr.length);
    }
}

-----------------

// C++ v.1
#include "stdio.h"
#include <vector>

int main() {
    typedef double T;
    const int N = 5000000;
    const int NLOOPS = 15;

    std::vector<T> arr;
    arr.reserve(N);

    for (int i = 0; i < NLOOPS; i++) {
        arr.clear();

        for (int j = 0; j < N; j++)
            arr.push_back(j);

        printf("At iteration %d, arr has %u elements.\n", i, arr.size());
    }
}

-----------------

// C++ v.2
#include "stdio.h"
#include <vector>

int main() {
    typedef double T;
    const int N = 5000000;
    const int NLOOPS = 15;

    std::vector<T> arr(N);

    for (int i = 0; i < NLOOPS; i++) {
        size_t arr_len = 0;

        for (int j = 0; j < N; j++) {
            arr[arr_len] = j;
            arr_len++;
        }

        printf("At iteration %d, arr has %u elements.\n", i, arr_len);
    }
}

-----------------

Bye,
bearophile
April 10, 2010
This can help confuse your mind a bit:

import std.stdio: writeln;
void main() {
    {
        int[] arr = [0, 1, 2, 3, 4, 5, 6].dup;
        int[] slice = arr[2 .. 4];
        writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 4 5 6 | 2 3
        slice ~= 10; slice ~= 20;
        writeln("arr.capacity, slice.capacity: ", arr.capacity, " ", slice.capacity); // arr.capacity, slice.capacity: 7 7
        writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 4 5 6 | 2 3 10 20
    }

    {
        int[] arr = [0, 1, 2, 3, 4, 5, 6].dup;
        int[] slice = arr[2 .. 4];
        writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 4 5 6 | 2 3

        slice.assumeSafeAppend;
        slice ~= 10; slice ~= 20; // causes stomping
        writeln("arr.capacity, slice.capacity: ", arr.capacity, " ", slice.capacity); // arr.capacity, slice.capacity: 0 5
        writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 10 20 6 | 2 3 10 20
        slice ~= 30; slice ~= 40;
        writeln("arr.capacity, slice.capacity: ", arr.capacity, " ", slice.capacity); // arr.capacity, slice.capacity: 7 7
        writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 10 20 30 | 2 3 10 20 30 40
    }
}


The  slice.capacity = 7 in the first case is just a coincidence, it's the result of the overallocation.

But I don't know why arr.capacity is zero and then seven in the second and third case.

Bye,
bearophile
April 11, 2010
On Sat, 10 Apr 2010 18:17:12 -0400, bearophile <bearophileHUGS@lycos.com> wrote:

> This can help confuse your mind a bit:
>
> import std.stdio: writeln;
> void main() {
>     {
>         int[] arr = [0, 1, 2, 3, 4, 5, 6].dup;
>         int[] slice = arr[2 .. 4];
>         writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 4 5 6 | 2 3
>         slice ~= 10; slice ~= 20;
>         writeln("arr.capacity, slice.capacity: ", arr.capacity, " ", slice.capacity); // arr.capacity, slice.capacity: 7 7
>         writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 4 5 6 | 2 3 10 20
>     }
>
>     {
>         int[] arr = [0, 1, 2, 3, 4, 5, 6].dup;
>         int[] slice = arr[2 .. 4];
>         writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 4 5 6 | 2 3
>
>         slice.assumeSafeAppend;
>         slice ~= 10; slice ~= 20; // causes stomping
>         writeln("arr.capacity, slice.capacity: ", arr.capacity, " ", slice.capacity); // arr.capacity, slice.capacity: 0 5
>         writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 10 20 6 | 2 3 10 20
>         slice ~= 30; slice ~= 40;
>         writeln("arr.capacity, slice.capacity: ", arr.capacity, " ", slice.capacity); // arr.capacity, slice.capacity: 7 7
>         writeln("arr, slice: ", arr, " | ", slice); // arr, slice: 0 1 2 3 10 20 30 | 2 3 10 20 30 40
>     }
> }
>
>
> The  slice.capacity = 7 in the first case is just a coincidence, it's the result of the overallocation.

Yes, to allocate an array of 4 integers, you need 16 bytes, but there needs to be a padding byte to prevent cross-block pointer problems, so it uses a 32-byte block.  The 32-byte block also needs a padding byte, so really you can only put in 7 elements, not 8.

> But I don't know why arr.capacity is zero and then seven in the second and third case.

0 means if you append to the array, it will reallocate.  It will return 0 for stack-allocated arrays also.  This makes sense since the slice has taken over the "allocated" length of the block.  Essentially, capacity indicates how many elements can be appended.  The function gives up and returns 0 if it determines the array does not end at the allocated part of a block.  Technically, I could return the length of the array, but I'm not sure whether that is as useful.

For fun, add one more element to slice, and the arr will now have a valid capacity :)

FYI, using arr after assuming safe append on the slice is undefined.

-Steve
April 11, 2010
On Sat, 10 Apr 2010 17:19:17 -0400, bearophile <bearophileHUGS@lycos.com> wrote:

> Few dynamic array benchmarks that can show you more differences between C++ and D2.
>
> Timings dmd, N=5_000_000, NLOOPS=15, T=double, seconds:
>   C++ v1: 0.67
>   C++ v1: 0.72
>   D v1:   6.47 (uses >1 GB RAM)
>   D v2:   0.76
>   D v3:   5.25
>
> dmd v2.043, dmd -O -release -inline
> g++ 4.4.1, -O3 -Wall -s

That is to be expected.  The append function is not the most efficient thing.  It still must do a lot of work just to look up the valid length and verify appending can be done, whereas the v2 "append" is a single instruction!

The append function is good when you don't know what the length of an array is, and you are appending a small number of elements, or another array.  I don't advise using it for one-at-a-time appending if you know you have a large number of elements to append.  A better method is to set the length, and then write the data.

-Steve
April 11, 2010
Steven Schveighoffer:

Thank you for your answers Steven, I always appreciate your efforts in both trying to improve D runtime and in teaching me :-)

The new dynamic array regime has the main advantage of avoiding some stomping-related bugs (a secondary advantage is a little higher append speed). One its disadvantage is its increased complexity, that makes it harder for the programmer to understand what dynamic arrays are actually doing under the cover. Sometimes you don't need to know what's happening under the cover, but in this case arrays are so basic and common data structures, and their abstraction is leaking so much, that I think a D programmer must know what's happening inside them to use them at their best.


> >   C++ v1: 0.67
> >   C++ v1: 0.72

That the 0.72 timing was from the second C++ program, sorry.


> That is to be expected.  The append function is not the most efficient thing.  It still must do a lot of work just to look up the valid length and verify appending can be done, whereas the v2 "append" is a single instruction!

The v2 "append" seems two instructions:
arr[arr_len] = j;
arr_len++;

And the C++ v1 too has to verify if the appending can be done, comparing the size with the capacity and if so assigning the item and increasing the size.
I understand that the D code has to work more here, and I presume your code can't be improved. But those timings are a little sad anyway :-|


> A better method is to set the length, and then write the data.

What I have done in the D v2 example.


>0 means if you append to the array, it will reallocate.  It will return 0
for stack-allocated arrays also.  This makes sense since the slice has taken over the "allocated" length of the block.  Essentially, capacity indicates how many elements can be appended.  The function gives up and returns 0 if it determines the array does not end at the allocated part of a block.<
>For fun, add one more element to slice, and the arr will now have a valid capacity :)<

I will have to read this some more times, and probably I will have to read the source code of the append implementation :-)


>Technically, I could return the length of the array, but I'm not sure whether that is as useful.<

I like this.


>FYI, using arr after assuming safe append on the slice is undefined.<

/me nods.

Can you please write a short page, to be added to the D site (for example in the "Articles" section) that explains the data structures used under the cover by the dynamic arrays and how such data structures work (and maybe why this is the chosen design of arrays instead of other possible designs, how it iteracts with the current GC)? The purpose of this short text is to help D programmers understand what and why dynamic arrays work this way in their programs. Dynamic arrays are among the most common data structure in D programs, so they are pretty important and basic.

If you are willing to write a first draft of this small document I can offer you any kind of help you want, for example I can read and comment your text, fix typos, convert it to html, I can even draw for you small data structures with their little pointers, I can beg Walter to add the resulting html page to his site, and so on :-) It's not a big work, despite all it can't be a very complex data structure.

Bye and thank you,
bearophile
April 11, 2010
One more small thing I was forgetting. In my dlibs1 there are functions similar to:


T[][] NewVoidGCMatrix(T)(int nr, int nc) {
    // Part of the code by Marius Muja <mariusm@cs.ubc.ca>
    assert(nr > 0, "NewVoidCGMatrix: nr must be > 0.");
    assert(nc > 0, "NewVoidCGMatrix: nc must be > 0.");
    void* mem = cast(void*)gcmalloc(nr * (T[]).sizeof + nr * nc * T.sizeof);
    hasNoPointers(mem);
    T[]* index = cast(T[]*)mem;
    T* mat = cast(T*)(mem + nr * (T[]).sizeof);

    for (int i = 0; i < nr; ++i) {
        index[i] = mat[0 .. nc];
        mat += nc;
    }

    return index[0 .. nr];
}

void delVoidGC(T)(ref T[] a) {
    gcrealloc(a.ptr, 0);
    a = null;
}


They allow to allocate only one chunk of memory for a matrix, and in some situations they give a little more performance (because they give a little more CPU cache coherence). I guess such functions can't be used in D2. But maybe once I know the underlying data structure I can write something similar for D2 too.

Bye,
bearophile
April 11, 2010
Thanks very much for the extended and detailed explanations.  The assumeSafeAppend
function
indeed fixes the memory leakage. :-)

In reply to some comments ...

> > That is to be expected.  The append function is not the most efficient thing.  It still must do a lot of work just to look up the valid length and verify appending can be done, whereas the v2 "append" is a single instruction!
>
> The v2 "append" seems two instructions:
> arr[arr_len] = j;
> arr_len++;
>
> And the C++ v1 too has to verify if the appending can be done, comparing the size with the capacity and if so assigning the item and increasing the size.
>
> I understand that the D code has to work more here, and I presume your code can't be improved. But those timings are a little sad anyway :-|
>
> > A better method is to set the length, and then write the data.
>
> What I have done in the D v2 example.

I was very happy to see that D _does_ have a 'reserve' function for arrays, which I had been missing compared to C++ (it's not mentioned in the array docs).

Still, I don't think that pre-reserving the memory per se is the influencing factor on the differences in performance.  To see why, compare these two pieces of code -- a non-leaking D code, and the equivalent in C++:

////////////// D example
/////////////////////////
import std.stdio;

void main()
{
    double[] x;

    foreach(uint i;0..100) {
        x.length = 0;

        writefln("Array capacity (1) = %u",x.capacity);
        assumeSafeAppend(x);
        writefln("Array capacity (2) = %u",x.capacity);

        foreach(uint j;0..5000000)
            x ~= j;

        writefln("At iteration %u, x has %u elements.",i,x.length);
    }
}
/////////////////////////

//////////// C++ example
/////////////////////////
#include <vector>
#include <iostream>

using namespace std;

int main()
{
    vector<double> x;

    for(unsigned int i=0;i!=100;++i) {
        x.resize(0);

        cout << "Vector capacity: " << x.capacity() << endl;

        for(unsigned int j=0;j!=5000000;++j)
            x.push_back(j);

        cout << "At iteration " << i << ", x has " << x.size() << " elements." <<
endl;
    }

    return 0;
}
/////////////////////////

Note that in C++ the memory is not preassigned either.  The difference between the performance of these pieces of code is striking -- on my machine the D example takes about 70--75 seconds to run, whereas the C++ example speeds through it in 10s or less.

D also uses about 20% more memory than the C++ even though the C++ code
declares a higher capacity for the vector (above 8 million) than D does
for the array (a little over 5 million).

I don't know if it was naive to assume that D's dynamic arrays would be equivalent to vectors in C++.  But it's clear that the D array appender ~= is much slower than the C++ vector push_back().

Using an Appender class and put() (from std.array) is even slower,
despite the std.array docs recommending this over ~. :-(

It's disappointing because I'm loving so much of D's syntax, but I can write far more efficient code (with less explicit memory management) in C++ ...
« First   ‹ Prev
1 2 3 4 5 6 7