Jump to page: 1 2
Thread overview
Running out of memory
Dec 27, 2012
FG
Dec 27, 2012
bearophile
Dec 27, 2012
FG
Dec 27, 2012
bearophile
Dec 27, 2012
Jonathan M Davis
Dec 27, 2012
FG
Dec 27, 2012
bearophile
Dec 27, 2012
FG
Dec 27, 2012
bearophile
Dec 27, 2012
FG
Dec 28, 2012
Jonathan M Davis
Dec 28, 2012
FG
Dec 28, 2012
evilrat
Dec 27, 2012
evilrat
December 27, 2012
Hello,
I'm having memory problems with a string concatenation program.
I'm on Windows 7 64-bit with DMD32 v2.060 and cygwin gcc-3.4.4 32-bit.
Consider this example in C++:

    #include <iostream>
    int main(int argc, char **args) {
        std::string s;
        for (int i=0, j=0; i < 500000000; i++) {
            s += "....";
            if (i % 10000000 == 0) {
                std::cout << "loop " << ++j << std::endl;
                s = "";
            }
        }
        return 0;
    }

After the first "loop" (10 million iterations) the memory footprint is stable at 40 MB, which is expected. Now, rewriting that to D:

    import std.stdio;
    class C { string x; }
    void main(string[] args) {
        char[] s;
        // s.reserve(40_000_000);
        for (int i=0, j=0; i < 500_000_000; i++) {
            s ~= "....";
            if (i % 10_000_000 == 0) {
                writeln("loop ", ++j); stdout.flush();
                s = "".dup;
            }
        }
    }

It runs 6 times faster thanks to a better string implementation than C++ (hats off), but memory keeps growing long after the first "loop" message, reaching around 350-430 MB, then sometimes near the end it starts growing again, reaching even 700 MB. Why doesn't the GC take care of it? Interestingly, when I uncomment the s.reserve(...) line, it's much worse - memory keeps growing until the program dies with a core.exception.OutOfMemoryError.

Then I have rewritten this code to use an Appender:

    import std.stdio, std.array;
    class C { string x; }
    void main(string[] args) {
        char[] s;
        s.reserve(40_000_000);
        auto app = appender(s);
        for (int i=0, j=0; i < 500_000_000; i++) {
            app.put("....");
            if (i % 10_000_000 == 0) {
                writeln("loop ", ++j); stdout.flush();
                app.clear();
            }
        }
    }

This is what I was after - it runs 4 times faster than the first D code and uses only 40 MB of memory. It's close to the efficiency of operating on the array using indexing. I get, that it's by design much faster than s ~= "....", but why is so much memory wasted in the first D example?
Am I doing something wrong? I thought s ~= a is rewritten as an equivalent of s.push_back(a) (from C++'s vector) and not s = s ~ a.

best regards,
FG
December 27, 2012
FG:

> but why is so much memory wasted in the first D example?

There is also assumeSafeAppend, for your experiments :-)

Beside the experiments, there is also this to read:
http://dlang.org/d-array-article.html

Bye,
bearophile
December 27, 2012
On 2012-12-27 13:52, bearophile wrote:
> There is also assumeSafeAppend, for your experiments :-)

This is it! Makes appends in-place. The following code uses only 40 MB of RAM (or 75 MB without prior call to reserve):

    import std.stdio;
    class C { string x; }
    void main(string[] args) {
        char[] s;
        s.reserve(40_000_000);
        for (int i=0, j=0; i < 500_000_000; i++) {
            if (i % 10_000_000 == 0) {
                writeln("loop ", ++j); stdout.flush();
                s.length = 0;
                s.assumeSafeAppend();
            }
            s ~= "....";
        }
    }

But I also had to write s.length = 0 instead of either s = [] or s = "".dup.
I wonder why GC didn't reclaim the regions previously used by the array before it an empty one was assigned to s. GC has started freeing those regions only after the 21'st loop when used up RAM has gone up to 780 MB.
I have 8 GB of RAM, 4 GB free but I doubt 780 MB is the right moment to start clearing space. :)

Oddly enough I can keep it at 40 MB (or 75 MB when not reserving space) without assumeSafeAppend(), when using manual delete:

    import std.stdio;
    class C { string x; }
    void main(string[] args) {
        char[] s;
        s.reserve(40_000_000);
        for (int i=0, j=0; i < 500_000_000; i++) {
            if (i % 10_000_000 == 0) {
                writeln("loop ", ++j); stdout.flush();
                delete s;
                s = [];
                s.reserve(40_000_000);
            }
            s ~= "....";
        }
    }

Now, this delete looks like Python's kind of GC cheating. :)
Personally, I don't have anything against it, but is it a valid practice?


December 27, 2012
FG:

> But I also had to write s.length = 0 instead of either s = [] or s = "".dup.

Also try:
s = null;


> I wonder why GC didn't reclaim the regions previously used by the array before it an empty one was assigned to s.

Some explanations are in the article I've linked previously. Most of the other part of the answer comes from the fact that currently the D GC is not a precise GC. So randomly inbound pointers keep large memory chunks alive.


> Now, this delete looks like Python's kind of GC cheating. :)
> Personally, I don't have anything against it, but is it a valid practice?

delete is deprecated in D. There is destroy(), and a replacement for delete in the memory/GC module.

Bye,
bearophile
December 27, 2012
On Thursday, 27 December 2012 at 14:15:51 UTC, FG wrote:
> ...
>
> Now, this delete looks like Python's kind of GC cheating. :)
> Personally, I don't have anything against it, but is it a valid practice?

delete itself is considered deprecated in D2, though it was used widely, instead there is destroy function but i still use delete too and never tested that func.

also u can call GC.collect() from core.memory module to tell GC to manually free mem

but don't accept this as absolutely truth because i'm a bit noob
December 27, 2012
On Thursday, December 27, 2012 15:31:45 bearophile wrote:
> delete is deprecated in D. There is destroy(), and a replacement
> for delete in the memory/GC module.

Indeed. But in this particular case, destroy would be of no help, as it specifically destroys objects _without_ freeing their memory. And at the moment, for strings, I don't believe that it's any different from setting them to null.

- Jonathan M Davis
December 27, 2012
On 2012-12-27 15:31, bearophile wrote:
> delete is deprecated in D. There is destroy(), and a replacement for delete in
> the memory/GC module.

destroy() had no effect on program's behavior, even the docs state that "it does not initiate a GC cycle or free any GC memory".
So I hope delete will still remain an option.
Very useful for freeing such large chunks like in this example.

What do you mean as the delete replacement?
December 27, 2012
FG:

> What do you mean as the delete replacement?

Take a look here (at free()):

http://dlang.org/phobos/core_memory.html

Bye,
bearophile
December 27, 2012
On 2012-12-27 16:14, bearophile wrote:
> FG:
>
>> What do you mean as the delete replacement?
>
> Take a look here (at free()):
>
> http://dlang.org/phobos/core_memory.html

Oh, I didn't expect this to work on arrays but it does as good as delete:

GC.free(cast(void*)s);

December 27, 2012
FG:

> GC.free(cast(void*)s);

Don't cast arrays arrays to pointers, it's unclean, because dynamic arrays are two words long. Use the .ptr property:

GC.free(s.ptr);

Or if the precedent doesn't work:

GC.free(cast(void*)s.ptr);

Bye,
bearophile
« First   ‹ Prev
1 2