Thread overview | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
February 02, 2013 Why are commands executing out of order? | ||||
---|---|---|---|---|
| ||||
Here's my code: import std.datetime; import std.file; import std.stdio; struct info { string name; bool isDir; ulong size; SysTime timeCreated; SysTime timeLastAccessed; SysTime timeLastModified; this(DirEntry d) { this.name = d.name; this.isDir = d.isDir; this.size = d.size; this.timeCreated = d.timeCreated; this.timeLastAccessed = d.timeLastAccessed; this.timeLastModified = d.timeLastModified; } } void main() { writeln("Scanning drives..."); info[] results; for (char c = 'A'; c <= 'Z'; c++) { if (exists(c ~ ":\\")) results ~= info(DirEntry(c ~ ":\\")) ~ scan(c ~ ":\\"); } File f = File("driveInfo.txt", "w"); foreach (i; results) { f.writeln(i); } f.close(); writeln(memSize(results)); } info[] scan(string dir) { info[] results; try { auto de = dirEntries(dir, SpanMode.shallow); foreach (d; de) { if (d.isDir()) results ~= info(d) ~ scan(d.name); else results ~= info(d); } } catch (FileException fe){} return results; } size_t memSize(T)(T[] t) { return t.length * T.sizeof; } But main's first writeln actually outputs after f.close(). The program uses ~1GB of RAM overall, even though main.results is ~50MB according to memSize. Any attempts to call the GC do nothing, but I'm probably doing it wrong though. Is it possible that I have a memory leak somewhere that is causing this delay, or is this a DMD problem, or something else I haven't even thought of? DMD v2.060, Windows 7 x64 Thanks, Josh |
February 02, 2013 Re: Why are commands executing out of order? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Josh | On 2013-02-02 07:49, Josh wrote: > But main's first writeln actually outputs after f.close(). Maybe because of output buffering and you need to add flush? > The program uses ~1GB of RAM overall, even though main.results is ~50MB > according to memSize. Wrong comparison. You should have compared to the output file's size. info.sizeof doesn't include the size of info.name. In my test results were 30 MB but the output file was 101 MB. > Any attempts to call the GC do nothing, but I'm probably > doing it wrong though. Is it possible that I have a memory leak somewhere that > is causing this delay, or is this a DMD problem, or something else I haven't > even thought of? It's a problem with array concatenation and memory fragmentation. There are two ways out of this mess. First, there's the "Unix Way" - don't use the results array at all and print out info records immediately as they appear. Your program will use under 3 MB of RAM. :) But if you really have to do some extra processing of results, then you can reduce the memory used like 6 times by using an Appender instead of concatenating arrays and by reserving a reasonable number of records, to limit possible initial fragmentation when the array is resized. My program used 145 MB whereas the output file was 101 MB. I think it's good enough. But that's scanning just one, non-system drive. Won't even try to scan them all. Try it out: import std.datetime; import std.file; import std.stdio; import std.array; struct info { string name; bool isDir; ulong size; SysTime timeCreated; SysTime timeLastAccessed; SysTime timeLastModified; this(DirEntry d) { this.name = d.name; this.isDir = d.isDir; this.size = d.size; this.timeCreated = d.timeCreated; this.timeLastAccessed = d.timeLastAccessed; this.timeLastModified = d.timeLastModified; } } alias Appender!(info[]) InfoAppender; void main() { writeln("Scanning drives..."); info[] results; InfoAppender app = appender(results); app.reserve(512*1024); // for (char c = 'D'; c <= 'D'; c++) { if (exists(c ~ ":\\")) { app.put(info(DirEntry(c ~ ":\\"))); scan(c ~ ":\\", app); } } File f = File("driveInfo.txt", "w"); foreach (i; app.data) { f.writeln(i); } f.close(); writeln(memSize(app.data)); } void scan(string dir, ref InfoAppender app) { try { auto de = dirEntries(dir, SpanMode.shallow); foreach (d; de) { app.put(info(d)); if (d.isDir()) scan(d.name, app); } } catch (FileException fe){} } size_t memSize(T)(T[] t) { return t.length * T.sizeof; } And the preferred version to scan whole filesystem without wasting RAM: :) import std.datetime; import std.file; import std.stdio; struct info { string name; bool isDir; ulong size; SysTime timeCreated; SysTime timeLastAccessed; SysTime timeLastModified; this(DirEntry d) { this.name = d.name; this.isDir = d.isDir; this.size = d.size; this.timeCreated = d.timeCreated; this.timeLastAccessed = d.timeLastAccessed; this.timeLastModified = d.timeLastModified; } } File f; void main() { writeln("Scanning drives..."); f = File("driveInfo.txt", "w"); for (char c = 'D'; c <= 'D'; c++) { if (exists(c ~ ":\\")) { f.writeln(info(DirEntry(c ~ ":\\"))); scan(c ~ ":\\"); } } f.close(); } void scan(string dir) { try { auto de = dirEntries(dir, SpanMode.shallow); foreach (d; de) { f.writeln(info(d)); if (d.isDir()) scan(d.name); } } catch (FileException fe){} } |
February 02, 2013 Re: Why are commands executing out of order? | ||||
---|---|---|---|---|
| ||||
Posted in reply to FG | I'm was quite surprised as I had heard the first time of these 'appender'. Therefore I'm asking me: Why is the 'appender' method so much more efficient? |
February 02, 2013 Re: Why are commands executing out of order? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Namespace | On Saturday, 2 February 2013 at 11:33:07 UTC, Namespace wrote:
> I'm was quite surprised as I had heard the first time of these 'appender'.
> Therefore I'm asking me:
> Why is the 'appender' method so much more efficient?
Because concatenation likely will end up relocating the memory over and over again (possibly lots of abandoned slices too) whereas the appender will allocated & reuse the same buffer. On the other hand you might be better off (depending on the need) with just reserving a lot more memory (close to what you need or slightly larger) and then concatenation will perform much better. But if it need to reallocate then you've still got the same problem with slices where with the appender you probably won't.
I tend to reserve string concatenation for assert messages, and quicker & dirty text manipulation/building like during ctfe. Outside of those I use alternatives first.
|
February 02, 2013 Re: Why are commands executing out of order? | ||||
---|---|---|---|---|
| ||||
Posted in reply to FG | On Saturday, 2 February 2013 at 11:16:50 UTC, FG wrote: > On 2013-02-02 07:49, Josh wrote: >> But main's first writeln actually outputs after f.close(). > > Maybe because of output buffering and you need to add flush? How do I do that? I've never heard of flush before. >> The program uses ~1GB of RAM overall, even though main.results is ~50MB >> according to memSize. > > Wrong comparison. You should have compared to the output file's size. > info.sizeof doesn't include the size of info.name. > In my test results were 30 MB but the output file was 101 MB. Yeah, my output file was 126MB. I figured strings had a set amount of memory they could use, like an int can use exactly 4 bytes. > But if you really have to do some extra processing of results, then you can reduce the memory used like 6 times by using an Appender instead of concatenating arrays and by reserving a reasonable number of records, to limit possible initial fragmentation when the array is resized. My program used 145 MB whereas the output file was 101 MB. I think it's good enough. I've never come across Appenders before. Could you please explain them a little bit, and what each call in your modified code does? Thanks, Josh |
February 02, 2013 Re: Why are commands executing out of order? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Josh | > I've never come across Appenders before. Could you please explain them a little bit, and what each call in your modified code does? > > Thanks, > > Josh http://dlang.org/phobos/std_array.html#.Appender And read Era's post. |
February 02, 2013 Array concatenation vs. Appender | ||||
---|---|---|---|---|
| ||||
Posted in reply to Era Scarecrow | On 2013-02-02 13:50, Era Scarecrow wrote:
> On Saturday, 2 February 2013 at 11:33:07 UTC, Namespace wrote:
>> I'm was quite surprised as I had heard the first time of these 'appender'.
>> Therefore I'm asking me:
>> Why is the 'appender' method so much more efficient?
>
> Because concatenation likely will end up relocating the memory over and over
> again (possibly lots of abandoned slices too) whereas the appender will
> allocated & reuse the same buffer. On the other hand you might be better off
> (depending on the need) with just reserving a lot more memory (close to what you
> need or slightly larger) and then concatenation will perform much better. But if
> it need to reallocate then you've still got the same problem with slices where
> with the appender you probably won't.
Yeah but let us move reallocation out of the equation.
Reserving space limits the amount of RAM used and can avoid reallocations all together but in a little test it came out that still appender is 2.5-4 times faster than tab ~= str, where tab is char[] (same when it is ubyte[]). Why is that?
Let's say we have this code:
char[] tab;
string fill = "1234";
uint loops = 100_000_000;
tab.reserve(loops * fill.length);
foreach (i; 0..loops) tab ~= fill;
Using Appender changes the above loop into something like this:
size_t capacity = tab.capacity;
foreach (i; 0..loops) {
size_t flen = fill.length;
size_t len = tab.length;
if (len + flen > capacity) {
...
// use GC.extend or GC.qalloc & memcpy
// update capacity and assign tab = memory_block[0..len]
}
tab = tab.ptr[0..len+flen];
tab.ptr[len..len+flen] = fill[];
}
Why cannot tab ~= fill achieve similar performance as this code?
Am I missing something important?
|
February 02, 2013 Re: Why are commands executing out of order? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Josh | On 2013-02-02 16:22, Josh wrote: >> Maybe because of output buffering and you need to add flush? > > How do I do that? I've never heard of flush before. writeln("..."); stdout.flush(); C also has flush, in the form of fflush(stdout); > I've never come across Appenders before. Could you please explain them a little > bit, and what each call in your modified code does? I'm baffled myself as to why they perform better than normal array concatenation. We'll see if some interesting explanation appears. That's why I changed the topic to "Array concatenation vs. Appender". |
February 02, 2013 Re: Array concatenation vs. Appender | ||||
---|---|---|---|---|
| ||||
Posted in reply to FG | On Sat, 02 Feb 2013 10:47:39 -0500, FG <home@fgda.pl> wrote:
> On 2013-02-02 13:50, Era Scarecrow wrote:
>> On Saturday, 2 February 2013 at 11:33:07 UTC, Namespace wrote:
>>> I'm was quite surprised as I had heard the first time of these 'appender'.
>>> Therefore I'm asking me:
>>> Why is the 'appender' method so much more efficient?
>>
>> Because concatenation likely will end up relocating the memory over and over
>> again (possibly lots of abandoned slices too) whereas the appender will
>> allocated & reuse the same buffer. On the other hand you might be better off
>> (depending on the need) with just reserving a lot more memory (close to what you
>> need or slightly larger) and then concatenation will perform much better. But if
>> it need to reallocate then you've still got the same problem with slices where
>> with the appender you probably won't.
>
> Yeah but let us move reallocation out of the equation.
> Reserving space limits the amount of RAM used and can avoid reallocations all together but in a little test it came out that still appender is 2.5-4 times faster than tab ~= str, where tab is char[] (same when it is ubyte[]). Why is that?
>
> Let's say we have this code:
>
> char[] tab;
> string fill = "1234";
> uint loops = 100_000_000;
> tab.reserve(loops * fill.length);
> foreach (i; 0..loops) tab ~= fill;
>
> Using Appender changes the above loop into something like this:
>
> size_t capacity = tab.capacity;
> foreach (i; 0..loops) {
> size_t flen = fill.length;
> size_t len = tab.length;
> if (len + flen > capacity) {
> ...
> // use GC.extend or GC.qalloc & memcpy
> // update capacity and assign tab = memory_block[0..len]
> }
> tab = tab.ptr[0..len+flen];
> tab.ptr[len..len+flen] = fill[];
> }
>
> Why cannot tab ~= fill achieve similar performance as this code?
> Am I missing something important?
Appender is built for appending and has higher up-front costs, and higher memory costs.
Normal arrays can be used for appending, but also can be used for so many other things.
In other words, Appender cannot have the features that standard arrays have in order to make appending as fast as possible.
-Steve
|
February 03, 2013 Re: Array concatenation vs. Appender | ||||
---|---|---|---|---|
| ||||
Posted in reply to FG | On Saturday, 2 February 2013 at 15:47:37 UTC, FG wrote: > Yeah but let us move reallocation out of the equation. Reserving space limits the amount of RAM used and can avoid reallocations all together but in a little test it came out that still appender is 2.5-4 times faster than tab ~= str, where tab is char[] (same when it is ubyte[]). Why is that? Reserving doesn't mean it has to allocate any memory at all; That's up to the implementation of the runtime and OS, it may just say 'any allocations not related to this object start after address xxx', then as the reserved space needs more it requests the larger space from the OS. The limitations of how much you can use memory-wise doesn't change as that's determined by the infrastructure. (VM and harddrive space doesn't count). > Why cannot tab ~= fill achieve similar performance as this code? > Am I missing something important? As for how many checks and overhead the runtime has I'm not sure. Try compiling the code again with full optimizations on and see if it makes a difference. Concatenation may perform better like that (as long as relocation/copying isn't done) or maybe not... |
Copyright © 1999-2021 by the D Language Foundation