read() performance - Linux too?
July 24, 2006
/*
The std.file.read() function in dmd causes a performance
issue after reading large files from 100MB upwards.
Reading the file seems to be no problem, but cleanup
afterwards takes forever.

I am therefore using std.mmfile, which works fine in
the Windows version of D (a rough sketch follows the
test program below), but using read() would be more
convenient in several cases.

Now a few questions:

1) Does anyone know if the read() performance problem
occurs in the Linux version of D as well?

2) Is there any info available on where the real problem
lies? Allocating a few hundred MB does not show the same
phenomenon, and dmc's fread() function is also painless.

3) I did not find anything about this issue in Bugzilla.
Did I overlook the relevant entry?

*/


// Try reading a 100MB+ file with the following
// program (some patience required):

import std.stdio, std.file;

alias writefln wrl;

void main(char[][] av) {
  wrl();
  if (av.length<2)  {
    wrl("Need file name to test read() !");
    return;
  }
  char[] fn=av[1];
  wrl("Reading '%s' ...", fn);
  char[] bf=cast(char[])read(fn);
  wrl("%d bytes read.",bf.length);
  wrl("Doing something ...");
  int n=0;
  foreach(c;bf)  n+=c;
  wrl("Result: %s, done.",n);
  wrl("Expect a delay here after reading a huge file ...");
  wrl();
}
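
// For reference, a rough sketch of the std.mmfile workaround
// mentioned above (untested as posted; assumes MmFile's
// opSlice returns the whole mapping, no argument checking):

import std.stdio, std.mmfile;

void main(char[][] av) {
  MmFile mf = new MmFile(av[1]);     // maps the file read-only
  char[] bf = cast(char[]) mf[];     // whole file viewed as char[]
  int n = 0;
  foreach (c; bf) n += c;
  writefln("Result: %s, %d bytes.", n, bf.length);
}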


July 24, 2006
Why do you need to read the entire file into memory at once?

Anyway, this may be a Phobos problem, but most likely it's the garbage collector.  What if you manually delete the buffer returned by read()?

-[Unknown]


> /*
> The std.file.read() function in dmd causes a performance
> issue after reading large files from 100MB upwards.
> Reading the file seems to be no problem, but cleanup
> afterwards takes forever.
>
> [... rest of the post and the test program snipped ...]
July 24, 2006
Bob W wrote:
> /*
> The std.file.read() function in dmd causes a performance
> issue after reading large files from 100MB upwards.
> Reading the file seems to be no problem, but cleanup
> afterwards takes forever.
>
> [... rest of the post and the test program snipped ...]

It's more than likely the GC; the same happens w/ a program like this:

import std.outbuffer;
import std.string : atoi;
import std.stdio  : wrl = writefln;

void main(char[][] args)
{
    int n = args.length > 1 ? atoi(args[1]) : 10_000_000;
    OutBuffer b = new OutBuffer;
    for(int i = 0; i < n; i++)
    {
        b.write("hello\n");
    }
    wrl(b.toString.length);
}

Run without an argument (n = 10_000_000): on Windows it takes forever (starts swapping), while on Linux it takes about a second.
July 24, 2006
On Mon, 24 Jul 2006 04:55:17 +0200, Bob W wrote:

> /*
> The std.file.read() function in dmd causes a performance
> issue after reading large files from 100MB upwards.
> Reading the file seems to be no problem, but cleanup
> afterwards takes forever.

It's a GC effect. The GC is scanning through the buffer looking for addresses to clean up.

A simple delete of the buffer will prevent the GC from trying so hard.

 // The same test program, now with an explicit delete
 // of the buffer (no patience required any more):

 import std.stdio, std.file;

 alias writefln wrl;

 void main(char[][] av) {
   wrl();
   if (av.length<2)  {
     wrl("Need file name to test read() !");
     return;
   }
   char[] fn=av[1];
   wrl("Reading '%s' ...", fn);
   char[] bf=cast(char[])read(fn);
   wrl("%d bytes read.",bf.length);
   wrl("Doing something ...");
   int n=0;
   foreach(c;bf)  n+=c;
   wrl("Result: %s, done.",n);

   delete bf;

   wrl("No delay here now after reading a huge file ...");
   wrl();
 }

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
24/07/2006 2:11:34 PM
July 24, 2006
Dave wrote:

>          b.write("hello\n");

Funny. Changing that to

>          b.write("helloo");

lets it run fast.
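
My guess (pure speculation): the conservative GC treats any word that looks like a heap address as a pointer, and repeating "hello\n" produces words with 0x0A as their high byte, i.e. values around 175MB, which can fall inside a heap that has grown that large; the words from "helloo" all point far above it. A little probe to test the idea (a sketch, assuming std.gc's fullCollect and std.date's timer):

import std.gc    : fullCollect;
import std.date  : d_time, getUTCtime, TicksPerSecond;
import std.stdio : writefln;

// Fill a ~100MB block with a single word value, then time a
// full collection while the block is alive and being scanned.
void probe(uint fill)
{
    uint[] buf = new uint[25_000_000];
    buf[] = fill;
    d_time t0 = getUTCtime();
    fullCollect();
    d_time t1 = getUTCtime();
    writefln("fill=%08x: %s ms", fill, (t1 - t0) * 1000 / TicksPerSecond);
    delete buf;
}

void main()
{
    probe(0x00000000);   // nothing pointer-like
    probe(0x0A6F6C6C);   // "llo\n" as a little-endian word, ~175MB
}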
July 24, 2006
"Derek Parnell" <derek@nomail.afraid.org> wrote in message news:dg25ykpt8kxw$.1r2mhu0u851l0.dlg@40tude.net...
> On Mon, 24 Jul 2006 04:55:17 +0200, Bob W wrote:
>
>> /*
>> The std.file.read() function in dmd causes a performance
>> issue after reading large files from 100MB upwards.
>> Reading the file seems to be no problem, but cleanup
>> afterwards takes forever.
>
> It's a GC effect. The GC is scanning through the buffer looking for addresses to clean up.

Wouldn't it be possible to add some way of telling the GC not to scan something? Perhaps there's already something in std.gc (I didn't check), but I actually think the compiler could be doing this by checking the TypeInfo. I wouldn't go so far as to expect it to only scan the pointer fields of a struct, but at least it could ignore char[] and float[] (and other arrays of non-pointer types).

I've built the Universal Machine from the programming contest (see the thread below) and am running into memory problems. I have the feeling that a lot of the opcodes in the machine code are being treated as pointers: memory just keeps growing and the GC cycles take longer and longer.

It was great to write the UM without having to worry about memory, but now I'll have to worry about it in a totally new way: trying to outsmart the GC. Either that, or malloc/memset/free (a sketch below) :(
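
A minimal sketch of that last option (assuming only plain std.c.stdlib and std.c.string): memory from the C heap is invisible to the GC, so it is never scanned; by the same token it must not hold the only reference to any GC object.

import std.c.stdlib : malloc, free;
import std.c.string : memset;

void main()
{
    size_t n = 100 * 1024 * 1024;        // ~100MB arena for the UM
    ubyte* p = cast(ubyte*) malloc(n);   // never scanned by the GC
    assert(p !is null);
    memset(p, 0, n);
    ubyte[] mem = p[0 .. n];             // D slice over the raw block
    // ... run the machine over mem ...
    free(p);                             // manual cleanup, C-style
}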

L.


July 24, 2006
"Unknown W. Brackets" <unknown@simplemachines.org> wrote in message news:ea1dd2$l4p$1@digitaldaemon.com...
> Why do you need to read the entire file into memory at once?

Funny question - I have a minimum of 1GB of main memory in my computers and I intend to use it in order to get the best performance possible.

You are probably aware that "read()" reads the entire file at once; it also takes care of opening and closing the file.


>
> Anyway, this may be a Phobos problem, but most likely it's the garbage collector.  What if you manually delete the buffer returned by read()?

That'll work. But I am pretty reluctant to accept my obligation to assist the GC before it falls into a coma.
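
If it has to be done, a scope(exit) at least keeps the delete next to the allocation so it cannot be forgotten; a minimal sketch:

import std.file;

void process(char[] fn)
{
    char[] bf = cast(char[]) read(fn);
    scope(exit) delete bf;   // runs on every exit path, incl. exceptions
    // ... work with bf ...
}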



July 24, 2006
"Dave" <Dave_member@pathlink.com> wrote in message news:ea1g68$nhl$1@digitaldaemon.com...
>
> It's more than likely the GC, the same happens w/ a program like this:
>
> import std.outbuffer;
> import std.string : atoi;
> import std.stdio  : wrl = writefln;
>
> void main(char[][] args)
> {
>     int n = args.length > 1 ? atoi(args[1]) : 10_000_000;
>     OutBuffer b = new OutBuffer;
>     for(int i = 0; i < n; i++)
>     {
>         b.write("hello\n");
>     }
>     wrl(b.toString.length);
> }
>
> Run w/o an argument (n = 10_000_000), on Windows it takes forever (starts swapping), on Linux it takes about a second.


Thanks for your info - I'll remember it as a warning.

But this is probably a different case. Your program is
dynamically resizing b's buffer. This requires more
overhead than just releasing a piece of memory which
was allocated in one single step.



July 24, 2006
"Derek Parnell" <derek@nomail.afraid.org> wrote in message news:dg25ykpt8kxw$.1r2mhu0u851l0.dlg@40tude.net...
>
> It's a GC effect. The GC is scanning through the buffer looking for addresses to clean up.

Sounds like the GC isn't overly smart. It would be nice to have this fixed before the dmd 0.163 to dmd 1.0 transition.


> A simple delete of the buffer will prevent the GC from trying so hard.

Yes, I know. But as already mentioned in another post,
I am pretty reluctant to accept responsibility for assisting
the GC in performing an elegant exit.



July 24, 2006
Bob W wrote:
> "Dave" <Dave_member@pathlink.com> wrote in message news:ea1g68$nhl$1@digitaldaemon.com...
>> It's more than likely the GC; the same happens w/ a program like this:
>>
>> [... OutBuffer test program snipped ...]
>>
>> Run without an argument (n = 10_000_000): on Windows it takes forever (starts swapping), while on Linux it takes about a second.
> 
> 
> Thanks for your info - I'll remember it as a warning.
> 
> But this is probably a different case. Your program is
> dynamically resizing b's buffer. This requires more
> overhead than just releasing a piece of memory which
> was allocated in one single step.
> 

It seems to be "thrashing" during the full collection at program exit, but I haven't looked into it fully. I think that is probably what is happening in your case as well (that's why I mentioned it, but I should have explained that better).
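
A quick way to localize it (a sketch, using std.date's timer): time an explicit collection while the buffer is still alive, then again after deleting it.

import std.date, std.file, std.gc, std.stdio;

void main(char[][] av)
{
    char[] bf = cast(char[]) read(av[1]);

    d_time t0 = getUTCtime();
    fullCollect();                   // buffer alive: the GC must scan it
    writefln("collect with buffer: %s ms",
             (getUTCtime() - t0) * 1000 / TicksPerSecond);

    delete bf;
    t0 = getUTCtime();
    fullCollect();                   // buffer gone: little left to scan
    writefln("collect after delete: %s ms",
             (getUTCtime() - t0) * 1000 / TicksPerSecond);
}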