Thread overview
stdio very slow
Aug 12, 2004
Heinz Saathoff
Aug 13, 2004
Walter
Aug 13, 2004
Heinz Saathoff
Aug 13, 2004
Jan Knepper
Aug 16, 2004
Heinz Saathoff
Aug 16, 2004
Scott Michel
Aug 17, 2004
Heinz Saathoff
Aug 17, 2004
Scott Michel
Aug 18, 2004
Heinz Saathoff
Aug 18, 2004
Scott Michel
Aug 13, 2004
Walter
Aug 16, 2004
Heinz Saathoff
Aug 16, 2004
Walter
August 12, 2004
Hello,

I wrote a small program (NT console app) to search for filenames in my Eudora mailbox files. My first attempt used the stdio functions (fopen, fgetc, fclose), with the files opened in binary mode. The resulting program worked but took a long time to run. OK, that might be my straightforward, naive algorithm. Then I experimented with memory-mapped files, and the otherwise identical program was very fast. OK, that might be OS overhead, so I tried the (_open, _read, _close) functions with my own small buffering (1K buffer). It was a bit slower than the memory-mapped approach but still very fast. Here are the measured times:

stdio :   14.8 seconds
_read :    1.8 seconds
mmap  :    0.8 seconds

Not that it matters for my small program, but I'm wondering why the stdio fgetc function takes so much time.


- Heinz
August 13, 2004
It could be that the optimal buffer size you need is not the default one used by stdio.
August 13, 2004
Hello Walter,

Walter wrote ...
> It could be that the optimal buffer size you need is not the default one used by stdio.

I only do a sequential read using fgetc. As far as I know, stdio also
uses a buffer of at least 512 bytes.
Decreasing the buffer size from 1024 bytes to 512 bytes in my
(_open,_read,_close)-buffered version increases the runtime from 1.8
seconds to 1.95 seconds. That's not much for the smallest practical
buffer size.
August 13, 2004
Heinz Saathoff wrote:
> I only do a sequential read using fgetc. As far as I know, stdio also uses a buffer of at least 512 bytes. Decreasing the buffer size from 1024 bytes to 512 bytes in my (_open,_read,_close)-buffered version increases the runtime from 1.8 seconds to 1.95 seconds. That's not much for the smallest practical buffer size.

How much are you reading at once?
fgetc only does 1 character per call. (_)read usually does more.
Calling the buffered I/O system for every single character or once for a block of 128 does make a difference.

-- 
ManiaC++
Jan Knepper

But as for me and my household, we shall use Mozilla...
www.mozilla.org
August 13, 2004
Try setting the buffer size larger, not smaller, and make it a multiple of 4K. -Walter
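
A minimal sketch of how a larger stdio buffer could be tried with the standard setvbuf() call. The helper name count_byte_buffered and its parameters are made up for illustration and are not taken from the program discussed in this thread; the sketch only shows where the call goes, not that it closes the gap on any particular runtime.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: count occurrences of `needle` in the file at `path`,
// reading one character at a time with fgetc() through a stdio buffer of
// `bufsize` bytes. Returns -1 if the file cannot be opened or the buffer
// cannot be installed.
long count_byte_buffered(const char *path, unsigned char needle, std::size_t bufsize)
{
    std::FILE *fp = std::fopen(path, "rb");
    if (!fp) return -1;

    // setvbuf() has to be called after fopen() and before any other
    // operation on the stream, and the buffer must outlive the stream.
    std::vector<char> buf(bufsize);
    if (std::setvbuf(fp, buf.data(), _IOFBF, buf.size()) != 0) {
        std::fclose(fp);
        return -1;
    }

    long count = 0;
    int c;
    while ((c = std::fgetc(fp)) != EOF)
        if (c == needle) ++count;

    std::fclose(fp);   // close the stream before buf is destroyed
    return count;
}
```

Timing a call with a bufsize such as 16 * 1024 (a multiple of 4K) against calls with smaller values would show whether buffer size is the limiting factor.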

"Heinz Saathoff" <hsaat@despammed.com> wrote in message news:MPG.1b868bd6fe18ecf99896e4@news.digitalmars.com...
> Hello Walter,
>
> Walter wrote ...
> > It could be that the optimal buffer size you need is not the default one used by stdio.
>
> I only do a sequential read using fgetc. As far as I know, stdio also
> uses a buffer of at least 512 bytes.
> Decreasing the buffer size from 1024 bytes to 512 bytes in my
> (_open,_read,_close)-buffered version increases the runtime from 1.8
> seconds to 1.95 seconds. That's not much for the smallest practical
> buffer size.


August 16, 2004
Hello Jan,

Jan Knepper wrote...
> Heinz Saathoff wrote:
> > I only do a sequential read using fgetc. As far as I know, stdio also uses a buffer of at least 512 bytes. Decreasing the buffer size from 1024 bytes to 512 bytes in my (_open,_read,_close)-buffered version increases the runtime from 1.8 seconds to 1.95 seconds. That's not much for the smallest practical buffer size.
> 
> How much are you reading at once?
> fgetc only does 1 character per call. (_)read usually does more.
> Calling the buffered I/O system for every single character or once for a
> block of 128 does make a difference.

When using stdio I read one char at a time with fgetc. But internally stdio uses a buffer too. My simple buffered file is this:
------------------- buffered file --------------------------
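#include <io.h>     // _open, _read, _close
#include <fcntl.h>  // _O_RDONLY, _O_BINARY
#include <stdio.h>  // EOF

class CppFILE
{
public:
   // -1 marks "no file open"; 0 is a valid handle value
   CppFILE() : fhandle(-1), idx(0), filled(-1) {}
   CppFILE(const char *name, const char *mode) : fhandle(-1), idx(0), filled(-1) {
      Open(name, mode);
   }
   ~CppFILE() { Close(); }
   // mode is currently ignored: the file is always opened read-only, binary
   bool Open(const char *name, const char * /*mode*/) {
      fhandle = _open(name, _O_RDONLY|_O_BINARY);
      return fhandle>=0;
   }
   void Close() { if(fhandle >= 0) {
                     _close(fhandle);
                     fhandle = -1;
                     idx = 0;
                     filled = -1;
                  }
                }
   int  getc();
protected:
   void  Fill();
   int   fhandle;
   unsigned char  buffer[4096];
   int   idx, filled;
};

// Refill the buffer unless the previous read was short (end of file)
void CppFILE::Fill()
{
   if( fhandle>=0 && (filled < 0 || filled == (int)sizeof(buffer)) ) {
      // fill possible
      filled = _read(fhandle, buffer, sizeof(buffer));
      //printf("Fill: read %d\n", filled);
      idx = 0;
   }//if
}

// Return the next byte from the buffer, or EOF when no more data is available
int CppFILE::getc()
{
   if(idx < filled)  return buffer[idx++];
   Fill();
   if(idx < filled)  return buffer[idx++];
   return EOF;
}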
---------------- end buffered file -------------------------

Instead of fgetc I used  infile.getc()  to read a single char. I thought that fgetc would do its buffering in a similar way. But it seems that fgetc() does much more than my simple getc(). I think it's time to look at the stdio sources.


- Heinz
August 16, 2004
Hello Walter,

The test with the small buffer was meant to show that my simple buffering is still much faster than stdio's fgetc(). For stdio I didn't change anything. As far as I know, stdio uses buffering too unless it is disabled. I think I will have a look at the stdio sources to find out what happens.

- Heinz


Walter wrote...
> Try setting the buffer size larger, not smaller, and make it a multiple of 4K. -Walter
August 16, 2004
"Heinz Saathoff" <hsaat@despammed.com> wrote in message news:MPG.1b8a8600757fa24d9896e6@news.digitalmars.com...
> Hello Walter,
>
> The test with small buffer was to show that my simple buffering still is much faster than the stdio fgetc(). For stdio I didn't change anything. As far as I know stdio uses buffering too if not disabled. I think I will have a look at the stdio sources to find out what happens.
>
> - Heinz

fgetc also must do thread synchronization.
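
If per-call locking is a large part of the cost, one way to check is to take the stream lock once and read with an unlocked variant inside it. The sketch below uses the POSIX calls flockfile(), funlockfile() and getc_unlocked(). That is an assumption about the platform: they are not part of standard C and may not exist in the Windows runtime used in this thread (Microsoft's C runtime documents a similar _getc_nolock). The function name count_byte_unlocked is hypothetical.

```cpp
#include <stdio.h>   // POSIX: flockfile, funlockfile, getc_unlocked

// Hypothetical helper: count occurrences of `needle` in the file at `path`.
// The stream lock is taken once for the whole loop instead of once per
// character. Returns -1 if the file cannot be opened.
long count_byte_unlocked(const char *path, unsigned char needle)
{
    FILE *fp = fopen(path, "rb");
    if (!fp) return -1;

    flockfile(fp);                       // lock the stream once
    long count = 0;
    int c;
    while ((c = getc_unlocked(fp)) != EOF)
        if (c == needle) ++count;
    funlockfile(fp);                     // release before closing

    fclose(fp);
    return count;
}
```

Comparing the run time of this loop with the same loop written with plain fgetc() gives a rough measure of how much the synchronization costs on a given runtime.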


August 16, 2004
Heinz Saathoff wrote:

> Hello Jan,
> 
> Jan Knepper wrote...
> 
>>Heinz Saathoff wrote:
>>
>>>I only do a sequential read using fgetc. As far as I know, stdio also uses a buffer of at least 512 bytes. Decreasing the buffer size from 1024 bytes to 512 bytes in my (_open,_read,_close)-buffered version increases the runtime from 1.8 seconds to 1.95 seconds. That's not much for the smallest practical buffer size.
>>
>>How much are you reading at once?
>>fgetc only does 1 character per call. (_)read usually does more.
>>Calling the buffered I/O system for every single character or once for a block of 128 does make a difference.

Jan's point is that a function call to fgetc() has a lot more overhead associated with it than incrementing a pointer. The test would be a little better balanced if you benchmarked fread() against _read().

In both the _read() and the memory mapped file case, you're reading into a buffer and (presumably) using a character pointer to examine each character in the buffer. This will always be faster than fgetc(), even if fgetc() is inlined.
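
A minimal sketch of the fread() variant of such a benchmark, assuming a 4096-byte block and a simple byte-counting loop as the stand-in for the real search. The function name count_byte_fread is hypothetical and not from the program under discussion.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: count occurrences of `needle` in the file at `path`
// by reading whole blocks with fread() and scanning each block with a
// pointer. Returns -1 if the file cannot be opened.
long count_byte_fread(const char *path, unsigned char needle)
{
    std::FILE *fp = std::fopen(path, "rb");
    if (!fp) return -1;

    unsigned char buf[4096];
    long count = 0;
    std::size_t n;
    // One library call per block; the per-character work is a pointer walk.
    while ((n = std::fread(buf, 1, sizeof buf, fp)) > 0) {
        const unsigned char *end = buf + n;
        for (const unsigned char *p = buf; p != end; ++p)
            if (*p == needle) ++count;
    }

    std::fclose(fp);
    return count;
}
```

Timing this against the _read() version with the same block size keeps the per-character work identical, so any remaining difference comes from the library layer rather than from the scanning loop.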
August 17, 2004
Hello Scott,

Scott Michel wrote...
> >>How much are you reading at once?
> >>fgetc only does 1 character per call. (_)read usually does more.
> >>Calling the buffered I/O system for every single character or once for a
> >>block of 128 does make a difference.
> 
> Jan's point is that a function call to fgetc() has a lot more overhead associated with it than incrementing a pointer. The test would be a little better balanced if you benchmarked fread() against _read().

It's true that fgetc() has a lot of overhead, but I wasn't sure why it's
nearly a factor of 10 slower than my primitive buffering approach. Walter
told me that fgetc has to be aware of multithreading. There will be some
error handling too. All this is overhead.
When I find some time I will have a look at the sources and see what
happens.


> In both the _read() and the memory mapped file case, you're reading into a buffer and (presumably) using a character pointer to examine each character in the buffer. This will always be faster than fgetc(), even if fgetc() is inlined.

If fgetc() were implemented the way I did it in my simple buffering file wrapper, it would be as fast as my version. As you said, fgetc() does more than just pick a char from a buffer and increment a pointer. I didn't expect this overhead in the first place, but now I know not to use fgetc() in time-critical applications.


- Heinz