memory-mapped files

Thread overview:
- Andrei Alexandrescu (Feb 18, 2009)
- grauzone (Feb 18, 2009)
- bearophile (Feb 18, 2009)
- Brad Roberts (Feb 18, 2009)
- Vladimir Panteleev (Feb 18, 2009)
- Sean Kelly (Feb 19, 2009)
- Benji Smith (Feb 19, 2009)
- Sergey Gromov (Feb 19, 2009)
- Walter Bright (Feb 19, 2009)
- Kagamin (Feb 18, 2009)
- Lionello Lunesu (Feb 18, 2009)
- BCS (Feb 18, 2009)
- Kagamin

February 18, 2009
Indeed, time and again, "testing is believing".

I tried a simple line-splitting program in D, with and without memory mapping, on a 140 MB file. The program just reads the entire file and does some simple string processing on it.

The loop pattern looks like this:

    foreach (line; byLineDirect(stdin))
    {
        auto r = splitter(line, "|||");
        write(r.head, ":");
        r.next;
        writeln(r.head);
    }

byLineDirect returns a range that uses memory-mapped files when possible, and falls back to plain fread calls otherwise.

The memory-mapped version takes 2.15 seconds on average. I was fighting against Perl's equivalent, which takes 2.45 seconds. At some point I decided to try without memory mapping, and I consistently got 1.75 seconds. What the heck is going on? When does memory mapping actually help?


Andrei
February 18, 2009
Could you post compilable versions of both approaches, so that we can test them ourselves?
I guess one would also need some input data.
February 18, 2009
Andrei Alexandrescu:

> Indeed, time and again, "testing is believing".

Yep. Some time ago I read that the only real science in "computer science" is in things like timing benchmarks and the like :-)


>      foreach (line; byLineDirect(stdin))

I don't like the name byLineDirect() much; it will become one of the most used functions in scripting-like programs, so it deserves to be short and easy.


>          write(r.head, ":");

Something tells me that such .head will become so common in D programs that my fingers will learn to write it even in my sleep :-)

>          r.next;

.next is clear, nice, and short. Its only fault is that it doesn't sound much like something that has side effects... I presume it's not possible to improve this situation.


>What the heck is going on? When does memory mapping actually help?<

You are scanning the file linearly, and the memory window you use is probably very small. In such a situation, memory mapping is probably not the best choice. Memory mapping is useful when, for example, you operate with random access over a wider sliding window on the file.

Bye,
bearophile
February 18, 2009
bearophile wrote:
 >> What the heck is going on? When does memory mapping actually help?<
> 
> You are scanning the file linearly, and the memory window you use is probably very small. In such a situation, memory mapping is probably not the best choice. Memory mapping is useful when, for example, you operate with random access over a wider sliding window on the file.

You can drop the 'sliding' part.  mmap tends to help when doing random access (or maybe sequential but non-contiguous access) over a file.  Pure streaming is handled pretty well by both patterns.  One nicety of mmap is that you can hint to the OS how you'll be using the mapping via madvise.  You can't do that with [f]read.

Later,
Brad
February 18, 2009
Brad Roberts wrote:
> bearophile wrote:
>  >> What the heck is going on? When does memory mapping actually help?<
>> You are scanning the file linearly, and the memory window you use is
>> probably very small. In such a situation, memory mapping is probably
>> not the best choice. Memory mapping is useful when, for example, you
>> operate with random access over a wider sliding window on the file.
> 
> You can drop the 'sliding' part.  mmap tends to help when doing random
> access (or maybe sequential but non-contiguous access) over a file.
> Pure streaming is handled pretty well by both patterns.  One nicety of
> mmap is that you can hint to the OS how you'll be using the mapping via
> madvise.  You can't do that with [f]read.

This would all make perfect sense if the performance were about the same in the two cases. But in fact memory mapping introduced a large *pessimization*. Why? I am supposedly copying less data and doing less work. This is very odd.


Andrei
February 18, 2009
On Wed, 18 Feb 2009 06:22:17 +0200, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Brad Roberts wrote:
>> bearophile wrote:
>>  >> What the heck is going on? When does memory mapping actually help?<
>>> You are scanning the file linearly, and the memory window you use is
>>> probably very small. In such a situation, memory mapping is probably
>>> not the best choice. Memory mapping is useful when, for example, you
>>> operate with random access over a wider sliding window on the file.
>>  You can drop the 'sliding' part.  mmap tends to help when doing random
>> access (or maybe sequential but non-contiguous access) over a file.
>> Pure streaming is handled pretty well by both patterns.  One nicety of
>> mmap is that you can hint to the OS how you'll be using the mapping via
>> madvise.  You can't do that with [f]read.
>
> This would all make perfect sense if the performance were about the same in the two cases. But in fact memory mapping introduced a large *pessimization*. Why? I am supposedly copying less data and doing less work. This is very odd.

Perhaps this may help:
http://en.wikipedia.org/wiki/Memory-mapped_file#Drawbacks

-- 
Best regards,
 Vladimir                          mailto:thecybershadow@gmail.com
February 18, 2009
> The memory-mapped version takes 2.15 seconds on average. I was fighting against Perl's equivalent, which takes 2.45 seconds. At some point I decided to try without memory mapping, and I consistently got 1.75 seconds. What the heck is going on? When does memory mapping actually help?

Random seeking in large files :)

A sequential read can't possibly gain anything from MM, because that's what the OS will end up doing anyway; but MM goes through the paging system, which has some overhead (a page fault carries quite a penalty, or so I've heard).

I use std.mmfile for a simple DB implementation, where the DB file is just a large (>1 GB) array of structs, conveniently accessible as a struct[] in D. (The primary key is the index, of course.)

L. 

February 18, 2009
Hello Lionello,

>> The memory-mapped version takes 2.15 seconds on average. I was
>> fighting against Perl's equivalent, which takes 2.45 seconds. At some
>> point I decided to try without memory mapping, and I consistently got
>> 1.75 seconds. What the heck is going on? When does memory mapping
>> actually help?
>> 
> Random seeking in large files :)
> 
> A sequential read can't possibly gain anything from MM, because that's
> what the OS will end up doing anyway; but MM goes through the paging
> system, which has some overhead (a page fault carries quite a penalty,
> or so I've heard).

Paging is going to be built to move data in the fastest possible way, so you would expect using MM to be fast. The only things I see getting in the way are (1) it uses up lots of address space, and (2) when you load the file in other ways you may be able to batch reads or hint the OS to preload.

It would be neat to see what happens if you MM a file and force page faults on the whole thing right up front (IIRC there is an asm op that forces a page fault but doesn't wait for it). Even better might be to force page faults N pages ahead of where you are processing.


February 18, 2009
Maybe the MM scheme results in more calls to the HDD?
February 18, 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> Brad Roberts wrote:
> > bearophile wrote:
> >  >> What the heck is going on? When does memory mapping actually help?<
> >> You are scanning the file linearly, and the memory window you use is probably very small. In such a situation, memory mapping is probably not the best choice. Memory mapping is useful when, for example, you operate with random access over a wider sliding window on the file.
> >
> > You can drop the 'sliding' part.  mmap tends to help when doing random access (or maybe sequential but non-contiguous access) over a file.  Pure streaming is handled pretty well by both patterns.  One nicety of mmap is that you can hint to the OS how you'll be using the mapping via madvise.  You can't do that with [f]read.
> This would all make perfect sense if the performance were about the same in the two cases. But in fact memory mapping introduced a large *pessimization*. Why? I am supposedly copying less data and doing less work. This is very odd.

If I had to guess, I'd say that the OS assumes every file will be read in a linear manner from front to back, and optimizes accordingly. There's no way of knowing how a memory-mapped file will be accessed, however, so no such optimization occurs.


Sean