Are there any reasons not to use "mmap" to read files?
rempas
February 06, 2022

This should probably have been posted in the "Learn" section, but I thought it was an advanced topic, so maybe people other than me may learn something too. So here we go!

I'm planning to change my program to use "mmap" to read the contents of a file rather than "fgetc". This is because I learned that "mmap" can do it faster. The thing is, are there any problems that can occur when using "mmap"? I need to know now, because changing this means changing the design of the program, which is not pleasant, so I want to be sure I won't have to change it back in the future (when the project will be even bigger).
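For readers following along, here is a minimal POSIX C sketch of the two approaches being compared. The helper names count_newlines_fgetc and count_newlines_mmap are hypothetical, and counting '\n' bytes only stands in for whatever the program actually does with each byte: one version reads byte by byte through stdio, the other maps the whole file read-only and walks it as an array.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes one byte at a time through stdio.
   stdio buffers internally, so fgetc does not make a syscall per byte. */
long count_newlines_fgetc(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    long n = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n') n++;
    fclose(f);
    return n;
}

/* Hypothetical helper: same count, but the file is mapped read-only and
   the kernel pages it in on demand as the loop touches it. */
long count_newlines_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    size_t len = (size_t)st.st_size;
    if (len == 0) { close(fd); return 0; } /* mmap rejects a length of 0 */
    const char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;
    long n = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\n') n++;
    munmap((void *)p, len);
    return n;
}
```

Both return the same count. Which one is faster depends on file sizes and access patterns, so it is worth timing both on real inputs before redesigning around either.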

February 06, 2022
On Sunday, 6 February 2022 at 09:40:48 UTC, rempas wrote:
> I'm planning to change my program to use "mmap" to read the contents of a file rather than "fgetc". This is because I learned that "mmap" can do it faster. The thing is, are there any problems that can occur when using "mmap"?

Performance is weird, and depends a lot on your access patterns and constraints.  Mmap is not universally fast and, I would argue, really only makes sense in a few constrained circumstances.  I would not switch to mmap just because you heard it was faster; only consider switching if you know i/o is a bottleneck for your application and know mmap is the solution.

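A recent, good read: https://db.cs.cmu.edu/papers/2022/cidr2022-p13-crotty.pdf (Crotty, Leis and Pavlo, "Are You Sure You Want to Use MMAP in Your Database Management System?", CIDR 2022).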
February 06, 2022
On Sunday, 6 February 2022 at 10:08:24 UTC, Elronnd wrote:
> Performance is weird, and depends a lot on your access patterns and constraints.  Mmap is not universally fast and, I would argue, really only makes sense in a few constrained circumstances.  I would not switch to mmap just because you heard it was faster; only consider switching if you know i/o is a bottleneck for your application and know mmap is the solution.
>
> https://db.cs.cmu.edu/papers/2022/cidr2022-p13-crotty.pdf  recent, good read.

Thank you! I'm actually writing a compiler, so it will just open and read the requested files. I don't know if the database case in the paper you linked is similar to mine (I will of course read it, though), so I guess I have to do my own research just to be sure.
February 06, 2022
On Sunday, 6 February 2022 at 10:48:01 UTC, rempas wrote:
> On Sunday, 6 February 2022 at 10:08:24 UTC, Elronnd wrote:
>> Performance is weird, and depends a lot on your access patterns and constraints.  Mmap is not universally fast and, I would argue, really only makes sense in a few constrained circumstances.  I would not switch to mmap just because you heard it was faster; only consider switching if you know i/o is a bottleneck for your application and know mmap is the solution.
>>
>> https://db.cs.cmu.edu/papers/2022/cidr2022-p13-crotty.pdf  recent, good read.
>
> Thank you! I'm actually writing a compiler, so it will just open and read the requested files. I don't know if the database case in the paper you linked is similar to mine (I will of course read it, though), so I guess I have to do my own research just to be sure.

Personally, I almost always use mmap to open large files for reading/writing. It IS faster.
The exceptions are small files, which can simply be read into memory with std.file.read, for example.
February 06, 2022
On Sunday, 6 February 2022 at 10:53:49 UTC, Temtaime wrote:
> Personally, I almost always use mmap to open large files for reading/writing. It IS faster.
> The exceptions are small files, which can simply be read into memory with std.file.read, for example.

Thank you! How big are the files we're talking about? Also, as another guy told me on another (C) forum, "mmap" is for Unix systems, so do you know if Windows or macOS can emulate that behavior with their memory allocation system calls?
February 06, 2022

On Sunday, 6 February 2022 at 09:40:48 UTC, rempas wrote:
> This should probably have been posted in the "Learn" section, but I thought it was an advanced topic, so maybe people other than me may learn something too. So here we go!
>
> I'm planning to change my program to use "mmap" to read the contents of a file rather than "fgetc". This is because I learned that "mmap" can do it faster. The thing is, are there any problems that can occur when using "mmap"? I need to know now, because changing this means changing the design of the program, which is not pleasant, so I want to be sure I won't have to change it back in the future (when the project will be even bigger).

mmap has quite a lot of overhead to set up the page tables for a file. This means that for small files, open/read/write calls (and stdio, which builds on them) are faster.

The other issue with mmap is that if you use string functions on the mapped region, you have to make sure there is a 0 (NUL) byte in the file; otherwise these functions risk overshooting into unmapped pages and crashing the application.
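A minimal C sketch of the safer alternative, assuming `data` points at a file mapping and `len` is the file size (e.g. taken from fstat): the mapping is not NUL-terminated, so the known length is passed along and a bounded function such as memchr is used instead of strlen or strchr. The helper name count_lines is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Count lines in a buffer of known length without relying on a 0 byte.
   A final line without a trailing '\n' still counts as a line. */
size_t count_lines(const char *data, size_t len) {
    size_t lines = 0;
    const char *p = data;
    const char *end = data + len;
    while (p < end) {
        /* memchr never looks past `end`, unlike strchr, which keeps going
           until it finds a 0 byte, possibly in an unmapped page. */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        lines++;
        if (!nl) break;
        p = nl + 1;
    }
    return lines;
}
```

The same idea applies to any parsing over a mapping: carry an end pointer or a length everywhere instead of relying on a terminator.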

February 06, 2022

On Sunday, 6 February 2022 at 12:52:45 UTC, Patrick Schluter wrote:
> mmap has quite a lot of overhead to set up the page tables for a file. This means that for small files, open/read/write calls (and stdio, which builds on them) are faster.
>
> The other issue with mmap is that if you use string functions on the mapped region, you have to make sure there is a 0 (NUL) byte in the file; otherwise these functions risk overshooting into unmapped pages and crashing the application.

Thank you! After all I've heard, I will probably stick with "read". The files I'm going to read will be a few kilobytes (a few megabytes at worst), so I should probably be fine.
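For the "stick with read" route, a minimal POSIX C sketch of loading a whole regular file into a heap buffer. The function name read_whole_file is hypothetical. It loops because read may return fewer bytes than requested, and it appends a 0 byte so the result can safely be handed to string functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read an entire regular file into a malloc'd, NUL-terminated buffer.
   Returns NULL on error; the byte count (excluding the terminator) is
   stored in *out_len. The caller frees the buffer. */
char *read_whole_file(const char *path, size_t *out_len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return NULL; }
    size_t cap = (size_t)st.st_size;
    char *buf = malloc(cap + 1);
    if (!buf) { close(fd); return NULL; }
    size_t len = 0;
    while (len < cap) {
        ssize_t r = read(fd, buf + len, cap - len);
        if (r < 0) {
            if (errno == EINTR) continue; /* interrupted by a signal: retry */
            free(buf);
            close(fd);
            return NULL;
        }
        if (r == 0) break; /* file shrank since fstat: stop at EOF */
        len += (size_t)r;
    }
    close(fd);
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

One allocation and a handful of read calls per file keeps the per-file cost low for kilobyte-to-megabyte inputs.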

February 06, 2022
On 2/6/22 04:21, rempas wrote:
> On Sunday, 6 February 2022 at 10:53:49 UTC, Temtaime wrote:
>> Personally, I almost always use mmap to open large files for
>> reading/writing. It IS faster.

Ditto.

> How big are the files we're talking about?

So big that they can't fit in memory. For example, I benefit from mmap on a 16G system where a file would be 30G.

As others said, it depends on the use case. If the entire file will be read anyway, especially in sequential order, then mmap may not have much benefit. In my use case, though, it is common to read just small, unpredictable amounts of bytes from unpredictable places in the huge file (say, 5G in total out of 30G).

Instead of my issuing multiple reads for those interesting parts of the file, mmap handles everything transparently: just mmap the whole thing as a single array and access parts of that memory as needed.
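A minimal POSIX C sketch of that pattern for a read-only use case: map the whole file once, then read from arbitrary offsets as if it were one big array, so that only the pages actually touched are read from disk. The names mapped_file, map_file, unmap_file and read_at are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical wrapper around a read-only mapping of a whole file. */
struct mapped_file {
    const unsigned char *data;
    size_t len;
};

/* Map the whole file; returns 0 on success, -1 on error.
   An empty file yields data == NULL, len == 0 (mmap rejects length 0). */
int map_file(const char *path, struct mapped_file *mf) {
    mf->data = NULL;
    mf->len = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    if (st.st_size > 0) {
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { close(fd); return -1; }
        mf->data = p;
        mf->len = (size_t)st.st_size;
    }
    close(fd); /* the mapping outlives the descriptor */
    return 0;
}

void unmap_file(struct mapped_file *mf) {
    if (mf->data) munmap((void *)mf->data, mf->len);
    mf->data = NULL;
    mf->len = 0;
}

/* Copy n bytes starting at off, treating the file as one big array.
   Only the pages this touches are faulted in. Returns -1 if out of bounds. */
int read_at(const struct mapped_file *mf, size_t off, void *dst, size_t n) {
    if (off > mf->len || n > mf->len - off) return -1;
    if (n == 0) return 0;
    memcpy(dst, mf->data + off, n);
    return 0;
}
```

Direct indexing such as `mf.data[i]` works just as well; the bounds-checked copy only guards against offsets past the end of the file.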

One huge improvement is to bring the madvise(2) system call into the picture to tell the system exactly which range of memory will be touched, so the OS reads it in a single shot. Otherwise, the system reads in a default amount, which I think is 4K, and that can turn out to be pathetically slow, e.g. when the file is accessed over a slow network. (Why read 4K when only 200 bytes are needed, and why read in 4K steps when it is already known that 1M is needed?)
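A minimal C sketch of that hint for Linux-like systems (glibc needs _DEFAULT_SOURCE to declare madvise). It assumes the file is already mapped at `base` and that the caller knows the byte range it is about to read and keeps it inside the mapping. madvise expects a page-aligned start address, so the range is rounded outward to whole pages before MADV_WILLNEED is passed; how much the kernel actually reads in response is up to the OS. The helper name prefetch_range is hypothetical.

```c
#define _DEFAULT_SOURCE /* glibc: expose madvise() and MADV_* constants */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hint that bytes [off, off + n) of the mapping starting at base will be
   read soon. The range must lie inside the mapping. Returns madvise()'s
   result: 0 on success, -1 with errno set on failure. */
int prefetch_range(void *base, size_t off, size_t n) {
    if (n == 0) return 0;
    size_t page = (size_t)sysconf(_SC_PAGESIZE); /* a power of two */
    uintptr_t start = (uintptr_t)base + off;
    uintptr_t end = start + n;
    /* Round the start down and the end up to whole pages. */
    uintptr_t aligned_start = start & ~(uintptr_t)(page - 1);
    uintptr_t aligned_end = (end + page - 1) & ~(uintptr_t)(page - 1);
    return madvise((void *)aligned_start, aligned_end - aligned_start,
                   MADV_WILLNEED);
}
```

MADV_SEQUENTIAL and MADV_RANDOM are the other commonly used hints for whole-mapping access patterns; which one helps is worth measuring on the real workload.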

> Also, as another guy
> told me on another (C) forum, "mmap" is for Unix systems, so do you know
> if Windows or macOS can emulate that behavior with their memory
> allocation system calls?

I haven't used mmap on Windows, but it's in Phobos, so it should work. After all, mmap relies on the OS's virtual memory system, non-ancient Windows versions do use virtual memory, and std.mmfile does include 'version (Windows)' sections; so, yes. :)

Ali

February 06, 2022
On Sunday, 6 February 2022 at 16:45:59 UTC, Ali Çehreli wrote:
> So big that they can't fit in memory. For example, I benefit from mmap on a 16G system where a file would be 30G.

Oh, this small...

> As others said, it depends on the use case. If the entire file will be read anyway, especially in sequential order, then mmap may not have much benefit. In my use case, though, it is common to read just small, unpredictable amounts of bytes from unpredictable places in the huge file (say, 5G in total out of 30G).
>
> Instead of my issuing multiple reads for those interesting parts of the file, mmap handles everything transparently: just mmap the whole thing as a single array and access parts of that memory as needed.

Thank you! I will keep that in mind in case I want to do something like that in the future. In my use case, though, I will read the whole file.

> One huge improvement is to bring the madvise(2) system call into the picture to tell the system exactly which range of memory will be touched, so the OS reads it in a single shot. Otherwise, the system reads in a default amount, which I think is 4K, and that can turn out to be pathetically slow, e.g. when the file is accessed over a slow network. (Why read 4K when only 200 bytes are needed, and why read in 4K steps when it is already known that 1M is needed?)
>
> I haven't used mmap on Windows, but it's in Phobos, so it should work. After all, mmap relies on the OS's virtual memory system, non-ancient Windows versions do use virtual memory, and std.mmfile does include 'version (Windows)' sections; so, yes. :)
>
> Ali

"mmap" is a system call that doesn't exist (natively) on Windows. I don't know what D does in Phobos (which I'm not going to use anyway), but even if it works (how?), I will end up creating my own library, so I'm in the same spot. "madvise" seems cool; I'll check it out! Thanks! After all, I like advising and telling others how to do their work, XD!
February 06, 2022
On Sunday, 6 February 2022 at 18:14:51 UTC, rempas wrote:
> On Sunday, 6 February 2022 at 16:45:59 UTC, Ali Çehreli wrote:
>> [...]
>
> Oh, this small...
>
>> [...]
>
> Thank you! I will have that in mind in case I want to do something like that in the future. In my use-case tho, I will read the whole file.
>
>> [...]
>
> "mmap" is a system call that doesn't exist (natively) on Windows. I don't know what D does in Phobos (which I'm not going to use anyway), but even if it works (how?), I will end up creating my own library, so I'm in the same spot. "madvise" seems cool; I'll check it out! Thanks! After all, I like advising and telling others how to do their work, XD!

Windows has its own API for memory-mapping files. There's no need to reinvent the wheel; Phobos's MmFile works for me without any problems.
Maybe there's no flush function, but for my use cases it's not so critical.
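Since the plan is to write a custom library rather than use Phobos, here is a minimal C sketch of what a cross-platform read-only mapping helper might look like. The POSIX branch uses mmap; the Windows branch uses the Win32 calls CreateFileA, GetFileSizeEx, CreateFileMappingA and MapViewOfFile, and unmaps with UnmapViewOfFile. The names map_readonly and unmap_readonly are hypothetical, only non-empty files are handled, and the Windows branch should be treated as a sketch to check against the Win32 documentation.

```c
#ifdef _WIN32
#include <windows.h>
#else
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif
#include <stddef.h>

/* Map a whole non-empty file read-only. Returns NULL on failure (including
   empty files, which neither API maps) and stores the size in *len. */
const void *map_readonly(const char *path, size_t *len) {
#ifdef _WIN32
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return NULL;
    LARGE_INTEGER size;
    if (!GetFileSizeEx(file, &size) || size.QuadPart == 0) {
        CloseHandle(file);
        return NULL;
    }
    /* A maximum size of 0/0 means "the current size of the file". */
    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    CloseHandle(file); /* the mapping object keeps the file open */
    if (!mapping) return NULL;
    const void *p = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    CloseHandle(mapping); /* the view keeps the mapping object alive */
    if (!p) return NULL;
    *len = (size_t)size.QuadPart;
    return p;
#else
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after close */
    if (p == MAP_FAILED) return NULL;
    *len = (size_t)st.st_size;
    return p;
#endif
}

void unmap_readonly(const void *p, size_t len) {
#ifdef _WIN32
    (void)len;
    UnmapViewOfFile(p);
#else
    munmap((void *)p, len);
#endif
}
```

Flushing (msync on POSIX, FlushViewOfFile on Windows) only matters for writable mappings, which this read-only sketch does not create.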