February 08, 2022

On Sunday, 6 February 2022 at 09:40:48 UTC, rempas wrote:

> This should have probably been posted in the "Learn" section but I thought that it is an advanced topic so maybe people other than me may learn something too. So here we go!
>
> I'm planning to make a change to my program to use "mmap" to read the contents of a file rather than "fgetc". This is because I learned that "mmap" can do it faster. The thing is, are there any problems that can occur when using "mmap"? I need to know now because changing this means changing the design of the program and this is not something pleasant to do so I want to be sure that I won't have to change back in the future (where the project will be even bigger).

One reason to use read/write based I/O by default is that it's more versatile. It's kind of like an input range versus a random access range in Phobos.

// Could not map file /dev/stdin (Invalid argument)
auto f = new MmFile("/dev/stdin");
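
The failure above (a stream such as /dev/stdin cannot be mapped) suggests keeping a read-based fallback behind any mmap path. The following is a minimal sketch in C, assuming a POSIX system; `file_view`, `view_open_fd` and `view_close` are hypothetical names invented for this illustration, not part of any library mentioned in the thread.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type: a read-only view of a file's bytes. */
typedef struct {
    unsigned char *data;
    size_t len;
    int mapped; /* 1 if data came from mmap, 0 if from malloc + read */
} file_view;

/* Fill `out` with the whole contents of `fd`. Returns 0 on success, -1 on error. */
int view_open_fd(int fd, file_view *out)
{
    assert(out != NULL);
    out->data = NULL;
    out->len = 0;
    out->mapped = 0;

    /* Only non-empty regular files are candidates for mapping. */
    struct stat st;
    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            out->data = p;
            out->len = (size_t)st.st_size;
            out->mapped = 1;
            return 0;
        }
    }

    /* Fallback for pipes, terminals, empty files, or a failed mmap. */
    size_t cap = 4096, len = 0;
    unsigned char *buf = malloc(cap);
    if (!buf) return -1;
    for (;;) {
        if (len == cap) { /* buffer full: double it */
            unsigned char *tmp = realloc(buf, cap * 2);
            if (!tmp) { free(buf); return -1; }
            buf = tmp;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n < 0) {
            if (errno == EINTR) continue; /* interrupted: retry */
            free(buf);
            return -1;
        }
        if (n == 0) break; /* end of input */
        len += (size_t)n;
    }
    out->data = buf;
    out->len = len;
    return 0;
}

/* Release whatever view_open_fd acquired. */
void view_close(file_view *v)
{
    if (v->mapped) munmap(v->data, v->len);
    else free(v->data);
    v->data = NULL;
    v->len = 0;
}
```

With this shape the rest of the program only sees a pointer and a length, so calling `view_open_fd(0, &v)` on standard input works even when input is piped in, and the choice between mmap and read stays a local detail.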

February 09, 2022

On Sunday, 6 February 2022 at 09:40:48 UTC, rempas wrote:

> This should have probably been posted in the "Learn" section but I thought that it is an advanced topic so maybe people other than me may learn something too. So here we go!
>
> I'm planning to make a change to my program to use "mmap" to read the contents of a file rather than "fgetc". This is because I learned that "mmap" can do it faster. The thing is, are there any problems that can occur when using "mmap"? I need to know now because changing this means changing the design of the program and this is not something pleasant to do so I want to be sure that I won't have to change back in the future (where the project will be even bigger).

std.file.readText() is just fine... do you really want to make an OS call through fgetc for every single byte that has to be read?

February 12, 2022
On Monday, 7 February 2022 at 18:31:42 UTC, H. S. Teoh wrote:
>
> I tried reading GLIB source code once. I will never ever do it again. :-P

C!!!! You gotta love it, lol!


>
> If it's in C? Yeah, they all look like that.
>
>
> T

Don't be so sure about that! Everything "GNU" seems to be bloated but try to read some *BSD libc source code. It's both a little bit more readable and more organized, minimal and simple to understand.

February 12, 2022

On Tuesday, 8 February 2022 at 03:33:11 UTC, Steven Schveighoffer wrote:

> Will mmap be faster than fgetc? Almost certainly.
>
> Will it be faster than other i/o systems? Possibly not.
>
> For my i/o system iopipe, every array is also an iopipe, so switching between mmap and file i/o is trivial. See my talk in 2017 where I switched to mmap while on stage to show the difference.
>
> IMO, the best way to determine which is better is to try it and measure. Having an i/o system that allows easy switching is helpful.
>
> For sure, depending on your other tasks in your program, improving the file i/o might be insignificant.
>
> -Steve

Thanks for your time Steve! I will do proper testing like you suggested and see! It will take some time but I think it's worth it rather than randomly choosing one of them :)

February 12, 2022

On Tuesday, 8 February 2022 at 21:37:29 UTC, sarn wrote:

> One reason to use read/write based I/O by default is that it's more versatile. It's kind of like an input range versus a random access range in Phobos.
>
> // Could not map file /dev/stdin (Invalid argument)
> auto f = new MmFile("/dev/stdin");

Yeah, thank you! I will open and read the whole file anyway so it seems that it makes sense to try this method and then measure my program in the future to see! Have a nice day!

February 12, 2022

On Wednesday, 9 February 2022 at 02:07:05 UTC, user1234 wrote:

> std.file.readText() is just fine... do you really want to make an OS call through fgetc for every single byte that has to be read?

Good point! I was really wondering if "fgetc" does a system call every single time that it is called or if the text is buffered just like with "printf". I will use "read" in any case just to be sure tho. I don't want to use Phobos tho so I cannot use "file.readText". Thank you for your time!

February 12, 2022

On Saturday, 12 February 2022 at 13:17:19 UTC, rempas wrote:

> On Wednesday, 9 February 2022 at 02:07:05 UTC, user1234 wrote:
>
> > std.file.readText() is just fine... do you really want to make an OS call through fgetc for every single byte that has to be read?
>
> Good point! I was really wondering if "fgetc" does a system call every single time that it is called or if the text is buffered just like with "printf". I will use "read" in any case just to be sure tho. I don't want to use Phobos tho so I cannot use "file.readText". Thank you for your time!

I think that nowadays fgetc does not make sense anymore, maybe in the past when the amount of memory available was very limited... source files are 100 KB at most. You can load 100 of them and the footprint is still small. What will likely consume the most is the AST.

Otherwise readText is easy to translate, it's just fopen then fread then fclose, + a few checks for the errors, not a big deal to translate.
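
The fopen, fread, fclose sequence described above can be sketched in C as follows. `read_text` is a hypothetical name, and unlike Phobos's `std.file.readText` this sketch does not validate that the bytes are well-formed UTF-8; it only reads them and appends a terminator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a malloc'd, NUL-terminated
   buffer. Returns NULL on error; the caller frees the result.
   If len_out is non-NULL it receives the byte count (terminator excluded). */
char *read_text(const char *path, size_t *len_out)
{
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (!buf) { fclose(f); return NULL; }

    for (;;) {
        if (len + 1 == cap) { /* full except for the terminator slot */
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) { free(buf); fclose(f); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        size_t want = cap - 1 - len;
        size_t n = fread(buf + len, 1, want, f);
        len += n;
        if (n < want) break; /* short read: end of file or error */
    }

    int failed = ferror(f); /* distinguish an error from end of file */
    fclose(f);
    if (failed) { free(buf); return NULL; }

    buf[len] = '\0';
    if (len_out) *len_out = len;
    return buf;
}
```

The loop avoids asking for the file size up front, so it also works on inputs whose size is not known in advance.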

February 12, 2022

On Saturday, 12 February 2022 at 16:48:26 UTC, user1234 wrote:

> On Saturday, 12 February 2022 at 13:17:19 UTC, rempas wrote:
>
> > On Wednesday, 9 February 2022 at 02:07:05 UTC, user1234 wrote:
> >
> > > std.file.readText() is just fine... do you really want to make an OS call through fgetc for every single byte that has to be read?
> >
> > Good point! I was really wondering if "fgetc" does a system call every single time that it is called or if the text is buffered just like with "printf". I will use "read" in any case just to be sure tho. I don't want to use Phobos tho so I cannot use "file.readText". Thank you for your time!
>
> I think that nowadays fgetc does not make sense anymore, maybe in the past when the amount of memory available was very limited... source files are 100 KB at most. You can load 100 of them and the footprint is still small. What will likely consume the most is the AST.
>
> Otherwise readText is easy to translate, it's just fopen then fread then fclose, + a few checks for the errors, not a big deal to translate.

The problem with Phobos, if it is used to write a compiler, is dynamic arrays, because of how they are managed.

With Styx I used Phobos because I knew the memory management was designed to work similarly with arrays, i.e. functions can return arrays, but using the "sink" style would not have caused any problem (by "sink" style I mean that the buffer is owned by the calling frame and passed as a parameter, as in many C-style APIs).

Then the amount of Phobos code to translate in order to bootstrap was minimal:

std.paths:

  • isAbsolute
  • isDir
  • isFile
  • dirName
  • baseName
  • exists
  • cwd
  • dirEntries
  • setExtension

std.files:

  • read (or readText)
  • write (not even used I realize now)

std.process

  • pipeProcess (actually just used to optionally --run after compile)

std.getopt

  • getopt (tho libc functions for that could have been used... dmd itself doesn't have any special functions for the arg processing in the driver IIRC)

Add to this a few things from libc and unistd and you're good. You don't need more.
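
To illustrate the "sink" style mentioned above, where the caller owns the buffer and passes it in, here is a small C sketch. `base_name_into` is a hypothetical function written for this example; it is deliberately simpler than Phobos's `baseName` (for instance, it does not strip trailing slashes) and only handles '/' separators.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sink-style helper: copies the last component of `path`
   into the caller-owned buffer `dst` of capacity `cap`.
   Returns the number of bytes written (terminator excluded),
   or (size_t)-1 if the buffer is too small. No allocation happens here. */
size_t base_name_into(const char *path, char *dst, size_t cap)
{
    assert(path != NULL && dst != NULL);
    const char *slash = strrchr(path, '/');      /* last separator, if any */
    const char *name = slash ? slash + 1 : path;
    size_t len = strlen(name);
    if (len + 1 > cap) return (size_t)-1;        /* would not fit with terminator */
    memcpy(dst, name, len + 1);                  /* copy including terminator */
    return len;
}
```

Because the callee never allocates, the caller decides where the result lives (stack, arena, or heap), which is what makes this style easy to port away from a garbage-collected array model.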

February 12, 2022

On 2/8/22 9:07 PM, user1234 wrote:

> On Sunday, 6 February 2022 at 09:40:48 UTC, rempas wrote:
>
> > This should have probably been posted in the "Learn" section but I thought that it is an advanced topic so maybe people other than me may learn something too. So here we go!
> >
> > I'm planning to make a change to my program to use "mmap" to read the contents of a file rather than "fgetc". This is because I learned that "mmap" can do it faster. The thing is, are there any problems that can occur when using "mmap"? I need to know now because changing this means changing the design of the program and this is not something pleasant to do so I want to be sure that I won't have to change back in the future (where the project will be even bigger).
>
> std.file.readText() is just fine... do you really want to make an OS call through fgetc for every single byte that has to be read?

Just a clarification here -- fgetc does NOT do an OS system call for every character. It's a C library function, which uses a FILE *. And this is not a new development -- my ANSI C book from 1988 talks about how FILE has a buffer.

While it does not do a system call (unless the buffer is empty and it needs to fill the buffer), it's still an opaque call, which might cost a decent amount if you are reading by character.

-Steve
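
As a rough illustration of the point above, the two C functions below do the same work, once with one library call per character and once with one call per block. Both function names are hypothetical, and the sketch makes no claim about how large the difference is on any given platform; that would have to be measured.

```c
#include <assert.h>
#include <stdio.h>

/* One library call per byte; stdio refills its internal buffer when needed. */
long count_newlines_fgetc(FILE *f)
{
    assert(f != NULL);
    long lines = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n') lines++;
    return lines;
}

/* One library call per 64 KiB block; the inner loop is plain memory access
   that the compiler can see through and optimize. */
long count_newlines_fread(FILE *f)
{
    assert(f != NULL);
    char buf[64 * 1024];
    long lines = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        for (size_t i = 0; i < n; i++)
            if (buf[i] == '\n') lines++;
    return lines;
}
```

Timing both functions on the same large file is a simple way to see how much the per-character call costs in a particular C library.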

February 12, 2022
On 2/12/22 05:17, rempas wrote:

> a system call every single time

I have a related experience: I realized that the very many ftell() calls that I was making were very costly. I saved a lot of time after realizing that I did not need to make the calls because I could maintain a 'long' variable to keep track of where I was in the file.

I assumed ftell() would do the same but apparently not.

Ali
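
The approach of keeping a 'long' counter instead of calling ftell() can be sketched in C as below. `tracked_file`, `tracked_getc` and `tracked_read` are hypothetical names. The sketch assumes the stream is opened in binary mode and that every read goes through these wrappers; any fseek or ungetc would need to update the counter as well.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: a FILE* plus our own running byte offset. */
typedef struct {
    FILE *f;
    long pos; /* bytes consumed so far, maintained by the wrappers below */
} tracked_file;

/* Read one character and advance the counter on success. */
int tracked_getc(tracked_file *t)
{
    assert(t != NULL && t->f != NULL);
    int c = fgetc(t->f);
    if (c != EOF) t->pos++;
    return c;
}

/* Read up to n bytes into dst and advance the counter by the amount read. */
size_t tracked_read(tracked_file *t, void *dst, size_t n)
{
    assert(t != NULL && t->f != NULL);
    size_t got = fread(dst, 1, n, t->f);
    t->pos += (long)got;
    return got;
}
```

Reading `t.pos` is then a plain field access, so the position can be queried as often as needed without a library call.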