February 12, 2022
On Sat, Feb 12, 2022 at 07:01:09PM -0800, Ali Çehreli via Digitalmars-d wrote:
> On 2/12/22 05:17, rempas wrote:
> 
> > a system call every single time
> 
> I have a related experience: I realized that very many ftell() calls that I was making were very costly. I saved a lot of time after realizing that I did not need to make the calls because I could maintain a 'long' variable to keep track of where I was in the file.
> 
> I assumed ftell() would do the same but apparently not.
[...]

I think the reason is that ftell involves an OS API call: fread() uses the underlying read() syscall, which reads from where it left off last, and there could be multiple threads reading from the same file descriptor, so the only way for fseek/ftell to work correctly is via a syscall into the kernel.  Obviously, this would be expensive, as it would involve a kernel context switch as well as acquiring and releasing a lock on the file descriptor.


T

-- 
Too many people have open minds but closed eyes.
February 12, 2022

On 2/12/22 10:13 PM, H. S. Teoh wrote:
> On Sat, Feb 12, 2022 at 07:01:09PM -0800, Ali Çehreli via Digitalmars-d wrote:
>> On 2/12/22 05:17, rempas wrote:
>>
>> > a system call every single time
>>
>> I have a related experience: I realized that very many ftell() calls that I was making were very costly. I saved a lot of time after realizing that I did not need to make the calls because I could maintain a 'long' variable to keep track of where I was in the file.
>>
>> I assumed ftell() would do the same but apparently not.
> [...]
>
> I think the reason is that ftell involves an OS API call: fread() uses the underlying read() syscall, which reads from where it left off last, and there could be multiple threads reading from the same file descriptor, so the only way for fseek/ftell to work correctly is via a syscall into the kernel. Obviously, this would be expensive, as it would involve a kernel context switch as well as acquiring and releasing a lock on the file descriptor.

In principle, ftell would not need a system call to get the current file position, but then it would have to store the file offset somewhere, which it does not. In fact, if you move the file pointer underneath it (by using another thread to read from the descriptor, or e.g. with lseek), you will completely invalidate what ftell returns (try it!)

What ftell basically does is make an lseek system call to get the current file position, then subtract the number of bytes still unread in the buffer (the buffer size minus the current buffer offset).

This is not the same for fgetc. That only depends on the buffer, and not anything from the OS (after the buffer is filled).
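A minimal sketch of the relationship described above, assuming a POSIX system (for `fileno` and `lseek`). The helper names `make_scratch_file` and `positions_after_read` are made up for illustration. It reads a few bytes with fgetc and then compares the position stdio reports with the offset of the underlying descriptor, which is normally further ahead because the library read a whole buffer at once. How far ahead depends on the C library and its buffer size, so the sketch makes no claim about the exact value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* lseek (POSIX); fileno is declared in stdio.h */

/* Hypothetical helper: create a temporary file holding `size` bytes. */
static FILE *make_scratch_file(long size)
{
    FILE *f = tmpfile();
    assert(f != NULL);
    for (long i = 0; i < size; i++)
        fputc('x', f);
    rewind(f);   /* flush what was written and go back to offset 0 */
    return f;
}

/* Hypothetical helper: read `n` bytes through stdio, then report the
 * position as stdio sees it and as the kernel sees it. */
static void positions_after_read(long n, long *stdio_pos, long *kernel_pos)
{
    FILE *f = make_scratch_file(100000);

    for (long i = 0; i < n; i++)
        (void)fgetc(f);   /* after the first fill, served from the buffer */

    *stdio_pos  = ftell(f);                                  /* logical position */
    *kernel_pos = (long)lseek(fileno(f), 0, SEEK_CUR);       /* descriptor offset */
    fclose(f);
}
```

Calling `positions_after_read(10, &s, &k)` should leave `s` at 10, while `k` is at least 10 and usually one buffer's worth further into the file. The difference `k - s` is the count of buffered bytes that have not been consumed yet.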

-Steve

February 13, 2022
On Sunday, 13 February 2022 at 03:01:09 UTC, Ali Çehreli wrote:
> On 2/12/22 05:17, rempas wrote:
>
> > a system call every single time
>
> I have a related experience: I realized that very many ftell() calls that I was making were very costly. I saved a lot of time after realizing that I did not need to make the calls because I could maintain a 'long' variable to keep track of where I was in the file.
>
> I assumed ftell() would do the same but apparently not.
>

ftell() and fseek() use a syscall, but they also cause the next stdio read call (fgets, fgetc, fread, fscanf etc.) to refill its internal buffer from scratch. If you run strace on an app that calls fseek (ftell is often implemented as a relative seek of 0), you will see something like

That's why one should avoid using seek when working with buffered stdio.
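The approach Ali describes in the quoted text, keeping the position in a 'long' variable instead of asking ftell, could be sketched as follows. The `counted_reader` struct and `counted_read` function are hypothetical names for this illustration, not part of any library. The count is only valid if every read on the stream goes through the wrapper and `pos` starts out equal to the stream's real position.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: a stream plus our own running read position. */
struct counted_reader {
    FILE *fp;
    long  pos;   /* bytes consumed so far through counted_read */
};

/* Read up to `n` bytes and advance the tracked position by the amount
 * actually read, so no ftell call is needed later. */
static size_t counted_read(struct counted_reader *r, void *buf, size_t n)
{
    assert(r != NULL && r->fp != NULL);
    size_t got = fread(buf, 1, n, r->fp);
    r->pos += (long)got;
    return got;
}
```

After any sequence of `counted_read` calls on a stream opened at offset 0, `r.pos` should match what ftell would return, without touching the kernel.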
February 13, 2022
On Sunday, 13 February 2022 at 03:13:43 UTC, H. S. Teoh wrote:
> On Sat, Feb 12, 2022 at 07:01:09PM -0800, Ali Çehreli via Digitalmars-d wrote:
>> On 2/12/22 05:17, rempas wrote:
>> 
>> > a system call every single time
>> 
>> I have a related experience: I realized that very many ftell() calls that I was making were very costly. I saved a lot of time after realizing that I did not need to make the calls because I could maintain a 'long' variable to keep track of where I was in the file.
>> 
>> I assumed ftell() would do the same but apparently not.
> [...]
>
> I think the reason is that ftell involves an OS API call: fread() uses the underlying read() syscall, which reads from where it left off last, and there could be multiple threads reading from the same file descriptor, so the only way for fseek/ftell to work correctly is via a syscall into the kernel.  Obviously, this would be expensive, as it would involve a kernel context switch as well as acquiring and releasing a lock on the file descriptor.
>
fread reads from its internal buffer when it can. By default it uses 1 page (4096 bytes on x86 and ARM). After a seek operation it will always try to fill the buffer with 4096 bytes (of course the read syscall might return less). As long as the reads are within the buffer fread() will not invoke a read syscall.



February 13, 2022

One issue that hasn't been mentioned so far is that if the input file is truncated, accessing its memory-mapped view results in SIGBUS on Linux and other systems. (I think Windows prevents truncation instead.)

In theory, it is possible to intercept that signal and turn it into something else (Java does that), but I don't think the D implementation does that.

February 13, 2022

On 2/13/22 6:02 AM, Patrick Schluter wrote:
> On Sunday, 13 February 2022 at 03:13:43 UTC, H. S. Teoh wrote:
>> On Sat, Feb 12, 2022 at 07:01:09PM -0800, Ali Çehreli via Digitalmars-d wrote:
>>> On 2/12/22 05:17, rempas wrote:
>>>
>>> > a system call every single time
>>>
>>> I have a related experience: I realized that very many ftell() calls that I was making were very costly. I saved a lot of time after realizing that I did not need to make the calls because I could maintain a 'long' variable to keep track of where I was in the file.
>>>
>>> I assumed ftell() would do the same but apparently not.
>> [...]
>>
>> I think the reason is that ftell involves an OS API call: fread() uses the underlying read() syscall, which reads from where it left off last, and there could be multiple threads reading from the same file descriptor, so the only way for fseek/ftell to work correctly is via a syscall into the kernel.  Obviously, this would be expensive, as it would involve a kernel context switch as well as acquiring and releasing a lock on the file descriptor.
>
> fread reads from its internal buffer when it can. By default it uses 1 page (4096 bytes on x86 and ARM). After a seek operation it will always try to fill the buffer with 4096 bytes (of course the read syscall might return less). As long as the reads are within the buffer fread() will not invoke a read syscall.

If you seek within the buffer, it could potentially leave the buffer alone, but it chooses to flush the buffer completely. Not sure why it does that. It doesn't do it in order to keep the buffer filled: it discards all the buffered data and then tries to read a full buffer again at that point.

This could potentially be really slow if you were skipping a few bytes at a time using fseek, as it would reload the entire buffer on every seek.
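One way to avoid that cost for short forward skips is to consume the bytes through the buffer instead of seeking. A minimal sketch, assuming the stdio implementation behaves as described above; `skip_forward` is a hypothetical helper name, and the approach only suits forward skips that are small relative to the buffer size.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: skip `n` bytes forward by reading and discarding
 * them, so the stdio buffer stays valid. Returns the number of bytes
 * actually skipped, which is less than `n` at EOF or on a read error. */
static long skip_forward(FILE *fp, long n)
{
    char scratch[256];
    long skipped = 0;

    assert(fp != NULL && n >= 0);
    while (skipped < n) {
        long remaining = n - skipped;
        size_t want = remaining < (long)sizeof scratch
                          ? (size_t)remaining
                          : sizeof scratch;
        size_t got = fread(scratch, 1, want, fp);
        if (got == 0)
            break;               /* EOF or read error */
        skipped += (long)got;
    }
    return skipped;
}
```

For large jumps a real fseek is still the better choice, since reading and discarding megabytes of data costs far more than one buffer refill.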

-Steve

February 13, 2022

On Sunday, 13 February 2022 at 12:55:43 UTC, Florian Weimer wrote:
> One issue that hasn't been mentioned so far is that if the input file is truncated, accessing its memory-mapped view results in SIGBUS on Linux and other systems. (I think Windows prevents truncation instead.)
>
> In theory, it is possible to intercept that signal and turn it into something else (Java does that), but I don't think the D implementation does that.

Thank you for the info! That's very important and I'll keep it in mind!
