November 11, 2019
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler wrote:
> On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt wrote:
>> [...]
>
> For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.

That looks like sendfile(), which, as said, is not portable. It exists on several Unixes but with different semantics. It also needs a bit of workaround code because of its limitations: on Linux, for example, a single call transfers at most 0x7ffff000 (2,147,479,552) bytes. I used it to implement a cp, and it is indeed quite fast and definitely easier to use than mmap, which is often very difficult to get right (I'm talking C here).
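For illustration, the workaround for the per-call cap can be sketched as a short loop. This is only a minimal sketch assuming Linux's sendfile(2) and an input descriptor that refers to a regular file; the helper name copy_fd and the constant SENDFILE_MAX are made up for this example and are not taken from any existing cp or cat implementation.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Largest byte count a single Linux sendfile() call will transfer. */
#define SENDFILE_MAX ((size_t)0x7ffff000)

/* Hypothetical helper: copy from in_fd's current offset to EOF into out_fd.
 * Returns the number of bytes copied, or -1 on error (errno is set). */
static off_t copy_fd(int out_fd, int in_fd)
{
    off_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        /* A NULL offset means: read from in_fd's file offset and advance it. */
        ssize_t n = sendfile(out_fd, in_fd, NULL, SENDFILE_MAX);
        if (n == 0)
            return total;        /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal, retry */
            return -1;
        }
        total += n;              /* short transfers are normal, keep going */
    }
}
```

The loop never touches the data in user space; it only repeats the call until the kernel reports end of input. Note that on Linux the output descriptor must not be opened with O_APPEND.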




November 11, 2019
On 11/10/19 2:16 AM, bioinfornatics wrote:
> On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
>> Dear,
>>
>> In my field we are I/O bound, so I would like our tools to be as fast as I can read a file.
>>
>> So I started a dummy benchmark that counts the number of lines.
>> The result is compared to the wc -l command. Line counting is only a pretext to evaluate the I/O; it could be swapped for any other I/O processing. So we use the buffer as much as possible instead of the byLine range. Moreover, such a range implies that the buffer has already been read once before it is ready to process.
>>
>>
>> https://github.com/bioinfornatics/test_io
>>
>> Ideally I would like to process a shared buffer across multiple cores and run a SIMD computation, but that is not done yet.
> 
> If you have some scripts or enhancements, they are welcome.
> 
> Currently the results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel scripts.

I will say from my experience with iopipe, the secret to counting lines is memchr.

After switching to memchr to find single bytes as an optimization, I was beating Linux getline. Both use memchr, but getline does extra processing to ensure the FILE * state is maintained.

See https://github.com/schveiguy/iopipe/blob/6fa58b67bc9cadeb5ccded0d686f0fd116aed1ed/examples/byline/byline.d

If you run that like:

iopipe_byline -nooutput < filetocheck.txt

That's about as fast as I can get without using mmap, and it should be comparable to wc -l. It should also work fine with all encodings (though only UTF-8 is optimized with memchr; I should work on that).
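To make the memchr idea concrete, here is a minimal sketch of counting newlines in a buffer that has already been read. It is not the iopipe code linked above; the function name count_lines is hypothetical, and the sketch assumes a single-byte '\n' terminator (as in ASCII or UTF-8).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count '\n' bytes in buf[0..len) by jumping from one
 * newline to the next with memchr instead of testing every byte by hand. */
static size_t count_lines(const char *buf, size_t len)
{
    assert(buf != NULL);

    const char *p = buf;
    const char *end = buf + len;
    size_t lines = 0;

    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        ++lines;
        ++p; /* resume just past the newline that was found */
    }
    return lines;
}
```

In a real tool this would be called once per chunk filled by read(), summing the results; since only single bytes are counted, a line that straddles two chunks needs no special handling.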

-Steve
November 11, 2019
On Sun, Nov 10, 2019 at 8:45 AM bioinfornatics via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
> On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler wrote:
>
> b)
> On Linux it seems that the kernel can handle parallel reads through
> asynchronous reads, described
> here: https://oxnz.github.io/2016/10/13/linux-aio/.

Do not use that. If you want AIO on Linux you should use io_uring.

https://www.phoronix.com/scan.php?page=news_item&px=Linux-io_uring-Fast-Efficient

I have been using it for some time and it is really fast. The only issue is that you need a recent kernel (io_uring was added in Linux 5.1).
November 11, 2019
On 2019-11-11 02:04, sarn wrote:

> FTR, that sounds like Linux's sendfile and splice syscalls. They're not portable, though.

"sendfile" is intended to send a file over a socket?

-- 
/Jacob Carlborg
November 11, 2019
On Monday, 11 November 2019 at 19:36:22 UTC, Jacob Carlborg wrote:
> On 2019-11-11 02:04, sarn wrote:
>
>> FTR, that sounds like Linux's sendfile and splice syscalls. They're not portable, though.
>
> "sendfile" is intended to send a file over a socket?

You could use it to send a file over a socket. However, it should be usable to forward data between any two file descriptors. I believe that `cat` uses it to forward a file handle to stdout, for example. Or you could use it to implement `cp`, copying content from one file to another.
November 12, 2019
On Monday, 11 November 2019 at 10:14:51 UTC, Patrick Schluter wrote:
> On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler wrote:
>> On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt wrote:
>>> [...]
>>
>> For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.
>
> Looks like sendfile(), which as said is not portable. It exists on different Unixes but with different semantics. Requires also

There are more non-portable options for fast disk io - O_DIRECT flag for open()[1] and readahead()[2].

1. http://man7.org/linux/man-pages/man2/open.2.html
2. http://man7.org/linux/man-pages/man2/readahead.2.html

November 12, 2019
On Monday, 11 November 2019 at 19:36:22 UTC, Jacob Carlborg wrote:
> On 2019-11-11 02:04, sarn wrote:
>
>> FTR, that sounds like Linux's sendfile and splice syscalls. They're not portable, though.
>
> "sendfile" is intended to send a file over a socket?

It works with any file handle. I used it to implement cp, and I have also used it with pipes. Its only limitation is the 0x7ffff000-byte cap per call, but a three-line loop takes care of that easily.
November 12, 2019
On Sun, Nov 10, 2019 at 8:45 AM bioinfornatics via Digitalmars-d
<digitalmars-d@puremagic.com> wrote:
> On Sunday, 10 November 2019 at 07:43:31 UTC, bioinfornatics wrote:
>
> On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler
> wrote:
>
> b)
> On Linux it seems that the kernel can handle parallel reads through
> asynchronous reads, described
> here: https://oxnz.github.io/2016/10/13/linux-aio/.

Do not use that. If you want AIO on Linux you should use io_uring.

https://www.phoronix.com/scan.php?page=news_item&px=Linux-io_uring-Fast-Efficient

I have been using it for some time and it is really fast. The only issue
is that you need a recent kernel.