Thread overview: D and i/o
  Nov 09, 2019  bioinfornatics
  Nov 10, 2019  bioinfornatics
  Nov 10, 2019  Jonathan Marler
  Nov 10, 2019  bioinfornatics
  Nov 10, 2019  bioinfornatics
  Nov 11, 2019  Daniel Kozak
  Nov 12, 2019  Daniel Kozak
  Nov 10, 2019  Jonathan Marler
  Nov 10, 2019  Jon Degenhardt
  Nov 10, 2019  Jonathan Marler
  Nov 10, 2019  Jon Degenhardt
  Nov 11, 2019  sarn
  Nov 11, 2019  Jacob Carlborg
  Nov 11, 2019  Jonathan Marler
  Nov 12, 2019  Patrick Schluter
  Nov 11, 2019  Patrick Schluter
  Nov 12, 2019  ikod
November 09, 2019
Dear all,

In my field we are I/O bound, so I would like our tools to be as fast as the file can be read.

I therefore started a dummy benchmark which counts the number of lines in a file.
The result is compared to the wc -l command. Line counting is only a pretext to evaluate the I/O; it could be replaced by any other I/O processing. For that reason the scripts work on the buffer as much as possible instead of using the byLine range, as such a range implies that the buffer has already been read once before it is ready to process.


https://github.com/bioinfornatics/test_io

Ideally I would like to process a shared buffer across multiple cores and run a SIMD computation, but that is not done yet.
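
For reference, a rough, untested sketch of the kind of thing I have in mind: read the file into one buffer, then let std.parallelism count newlines in slices of that shared buffer on all cores (a plain count() stands in for the eventual SIMD kernel; the file comes from the first command-line argument):

import std.file : read;
import std.parallelism : parallel;
import std.algorithm : count;
import std.range : chunks;
import std.stdio : writeln;
import core.atomic : atomicOp;

void main(string[] args)
{
    // One read into a single buffer shared by all workers.
    auto data = cast(const(ubyte)[]) read(args[1]);

    shared size_t lines = 0;
    // Each 4 MiB slice is scanned on its own core; only the sum is synchronized.
    foreach (slice; parallel(data.chunks(4 << 20)))
        atomicOp!"+="(lines, slice.count('\n'));

    writeln(lines);
}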
November 10, 2019
On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
> Dear all,
>
> In my field we are I/O bound, so I would like our tools to be as fast as the file can be read.
>
> I therefore started a dummy benchmark which counts the number of lines in a file.
> The result is compared to the wc -l command. Line counting is only a pretext to evaluate the I/O; it could be replaced by any other I/O processing. For that reason the scripts work on the buffer as much as possible instead of using the byLine range, as such a range implies that the buffer has already been read once before it is ready to process.
>
>
> https://github.com/bioinfornatics/test_io
>
> Ideally I would like to process a shared buffer across multiple cores and run a SIMD computation, but that is not done yet.

If you have any scripts or enhancements, contributions are welcome.

Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.
November 10, 2019
On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
> On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
>> [...]
>
> If you have any scripts or enhancements, contributions are welcome.
>
> Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.

I haven't really looked at your code but in general I find mmap to be much faster than reading a file when searching for things.
November 10, 2019
On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
> On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
>> Dear all,
>>
>> In my field we are I/O bound, so I would like our tools to be as fast as the file can be read.
>>
>> I therefore started a dummy benchmark which counts the number of lines in a file.
>> The result is compared to the wc -l command. Line counting is only a pretext to evaluate the I/O; it could be replaced by any other I/O processing. For that reason the scripts work on the buffer as much as possible instead of using the byLine range, as such a range implies that the buffer has already been read once before it is ready to process.
>>
>>
>> https://github.com/bioinfornatics/test_io
>>
>> Ideally I would like to process a shared buffer across multiple cores and run a SIMD computation, but that is not done yet.
>
> If you have any scripts or enhancements, contributions are welcome.
>
> Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.

Here's an example implementation of wc using mmap:

#!/usr/bin/env rdmd
import std.stdio, std.algorithm, std.mmfile;

void main(string[] args)
{
    foreach (arg; args[1..$])
    {
        auto file = new MmFile(arg, MmFile.Mode.read, 0, null);
        auto content = cast(char[])file.opSlice;
        writefln("%s", content.count('\n'));
    }
}

November 10, 2019
On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler wrote:
> On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
>> On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
>>> [...]
>>
>> If you have any scripts or enhancements, contributions are welcome.
>>
>> Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.
>
> I haven't really looked at your code but in general I find mmap to be much faster than reading a file when searching for things.

a)
Thanks Jonathan, I plan to add a script using mmap. It is definitely on my todo list.

b)
On Linux it seems the kernel can handle parallel reads through asynchronous I/O, as described here: https://oxnz.github.io/2016/10/13/linux-aio/.

c)
I also plan to perform the same file processing as described here in C#: https://oxnz.github.io/2016/10/13/linux-aio/
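
Regarding (b), here is a minimal, untested sketch of what an asynchronous read could look like from D. I am assuming the POSIX AIO bindings in core.sys.posix.aio (on Linux this interface is serviced by glibc threads; the kernel io_submit API from the article would need separate bindings), I busy-wait instead of calling aio_suspend to keep it short, and linking against librt (e.g. -L-lrt) may be needed:

import core.sys.posix.aio;                    // aiocb, aio_read, aio_error, aio_return
import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.unistd : close;
import core.sys.posix.sys.types : off_t;
import core.stdc.errno : EINPROGRESS;
import std.stdio : writeln;
import std.string : toStringz;

void main(string[] args)
{
    int fd = open(args[1].toStringz, O_RDONLY);
    assert(fd >= 0, "open failed");

    enum chunk = 64 * 1024;
    ubyte[chunk][2] buffers;
    aiocb[2] cbs;

    // Queue both reads up front; they are serviced while we do other work.
    foreach (i, ref cb; cbs)
    {
        cb.aio_fildes = fd;
        cb.aio_buf    = buffers[i].ptr;
        cb.aio_nbytes = chunk;
        cb.aio_offset = cast(off_t)(i * chunk);
        aio_read(&cb);
    }

    // Collect the results and count newlines in whatever was read.
    size_t lines;
    foreach (i, ref cb; cbs)
    {
        while (aio_error(&cb) == EINPROGRESS) {}   // naive busy wait
        auto got = aio_return(&cb);                // bytes actually read, or -1
        if (got > 0)
            foreach (b; buffers[i][0 .. got])
                if (b == '\n') ++lines;
    }
    writeln(lines, " newlines in the first ", 2 * chunk, " bytes");
    close(fd);
}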
November 10, 2019
On Sunday, 10 November 2019 at 07:43:31 UTC, bioinfornatics wrote:
> On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler wrote:
>> On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
>>> [...]
>>
>> I haven't really looked at your code but in general I find mmap to be much faster than reading a file when searching for things.
>
> a)
> Thanks Jonathan, I plan to add a script using mmap. It is definitely on my todo list.
>
> b)
> On Linux it seems the kernel can handle parallel reads through asynchronous I/O, as described here: https://oxnz.github.io/2016/10/13/linux-aio/.
>
> c)
> I also plan to perform the same file processing as described here in C#: https://oxnz.github.io/2016/10/13/linux-aio/

Oops, here is the C# benchmark:
https://cc.davelozinski.com/c-sharp/fastest-way-to-read-text-files
November 10, 2019
On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
> Dear all,
>
> In my field we are I/O bound, so I would like our tools to be as fast as the file can be read.
>
> I therefore started a dummy benchmark which counts the number of lines in a file.
> The result is compared to the wc -l command. Line counting is only a pretext to evaluate the I/O; it could be replaced by any other I/O processing. For that reason the scripts work on the buffer as much as possible instead of using the byLine range, as such a range implies that the buffer has already been read once before it is ready to process.
>
>
> https://github.com/bioinfornatics/test_io
>
> Ideally I would like to process a shared buffer across multiple cores and run a SIMD computation, but that is not done yet.

You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native phobos facilities to those in iopipe and some phobos covers in tsv-utils. Most tests are by-line based, as I'm interested in record oriented operations, but chunk-based copying is included.

A general observation is that if lines are involved, it's important to measure performance of both short and long lines. This may even affect 'wc' when reading by chunk or memory mapped files, see H. S. Teoh's observations on 'wc' performance: https://forum.dlang.org/post/mailman.664.1571878411.8294.digitalmars-d@puremagic.com.
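
To make that concrete, here is a rough, untested Phobos-only harness of the kind of by-line versus by-chunk comparison I mean; the 1 MiB chunk size and taking the file as the first argument are arbitrary choices:

import std.stdio : File, writefln;
import std.algorithm : count;
import std.datetime.stopwatch : StopWatch, AutoStart;

void main(string[] args)
{
    auto path = args[1];

    // Line-oriented read: per-line overhead dominates when lines are short.
    auto sw = StopWatch(AutoStart.yes);
    size_t lineCount;
    foreach (line; File(path).byLine)
        ++lineCount;
    writefln("byLine : %s lines in %s", lineCount, sw.peek);

    // Chunk-oriented read: cost tracks file size, not line length.
    // (Counts differ by one if the file has no trailing newline.)
    sw.reset();
    size_t newlines;
    foreach (chunk; File(path).byChunk(1024 * 1024))
        newlines += chunk.count('\n');
    writefln("byChunk: %s newlines in %s", newlines, sw.peek);
}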

As an aside - My preliminary conclusion is that phobos facilities are overall quite good (based on tsv-utils comparative performance benchmarks), but are non-optimal when short lines are involved. This is the case for both input and output. Both the tsv-utils covers and iopipe are better, with iopipe being the best for input, but appears to need some further work on the output side (or I don't know iopipe well enough). By "preliminary", I mean just that. There could certainly be mistakes or incomplete analysis in the tests.

--Jon
November 10, 2019
On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt wrote:
> On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
>> [...]
>
> You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native phobos facilities to those in iopipe and some phobos covers in tsv-utils. Most tests are by-line based, as I'm interested in record oriented operations, but chunk-based copying is included.
>
> A general observation is that if lines are involved, it's important to measure performance of both short and long lines. This may even affect 'wc' when reading by chunk or memory mapped files, see H. S. Teoh's observations on 'wc' performance: https://forum.dlang.org/post/mailman.664.1571878411.8294.digitalmars-d@puremagic.com.
>
> As an aside - My preliminary conclusion is that phobos facilities are overall quite good (based on tsv-utils comparative performance benchmarks), but are non-optimal when short lines are involved. This is the case for both input and output. Both the tsv-utils covers and iopipe are better, with iopipe being the best for input, but appears to need some further work on the output side (or I don't know iopipe well enough). By "preliminary", I mean just that. There could certainly be mistakes or incomplete analysis in the tests.
>
> --Jon

For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.
November 10, 2019
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler wrote:
> On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt wrote:
>> On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
>>> [...]
>>
>> You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native phobos facilities to those in iopipe and some phobos covers in tsv-utils. Most tests are by-line based, as I'm interested in record oriented operations, but chunk-based copying is included.
>>
>> [...]
>
> For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.

Thanks, I wasn't aware of this. But perhaps I should describe the motivation in more detail. I'm not actually interested in 'cat' per se, it is just a stand-in for the more general processing I'm typically interested in. In every case I'm operating on the records in some form (lines or something else), making a transformation, and depending on application, writing something out. This is the case in tsv-utils as well as many scenarios of the systems I work on (search engines). These applications sometimes operate on data streams, sometimes on complete files. Hence my interest in line-oriented I/O performance.

Obviously there is a lot more ground in the general set of applications I'm interested in than is covered in the simple performance tests in dcat-perf, but it's a starting point. It's also why I didn't make comparisons to existing versions of 'cat'.
November 11, 2019
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler wrote:
> For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.

FTR, that sounds like Linux's sendfile and splice syscalls.  They're not portable, though.
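
For completeness, a minimal, untested sketch of that idea in D. The sendfile prototype is declared by hand (Linux-specific, 64-bit off_t assumed), and depending on what stdout is (tty, pipe, regular file) the call can fail, so a real cat would fall back to ordinary read/write:

import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.unistd : close, STDOUT_FILENO;
import std.string : toStringz;

// Hand-written binding for Linux's sendfile(2); not portable.
extern (C) ptrdiff_t sendfile(int out_fd, int in_fd, long* offset, size_t count);

void main(string[] args)
{
    foreach (arg; args[1 .. $])
    {
        int fd = open(arg.toStringz, O_RDONLY);
        if (fd < 0) continue;
        // Ask the kernel to forward up to 1 MiB at a time until EOF,
        // without the data ever entering user space.
        while (sendfile(STDOUT_FILENO, fd, null, 1 << 20) > 0) {}
        close(fd);
    }
}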