Thread overview: D and i/o
  Nov 09, 2019  bioinfornatics
  Nov 10, 2019  bioinfornatics
  Nov 10, 2019  Jonathan Marler
  Nov 10, 2019  bioinfornatics
  Nov 10, 2019  bioinfornatics
  Nov 11, 2019  Daniel Kozak
  Nov 12, 2019  Daniel Kozak
  Nov 10, 2019  Jonathan Marler
  Nov 10, 2019  Jon Degenhardt
  Nov 10, 2019  Jonathan Marler
  Nov 10, 2019  Jon Degenhardt
  Nov 11, 2019  sarn
  Nov 11, 2019  Jacob Carlborg
  Nov 11, 2019  Jonathan Marler
  Nov 12, 2019  Patrick Schluter
  Nov 11, 2019  Patrick Schluter
  Nov 12, 2019  ikod
November 09, 2019
Dear all,

In my field we are I/O bound, so I would like our tools to be as fast as the file can be read.

I therefore started a dummy benchmark which counts the number of lines in a file.
The result is compared to the wc -l command. Line counting is only a pretext to evaluate the I/O; it could be replaced by any other I/O processing. For that reason the scripts work on the buffer as much as possible instead of using the byLine range, as such a range implies that the buffer has already been read once before it is ready to process.


https://github.com/bioinfornatics/test_io

Ideally I would like to process a shared buffer across multiple cores and run a SIMD computation, but that is not done yet.
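
For reference, a rough, untested sketch of the kind of thing I have in mind: read the file into one buffer, then let std.parallelism count newlines in slices of that shared buffer on all cores (a plain count() stands in for the eventual SIMD kernel; the file comes from the first command-line argument):

import std.file : read;
import std.parallelism : parallel;
import std.algorithm : count;
import std.range : chunks;
import std.stdio : writeln;
import core.atomic : atomicOp;

void main(string[] args)
{
    // One read into a single buffer shared by all workers.
    auto data = cast(const(ubyte)[]) read(args[1]);

    shared size_t lines = 0;
    // Each 4 MiB slice is scanned on its own core; only the sum is synchronized.
    foreach (slice; parallel(data.chunks(4 << 20)))
        atomicOp!"+="(lines, slice.count('\n'));

    writeln(lines);
}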
November 10, 2019
On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
> Dear all,
>
> In my field we are I/O bound, so I would like our tools to be as fast as the file can be read.
>
> I therefore started a dummy benchmark which counts the number of lines in a file.
> The result is compared to the wc -l command. Line counting is only a pretext to evaluate the I/O; it could be replaced by any other I/O processing. For that reason the scripts work on the buffer as much as possible instead of using the byLine range, as such a range implies that the buffer has already been read once before it is ready to process.
>
>
> https://github.com/bioinfornatics/test_io
>
> Ideally I would like to process a shared buffer across multiple cores and run a SIMD computation, but that is not done yet.

If you have any scripts or enhancements, contributions are welcome.

Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.
November 10, 2019
On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
> On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
>> [...]
>
> If you have any scripts or enhancements, contributions are welcome.
>
> Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.

I haven't really looked at your code but in general I find mmap to be much faster than reading a file when searching for things.
November 10, 2019
On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
> On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
>> Dear all,
>>
>> In my field we are I/O bound, so I would like our tools to be as fast as the file can be read.
>>
>> I therefore started a dummy benchmark which counts the number of lines in a file.
>> The result is compared to the wc -l command. Line counting is only a pretext to evaluate the I/O; it could be replaced by any other I/O processing. For that reason the scripts work on the buffer as much as possible instead of using the byLine range, as such a range implies that the buffer has already been read once before it is ready to process.
>>
>>
>> https://github.com/bioinfornatics/test_io
>>
>> Ideally I would like to process a shared buffer across multiple cores and run a SIMD computation, but that is not done yet.
>
> If you have any scripts or enhancements, contributions are welcome.
>
> Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.

Here's an example implementation of wc using mmap:

#!/usr/bin/env rdmd
import std.stdio, std.algorithm, std.mmfile;

void main(string[] args)
{
    foreach (arg; args[1..$])
    {
        auto file = new MmFile(arg, MmFile.Mode.read, 0, null);
        auto content = cast(char[])file.opSlice;
        writefln("%s", content.count('\n'));
    }
}

November 10, 2019
On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler wrote:
> On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
>> On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
>>> [...]
>>
>> If you have any scripts or enhancements, contributions are welcome.
>>
>> Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.
>
> I haven't really looked at your code but in general I find mmap to be much faster than reading a file when searching for things.

a)
Thanks Jonathan, I plan to add a script using mmap. It is definitely on my todo list.

b)
On Linux it seems the kernel can handle parallel reads through asynchronous I/O, as described here: https://oxnz.github.io/2016/10/13/linux-aio/.

c)
I also plan to perform the same file processing as described here in C#: https://oxnz.github.io/2016/10/13/linux-aio/
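
Regarding (b), here is a minimal, untested sketch of what an asynchronous read could look like from D. I am assuming the POSIX AIO bindings in core.sys.posix.aio (on Linux this interface is serviced by glibc threads; the kernel io_submit API from the article would need separate bindings), I busy-wait instead of calling aio_suspend to keep it short, and linking against librt (e.g. -L-lrt) may be needed:

import core.sys.posix.aio;                    // aiocb, aio_read, aio_error, aio_return
import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.unistd : close;
import core.sys.posix.sys.types : off_t;
import core.stdc.errno : EINPROGRESS;
import std.stdio : writeln;
import std.string : toStringz;

void main(string[] args)
{
    int fd = open(args[1].toStringz, O_RDONLY);
    assert(fd >= 0, "open failed");

    enum chunk = 64 * 1024;
    ubyte[chunk][2] buffers;
    aiocb[2] cbs;

    // Queue both reads up front; they are serviced while we do other work.
    foreach (i, ref cb; cbs)
    {
        cb.aio_fildes = fd;
        cb.aio_buf    = buffers[i].ptr;
        cb.aio_nbytes = chunk;
        cb.aio_offset = cast(off_t)(i * chunk);
        aio_read(&cb);
    }

    // Collect the results and count newlines in whatever was read.
    size_t lines;
    foreach (i, ref cb; cbs)
    {
        while (aio_error(&cb) == EINPROGRESS) {}   // naive busy wait
        auto got = aio_return(&cb);                // bytes actually read, or -1
        if (got > 0)
            foreach (b; buffers[i][0 .. got])
                if (b == '\n') ++lines;
    }
    writeln(lines, " newlines in the first ", 2 * chunk, " bytes");
    close(fd);
}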
November 10, 2019
On Sunday, 10 November 2019 at 07:43:31 UTC, bioinfornatics wrote:
> On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler wrote:
>> On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
>>> [...]
>>
>> I haven't really looked at your code but in general I find mmap to be much faster than reading a file when searching for things.
>
> a)
> Thanks Jonathan, I plan to add a script using mmap. It is definitely on my todo list.
>
> b)
> On Linux it seems the kernel can handle parallel reads through asynchronous I/O, as described here: https://oxnz.github.io/2016/10/13/linux-aio/.
>
> c)
> I also plan to perform the same file processing as described here in C#: https://oxnz.github.io/2016/10/13/linux-aio/

Oops, here is the C# benchmark:
https://cc.davelozinski.com/c-sharp/fastest-way-to-read-text-files
November 10, 2019
On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
> Dear all,
>
> In my field we are I/O bound, so I would like our tools to be as fast as the file can be read.
>
> I therefore started a dummy benchmark which counts the number of lines in a file.
> The result is compared to the wc -l command. Line counting is only a pretext to evaluate the I/O; it could be replaced by any other I/O processing. For that reason the scripts work on the buffer as much as possible instead of using the byLine range, as such a range implies that the buffer has already been read once before it is ready to process.
>
>
> https://github.com/bioinfornatics/test_io
>
> Ideally I would like to process a shared buffer across multiple cores and run a SIMD computation, but that is not done yet.

You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native phobos facilities to those in iopipe and some phobos covers in tsv-utils. Most tests are by-line based, as I'm interested in record oriented operations, but chunk-based copying is included.

A general observation is that if lines are involved, it's important to measure performance of both short and long lines. This may even affect 'wc' when reading by chunk or memory mapped files, see H. S. Teoh's observations on 'wc' performance: https://forum.dlang.org/post/mailman.664.1571878411.8294.digitalmars-d@puremagic.com.
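
To make that concrete, here is a rough, untested Phobos-only harness of the kind of by-line versus by-chunk comparison I mean; the 1 MiB chunk size and taking the file as the first argument are arbitrary choices:

import std.stdio : File, writefln;
import std.algorithm : count;
import std.datetime.stopwatch : StopWatch, AutoStart;

void main(string[] args)
{
    auto path = args[1];

    // Line-oriented read: per-line overhead dominates when lines are short.
    auto sw = StopWatch(AutoStart.yes);
    size_t lineCount;
    foreach (line; File(path).byLine)
        ++lineCount;
    writefln("byLine : %s lines in %s", lineCount, sw.peek);

    // Chunk-oriented read: cost tracks file size, not line length.
    // (Counts differ by one if the file has no trailing newline.)
    sw.reset();
    size_t newlines;
    foreach (chunk; File(path).byChunk(1024 * 1024))
        newlines += chunk.count('\n');
    writefln("byChunk: %s newlines in %s", newlines, sw.peek);
}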

As an aside - My preliminary conclusion is that phobos facilities are overall quite good (based on tsv-utils comparative performance benchmarks), but are non-optimal when short lines are involved. This is the case for both input and output. Both the tsv-utils covers and iopipe are better, with iopipe being the best for input, but appears to need some further work on the output side (or I don't know iopipe well enough). By "preliminary", I mean just that. There could certainly be mistakes or incomplete analysis in the tests.

--Jon
November 10, 2019
On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt wrote:
> On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
>> [...]
>
> You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native phobos facilities to those in iopipe and some phobos covers in tsv-utils. Most tests are by-line based, as I'm interested in record oriented operations, but chunk-based copying is included.
>
> A general observation is that if lines are involved, it's important to measure performance of both short and long lines. This may even affect 'wc' when reading by chunk or memory mapped files, see H. S. Teoh's observations on 'wc' performance: https://forum.dlang.org/post/mailman.664.1571878411.8294.digitalmars-d@puremagic.com.
>
> As an aside - My preliminary conclusion is that phobos facilities are overall quite good (based on tsv-utils comparative performance benchmarks), but are non-optimal when short lines are involved. This is the case for both input and output. Both the tsv-utils covers and iopipe are better, with iopipe being the best for input, but appears to need some further work on the output side (or I don't know iopipe well enough). By "preliminary", I mean just that. There could certainly be mistakes or incomplete analysis in the tests.
>
> --Jon

For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.
November 10, 2019
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler wrote:
> On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt wrote:
>> On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
>>> [...]
>>
>> You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native phobos facilities to those in iopipe and some phobos covers in tsv-utils. Most tests are by-line based, as I'm interested in record oriented operations, but chunk-based copying is included.
>>
>> [...]
>
> For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.

Thanks, I wasn't aware of this. But perhaps I should describe the motivation in more detail. I'm not actually interested in 'cat' per se, it is just a stand-in for the more general processing I'm typically interested in. In every case I'm operating on the records in some form (lines or something else), making a transformation, and depending on application, writing something out. This is the case in tsv-utils as well as many scenarios of the systems I work on (search engines). These applications sometimes operate on data streams, sometimes on complete files. Hence my interest in line-oriented I/O performance.

Obviously there is a lot more ground in the general set of applications I'm interested in than is covered in the simple performance tests in dcat-perf, but it's a starting point. It's also why I didn't make comparisons to existing versions of 'cat'.
November 11, 2019
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler wrote:
> For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.

FTR, that sounds like Linux's sendfile and splice syscalls.  They're not portable, though.
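
For completeness, a minimal, untested sketch of that idea in D. The sendfile prototype is declared by hand (Linux-specific, 64-bit off_t assumed), and depending on what stdout is (tty, pipe, regular file) the call can fail, so a real cat would fall back to ordinary read/write:

import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.unistd : close, STDOUT_FILENO;
import std.string : toStringz;

// Hand-written binding for Linux's sendfile(2); not portable.
extern (C) ptrdiff_t sendfile(int out_fd, int in_fd, long* offset, size_t count);

void main(string[] args)
{
    foreach (arg; args[1 .. $])
    {
        int fd = open(arg.toStringz, O_RDONLY);
        if (fd < 0) continue;
        // Ask the kernel to forward up to 1 MiB at a time until EOF,
        // without the data ever entering user space.
        while (sendfile(STDOUT_FILENO, fd, null, 1 << 20) > 0) {}
        close(fd);
    }
}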