Another new io library
February 17, 2016
It's no secret that I've been looking to create an updated io library for phobos. In fact, I've been working on one on and off since 2011 (ouch).

After about 5 iterations of API and design, and testing out ideas, I think I have come up with something pretty interesting. It started out as a plan to replace std.stdio (and that did not go over well: https://forum.dlang.org/post/j3u0l4$1atr$1@digitalmars.com), in addition to trying to find a better way to deal with i/o. However, I've scaled back my plan of world domination to just try for the latter, and save tackling the replacement of Phobos's i/o guts for a later battle, if at all. It's much easier to reason about something new than to muddle the discussion with how it will break code. It's also much easier to build something that doesn't have to be a drop-in replacement of something so insanely complex.

I have also been inspired over the last few years by various great presentations and libraries, two being Dmitry's proof-of-concept library with buffers that automatically move/fill when more data is needed, and Andrei's std.allocator library. They have drastically changed the way I approach this challenge.

Therefore, I now have a new dub-based repository available for playing with: https://github.com/schveiguy/iopipe. First, the candy:

- This is a piping library. It allows one to hook buffered i/o through various processors/transformers much like unix pipes or range functions/algorithms. However, unlike unix pipes, this library attempts to make as few copies as possible of the data.

example:

foreach(line; (new IODevice(0)).bufferedInput
    .asText!(UTFType.UTF8)
    .byLine
    .asInputRange)
{
    // handle line
}

- It can handle 5 forms of UTF encoding - UTF8, UTF16, UTF16LE, UTF32, UTF32LE (phobos only partially handles UTF8). Sorry, no grapheme support or other utf-related things, but this of course can be added later.

- Arrays are first-class ioPipe types. This works:

foreach(line; "one\ntwo\nthree\nfour\n".byLine.asInputRange)

- Almost everything is resolved at compile time, with heavy use of introspection. The intent is to give the compiler the full gamut of optimization opportunities.

- I added rudimentary compression/decompression support using etc.c.zlib. Using compression is done like so:

foreach(line; (new IODevice(0)).bufferedInput
    .unzip
    .asText!(UTFType.UTF8)
    .byLine
    .asInputRange)

- The plan is for this to be a basis to make super-fast and modular parsing libraries. I plan to write a JSON one as a proof of concept. So all you have to do is add a parseJSON function to the end of any chain, as long as the input is some pipe of text data (including a string literal).
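
For readers without the repository at hand, the decompress-then-split-into-lines idea from the compression item above can be approximated with Phobos alone. This is only a sketch under stated assumptions: it uses std.zlib and std.string.lineSplitter rather than iopipe's unzip/byLine, it works on whole arrays instead of a streaming buffer, and linesFromCompressed is a hypothetical helper name.

```d
import std.array : array;
import std.string : lineSplitter;
import std.zlib : compress, uncompress;

/// Hypothetical helper (not part of iopipe): inflate a zlib-compressed
/// array in one go and return its lines without terminators.
string[] linesFromCompressed(const(ubyte)[] packed)
{
    // uncompress returns void[]; view it as text and take an immutable copy.
    auto text = (cast(const(char)[]) uncompress(packed)).idup;
    // Each returned line is a slice of `text`, not a fresh allocation.
    return text.lineSplitter.array;
}

unittest
{
    auto packed = compress("one\ntwo\nthree\n");
    assert(linesFromCompressed(packed) == ["one", "two", "three"]);
}
```

Unlike the chain in the post, this inflates the entire input before any line is seen, which is exactly the kind of extra buffering a piping design tries to avoid.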


=================

I will stress some very very important things:

1. This library is FAR from finished. Even the concepts probably need some tweaking. But I'm very happy with the current API/usage.

2. Docs are very thin. Unit tests are sparse (but do pass).

3. The focus of this library is NOT replacement of std.stream, or even low-level i/o in general. In fact, I have copied over my stream class from previous attempts at this i/o rewrite ONLY as a mechanism to have something that can read/write from file descriptors with the right API (located in iopipe/stream.d). I admit to never having looked at std.stream really, so I have no idea how it would compare.

4. As the stream framework is only there for playing with the other useful parts of the library, I only wrote it for my OS (OSX). It will likely work on other Unixen, but you won't be able to play out of the box on Windows. Windows support can probably be added without much effort, or you can use another stream library such as this one that was recently announced: https://forum.dlang.org/post/xtxiuxcmewxnhseubyik@forum.dlang.org.

5. This is NOT thread-aware out of the box.

6. There is a concept in here I called "valves". It's very weird, but it allows unifying input and output into one seamless chain. In fact, I can't think of how I could have done output in this regime without them. See the convert example application for details on how it is used.

7. I expect to be changing the buffer API, as I think perhaps I have the wrong abstraction for buffers. However, I did attempt to have a std.allocator version of the buffer.

8. It's not on code.dlang.org yet. I'll work on this.

Destroy!

-Steve
February 17, 2016
On 17/02/16 7:45 PM, Steven Schveighoffer wrote:
> [...]

A few things: https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126 Why isn't that used more, especially with e.g. window?
After all, window seems like a very well used word...

I don't like that a stream isn't inherently an input range.
This seems to me like a good place to use this abstraction by default.
February 17, 2016
On 2/17/16 1:58 AM, Rikki Cattermole wrote:

> A few things:
> https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
> why isn't that used more especially with e.g. window?
> After all, window seems like a very well used word...

Not sure what you mean.

> I don't like that a stream isn't inherently an input range.
> This seems to me like a good place to use this abstraction by default.

What is front for an input stream? A byte? A character? A word? A line?

It's not there by default because it would assume too much, IMO. You can create an input range out of a stream quite easily.

e.g. https://github.com/schveiguy/iopipe/blob/master/source/iopipe/bufpipe.d#L664
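
A minimal sketch of the ambiguity, using only Phobos functions rather than anything from iopipe: the same in-memory data can reasonably be presented as a range whose front is a byte, a character, a word, or a line, and each choice is a different adapter. The sample string is made up for illustration.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : splitter;
import std.range.primitives : front;
import std.string : lineSplitter, representation;

unittest
{
    // U+00E9 takes two bytes in UTF-8, so "byte" and "character" differ.
    string data = "\u00E9a b\nc\n";

    // front is a byte: the first UTF-8 code unit.
    assert(data.representation.front == 0xC3);

    // front is a character: Phobos decodes strings to dchar by default.
    assert(data.front == '\u00E9');

    // front is a word: split on runs of whitespace.
    assert(equal(data.splitter, ["\u00E9a", "b", "c"]));

    // front is a line: split on line terminators.
    assert(equal(data.lineSplitter, ["\u00E9a b", "c"]));
}
```

None of these is obviously the default, which is the argument for making the caller pick the adapter explicitly.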

What would be the benefit of having it an input range by default?

-Steve
February 17, 2016
On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:
> On 2/17/16 1:58 AM, Rikki Cattermole wrote:
> What would be the benefit of having it an input range by default?
>
> -Steve

https://en.wikipedia.org/wiki/Principle_of_least_astonishment
something the D community is lacking a bit in general imho.

but awesome library, will definitely use, thanks!
February 17, 2016
On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:
> On 2/17/16 1:58 AM, Rikki Cattermole wrote:
>
>> A few things:
>> https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
>> why isn't that used more especially with e.g. window?
>> After all, window seems like a very well used word...
>
> Not sure what you mean.
>
>> I don't like that a stream isn't inherently an input range.
>> This seems to me like a good place to use this abstraction by default.
>
> What is front for an input stream? A byte? A character? A word? A line?

Why not just say it's a ubyte and then compose with ranges from there?
February 17, 2016
On Wednesday, 17 February 2016 at 10:54:56 UTC, John Colvin wrote:
> Why not just say it's a ubyte and then compose with ranges from there?

You could put a range interface on it... but I think it would be of very limited value. For one, what about fseek? How does that interact with the range interface?


Or, what about reading a network interface where you get variable-sized packets?

A ubyte[] is probably the closest thing you can get to usefulness, but even then you'd need non-range buffering controls to make it efficient and usable. Consider the following:

Packet 1: 11\nHello
Packet 2:  World05\nD ro
Packet 3: x


You take the ubyte[] thing that gives you one packet at a time as it comes off the hardware interface. Good, you can process as it comes and it fits the range interface.

But it isn't terribly useful. Are you going to copy the partial message into another buffer so the next range.popFront doesn't overwrite it? Or will you present the incomplete message from packet 1 to the consumer? The former is less than efficient (and still needs to wrap the range in some other interface to make the user code pretty) and the latter leads to ugly user code being directly exposed.

Copying it into a buffer is probably the most sane... but it is a wasteful copy if your existing buffer has enough space. But how do you say that to a range? popFront takes no arguments.

What about packet 2, which has part of the first message and part of the second message? Can you tell it that you already consumed the first six bytes and it can now append the next packet to the existing buffer, but please return that slice on the next call?
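
The buffering controls asked for above can be sketched roughly as follows. This is an illustration under stated assumptions, not iopipe's actual API: PacketBuffer, extend, release, indexOfNewline and drainMessages are hypothetical names, and the framing is assumed to be a decimal length, a newline, then that many payload bytes, as in the three packets shown.

```d
import std.conv : to;

/// Hypothetical buffer (not iopipe's API): packets are appended at the
/// back and consumed from the front, so a partial message stays put.
struct PacketBuffer
{
    private ubyte[] data;

    /// Bytes received but not yet consumed.
    const(ubyte)[] window() const { return data; }

    /// Append a newly received packet after any unconsumed bytes.
    void extend(const(ubyte)[] packet) { data ~= packet; }

    /// Mark the first n bytes as consumed.
    void release(size_t n) { data = data[n .. $]; }
}

/// Index of the first '\n' in bytes, or -1 if there is none.
ptrdiff_t indexOfNewline(const(ubyte)[] bytes)
{
    foreach (i, b; bytes)
        if (b == '\n')
            return cast(ptrdiff_t) i;
    return -1;
}

/// Pull out every complete "<length>\n<payload>" message currently buffered.
string[] drainMessages(ref PacketBuffer buf)
{
    string[] messages;
    while (true)
    {
        auto w = buf.window;
        auto nl = indexOfNewline(w);
        if (nl < 0)
            break; // length header not complete yet
        auto len = (cast(const(char)[]) w[0 .. nl]).to!size_t;
        auto end = nl + 1 + len;
        if (w.length < end)
            break; // payload not complete yet, wait for the next packet
        messages ~= (cast(const(char)[]) w[nl + 1 .. end]).idup;
        buf.release(end); // the consumer says how much it used
    }
    return messages;
}

unittest
{
    PacketBuffer buf;
    string[] got;
    foreach (packet; ["11\nHello", " World05\nD ro", "x"])
    {
        buf.extend(cast(const(ubyte)[]) packet);
        got ~= drainMessages(buf);
    }
    assert(got == ["Hello World", "D rox"]);
}
```

The point of the sketch is that release takes an argument and extend is separate from reading the window, which is precisely what a bare popFront cannot express.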



Ranges are great for a sequence of data that is the same type on each call. Files, however, tend to have variable length (which you might want to skip large sections of) and different types of data as you iterate through them.

I find std.stdio's byChunk and byLine to be almost completely useless in my cases.
February 17, 2016
First, I'm very happy to see that. Sounds like a good project. Some remarks:
 - You seem to be using classes. These are good to compose at runtime, but we can do better at compile time using value types. I suggest using value types and having a class wrapper that can be used to make things composable at runtime if desirable.
 - Being able to read/write from an io device in a generator-like manner is, I think, important if we are rolling out something new. Literally the only thing that can explain the success of Node.js is this (everything else is crap). See async/await in C# (https://msdn.microsoft.com/fr-fr/library/hh191443.aspx) or Hack (https://docs.hhvm.com/hack/async/introduction).
 - I like the input range stuff. Input ranges need more love.
 - Please explain valves more.
 - ...
 - Profit ?
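
The first point in the list above (value types composed at compile time, with a class wrapper for runtime composition) already has a counterpart in Phobos ranges, which may serve as a rough model. The sketch below uses std.range.interfaces as a stand-in and says nothing about how iopipe itself is or should be structured; evenSquares and pick are hypothetical function names.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter, map;
import std.range.interfaces : InputRange, inputRangeObject;

/// Compile-time composition: the result is a nested struct type, so the
/// whole chain is visible to the optimizer and needs no virtual calls.
auto evenSquares(int[] data)
{
    return data.filter!(x => x % 2 == 0).map!(x => x * x);
}

/// Runtime composition: both branches have different static types, so
/// each is wrapped in a class implementing the InputRange!int interface.
InputRange!int pick(bool wantEvens, int[] data)
{
    if (wantEvens)
        return inputRangeObject(data.filter!(x => x % 2 == 0));
    return inputRangeObject(data.map!(x => x * x));
}

unittest
{
    assert(equal(evenSquares([1, 2, 3, 4]), [4, 16]));
    assert(equal(pick(true, [1, 2, 3, 4]), [2, 4]));
    assert(equal(pick(false, [1, 2, 3]), [1, 4, 9]));
}
```

The wrapper costs an allocation and virtual dispatch per call, so it is opt-in, while the struct-based chain stays the default.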
February 17, 2016
On Wednesday, 17 February 2016 at 22:47:27 UTC, deadalnix wrote:
> See async/await in C# (https://msdn.microsoft.com/fr-fr/library/hh191443.aspx)

Or for those poor souls who can't read French... ;)

https://msdn.microsoft.com/en-us/library/hh191443.aspx

- Jonathan M Davis
February 18, 2016
On Wednesday, 17 February 2016 at 23:15:51 UTC, Jonathan M Davis wrote:
> On Wednesday, 17 February 2016 at 22:47:27 UTC, deadalnix wrote:
>> See async/await in C# (https://msdn.microsoft.com/fr-fr/library/hh191443.aspx)
>
> Or for those poor souls who can't read French... ;)
>
> https://msdn.microsoft.com/en-us/library/hh191443.aspx
>
> - Jonathan M Davis

Thank you for the fixup :)
February 18, 2016
On 2/17/16 3:54 AM, yawniek wrote:
> On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:
>> On 2/17/16 1:58 AM, Rikki Cattermole wrote:
>> What would be the benefit of having it an input range by default?
>>
> https://en.wikipedia.org/wiki/Principle_of_least_astonishment
> something the D community is lacking a bit in general imho.

There are exceptions (e.g. byLine), but the likelihood that a default range interface is the one the user would expect is pretty low.

> but awesome library, will definitely use, thanks!

Thanks! Please let me know what you think if you end up using it.

-Steve