An IO Streams Library - D Programming Language Discussion Forum

Thread overview

An IO Streams Library
Feb 07, 2016 Jason White
Feb 07, 2016 cym13
Feb 07, 2016 Jason White
Feb 07, 2016 Rikki Cattermole
Feb 07, 2016 Jason White
Feb 07, 2016 Rikki Cattermole
Feb 07, 2016 Jason White
Feb 07, 2016 Rikki Cattermole
Feb 07, 2016 Johannes Pfau
Feb 08, 2016 Jason White
Feb 08, 2016 Jakob Ovrum
Feb 08, 2016 Jason White
Feb 08, 2016 Dejan Lekic
Feb 08, 2016 Jakob Ovrum
Feb 08, 2016 Ola Fosheim Grøstad
Feb 08, 2016 Atila Neves
Feb 08, 2016 Chris Wright
Feb 08, 2016 Atila Neves
Feb 08, 2016 Chris Wright
Feb 08, 2016 Kagamin
Feb 08, 2016 Jason White
Feb 09, 2016 Kagamin
Feb 09, 2016 Chris Wright
Feb 08, 2016 Wyatt
Jul 24, 2016 Martin Nowak
Jul 24, 2016 Martin Nowak
Jul 26, 2016 Johannes Pfau
Jul 25, 2016 ikod
Jul 26, 2016 Johannes Pfau
Jul 27, 2016 Sönke Ludwig
Jul 27, 2016 ikod

February 07, 2016

An IO Streams Library

Posted by Jason White

I see the subject of IO streams brought up here occasionally. The general consensus seems to be that we need something better than what Phobos provides.

I wrote a library "io" that can work as a replacement for std.stdio, std.mmfile, std.cstream, and parts of std.stream:

    GitHub:  https://github.com/jasonwhite/io
    Package: https://code.dlang.org/packages/io

This library provides an input and output range interface for streams (which is more efficient if the stream is buffered). Thus, many of the wonderful range operations from std.range and std.algorithm can be used with this.
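To make the range idea concrete, here is a minimal sketch of how a buffered adapter could expose a stream as an input range of bytes. `MemorySource`, `ByteRange`, and `byBytes` are hypothetical names made up for this illustration and are not the io library's actual API; the only assumption is that a stream offers `size_t read(ubyte[])` and returns 0 at end of stream.

```d
import std.algorithm.comparison : min;
import std.range.primitives : isInputRange;

/// Hypothetical in-memory stream; stands in for any type with read().
struct MemorySource
{
    const(ubyte)[] data;

    /// Copies up to buf.length bytes into buf; returns 0 at end of stream.
    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

/// Input range of bytes; each refill costs exactly one read() call.
/// Assumes bufferSize > 0.
struct ByteRange(Source)
{
    private Source source;
    private ubyte[] buffer;
    private size_t pos, len;

    this(Source source, size_t bufferSize)
    {
        this.source = source;
        buffer = new ubyte[bufferSize];
        fill();
    }

    private void fill()
    {
        pos = 0;
        len = source.read(buffer);
    }

    // pos == len only happens right after a refill that returned 0 bytes.
    @property bool empty() const { return pos == len; }
    @property ubyte front() const { return buffer[pos]; }
    void popFront() { if (++pos == len) fill(); }
}

/// Convenience function so the adapter composes in call chains.
auto byBytes(Source)(Source source, size_t bufferSize = 4096)
{
    return ByteRange!Source(source, bufferSize);
}

static assert(isInputRange!(ByteRange!MemorySource));
```

With an adapter like this, std.algorithm pipelines such as `filter` or `map` can consume the stream directly. A larger buffer means fewer `read` calls per byte consumed, which is where the efficiency gain for buffered streams comes from.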

I'm interested in feedback on this library. What is it missing? How can it be better?

I'm also interested in a discussion of what IO-related functionality people are missing in Phobos.

Please destroy!

February 07, 2016

Re: An IO Streams Library

Posted by cym13
in reply to Jason White

On Sunday, 7 February 2016 at 00:48:54 UTC, Jason White wrote:
> I see the subject of IO streams brought up here occasionally. The general consensus seems to be that we need something better than what Phobos provides.
> [...]

From what I can see without testing it, very nice work, thanks!

This is more of a small surprise than anything serious, but why did you choose to go with "println" instead of "writeln" and such? I find it more confusing than anything, given Phobos's choice.

February 07, 2016

Re: An IO Streams Library

Posted by Jason White
in reply to cym13

On Sunday, 7 February 2016 at 01:01:21 UTC, cym13 wrote:
> From what I can see without testing it, very nice work, thanks!
>
> This is more of a small surprise than anything serious, but why did you choose to go with "println" instead of "writeln" and such? I find it more confusing than anything, given Phobos's choice.

Thanks!

There are a couple reasons for using print/println/etc. over write/writeln/etc.:
 1. A module-level definition of write(Stream s, ...) would clash with the stream's definition of write(...).
 2. Do we mean text serialization or byte-for-byte output when we say write()? With print(), it's clear that we want the arguments to be converted to a text representation and have that written to the stream. With write(), it's clear we're writing out the binary representation to the stream.
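
A minimal sketch of the distinction, under stated assumptions: `MemorySink` and this `print` are hypothetical stand-ins written for illustration, not the library's actual definitions. The sink's `write` member does byte-for-byte output, while the free function `print` converts each argument to text first. Because the free function has a different name, it cannot clash with the member.

```d
import std.conv : to;

/// Hypothetical byte sink used only for this illustration.
struct MemorySink
{
    ubyte[] data;

    /// Byte-for-byte output: appends the raw bytes of buf.
    size_t write(const(void)[] buf)
    {
        data ~= cast(const(ubyte)[]) buf;
        return buf.length;
    }
}

/// Text serialization: converts each argument to a string, then writes it.
size_t print(Sink, Args...)(ref Sink sink, Args args)
{
    size_t written;
    foreach (arg; args)
        written += sink.write(to!string(arg));
    return written;
}
```

Printing the `int` 42 through `print` produces the two characters "42", while passing the same variable's memory to `write` produces `int.sizeof` raw bytes.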

February 07, 2016

Re: An IO Streams Library

Posted by Rikki Cattermole
in reply to Jason White

On 07/02/16 1:48 PM, Jason White wrote:
> I see the subject of IO streams brought up here occasionally. The
> general consensus seems to be that we need something better than what
> Phobos provides.
>
> I wrote a library "io" that can work as a replacement for std.stdio,
> std.mmfile, std.cstream, and parts of std.stream:
>
>      GitHub:  https://github.com/jasonwhite/io
>      Package: https://code.dlang.org/packages/io
>
> This library provides an input and output range interface for streams
> (which is more efficient if the stream is buffered). Thus, many of the
> wonderful range operations from std.range and std.algorithm can be used
> with this.
>
> I'm interested in feedback on this library. What is it missing? How can
> it be better?
>
> I'm also interested in a discussion of what IO-related functionality
> people are missing in Phobos.
>
> Please destroy!

I posted a link to your repo a couple days ago in IRC.
Honestly? I like it. It looks reasonably well made.

There is a bit of work regarding interfaces + ranges.
I.e. Sink really should be inheriting from OutputRange!ubyte
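
As a rough sketch of what that could look like: `OutputRange` is the interface from std.range.interfaces, while the `Sink` interface and `ArraySink` class below are hypothetical and only assume that a sink has a `write(const(void)[])` method.

```d
import std.range.interfaces : OutputRange;
import std.range.primitives : isOutputRange;

/// Hypothetical Sink interface that is also an OutputRange!ubyte.
interface Sink : OutputRange!ubyte
{
    size_t write(const(void)[] buf);
}

/// In-memory implementation, for demonstration only.
class ArraySink : Sink
{
    ubyte[] data;

    size_t write(const(void)[] buf)
    {
        data ~= cast(const(ubyte)[]) buf;
        return buf.length;
    }

    /// OutputRange!ubyte primitive, expressed in terms of write().
    void put(ubyte b)
    {
        write((&b)[0 .. 1]);
    }
}

static assert(isOutputRange!(Sink, ubyte));
```

Anything written against `OutputRange!ubyte` (or the `isOutputRange` constraint) should then accept such a sink as a target.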

It's nowhere near Phobos quality, and that is OK for now.
I do think given time it could be a reasonably good base to rework std.socket, std.stdio, std.stream, std.cstream and std.mmfile into a completely new set of modules.

Most of the code it would end up replacing is, I think, almost 10 years old. Either way, it's from D1, and I think we can do better.

February 07, 2016

Re: An IO Streams Library

Posted by Jason White
in reply to Rikki Cattermole

On Sunday, 7 February 2016 at 01:20:26 UTC, Rikki Cattermole wrote:
> I posted a link to your repo a couple days ago in IRC.
> Honestly? I like it. It looks reasonably well made.

Thanks. I saw a link to it in a recent thread in Learn. I figured I'd finally make a proper post on it.

> There is a bit of work regarding interfaces + ranges.
> I.e. Sink really should be inheriting from OutputRange!ubyte

I haven't had much use for the interfaces, which is why they aren't fleshed out. Do you have any particular use cases for this in mind?

> It's nowhere near Phobos quality, and that is OK for now.

I agree. The documentation needs work and I imagine there are lots of use cases that aren't well supported. Increased visibility and usage definitely helps with finding the warts.

> I do think given time it could be a reasonably good base to rework std.socket, std.stdio, std.stream, std.cstream and std.mmfile into a completely new set of modules.
>
> Most of the code it would end up replacing is, I think, almost 10 years old. Either way, it's from D1, and I think we can do better.

February 07, 2016

Re: An IO Streams Library

Posted by Rikki Cattermole
in reply to Jason White

On 07/02/16 2:55 PM, Jason White wrote:
> On Sunday, 7 February 2016 at 01:20:26 UTC, Rikki Cattermole wrote:
>> I posted a link to your repo a couple days ago in IRC.
>> Honestly? I like it. It looks reasonably well made.
>
> Thanks. I saw a link to it in a recent thread in Learn. I figured I'd
> finally make a proper post on it.
>
>> There is a bit of work regarding interfaces + ranges.
>> I.e. Sink really should be inheriting from OutputRange!ubyte
>
> I haven't had much use for the interfaces, which is why they aren't
> fleshed out. Do you have any particular use cases for this in mind?

I have no use case other than range compatibility.

>> It's nowhere near Phobos quality, and that is OK for now.
>
> I agree. The documentation needs work and I imagine there are lots of
> use cases that aren't well supported. Increased visibility and usage
> definitely helps with finding the warts.

Actually I think there are plenty of use cases not implemented.
Doing it properly, as a full replacement and rework of Phobos, will mean you need to do almost everything in e.g. std.stdio and std.socket, but with better abstractions.

Of course, your goal may not be in line with my assertions about reworking Phobos, so feel free to ignore them. It would just be a shame, since it really needs some love.

February 07, 2016

Re: An IO Streams Library

Posted by Jason White
in reply to Rikki Cattermole

On Sunday, 7 February 2016 at 01:59:43 UTC, Rikki Cattermole wrote:
> Actually I think there are plenty of use cases not implemented.
> Doing it properly, as a full replacement and rework of Phobos, will mean you need to do almost everything in e.g. std.stdio and std.socket, but with better abstractions.

I think I'll tackle implementing sockets next. I might need that for another project of mine.

Once this gets polished enough, it would be great to eventually replace those modules in Phobos. However, it would be difficult to do this without compatibility breakages. For example, since std.stdio.File uses FILE* under the covers and this uses plain old file descriptors, programs that rely on that behavior would break.

> Of course, your goal may not be in line with my assertions about reworking Phobos, so feel free to ignore them. It would just be a shame, since it really needs some love.

My primary goal is to provide a more useful and powerful IO library than what Phobos provides since that is what I need for my other projects. That goal is not necessarily counter to reworking Phobos. ;)

February 07, 2016

Re: An IO Streams Library

Posted by Rikki Cattermole
in reply to Jason White

On 07/02/16 3:43 PM, Jason White wrote:
> On Sunday, 7 February 2016 at 01:59:43 UTC, Rikki Cattermole wrote:
>> Actually I think there are plenty of use cases not implemented.
>> Doing it properly, as a full replacement and rework of Phobos, will mean
>> you need to do almost everything in e.g. std.stdio and std.socket, but with
>> better abstractions.
>
> I think I'll tackle implementing sockets next. I might need that for
> another project of mine.

I wouldn't actually implement it based upon std.socket.
Use libasync instead.
https://github.com/etcimon/libasync

There has been talk about getting that into Phobos but it still needs time to mature.

One other important thing to note about sockets.
For anything performance related you need to have a central way to implement an event loop.

The one I've implemented is designed to work as a replacement for libasync's and to be used in Phobos.
https://github.com/rikkimax/alphaPhobos/blob/master/source/std/experimental/platform.d

If you do intend to make it compatible, you can ignore all of the windowing and related methods. Just keep things like optimizedEventLoop, eventLoopIteration, setAsDefault and thePlatform, defaultPlatform all in there.

We can combine later on, I just want it to be compatible when it comes time.

> Once this gets polished enough, it would be great to eventually replace
> those modules in Phobos. However, it would be difficult to do this
> without compatibility breakages. For example, since std.stdio.File uses
> FILE* under the covers and this uses plain old file descriptors,
> programs that rely on that behavior would break.

There is nothing wrong with breakage. It's old code. It's time to update.
But there must be a clear upgrade path.

>> Of course, your goal may not be in line with my assertions about reworking
>> Phobos, so feel free to ignore them. It would just be a shame, since it
>> really needs some love.
>
> My primary goal is to provide a more useful and powerful IO library than
> what Phobos provides since that is what I need for my other projects.
> That goal is not necessarily counter to reworking Phobos. ;)

Okay sweet. Also if you can, it would be great to see you on IRC.

February 07, 2016

Re: An IO Streams Library

Posted by Johannes Pfau
in reply to Jason White

On Sun, 07 Feb 2016 00:48:54 +0000,
Jason White <54f9byee3t32@gmail.com> wrote:

> I see the subject of IO streams brought up here occasionally. The general consensus seems to be that we need something better than what Phobos provides.
> 
> I wrote a library "io" that can work as a replacement for std.stdio, std.mmfile, std.cstream, and parts of std.stream:
> 
>      GitHub:  https://github.com/jasonwhite/io
>      Package: https://code.dlang.org/packages/io
> 
> This library provides an input and output range interface for streams (which is more efficient if the stream is buffered). Thus, many of the wonderful range operations from std.range and std.algorithm can be used with this.
> 
> I'm interested in feedback on this library. What is it missing? How can it be better?
> 
> I'm also interested in a discussion of what IO-related functionality people are missing in Phobos.
> 
> Please destroy!

I saw this on code.dlang.org some time ago and had a quick look. First of all this would have to go into phobos to make sure it's used as some kind of a standard. Conflicting stream libraries would only cause more trouble.

Then if you want to go for phobos inclusion I'd recommend looking at
other stream implementations and learning from their mistakes ;-)
There's
https://github.com/schveiguy/phobos/tree/babe9fe338f03cafc0fb50fc0d37ea96505da3e3/std/io
which was supposed to be a stream replacement for phobos. Then there
are also vibe.d streams*.


Your Stream interfaces look like standard stream implementations (which
is a good thing) which also work for unbuffered streams. I think it's a
good idea to support partial reads and writes. For an explanation why
partial reads, see the vibe.d rant below. Partial writes are useful
as a write syscall can be interrupted by posix signals to stop the
write. I'm not sure if the API should expose this feature (e.g. by
returning a partial write on EINTR) but it can sometimes be useful.
Still, readExactly / writeAll helper functions are useful. I would try
to implement these as UFCS functions instead of as a struct wrapper.
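
A minimal sketch of such a helper, assuming only that a stream has a `size_t read(ubyte[])` method that may return fewer bytes than requested and returns 0 at end of stream. `ChunkedSource` is a hypothetical test stream, and this `readExactly` is an illustration rather than the library's implementation.

```d
import std.algorithm.comparison : min;

/// Hypothetical stream that hands out at most `chunk` bytes per read(),
/// imitating the partial reads of a pipe, socket, or serial port.
struct ChunkedSource
{
    const(ubyte)[] data;
    size_t chunk;

    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, chunk, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

/// Generic free function: loops over partial reads until buf is full.
/// Written once, it works for every stream type that has read().
void readExactly(Source)(ref Source source, ubyte[] buf)
{
    while (buf.length > 0)
    {
        immutable n = source.read(buf);
        if (n == 0)
            throw new Exception("stream ended before the buffer was filled");
        buf = buf[n .. $];
    }
}
```

Because it is a free function template, it can be called as `source.readExactly(buf)` via UFCS without wrapping the stream in another struct. A `writeAll` counterpart would loop over partial writes in the same way.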

For some streams you'll need a TimeoutException. An interesting
question is whether users should be able to recover from
TimeoutExceptions. This essentially means if a read/write function
internally calls read/write posix calls more than once and only the
last one timed out, we already processed some data and it's not
possible to recover from a TimeoutException if the amount of already
processed data is unknown.
The simplest solution is using only one syscall internally. Then
TimeoutException => no data was processed. But this doesn't work for
read/writeExactly (another reason why read/writeExactly shouldn't be
the default. vibe.d...)

Regarding buffers / sliding windows I'd have a look at https://github.com/schveiguy/phobos/blob/babe9fe338f03cafc0fb50fc0d37ea96505da3e3/std/io/buffer.d

Another design question is whether there should be an interface for such buffered streams or whether it's OK to have only unbuffered streams + one buffer struct / class. Basically the question is whether there might be streams that can offer a buffer interface but can't use the standard implementation.




* vibe.d stream rant ahead:

vibe.d streams get some things right and some things very wrong. For
example their leastSize/empty/read combo means you might actually
have to implement reading data in any of these functions. Users have to
handle timeouts or other errors for any of these as well.

Then the API requires a buffered stream, it simply won't work for
unbuffered IO (leastSize, empty). And the fact that read reads exactly
n bytes makes stream implementations more complicated (re-reading until
enough data has been read should be done by a generic function, not
reimplemented in every stream). It even makes some user code more
complicated: I've implemented a serial port library for vibe-d.
If I don't know how many bytes will arrive with the next packet, the
read posix function usually returns the expected/available amount of
data. But now vibe.d requires me to specify a fixed length when calling
the stream read method. This leads to ugly code using peek...

Then vibe.d also mixes the sliding window / buffer concept into the stream class, but does so in a bad way. A sliding window should expose the internal buffer so that it's possible to consume bytes from the buffer, skip bytes, refill... In vibe.d you can peek at the buffer. But you can't discard data. You'll have to call read instead, which copies from the internal buffer to an external buffer, even if you only want to skip data. Even worse, your external buffer size is limited. So you have to implement some loop logic if you want to skip more data than fits your buffer. And all you need is a discard(size_t n) function which does _buffer = _buffer[n .. $] in the stream class...
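
A minimal sketch of that idea, with the caveat that `Window` is a hypothetical type written for illustration and refilling from an underlying stream is left out. It only shows that skipping data needs a slice operation and no copy.

```d
import std.algorithm.comparison : min;

/// Hypothetical sliding window over data that has already been buffered.
struct Window
{
    const(ubyte)[] _buffer;

    /// Exposes the buffered bytes without copying them.
    const(ubyte)[] peek() const { return _buffer; }

    /// Drops up to n bytes from the front; returns how many were dropped.
    size_t discard(size_t n)
    {
        n = min(n, _buffer.length);
        _buffer = _buffer[n .. $];
        return n;
    }
}
```

Returning the number of bytes actually dropped lets a caller that wants to skip more than is currently buffered refill and call `discard` again with the remainder.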

TLDR: API design is very important.

February 08, 2016

Re: An IO Streams Library

Posted by Jason White
in reply to Johannes Pfau

On Sunday, 7 February 2016 at 10:50:24 UTC, Johannes Pfau wrote:
> I saw this on code.dlang.org some time ago and had a quick look. First of all this would have to go into phobos to make sure it's used as some kind of a standard. Conflicting stream libraries would only cause more trouble.
>
> Then if you want to go for phobos inclusion I'd recommend looking at
> other stream implementations and learning from their mistakes ;-)
> There's
> https://github.com/schveiguy/phobos/tree/babe9fe338f03cafc0fb50fc0d37ea96505da3e3/std/io
> which was supposed to be a stream replacement for phobos. Then there
> are also vibe.d streams*.

I saw Steven's stream implementation quite some time ago and I had a look at vibe's stream implementation just now. I think it is a mistake to use classes over structs for this sort of thing. I briefly tried implementing it with classes, but ran into problems. The non-deterministic destruction of classes is probably the biggest issue. One has to be careful about calling f.close() in order to avoid accumulating too many open file descriptors in programs that open a lot of files. Reference counting takes care of this problem nicely and has less overhead. This is one area where classes relying on the GC is not ideal. Rust's ownership system solves this problem quite well. Python also solves this with "with" statements.
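
To illustrate the deterministic-destruction point, here is a small sketch. `ScopedHandle` and the `openHandles` counter are hypothetical and only simulate opening and closing a descriptor; a real stream would call the operating system at the marked places.

```d
/// Number of currently open (fake) handles; stands in for the OS descriptor table.
int openHandles;

/// Hypothetical file-like struct that closes itself when it goes out of scope.
struct ScopedHandle
{
    private bool isOpen;

    this(string path)
    {
        // A real implementation would open the file here.
        isOpen = true;
        ++openHandles;
    }

    @disable this(this); // no accidental copies, so no double close

    ~this()
    {
        if (isOpen)
        {
            // A real implementation would close the descriptor here.
            isOpen = false;
            --openHandles;
        }
    }
}

/// Returns how many handles are open while `handle` is still in scope.
int handlesOpenInsideScope()
{
    auto handle = ScopedHandle("example.txt");
    return openHandles;
} // handle's destructor runs here, with no GC involvement
```

The handle is released as soon as the scope ends, with no explicit `close()` call. Reference counting (for example via std.typecons.RefCounted) builds on the same destructor mechanism when a handle needs to be shared between several owners.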

> Your Stream interfaces look like standard stream implementations (which
> is a good thing) which also work for unbuffered streams. I think it's a
> good idea to support partial reads and writes. For an explanation why
> partial reads, see the vibe.d rant below. Partial writes are useful
> as a write syscall can be interrupted by posix signals to stop the
> write. I'm not sure if the API should expose this feature (e.g. by
> returning a partial write on EINTR) but it can sometimes be useful.

I don't want to assume what the user wants to do in the event of an EINTR unless a certain behavior is desired 100% of the time. I don't think that is the case here. Thus, that is probably something the user should handle manually, if needed.
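
For users who do want the retry behavior, a manual wrapper could look like the sketch below. It is POSIX-only, uses the druntime bindings for write(2) and errno, and `writeRetryingOnEintr` is a hypothetical name rather than part of the library.

```d
import core.stdc.errno : EINTR, errno;
import core.sys.posix.unistd : write;

/// Issues one write(2) call per attempt and retries only when the call was
/// interrupted by a signal before any data was written (errno == EINTR).
/// Returns the byte count of the first attempt that was not interrupted,
/// which may be a partial write, or -1 on any other error.
ptrdiff_t writeRetryingOnEintr(int fd, const(void)[] buf)
{
    while (true)
    {
        immutable n = write(fd, buf.ptr, buf.length);
        if (n >= 0 || errno != EINTR)
            return n;
    }
}
```

Keeping this policy outside the stream type means a program that wants a signal to abort a blocking write can still get that behavior by calling the stream's own write directly.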

> Still, readExactly / writeAll helper functions are useful. I would try
> to implement these as UFCS functions instead of as a struct wrapper.

I agree. I went ahead and made that change.

> For some streams you'll need a TimeoutException. An interesting
> question is whether users should be able to recover from
> TimeoutExceptions. This essentially means if a read/write function
> internally calls read/write posix calls more than once and only the
> last one timed out, we already processed some data and it's not
> possible to recover from a TimeoutException if the amount of already
> processed data is unknown.
> The simplest solution is using only one syscall internally. Then
> TimeoutException => no data was processed. But this doesn't work for
> read/writeExactly (another reason why read/writeExactly shouldn't be
> the default. vibe.d...)

In the current implementation of readExactly/writeExactly, one cannot assume how much was read or written in the event of an exception anyway. The only way around this I can see is to return the number of bytes read/written in the exception itself. In fact, that might solve the TimeoutException problem, too. Hmm...
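
A rough sketch of that idea, under stated assumptions: `PartialIOException`, `ShortSource`, and `readExactlyOrReport` are hypothetical names for illustration and do not reflect the library's current code. The exception records how many bytes were transferred before the failure, so the caller knows how much of the buffer is valid.

```d
import std.algorithm.comparison : min;

/// Hypothetical exception that reports how much data was transferred
/// before the operation failed.
class PartialIOException : Exception
{
    size_t transferred;

    this(string msg, size_t transferred,
         string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
        this.transferred = transferred;
    }
}

/// Hypothetical stream that returns at most `chunk` bytes per read().
struct ShortSource
{
    const(ubyte)[] data;
    size_t chunk;

    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, chunk, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

/// Fills buf completely or throws, reporting the progress made so far.
void readExactlyOrReport(Source)(ref Source source, ubyte[] buf)
{
    size_t done;
    while (done < buf.length)
    {
        immutable n = source.read(buf[done .. $]);
        if (n == 0)
            throw new PartialIOException("stream ended early", done);
        done += n;
    }
}
```

A timeout could be reported the same way: the handler reads `transferred`, keeps the first `transferred` bytes of its buffer, and decides whether to retry for the rest.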

I'd like to keep the fundamental read/write functions at just one system call each in order to guarantee that they are atomic in relation to each other.

> Regarding buffers / sliding windows I'd have a look at https://github.com/schveiguy/phobos/blob/babe9fe338f03cafc0fb50fc0d37ea96505da3e3/std/io/buffer.d
>
> Another design question is whether there should be an interface for such buffered streams or whether it's OK to have only unbuffered streams + one buffer struct / class. Basically the question is whether there might be streams that can offer a buffer interface but can't use the standard implementation.

I think it's OK to re-implement buffering for different types of streams where it is more efficient to do so. For example, there is no need to implement buffering for an in-memory stream because, by definition, it is already buffered.

I'm not sure if having multiple buffering strategies would be useful. Right now, there is only the fixed-sized sliding window. If multiple buffering strategies are useful, then it makes sense to have all streams unbuffered by default and have separate buffering implementations.

There is an interesting buffering approach here that is mainly geared towards parsing: https://github.com/DmitryOlshansky/datapicked/blob/master/dpick/buffer/buffer.d

> * vibe.d stream rant ahead:
>
> vibe.d streams get some things right and some things very wrong. For
> example their leastSize/empty/read combo means you might actually
> have to implement reading data in any of these functions. Users have to
> handle timeouts or other errors for any of these as well.
>
> Then the API requires a buffered stream, it simply won't work for
> unbuffered IO (leastSize, empty). And the fact that read reads exactly
> n bytes makes stream implementations more complicated (re-reading until
> enough data has been read should be done by a generic function, not
> reimplemented in every stream). It even makes some user code more
> complicated: I've implemented a serial port library for vibe-d.
> If I don't know how many bytes will arrive with the next packet, the
> read posix function usually returns the expected/available amount of
> data. But now vibe.d requires me to specify a fixed length when calling
> the stream read method. This leads to ugly code using peek...
>
> Then vibe.d also mixes the sliding window / buffer concept into the stream class, but does so in a bad way. A sliding window should expose the internal buffer so that it's possible to consume bytes from the buffer, skip bytes, refill... In vibe.d you can peek at the buffer. But you can't discard data. You'll have to call read instead, which copies from the internal buffer to an external buffer, even if you only want to skip data. Even worse, your external buffer size is limited. So you have to implement some loop logic if you want to skip more data than fits your buffer. And all you need is a discard(size_t n) function which does _buffer = _buffer[n .. $] in the stream class...

These are the golden nuggets of experience I was looking for when making this post. They definitely help to guide an ergonomic API design. Standing on the shoulders of giants and such. Thanks!

> TLDR: API design is very important.

Completely agree.
