std.stream replacement
March 05, 2013
While working on a project, I've started to realize that I miss streams. If someone's not already working on bringing std.stream up to snuff, I think that we should start thinking about how to do that.
Of course, with ranges being so popular (with very good reason), the new stream interface would probably just be a range wrapper around a file; in fact, a decent amount of functionality could be implemented by just adding a byChars range to the standard File struct and leaving the parsing functionality to std.conv.parse. Of course, there's no reason to stop there; we could also add socket streams, compressed streams, and just about any other type of stream, all without an especially large amount of effort.
Unless someone already wants to tackle the project (or has already started), I'd be willing to work out at least a basic design and implementation.
March 05, 2013
On Tuesday, March 05, 2013 09:14:16 BLM768 wrote:
> While working on a project, I've started to realize that I miss
> streams. If someone's not already working on bringing std.stream
> up to snuff, I think that we should start thinking about to do
> that.
> Of course, with ranges being so popular (with very good reason),
> the new stream interface would probably just be a range wrapper
> around a file; in fact, a decent amount of functionality could be
> implemented by just adding a byChars range to the standard File
> struct and leaving the parsing functionality to std.conv.parse.
> Of course, there's no reason to stop there; we could also add
> socket streams, compressed streams, and just about any other type
> of stream, all without an especially large amount of effort.
> Unless someone already wants to tackle the project (or has
> already started), I'd be willing to work out at least a basic
> design and implementation.

In general, a stream _is_ a range, making a lot of "stream" stuff basically irrelevant. What's needed then is a solid, efficient range interface on top of I/O (which we're lacking at the moment).

Steven Schveighoffer was working on std.io (which would be a replacement for std.stdio), and I believe that streams were supposed to be part of that, but I'm not sure. And I don't know quite what std.io's status is at this point, so I have no idea when it'll be ready for review. Steven seems to be very busy these days, so I suspect that it's been a while since much progress was made on it.

- Jonathan M Davis
March 05, 2013
On Tue, 05 Mar 2013 03:22:00 -0500, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> On Tuesday, March 05, 2013 09:14:16 BLM768 wrote:
>> While working on a project, I've started to realize that I miss
>> streams. If someone's not already working on bringing std.stream
>> up to snuff, I think that we should start thinking about to do
>> that.
>> Of course, with ranges being so popular (with very good reason),
>> the new stream interface would probably just be a range wrapper
>> around a file; in fact, a decent amount of functionality could be
>> implemented by just adding a byChars range to the standard File
>> struct and leaving the parsing functionality to std.conv.parse.
>> Of course, there's no reason to stop there; we could also add
>> socket streams, compressed streams, and just about any other type
>> of stream, all without an especially large amount of effort.
>> Unless someone already wants to tackle the project (or has
>> already started), I'd be willing to work out at least a basic
>> design and implementation.
>
> In general, a stream _is_ a range, making a lot of "stream" stuff basically
> irrelevant. What's needed then is a solid, efficient range interface on top of
> I/O (which we're lacking at the moment).

This is not correct.  A stream is a good low-level representation of i/o.  A range is a good high-level abstraction of that i/o.  We need both.  Ranges make terrible streams for two reasons:

1. r.front does not have room for 'read n bytes'.  Making it do that is awkward (e.g. r.nextRead = 20; r.front; // read 20 bytes)
2. ranges have separate operations for getting data and progressing data.  Streams by their very nature combine the two in one operation (i.e. read)

Now, ranges ARE a very good interface for a high level abstraction.  But we need a good low-level type to perform the buffering necessary to make ranges functional.  std.io is a design that hopefully will fit within the existing File type, be compatible with C's printf, and also provide a replacement for C's antiquated FILE * buffering stream.  In the tests I have done, std.io is more efficient and more flexible/powerful than C's version.
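
To make the mismatch concrete, here is a minimal sketch of the kind of low-level read primitive being discussed (illustrative only, not the actual std.io API):

interface LowLevelInput
{
    // One call both fetches data and advances the stream; the caller says
    // how much it wants via the size of the buffer it passes in.
    // Returns the number of bytes actually read (0 at end of input).
    size_t read(ubyte[] buf);
}

A range splits this into front/popFront and has no natural place for "read n bytes", which is exactly points 1 and 2 above.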

>
> Steven Schveighoffer was working on std.io (which would be a replacement for
> std.stdio), and I believe that streams were supposed to be part of that, but
> I'm not sure. And I don't know quite what std.io's status is at this point, so
> I have no idea when it'll be ready for review. Steven seems to be very busy
> these days, so I suspect that it's been a while since much progress was made
> on it.

Yes, very busy :)  I had taken a break from D for about 3-4 months, had to work on my side business.  Still working like mad there, but I'm carving out as much time as I can for D.

std.io has not made much progress since I last went through the wringer (and how!) on the forums.  It is not dead, but it will take me some time to be able to kick-start it again (read: understand what the hell I was doing there).  I do plan to try in the coming months.

-Steve
March 05, 2013
On 05-Mar-2013 20:12, Steven Schveighoffer wrote:
> On Tue, 05 Mar 2013 03:22:00 -0500, Jonathan M Davis
> <jmdavisProg@gmx.com> wrote:
>
>> On Tuesday, March 05, 2013 09:14:16 BLM768 wrote:
>>> While working on a project, I've started to realize that I miss
>>> streams. If someone's not already working on bringing std.stream
>>> up to snuff, I think that we should start thinking about to do
>>> that.
>>> Of course, with ranges being so popular (with very good reason),
>>> the new stream interface would probably just be a range wrapper
>>> around a file; in fact, a decent amount of functionality could be
>>> implemented by just adding a byChars range to the standard File
>>> struct and leaving the parsing functionality to std.conv.parse.
>>> Of course, there's no reason to stop there; we could also add
>>> socket streams, compressed streams, and just about any other type
>>> of stream, all without an especially large amount of effort.
>>> Unless someone already wants to tackle the project (or has
>>> already started), I'd be willing to work out at least a basic
>>> design and implementation.
>>
[snip]
> Now, ranges ARE a very good interface for a high level abstraction.  But
> we need a good low-level type to perform the buffering necessary to make
> ranges functional.  std.io is a design that hopefully will fit within
> the existing File type, be compatible with C's printf, and also provides
> a replacement for C's antiquated FILE * buffering stream.  With tests I
> have done, std.io is more efficient and more flexible/powerful than C's
> version.

That's it.
C's iobuf stuff and locks around (f)getc are one reason for it being slower. In D we need no stinkin' locks as stuff is TLS by default.

Plus, as far as I understand your std.io idea, it was focused on filling user-provided buffers directly, without the obligatory double buffering somewhere inside like C does.

>>
>> Steven Schveighoffer was working on std.io (which would be a
>> replacement for
>> std.stdio), and I believe that streams were supposed to be part of
>> that, but
>> I'm not sure. And I don't know quite what std.io's status is at this
>> point, so
>> I have no idea when it'll be ready for review. Steven seems to be very
>> busy
>> these days, so I suspect that it's been a while since much progress
>> was made
>> on it.
>
> Yes, very busy :)  I had taken a break from D for about 3-4 months, had
> to work on my side business.  Still working like mad there, but I'm
> carving out as much time as I can for D.
>
> std.io has not had pretty much any progress since I last went through
> the ringer (and how!) on the forums.  It is not dead, but it will take
> me some time to be able to kick start it again (read: understand what
> the hell I was doing there).  I do plan to try in the coming months.
>

Would love to see it progressing towards Phobos inclusion. It's one of the areas where D can easily beat the C runtime, no cheating.



-- 
Dmitry Olshansky
March 05, 2013
On Tue, 05 Mar 2013 11:43:59 -0500, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:


> That's it.
> C's iobuf stuff and locks around (f)getc are one reason for it being slower. In D we need no stinkin' locks as stuff is TLS by default.
>
> Plus as far as I understand your std.io idea it was focused around filling up user-provided buffers directly without obligatory double buffering somewhere inside like C does.

You are right about the locking, though shared streams like stdout will need to be locked (this is actually one of the more difficult parts to do, and I haven't done it yet.  Shared is a pain to work with; the current File struct cheats with casting, and I think I will have to do something like that).  File does a pretty good job of locking for an entire operation (i.e. an entire writeln/readf).

C's iobuf, I think, tries to avoid double buffering for some things (e.g. glibc's getline), but std.io takes that to a new level.

With std.io you have SAFE access directly to the buffer.  So instead of getline being "read directly into my buffer, or copy into my buffer", it's "make sure there is a complete line in the file buffer, then give me a slice of it".  What's great about this is that you don't need to hack Phobos to get buffer access, the way you would have to hack C's streams to build something like getline.  So many more possibilities exist.

So things like parsing xml files need no double buffering at all, AND you don't even have to provide a buffer!

Note that it is still possible to provide a buffer, in case that is what you want to do, and it will only copy any data already in the stream buffer.  Everything else is read directly in (I have some heuristics to try and prevent tiny reads, so if you want to, say, read 4 bytes, it will first fill the stream buffer, then copy 4 bytes).

-Steve
March 05, 2013
On 05-Mar-2013 22:49, Steven Schveighoffer wrote:
> On Tue, 05 Mar 2013 11:43:59 -0500, Dmitry Olshansky
> <dmitry.olsh@gmail.com> wrote:
>
>
>> That's it.
>> C's iobuf stuff and locks around (f)getc are one reason for it being
>> slower. In D we need no stinkin' locks as stuff is TLS by default.
>>
>> Plus as far as I understand your std.io idea it was focused around
>> filling up user-provided buffers directly without obligatory double
>> buffering somewhere inside like C does.
>
> You are right about the locking, though shared streams like stdout will
> need to be locked (this is actually one of the more difficult parts to
> do, and I haven't done it yet.  Shared is a pain to work with, the
> current File struct cheats with casting, I think I will have to do
> something like that).

But at least these are already shared :) In fact, shared is meant to be a pain in the ass (but I agree it should get some more convenience).

The key point is that shared should have been the user's problem. But writeln and its ilk are too darn common, so some locking scheme has to be baked in to ease the pain.

> File does a pretty good job of locking for an
> entire operation (i.e. an entire writeln/readf).

I just hope it doesn't call internally locking C functions after that...

> C iobuf I think tries to avoid double buffering for some things (e.g.
> gcc's getline), but std.io takes that to a new level.

Yeah, AFAIK it translates calls for, say, a few megabytes of data into direct read/write OS syscalls. Hard to say how reliable their heuristics are.

> With std.io you have SAFE access directly to the buffer.  So instead of
> getline being "read directly into my buffer, or copy into my buffer",
> it's "make sure there is a complete line in the file buffer, then give
> me a slice to it".  What's great about this is, you don't need to hack
> phobos to get buffer access like you need to hack C's stream to get
> buffer access to create something like getline.  So many more
> possibilities exist.
>
> So things like parsing xml files need no double buffering at all, AND
> you don't even have to provide a buffer!

Slicing the internal buffer is real darn nice. Hard to stress it enough ;)

There is one abstraction I found nice while helping out on a D lexer written in D; I call it a mark-slice range. It seems to be an extension of a forward range.

It's all about buffering: you define a position in the input such that you don't care about anything before it. Starting from the marked point, data needs to be kept in the buffer, while everything prior to it can be discarded. The second operation, "slice", returns a slice of some internal buffer from the last mark to the current position.

Would be interesting to see how it correlates with buffered I/O in std.io; what you say so far fits the bill.
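
A rough sketch of the shape I have in mind (names are illustrative, not an existing Phobos interface):

interface MarkSlice(T)
{
    // the usual input range primitives
    bool empty();
    T front();
    void popFront();

    // From this point on, data must stay in the internal buffer;
    // everything before it may be discarded.
    void mark();

    // A view of the internal buffer from the last mark() to the
    // current position.
    const(T)[] slice();
}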

> Note that it is still possible to provide a buffer, in case that is what
> you want to do, and it will only copy any data already in the stream
> buffer.

So if I use my own buffers exclusively there is nothing to worry about (no copy this - copy that)?

> Everything else is read directly in (I have some heuristics to
> try and prevent tiny reads, so if you want to say read 4 bytes, it will
> first fill the stream buffer, then copy 4 bytes).

This seems a bit like the C one, iff it's a smart libc. What if instead you read more than requested into the target buffer (if it fits)? You can tweak the definition of read to say "buffer no less than X bytes; the actual amount is returned" :)

And if one wants the direct and dumb way of "get me these 4 bytes" - just let them provide a fixed buffer of 4 bytes in total; then std.io can't read more than that. (Could be useful to bench the OS I/O layer and such.)
Another consequence is that std.io wouldn't need to allocate its internal buffer eagerly for tiny reads (in case they actually show up).
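
Just to illustrate the contract I mean, an illustrative signature (not anything std.io actually defines):

// Fill buf with at least atLeast bytes if the source can supply them, but
// feel free to read more if it fits; return the number of bytes actually
// placed in buf.
size_t readAtLeast(ubyte[] buf, size_t atLeast);

With buf.length == atLeast == 4 you get the "direct and dumb" behaviour; with a larger buf the implementation is free to grab more in one syscall.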

-- 
Dmitry Olshansky
March 05, 2013
On Tue, 05 Mar 2013 14:12:58 -0500, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

> On 05-Mar-2013 22:49, Steven Schveighoffer wrote:

>> Everything else is read directly in (I have some heuristics to
>> try and prevent tiny reads, so if you want to say read 4 bytes, it will
>> first fill the stream buffer, then copy 4 bytes).
>
> This seems a bit like C one iff it's a smart libc. What if instead you read more then requested into target buffer (if it fits)? You can tweak the definition of read to say "buffer no less then X bytes, the actual amount is returned" :)
>
> And if one want the direct and dumb way of get me these 4 bytes - just let them provide fixed buffer of 4 bytes in total, then std.io can't read more then that. (Could be useful to bench OS I/O layer and such)
> Another consequence is that std.io wouldn't need to allocate internal buffer eagerly for tiny reads (in case they actually show up).
>

The way I devised it is a "processor" delegate.  Basically, you provide a delegate that says "yep, this is enough".  While it's not enough, it keeps extending and filling the extended buffer.

Which buffer is used is your call; if you want it to use its internal buffer, then it will, extending as necessary (I currently only use D arrays and built-in appending/extending).

Here is a very simple readline implementation (it only supports '\n' and UTF-8; the real version supports much more):

const(char)[] readline(InputStream input)
{
    size_t checkLine(const(ubyte)[] data, size_t start)
    {
        foreach(size_t i; start..data.length)
            if(data[i] == '\n')
                return i+1; // consume this many bytes
        return size_t.max; // no eol found yet.
    }

    auto result = cast(const(char)[]) input.readUntil(&checkLine);
    if(result.length && result[$-1] == '\n')
        result = result[0..$-1];
    return result;
}

Note that I don't have to care about management of the return value; it is handled for me by the input stream.  If the user intends to save it for later, he can make a copy.  If not, just process it and move on to the next line.
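
In other words, usage ends up looking something like this (process() is just a placeholder; InputStream and readUntil are from my in-progress std.io design, not current Phobos):

auto line = readline(input); // a slice of the stream's internal buffer
process(line);               // fine to use right away...
auto copy = line.idup;       // ...but copy it if it must outlive the next read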

There is also an appendUntil function which takes an already existing buffer and appends to it.

Also note that I have a shortcut for what is probably a very common requirement -- read until a delimiter is found.  That version accepts either a single ubyte or a ubyte array.  I just showed the above for effect.

input.readUntil('\n');

also will work (for utf-8 streams).

-Steve
March 05, 2013
On Tuesday, 5 March 2013 at 16:12:24 UTC, Steven Schveighoffer wrote:
> On Tue, 05 Mar 2013 03:22:00 -0500, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
>
>>
>> In general, a stream _is_ a range, making a lot of "stream" stuff basically
>> irrelevant. What's needed then is a solid, efficient range interface on top of
>> I/O (which we're lacking at the moment).
>
> This is not correct.  A stream is a good low-level representation of i/o.  A range is a good high-level abstraction of that i/o.

Ranges aren't necessarily higher- or lower-level than streams; they're completely orthogonal ways of looking at a data source. It's completely possible to build a stream interface on top of a range of characters, which is what I was suggesting. In that situation, the range is at a lower level of abstraction than the stream is.

> Ranges make terrible streams for two reasons:
>
> 1. r.front does not have room for 'read n bytes'.  Making it do that is awkward (e.g. r.nextRead = 20; r.front; // read 20 bytes)

Create a range operation like "r.takeArray(n)". You can optimize it to take a slice of the buffer when possible.
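
Something along these lines, as a rough sketch of the idea (not an existing Phobos function):

import std.range.primitives : isInputRange, ElementType;
import std.algorithm.comparison : min;

// Pull up to n elements out of any input range of bytes; when the source
// is already a ubyte[], this is just a cheap slice, otherwise it copies.
ubyte[] takeArray(R)(ref R r, size_t n)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    static if (is(R : ubyte[]))
    {
        auto len = min(n, r.length);
        auto result = r[0 .. len];
        r = r[len .. $];
        return result;
    }
    else
    {
        ubyte[] result;
        foreach (i; 0 .. n)
        {
            if (r.empty) break;
            result ~= r.front;
            r.popFront();
        }
        return result;
    }
}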

> 2. ranges have separate operations for getting data and progressing data.  Streams by their very nature combine the two in one operation (i.e. read)

Range operations like std.conv.parse implicitly progress their source ranges. For example:

auto stream = file.byChars;
while(!stream.empty) {
    doSomethingWithInt(stream.parse!int);
}

Except for the extra ".byChars", it's just as concise as any other stream, and it's more flexible than something that *only* provides a stream interface. It also saves some duplication of effort; everything can lean on std.conv.parse.

Besides, streams don't necessarily progress the data; C++ iostreams have peek(), after all.

From what I see, at least in terms of the interface, a stream is basically just a generalization of a range that supports more than one type as input/output. There's no reason that such a system couldn't be built on top of a range, especially when the internal representation is of a single type: characters.

March 06, 2013
On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768@gmail.com> wrote:

> On Tuesday, 5 March 2013 at 16:12:24 UTC, Steven Schveighoffer wrote:
>> On Tue, 05 Mar 2013 03:22:00 -0500, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
>>
>>>
>>> In general, a stream _is_ a range, making a lot of "stream" stuff basically
>>> irrelevant. What's needed then is a solid, efficient range interface on top of
>>> I/O (which we're lacking at the moment).
>>
>> This is not correct.  A stream is a good low-level representation of i/o.  A range is a good high-level abstraction of that i/o.
>
> Ranges aren't necessarily higher- or lower-level than streams; they're completely orthogonal ways of looking at a data source. It's completely possible to build a stream interface on top of a range of characters, which is what I was suggesting. In that situation, the range is at a lower level of abstraction than the stream is.

I think you misunderstand.  Ranges absolutely can be a source for streams, especially if they are arrays.  The point is that the range *interface* doesn't make a good stream interface.  So we need to invent new methods to access streams.

>> Ranges make terrible streams for two reasons:
>>
>> 1. r.front does not have room for 'read n bytes'.  Making it do that is awkward (e.g. r.nextRead = 20; r.front; // read 20 bytes)
>
> Create a range operation like "r.takeArray(n)". You can optimize it to take a slice of the buffer when possible.

This is not a good idea.  We want streams to be high performance.  Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance.

On top of that, in some cases, the result will be a slice, in some cases it will be a copy.  Generic code will have to figure out that difference if it wants to save the data for later, or else risk double copying.

>> 2. ranges have separate operations for getting data and progressing data.  Streams by their very nature combine the two in one operation (i.e. read)
>
> Range operations like std.conv.parse implicitly progress their source ranges.

That's not a range operation.  Range operations are empty, popFront, front.  Anything built on top of ranges must use ONLY these three operations, otherwise you are talking about something else.

It is possible to use random-access ranges for a valid stream source.  But that is not a valid stream interface, streams aren't random-access ranges.

> Besides, streams don't necessarily progress the data; C++ iostreams have peek(), after all.

That is because the data is buffered.  At a low-level, we have to deal with the OS, which may not support peeking.

>  From what I see, at least in terms of the interface, a stream is basically just a generalization of a range that supports more than one type as input/output. There's no reason that such a system couldn't be built on top of a range, especially when the internal representation is of a single type: characters.

streams shouldn't have to support the front/popFront mechanism.  empty may be the only commonality.  I think that is an awkward fit for ranges.  Certainly it is possible to take a *specific* range, such as an array, and add a stream-like interface to it.  But not ranges in general.

-Steve
March 06, 2013
On 03/05/2013 08:12 PM, Dmitry Olshansky wrote:
> ...
>
> There is one thing I found a nice abstraction while helping out on D's
> lexer in D and I call it mark-slice range. An extension to forward range
> it seems.
>
> It's all about buffering and defining a position in input such that you
> don't care for anything up to this point. This means that starting from
> thusly marked point stuff needs to be kept in buffer, everything prior
> to it could be discarded. The 2nd operation "slice" is getting a slice
> of some internal buffer from last mark to the current position.
> ...

The lexer I built last year does something similar. It allows the parser to save and restore sorted positions in FIFO order with one size_t of memory inside the parser's current stack frame (internally, the lexer only saves the first position). The data is kept in a circular buffer that grows dynamically in case the required lookahead is too large.