September 04, 2011
On Sat, 03 Sep 2011 18:57:06 -0400, Andrej Mitrovic <andrej.mitrovich@gmail.com> wrote:

> Also, changing structs to classes is gonna *massively* break code
> everywhere. Why inheritance instead of a predicate like isInputStream
> = is(typeof(T t; t.put; t.close)), you know the drill..

Because it breaks runtime swapping of I/O.

For example, if you wanted to change stdin to a network socket, it's simple, just assign another InputStream.

However, if stdin is a templated struct, you cannot do this at runtime, you have to decide at compile time what your stdin is.  Believe it or not, this is not dissimilar to FILE *, except we have more flexibility.
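
A minimal sketch of the runtime-swapping argument, assuming a pared-down, hypothetical InputStream with a single read() method. MemoryInput, FakeSocketInput, input and firstChunk are illustrative names, not part of the proposed module:

```d
import std.stdio : writeln;
import std.string : representation;

// Hypothetical, pared-down interface; the proposed InputStream has more members.
interface InputStream
{
    // Copies up to buf.length bytes into buf and returns the count (0 at end of stream).
    size_t read(ubyte[] buf);
}

// Stands in for a console- or file-backed source.
final class MemoryInput : InputStream
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        immutable n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Stands in for a network socket; it yields a single '!' byte per call.
final class FakeSocketInput : InputStream
{
    size_t read(ubyte[] buf)
    {
        if (buf.length == 0) return 0;
        buf[0] = cast(ubyte) '!';
        return 1;
    }
}

// Plays the role of stdin: an ordinary variable of interface type.
InputStream input;

// Reads one chunk from the current input, whatever its concrete class is.
string firstChunk()
{
    ubyte[16] buf;
    immutable n = input.read(buf[]);
    return cast(string) buf[0 .. n].idup;
}

void main()
{
    input = new MemoryInput("hello".representation);
    writeln(firstChunk()); // hello
    input = new FakeSocketInput; // swapped at run time, no recompilation
    writeln(firstChunk()); // !
}
```

With a templated struct, the concrete type would be part of the type of input, so the second assignment would not compile.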

But I realize the implications now.  I think I have to revisit this decision.

We definitely need classes at the lower level, but I think we can wrap them with structs that are commonly used for RAII and for not breaking existing code.

-Steve
September 04, 2011
On Sat, 03 Sep 2011 17:55:12 -0400, Michel Fortin <michel.fortin@michelf.com> wrote:

> On 2011-09-03 19:54:05 +0000, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
>
>> Hello,
>> There are a number of issues related to D's current handling of streams, including the existence of the imperfect etc.stream and the over-specialization of std.stdio.
>> Steve has worked on an extensive overhaul of std.stdio which would obviate the need for etc.stream and would improve both the generality and efficiency of std.stdio.
>> Please chime in with feedback; he's away from the Usenet but allowed me to post this on his behalf. I uploaded the docs to
>> http://erdani.com/d/new-stdio/phobos-prerelease/std_stdio.html
>
> Looks good…

Well, at least someone thinks so ;)

>
> Hum, inconsistent casing of enum members.

Can be fixed.

>
> And shouldn't there be a way to do non-blocking IO? ;-)

Yes.  I haven't gotten to that yet.  This is a very early version, not ready for inclusion.  It's mostly a proof-of-concept.

>
> I like that File is now a class because it's cleaner that way, but non-deterministic destruction is going to be a problem. That said, it was already a problem anyway if you stored a File struct in a class, so maybe we need a more general solution for reference-counted classes.

I agree, but I think I need to revisit that aspect.  As broken as the reference counting mechanism is, much code is based on it, so we can't say you have to revisit all source code in order to be compatible.

And as Andrei points out, it works in cases where you *don't* store the struct on the heap, why should that be disabled?

>
> Class names DInput and DOutput sound silly. If all classes implemented purely in D had a D prefix, it'd get redundant pretty fast (like KDE apps beginning in K).

Yes, it made sense when I was going through the different iterations of my interface ideas.  But you are right.  BTW, these started out as DBufferedInput and DBufferedOutput, and CStream was CBufferedStream.

> I'd suggest BufferedInput and BufferedOutput, or something else that actually describes what the class does, instead of DInput and DOutput. And I'd make them final, that way there won't be any virtual call overhead until the buffer needs to be replenished or flushed from the wrapped input or output stream.

They are final, ddoc just doesn't expose that...

See my later post to the source.  Things might be clearer.

-Steve
September 04, 2011
On Sat, 03 Sep 2011 21:47:53 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 9/3/11 3:54 PM, Andrei Alexandrescu wrote:
>> http://erdani.com/d/new-stdio/phobos-prerelease/std_stdio.html
>
> Here are a few points following a pass through the dox:
>
> * After thinking some more about it, I find the approach seek() plus enumerated Anchor undesirable. It's a bad case of logical coupling as one never calls seek() passing an anchor as a variable. It's really three functions - seekForward, seekBackward, and seekAbsolute. Heck, knowing what seek does, it should be just seekAbsolute. But then there are several possible designs; a logically coupled seek() is not a good turn in any case.

I think you need to support all three, but they could be individual functions.  It's just easy to provide the same interface the OS handle provides.  Let's entertain changing to three separate functions.

But I think we need to support seek from front, seek from end, and seek from current.  I don't know about the three you mentioned.  How would you seek to the end if you didn't have seekEnd?  And seeking forward or backward I think is captured much better via a positive or negative integer.  I can imagine having to write code like this:

if(pos < cur)
   seekBackward(cur - pos);
else
   seekForward(pos - cur);
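
For comparison, a sketch of the alternative: three named functions layered over a single anchor-based seek(), each taking a signed offset where direction matters. The names seekFromStart, seekFromCurrent, seekFromEnd and MemorySeekable are hypothetical, and Seekable here is a stand-in for the interface in the posted docs, not a copy of it:

```d
import std.exception : enforce;
import std.stdio : writeln;

// Hypothetical stand-ins; the posted docs only establish seek() plus an Anchor enum.
enum Anchor { begin, current, end }

interface Seekable
{
    ulong seek(long delta, Anchor whence); // returns the new absolute position
    ulong tell();
}

// Three named entry points, so the anchor is never passed around as a variable.
ulong seekFromStart(Seekable s, ulong pos)   { return s.seek(cast(long) pos, Anchor.begin); }
ulong seekFromCurrent(Seekable s, long delta) { return s.seek(delta, Anchor.current); }
ulong seekFromEnd(Seekable s, long delta)     { return s.seek(delta, Anchor.end); }

// In-memory stand-in for an OS handle: it only tracks a position and a size.
final class MemorySeekable : Seekable
{
    private ulong pos;
    private ulong size;

    this(ulong size) { this.size = size; }

    ulong seek(long delta, Anchor whence)
    {
        long base;
        final switch (whence)
        {
            case Anchor.begin:   base = 0; break;
            case Anchor.current: base = cast(long) pos; break;
            case Anchor.end:     base = cast(long) size; break;
        }
        immutable target = base + delta;
        enforce(target >= 0, "seek before start of stream");
        pos = cast(ulong) target;
        return pos;
    }

    ulong tell() { return pos; }
}

void main()
{
    auto s = new MemorySeekable(100);
    s.seekFromStart(10);
    s.seekFromCurrent(-4); // one call covers both directions, no branch needed
    writeln(s.tell());     // 6
    s.seekFromEnd(0);      // seeking to the end needs its own anchor
    writeln(s.tell());     // 100
}
```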

>
> * Seekable should document that tell() is O(1) and seek() can be considered O(1) but with a large constant factor.

OK, docs need lots of TLC for sure.

>
> * Why is close() not part of Seekable, since Seekable seems to be the base of all streams?

Hm... not really sure.  I suppose it could be!  But then, should the interface be called Seekable?  What about just Stream?

> * Class File is IMHO not going to cut the mustard. It needs to be a struct with a destructor. One should be able to _get_ an InputStream or an OutputStream interface out of a File object (i.e. a file is a factory of such interfaces), but the File itself must be a struct.

I'm seeing a large backlash on this decision.  I'm going to revisit it.

Note, however, that it was a poor choice of name for File on my part.  File is *not* equivalent to the current stdio.File, in that it's not buffered, and is not text-based.

>
> * I don't understand the difference between read() and readComplete().

read() gets as much data as it can from the buffer and from the stream using at most one low-level read.  readComplete() will continually read until either EOF is encountered, or the requested data is read.  I started out making read() do what readComplete() does, but that is surprisingly difficult to write at the low level.  However, readComplete() is trivial to implement on top of read(), which is why I split the two functions.
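
A rough sketch of that layering, assuming read() returns the number of bytes it delivered and returns 0 at EOF (the posted docs may define EOF differently). InputStream is reduced to one method and ChunkedInput is a made-up test stream that hands out at most a few bytes per call:

```d
import std.algorithm.comparison : min;
import std.stdio : writeln;

// Hypothetical, pared-down interface.
interface InputStream
{
    // May return fewer bytes than requested; 0 is taken to mean EOF.
    size_t read(ubyte[] buf);
}

// Keeps calling read() until buf is full or read() reports EOF.
size_t readComplete(InputStream input, ubyte[] buf)
{
    size_t total = 0;
    while (total < buf.length)
    {
        immutable n = input.read(buf[total .. $]);
        if (n == 0) break; // EOF before the request was satisfied
        total += n;
    }
    return total;
}

// Test stream that delivers at most `chunk` bytes per read() call.
final class ChunkedInput : InputStream
{
    private const(ubyte)[] data;
    private size_t chunk;

    this(const(ubyte)[] data, size_t chunk)
    {
        this.data = data;
        this.chunk = chunk;
    }

    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, chunk, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

void main()
{
    ubyte[8] buf;

    auto a = new ChunkedInput(new ubyte[10], 3);
    writeln(a.read(buf[]));         // 3: a single low-level read

    auto b = new ChunkedInput(new ubyte[10], 3);
    writeln(b.readComplete(buf[])); // 8: loops until the buffer is full
    writeln(b.readComplete(buf[])); // 2: EOF reached first
}
```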

Please, come up with a better name, I hate readComplete :)

>
> * readUntil is a bit tenuous. I was hoping for a simpler interface to buffered streams, e.g. just expose the buffer as a ubyte[].

I think we need a const(ubyte)[] peek(size_t nbytes).  Would this suffice?

>
> * readUntil(const(ubyte)[]) does not give a cheap means to figure whether the read ended because file ended or the terminator was met.

You are right.  I'll think about this.

>
> * There are several readUntil overloads but only one appendUntil. Why?

Didn't get around to it yet.  The overloads for readUntil are trivial, so can be copied easily enough to appendUntil.

>
> * Document the difference between skip and seek. Also, skip should take a ulong.

skip is buffer-only.  It will never trigger a low-level call.
I will fill out the docs more completely.

Given this, I think size_t is the right type, as a buffer cannot be more than size_t bytes in length.

> * I see encoder and decoder() in DInput, should both be decoder?

Yes.  encoder is for DOutput, copy-paste error.

>
> * StreamWidth, TextXXX and friends are a bit sudden because they introduce a higher-level abstraction in a module so far only preoccupied with transferring bytes. I was thinking that kind of stuff would belong to a formatter/serializer module.

Could be moved.  However, stdin, stderr, and stdout are traditionally text-based, and stdio contains them.  I wanted to split out text-handling from the basic buffered stream, since it's very specific.  For example, having to deal with an object that supports formatted text i/o for a network socket seems uncommon.

I'm open to suggestions.

Note, I must have had a brain-malfunction when I gave what I thought was a fairly completely-documented module.  I missed some very important declarations and functions.  I'll work on fixing the docs and giving you a new copy.  Thanks again for hosting it.

-Steve
September 04, 2011
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> I'm sure it will go up much more, but previously we've had a more
> accepting attitude to new functionality at the cost of scrutiny (e.g.
> std.xml and std.json, both written by episodic contributors). (I really
> regret having had that attitude, it hurt us.) So now that there are so
> many eyeballs focused on the code, and not just any eyeballs but
> eyeballs connected to good brains, there is pressure building up.
> There are quite a few pieces in Phobos that are withstanding scrutiny
> quite well: getopt, algorithm, variant (which can be, I think, safely
> extended to new great functionality), range, conv, random, and more.
> There are, unfortunately, others that didn't start off on the right foot
> and right now are somewhat of an eyesore. I trust we will figure out what to
> do about each on a case-by-case basis, though I agree with Walter that we
> should balance the breakage cost with correspondingly high rewards in
> terms of functionality improvements.
> Andrei

Yes, the quality standard has gone up massively.  When I was prepping std.parallelism for review a few months ago, I generally used the existing Phobos documentation as a guideline for what std.parallelism's docs should resemble. Andrei, of course, ripped the documentation apart.  In hindsight it led to massive improvements and was for the better.  It certainly set the tone for clear, precise documentation in the future and the same high standards were applied to std.path and std.curl.  However, at the time I actually thought he just hated std.parallelism at a gut level and was looking for any excuse to keep it out of Phobos.  (I apologize for having thought this and therefore taken a much more adversarial view of the review process than I should have.)
September 04, 2011
On Saturday, September 03, 2011 18:53:00 Walter Bright wrote:
> On 9/3/2011 5:58 PM, Jonathan M Davis wrote:
> > However, if the code breakage
> > doesn't actually gain us anything, then we should avoid it. So,
> > complaints about code breakage are valid, but they aren't deal
> > breaking.
> 
> The larger the amount of code that is broken, the more gain there must be to justify it.
> 
> Breaking std.stdio, which is used everywhere, this thoroughly needs a very high bar of justification.

Agreed.

- Jonathan M Davis
September 04, 2011
On Sat, 03 Sep 2011 22:27:49 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 9/3/11 9:53 PM, Walter Bright wrote:
>> On 9/3/2011 5:58 PM, Jonathan M Davis wrote:
>>> However, if the code breakage
>>> doesn't actually gain us anything, then we should avoid it. So,
>>> complaints
>>> about code breakage are valid, but they aren't deal breaking.
>>
>> The larger the amount of code that is broken, the more gain there must
>> be to justify it.
>>
>> Breaking std.stdio, which is used everywhere, this thoroughly needs a
>> very high bar of justification.
>
> I agree. I'm hoping the new stuff could build on top of std.stdio.

It is my plan for the eventual result to break either no code, or as little code as possible.  The current library is mostly a proof-of-concept, to see what people think, and to show what might be possible.  I think the interfaces in this library make for a much easier-to-write XML library, for instance.

It's by no means a proposal for immediate acceptance into Phobos, I'm sorry if it came across that way.

We have to break something in std.stdio, because it's fixated on FILE *.  We need something that allows FILE * to play the game, but is focused on a D-based solution.  Otherwise, we have no room for improvement.  That's what I'm striving for.  And along the way, I'm trying to make it as efficient as possible.

-Steve
September 04, 2011
On 9/3/11 11:33 PM, Steven Schveighoffer wrote:
> We have to break something in std.stdio, because it's fixated on FILE *.
> We need something that allows FILE * to play the game, but is focused on
> a D-based solution. Otherwise, we have no room for improvement.

I'm not 100% convinced of that. We can achieve a good deal of improvement by resorting to platform-specific code. Clearly that's not the best way to go but it's not difficult and it does have its merit.

Overall I think the design of std.stdio should be followed:

1. User opens a File (or whatever), which is a struct. The struct uses RAII.

2. Using the struct you can directly call primitives to read and write stuff.

3. You can also decide you want a polymorphic stream out of it, and you get to decide the parameters of the stream (buffering, chunking, synchronicity and whatnot). byChunk and byLine are good examples, although they aren't polymorphic. Once you have such a stream you're in polyland so you get to use all of its goodies (look ma no templates etc).

4. Once all copies of the struct are destroyed, all streams derived from it are automatically closed and will issue errors when used.

That's pretty much it! It's a simple design that does all we need.
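
The four points above can be sketched roughly as follows. This is only an illustration under stated assumptions: Handle, HandleStream and the one-method InputStream are hypothetical names, an in-memory byte slice stands in for an OS file, and the reference count is not thread-safe. It also relies on the struct living on the stack; a File stored inside a GC-managed object would hit the non-deterministic destruction problem discussed earlier in the thread.

```d
import std.exception : enforce;
import std.stdio : writeln;
import std.string : representation;

// Hypothetical, pared-down polymorphic interface.
interface InputStream
{
    size_t read(ubyte[] buf);
}

// State shared by every copy of the struct and every stream derived from it.
private final class Handle
{
    int refs = 1;
    bool open = true;
    const(ubyte)[] data; // stands in for an OS file

    size_t read(ubyte[] buf)
    {
        enforce(open, "stream used after its File was closed");
        immutable n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Point 3: the polymorphic view handed out by the struct.
private final class HandleStream : InputStream
{
    private Handle h;
    this(Handle h) { this.h = h; }
    size_t read(ubyte[] buf) { return h.read(buf); }
}

// Point 1: a struct that uses RAII (reference-counted here).
struct File
{
    private Handle h;

    this(const(ubyte)[] data)
    {
        h = new Handle;
        h.data = data;
    }

    this(this) { if (h !is null) ++h.refs; } // a copy shares the handle

    // Point 4: destroying the last copy closes the handle.
    ~this() { if (h !is null && --h.refs == 0) h.open = false; }

    // Point 2: primitives callable directly on the struct.
    size_t read(ubyte[] buf)
    {
        enforce(h !is null, "File is not open");
        return h.read(buf);
    }

    InputStream stream()
    {
        enforce(h !is null, "File is not open");
        return new HandleStream(h);
    }
}

void main()
{
    InputStream s;
    {
        auto f = File("abc".representation);
        auto g = f;             // second copy, same handle
        ubyte[2] buf;
        writeln(f.read(buf[])); // 2
        s = g.stream();
    } // f and g go out of scope; the last destructor closes the handle

    ubyte[1] one;
    try
    {
        s.read(one[]);
    }
    catch (Exception e)
    {
        writeln(e.msg); // stream used after its File was closed
    }
}
```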


Andrei



September 04, 2011
On Sunday, September 04, 2011 03:22:21 dsimcha wrote:
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> 
> > I'm sure it will go up much more, but previously we've had a more
> > accepting attitude to new functionality at the cost of scrutiny (e.g.
> > std.xml and std.json, both written by episodic contributors). (I really
> > regret having had that attitude, it hurt us.) So now that there are so
> > many eyeballs focused on the code, and not just any eyeballs but
> > eyeballs connected to good brains, there is pressure building up.
> > There are quite a few pieces in Phobos that are withstanding scrutiny
> > quite well: getopt, algorithm, variant (which can be, I think, safely
> > extended to new great functionality), range, conv, random, and more.
> > There are, unfortunately, others that didn't start off on the right foot
> > and right now are somewhat of an eyesore. I trust we will figure out what to
> > do about each on a case-by-case basis, though I agree with Walter that we
> > should balance the breakage cost with correspondingly high rewards in
> > terms of functionality improvements.
> > Andrei
> 
> Yes, the quality standard has gone up massively.  When I was prepping std.parallelism for review a few months ago, I generally used the existing Phobos documentation as a guideline for what std.parallelism's docs should resemble. Andrei, of course, ripped the documentation apart.  In hindsight it led to massive improvements and was for the better.  It certainly set the tone for clear, precise documentation in the future and the same high standards were applied to std.path and std.curl.  However, at the time I actually thought he just hated std.parallelism at a gut level and was looking for any excuse to keep it out of Phobos.  (I apologize for having thought this and therefore taken a much more adversarial view of the review process than I should have.)

std.datetime is far better for having gone through multiple reviews as well. The resulting code isn't perfect, and reviews don't always catch everything, but thorough reviews really help improve the quality of code. Even just having other contributors look over pull requests tends to find stuff that can and should be improved. So, while there will likely always be some issues with code that make it into Phobos, the overall code quality is definitely improving.

- Jonathan M Davis
September 04, 2011
On Sat, 03 Sep 2011 23:45:17 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 9/3/11 11:33 PM, Steven Schveighoffer wrote:
>> We have to break something in std.stdio, because it's fixated on FILE *.
>> We need something that allows FILE * to play the game, but is focused on
>> a D-based solution. Otherwise, we have no room for improvement.
>
> I'm not 100% convinced of that. We can achieve a good deal of improvement by resorting to platform-specific code. Clearly that's not the best way to go but it's not difficult and it does have its merit.
>
> Overall I think the design of std.stdio should be followed:
>
> 1. User opens a File (or whatever), which is a struct. The struct uses RAII.

OK, I think that's the offer on the table I keep getting :)  I'm definitely going to use this, and its name will be File.  I think it has to be in order to be compatible with all current code.

>
> 2. Using the struct you can directly call primitives to read and write stuff.

Buffered reads and writes?  If so, don't you need to decide the items in point 3 before read/write?  If not buffered, then I think I can work with this.

>
> 3. You can also decide you want a polymorphic stream out of it, and you get to decide the parameters of the stream (buffering, chunking, synchronicity and whatnot). byChunk and byLine are good examples, although they aren't polymorphic. Once you have such a stream you're in polyland so you get to use all of its goodies (look ma no templates etc).
>
> 4. Once all copies of the struct are destroyed, all streams derived from it are automatically closed and will issue errors when used.

OK, I think I know how to do this.

I'm assuming if you want to use exclusively the poly versions, you can do that.  I.e. you don't have to keep an RAII File struct around.

> That's pretty much it! It's a simple design that does all we need.

I'll work on that.

How should text vs. non-text i/o work?  C currently conflates them at the same level, but I think they are two separate layers.  What do you think?

-Steve
September 04, 2011
On 9/3/2011 7:27 PM, dsimcha wrote:
> These changes are purely under
> the hood, though, and there should be zero code breakage.

Those are the great kind of changes, and it's also nice in that it means the API was done reasonably right.