September 04, 2011
What happens if I write:

   printf("hello ");
   writeln("world");

?
September 04, 2011
On 04.09.2011 at 00:57, Andrej Mitrovic <andrej.mitrovich@gmail.com> wrote:

> Also, changing structs to classes is gonna *massively* break code
> everywhere. Why inheritance instead of a predicate like isInputStream
> = is(typeof(T t; t.put; t.close)), you know the drill..

Wasn't this overhaul _meant_ to break existing code by offering a new API? Still that's a serious issue of course, but not too surprising. I'm ambivalent on the inheritance vs predicate debate. Interfaces are the way it is meant to be done and actually ensure correct types. Predicates work with structs as well. I don't know if this would be important.
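
For comparison, the predicate approach quoted above could look roughly like the sketch below. It is only an illustration: `isByteSink`, `MemorySink` and `writeAll` are hypothetical names that appear neither in the proposal nor in Phobos, and the required members (`put`, `close`) are simply taken from the quoted example.

```d
// Hypothetical compile-time predicate (not a Phobos symbol):
// true if T offers put(ubyte) and close().
enum isByteSink(T) = is(typeof((T t) { t.put(cast(ubyte) 0); t.close(); }));

// A struct satisfies the predicate without inheriting from anything.
struct MemorySink
{
    ubyte[] data;
    bool closed;

    void put(ubyte b) { data ~= b; }
    void close() { closed = true; }
}

// Generic code constrains on the predicate instead of on an interface.
void writeAll(Sink)(ref Sink sink, const(ubyte)[] bytes)
    if (isByteSink!Sink)
{
    foreach (b; bytes)
        sink.put(b);
    sink.close();
}

// The check happens at compile time.
static assert(isByteSink!MemorySink);
static assert(!isByteSink!int);

void main()
{
    const(ubyte)[] payload = [1, 2, 3];
    MemorySink sink;
    writeAll(sink, payload);
    assert(sink.data == payload && sink.closed);
}
```

The trade-off is the one described above: the predicate accepts structs and classes alike, while an interface ties the check to the type hierarchy and gives runtime polymorphism.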
September 04, 2011
On Sunday, September 04, 2011 02:49:40 Marco Leise wrote:
> On 04.09.2011 at 00:57, Andrej Mitrovic <andrej.mitrovich@gmail.com> wrote:
> > Also, changing structs to classes is gonna *massively* break code
> > everywhere. Why inheritance instead of a predicate like isInputStream
> > = is(typeof(T t; t.put; t.close)), you know the drill..
> 
> Wasn't this overhaul _meant_ to break existing code by offering a new API? Still that's a serious issue of course, but not too surprising. I'm ambivalent on the inheritance vs predicate debate. Interfaces are the way it is meant to be done and actually ensure correct types. Predicates work with structs as well. I don't know if this would be important.

Any overhaul of existing functionality needs to improve on existing functionality. Changes made just for the sake of change aren't valuable. So, changes should generally avoid breaking backwards compatibility unless we gain something from it. As long as these changes are an overall improvement, we'll just have to deal with the code breakage. However, if the code breakage doesn't actually gain us anything, then we should avoid it. So, complaints about code breakage are valid, but they aren't deal breaking.

- Jonathan M Davis
September 04, 2011
On 9/3/2011 3:53 PM, dsimcha wrote:
> Agreed, but in the big picture this overhaul still breaks way too much code
> without either a clear migration path or a clear argument about why such extensive
> breakage is necessary.  The part about File(someFileName, someMode) is just the
> first thing I noticed.

[rant]

I agree. I agree that std.stream should be replaced, but I have a lot of misgivings about replacing std.stdio. I do not want to rewrite every darn D program I've ever written. I think it is a bad idea to break everyone else's D program.

Everything in dsource will break in non-trivial ways. I don't think we can afford this. I do not know of any successful system or language that breaks user code with such aplomb as D does. Not even C++ dares to break that Piece Of S*** that everyone knows iostreams is. I can compile and run unix C code from 30 years ago on Linux with no changes at all. Same with DOS code.

There needs to be huge improvement to justify such breakage.

[I also don't like it that all my code that uses std.path is now broken.]

I would prefer to see all the energy that is going into refactoring existing, working modules go into designing new, not existing, modules that there's a crying need for.

[/rant]

Enough ranting for now, as for the proposed std.stdio,

1. It does look fairly straightforward, but:

2. There is only one example. Have any commonly done programming tasks been tried out with it to see how they work?

3. There is no indication of how it interacts with C stdio. A primary goal of std.stdio was interoperability with C stdio.

4. There are no benchmarks. The current std.stdio was designed/written in parallel with some benchmarks Andrei and others cooked up, as a primary goal was performance.

5. flushCheck - flushing should be done based on the file type. tty's should be \n flushed, files when the buffer is full. I question the performance of using a delegate to check for flushing. How often will it be called?

6. There is no provision for multithreaded writing, i.e. what happens when two threads write to stdout. Ideally, there should be a way to 'lock' the stream to oneself, in order to appropriately interleave the output.

7. I see nothing for 'raw' character by character input.

8. I see nothing for determining if a char is available on the input. How would one implement "press any key to continue"?
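
The flushing policy from point 5 (line-flush terminals, flush files only when the buffer fills) can be sketched against the current C-backed std.stdio. This is a POSIX-only illustration and not code from the proposal: `chooseBufferMode` is a hypothetical helper and the 4096-byte buffer size is arbitrary. The calls used (`isatty`, `File.fileno`, `File.setvbuf`) are existing druntime and Phobos functions.

```d
import core.stdc.stdio : _IOFBF, _IOLBF;
import core.sys.posix.unistd : isatty;
import std.stdio : stdout, writeln;

// Hypothetical helper: line buffering for terminals, full buffering otherwise.
int chooseBufferMode(bool isTerminal)
{
    return isTerminal ? _IOLBF : _IOFBF;
}

void main()
{
    // Decide once, by file type, before any output is written,
    // rather than asking a delegate on every write.
    immutable mode = chooseBufferMode(isatty(stdout.fileno) != 0);
    stdout.setvbuf(4096, mode);

    writeln("flushed at the newline on a tty, at buffer-full on a file");
}
```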

September 04, 2011
On 9/3/11 3:54 PM, Andrei Alexandrescu wrote:
> http://erdani.com/d/new-stdio/phobos-prerelease/std_stdio.html

Here are a few points following a pass through the dox:

* After thinking some more about it, I find the approach seek() plus enumerated Anchor undesirable. It's a bad case of logical coupling as one never calls seek() passing an anchor as a variable. It's really three functions - seekForward, seekBackward, and seekAbsolute. Heck, knowing what seek does, it should be just seekAbsolute. But then there are several possible designs; a logically coupled seek() is not a good turn in any case.

* Seekable should document that tell() is O(1) and seek() can be considered O(1) but with a large constant factor.

* Why is close() not part of Seekable, since Seekable seems to be the base of all streams?

* Class File is IMHO not going to cut the mustard. It needs to be a struct with a destructor. One should be able to _get_ an InputStream or an OutputStream interface out of a File object (i.e. a file is a factory of such interfaces), but the File itself must be a struct.

* I don't understand the difference between read() and readComplete().

* readUntil is a bit tenuous. I was hoping for a simpler interface to buffered streams, e.g. just expose the buffer as a ubyte[].

* readUntil(const(ubyte)[]) does not give a cheap means to figure whether the read ended because file ended or the terminator was met.

* There are several readUntil overloads but only one appendUntil. Why?

* Document the difference between skip and seek. Also, skip should take a ulong.

* I see encoder and decoder() in DInput, should both be decoder?

* StreamWidth, TextXXX and friends are a bit sudden because they introduce a higher-level abstraction in a module so far only preoccupied with transferring bytes. I was thinking that kind of stuff would belong to a formatter/serializer module.
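
As an illustration of the first point in the list above (separate seek functions instead of one seek() taking an Anchor), here is a minimal sketch over an in-memory position. `Cursor` and its clamping behaviour are assumptions made for the example; only the three function names come from the text.

```d
// Hypothetical in-memory cursor; a real device would forward to the OS.
struct Cursor
{
    ulong pos;     // current position, always <= length
    ulong length;  // size of the underlying data

    // Move to an absolute offset, clamped to the end of the data.
    ulong seekAbsolute(ulong offset)
    {
        pos = offset > length ? length : offset;
        return pos;
    }

    // Move forward by delta, clamped to the end (no overflow on pos + delta).
    ulong seekForward(ulong delta)
    {
        return seekAbsolute(delta > length - pos ? length : pos + delta);
    }

    // Move backward by delta, clamped to the start.
    ulong seekBackward(ulong delta)
    {
        return seekAbsolute(delta > pos ? 0 : pos - delta);
    }
}

void main()
{
    auto c = Cursor(0, 10);
    assert(c.seekForward(4) == 4);
    assert(c.seekBackward(1) == 3);
    assert(c.seekAbsolute(100) == 10); // clamped to length
}
```

Each call site names its intent directly, and the two relative functions reduce to seekAbsolute, which is the observation made above.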

Overall, there are interesting elements in this proposal but I don't quite feel it hit the proverbial nail on the head.


Andrei
September 04, 2011
On 9/3/2011 5:58 PM, Jonathan M Davis wrote:
> However, if the code breakage
> doesn't actually gain us anything, then we should avoid it. So, complaints
> about code breakage are valid, but they aren't deal breaking.

The larger the amount of code that is broken, the more gain there must be to justify it.

Breaking std.stdio this thoroughly, when it is used everywhere, has to clear a very high bar of justification.

September 04, 2011
On Sat, 03 Sep 2011 15:54:05 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Hello,
>
>
> There are a number of issues related to D's current handling of streams, including the existence of the imperfect etc.stream and the over-specialization of std.stdio.
>
> Steve has worked on an extensive overhaul of std.stdio which would obviate the need for etc.stream and would improve both the generality and efficiency of std.stdio.
>
> Please chime in with feedback; he's away from the Usenet but allowed me to post this on his behalf. I uploaded the docs to
>
> http://erdani.com/d/new-stdio/phobos-prerelease/std_stdio.html
>

Thank you Andrei for posting this.  Before I add some more details, let me first say, this is a very early version, but it does work (and spanks the pants off of the current stdio in the tests I've run).

I'll add several very important things:

1. At the moment, this is written for Linux *ONLY*.  I have very good experience with Windows i/o, and I am 100% certain I can implement this library for it.  However, it's not my main OS, so I wanted to first get something working with my main working environment.
2. This is *not* currently multithread aware.  But it will be.  However, I think one important aspect to consider is to make a *thread-local* aware i/o library to avoid unnecessary locking when an i/o connection is only used in one thread.  But please leave that part alone for now, I'm working on how to make the code reusable as shared types.  Actually, if anyone has good ideas on that, please share!
3. Although I am dead-set on getting *something* into Phobos, I am not attached at all to the symbol names, or even some major design choices.  I have seen so far it's one of the major concerns, and I think we can find good names.  The names I came up with are not exactly arbitrary, but they are somewhat based on earlier designs that I have since abandoned, so renaming is definitely in order.
4. You can get the full source here: https://github.com/schveiguy/phobos/tree/new-io  I used the 2.054 stock compiler, and a version of druntime that includes Lars' new std-process changes, also on my github account: https://github.com/schveiguy/druntime/tree/new-std-process  Please use those when trying out the code.

--------------------------

So let me tell you about the library design and why I did it the way I did it.  Then, I'll respond to individual concerns already posted.

The major problem I think the current std.stdio has is that its buffered solution is based on C's FILE * implementation.  Specifically, we have very little control over and access to the buffer implementation.  I think the key (or at least one of the keys) to uber-fast I/O is trying to copy as little as possible *needlessly*.  Seamless and safe buffer access I think is the key to this.  In addition to that, C's FILE * has several limitations:

1. On Windows, it's based on DMC's runtime, which limits you to 60 simultaneously open files (Windows OS limit is 10,000 I think)
2. 64-bit support is not standard in all C implementations (namely Windows)
3. All FILE * objects are inherently shared, meaning lock-free I/O is very cumbersome, especially considering we have D's shared/unshared system.
4. C supports UTF-8, and it's supposed to support UTF-16 (but I can't get UTF-16 to work).  I think D ought to support all forms of UTF, since UTF is an integral part of the language.

In addition to this, we have numerous D tools at our disposal -- delegates, closures, ranges, etc.  Limiting ourselves to C's interfaces means either duct-taping on those features, or abandoning them.  A prime example is the LockingFileReader range in std.stdio.  It is a noble effort, and probably the best we could get, but just reading it made me cringe.  Have a look: https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1282

I felt, we must be able to do something better.

So I started creating what I thought would be a good i/o library.  I did not start from the existing code, but just rewrote everything.  The basic concept is, we implement buffering once, and implement low-level devices that can be wrapped by the buffering implementation.  Almost everything that would use I/O wants to use a buffered version of it, so make the low-level aggregate minimal, and put all the useful functionality into the buffer.  I also wanted to make sure it is very easy to implement *efficient* ranges.

One design decision early on is that the device-level should be a class.  There are a few good reasons for this:

1. an I/O device is a reference-type.  Copying it does not open another handle.  So even if we *wanted* structs, they would be pImpl structs.
2. One simple idea that works very well at the OS level is the file descriptor concept.  The file descriptor provides an *interface* to user code for operating on a stream.  And they are easily inter-changeable.  This means a fd could be a network socket, a file, a pipe, a COM port, and the basic interface never changes.  So we should use that same concept -- define a simple interface for a low-level device, and then you can implement the buffer around that interface.  Since classes are the only types which support interfaces, I chose them.
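
The concept of one buffer implementation wrapped around a minimal device interface can be sketched as follows. This is not code from the linked branch: `ByteSource`, `ArraySource` and `BufferedSource` are hypothetical names, and the in-memory device merely stands in for a file descriptor so the example is self-contained.

```d
// Hypothetical low-level device interface: unbuffered, minimal.
interface ByteSource
{
    // Fills dst with up to dst.length bytes; returns 0 at end of input.
    size_t read(ubyte[] dst);
}

// Stand-in device backed by an array; counts how often it is called.
final class ArraySource : ByteSource
{
    private const(ubyte)[] data;
    size_t calls;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] dst)
    {
        ++calls;
        immutable n = dst.length < data.length ? dst.length : data.length;
        dst[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Buffering is implemented once and works with any ByteSource.
final class BufferedSource
{
    private ByteSource device;
    private ubyte[] buf;
    private size_t start, end;

    this(ByteSource device, size_t bufSize = 4096)
    {
        this.device = device;
        buf = new ubyte[bufSize];
    }

    // Returns false at end of input; refills the buffer only when it is empty.
    bool readByte(out ubyte b)
    {
        if (start == end)
        {
            end = device.read(buf);
            start = 0;
            if (end == 0)
                return false;
        }
        b = buf[start++];
        return true;
    }
}

void main()
{
    auto device = new ArraySource(cast(const(ubyte)[]) "hello world");
    auto input = new BufferedSource(device, 4);

    size_t count;
    ubyte b;
    while (input.readByte(b))
        ++count;

    assert(count == 11);
    assert(device.calls == 4); // three refills of up to 4 bytes, then one call reporting end of input
}
```

Swapping `ArraySource` for a socket or pipe device would leave `BufferedSource` untouched, which is the interchangeability argument made in point 2.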

Yes, I know classes suffer from the dreaded "I don't know when the GC is going to get around to closing this file" problem.  I think though, we have ways to mitigate that (I'll post some responses to points about that elsewhere in the thread).

One other important design decision I made was that the standard handles *must* be changeable at runtime to C-based i/o.  This was mainly to appease Walter, as he insists on having compatible I/O with C functions (such as printf).  I think he has a good point, but I think limiting this to basically the standard handles is the right level of compatibility.

After going through many iterations (you can look at the github history if you are interested), I settled on this basic tree.  Note that I'm very open to changing any parts of this, as long as the basic concept of a common buffer type surrounding a low-level device type is kept intact.

interface Seekable => an interface defining seek functions for a device.
interface InputStream : Seekable => an interface defining functions that can be called on an input device.  This is non-buffered.
interface OutputStream : Seekable => an interface defining functions that can be called on an output device.  Also non-buffered.

class File : InputStream, OutputStream => The implementation for the OS handle-based input output stream.  This is akin to a file descriptor.  (Note, I realize this is a poor name choice for this, it should probably be changed).

final class DInput => The buffered input stream.  This implements the buffer which surrounds an InputStream.
final class DOutput => The buffered output stream.  This implements the buffer which surrounds an OutputStream.
final class CStream => A Buffered Input and output stream based on C's FILE *.  This is used if you want to be compatible with C input or output, and is used in TextInput and TextOutput when using the C standard handles.

struct TextInput => A text-based input stream.  This implements UTF translation of all forms and handles formatted input.  Main member function is readf.
struct TextOutput => A text-based output stream. This implements UTF translation of all forms and handles formatted output.  Main member functions are the write* family.

It seems like a lot.  But keep in mind that almost everyone will only ever use DInput, DOutput, TextInput and TextOutput.  These replace the current std.stdio.File.  The low-level devices are for implementing low-level devices.  They are not really for being used, except to wrap in a buffer.  I expect that convenience functions will exist to create the correct buffered stream when given the right parameters.  The most obvious example is the function openFile (which is included).  The nice thing is, due to the auto return feature and templates, this takes care of some of the mess of having 4 main types to deal with.

I want to reiterate, I have created something that works, not something that is perfect.  I want everyone's input on how it should be changed -- including major design decisions.  I'm open to changing just about everything.  The *only* major concept I want to keep is the buffering surrounding a low-level device.

Thanks for taking the time to look at this.  I hope it will become good enough to be included in Phobos.  I plan to do everything I can to make it happen.

-Steve
September 04, 2011
Seems to me like virtually every module in Phobos gets a complete rewrite sooner or later. Yikes! Afaik the upcoming ones are also std.xml, std.variant, maybe std.json too? (can't recall). Was there really so much bad code written in Phobos all along that they all require a rewrite?
September 04, 2011
On Sat, 03 Sep 2011 17:20:53 -0400, dsimcha <dsimcha@yahoo.com> wrote:

> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>> Hello,
>> There are a number of issues related to D's current handling of streams,
>> including the existence of the imperfect etc.stream and the
>> over-specialization of std.stdio.
>> Steve has worked on an extensive overhaul of std.stdio which would
>> obviate the need for etc.stream and would improve both the generality
>> and efficiency of std.stdio.
>> Please chime in with feedback; he's away from the Usenet but allowed me
>> to post this on his behalf. I uploaded the docs to
>> http://erdani.com/d/new-stdio/phobos-prerelease/std_stdio.html
>> Thanks,
>> Andrei
>
> After a quick look, I have two concerns:
>
> 1.  File is a class, not a struct.  This precludes using reference counting as the
> current std.stdio.File does, meaning you have to close all your Files manually.  I
> loved the reference counting semantics, especially the last few releases since
> most of the relevant compiler bugs have been fixed.

As long as a class can contain a File as a member, this argument makes no sense to me.  In other words, it's impossible to remove the GC from the File destructor/refcounting system.

I think what may end up happening, in terms of File being a scoped entity is:

File becomes a struct.

File's sole member is a class that implements InputStream, OutputStream, and ref counting.  This would be roughly equivalent to today's File.  Except it's not buffered.  I think the names need work, and you are very right to point out that we should make existing code work as much as possible.
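
A rough sketch of that arrangement, a struct whose only member is a class that carries the reference count, might look like the code below. `Device` and `Handle` are hypothetical stand-ins for the real types, and closing is reduced to a flag so the example runs anywhere.

```d
// Hypothetical stand-in for the class that would implement
// InputStream/OutputStream and own the OS handle.
final class Device
{
    int refs = 1;
    bool open = true;

    void close() { open = false; }
}

// Struct wrapper giving scoped (RAII) semantics on top of the class.
struct Handle
{
    private Device impl;

    this(Device d) { impl = d; }

    // Copying the struct shares the device and bumps the count.
    this(this)
    {
        if (impl)
            ++impl.refs;
    }

    // The last copy to go out of scope closes the device.
    ~this()
    {
        if (impl && --impl.refs == 0)
            impl.close();
    }
}

void main()
{
    auto dev = new Device;
    {
        auto a = Handle(dev);
        {
            auto b = a;             // postblit: two owners
            assert(dev.refs == 2);
        }
        assert(dev.refs == 1 && dev.open);
    }
    assert(!dev.open);              // closed deterministically at scope exit
}
```

The destructor touches a GC-managed object, so this is only deterministic while the handle lives on the stack. A handle stored as a member of a GC-collected class runs into exactly the problem described above.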

>
> 2.  File(someFileName, someMode) needs to work.  Not supporting this method of
> instantiating a File object would break way too much code.

I can change File.open to File.opCall, that will fix that.

-Steve
September 04, 2011
I will come back with some more detailed feedback later on, but a few nits that caught my eye:

 - I don't think changing file from being a struct to a class is a good idea. First, it breaks an awful lot of D/Phobos programs already out there, both because of the struct->class change and because of the other API changes. Second, I feel we should really try to make use of RAII for things like file handles – I know we have »scope (exit) file.close()«, but forcing the user to remember to always type that needs a very good reason, imho. Couldn't File rather have some factory methods returning stream interface implementations?

 - CStream and DInput/Output? I don't care how it is implemented under the hood, give me something that works! ;) In this case, I guess CStream is somewhat appropriate, as C (FILE*) streams are widely known, but still I'm not too fond of the names.

 - bufsize -> bufSize?

 - Why on earth does DDoc render the enum default parameter as »(Anchor).Begin«? Is there a bug report for this?

 - I am sure there is a reason why the design uses decoder delegates, but without the source being available, I didn't find it immediately obvious where the advantages of using it over processing what is being read() from the stream are. Is this so data can be processed before going into the buffer? On a related note, what seems to be the decoder property getter is named »encoder()«.

David


On 9/3/11 9:54 PM, Andrei Alexandrescu wrote:
> Hello,
>
>
> There are a number of issues related to D's current handling of streams,
> including the existence of the imperfect etc.stream and the
> over-specialization of std.stdio.
>
> Steve has worked on an extensive overhaul of std.stdio which would
> obviate the need for etc.stream and would improve both the generality
> and efficiency of std.stdio.
>
> Please chime in with feedback; he's away from the Usenet but allowed me
> to post this on his behalf. I uploaded the docs to
>
> http://erdani.com/d/new-stdio/phobos-prerelease/std_stdio.html
>
>
> Thanks,
>
> Andrei
>