March 09, 2013
On Fri, 08 Mar 2013 20:59:33 -0500, Stewart Gordon <smjg_1998@yahoo.com> wrote:

> On 07/03/2013 12:07, Steven Schveighoffer wrote:
> <snip>
>> I don't really understand the need to make ranges into streams.
> <snip>
>
> Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream.

I hope to convince Walter of the error of his ways :)

The problem with this idea is that there isn't a proven design.  All designs I've seen that involve ranges don't look attractive, and end up looking like streams with an awkward range API tacked on.  I could be wrong; there could be that really great range API that nobody has suggested yet.  But from what I can tell, the desire to have ranges be streams is based on the thought: we have all these methods that work with ranges, so wouldn't it be cool if you could use them with streams too?

> Thinking about it now, a range-based interface might be good for reading files of certain kinds, but isn't suited to general file I/O.

I think a range interface works great as a high level mechanism.  Like a range for xml parsing, front could be the current element, popFront could give you the next, etc.  I think with the design I have, it can be done with minimal buffering, and without double-buffering.

But I see no need to use a range to feed the range data from a file.
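
For what it's worth, the high-level shape described above can be sketched with an ordinary input range. Everything here is hypothetical illustration (the name ElementRange, the choice of "lines" as elements): it pulls from an in-memory array where a real implementation would pull from a buffered stream.

```d
import std.string : strip;

// Hypothetical illustration: a range of "elements" (here, trimmed lines)
// produced lazily; front is the current element, popFront parses the next.
// A real version would decode its elements out of a buffered stream.
struct ElementRange
{
    string[] source; // stand-in for the underlying buffered stream
    string current;
    bool done;

    this(string[] src) { source = src; advance(); }

    @property string front() { return current; }
    @property bool empty() { return done; }
    void popFront() { advance(); }

private:
    void advance()
    {
        if (source.length == 0) { done = true; return; }
        current = source[0].strip();
        source = source[1 .. $];
    }
}
```

The point of the sketch is that the range API sits at the element level; nothing forces the byte-level source underneath it to also be a range.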

-Steve
March 09, 2013
On Saturday, March 09, 2013 01:59:33 Stewart Gordon wrote:
> On 07/03/2013 12:07, Steven Schveighoffer wrote:
> <snip>
> 
> > I don't really understand the need to make ranges into streams.
> 
> <snip>
> 
> Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream.
> 
> Thinking about it now, a range-based interface might be good for reading files of certain kinds, but isn't suited to general file I/O.

In general, ranges should work just fine for I/O as long as they have an efficient implementation which buffers underneath (and preferably makes them forward ranges). Aside from how it's implemented internally, there's no real difference between operating on a range over a file and any other range. The trick is making it efficient internally. Doing something like reading a character at a time from a file every time that popFront is called would be horrible, but with buffering, it should be just fine. Now, you're not going to get a random-access range that way, but it should work fine as a forward range, and std.mmfile will probably give you what you want if an RA range is what you really need (and that, we have already).
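
Phobos already hints at this pattern today: File.byChunk does one OS read per chunk, and std.algorithm's joiner flattens the chunks into a per-byte range, so most popFront calls are just an index bump inside the current chunk. A small sketch (byteRange is a hypothetical helper name, not an existing API):

```d
import std.algorithm : joiner;
import std.stdio : File;

// Presents a file as a lazy range of bytes: one OS read per chunkSize
// bytes, with per-byte popFront amortized to an index bump.
auto byteRange(string path, size_t chunkSize = 4096)
{
    return File(path).byChunk(chunkSize).joiner;
}
```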

- Jonathan M Davis
March 09, 2013
On Fri, Mar 08, 2013 at 09:30:30PM -0500, Jonathan M Davis wrote:
> On Saturday, March 09, 2013 01:59:33 Stewart Gordon wrote:
> > On 07/03/2013 12:07, Steven Schveighoffer wrote:
> > <snip>
> > 
> > > I don't really understand the need to make ranges into streams.
> > 
> > <snip>
> > 
> > Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream.
> > 
> > Thinking about it now, a range-based interface might be good for reading files of certain kinds, but isn't suited to general file I/O.
> 
> In general, ranges should work just fine for I/O as long as they have an efficient implementation which buffers underneath (and preferably makes them forward ranges). Aside from how it's implemented internally, there's no real difference between operating on a range over a file and any other range. The trick is making it efficient internally. Doing something like reading a character at a time from a file every time that popFront is called would be horrible, but with buffering, it should be just fine. Now, you're not going to get a random-access range that way, but it should work fine as a forward range, and std.mmfile will probably give you what you want if an RA range is what you really need (and that, we have already).
[...]

I think the new std.stream should have a low-level stream API based on reading & simultaneously advancing by n bytes. This is still the most efficient approach for low-level file I/O.

On top of this core, we can provide range-based APIs which are backed by buffers implemented using the stream API. Conceptually, it could be something like this:

	module std.stream;

	struct FileStream {
		File _impl;
		...

		// Low-level stream API: reads up to n elements into buffer,
		// shrinking buffer to the number actually read
		void read(T)(ref T[] buffer, size_t n);
		bool eof();
	}

	struct BufferedStream(T, SrcStream) {
		SrcStream impl;
		T[]    buffer;
		size_t readPos;

		enum BufSize = ...; // some suitable value

		this(SrcStream src) {
			impl = src;
			buffer.length = BufSize;
			impl.read(buffer, BufSize); // prime the buffer
		}

		// Range API
		T front() { return buffer[readPos]; }
		bool empty() {
			return impl.eof && readPos >= buffer.length;
		}
		void popFront() {
			if (++readPos >= buffer.length && !impl.eof) {
				// Load next chunk of file into buffer
				buffer.length = BufSize;
				impl.read(buffer, BufSize);
				readPos = 0;
			}
		}
	}

Suitable adaptor functions/structs/etc. can be used for automatically converting between streams and range APIs via BufferedStream, etc.

As for making ranges into streams: it could be useful for transparently substituting, say, a string buffer for file input in generic code that operates on streams. I'm not sure if ranges are the right thing to use here, though; if all you have is an input stream, then generic code that uses BufferedStream on top of that would be horribly inefficient. It may make more sense to require an array.

Another approach could be to extend the idea of a range, to have, for lack of a better term, a StreamRange or something of the sort, that provides a read() method (or maybe more suitably named, like copyFrontN() or something along those lines) that is equivalent to copying .front and calling popFront n times. But we already have trouble taming the current variety of ranges, so I'm not sure if this is a good idea or not.  Jonathan probably will hate the idea of introducing yet another range type to the mix. :)
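
To make the idea concrete, here is one possible shape for such a primitive, with copyFrontN as a purely hypothetical name (as above) and an array standing in for the stream source:

```d
// Hypothetical "stream range" primitive: copyFrontN(buf) is semantically
// equivalent to copying .front and calling popFront repeatedly, but can
// be implemented as a single block copy out of the internal buffer.
struct ArrayStreamRange(T)
{
    T[] data;

    @property bool empty() { return data.length == 0; }
    @property T front() { return data[0]; }
    void popFront() { data = data[1 .. $]; }

    // Copies up to buf.length elements into buf, advancing the range;
    // returns the number of elements actually copied.
    size_t copyFrontN(T[] buf)
    {
        import std.algorithm : min;
        auto n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}
```

A generic reader could then test for copyFrontN at compile time and fall back to the front/popFront loop when it's absent.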


T

-- 
"How are you doing?" "Doing what?"
March 09, 2013
> One thing to remember is that streams need to be runtime swappable.  For instance, I should be able to replace stdout with a stream of my choice.

That does make my solution a little tougher to implement. Hmmm...

It looks like a monolithic type is the easiest solution, but it definitely should have range support somewhere. Since that's already planned (at least as I understand it), I guess I don't really have any complaints about it.

Now, I wouldn't mind if you made the default source a "block-input range", since it could have very similar performance characteristics to an integrated source and would provide a useful range for other stuff, but an integrated source would be manageable and probably just a hair faster.


March 09, 2013
On 09/03/2013 02:30, Jonathan M Davis wrote:
<snip>
> In general, ranges should work just fine for I/O as long as they have an
> efficient implementation which buffers underneath (and preferably makes them
> forward ranges). Aside from how it's implemented internally, there's no real
> difference between operating on a range over a file and any other range. The
> trick is making it efficient internally. Doing something like reading a
> character at a time from a file every time that popFront is called would be
> horrible, but with buffering, it should be just fine.

If examining one byte at a time is what you want.  I mean this at the program logic level, not just the implementation level.  The fact remains that most applications want to look at bigger portions of the file at a time.

    ubyte[] data;
    data.length = 100;
    foreach (ref b; data) { b = file.front; file.popFront(); }

Even with buffering, a block memory copy is likely to be more efficient than transferring each byte individually.
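
For comparison, Phobos's File.rawRead already expresses the block-copy version of that loop (readBlock here is a hypothetical wrapper, but rawRead itself is real API):

```d
import std.stdio : File;

// Reads up to 100 bytes in one block read; rawRead returns the slice of
// `data` actually filled, which is shorter than 100 only at end-of-file.
ubyte[] readBlock(string path)
{
    auto data = new ubyte[](100);
    return File(path).rawRead(data);
}
```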

You could provide direct memory access to the buffer, but this creates further complications if you want to read variable-size chunks.  Further variables that affect the best way to do it include whether you want to keep hold of previously read chunks and whether you want to perform in-place modifications of the read-in data.

> Now, you're not going to
> get a random-access range that way, but it should work fine as a forward range,
> and std.mmfile will probably give you want you want if an RA range is what you
> really need (and that, we have already).

Yes, random-access file I/O is another thing.  I was thinking primarily of cases where you want to just read the file through and process it while doing so.  I imagine that most word processors, graphics editors, etc. will read the file and then generate the file afresh when you save, rather than just writing the changes to the file.

And then there are web browsers, which read files of various types both from the user's local file storage and over an HTTP connection.

Stewart.
March 10, 2013
Am Sat, 09 Mar 2013 16:30:24 +0000
schrieb Stewart Gordon <smjg_1998@yahoo.com>:

> Yes, random-access file I/O is another thing.  I was thinking primarily of cases where you want to just read the file through and process it while doing so.  I imagine that most word processors, graphics editors, etc. will read the file and then generate the file afresh when you save, rather than just writing the changes to the file.
> 
> And then there are web browsers, which read files of various types both from the user's local file storage and over an HTTP connection.
> 
> Stewart.

For most binary formats you need to deal with endianness for short/int/long, and with blocks of either fixed size, or with two versions (e.g. a revised extended bitmap header), or altogether dynamic size. Some formats may also require reading the last bytes first, like ID3 tags in MP3s. And then there are compressed formats with data types of < 8 bits or dynamic bit allocations.

It's all obvious, but I had a feeling your use cases are too restricted. Anyway, I no longer know what the distinction between std.io and std.streams will be.

-- 
Marco

March 10, 2013
On 10/03/2013 15:48, Marco Leise wrote:
<snip>
> For most binary formats you need to deal with endianness for
> short/int/long

Endian conversion is really part of decoding the data, rather than of reading the file.  As such, it should be a layer over the raw file I/O API/implementation.

And probably as often as not, you want to read in or write out a struct that includes some multi-byte numerical values, e.g. an image file header which has width, height, colour type, bit depth, possibly a few other parameters such as compression or interlacing, and not all of which will be integers of the same size.  ISTM the most efficient way to do this is to read the block of bytes from the file, and then do the byte-order conversions in the file-format-specific code.
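
std.bitmanip already provides that conversion layer. A sketch of the "read a block, decode in format-specific code" approach, where the 9-byte header layout (big-endian width and height, then a bit-depth byte) is invented for illustration:

```d
import std.bitmanip : bigEndianToNative;

// Hypothetical 9-byte image header: big-endian uint width and height,
// followed by a bit-depth byte.
struct Header { uint width; uint height; ubyte bitDepth; }

// Decodes from a block of bytes already read from the file; byte-order
// handling lives here, in format-specific code, not in the stream layer.
Header decodeHeader(const(ubyte)[] raw)
{
    Header h;
    ubyte[4] tmp;
    tmp = raw[0 .. 4];
    h.width = bigEndianToNative!uint(tmp);
    tmp = raw[4 .. 8];
    h.height = bigEndianToNative!uint(tmp);
    h.bitDepth = raw[8];
    return h;
}
```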

> and blocks of either fixed size or with two versions (e.g. a revised
> extended bitmap header) or altogether dynamic size.

Yes, that's exactly why we have in std.stream a method that reads a number of bytes specified at runtime, and why it is a fundamental part of any stream API that is designed to work on binary files.

> Some formats may also require reading the
> last bytes first, like ID3 tags in MP3s.

Do you mean ID3 data is stored backwards in MP3 files?  Still, that's half the reason that file streams tend to be seekable.

> And then there are compressed formats with data types of < 8 bits or
> dynamic bit allocations.

But:
- it's a very specialised application
- I would expect most compressed file formats to still have byte-level structure
- implementing this would be complicated given bit-order considerations and the way that the OS (and possibly even the hardware) manipulates files

As such, this should be implemented as a layer over the raw stream API.

> It's all obvious, but I had a feeling your use cases are too
> restricted.
<snip>

The cases I've covered are the cases that seem to me to be what should be covered by a general-purpose stream API.

Stewart.
July 05, 2013
On Saturday, 9 March 2013 at 02:13:36 UTC, Steven Schveighoffer wrote:
> On Fri, 08 Mar 2013 20:59:33 -0500, Stewart Gordon <smjg_1998@yahoo.com> wrote:
>
>> On 07/03/2013 12:07, Steven Schveighoffer wrote:
>> <snip>
>>> I don't really understand the need to make ranges into streams.
>> <snip>
>>
>> Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream.
>
> I hope to convince Walter of the error of his ways :)
>
> The problem with this idea is that there isn't a proven design.  All designs I've seen that involve ranges don't look attractive, and end up looking like streams with an awkward range API tacked on.  I could be wrong; there could be that really great range API that nobody has suggested yet.  But from what I can tell, the desire to have ranges be streams is based on the thought: we have all these methods that work with ranges, so wouldn't it be cool if you could use them with streams too?
>
>> Thinking about it now, a range-based interface might be good for reading files of certain kinds, but isn't suited to general file I/O.
>
> I think a range interface works great as a high level mechanism.  Like a range for xml parsing, front could be the current element, popFront could give you the next, etc.  I think with the design I have, it can be done with minimal buffering, and without double-buffering.
>
> But I see no need to use a range to feed the range data from a file.
>
> -Steve

I agree with this 100%, but I obviously am not the one making the decision.

My point in resurrecting this thread is that I'd like to start working on a few D libraries that will rely on streams, but I've been trying to hold off until this gets done. I'm sure there are plenty of others that would like to see streams get finished.

Do you have an ETA for when you'll have something for review? If not, do you have the code posted somewhere so others can help?

The projects I'm interested in working on are:

- HTTP library (probably end up pulling out some vibe.d stuff)
- SSH library (client/server)
- rsync library (built on SSH library)

You've probably already thought about this, but it would be really nice to either unread bytes or have some efficient way to get bytes without consuming them. This would help with writing an "until" function (read until either a new-line or N bytes have been read) when the exact number of bytes to read isn't known.
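
One possible shape for that "until" primitive, sketched over an in-memory buffer (BufferedReader and readUntil are hypothetical names; a real version would refill its buffer from the underlying stream rather than hold the whole input):

```d
// Hypothetical sketch: readUntil returns bytes up to and including the
// delimiter, or up to maxBytes if no delimiter is found first, without
// consuming anything past the match.
struct BufferedReader
{
    const(ubyte)[] buf; // stand-in for the stream's internal buffer
    size_t pos;

    const(ubyte)[] readUntil(ubyte delim, size_t maxBytes)
    {
        auto start = pos;
        auto limit = pos + maxBytes;
        if (limit > buf.length) limit = buf.length;
        while (pos < limit && buf[pos] != delim)
            ++pos;
        if (pos < limit)
            ++pos; // consume the delimiter itself
        return buf[start .. pos];
    }
}
```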

I'd love to help in testing things out. I'm okay with building against alpha-quality code, and I'm sure you'd like to get some feedback on the design as well.

Let me know if there's any way that I can help. I'm very interested in seeing this get finished sooner rather than later.
July 05, 2013
I think you can have both. You can have very convenient and general abstractions like ranges which perform very well too. In addition, you can provide all of the usual range features to make them compatible with generic algorithms, and a few extra methods for extra features, like changing the block size.
December 14, 2013
On Thu, 04 Jul 2013 22:53:46 -0400, Tyler Jameson Little <beatgammit@gmail.com> wrote:

> On Saturday, 9 March 2013 at 02:13:36 UTC, Steven Schveighoffer wrote:
>>
>> I think a range interface works great as a high level mechanism.  Like a range for xml parsing, front could be the current element, popFront could give you the next, etc.  I think with the design I have, it can be done with minimal buffering, and without double-buffering.
>>
>> But I see no need to use a range to feed the range data from a file.
>>
>> -Steve
>
> I agree with this 100%, but I obviously am not the one making the decision.
>
> My point in resurrecting this thread is that I'd like to start working on a few D libraries that will rely on streams, but I've been trying to hold off until this gets done. I'm sure there are plenty of others that would like to see streams get finished.
>
> Do you have an ETA for when you'll have something for review? If not, do you have the code posted somewhere so others can help?

I realize this is really old, and I sort of dropped off the D cliff because all of a sudden I had 0 extra time.

But I am going to get back into working on this (if it's still an issue, I still need to peruse the NG completely to see what has happened in the last few months). I have something that is really old but was working. At this point, I wouldn't recommend reading the code, just the design, but it's in my github account here:

https://github.com/schveiguy/phobos/tree/new-io2

Wow, it's 2 years old. Time flies.

> The projects I'm interested in working on are:
>
> - HTTP library (probably end up pulling out some vibe.d stuff)
> - SSH library (client/server)
> - rsync library (built on SSH library)
>
> You've probably already thought about this, but it would be really nice to either unread bytes or have some efficient way to get bytes without consuming them. This would help with writing an "until" function (read until either a new-line or N bytes have been read) when the exact number of bytes to read isn't known.

Yes, this is part of the design.

> I'd love to help in testing things out. I'm okay with building against alpha-quality code, and I'm sure you'd like to get some feedback on the design as well.

At this point, the design is roughly done, and the code was working, but 2 years ago :) The new-io2 branch probably doesn't work. The new-io branch should work, but I had to rip apart the design due to objections of how I designed it. The guts will be the same though.

> Let me know if there's any way that I can help. I'm very interested in seeing this get finished sooner rather than later.

At this point, maybe you have lost interest. But if not, I wouldn't mind having help on it. Send me an email if you still are.

-Steve