May 15, 2012
On Tuesday, 15 May 2012 at 02:56:20 UTC, Walter Bright wrote:
> On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
>> I keep trying to avoid talking about this, because I'm writing a replacement
>> library for std.stream, and I don't want to step on any toes while it's still
>> not accepted.
>>
>> But I have to say, ranges are *not* a good interface for generic data providers.
>> They are *very* good for structured data providers.
>>
>> In other words, a stream of bytes, not a good range (who wants to get one byte
>> at a time?). A stream of UTF text broken into lines, a very good range.
>>
>> [...]
>
> I'll say in advance without seeing your design that it'll be a tough sell if it is not range based.
>
> I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal.
>
> [...]

I have to say, I'm with Steve on this one.  While I do believe
ranges will have a very important role to play in D's future I/O
paradigm, I also think there needs to be a layer beneath the
ranges that more directly maps to OS primitives.  And as D is a
systems programming language, that layer needs to be publicly
available.  (Note that this is how std.stdio works now, more or
less.)

-Lars
May 15, 2012
On Tuesday, 15 May 2012 at 15:22:03 UTC, Lars T. Kyllingstad wrote:
> On Tuesday, 15 May 2012 at 02:56:20 UTC, Walter Bright wrote:
>> On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
>>> I keep trying to avoid talking about this, because I'm writing a replacement
>>> library for std.stream, and I don't want to step on any toes while it's still
>>> not accepted.
>>>
>>> But I have to say, ranges are *not* a good interface for generic data providers.
>>> They are *very* good for structured data providers.
>>>
>>> In other words, a stream of bytes, not a good range (who wants to get one byte
>>> at a time?). A stream of UTF text broken into lines, a very good range.
>>>
>>> [...]
>>
>> I'll say in advance without seeing your design that it'll be a tough sell if it is not range based.
>>
>> I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal.
>>
>> [...]
>
> I have to say, I'm with Steve on this one.  While I do believe
> ranges will have a very important role to play in D's future I/O
> paradigm, I also think there needs to be a layer beneath the
> ranges that more directly maps to OS primitives.  And as D is a
> systems programming language, that layer needs to be publicly
> available.  (Note that this is how std.stdio works now, more or
> less.)

Also, I wouldn't mind std.*stream getting deprecated.  Personally, I've never used those modules -- not even once.  As a first step their documentation could be removed from dlang.org, so new users aren't tempted to start using them.  No functionality is better than poor functionality, IMO.

-Lars

May 15, 2012
On Sunday, 13 May 2012 at 22:26:17 UTC, Walter Bright wrote:
> On 5/13/2012 3:16 PM, Nathan M. Swan wrote:
>> Trying to make it read lazily is even harder, as all std.utf functions work on
>> arrays, not ranges. I think this should change.
>
> Yes, std.utf should be upgraded to present range interfaces.

+1 on that.

I really needed it when doing the std.net.curl stuff and would be
happy to move it to a more generic handling in std.utf.


May 15, 2012
On Monday, 14 May 2012 at 15:02:11 UTC, Steven Schveighoffer wrote:
> In other words, a stream of bytes, not a good range (who wants to get one byte at a time?).  A stream of UTF text broken into lines, a very good range.

There are several cases where one would want one byte at a time; e.g. as an input to another range that produces the UTF text as an output.
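
As a rough illustration of that layering, here is a minimal sketch assuming a reasonably current Phobos, where std.utf.byDchar accepts any input range of code units (it did not at the time of this thread). The byte values are made up for the example; a real source would be a file or socket.

```d
import std.algorithm : equal, map;
import std.utf : byDchar;

void main()
{
    // Raw bytes as they might arrive from a file or socket: "héllo" in UTF-8.
    immutable(ubyte)[] bytes = [0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F];

    // A one-byte-at-a-time range feeding a decoding range.
    auto text = bytes.map!(b => cast(char) b).byDchar;

    // The two-byte sequence C3 A9 comes out as the single code point 'é'.
    assert(text.equal("héllo"d));
}
```

The decoder never sees more than one code unit per popFront, which is exactly the case where a byte-wise range is the natural input.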

I do agree that some data, e.g. binary data, can't be read with ranges (when you need to read small chunks of varying size), but that doesn't mean most things shouldn't be range-based.

NMS
May 16, 2012
On May 15, 2012, at 3:34 PM, "Nathan M. Swan" <nathanmswan@gmail.com> wrote:

> On Monday, 14 May 2012 at 15:02:11 UTC, Steven Schveighoffer wrote:
>> In other words, a stream of bytes, not a good range (who wants to get one byte at a time?).  A stream of UTF text broken into lines, a very good range.
> 
> There are several cases where one would want one byte at a time; e.g. as an input to another range that produces the UTF text as an output.
> 
> I do agree that some data, e.g. binary data, can't be read with ranges (when you need to read small chunks of varying size), but that doesn't mean most things shouldn't be range-based.

You really want both, depending on the situation. I don't see what's weird about this. C++ iostreams have input and output iterators built on top as well, for much the same reason. The annoying part is that once you've moved to a range interface it's hard to go back. Like say I want a ZipRange on top of a FileRange.  But now I want to read structs as binary blobs from that uncompressed output.

One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length?  Typically, you end up having to double buffer, which stinks.
May 16, 2012
On Tue, May 15, 2012 at 04:43:05PM -0700, Sean Kelly wrote: [...]
> One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length?  Typically, you end up having to double buffer, which stinks.

This would be very nice to have, but how would you go about implementing such a thing? Wouldn't you need OS-level support for it?


T

-- 
Let's eat some disquits while we format the biskettes.
May 16, 2012
On Tue, 15 May 2012 19:43:05 -0400, Sean Kelly <sean@invisibleduck.org> wrote:

> One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length?  Typically, you end up having to double buffer, which stinks.

My new design supports this.  I have a function called readUntil:

https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832

Essentially, it reads into its buffer until the condition is satisfied.  Therefore, you are not double buffering.  The return value is a slice of the buffer.

There is a way to opt-out of reading any data if you determine you cannot do a full read.  Just return 0 from the delegate.
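
To give a feel for the idea without pointing at the real code, here is a rough sketch. It is NOT the API from the linked branch: the type BufferedInput, the in-memory source, and the delegate protocol (size_t.max means "need more data", 0 means "give up and consume nothing", n means "consume n bytes") are all assumptions made up for this example.

```d
import std.algorithm : countUntil, min;

/// Hypothetical buffered reader, backed by memory instead of a file handle.
struct BufferedInput
{
    const(ubyte)[] source;  // stand-in for the OS-level stream
    size_t chunkSize = 4;   // bytes delivered by each simulated read
    ubyte[] buffer;         // data read from the source but not yet consumed

    /// Assumed protocol: process sees all buffered data and returns
    /// size_t.max to ask for more, 0 to opt out, or n to consume n bytes.
    const(ubyte)[] readUntil(scope size_t delegate(const(ubyte)[] data) process)
    {
        for (;;)
        {
            immutable n = process(buffer);
            if (n == 0)
                return null;              // opt out: buffer left untouched
            if (n != size_t.max)
            {
                assert(n <= buffer.length);
                auto result = buffer[0 .. n];
                buffer = buffer[n .. $];  // consume only what was accepted
                return result;            // a slice of the buffer, no second copy
            }
            if (source.length == 0)
                return null;              // source exhausted, nothing consumed
            immutable take = min(chunkSize, source.length);
            buffer ~= source[0 .. take];  // simulated read into the buffer
            source = source[take .. $];
        }
    }
}

void main()
{
    auto input = BufferedInput(cast(const(ubyte)[]) "ab\ncd");

    // Accept one full line including the newline, or ask for more data.
    size_t fullLine(const(ubyte)[] data)
    {
        immutable i = data.countUntil('\n');
        return i < 0 ? size_t.max : cast(size_t) i + 1;
    }

    assert(input.readUntil(&fullLine) == cast(const(ubyte)[]) "ab\n");
    // "cd" never gets a newline, so the read fails and the data stays buffered.
    assert(input.readUntil(&fullLine) is null);
    assert(input.buffer == cast(const(ubyte)[]) "cd");
}
```

The point of the sketch is only that a failed read leaves the unconsumed bytes in the one buffer, so a later call can retry without any double buffering.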

-Steve
May 16, 2012
On Mon, 14 May 2012 22:56:08 -0400, Walter Bright <newshound2@digitalmars.com> wrote:

> On 5/14/2012 8:02 AM, Steven Schveighoffer wrote:
>> I keep trying to avoid talking about this, because I'm writing a replacement
>> library for std.stream, and I don't want to step on any toes while it's still
>> not accepted.
>>
>> But I have to say, ranges are *not* a good interface for generic data providers.
>> They are *very* good for structured data providers.
>>
>> In other words, a stream of bytes, not a good range (who wants to get one byte
>> at a time?). A stream of UTF text broken into lines, a very good range.
>>
>> I have no problem with getting rid of std.stream. I've never actually used it.
>> Still, we absolutely need a non-range based low-level streaming interface to
>> data. If nothing else, we need something we can build ranges upon, and I think
>> my replacement does a very good job of that.
>
> I'll say in advance without seeing your design that it'll be a tough sell if it is not range based.
>
> I've been doing some range based work on the side. I'm convinced there is enormous potential there, despite numerous shortcomings with them I ran across in Phobos. Those shortcomings can be fixed, they are not fatal.
>
> The ability to do things like:
>
>   void main() {
>    stdin.byChunk(1024).
>       map!(a => a.idup). // one of those shortcomings
>       joiner().
>       stripComments().
>       copy(stdout.lockingTextWriter());
>   }

I think we may have a misunderstanding.  My design is not range-based, but supports ranges, and actually makes them very easy to implement.

byChunk is a perfect example of a good range -- it defines a specific criterion for determining an "element" of data, appropriate for specific situations.

But it must be built on top of something that allows reading arbitrary amounts of data.  At the lowest level, this is the OS file descriptor/HANDLE.

To be efficient, it should be based on a buffering stream.  That buffering stream *does not* need to be a range, and I don't think shoehorning such a construct into a range interface makes any sense.
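
A minimal sketch of that layering, under stated assumptions: RawStream and ByChunk below are hypothetical names invented for the example (neither is std.stdio's byChunk nor part of my library), and the stream is backed by memory rather than a file descriptor.

```d
import std.algorithm : min;
import std.range : isInputRange;

/// Hypothetical low-level stream: read() fills a caller-supplied buffer,
/// much like an OS read call, and returns the number of bytes delivered.
struct RawStream
{
    const(ubyte)[] data;

    size_t read(ubyte[] dst)
    {
        immutable n = min(dst.length, data.length);
        dst[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

/// An input range of chunks layered on top of the stream.
struct ByChunk
{
    RawStream* stream;
    ubyte[] buf;    // reused for every chunk
    ubyte[] chunk;  // the valid part of buf for the current element

    this(RawStream* s, size_t size)
    {
        stream = s;
        buf = new ubyte[size];
        popFront(); // prime the first chunk
    }

    @property bool empty() const { return chunk.length == 0; }
    @property ubyte[] front() { return chunk; }
    void popFront() { chunk = buf[0 .. stream.read(buf)]; }
}

static assert(isInputRange!ByChunk);

void main()
{
    const(ubyte)[] bytes = [1, 2, 3, 4, 5];
    auto stream = RawStream(bytes);

    size_t[] sizes;
    foreach (chunk; ByChunk(&stream, 2))
        sizes ~= chunk.length;

    assert(sizes == [2, 2, 1]);
}
```

The stream itself has no front/popFront; the range is a thin adapter that picks one definition of "element" on top of it.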

To make this clear, I can say that any range File supports, my design will support *as a range*.

To make it even clearer, the current std.stdio.File structure, which you have shown to "kick ass" with ranges, is *NOT* range-based by my definition.

I should note, the output range idiom is directly supported, because the output range definition exactly maps to an output stream definition.
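
For instance, a type with a single put method already satisfies the output range definition. In this sketch MemoryStream is a hypothetical stand-in for a real output stream; an actual one would hand the chunk to the OS instead of appending to an array.

```d
import std.range : isOutputRange, put;

/// Hypothetical in-memory output stream.
struct MemoryStream
{
    char[] data;

    // The one operation an output stream needs: write a chunk.
    void put(const(char)[] chunk) { data ~= chunk; }
}

// That single method is enough for Phobos to treat it as an output range.
static assert(isOutputRange!(MemoryStream, const(char)[]));

void main()
{
    MemoryStream sink;
    put(sink, "hello, ");  // std.range.put forwards to sink.put
    put(sink, "world");
    assert(sink.data == "hello, world");
}
```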

-Steve
May 16, 2012
On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
> I do agree that some data, e.g. binary data, can't be read with ranges (when
> you need to read small chunks of varying size),

I don't see why that should be true.
May 16, 2012
On 5/15/2012 4:43 PM, Sean Kelly wrote:
> One thing I'd like in a buffered input API is a way to perform transactional
> reads such that if the full read can't be performed, the read state remains
> unchanged. The best you can do with most APIs is to check for a desired
> length, but what if I don't want to read until a full line is available, and
> I don't know the exact length?  Typically, you end up having to double
> buffer, which stinks.

std.stdio.byLine()
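
For reference, a small usage sketch of File.byLine. The temporary file name is made up for the example. Note that this only shows line-at-a-time reading; whether it covers the "leave the read state unchanged" part of the request is a separate question.

```d
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File;

void main()
{
    // Hypothetical scratch file so the example is self-contained.
    auto path = buildPath(tempDir, "byline_demo.txt");
    write(path, "first\nsecond\n");
    scope (exit) remove(path);

    string[] lines;
    foreach (line; File(path).byLine)
        lines ~= line.idup;  // byLine reuses its buffer, so copy each line

    assert(lines == ["first", "second"]);
}
```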