May 16, 2012
"Steven Schveighoffer" , dans le message (digitalmars.D:167548), a
> My new design supports this.  I have a function called readUntil:
> 
> https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832
> 
> Essentially, it reads into its buffer until the condition is satisfied. Therefore, you are not double buffering.  The return value is a slice of the buffer.
> 
> There is a way to opt-out of reading any data if you determine you cannot do a full read.  Just return 0 from the delegate.

Maybe I already said this some time ago, but I am not very comfortable with this design. The process delegate has to maintain internal state if you want to avoid reading everything again. Such process delegates will be difficult to implement. Do you have an example of a moderately complicated reading process to show us it is not too complicated?

To avoid this issue, the design could be reversed: a method that wants to read a certain amount of characters could take a delegate from the stream, which provides additional bytes of data.

Example:
// create a T by reading from stream. returns true if the T was
// successfully created, and false otherwise.
bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out T t);

The stream delegate returns a buffer of data to read from when called with consumed == 0. It must return additional data when called repeatedly. When it is called with consumed != 0, the corresponding amount of consumed bytes can be discarded from the buffer.

This "stream" delegate (if should have a better name) should not be more difficult to implement than readUntil, but makes it more easy to use by the client. Did I miss some important information ?

-- 
Christophe
May 16, 2012
On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2@digitalmars.com> wrote:

> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>> I do agree that, e.g., with binary data some data can't be read with ranges (when
>> you need to read small chunks of varying size),
>
> I don't see why that should be true.

How do you tell front and popFront how many bytes to read?

-Steve
May 16, 2012
On May 16, 2012, at 6:52 AM, Walter Bright <newshound2@digitalmars.com> wrote:

> On 5/15/2012 4:43 PM, Sean Kelly wrote:
>> One thing I'd like in a buffered input API is a way to perform transactional reads such that if the full read can't be performed, the read state remains unchanged. The best you can do with most APIs is to check for a desired length, but what if I don't want to read until a full line is available, and I don't know the exact length?  Typically, you end up having to double buffer, which stinks.
> 
> std.stdio.byLine()

That was just an example. What if I want to do a formatted read and I'm reading from a file that someone else is writing to?  I don't want to block or get a partial result and an EOF that needs to be reset.
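The transactional behaviour described above can be sketched as follows (Python; the one-byte-length record format and the try_read_record name are hypothetical, chosen only to illustrate "all or nothing" reads):

```python
def try_read_record(buf):
    """Transactional read sketch: a record is a 1-byte length header
    followed by that many payload bytes.  If the complete record is
    not yet buffered, nothing is consumed and the state is unchanged."""
    if len(buf) < 1:
        return None, buf                  # header not available yet
    n = buf[0]                            # payload length from the header
    if len(buf) < 1 + n:
        return None, buf                  # partial record: no state change
    return buf[1:1 + n], buf[1 + n:]      # (record, remaining buffer)
```

A caller can retry with the same buffer later, once the writer has appended more data, without any rewinding or double buffering.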
May 16, 2012
On Wed, 16 May 2012 10:03:42 -0400, Christophe Travert <travert@phare.normalesup.org> wrote:

> "Steven Schveighoffer" , dans le message (digitalmars.D:167548), a
>> My new design supports this.  I have a function called readUntil:
>>
>> https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L832
>>
>> Essentially, it reads into its buffer until the condition is satisfied.
>> Therefore, you are not double buffering.  The return value is a slice of
>> the buffer.
>>
>> There is a way to opt-out of reading any data if you determine you cannot
>> do a full read.  Just return 0 from the delegate.
>
> Maybe I already said this some time ago, but I am not very comfortable
> with this design. The process delegate has to maintain internal
> state if you want to avoid reading everything again. Such process
> delegates will be difficult to implement.

The delegate is told which portion has already been "processed" via the 'start' parameter.  If you can use this information, it's highly useful.

If you need more context, yes, you have to store it elsewhere, but you do have a delegate which contains a context pointer.  In a few places (take a look at TextStream's readln https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L2149) I use inner functions that have access to the function call's frame pointer in order to configure or store data.

> Do you have an example
> of moderately complicated reading process to show us it is not too
> complicated?

The most complicated I have so far is reading UTF data as a range of dchar:

https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L2209

Note that I hand-inlined all the decoding because using std.utf or the runtime was too slow, so although it looks huge, it's pretty basic stuff and can largely be ignored for the purposes of this discussion.  The interesting part is how it specifies what to consume and what not to.

I realize it's a different way of thinking about how to do I/O, but it gives more control to the buffer, so it can reason about how best to buffer things.  I look at it as the buffered stream saying "I'll read some data, you tell me when you see something interesting, and I'll give you a slice to it".  The alternative is to double-buffer your data, since each call to read can invalidate the previously buffered data.  But readUntil guarantees the data is contiguous and consumed all at once, so there is no need to double-buffer.

>
> To avoid this issue, the design could be reversed: a method that wants
> to read a certain amount of characters could take a delegate from
> the stream, which provides additional bytes of data.
>
> Example:
> // create a T by reading from stream. returns true if the T was
> // successfully created, and false otherwise.
> bool readFrom(const(ubyte)[] delegate(size_t consumed) stream, out T t);
>
> The stream delegate returns a buffer of data to read from when called
> with consumed == 0. It must return additional data when called
> repeatedly. When it is called with consumed != 0, the corresponding
> amount of consumed bytes can be discarded from the buffer.

I can see use cases for both your method and mine.

I think I can implement your idea in terms of mine.  I might just do that.  The only thing missing is a way to tell the delegate that it needs more data, probably by using size_t.max as an argument.

In fact, I need a peek function anyway, and your function will provide that ability as well.

-Steve
May 16, 2012
On 16/05/2012 15:38, Steven Schveighoffer wrote:
> On Wed, 16 May 2012 09:50:12 -0400, Walter Bright
> <newshound2@digitalmars.com> wrote:
>
>> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>>> I do agree that, e.g., with binary data some data can't be read with
>>> ranges (when
>>> you need to read small chunks of varying size),
>>
>> I don't see why that should be true.
>
> How do you tell front and popFront how many bytes to read?
>
> -Steve

A bit ugly but:
----
// Default to 4 byte chunks
auto range = myStream.byChunks(4);
foreach (chunk; range) {
   // The next chunk will be 3 bytes; after that
   // the size reverts to the default
   range.nextChunkSize = 3;

   // Change the default: every subsequent chunk is 5 bytes
   range.chunkSize = 5;
}
----

-- 
Robert
http://octarineparrot.com/
May 16, 2012
On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham <robert@octarineparrot.com> wrote:

> On 16/05/2012 15:38, Steven Schveighoffer wrote:
>> On Wed, 16 May 2012 09:50:12 -0400, Walter Bright
>> <newshound2@digitalmars.com> wrote:
>>
>>> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>>>> I do agree that, e.g., with binary data some data can't be read with
>>>> ranges (when
>>>> you need to read small chunks of varying size),
>>>
>>> I don't see why that should be true.
>>
>> How do you tell front and popFront how many bytes to read?
>>
>> -Steve
>
> A bit ugly but:
> ----
> // Default to 4 byte chunks
> auto range = myStream.byChunks(4);
> foreach (chunk; range) {
>     // The next chunk will be 3 bytes; after that
>     // the size reverts to the default
>     range.nextChunkSize = 3;
>
>     // Change the default: every subsequent chunk is 5 bytes
>     range.chunkSize = 5;
> }

Yeah, I've seen this before.  It's not convincing.

-Steve
May 16, 2012
On 16.05.2012 19:32, Steven Schveighoffer wrote:
> On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham
> <robert@octarineparrot.com> wrote:
>
>> On 16/05/2012 15:38, Steven Schveighoffer wrote:
>>> On Wed, 16 May 2012 09:50:12 -0400, Walter Bright
>>> <newshound2@digitalmars.com> wrote:
>>>
>>>> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>>>>> I do agree that, e.g., with binary data some data can't be read with
>>>>> ranges (when
>>>>> you need to read small chunks of varying size),
>>>>
>>>> I don't see why that should be true.
>>>
>>> How do you tell front and popFront how many bytes to read?
>>>
>>> -Steve
>>
>> A bit ugly but:
>> ----
>> // Default to 4 byte chunks
>> auto range = myStream.byChunks(4);
>> foreach (chunk; range) {
>> // The next chunk will be 3 bytes; after that
>> // the size reverts to the default
>> range.nextChunkSize = 3;
>>
>> // Change the default: every subsequent chunk is 5 bytes
>> range.chunkSize = 5;
>> }
>
> Yeah, I've seen this before. It's not convincing.
>

Yes, it's obvious that files do *not* generally follow a range-of-items semantic, not even a range of various items.
In the case of binary data it's most of the time a header followed by various data, or a hierarchical structure, or a table of links plus raw data, or whatever. I've yet to see a standard way to deal with binary formats :)


-- 
Dmitry Olshansky
May 16, 2012
On Wed, 16 May 2012 11:48:32 -0400, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

> On 16.05.2012 19:32, Steven Schveighoffer wrote:
>> On Wed, 16 May 2012 11:19:46 -0400, Robert Clipsham
>> <robert@octarineparrot.com> wrote:
>>> A bit ugly but:
>>> ----
>>> // Default to 4 byte chunks
>>> auto range = myStream.byChunks(4);
>>> foreach (chunk; range) {
>>> // The next chunk will be 3 bytes; after that
>>> // the size reverts to the default
>>> range.nextChunkSize = 3;
>>>
>>> // Change the default: every subsequent chunk is 5 bytes
>>> range.chunkSize = 5;
>>> }
>>
>> Yeah, I've seen this before. It's not convincing.
>>
>
> Yes, it's obvious that files do *not* generally follow a range-of-items semantic, not even a range of various items.
> In the case of binary data it's most of the time a header followed by various data, or a hierarchical structure, or a table of links plus raw data, or whatever. I've yet to see a standard way to deal with binary formats :)

The best solution would be a range that's specific to your format.  My solution intends to support that.

But that's only if your format fits within the "range of elements" model.

Good old-fashioned "read X bytes" needs to be supported, and insisting that it be done range-style is just plain wrong IMO.

-Steve
May 16, 2012
On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
> On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2@digitalmars.com>
> wrote:
>
>> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>>> I do agree that, e.g., with binary data some data can't be read with ranges (when
>>> you need to read small chunks of varying size),
>>
>> I don't see why that should be true.
>
> How do you tell front and popFront how many bytes to read?

std.stdio.byLine() does it.

In general, you can read n bytes by calling empty, front, and popFront n times.
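Under that model, a fixed-size read is a small loop over the range primitives. A minimal sketch (Python, with hypothetical ByteRange and read_n names standing in for a D byte range):

```python
class ByteRange:
    """Minimal input range over a bytes object (empty/front/pop_front)."""
    def __init__(self, data):
        self.data, self.i = data, 0

    @property
    def empty(self):
        return self.i >= len(self.data)

    @property
    def front(self):
        return self.data[self.i]

    def pop_front(self):
        self.i += 1

def read_n(rng, n):
    """Read up to n bytes by calling empty, front, and pop_front n times,
    as described above.  Stops early if the range is exhausted."""
    out = bytearray()
    for _ in range(n):
        if rng.empty:
            break
        out.append(rng.front)
        rng.pop_front()
    return bytes(out)
```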
May 16, 2012
On 5/16/2012 7:49 AM, Sean Kelly wrote:
> On May 16, 2012, at 6:52 AM, Walter Bright<newshound2@digitalmars.com>
> wrote:
>
>> On 5/15/2012 4:43 PM, Sean Kelly wrote:
>>> One thing I'd like in a buffered input API is a way to perform
>>> transactional reads such that if the full read can't be performed, the
>>> read state remains unchanged. The best you can do with most APIs is to
>>> check for a desired length, but what if I don't want to read until a
>>> full line is available, and I don't know the exact length?  Typically,
>>> you end up having to double buffer, which stinks.
>>
>> std.stdio.byLine()
>
> That was just an example. What if I want to do a formatted read and I'm
> reading from a file that someone else is writing to?  I don't want to block
> or get a partial result and an EOF that needs to be reset.

Then you'll need an input range that can be reset - a ForwardRange.