May 18, 2012
On Fri, 18 May 2012 03:52:51 -0400, Mehrdad <wfunction@hotmail.com> wrote:

> On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer wrote:
>> 2. I realized, buffering input stream of type T is actually an input range of type T[].
>
> The trouble is, why a slice? Why not an std.array.Array? Why not some other data source?
> (Chicken/egg problem....)

Well, because that's what i/o buffers are :)  There isn't an OS primitive that reads a file descriptor into, e.g., a linked list.  Anything other than a slice would go through a translation.

I don't know what std.array.Array is.

> Another problem I've noticed is the following:
>
>
> Say you're tokenizing some input range, and it happens to just be a huge, gigantic string.
>
> It *should* be possible to turn it into tokens with slices referring to the ORIGINAL string, which is VERY efficient because it doesn't require *any* heap allocations whatsoever. (You just tokenize with opApply() as you go, without ever requiring a heap allocation...)
>
> However, this is *only* possible if you don't use the concept of an input range!

How so?  A slice is an input range, and so is a string.

> Since you can't slice an input range, you'd be forced to use the front() and popFront() properties. But, as soon as you do that, you're gonna have to store the data somewhere... so your next-best option is to append it to some new gigantic array (instead of a bunch of small arrays, which require a lot of heap allocations), but even then, it's not as efficient as possible, because there's O(n) extra memory involved -- which defeats the whole purpose of working on small chunks at a time with no heap allocations.
> (If you're going to do that, after all, you might as well read the entire thing into a giant string at the beginning, and work with an array anyway, discarding the whole idea of a range while doing your tokenization.)
>
>
> Any ideas on how to solve this problem?

I think I get what you are saying here -- if you are processing, say, an XML file, and you want to split that into tokens, you have to dup each token from the stream, because the buffer may be reused.

But doing the same thing for a string would be wasteful.

I think in these cases, we need two types of parsing.  One is to process the stream as it's read into a temporary buffer.  If you need data from the temporary buffer beyond the scope of the processing loop, you need to dup it.

The other way is to read the entire file/stream into a buffer, then process that buffer with the knowledge that it's never going to change.

We can probably have the buffer identify which situation it's in, so the code can make a runtime decision on whether to dup or not.
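
A minimal sketch of that runtime decision, under stated assumptions: `BufferView`, its `stable` flag, and `keep` are hypothetical names made up for illustration, not part of any existing library.

```d
import std.stdio : writeln;

/// Hypothetical buffer view: `stable` tells consumers whether the
/// underlying memory may be overwritten by a later refill.
struct BufferView
{
    const(char)[] data;
    bool stable; // true: the whole input is in memory and never reused
}

/// Keep a token beyond the processing loop, copying only when needed.
const(char)[] keep(BufferView buf, const(char)[] token)
{
    return buf.stable ? token : token.dup;
}

void main()
{
    string whole = "alpha beta";
    auto fromString = BufferView(whole, true);
    auto t1 = keep(fromString, fromString.data[0 .. 5]);
    assert(t1.ptr is whole.ptr);    // no copy: still a slice of the original

    char[] scratch = "alpha beta".dup;
    auto fromStream = BufferView(scratch, false);
    auto t2 = keep(fromStream, fromStream.data[0 .. 5]);
    assert(t2.ptr !is scratch.ptr); // copied out of the reusable buffer
    scratch[] = 'x';                // simulate the buffer being refilled
    assert(t2 == "alpha");
    writeln(t1, " ", t2);
}
```

The tokenizer itself stays the same in both cases; only the "keep" step looks at the flag.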

-Steve
May 18, 2012
2012/5/18 Artur Skawina <art.08.09@gmail.com>:
> On 05/18/12 06:19, kenji hara wrote:
>> I think range interface is not useful for *efficient* IO. The expected IO interface will be more *abstract* than range primitives.
>>
>> ---
>> If you use range I/F to read bytes from device, we will always do blocking IO - even if the device is socket. It is not efficient.
>>
>> auto sock = new TcpSocketDevice();
>> if (sock.empty) { auto e = sock.front; }
>>   // In empty primitive, we *must* wait the socket gets one or more
>> bytes or really disconnected.
>
> No. 'empty' has to return true only _after_ seeing EOF.
>
> Something like 'available' can return the number of elements known to be fetchable w/o blocking. [1]
>
>>   // If not, what exactly returns sock.front?
>
> EWOULDBLOCK :^)
>
> But, yes, it needs to block, as there's no generic way to return
> EAGAIN/EWOULDBLOCK. This is where the primitive returning a slice
> comes in - that one /can/ return an empty slice.
> So '!r.empty && r.fronts.length==0)' is the equivalent to EAGAIN.
> (and note i'm oversimplifying -- 'fronts' can return something that
> /acts/ as a slice; which is what i'm in fact are doing)

OK. If reading bytes from the underlying device fails, your 'fronts' can
return an empty slice. Understood.
But it is still *not efficient*. The returned slice refers to a
buffer controlled by the underlying device. If you want to gather bytes
into one chunk, you must copy them from the returned slice into your chunk.
We should reduce memory copying as much as possible.

And the 'put' primitive in the output range concept doesn't support
non-blocking writes. 'put' must consume *all* of the given data and write
it to the underlying device, so it has to block.

Therefore, the range concept as a whole doesn't cover non-blocking I/O.

>>   // Then using range interface for socket reading enforces blocking IO. It is *really* inefficient.
>
>> I think IO primitives must be distinct from range ones for the reasons mentioned above...
>>
>> I'm designing experimental IO primitives: https://github.com/9rnsr/dio
>>
>> I call the input stream "source", and call output stream "sink".
>> "source" has a 'pull' primitive, and sink has 'push' primitive, and
>> they can avoid blocking.
>> If you want to construct input range interface from "source", you
>> should use 'ranged' helper function in io.core module. 'ranged'
>> returns a wrapper object, and in its front method, It reads bytes from
>> "source", and if the read bytes not sufficient, blocks the input.
>>
>> In other words, range is not almighty. We should think distinct primitives for the IO.
>
> Well, your 'pull' and 'push' are just different names for my 'fronts' and 'puts' (modulo the data transfer interface, which can be done both ways using a set of overloads, hence it doesn't matter).
>
> I don't see any reason to invent yet another abstraction, when ranges can be made to work with some improvements.

For efficiency and removing bottlenecks.
Even today, I/O is usually the slowest operation in a program.
Providing good primitives for I/O is valuable enough on its own.

I have designed the 'pull' and 'push' primitives around two goals:
1. Reduce memory copying as far as possible.
2. Keep control of buffer memory on the programmer's side, not the device's side.

> Ranges are just a convention; not a perfect one, but having /one/, not
> two or thirteen, is valuable. If you think ranges are flawed the
> discussion should be about ripping out every trace of them from the
> language and libraries and replacing them with something better. If
> you think that would be bad - well, having tens of different incompatible
> abstractions isn't good either. (and, yes, you can provide glue so that
> they can interact, but that does not scale well)

The range concept is a good abstraction if the underlying container controls ownership. But in I/O we want to *move* ownership of bytes. Ranges are not designed to do that efficiently, IMO.

> Hmm, how are 'flush()' and 'commit()' supposed to work? Is data lost
> if you omit one or both of them?

In my io library, BufferedSink requires three primitives, flush, commit, and writable.

> artur
>
> [1] Reminds me:
>
>   struct S(T) {
>      shared T a;
>      @property size_t available()() { return a; }
>   }
>
> The compiler infers length as 'pure', which, depending on the
> definition of 'shared' is wrong. ('shared' /shouldn't/ imply 'volatile',
> but, as it is now, it does - so omitting a call to 'available' would
> be wrong)
>
May 18, 2012
On Fri, 18 May 2012 07:05:50 -0400, Artur Skawina <art.08.09@gmail.com> wrote:

> On 05/18/12 06:19, kenji hara wrote:
>> I think range interface is not useful for *efficient* IO. The expected
>> IO interface will be more *abstract* than range primitives.
>>
>> ---
>> If you use range I/F to read bytes from device, we will always do
>> blocking IO - even if the device is socket. It is not efficient.
>>
>> auto sock = new TcpSocketDevice();
>> if (sock.empty) { auto e = sock.front; }
>>   // In empty primitive, we *must* wait the socket gets one or more
>> bytes or really disconnected.
>
> No. 'empty' has to return true only _after_ seeing EOF.
>
> Something like 'available' can return the number of elements known
> to be fetchable w/o blocking. [1]
>
>>   // If not, what exactly returns sock.front?
>
> EWOULDBLOCK :^)
>
> But, yes, it needs to block, as there's no generic way to return
> EAGAIN/EWOULDBLOCK. This is where the primitive returning a slice
> comes in - that one /can/ return an empty slice.
> So '!r.empty && r.fronts.length==0)' is the equivalent to EAGAIN.
> (and note i'm oversimplifying -- 'fronts' can return something that
> /acts/ as a slice; which is what i'm in fact are doing)

I think this is an example of what Kenji and I are talking about -- trying to make the range interface map to *all* I/O situations.

> I don't see any reason to invent yet another abstraction, when ranges
> can be made to work with some improvements.
>
> Ranges are just a convention; not a perfect one, but having /one/, not
> two or thirteen, is valuable. If you think ranges are flawed the
> discussion should be about ripping out every trace of them from the
> language and libraries and replacing them with something better. If
> you think that would be bad - well, having tens of different incompatible
> abstractions isn't good either. (and, yes, you can provide glue so that
> they can interact, but that does not scale well)

My opinion is that ranges should be available for i/o when you need to hook them to some other range processing code, but they shouldn't be the preferred interface for all I/O.

-Steve
May 18, 2012
2012/5/18 Steven Schveighoffer <schveiguy@yahoo.com>:
> On Fri, 18 May 2012 00:19:45 -0400, kenji hara <k.hara.pg@gmail.com> wrote:
>
>> I think range interface is not useful for *efficient* IO. The expected IO interface will be more *abstract* than range primitives.
>
>
> If all you are doing is consuming data and processing it, range interface is efficient.  Most streaming implementations that are synchronous use:
>
> 1. read block of data from low-level source into buffer
> 2. process buffer
> 3. If still data left, go to step 1.
>
> 1 is done via popFront, 2 is done via front.
>
> 3 is somewhat available via empty, but empty kind of depends on reading data.  I think it can work.
>
> It's not the ideal interface for all aspects of i/o, but it does map to ranges, and for single purpose tasks (such as parse an XML file), it will be most efficient.

I mostly agree. I/O is either synchronous or asynchronous, and only a few
people would use a non-blocking interface directly.
But for library implementations, a non-blocking interface is still important.
I think the non-blocking interface should be designed to avoid copying
as far as possible, and achieving that with the range interface is
impossible in general.

>> ---
>> If you use range I/F to read bytes from device, we will always do blocking IO - even if the device is socket. It is not efficient.
>>
>> auto sock = new TcpSocketDevice();
>> if (sock.empty) { auto e = sock.front; }
>>  // In empty primitive, we *must* wait the socket gets one or more
>> bytes or really disconnected.
>>  // If not, what exactly returns sock.front?
>>  // Then using range interface for socket reading enforces blocking
>> IO. It is *really* inefficient.
>> ---
>
>
> sockets do not have to be blocking, and I/O does not have to use the range portion of the interface.
>
> And efficient I/O has little to do with synchronicity and more to do with reading a large amount of data at a time instead of byte by byte.
>
> Using multi-threads or fibers, and using OS primitives such as select or poll can make I/O quite efficient and allow you to do other things while no I/O is happening.  These will not happen with range interface, but will be available through other interfaces.

I have been talking about *good I/O primitives for library implementation*. I think the range interface is one of the most useful concepts for end users, but not a good one for people who want to implement efficient libraries.

>> I think IO primitives must be distinct from range ones for the reasons mentioned above...
>
>
> Yes, I agree.  But ranges can be *mapped* to stream primitives.

No, we cannot map the output range concept to non-blocking output. The 'put' operation always requires blocking.

>> I'm designing experimental IO primitives: https://github.com/9rnsr/dio
>
>
> I'll take a look.

Thanks.

>>
>> In other words, range is not almighty. We should think distinct primitives for the IO.
>
>
> 100% agree.  The main thing I realized that brought me to propose the "range-based" (if you can call it that) version is that:
>
> 1. Ranges can be readily mapped to stream primitives *if* you use the
> concept of a range of T[] vs. a range of T.  So in essence, without changing
> anything I can slap on a range interface for free.
> 2. Arrays make very efficient data sources, and are easy to create.  We need
> a way to hook stream-using code onto an array.
>
> But be clear, I am *not* going to remove the existing stream I/O primitives I had for buffered i/o, I'm rather *adding* range primitives as well.

My policy is very similar. But, as described above, I think ranges
cannot cover non-blocking IO.
And I think a non-blocking IO interface is important for library implementations.

So I chose a design that provides IO-specific primitives. Additionally, I added primitives to control the underlying buffers explicitly, because that is useful for some byte processing, e.g. encoding, or taking a string by slicing the buffer.

Kenji Hara
May 18, 2012
On 05/18/12 15:51, kenji hara wrote:
> 2012/5/18 Artur Skawina <art.08.09@gmail.com>:
>> On 05/18/12 06:19, kenji hara wrote:
>>> I think range interface is not useful for *efficient* IO. The expected IO interface will be more *abstract* than range primitives.
>>>
>>> ---
>>> If you use range I/F to read bytes from device, we will always do blocking IO - even if the device is socket. It is not efficient.
>>>
>>> auto sock = new TcpSocketDevice();
>>> if (sock.empty) { auto e = sock.front; }
>>>   // In empty primitive, we *must* wait the socket gets one or more
>>> bytes or really disconnected.
>>
>> No. 'empty' has to return true only _after_ seeing EOF.
>>
>> Something like 'available' can return the number of elements known to be fetchable w/o blocking. [1]
>>
>>>   // If not, what exactly returns sock.front?
>>
>> EWOULDBLOCK :^)
>>
>> But, yes, it needs to block, as there's no generic way to return
>> EAGAIN/EWOULDBLOCK. This is where the primitive returning a slice
>> comes in - that one /can/ return an empty slice.
>> So '!r.empty && r.fronts.length==0)' is the equivalent to EAGAIN.
>> (and note i'm oversimplifying -- 'fronts' can return something that
>> /acts/ as a slice; which is what i'm in fact are doing)
> 
> OK. If reading bytes from underlying device failed, your 'fronts' can
> return empty slice. I understood.
> But, It is still *not efficient*. The returned slice will specifies a
> buffer controlled by underlying device. If you want to gather bytes
> into one chunk, you must copy bytes from returned slice to your chunk.
> We should reduce copying memories as much as possible.

Depends if your input range supports zero-copy or not. IOW you avoid
the copy iff the range can somehow write the data directly to the caller
provided buffer. This can be true eg for file reads, where you can tell
the read(2) syscall to write into the user buffer. But what if you need to
buffer the stream? An intermediate buffer can become necessary anyway.
But, as i said before, i agree that a caller-provided-buffer-interface
is useful.

   E[] fronts();
   void fronts(ref E[]);

And one can be implemented in terms of the other, ie:

  E[] fronts() { E[] els; fronts(els); return els; }
  void fronts(ref E[] e) { e[] = fronts()[]; }

depending on which is more efficient. A range can provide

  enum bool HasBuffer = 0 || 1;

so that the user can pick the more suited alternative.

> And, 'put' primitive in output range concept doesn't support non-blocking write.
> 'put' should consume *all* of given data and write it  to underlying
> device, then it would block.

True, a write-as-much-as-possible-but-not-more primitive is needed.

   size_t puts(E[], size_t atleast=size_t.max);

or something like that. (Doing it this way allows for explicit
non-blocking 'puts', ie '(written=puts(els, 0))==0' means EAGAIN.)
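
A rough sketch of how such a 'puts' could behave, assuming a made-up `BoundedSink` that stands in for a device with limited room. Only the signature comes from the line above; the sink type is hypothetical, and the blocking case is replaced by an exception because an in-memory sink has nothing to wait for.

```d
import std.algorithm.comparison : min;
import std.exception : enforce;

/// Hypothetical sink with a fixed amount of room, standing in for a
/// socket whose send buffer can fill up.
struct BoundedSink(E)
{
    E[] storage;     // elements accepted so far
    size_t capacity; // total number of elements the sink can hold

    /// Accepts as many elements as currently fit and returns the count.
    /// A real sink would wait until min(atleast, els.length) elements
    /// were written; this sketch throws instead of blocking.
    size_t puts(const(E)[] els, size_t atleast = size_t.max)
    {
        immutable n = min(els.length, capacity - storage.length);
        enforce(n >= min(atleast, els.length), "would block");
        storage ~= els[0 .. n];
        return n;
    }
}

void main()
{
    auto sink = BoundedSink!ubyte(null, 4);
    ubyte[] data = [1, 2, 3, 4, 5, 6];

    size_t written = sink.puts(data, 0); // non-blocking: partial write is fine
    assert(written == 4);
    data = data[written .. $];

    assert(sink.puts(data, 0) == 0);     // sink is full: the EAGAIN case
    ubyte[] expected = [1, 2, 3, 4];
    assert(sink.storage == expected);
}
```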

> Therefore, whole of range concept doesn't cover non-blocking I/O.

See above.

>>>   // Then using range interface for socket reading enforces blocking
>>> IO. It is *really* inefficient.
>>
>>> I think IO primitives must be distinct from range ones for the reasons mentioned above...
>>>
>>> I'm designing experimental IO primitives: https://github.com/9rnsr/dio
>>>
>>> I call the input stream "source", and call output stream "sink".
>>> "source" has a 'pull' primitive, and sink has 'push' primitive, and
>>> they can avoid blocking.
>>> If you want to construct input range interface from "source", you
>>> should use 'ranged' helper function in io.core module. 'ranged'
>>> returns a wrapper object, and in its front method, It reads bytes from
>>> "source", and if the read bytes not sufficient, blocks the input.
>>>
>>> In other words, range is not almighty. We should think distinct primitives for the IO.
>>
>> Well, your 'pull' and 'push' are just different names for my 'fronts' and 'puts' (modulo the data transfer interface, which can be done both ways using a set of overloads, hence it doesn't matter).
>>
>> I don't see any reason to invent yet another abstraction, when ranges can be made to work with some improvements.
> 
> For efficiency and removing bottlenecks.
> Even today, I / O is the slowest operation in the entire program.
> Providing good primitives for I/O is enough value.
> 
> I have designed the 'pull' and 'push' primitives with two concepts:
> 1. Reduce copying memories as far as possible.
> 2. Control buffer memory under programmer side, not device side.

Do you have a contained microbenchmark? It would be easy to compare both approaches... If you do i'll write one using my scheme - so far i only did this for inter-thread communication, there's no file based backend.

>> Ranges are just a convention; not a perfect one, but having /one/, not
>> two or thirteen, is valuable. If you think ranges are flawed the
>> discussion should be about ripping out every trace of them from the
>> language and libraries and replacing them with something better. If
>> you think that would be bad - well, having tens of different incompatible
>> abstractions isn't good either. (and, yes, you can provide glue so that
>> they can interact, but that does not scale well)
> 
> Range concept is good abstraction if underlying container controls ownership. But, in I/O we want to *move* ownership of bytes. Range is not designed efficiently for the purpose, IMO.
> 
>> Hmm, how are 'flush()' and 'commit()' supposed to work? Is data lost
>> if you omit one or both of them?
> 
> In my io library, BufferedSink requires three primitives, flush, commit, and writable.

But what happens if neither flush nor commit is called?

>> [1] Reminds me:
>>
>>   struct S(T) {
>>      shared T a;
>>      @property size_t available()() { return a; }
>>   }
>>
>> The compiler infers length as 'pure', which, depending on the
                       ^^^^^^
s/length/available/.

>> definition of 'shared' is wrong. ('shared' /shouldn't/ imply 'volatile',
>> but, as it is now, it does - so omitting a call to 'available' would
>> be wrong)

artur
May 18, 2012
On Fri, 18 May 2012 10:39:55 -0400, kenji hara <k.hara.pg@gmail.com> wrote:

> 2012/5/18 Steven Schveighoffer <schveiguy@yahoo.com>:
>> On Fri, 18 May 2012 00:19:45 -0400, kenji hara <k.hara.pg@gmail.com> wrote:
>>
>>> I think range interface is not useful for *efficient* IO. The expected
>>> IO interface will be more *abstract* than range primitives.
>>
>>
>> If all you are doing is consuming data and processing it, range interface is
>> efficient.  Most streaming implementations that are synchronous use:
>>
>> 1. read block of data from low-level source into buffer
>> 2. process buffer
>> 3. If still data left, go to step 1.
>>
>> 1 is done via popFront, 2 is done via front.
>>
>> 3 is somewhat available via empty, but empty kind of depends on reading
>> data.  I think it can work.
>>
>> It's not the ideal interface for all aspects of i/o, but it does map to
>> ranges, and for single purpose tasks (such as parse an XML file), it will be
>> most efficient.
>
> Almost agree. When we want to do I/O, that is synchronous or asynchronous.
> Only a few people would use non-blocking interface.
> But for the library implementation, non-blocking interface is still important.
> I think the non-blocking interface should be designed to avoid copying
> as far as possible, and to achieve it with range interface is
> impossible in general.

On non-blocking i/o, why not just not support range interface at all?  I don't have any problem with that.  In other words, if your input source is non-blocking, and you try to use range primitives, it simply won't work.

I admit, all of my code so far is focused on blocking i/o.  I have some experience with non-blocking i/o, but it was to make a blocking interface that supported waiting for data with a timeout.  Making a cross-platform (i.e. both Windows and POSIX) non-blocking interface is difficult because the two OSes use very different mechanisms.

And a lot of times, you don't want non-blocking i/o, but rather parallel i/o.

>>> ---
>>> If you use range I/F to read bytes from device, we will always do
>>> blocking IO - even if the device is socket. It is not efficient.
>>>
>>> auto sock = new TcpSocketDevice();
>>> if (sock.empty) { auto e = sock.front; }
>>>  // In empty primitive, we *must* wait the socket gets one or more
>>> bytes or really disconnected.
>>>  // If not, what exactly returns sock.front?
>>>  // Then using range interface for socket reading enforces blocking
>>> IO. It is *really* inefficient.
>>> ---
>>
>>
>> sockets do not have to be blocking, and I/O does not have to use the range
>> portion of the interface.
>>
>> And efficient I/O has little to do with synchronicity and more to do with
>> reading a large amount of data at a time instead of byte by byte.
>>
>> Using multi-threads or fibers, and using OS primitives such as select or
>> poll can make I/O quite efficient and allow you to do other things while no
>> I/O is happening.  These will not happen with range interface, but will be
>> available through other interfaces.
>
> I have talked about *good I/O primitives for library implementation*.
> I think range interface is one of the most useful concept for end
> users, but not good one for people who want to implement efficient
> libraries.

OK, I think we agree.  I am concerned about writing good library types that can efficiently use I/O.  The range interface will be for people who use the library and want to utilize existing range primitives for whatever purpose.

>
>>> I think IO primitives must be distinct from range ones for the reasons
>>> mentioned above...
>>
>>
>> Yes, I agree.  But ranges can be *mapped* to stream primitives.
>
> No, we cannot map output range concept to non-blocking output. 'put'
> operation always requires blocking.

Yes, but again, put can use whatever stream primitives we have.

In other words, it's quite possible to write range primitives which utilize stream primitives.  It's impossible to write good stream primitives which utilize range primitives.
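
To illustrate the direction that does work, here is a sketch of an input range of ubyte[] layered on a single stream primitive. `MemoryStream`, its `read` method, and `Chunks` are hypothetical names used only for this example; they are not taken from either library discussed in this thread.

```d
import std.range.primitives : isInputRange, ElementType;

/// Hypothetical stream primitive: copies up to buf.length bytes into
/// buf and returns the count; 0 means end of stream.
struct MemoryStream
{
    const(ubyte)[] remaining;

    size_t read(ubyte[] buf)
    {
        immutable n = buf.length < remaining.length
            ? buf.length : remaining.length;
        buf[0 .. n] = remaining[0 .. n];
        remaining = remaining[n .. $];
        return n;
    }
}

/// Input range of ubyte[] built purely on the stream's read primitive.
struct Chunks(Stream)
{
    private Stream* stream;
    private ubyte[] buffer;  // reused for every chunk
    private ubyte[] current; // valid part of buffer

    this(Stream* stream, size_t chunkSize)
    {
        this.stream = stream;
        buffer = new ubyte[chunkSize];
        popFront(); // prime the first chunk
    }

    @property bool empty() const { return current.length == 0; }
    @property ubyte[] front() { return current; }
    void popFront() { current = buffer[0 .. stream.read(buffer)]; }
}

static assert(isInputRange!(Chunks!MemoryStream));
static assert(is(ElementType!(Chunks!MemoryStream) == ubyte[]));

void main()
{
    ubyte[] input = [1, 2, 3, 4, 5];
    auto stream = MemoryStream(input);
    size_t total, chunks;
    foreach (chunk; Chunks!MemoryStream(&stream, 2))
    {
        total += chunk.length; // chunk is only valid until the next popFront
        ++chunks;
    }
    assert(total == 5 && chunks == 3);
}
```

The range adds nothing the stream did not already have, which is why the wrapper is cheap; going the other way would force the stream to live with front/popFront semantics.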

>
>>> I'm designing experimental IO primitives:
>>> https://github.com/9rnsr/dio
>>
>>
>> I'll take a look.
>
> Thanks.

I'm having trouble following the code. Is there a place with the generated docs?  I'm looking for an overview to understand where to look.

Your lib is quite extensive, mine is only one file ;)

>
>>>
>>> In other words, range is not almighty. We should think distinct
>>> primitives for the IO.
>>
>>
>> 100% agree.  The main thing I realized that brought me to propose the
>> "range-based" (if you can call it that) version is that:
>>
>> 1. Ranges can be readily mapped to stream primitives *if* you use the
>> concept of a range of T[] vs. a range of T.  So in essence, without changing
>> anything I can slap on a range interface for free.
>> 2. Arrays make very efficient data sources, and are easy to create.  We need
>> a way to hook stream-using code onto an array.
>>
>> But be clear, I am *not* going to remove the existing stream I/O primitives
>> I had for buffered i/o, I'm rather *adding* range primitives as well.
>
> My policy is very similar. But, as described above, I think range
> cannot cover non-blocking IO.
> And I think non-blocking IO interface is important for library implementations.

I think you misunderstand, I'm not trying to make ranges be the base of i/o, I'm trying to expose a range interface *based on* stream i/o interface.

-Steve
May 18, 2012
On 5/18/12 2:52 AM, Mehrdad wrote:
> On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer wrote:
>> 2. I realized, buffering input stream of type T is actually an input
>> range of type T[].
>
> The trouble is, why a slice? Why not an std.array.Array? Why not some
> other data source?
> (Chicken/egg problem....)

Because T[] is the fundamental representation of a typed contiguous area of storage.

> Say you're tokenizing some input range, and it happens to just be a
> huge, gigantic string.
>
> It *should* be possible to turn it into tokens with slices referring to
> the ORIGINAL string, which is VERY efficient because it doesn't require
> *any* heap allocations whatsoever. (You just tokenize with opApply() as
> you go, without ever requiring a heap allocation...)
>
> However, this is *only* possible if you don't use the concept of an
> input range!

But e.g. splitter() does exactly as you say. It's a range and does not use memory allocation.
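
A small sketch that checks the aliasing part of this claim with `std.algorithm.iteration.splitter`; the helper name `countAliasedTokens` is made up for the example.

```d
import std.algorithm.iteration : splitter;

/// Counts the space-separated tokens of `text`, asserting that each
/// token is a slice whose memory lies inside `text` itself.
size_t countAliasedTokens(string text)
{
    size_t count;
    foreach (token; text.splitter(' '))
    {
        assert(token.ptr >= text.ptr);
        assert(token.ptr + token.length <= text.ptr + text.length);
        ++count;
    }
    return count;
}

void main()
{
    assert(countAliasedTokens("int x = 42 ;") == 5);
}
```

The pointer checks only show that tokens alias the input rather than being copied; they say nothing about what a more elaborate tokenizer would need.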


Andrei
May 18, 2012
On Friday, 18 May 2012 at 13:44:43 UTC, Steven Schveighoffer wrote:
> On Fri, 18 May 2012 03:52:51 -0400, Mehrdad <wfunction@hotmail.com> wrote:
>
>> On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer wrote:
>>> 2. I realized, buffering input stream of type T is actually an input range of type T[].
>>
>> The trouble is, why a slice? Why not an std.array.Array? Why not some other data source?
>> (Chicken/egg problem....)
>
> Well, because that's what i/o buffers are :)  There isn't an OS primitive that reads a file descriptor into an e.g. linked list.


I beg to differ..

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365469.aspx
May 18, 2012
2012/5/19 Artur Skawina <art.08.09@gmail.com>:
> On 05/18/12 15:51, kenji hara wrote:
>> OK. If reading bytes from underlying device failed, your 'fronts' can
>> return empty slice. I understood.
>> But, It is still *not efficient*. The returned slice will specifies a
>> buffer controlled by underlying device. If you want to gather bytes
>> into one chunk, you must copy bytes from returned slice to your chunk.
>> We should reduce copying memories as much as possible.
>
> Depends if your input range supports zero-copy or not. IOW you avoid
> the copy iff the range can somehow write the data directly to the caller
> provided buffer. This can be true eg for file reads, where you can tell
> the read(2) syscall to write into the user buffer. But what if you need to
> buffer the stream? An intermediate buffer can become necessary anyway.
> But, as i said before, i agree that a caller-provided-buffer-interface
> is useful.
>
>   E[] fronts();
>   void fronts(ref E[]);
>
> And one can be implemented in terms of the other, ie:
>
>  E[] fronts() { E[] els; fronts(els); return els; }
>  void fronts(ref E[] e) { e[] = fronts()[]; }

The flaw in your design is that the memory that stores the read
bytes/elements is allocated by the lower layer.
E.g. if you want to construct a linked list of some elements, you
must copy the elements from the returned slice into a newly allocated
node. I think that is still inefficient.

> depending on which is more efficient. A range can provide
>
>  enum bool HasBuffer = 0 || 1;
>
> so that the user can pick the more suited alternative.

I think having as few primitives as possible is a better design than
adding extra/optional primitives.
How many primitives are there in your interface design?

>> And, 'put' primitive in output range concept doesn't support non-blocking write.
>> 'put' should consume *all* of given data and write it  to underlying
>> device, then it would block.
>
> True, a write-as-much-as-possible-but not-more primitive is needed.
>
>   size_t puts(E[], size_t atleast=size_t.max);
>
> or something like that. (Doing it this way allows for explicit
> non-blocking 'puts', ie '(written=puts(els, 0))==0' means EAGAIN.)
>
>> Therefore, whole of range concept doesn't cover non-blocking I/O.

I can agree with the signatures, but the names 'fronts' and 'puts' are a little too similar.


>>>> I'm designing experimental IO primitives: https://github.com/9rnsr/dio
>>>>
>> I have designed the 'pull' and 'push' primitives with two concepts:
>> 1. Reduce copying memories as far as possible.
>> 2. Control buffer memory under programmer side, not device side.
>
> Do you have a contained microbenchmark? It would be easy to compare both approaches... If you do i'll write one using my scheme - so far i only did this for inter-thread communication, there's no file based backend.

It has a sample benchmark that compares performance with std.stdio for
line iteration.
On my PC, it is up to 2x faster.

>> In my io library, BufferedSink requires three primitives, flush, commit, and writable.
>
> But what happens if neither flush nor commit is called?

If you forget to call 'commit', zero-length data will be written.
And if you forget to call 'flush', the committed data won't be written
to the actual device.
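
The behaviour described above can be modelled with a toy sink. This is only a sketch under assumptions: `SketchSink` is a hypothetical stand-in, not the dio BufferedSink, and the signatures of `writable`, `commit`, and `flush` are guessed for illustration.

```d
/// Hypothetical stand-in for a buffered sink; NOT the dio API.
struct SketchSink
{
    ubyte[] device;           // bytes that have reached the "device"
    private ubyte[] buffer;   // staging area owned by the sink
    private size_t committed; // bytes marked as ready for output

    this(size_t size) { buffer = new ubyte[size]; }

    /// Free space the caller may fill directly.
    ubyte[] writable() { return buffer[committed .. $]; }

    /// Marks the first n bytes of writable() as ready for output.
    void commit(size_t n)
    {
        assert(n <= writable().length);
        committed += n;
    }

    /// Moves all committed bytes to the device.
    void flush()
    {
        device ~= buffer[0 .. committed];
        committed = 0;
    }
}

void main()
{
    auto sink = SketchSink(8);
    auto w = sink.writable();
    w[0] = 1;
    w[1] = 2;

    sink.flush();                    // nothing committed: zero bytes go out
    assert(sink.device.length == 0);

    sink.commit(2);                  // the two bytes now count as written
    assert(sink.device.length == 0); // but they are still only buffered
    sink.flush();
    ubyte[] expected = [1, 2];
    assert(sink.device == expected);
}
```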

Kenji Hara
May 18, 2012
On Fri, 18 May 2012 11:40:24 -0400, Mehrdad <wfunction@hotmail.com> wrote:

> On Friday, 18 May 2012 at 13:44:43 UTC, Steven Schveighoffer wrote:
>> On Fri, 18 May 2012 03:52:51 -0400, Mehrdad <wfunction@hotmail.com> wrote:
>>
>>> On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer wrote:
>>>> 2. I realized, buffering input stream of type T is actually an input range of type T[].
>>>
>>> The trouble is, why a slice? Why not an std.array.Array? Why not some other data source?
>>> (Chicken/egg problem....)
>>
>> Well, because that's what i/o buffers are :)  There isn't an OS primitive that reads a file descriptor into an e.g. linked list.
>
>
> I beg to differ..
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/aa365469.aspx

It still reads into an array of buffers, which are slices.  And the resulting "range" looks *exactly* like a range of T[].
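
A quick sketch of that point, assuming the buffers of a scatter-style read are modelled as a plain `ubyte[][]`; the helper `totalLength` is a made-up name.

```d
import std.range.primitives;

// An array of buffers already satisfies the input range interface,
// and its element type is a slice.
static assert(isInputRange!(ubyte[][]));
static assert(is(ElementType!(ubyte[][]) == ubyte[]));

/// Sums the lengths of all chunks of any input range of byte slices.
size_t totalLength(R)(R r)
if (isInputRange!R && is(ElementType!R : const(ubyte)[]))
{
    size_t total;
    for (; !r.empty; r.popFront())
        total += r.front.length;
    return total;
}

void main()
{
    // Hypothetical result of a scatter read: one logical stream spread
    // over separately allocated buffers.
    ubyte[][] buffers = [[1, 2, 3], [4, 5], [6]];
    assert(totalLength(buffers) == 6);
}
```

Code written against "input range of T[]" therefore does not care whether the chunks come from one reused buffer or from several.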

-Steve