May 16, 2012
On 16/05/2012 16:59, Walter Bright wrote:
> On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
>> On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2@digitalmars.com>
>> wrote:
>>
>>> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>>>> I do agree for e.g. with binary data some data can't be read with ranges (when
>>>> you need to read small chunks of varying size),
>>>
>>> I don't see why that should be true.
>>
>> How do you tell front and popFront how many bytes to read?
>
> std.byLine() does it.

And is what you want to do with a text file in many cases.

> In general, you can read n bytes by calling empty, front, and popFront n times.

Why would anybody want to read a large binary file _one byte at a time_?
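For reference, here is roughly what that suggestion amounts to. This is a minimal D sketch; `readN` is a hypothetical helper name. It copies n elements out of any input range using only empty, front, and popFront. Whether it is slow depends entirely on what popFront does underneath: if every call goes to the OS, this is the one-byte-at-a-time case.

```d
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;
import std.stdio : writeln;

// Copies up to n elements out of any input range using only the three
// range primitives, advancing the caller's range as it goes. Each element
// costs one empty/front/popFront round trip; whether that is cheap depends
// on whether popFront itself buffers.
auto readN(R)(ref R r, size_t n)
    if (isInputRange!R)
{
    alias E = ElementType!R;
    E[] result;
    result.reserve(n);
    foreach (i; 0 .. n)
    {
        if (r.empty)
            break;
        result ~= r.front;
        r.popFront();
    }
    return result;
}

void main()
{
    ubyte[] data = [1, 2, 3, 4, 5];
    writeln(readN(data, 3)); // [1, 2, 3]
    writeln(data);           // [4, 5]
}
```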

Stewart.
May 16, 2012
On Wed, May 16, 2012 at 05:41:49PM +0100, Stewart Gordon wrote:
> On 16/05/2012 16:59, Walter Bright wrote:
> >On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
> >>On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2@digitalmars.com> wrote:
> >>
> >>>On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
> >>>>I do agree for e.g. with binary data some data can't be read with ranges (when you need to read small chunks of varying size),
> >>>
> >>>I don't see why that should be true.
> >>
> >>How do you tell front and popFront how many bytes to read?
> >
> >std.byLine() does it.
> 
> And is what you want to do with a text file in many cases.
> 
> >In general, you can read n bytes by calling empty, front, and popFront n times.
> 
> Why would anybody want to read a large binary file _one byte at a time_?
[...]

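import std.range;

// Slicing takes the first n elements in one operation instead of
// one empty/front/popFront round trip per byte.
auto readNBytes(R)(R range, size_t n)
	if (isInputRange!R && hasSlicing!R)
{
	return range[0 .. n];
}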


T

May 16, 2012
On Wed, 16 May 2012 11:59:37 -0400, Walter Bright <newshound2@digitalmars.com> wrote:

> On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
>> On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2@digitalmars.com>
>> wrote:
>>
>>> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>>>> I do agree for e.g. with binary data some data can't be read with ranges (when
>>>> you need to read small chunks of varying size),
>>>
>>> I don't see why that should be true.
>>
>> How do you tell front and popFront how many bytes to read?
>
> std.byLine() does it.

Have you looked at how std.byLine works?  It certainly does not use a range interface as a source.

> In general, you can read n bytes by calling empty, front, and popFront n times.

I hope you are not serious!  This will make D *the worst performing* i/o language.

This should be evidence enough:

steves@steves-laptop:~$ time dd if=/dev/zero of=/dev/null bs=1 count=1000000
1000000+0 records in
1000000+0 records out
1000000 bytes (1.0 MB) copied, 0.74052 s, 1.4 MB/s

real	0m0.744s
user	0m0.176s
sys	0m0.564s
steves@steves-laptop:~$ time dd if=/dev/zero of=/dev/null bs=1000 count=1000
1000+0 records in
1000+0 records out
1000000 bytes (1.0 MB) copied, 0.00194096 s, 515 MB/s

real	0m0.006s
user	0m0.000s
sys	0m0.004s

-Steve
May 16, 2012
On 5/16/2012 9:41 AM, Stewart Gordon wrote:
> On 16/05/2012 16:59, Walter Bright wrote:
>> On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
>>> On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2@digitalmars.com>
>>> wrote:
>>>
>>>> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>>>>> I do agree for e.g. with binary data some data can't be read with ranges (when
>>>>> you need to read small chunks of varying size),
>>>>
>>>> I don't see why that should be true.
>>>
>>> How do you tell front and popFront how many bytes to read?
>>
>> std.byLine() does it.
>
> And is what you want to do with a text file in many cases.
>
>> In general, you can read n bytes by calling empty, front, and popFront n times.
>
> Why would anybody want to read a large binary file _one byte at a time_?

You can have that range read from byChunk(). It's really the same thing that C's stdio does.
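Below is a minimal sketch of what "have that range read from byChunk()" could look like. `BytesOf` and `bytesOf` are hypothetical names, not Phobos types. The wrapper turns any range of ubyte[] chunks, such as `File(name, "rb").byChunk(4096)`, into a range of single bytes. Like C's getc(), most popFront calls only touch memory. An in-memory array of chunks stands in for the file.

```d
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;
import std.stdio : writeln;

// A byte-at-a-time input range over any input range of ubyte[] chunks,
// such as File.byChunk. As with C's getc(), front and popFront only
// touch an in-memory buffer; the chunk source is consulted only when
// that buffer runs dry.
struct BytesOf(Chunks)
    if (isInputRange!Chunks && is(ElementType!Chunks : const(ubyte)[]))
{
    private Chunks chunks;
    private const(ubyte)[] buf;

    this(Chunks source)
    {
        chunks = source;
        if (!chunks.empty)
            buf = chunks.front;
        skipExhausted();
    }

    // Move to the next non-empty chunk. The source is popped only once
    // its current chunk is used up, because File.byChunk reuses a single
    // buffer for every chunk it reads.
    private void skipExhausted()
    {
        while (buf.length == 0 && !chunks.empty)
        {
            chunks.popFront();
            if (!chunks.empty)
                buf = chunks.front;
        }
    }

    @property bool empty() const { return buf.length == 0; }
    @property ubyte front() const { return buf[0]; }

    void popFront()
    {
        buf = buf[1 .. $];
        skipExhausted();
    }
}

auto bytesOf(Chunks)(Chunks chunks) { return BytesOf!Chunks(chunks); }

void main()
{
    // Stand-in for File("data.bin", "rb").byChunk(4096): uneven chunks,
    // including an empty one.
    ubyte[][] chunks = [[1, 2], [], [3, 4, 5]];
    foreach (b; bytesOf(chunks))
        writeln(b);
}
```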

May 16, 2012
On 5/16/2012 10:18 AM, Steven Schveighoffer wrote:
> On Wed, 16 May 2012 11:59:37 -0400, Walter Bright <newshound2@digitalmars.com>
> wrote:
>
>> On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
>>> On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2@digitalmars.com>
>>> wrote:
>>>
>>>> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>>>>> I do agree for e.g. with binary data some data can't be read with ranges (when
>>>>> you need to read small chunks of varying size),
>>>>
>>>> I don't see why that should be true.
>>>
>>> How do you tell front and popFront how many bytes to read?
>>
>> std.byLine() does it.
>
> Have you looked at how std.byLine works? It certainly does not use a range
> interface as a source.

It presents a range interface, though. Not a streaming one.

>
>> In general, you can read n bytes by calling empty, front, and popFront n times.
>
> I hope you are not serious! This will make D *the worst performing* i/o language.

You can read arbitrary numbers of bytes by tacking a range on after byChunk().
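Below is a sketch of reading arbitrary byte counts this way with Phobos pieces only. `joiner` flattens the chunks. `take` over a `refRange` pulls n bytes while advancing the original range. `readBytes` is a hypothetical helper name. Every request is assembled into a freshly allocated array one element at a time, even when it lies inside a single chunk. That copy is the extra buffering Steve objects to in this thread.

```d
import std.algorithm.iteration : joiner;
import std.array : array;
import std.range : refRange, take;
import std.range.primitives : isInputRange, ElementType;
import std.stdio : writeln;

// Copies the next n bytes (fewer at end of input) out of a byte range
// and advances the caller's range past them. refRange makes take()
// consume the original range instead of a copy of it.
ubyte[] readBytes(R)(ref R bytes, size_t n)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    return refRange(&bytes).take(n).array;
}

void main()
{
    // Stand-in for File("data.bin", "rb").byChunk(4096); joiner flattens
    // the chunks into a single range of bytes.
    ubyte[][] chunks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]];
    auto bytes = chunks.joiner;
    writeln(readBytes(bytes, 4));  // [1, 2, 3, 4], spans two chunks
    writeln(readBytes(bytes, 2));  // [5, 6]
    writeln(readBytes(bytes, 10)); // [7, 8, 9], input exhausted
}
```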
May 16, 2012
On Wed, 16 May 2012 13:21:37 -0400, Walter Bright <newshound2@digitalmars.com> wrote:

> On 5/16/2012 9:41 AM, Stewart Gordon wrote:
>> On 16/05/2012 16:59, Walter Bright wrote:
>>> On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
>>>> On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2@digitalmars.com>
>>>> wrote:
>>>>
>>>>> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>>>>>> I do agree for e.g. with binary data some data can't be read with ranges (when
>>>>>> you need to read small chunks of varying size),
>>>>>
>>>>> I don't see why that should be true.
>>>>
>>>> How do you tell front and popFront how many bytes to read?
>>>
>>> std.byLine() does it.
>>
>> And is what you want to do with a text file in many cases.
>>
>>> In general, you can read n bytes by calling empty, front, and popFront n times.
>>
>> Why would anybody want to read a large binary file _one byte at a time_?
>
> You can have that range read from byChunk(). It's really the same thing that C's stdio does.

This is very wrong.  byChunk doesn't cut it.  The number of bytes to consume from the stream can depend on any number of factors, including the actual data in the stream.  For instance, I challenge you to write an efficient (meaning no extra buffering) byLine using byChunk as a base.

-Steve
May 16, 2012
On Wed, 16 May 2012 13:23:07 -0400, Walter Bright <newshound2@digitalmars.com> wrote:

> On 5/16/2012 10:18 AM, Steven Schveighoffer wrote:
>> On Wed, 16 May 2012 11:59:37 -0400, Walter Bright <newshound2@digitalmars.com>
>> wrote:
>>
>>> On 5/16/2012 7:38 AM, Steven Schveighoffer wrote:
>>>> On Wed, 16 May 2012 09:50:12 -0400, Walter Bright <newshound2@digitalmars.com>
>>>> wrote:
>>>>
>>>>> On 5/15/2012 3:34 PM, Nathan M. Swan wrote:
>>>>>> I do agree for e.g. with binary data some data can't be read with ranges (when
>>>>>> you need to read small chunks of varying size),
>>>>>
>>>>> I don't see why that should be true.
>>>>
>>>> How do you tell front and popFront how many bytes to read?
>>>
>>> std.byLine() does it.
>>
>> Have you looked at how std.byLine works? It certainly does not use a range
>> interface as a source.
>
> It presents a range interface, though. Not a streaming one.

But that is *the point*!  The code deciding how much data to read (i.e. the entity I referenced above that 'tells front and popFront how many bytes to read') is *not* using a range interface.  In other words, ranges aren't enough.

Ranges can be built on top of streaming interfaces.  But there is *still* a need for a comprehensive streaming toolkit.  And C's streaming toolkit is not as good as a native D toolkit can be.
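The following sketch shows that layering under stated assumptions. `InputStream`, `MemoryStream`, and `ChunkRange` are hypothetical names. The stream contract is modeled on POSIX read(): return whatever is available, and 0 at end of stream. The range sits on top and exposes each read() result as front, so the stream decides the chunk sizes.

```d
import std.algorithm.comparison : min;
import std.stdio : writeln;

// Hypothetical minimal streaming primitive modeled on POSIX read():
// fill as much of buf as is available now and return the count,
// with 0 meaning end of stream.
interface InputStream
{
    size_t read(ubyte[] buf);
}

// In-memory stream, here only so the example has something to read.
class MemoryStream : InputStream
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// A range layered on top of the stream. Each front is whatever one
// read() call returned, so the stream, not the range, decides how
// many bytes arrive per step. The slice is reused by the next popFront.
struct ChunkRange
{
    private InputStream stream;
    private ubyte[] buffer;
    private ubyte[] current;

    this(InputStream source, size_t bufSize)
    {
        stream = source;
        buffer = new ubyte[](bufSize);
        popFront();
    }

    @property bool empty() const { return current.length == 0; }
    @property ubyte[] front() { return current; }
    void popFront() { current = buffer[0 .. stream.read(buffer)]; }
}

void main()
{
    auto s = new MemoryStream([1, 2, 3, 4, 5, 6, 7]);
    foreach (chunk; ChunkRange(s, 3))
        writeln(chunk); // [1, 2, 3], then [4, 5, 6], then [7]
}
```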

>>
>>> In general, you can read n bytes by calling empty, front, and popFront n times.
>>
>> I hope you are not serious! This will make D *the worst performing* i/o language.
>
> You can read arbitrary numbers of bytes by tacking a range on after byChunk().

No, this doesn't work in most cases.  See my other post.  You can't get everything you want out of just byChunk and byLine.

What about byMySpecificPacketProtocol?

-Steve
May 16, 2012
On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
> In other words, ranges aren't enough.

This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.
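Here is one possible reading of "BufferedRange"/"LookaheadRange". The name and the `ensure`/`consume` members are hypothetical, not a Phobos or proposed API. The sketch keeps the three range primitives and adds lookahead (`ensure(n)`) and explicit consumption (`consume(n)`). With those, a parser can peek across chunk boundaries without re-reading.

```d
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;

// Hypothetical sketch of a "buffered range" over a source of ubyte[]
// chunks: front exposes everything buffered so far, ensure(n) pulls
// chunks until at least n bytes are buffered (lookahead), and
// consume(n) discards bytes the caller has finished with.
struct BufferedRange(Chunks)
{
    private Chunks chunks;
    private ubyte[] buf;

    this(Chunks source) { chunks = source; }

    // Returns false if the source ran out before n bytes were buffered.
    bool ensure(size_t n)
    {
        while (buf.length < n && !chunks.empty)
        {
            buf ~= chunks.front;   // copy, so the source may reuse its buffer
            chunks.popFront();
        }
        return buf.length >= n;
    }

    void consume(size_t n) { buf = buf[n .. $]; }

    // Input-range primitives on top of the extended interface.
    @property bool empty() { return !ensure(1); }
    @property const(ubyte)[] front() { ensure(1); return buf; }
    void popFront() { consume(buf.length); }
}

auto bufferedRange(Chunks)(Chunks chunks) { return BufferedRange!Chunks(chunks); }

void main()
{
    ubyte[][] chunks = [[0xCA, 0xFE], [0xBA], [0xBE, 1, 2]];
    auto r = bufferedRange(chunks);
    // Look ahead 4 bytes across three chunks to check a magic number.
    if (r.ensure(4) && r.front[0 .. 4] == [0xCA, 0xFE, 0xBA, 0xBE])
        r.consume(4);
    writeln(r.front); // [1, 2]
}
```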

Andrei
May 16, 2012
tbh, I've found byChunk to be less than worthless
in my experience; it's a liability because I still
have to wrap it somehow to read real-world files.

Consider reading a series of strings in the format
<length><data>,[...].

I'd like it to be this simple (neglecting priming the loop):

string[] s;
while(!file.eof) {
    ubyte length = file.read!ubyte;          // wished-for API
    s ~= file.read!string(length);
}


The C fgetc/fread interface can do this reasonably
well.

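import core.stdc.stdio;
import std.exception : assumeUnique;

string[] s;
int c;
// feof() only turns true after a read has failed, so test fgetc's result.
while((c = fgetc(fp)) != EOF) {
   ubyte length = cast(ubyte) c;
   char[] buffer;
   buffer.length = length;
   if(fread(buffer.ptr, 1, length, fp) != length)
       break; // truncated record
   s ~= assumeUnique(buffer);
}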
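For comparison, std.stdio's `File.rawRead` can already express the wished-for loop with exact-size reads. This minimal sketch assumes the 1-byte length prefix from the format above. `readPrefixedStrings` is a hypothetical name.

```d
import std.exception : assumeUnique;
import std.stdio : File, writeln;

// Reads records of the form <1-byte length><length bytes of text>
// until end of file, using File.rawRead for exact-size reads.
string[] readPrefixedStrings(File f)
{
    string[] result;
    ubyte[1] len;
    while (f.rawRead(len[]).length == 1)
    {
        if (len[0] == 0)
        {
            result ~= "";           // skip rawRead: it throws on an empty buffer
            continue;
        }
        auto buf = new char[](len[0]);
        if (f.rawRead(buf).length != buf.length)
            break;                  // truncated final record
        result ~= assumeUnique(buf);
    }
    return result;
}

void main()
{
    import std.file : remove, tempDir, write;
    import std.path : buildPath;

    auto path = buildPath(tempDir, "prefixed_strings.bin");
    write(path, "\x03foo\x00\x02hi");
    scope (exit) remove(path);
    writeln(readPrefixedStrings(File(path, "rb"))); // ["foo", "", "hi"]
}
```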


But, doing it with byChunk is an exercise in pain
that I don't even feel like writing here.




Another problem: consider a network interface. You
want to handle the packets as they come in.

byChunk doesn't work at all because it blocks until it
gets the chunk of the requested size.

foreach(chunk; socket.byChunk(1024))


Suppose you get a packet of length 1000 and you have
to answer it. That loop will block forever: it waits for
24 more bytes, while the other side waits for your reply.

So, if you use byChunk as the underlying thing to fill
your buffer... you don't get anywhere.


I think a better input primitive is byPacket(max_size).
This works more like the operating system's read()
primitive: it returns whatever has arrived, up to
max_size bytes.

Moreover, I want it to buffer data and let me control how much of it is consumed.


auto packetSource = socket.byPacket(1024);
foreach(packet; packetSource) {
   // as soon as some data comes in we can get the length
   if(packet.length < 2) continue;
   auto length = packet.peek!(ushort); // neglect endian for now
   if(packet.length < length + 2) continue; // wait for more data

   packet.consume(2);
   handle(packet.consume(length));
}



In addition to the byChunk blocking problem, what if
the length prefix straddles a chunk boundary?
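The straddling problem goes away if incoming data is appended to a buffer and frames are split off only once they are complete. This minimal push-style sketch assumes a little-endian ushort length prefix, as in the byPacket example. `FrameBuffer`, `put`, and `next` are hypothetical names, not Phobos APIs. The sample deliveries deliberately split a length prefix across two pieces.

```d
import std.bitmanip : littleEndianToNative;
import std.stdio : writeln;
import std.string : representation;

// Accumulates bytes as they arrive (e.g. one receive() per call to put)
// and splits off complete frames of the form <ushort length><payload>.
// The length prefix is assumed to be little-endian. A prefix or payload
// that straddles two deliveries just waits in the buffer for the rest.
struct FrameBuffer
{
    private ubyte[] pending;

    void put(const(ubyte)[] data) { pending ~= data; }

    // Sets payload and returns true if a whole frame is buffered;
    // returns false (consuming nothing) if more data is needed.
    bool next(out ubyte[] payload)
    {
        if (pending.length < 2)
            return false;
        ubyte[2] prefix = pending[0 .. 2];
        immutable size_t len = littleEndianToNative!ushort(prefix);
        if (pending.length < 2 + len)
            return false;
        payload = pending[2 .. 2 + len].dup;
        pending = pending[2 + len .. $];
        return true;
    }
}

void main()
{
    FrameBuffer fb;
    // Frames "hi" and "abc", delivered so that the second frame's
    // length prefix is split between two deliveries.
    foreach (delivery; ["\x02\x00h", "i\x03", "\x00abc"])
    {
        fb.put(delivery.representation);
        ubyte[] frame;
        while (fb.next(frame))
            writeln(cast(const(char)[]) frame); // prints "hi", then "abc"
    }
}
```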



byChunk is just a huge hassle to work with for every file
format I've tried so far. byLine is a little better
(some file formats are defined as being line based)
but still a bit of a pain for anything that can spill
into two lines.
May 16, 2012
On Wed, 16 May 2012 13:48:49 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 5/16/12 12:34 PM, Steven Schveighoffer wrote:
>> In other words, ranges aren't enough.
>
> This is copiously clear to me, but the way I like to think about it is by extending the notion of range (with notions such as e.g. BufferedRange, LookaheadRange, and such) instead of developing an abstraction independent from ranges and then working on stitching that with ranges.

What I think we would end up with is a streaming API with range primitives tacked on.

- empty is clunky, but possible to implement.  However, it may become invalid (think of reading a file that is being appended to by another process).
- popFront and front do not have any clear definition of what they refer to.  The only valid thing I can think of is bytes, and then nobody will use them.

That's hardly saying it's "range based".  I refuse to believe that people will be thrilled by having to 'pre-configure' each front and popFront call in order to get work done.  If you want to try and convince me, I'm willing to listen, but so far I haven't seen anything that looks at all appetizing.

-Steve