May 11, 2010
I think these could be actually byByte and byLine.

Regarding your initial question - generally the range definition "belongs" to the file/container as it has intimate contact with it. You could find incidental commonality and exploit it, but conceptually they should appear as independent types.


Andrei

Lars Tandle Kyllingstad wrote:
> Well, that would at least mean less work for me. :)
> 
> Which I/O methods should it contain, then, in your opinion?  Would
> 
>         bool read(ref ubyte b);
>         size_t read(ref ubyte[] b);
>         void write(ubyte b);
>         void write(ubyte[] b);
> 
> suffice?
> 
> -Lars
> 
> 
> 
> On Mon, 2010-05-10 at 05:02 -0700, Steve Schveighoffer wrote:
>> Re: byLine and byChunk, I don't think these are a good idea on unbuffered files.
>>
>> For example, your current implementation will be extremely slow. Reading one char at a time is OK on a buffered file, because most times it's just a simple fetch of a char from a buffer.  But your implementation reads a single character at a time from the actual file on disk, a very slow operation.
>>
>> I think unbuffered files are good for when you want to handle the buffering yourself, or when you want to pass them to child processes.
>>
>> -Steve
>>
>>
>>
>>
>> ______________________________________________________________________
>> From: Lars Tandle Kyllingstad <lars at kyllingen.net>
>> To: Phobos mailing list <phobos at puremagic.com>
>> Sent: Mon, May 10, 2010 7:40:15 AM
>> Subject: [phobos] UnbufferedFile, or, abstracting the File ranges
>>
>> In the process of designing std.process it has become obvious, as pointed out by Steve, that Phobos needs facilities for unbuffered I/O. To that end, I've started writing an UnbufferedFile type, the current status of which can be seen here:
>>
>>         Code: http://github.com/kyllingstad/ltk/blob/master/ltk/stdio.d
>>         Docs: http://kyllingen.net/code/ltk/doc/stdio.html
>>
>> (Disclaimer: This is very much a work-in-progress, there's lots of stuff that needs to be added yet, and I'd be surprised if there wasn't lots of room for improvement, performance-wise.)
>>
>>
>> Now, while writing this it has kind of annoyed me that I have to write new implementations of the byLine and byChunk ranges.  I've personally found them incredibly useful, so I want them in UnbufferedFile, but the ones in std.stdio are tailored for File.
>>
>> I therefore suggest we try to abstract these ranges, so they can operate on general types that define a set of primitives such as read(), readc() and readln().
>>
>> Are there problems with this?  Any comments?
>>
>> -Lars
>>
>> _______________________________________________
>> phobos mailing list
>> phobos at puremagic.com
>> http://lists.puremagic.com/mailman/listinfo/phobos
May 11, 2010
Do you mean byByte and byChunk?

-Lars





May 11, 2010
Yah, sorry. So byByte, byChar, and byChunk make sense for unbuffered input, but byLine is onerous.

Andrei

Lars Tandle Kyllingstad wrote:
> Do you mean byByte and byChunk?
> 
> -Lars
> 
> 
> 
May 11, 2010
Yes, you are right.  byChunk is OK.  byByte is still not: making a syscall for each byte is not very fast, no matter what the source of the input.

I would surmise that better than byByte is byVariableChunk (for lack of a better term) -- read as much as possible up to n bytes.  This should be good for tty inputs and the most efficient for not-always-available input.  byChunk I would imagine should block until it gets exactly the chunk size or EOF, whereas byVariableChunk would block only until it gets at least one byte.

You can always do byChunk with a chunk size of 1 if you insist on byte-by-byte reading :)  I just don't think it's worth the special case.

byChar is along the same lines as byLine: it requires parsing each byte.
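
To make the byChunk / byVariableChunk distinction concrete, here is a minimal sketch. Everything in it is hypothetical: BurstSource is a made-up stand-in for an unbuffered source that hands out only a few bytes per call, and readFull / readSome are illustrative names, not anything from Phobos or from Lars's code.

```d
import std.algorithm : min;

// Hypothetical stand-in for an unbuffered source (tty, pipe, socket) that
// delivers at most `burst` bytes per call, as a raw read may do.
struct BurstSource
{
    const(ubyte)[] data;   // bytes not yet delivered
    size_t burst;          // maximum bytes handed out by one read
    size_t calls;          // number of simulated syscalls so far

    // Returns the number of bytes stored in buf; 0 means end of input.
    size_t read(void[] buf)
    {
        ++calls;
        immutable n = min(buf.length, burst, data.length);
        (cast(ubyte[]) buf)[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// byChunk-style: keep reading until buf is full or the input ends.
size_t readFull(ref BurstSource src, ubyte[] buf)
{
    size_t filled;
    while (filled < buf.length)
    {
        immutable n = src.read(buf[filled .. $]);
        if (n == 0)
            break;         // end of input
        filled += n;
    }
    return filled;
}

// byVariableChunk-style: one read, returning whatever arrived.
size_t readSome(ref BurstSource src, ubyte[] buf)
{
    return src.read(buf);
}

void main()
{
    ubyte[] input = [1, 2, 3, 4, 5, 6, 7];
    auto buf = new ubyte[4];

    auto a = BurstSource(input, 3);
    assert(readFull(a, buf) == 4);   // two reads: 3 bytes, then 1
    assert(a.calls == 2);

    auto b = BurstSource(input, 3);
    assert(readSome(b, buf) == 3);   // returns after the first burst
    assert(b.calls == 1);
}
```

In this sketch, byChunk with a chunk size of 1 is just readFull on a one-byte buffer, which is why a separate byByte may not need its own special case.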

-Steve



----- Original Message ----
> From: Andrei Alexandrescu <andrei at erdani.com>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Sent: Tue, May 11, 2010 11:14:55 AM
> Subject: Re: [phobos] UnbufferedFile, or, abstracting the File ranges
> 
> Steve Schveighoffer wrote:
> > Re: byLine and byChunk, I don't think these are a good idea on unbuffered files.
> 
> byLine is onerous because it reads one character at a time, but byChunk makes a lot of sense. It essentially is buffering straight in user's buffers. A lot of Unix utilities do that.
> 
> > For example, your current implementation will be extremely slow.  Reading one char at a time is OK on a buffered file, because most times it's just a simple fetch of a char from a buffer.  But your implementation reads a single character at a time from the actual file on disk, a very slow operation.
> > 
> > I think unbuffered files are good for when you want to handle the buffering yourself, or when you want to pass them to child processes.
> 
> Actually a great use case is character-level unbuffered files when you want to do something for each keypress off a tty.
> 
> 
> Andrei



May 11, 2010
Steve Schveighoffer wrote:
> Yes, you are right.  byChunk is OK.  byByte is still not: making a syscall for each byte is not very fast, no matter what the source of the input.

The point is that sometimes you need exactly that kind of control, the most trivial example being "press any key".


Andrei
May 11, 2010
Denis wrote:
> On Mon, May 10, 2010 at 4:24 PM, Steve Schveighoffer <schveiguy at yahoo.com> wrote:
>> I would define only 2 i/o functions:
>>
>> size_t read(void[] b);
>> size_t write(const(void)[] b);
>>
>> And then the various paraphernalia around it (close, open, etc).
>>
>> The reason to use void[] is because any array data type can be passed to it without casting (imagine you wanted to read an array of ints).
>>
>> Reading and writing a single byte should be discouraged with unbuffered streams.  This is how it is in most I/O libs.  You build your unbuffered I/O to abstract the OS functions, then build your buffered I/O and fancy functionality on top of it.
>>
>> -Steve
>>
> 
> I think it should be byte[], not void[]. First, byte[] arrays aren't scanned for pointers by the GC. Second, void[] hijacks type safety. I believe you need an explicit cast, even if you are sure about the type of data in the file (throw in endianness if you are still not convinced).

This is an interesting point. It essentially raises a question of what situations are best use cases for void[].


Andrei
May 11, 2010
Lars Tandle Kyllingstad wrote:
> Yeah, the ubyte functions were just examples.  My intention was to use templates:
> 
>   size_t read(T)(ref T[] b);
>   size_t write(T)(T[] b);
> 
> Then you get a sensible error message if the number of raw bytes read isn't a multiple of the size of your target type.  Also, the returned number is the length of the resulting array, and not the number of raw bytes read.
> 
> Thanks for the tips! :)

That would make the functions type-unsafe. You'd need to limit T to types that have no pointers (at least). Whoa, serialization rears its ugly head.
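
A minimal sketch of such a restriction, assuming std.traits.hasIndirections is an acceptable test for "has no pointers". MemSource and readElems are hypothetical names used only for illustration; they are not the UnbufferedFile API.

```d
import std.traits : hasIndirections;

// Hypothetical in-memory source, standing in for an unbuffered file.
struct MemSource
{
    const(ubyte)[] data;

    // Copies up to buf.length raw bytes; returns the number copied.
    size_t read(void[] buf)
    {
        immutable n = buf.length < data.length ? buf.length : data.length;
        (cast(ubyte[]) buf)[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Templated read that refuses, at compile time, element types containing
// pointers, references, or arrays.
size_t readElems(T)(ref MemSource src, T[] buf)
    if (!hasIndirections!T)
{
    immutable bytes = src.read(buf);   // T[] converts implicitly to void[]
    return bytes / T.sizeof;           // whole elements; a trailing partial one is not counted
}

struct S { int* p; }

void main()
{
    ubyte[] raw = [1, 0, 0, 0, 2, 0, 0, 0];
    auto src = MemSource(raw);
    auto ints = new int[2];

    assert(readElems(src, ints) == 2);   // 8 raw bytes make 2 ints

    static assert(!hasIndirections!int);
    static assert(hasIndirections!S);
    // Reading into S[] is rejected before it can overwrite a pointer.
    static assert(!__traits(compiles, readElems(src, new S[1])));
}
```

The constraint only rules out pointer-bearing types; it says nothing about endianness or layout, so it is a partial answer at best.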

Andrei
May 11, 2010
Lars Tandle Kyllingstad wrote:
> On Mon, 2010-05-10 at 16:43 +0400, Denis wrote:
>> On Mon, May 10, 2010 at 3:40 PM, Lars Tandle Kyllingstad <lars at kyllingen.net> wrote:
>>> In the process of designing std.process it has become obvious, as pointed out by Steve, that Phobos needs facilities for unbuffered I/O. To that end, I've started writing an UnbufferedFile type, the current status of which can be seen here:
>>>
>>> [...]
>> A little bit of a bike-shed and an off-topic discussion, but I think it's time to drop the C-style "r", "b", "w+" archaism in favor of type-safe enums. It might be better to also cover access rights - what access is available to the file while it is being used by our process (read, write, remove etc).
> 
> 
> Actually, I was planning to do just that -- the enum thing, that is. But I did a little investigation, and found that the file mode is specified with a string in Python too, and I thought:  If C and Python actually agree about something, who am I to argue? :)

I had that question to solve when working on std.stdio.File and took the path of least resistance. Now that we want to make File work with more general files, perhaps we could complement the string-style flags with an enum.
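
One way an enum could sit alongside the string flags, as a rough sketch only. FileMode and toStdioMode are hypothetical names, not part of std.stdio, and the set of members is an assumption made for illustration.

```d
// Hypothetical enum; member names are illustrative only.
enum FileMode { read, write, append, readWrite }

// Maps each member to the C-style mode string that string-based APIs accept.
string toStdioMode(FileMode mode)
{
    final switch (mode)
    {
        case FileMode.read:      return "rb";
        case FileMode.write:     return "wb";
        case FileMode.append:    return "ab";
        case FileMode.readWrite: return "r+b";
    }
}

void main()
{
    assert(toStdioMode(FileMode.read) == "rb");
    assert(toStdioMode(FileMode.readWrite) == "r+b");

    // A misspelled member fails to compile, unlike a misspelled mode string.
    static assert(!__traits(compiles, toStdioMode(FileMode.raed)));
}
```

Because the enum maps onto the existing strings, the string-style constructor could stay as it is and the enum overload would simply forward to it.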

Andrei

May 11, 2010



----- Original Message ----
> From: Andrei Alexandrescu <andrei at erdani.com>
> Denis wrote:
> > I think it should be byte[], not void[]. First, byte[] aren't scanned for pointers by GC. Second, it hijacks type safety. I believe you need an explicit cast, even if you are sure about the type of data in the file (throw in an Endianness if you are still not convinced).
> 
> This is an interesting point. It essentially raises a question of what situations are best use cases for void[].

It has been debated forever.

The compiler allows implicit casting to void[], but it does not allow implicit casting to byte[] or ubyte[].  So whenever a function's job is just to fill in data from an untyped source (such as a stream) or write data to an untyped sink, void[] is the preferred option.

Whenever you want to *own* an untyped buffer, for example a buffer for buffered I/O, ubyte[] is preferred because of the GC noscan flag associated with it.  However, you can present that buffer as a void[] as necessary.

I guess what I'm saying is, void[] is for interfacing, ubyte[] is for storage.
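
A small sketch of that split. rawWrite here is a hypothetical sink written for this example (it is not File.rawWrite); the static asserts only check the implicit-conversion rules described above.

```d
// Hypothetical sink standing in for an unbuffered write: it accepts any
// array type without a cast at the call site.
size_t rawWrite(const(void)[] data)
{
    return data.length;   // pretend every byte was written
}

void main()
{
    int[] ints = [1, 2, 3];

    // Interfacing: int[] converts implicitly to const(void)[] ...
    static assert(is(int[] : const(void)[]));
    // ... but not to ubyte[], which would need an explicit cast.
    static assert(!is(int[] : ubyte[]));

    // The length seen through void[] is a byte count, not an element count.
    assert(rawWrite(ints) == 3 * int.sizeof);

    // Storage: own the buffer as ubyte[], present it as void[] when needed.
    ubyte[] storage = new ubyte[64];
    void[] view = storage;
    assert(view.length == storage.length);
    assert(view.ptr is cast(void*) storage.ptr);   // same memory, no copy
}
```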

-Steve




May 12, 2010
On Tue, 2010-05-11 at 11:18 -0700, Andrei Alexandrescu wrote:
> Lars Tandle Kyllingstad wrote:
> > Yeah, the ubyte functions were just examples.  My intention was to use templates:
> > 
> >   size_t read(T)(ref T[] b);
> >   size_t write(T)(T[] b);
> > 
> > Then you get a sensible error message if the number of raw bytes read isn't a multiple of the size of your target type.  Also, the returned number is the length of the resulting array, and not the number of raw bytes read.
> > 
> > Thanks for the tips! :)
> 
> That would make the functions type-unsafe. You'd need to limit T to types that have no pointers (at least). Whoa, serialization rears its ugly head.


But you'd have the same problem with void[], wouldn't you?  I mean, this compiles and runs just fine:

        void read(void[] buf) { }

        struct S { int* p; }

        auto a = new S[3];
        read(a);

At least with templates one can restrict T to built-in types, and if anyone wants to read/write compound types they can just set T to void. This has the added advantage of making it very explicit that if you do that, you're on your own.

Another potential gotcha with

        size_t read(void[]);

is that even if you pass it an array of ints, say, the return value will
still be the number of raw bytes read (which, to make matters worse, may
or may not be a multiple of int.sizeof).

I would also like to point out that all the read/write functions of
std.stdio.File, including rawRead() and rawWrite(), are templated.  Have
there been any problems with these?

-Lars