[phobos] UnbufferedFile, or, abstracting the File ranges
May 10, 2010
In the process of designing std.process it has become obvious, as pointed out by Steve, that Phobos needs facilities for unbuffered I/O. To that end, I've started writing an UnbufferedFile type, the current status of which can be seen here:

        Code: http://github.com/kyllingstad/ltk/blob/master/ltk/stdio.d
        Docs: http://kyllingen.net/code/ltk/doc/stdio.html

(Disclaimer: This is very much a work-in-progress, there's lots of stuff that needs to be added yet, and I'd be surprised if there wasn't lots of room for improvement, performance-wise.)


Now, while writing this it has kind of annoyed me that I have to write new implementations of the byLine and byChunk ranges.  I've personally found them incredibly useful, so I want them in UnbufferedFile, but the ones in std.stdio are tailored for File.

I therefore suggest we try to abstract these ranges, so they can operate
on general types that define a set of primitives such as read(), readc()
and readln().
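
To make the proposal concrete, here is a minimal sketch of a chunk range written against a primitive instead of against File. The isByteSource constraint, the read(ubyte[]) signature, and the MemorySource type are assumptions made for illustration; none of them come from std.stdio or from the ltk code linked above.

```d
import std.algorithm.comparison : min;

// Assumed primitive: any type with `size_t read(ubyte[])` counts as a source.
enum isByteSource(S) = is(typeof(S.init.read((ubyte[]).init)) : size_t);

// Input range that refills one reusable buffer on every popFront.
struct ByChunk(S)
if (isByteSource!S)
{
    private S* source;      // the source must outlive the range
    private ubyte[] buffer;
    private ubyte[] chunk;

    this(ref S src, size_t chunkSize)
    {
        source = &src;
        buffer = new ubyte[](chunkSize);
        popFront(); // load the first chunk
    }

    @property bool empty() const { return chunk.length == 0; }
    @property ubyte[] front() { return chunk; }
    void popFront() { chunk = buffer[0 .. source.read(buffer)]; }
}

auto byChunk(S)(ref S src, size_t chunkSize)
{
    return ByChunk!S(src, chunkSize);
}

// Toy in-memory source (hypothetical), used only to exercise the range.
struct MemorySource
{
    const(ubyte)[] data;

    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

void main()
{
    import std.stdio : writeln;

    ubyte[] bytes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    auto src = MemorySource(bytes);
    foreach (chunk; byChunk(src, 4))
        writeln(chunk);
}
```

Any other type that offers the same read primitive, buffered or not, could be passed to byChunk unchanged, which is the kind of reuse the proposal is after.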

Are there problems with this?  Any comments?

-Lars

May 10, 2010
Re: byLine and byChunk, I don't think these are a good idea on unbuffered files.

For example, your current implementation will be extremely slow.  Reading one char at a time is OK on a buffered file, because most of the time it's just a simple fetch of a char from a buffer.  But your implementation reads a single character at a time from the actual file on disk, a very slow operation.

I think unbuffered files are good for when you want to handle the buffering yourself, or when you want to pass them to child processes.
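
The cost difference can be illustrated with a rough sketch that counts how often the underlying source is asked for data. CountingSource and drain are hypothetical names, and each read() call merely stands in for one trip to the operating system; no real file is involved.

```d
import std.algorithm.comparison : min;

// Stand-in for an unbuffered file: every read() call models one OS-level read.
struct CountingSource
{
    const(ubyte)[] data;
    size_t calls;

    size_t read(ubyte[] buf)
    {
        ++calls;
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Drains the data through a buffer of the given size and
// returns how many read() calls that took.
size_t drain(const(ubyte)[] data, size_t bufferSize)
{
    auto src = CountingSource(data);
    auto buf = new ubyte[](bufferSize);
    while (src.read(buf) > 0) {}
    return src.calls;
}

void main()
{
    import std.stdio : writefln;

    auto data = new ubyte[](4096);
    writefln("1-byte buffer:    %s read() calls", drain(data, 1));
    writefln("4096-byte buffer: %s read() calls", drain(data, 4096));
}
```

In both cases the last call is the one that returns 0 to signal the end of the data. A buffered file amortizes the expensive calls in the same way the larger buffer does here.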

-Steve

May 10, 2010
Well, that would at least mean less work for me. :)

Which I/O methods should it contain, then, in your opinion?  Would

        bool read(ref ubyte b);
        size_t read(ref ubyte[] b);
        void write(ubyte b);
        void write(ubyte[] b);

suffice?

-Lars

May 10, 2010
I would define only 2 i/o functions:

size_t read(void[] b);
size_t write(const(void)[] b);

And then the various paraphernalia around it (close, open, etc).

The reason to use void[] is because any array data type can be passed to it without casting (imagine you wanted to read an array of ints).

Reading and writing a single byte should be discouraged with unbuffered streams.  This is how it is in most I/O libs.  You build your unbuffered I/O to abstract the OS functions, then build your buffered I/O and fancy functionality on top of it.
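
A minimal sketch of the two-primitive interface, assuming a hypothetical in-memory MemoryStream in place of a real OS handle. It relies only on the language rule that T[] converts implicitly to void[] and that the length of a void[] is counted in bytes.

```d
// Hypothetical stream exposing only the two proposed primitives.
struct MemoryStream
{
    ubyte[] storage;
    size_t pos;

    // Appends the raw bytes of b; returns the number of bytes written.
    size_t write(const(void)[] b)
    {
        storage ~= cast(const(ubyte)[]) b;
        return b.length;
    }

    // Fills b with up to b.length bytes; returns the number of bytes read.
    size_t read(void[] b)
    {
        immutable available = storage.length - pos;
        immutable n = b.length < available ? b.length : available;
        (cast(ubyte[]) b)[0 .. n] = storage[pos .. pos + n];
        pos += n;
        return n;
    }
}

void main()
{
    int[] outgoing = [1, 2, 3, 4];
    MemoryStream s;
    s.write(outgoing);                 // int[] -> const(void)[], no cast at the call site

    auto incoming = new int[](4);
    immutable n = s.read(incoming);    // int[] -> void[], no cast at the call site

    assert(n == 4 * int.sizeof);       // counts are in bytes, not elements
    assert(incoming == outgoing);
}
```

The casts live inside the stream, once, instead of at every call site.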

-Steve

May 10, 2010
On Mon, May 10, 2010 at 4:24 PM, Steve Schveighoffer <schveiguy at yahoo.com> wrote:
> I would define only 2 i/o functions:
>
> size_t read(void[] b);
> size_t write(const(void)[] b);
>
> And then the various paraphernalia around it (close, open, etc).
>
> The reason to use void[] is because any array data type can be passed to it without casting (imagine you wanted to read an array of ints).
>
> Reading and writing a single byte should be discouraged with unbuffered streams.  This is how it is in most I/O libs.  You build your unbuffered I/O to abstract the OS functions, then build your buffered I/O and fancy functionality on top of it.
>
> -Steve
>

I think it should be byte[], not void[]. First, byte[] arrays aren't scanned for pointers by the GC. Second, void[] hijacks type safety. I believe you should need an explicit cast, even if you are sure about the type of data in the file (throw in endianness if you are still not convinced).
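
The endianness concern can be sketched as follows, assuming the file is known to hold big-endian 32-bit integers. decodeBigEndian is a hypothetical helper; bigEndianToNative is the existing std.bitmanip function.

```d
import std.bitmanip : bigEndianToNative;

// Decodes consecutive big-endian 32-bit integers from raw bytes.
// The conversion from bytes to uint is explicit, so the byte order is too.
uint[] decodeBigEndian(const(ubyte)[] raw)
{
    assert(raw.length % uint.sizeof == 0, "truncated input");

    auto result = new uint[](raw.length / uint.sizeof);
    foreach (i, ref value; result)
    {
        // Copy one 4-byte word out of the raw data, then convert it.
        ubyte[uint.sizeof] word = raw[i * uint.sizeof .. (i + 1) * uint.sizeof];
        value = bigEndianToNative!uint(word);
    }
    return result;
}

void main()
{
    import std.stdio : writeln;

    ubyte[] raw = [0, 0, 0, 1, 0, 0, 1, 0];
    writeln(decodeBigEndian(raw));
}
```

Reading the same bytes straight into a uint[] through a void[] parameter would give the same values only on a big-endian machine.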
May 10, 2010
On Mon, May 10, 2010 at 3:40 PM, Lars Tandle Kyllingstad <lars at kyllingen.net> wrote:
> In the process of designing std.process it has become obvious, as pointed out by Steve, that Phobos needs facilities for unbuffered I/O. To that end, I've started writing an UnbufferedFile type, the current status of which can be seen here:
>
>         Code: http://github.com/kyllingstad/ltk/blob/master/ltk/stdio.d
>         Docs: http://kyllingen.net/code/ltk/doc/stdio.html
>
> (Disclaimer: This is very much a work-in-progress, there's lots of stuff that needs to be added yet, and I'd be surprised if there wasn't lots of room for improvement, performance-wise.)
>
>
> Now, while writing this it has kind of annoyed me that I have to write new implementations of the byLine and byChunk ranges.  I've personally found them incredibly useful, so I want them in UnbufferedFile, but the ones in std.stdio are tailored for File.
>
> I therefore suggest we try to abstract these ranges, so they can operate
> on general types that define a set of primitives such as read(), readc()
> and readln().
>
> Are there problems with this?  Any comments?
>
> -Lars
>

This is a bit of a bike-shed and an off-topic discussion, but I think it's time to drop the C-style "r", "b", "w+" archaisms in favor of type-safe enums. It might also be better to cover access rights, i.e. what access to the file is available while it is being used by our process (read, write, remove, etc.).
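
One possible shape for such an enum, sketched under assumptions: OpenMode, its member names, and toCMode are hypothetical and not part of Phobos. The mapping targets the standard C fopen mode strings and leaves out the access-rights part.

```d
// Hypothetical flag set replacing the "r", "w+", "b" mode strings.
enum OpenMode : uint
{
    read     = 1 << 0,
    write    = 1 << 1,
    append   = 1 << 2,
    truncate = 1 << 3,
}

// Maps a flag combination onto the equivalent binary-mode fopen string.
string toCMode(uint mode)
{
    immutable r = (mode & OpenMode.read) != 0;
    immutable w = (mode & OpenMode.write) != 0;

    if (mode & OpenMode.append)
        return r ? "a+b" : "ab";
    if (w && (mode & OpenMode.truncate))
        return r ? "w+b" : "wb";
    if (r && w)
        return "r+b";
    if (r)
        return "rb";

    // Write-only without truncation has no fopen equivalent.
    throw new Exception("unsupported mode combination");
}

void main()
{
    import std.stdio : writeln;

    writeln(toCMode(OpenMode.read | OpenMode.write));
    writeln(toCMode(OpenMode.write | OpenMode.truncate));
}
```

A misspelled flag name is rejected at compile time, whereas a misspelled mode string only fails when the file is opened.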
May 10, 2010
Yeah, the ubyte functions were just examples.  My intention was to use templates:

  size_t read(T)(ref T[] b);
  size_t write(T)(T[] b);

Then you get a sensible error message if the number of raw bytes read isn't a multiple of the size of your target type.  Also, the returned number is the length of the resulting array, and not the number of raw bytes read.
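
A sketch of how such a typed read might sit on top of a raw byte-level primitive. readTyped and MemorySource are hypothetical names, and the free-function form is an assumption; the signatures above are member templates.

```d
import std.algorithm.comparison : min;
import std.exception : enforce;

// Toy raw source (hypothetical) with a byte-level read primitive.
struct MemorySource
{
    const(ubyte)[] data;

    size_t read(void[] b)
    {
        immutable n = min(b.length, data.length);
        (cast(ubyte[]) b)[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Reads into b, shrinks b to the elements actually read,
// and returns the element count instead of the byte count.
size_t readTyped(T, S)(ref S source, ref T[] b)
{
    immutable bytes = source.read(b); // T[] converts implicitly to void[]
    enforce(bytes % T.sizeof == 0,
        "raw byte count is not a multiple of " ~ T.stringof ~ ".sizeof");
    b = b[0 .. bytes / T.sizeof];
    return b.length;
}

void main()
{
    import std.stdio : writeln;

    auto src = MemorySource(new ubyte[](8));
    auto ints = new int[](4);
    writeln(readTyped(src, ints), " ints read");
}
```

In this sketch the sensible error message comes from enforce, which throws as soon as the source delivers a partial element.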

Thanks for the tips! :)

-Lars

May 10, 2010
The type says nothing about whether it's scanned for pointers or not.  Remember that it's the caller supplying the array, the void[] type just says "pass any type of array in".  The function should not reallocate the array (and even if it did, the scanning bits are copied from the original, not determined by the type).

In other words, the GC cares nothing about type, it only cares what bits are set in the memory block.  And those bits are set on allocation, not when a cast is made or a parameter is passed.
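
This can be checked with a small sketch using core.memory. isNoScan is a hypothetical helper; GC.query and GC.BlkAttr.NO_SCAN are the existing druntime names. The sketch assumes the default garbage collector, which allocates arrays of pointer-free element types with NO_SCAN set.

```d
import core.memory : GC;

// True when the GC block holding p is flagged as containing no pointers.
bool isNoScan(const(void)* p)
{
    return (GC.query(p).attr & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    auto bytes = new ubyte[](64);   // element type holds no pointers
    auto pointers = new void*[](8); // element type is a pointer

    void[] erased = bytes;          // same memory, viewed as void[]

    assert(isNoScan(bytes.ptr));
    assert(isNoScan(erased.ptr));   // the view's type changed, the block's bits did not
    assert(!isNoScan(pointers.ptr));
}
```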

And on hijacking type safety, there is no type safety when reading and writing a stream.  Stream data comes in or goes out as untyped data, so I think using void[] is actually more accurate to what is happening.

-Steve

May 10, 2010
On Mon, 2010-05-10 at 16:43 +0400, Denis wrote:
> On Mon, May 10, 2010 at 3:40 PM, Lars Tandle Kyllingstad <lars at kyllingen.net> wrote:
> > In the process of designing std.process it has become obvious, as pointed out by Steve, that Phobos needs facilities for unbuffered I/O. To that end, I've started writing an UnbufferedFile type, the current status of which can be seen here:
> >
> > [...]
> 
> A little bit of a bike-shed and an off-topic discussion, but I think it's time to drop the C-style "r", "b", "w+" archaism in favor of type safe enums. It might be better to also cover access rights - what access is available to the file while it is being used by our process (read, write, remove etc).


Actually, I was planning to do just that -- the enum thing, that is. But I did a little investigation, and found that the file mode is specified with a string in Python too, and I thought:  If C and Python actually agree about something, who am I to argue? :)

-Lars

May 11, 2010
Steve Schveighoffer wrote:
> Re: byLine and byChunk, I don't think these are a good idea on unbuffered files.

byLine is onerous because it reads one character at a time, but byChunk makes a lot of sense. It essentially buffers straight into the user's buffers. A lot of Unix utilities do that.

> For example, your current implementation will be extremely slow. Reading one char at a time is OK on a buffered file, because most times it's just a simple fetch of a char from a buffer.  But your implementation reads a single character at a time from the actual file on disk, a very slow operation.
> 
> I think unbuffered files are good for when you want to handle the buffering yourself, or when you want to pass them to child processes.

Actually, a great use case for character-level unbuffered files is when you want to do something for each keypress off a tty.


Andrei