[phobos] std.stdio : ByChunk for File
May 30, 2010
File has a byChunk method that returns chunks, but chunks is not a range,
so range-based code cannot work with it.
I would like a range version of chunks, like ByLine (and I hope byChunk will
return ByChunk).

The following code is a simple implementation.
-----
import std.exception : enforce;  // enforce() is used in popFront
import std.stdio     : File;

/**
 * Range that reads a chunk at a time.
 */
struct ByChunk
{
   private:
     File    file_;
     ubyte[] chunk_;


   public:
     this(File file, size_t size)
     in
     {
         assert(size, "size must be larger than 0");
     }
     body
     {
         file_  = file;
         chunk_ = new ubyte[](size);

         popFront();
     }

     /// Range primitive operations.
     @property bool empty() const
     {
         return !file_.isOpen;
     }

     /// ditto
     @property ubyte[] front()
     {
         return chunk_;
     }

     /// ditto
     void popFront()
     {
         enforce(file_.isOpen);

         chunk_ = file_.rawRead(chunk_);
         if (!chunk_.length)
             file_.detach();
     }
}
-----
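
As a rough check that the proposal behaves as an input range, here is a minimal sketch. It restates the struct compactly (dropping the `in`/`body` contract in favour of a plain assert), and the scratch file name and the `totalBytes` helper are only assumptions made for the demonstration.

```d
import std.algorithm : map, sum;
import std.exception : enforce;
import std.file : remove, write;
import std.range : isInputRange;
import std.stdio : File;

// Compact restatement of the proposed range.
struct ByChunk
{
    private File    file_;
    private ubyte[] chunk_;

    this(File file, size_t size)
    {
        assert(size, "size must be larger than 0");
        file_  = file;
        chunk_ = new ubyte[](size);
        popFront();
    }

    @property bool empty() const { return !file_.isOpen; }
    @property ubyte[] front() { return chunk_; }

    void popFront()
    {
        enforce(file_.isOpen);
        chunk_ = file_.rawRead(chunk_);
        if (!chunk_.length)
            file_.detach();
    }
}

// The struct satisfies the input range primitives.
static assert(isInputRange!ByChunk);

// Hypothetical helper: adds up the lengths of all chunks read from path.
size_t totalBytes(string path, size_t chunkSize)
{
    return ByChunk(File(path, "rb"), chunkSize).map!(c => c.length).sum;
}

void main()
{
    enum path = "bychunk_demo.tmp"; // assumed scratch file name
    write(path, new ubyte[](10));
    scope (exit) remove(path);
    assert(totalBytes(path, 4) == 10); // chunks of 4, 4 and 2 bytes
}
```

Note that front returns a slice of the same buffer on every step, so a caller that wants to keep a chunk has to copy it.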

What do you think?


Masahiro
May 30, 2010


I agree.  That's what I did for UnbufferedFile, which I hope to add to std.stdio:

http://github.com/kyllingstad/ltk/blob/master/ltk/stdio.d

-Lars

May 31, 2010
On Sun, 30 May 2010 19:05:25 +0900, Lars Tandle Kyllingstad <lars at kyllingen.net> wrote:

> I agree.  That's what I did for UnbufferedFile, which I hope to add to std.stdio:
>
> http://github.com/kyllingstad/ltk/blob/master/ltk/stdio.d
>
> -Lars
>

Your templated ByChunk is good!


Masahiro
June 01, 2010
> I want chunks of Range version like ByLine(and I hope byChunk returns ByChunk).

I want it too.

But ByChunk should be more generic IMO.  It should be an adaptor which converts any "bulk input" into a range.

This is a bulk input model (similar to the Source concept [*1]):
----------
struct BulkInput
{
    // Returns true if there is no more data
    bool eof();

    // Reads data in buf and returns the number of elements read
    size_t read(T[] buf);
    size_t read(U[] buf);
    ...
}
----------

File is a bulk input (currently File.read doesn't exist though).  Other possible examples are sockets and FIFO buffers.  All of them can be converted to ranges with generic ByChunk.
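
To make the adaptor idea concrete, here is a minimal sketch of how a generic version might look. Everything in it is an assumption for illustration: `ArraySource` is a made-up in-memory bulk input, `BulkChunks` and `bulkChunks` are hypothetical names, and only a single `read` overload for one element type is modelled.

```d
import std.algorithm : min;
import std.range : isInputRange;

// Hypothetical in-memory bulk input following the eof()/read() model.
struct ArraySource
{
    const(ubyte)[] data;

    // Returns true if there is no more data.
    bool eof() const { return data.length == 0; }

    // Copies up to buf.length elements into buf and returns the count.
    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Assumed adaptor: exposes any source with eof()/read(T[]) as an input range of T[].
struct BulkChunks(Source, T = ubyte)
{
    private Source source_;
    private T[]    buffer_; // reused for every chunk
    private T[]    chunk_;  // slice of buffer_ holding the current chunk
    private bool   done_;

    this(Source source, size_t size)
    {
        assert(size, "size must be larger than 0");
        source_ = source;
        buffer_ = new T[](size);
        popFront();
    }

    @property bool empty() const { return done_; }
    @property T[] front() { return chunk_; }

    void popFront()
    {
        assert(!done_, "popFront on an exhausted range");
        immutable n = source_.eof ? 0 : source_.read(buffer_);
        chunk_ = buffer_[0 .. n];
        done_  = (n == 0);
    }
}

// Convenience function so the source type is inferred.
auto bulkChunks(Source)(Source source, size_t size)
{
    return BulkChunks!Source(source, size);
}

// Hypothetical helper used to observe the chunking.
size_t[] chunkLengths(const(ubyte)[] data, size_t size)
{
    auto chunks = bulkChunks(ArraySource(data), size);
    static assert(isInputRange!(typeof(chunks)));
    size_t[] lengths;
    foreach (chunk; chunks)
        lengths ~= chunk.length;
    return lengths;
}

void main()
{
    assert(chunkLengths([1, 2, 3, 4, 5], 2) == [2, 2, 1]);
}
```

A socket or FIFO wrapper would only need the same two members to plug into the adaptor unchanged.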

Here's some (messy) code that I played around with: http://code.google.com/p/kabe/source/browse/trunk/etc/test_bulkinput_z.d


[*1] Boost: the Source concept http://www.boost.org/doc/libs/1_43_0/libs/iostreams/doc/concepts/source.html


Shin

> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
August 26, 2010
Hi Masa, everyone,


Yes please. ByChunk should be a range. Please commit the proposed change. We can then discuss further generalizations.

Andrei

August 26, 2010
Regarding UnbufferedFile, I think we should sit down and analyze again what we want to achieve. I understand it's not easy to remember the buffering mode for a file once opened, but since File intercepts fopen() and setvbuf(), it should be easy to add a field to File that tells whether it uses buffering or not, and to set buffering upon opening the file. I don't think an entire new type is justified here.
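
One way to picture the flag-based approach is the sketch below. `TrackedFile` and `writeUnbuffered` are hypothetical names and not part of std.stdio; the sketch only assumes the existing `File.setvbuf` call and the C `_IONBF` mode, and records the chosen mode in a field at open time.

```d
import core.stdc.stdio : _IONBF;
import std.file : getSize, remove;
import std.stdio : File;

// Hypothetical wrapper that remembers the buffering mode it selected.
struct TrackedFile
{
    File file;
    bool buffered = true;

    this(string name, string mode, bool buffered)
    {
        file = File(name, mode);
        this.buffered = buffered;
        if (!buffered)
            file.setvbuf(0, _IONBF); // turn off stdio buffering right after opening
    }
}

// Hypothetical helper: writes payload through an unbuffered file and
// returns the resulting file size.
ulong writeUnbuffered(string path, const(ubyte)[] payload)
{
    auto f = TrackedFile(path, "wb", false);
    assert(!f.buffered);
    f.file.rawWrite(payload);
    f.file.close();
    return getSize(path);
}

void main()
{
    enum path = "tracked_demo.tmp"; // assumed scratch file name
    scope (exit) remove(path);
    assert(writeUnbuffered(path, [1, 2, 3]) == 3);
}
```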

Andrei

On 5/30/10 3:05 PDT, Lars Tandle Kyllingstad wrote:
> I agree.  That's what I did for UnbufferedFile, which I hope to add to std.stdio:
>
> http://github.com/kyllingstad/ltk/blob/master/ltk/stdio.d
>
> -Lars
>
August 26, 2010
On 5/31/10 10:00 PDT, Shin Fujishiro wrote:
>> I want chunks of Range version like ByLine(and I hope byChunk returns
>> ByChunk).
>
> I want it too.
>
> But ByChunk should be more generic IMO.  It should be an adaptor which converts some "bulk input" to range.
>
> This is a bulk input model (similar to the Source concept*1):
> ----------
> struct BulkInput
> {
>      // Returns true if there is no more data
>      bool eof();
>
>      // Reads data in buf and returns the number of elements read
>      size_t read(T[] buf);
>      size_t read(U[] buf);
>      ...
> }
> ----------

What types are T and U?

Andrei
August 27, 2010
I've been thinking some more about this, and I think you're right. Creating an entirely new type for this one use case may be overkill.  A buffering-on/-off flag in File should suffice.

Another point worth noting is that we only need the input stream to not be buffered.  For the output and error streams, spawnProcess() would just call flush() before passing them on to the child process.

Steven, do you have any comments?

-Lars




August 27, 2010
I don't really understand the concern for adding another type... We add types all the time for less important reasons.

What a separate type does is move the error from runtime to compile-time.  When spawning a process, it's going to be annoying if trying to start a process sometimes throws a runtime error for a possibly trivial problem.  With a compiler error, the user is made aware of what is going to happen.  If we go the runtime route, I'd elect to just ignore the fact that the buffer is not passed, and have some sort of guideline in spawn's docs ("if you open a file with the intent of passing it to spawn, make sure you open it with no buffer!").
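
The difference between the two routes can be sketched as follows. All names here (`UnbufferedInput`, `FlaggedInput`, `spawnTyped`, `spawnFlagged`) are hypothetical stand-ins rather than anything in Phobos; the point is only where the mistake gets reported.

```d
import std.exception : assertThrown, enforce;

// Hypothetical stand-ins for the two designs under discussion.
struct UnbufferedInput { int fd; }                // separate type
struct FlaggedInput    { int fd; bool buffered; } // one type with a flag

// Separate-type route: a buffered stream cannot even be passed.
void spawnTyped(UnbufferedInput input) {}

// Flag route: the mistake only surfaces when the call runs.
void spawnFlagged(FlaggedInput input)
{
    enforce(!input.buffered, "input stream must be unbuffered");
}

// Passing the wrong kind of stream is rejected at compile time.
static assert(!__traits(compiles, spawnTyped(FlaggedInput(0, true))));

void main()
{
    spawnTyped(UnbufferedInput(0));
    spawnFlagged(FlaggedInput(0, false));
    assertThrown(spawnFlagged(FlaggedInput(0, true))); // fails only at run time
}
```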

-Steve



August 28, 2010
Andrei Alexandrescu <andrei at erdani.com> wrote:
> What types are T and U?

They were meant as placeholders, although I can't recall why I wrote multiple read overloads.  That didn't make much sense.


Shin