February 18, 2016
On 2/18/16 2:53 PM, Wyatt wrote:
> On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:

>> But the concept of what constitutes an "item" in a stream may not be
>> the "element type". That's what I'm getting at.
>>
> Hmm, I guess I'm not seeing it.  Like, what even is an "item" in a
> stream?  It sort of precludes that by definition, which is why we have
> to give it a type manually.  What benefit is there to giving the buffer
> type separately from the window that gives you a typed slice into it? (I
> like that, btw.)

An "item" in a stream may be a line of text, it may be a packet of data, it may actually be a byte. But the compiler requires we type the buffer as something rigid that it can work with.

The elements of the stream are the basic fixed-sized units we use (the array element type). The items are less concrete.
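
For instance, over a char buffer the elements are always chars, but a line "item" is just whatever slice a search carves out of the window. Roughly (just an illustration, not code from the library):

import std.string : indexOf;

// The buffer stays typed as fixed-size elements (char); an "item" (here a
// line) is only a slice that some higher-level search decides on.
const(char)[] takeLine(ref const(char)[] window)
{
    auto nl = window.indexOf('\n');
    if (nl < 0)
        return null;                  // no complete item buffered yet
    auto item = window[0 .. nl];
    window = window[nl + 1 .. $];     // drop the consumed elements
    return item;
}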

>> And I think parsing/processing stream data works better by examining
>> the buffer than shoehorning range functions in there.
>>
> I think it's debatable.  But part of stream semantics is being able to
> use it like a stream, and my BER toy was in that vein. Sorry again, this
> is probably not the place for it unless you try to replace the
> std.stream for real.

I think stream semantics are what you should use. I haven't used std.stream, so I don't know what the API looks like.

I assumed as! was something that returns a range of that type. Maybe I'm wrong?

-Steve
February 18, 2016
On Thu, Feb 18, 2016 at 03:20:58PM -0500, Steven Schveighoffer via Digitalmars-d wrote:
> On 2/18/16 2:53 PM, Wyatt wrote:
> >On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:
> 
> >>But the concept of what constitutes an "item" in a stream may not be the "element type". That's what I'm getting at.
> >>
> >Hmm, I guess I'm not seeing it.  Like, what even is an "item" in a stream?  It sort of precludes that by definition, which is why we have to give it a type manually.  What benefit is there to giving the buffer type separately from the window that gives you a typed slice into it? (I like that, btw.)
> 
> An "item" in a stream may be a line of text, it may be a packet of data, it may actually be a byte. But the compiler requires we type the buffer as something rigid that it can work with.
> 
> The elements of the stream are the basic fixed-sized units we use (the array element type). The items are less concrete.
[...]

But array elements don't necessarily have to be fixed-sized, do they? For example, an array of lines can be string[] (or const(char)[][]). Of course, dealing with variable-sized items is messy, and probably rather annoying to implement.  But it's *possible*, in theory.


T

-- 
People tell me that I'm paranoid, but they're just out to get me.
February 18, 2016
On 2/18/16 4:02 PM, H. S. Teoh via Digitalmars-d wrote:
> On Thu, Feb 18, 2016 at 03:20:58PM -0500, Steven Schveighoffer via Digitalmars-d wrote:
>> On 2/18/16 2:53 PM, Wyatt wrote:
>>> On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:
>>
>>>> But the concept of what constitutes an "item" in a stream may not be
>>>> the "element type". That's what I'm getting at.
>>>>
>>> Hmm, I guess I'm not seeing it.  Like, what even is an "item" in a
>>> stream?  It sort of precludes that by definition, which is why we
>>> have to give it a type manually.  What benefit is there to giving the
>>> buffer type separately from the window that gives you a typed slice
>>> into it? (I like that, btw.)
>>
>> An "item" in a stream may be a line of text, it may be a packet of
>> data, it may actually be a byte. But the compiler requires we type the
>> buffer as something rigid that it can work with.
>>
>> The elements of the stream are the basic fixed-sized units we use (the
>> array element type). The items are less concrete.
> [...]
>
> But array elements don't necessarily have to be fixed-sized, do they?
> For example, an array of lines can be string[] (or const(char)[][]). Of
> course, dealing with variable-sized items is messy, and probably rather
> annoying to implement.  But it's *possible*, in theory.

But the point of a stream is that it's contiguous data. A string[] has contiguous data that are pointers and lengths of a fixed size (sizeof(string) is fixed).

This is not how you'd get data from a file or socket.

Since this library doesn't discriminate what the data source provides (it will accept string[] as window type), it's possible. In this case, the element type might make sense as the range front type, but it's not a typical case. However, it might be interesting as, say, a message stream from one thread to another.

-Steve
February 18, 2016
On Wednesday, 17 February 2016 at 06:45:41 UTC, Steven Schveighoffer wrote:
> It's no secret that I've been looking to create an updated io library for phobos. In fact, I've been working on one on and off since 2011 (ouch).
>
> ...

Hi everyone, it's been a while.

I wanted to chime in on the streams-as-ranges thing, since I've thought about this quite a bit in the past and discussed it with Wyatt outside of the forum.

Steve: My apologies in advance if I misunderstood any of the functionality of your IO library.  I haven't read any of the documentation, just this thread, and my time is over-committed as usual.

Anyhow...

I believe that when I am dealing with streams, >90% of the time I am dealing with data that is *structured* and *heterogeneous*.  Here are some use-cases:
1. Parsing/writing configuration files (ex: XML, TOML, etc)
2. Parsing/writing messages from some protocol, possibly over a network socket (or sockets).  Example: I am writing a PostgreSQL client and need to deserialize messages: http://www.postgresql.org/docs/9.2/static/protocol-message-formats.html
3. Serializing/deserializing some data structures to/from disk.  Example: I am writing a game and I need to implement save/load functionality.
4. Serializing/deserializing tabular data to/from disk (ex: .CSV files).
5. Reading/writing binary data, such as images or video, from/to disk.  This will probably involve doing a bunch of (3), which is kind of like (2), but followed by large homogenous arrays of some data (ex: pixels).
6. Receiving unstructured user input.  This is my <10%.

Note that (6) is likely to happen eventually but also likely to be minuscule: why are we receiving user input?  Maybe it's just to store it for retrieval later.  BUT, maybe we actually want it to DO something.  If we want it to do something, then we need to structure it before code will be able to operate on it.

(5) is a mix of structured heterogeneous data and structured homogenous data.  In aggregate, this is structured heterogeneous data, because you need to do parsing to figure out where the arrays of homogeneous data start and end (and what they *mean*).

This is why I think it will be much more important to have at least these two interfaces take front-and-center:
A.  The presence of a .popAs!(...) operation (mentioned by Wyatt in this thread, IIRC) for simple deserialization, and maybe for other miscellaneous things like structured user interaction.
B.  The ability to attach parsers to streams easily.  This might be as easy as coercing the input stream into the basic encoding that the parser expects (ex: char/wchar/dchar Ranges for compilers, or maybe ubyte Ranges for our PostgreSQL client's network layer), though it might need (A) to help a bit first if the encoding isn't known in advance (text files can be represented in sooo many ways!  isn't it fabulous!).
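
To make (A) concrete against use case (2), pulling one message off a PostgreSQL-style stream could read almost like the protocol description. Everything below is hypothetical (popAs and popExactly are invented names, not anyone's actual API):

// PostgreSQL frames each message as a 1-byte type tag, then a big-endian
// Int32 length that includes itself (but not the tag), then the body.
void handleOneMessage(Stream)(ref Stream s)
{
    auto msgType = s.popAs!ubyte;            // message type tag
    auto len     = s.popAs!uint;             // assume popAs handles byte order
    auto payload = s.popExactly(len - 4);    // raw body (invented primitive)
    // ... dispatch on msgType and deserialize payload ...
}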

I understand that most unsuspecting programmers will arrive at a stream library expecting to immediately see an InputRange interface.  This /probably/ is not what they really want at the end of the day.  So, I think it will be very important for any such library to concisely and convincingly explain the design methodology and rationale early and aggressively.  Neglect to do this, and the library and its documentation will become a frustration and a violation of expectations (an "astonishment").  Do it right, and the library's documentation will become a teaching tool that leaves visitors feeling enlightened and empowered.

Of course, I have to wonder if someone else has contrasting experiences with stream use-cases.  Maybe they really would be frustrated with a range-agnostic design.  I don't want to alienate this hypothetical individual either, so if this is you, then please share your experiences.

I hope this helps and is worth making a bunch of you read a wall of text ;)

- Chad
February 18, 2016
On 2/18/16 6:52 PM, Chad Joan wrote:
> Steve: My apologies in advance if I misunderstood any of the
> functionality of your IO library.  I haven't read any of the
> documentation, just this thread, and my time is over-committed as usual.

Understandable.

>
> Anyhow...
>
> I believe that when I am dealing with streams, >90% of the time I am
> dealing with data that is *structured* and *heterogeneous*. Here are
> some use-cases:
> 1. Parsing/writing configuration files (ex: XML, TOML, etc)
> 2. Parsing/writing messages from some protocol, possibly over a network
> socket (or sockets).  Example: I am writing a PostgreSQL client and need
> to deserialize messages:
> http://www.postgresql.org/docs/9.2/static/protocol-message-formats.html
> 3. Serializing/deserializing some data structures to/from disk. Example:
> I am writing a game and I need to implement save/load functionality.
> 4. Serializing/deserializing tabular data to/from disk (ex: .CSV files).
> 5. Reading/writing binary data, such as images or video, from/to disk.
> This will probably involve doing a bunch of (3), which is kind of like
> (2), but followed by large homogenous arrays of some data (ex: pixels).
> 6. Receiving unstructured user input.  This is my <10%.
>
> Note that (6) is likely to happen eventually but also likely to be
> minuscule: why are we receiving user input?  Maybe it's just to store it
> for retrieval later.  BUT, maybe we actually want it to DO something.
> If we want it to do something, then we need to structure it before code
> will be able to operate on it.
>
> (5) is a mix of structured heterogeneous data and structured homogenous
> data.  In aggregate, this is structured heterogeneous data, because you
> need to do parsing to figure out where the arrays of homogeneous data
> start and end (and what they *mean*).
>
> This is why I think it will be much more important to have at least
> these two interfaces take front-and-center:
> A.  The presence of a .popAs!(...) operation (mentioned by Wyatt in this
> thread, IIRC) for simple deserialization, and maybe for other
> miscellaneous things like structured user interaction.

To me, this is a higher-level function. popAs cannot assume to know how to read what it is reading. If you mean something like reading an entire struct in binary form, that's not difficult to do.
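
For example, over a ubyte-windowed pipe it's just "extend until enough elements are buffered, copy, release". An untested sketch (it assumes extend returns the number of new elements added, and it only handles types with no indirections):

import core.stdc.string : memcpy;
import std.traits : hasIndirections;

T readRaw(T, Chain)(ref Chain c) if (!hasIndirections!T)
{
    while (c.window.length < T.sizeof)
    {
        if (c.extend(0) == 0)       // assumption: 0 new elements means EOF
            throw new Exception("stream ended before a full " ~ T.stringof);
    }
    T result = void;
    memcpy(&result, c.window.ptr, T.sizeof);  // copy out, no alignment games
    c.release(T.sizeof);            // consume the elements we used
    return result;
}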

> B.  The ability to attach parsers to streams easily.  This might be as
> easy as coercing the input stream into the basic encoding that the
> parser expects (ex: char/wchar/dchar Ranges for compilers, or maybe
> ubyte Ranges for our PostgreSQL client's network layer), though it might
> need (A) to help a bit first if the encoding isn't known in advance
> (text files can be represented in sooo many ways!  isn't it fabulous!).

This is the fundamental goal for my library -- enabling parsers to read data from a "stream" efficiently no matter how that data is sourced. I know your time is limited, but I would invite you to take a look at the convert program example that I created in my library. In it, I handle converting any UTF format to any other UTF format.

https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d

>
> I understand that most unsuspecting programmers will arrive at a stream
> library expecting to immediately see an InputRange interface.  This
> /probably/ is not what they really want at the end of the day.  So, I
> think it will be very important for any such library to concisely and
> convincingly explain the design methodology and rationale early and
> aggressively.  Neglect to do this, and the library and its
> documentation will become a frustration and a violation of expectations
> (an "astonishment"). Do it right, and the library's documentation will
> become a teaching tool that leaves visitors feeling enlightened and
> empowered.

Good points! I will definitely spend some time explaining this.

> Of course, I have to wonder if someone else has contrasting experiences
> with stream use-cases.  Maybe they really would be frustrated with a
> range-agnostic design.  I don't want to alienate this hypothetical
> individual either, so if this is you, then please share your experiences.
>
> I hope this helps and is worth making a bunch of you read a wall of text ;)

Thanks for taking the time.

-Steve
February 19, 2016
On Thursday, 18 February 2016 at 18:27:28 UTC, Steven Schveighoffer wrote:
> The philosophy that I settled on is to create an iopipe that extends one "item" at a time, even if more are available. Then, apply the range interface on that.
>
> When I first started to write byLine, I made it a range. Then I thought, "what if you wanted to iterate by 2 lines at a time, or iterate by one line at a time, but see the last 2 for context?", well, then that would be another type, and I'd have to abstract out the functionality of line searching.

You mean the window has a current element and context - lookahead and lookbehind? I stumbled across this article: http://blog.jooq.org/2016/01/06/2016-will-be-the-year-remembered-as-when-java-finally-had-window-functions/ It suggests that such a window abstraction is generally useful for data analysis.
February 19, 2016
Steven, this is superb!

Some 10+ years ago, I talked to the Tango guys when they were working on the I/O part of the Tango library and told them that, in my head, the ideal abstraction for any I/O work is a pipe, and that I would build an I/O library around this abstraction instead of the Channel in Java or the Conduit in Tango (well, we all know Tango borrowed ideas from the Java API).

Your work is precisely what I was talking about. Well-done!

February 19, 2016
On 2/19/16 5:22 AM, Kagamin wrote:
> On Thursday, 18 February 2016 at 18:27:28 UTC, Steven Schveighoffer wrote:
>> The philosophy that I settled on is to create an iopipe that extends
>> one "item" at a time, even if more are available. Then, apply the
>> range interface on that.
>>
>> When I first started to write byLine, I made it a range. Then I
>> thought, "what if you wanted to iterate by 2 lines at a time, or
>> iterate by one line at a time, but see the last 2 for context?", well,
>> then that would be another type, and I'd have to abstract out the
>> functionality of line searching.
>
> You mean the window has a current element and context - lookahead and
> lookbehind? I stumbled across this article:
> http://blog.jooq.org/2016/01/06/2016-will-be-the-year-remembered-as-when-java-finally-had-window-functions/
> It suggests that such a window abstraction is generally useful for data
> analysis.

window doesn't have any "current" pointer. The window itself is the current data. But with byLine, you could potentially remember where the last N lines were delineated. Hm...

auto byLineWithContext(size_t extraLines = 1, Chain)(Chain c)
{
   auto input = byLine(c);
   static struct ByLineWithContext
   {
      typeof(input) chain;
      size_t[extraLines] prevLines;
      auto front() { return chain.window[prevLines[$-1] .. $]; }
      void popFront()
      {
          auto offset = prevLines[0];
          foreach(i; 0 .. prevLines.length-1)
          {
              prevLines[i] = prevLines[i+1] - offset;
          }
          prevLines[$-1] = chain.window.length - offset;
          chain.release(offset);
          chain.extend(0); // extend in the next line
      }
      bool empty()
      {
          return chain.window.length == prevLines[$-1];
      }
      // previous line of context (i = 0 is the oldest context line)
      auto contextLine(size_t i)
      {
          assert(i < prevLines.length);
          return chain.window[i == 0 ? 0 : prevLines[i-1] .. prevLines[i]];
      }
   }
   return ByLineWithContext(input);
}
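
Usage would then be something like this (openLog is a made-up stand-in for however the underlying char-windowed chain gets built; the context slices are simply empty until enough lines have gone by):

import std.stdio : writeln;

auto lines = openLog("build.log").byLineWithContext!2;
for (; !lines.empty; lines.popFront)
{
    writeln(lines.contextLine(0));   // oldest context line
    writeln(lines.contextLine(1));   // most recent context line
    writeln("--> ", lines.front);    // current line
}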

It's an interesting transition to think of the primitive you have as an entire buffer of data rather than a pointer to a single position in a stream.

-Steve
February 19, 2016
On 2/19/16 6:27 AM, Dejan Lekic wrote:
> Steven, this is superb!
>
> Some 10+ years ago, I talked to the Tango guys when they were working on
> the I/O part of the Tango library and told them that, in my head, the
> ideal abstraction for any I/O work is a pipe, and that I would build an
> I/O library around this abstraction instead of the Channel in Java or the
> Conduit in Tango (well, we all know Tango borrowed ideas from the Java API).
>
> Your work is precisely what I was talking about. Well-done!
>

Thanks! It is definitely true that my time with Tango opened up my eyes to how I/O could be better. I actually wrote the ThreadPipe conduit: https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/device/ThreadPipe.d

This is one of those libraries where the source code is almost writing itself. I feel like I got it right :) Took 5 tries though...

-Steve
February 19, 2016
On Friday, 19 February 2016 at 01:29:15 UTC, Steven Schveighoffer wrote:
> On 2/18/16 6:52 PM, Chad Joan wrote:
>> ...
>>
>> This is why I think it will be much more important to have at least
>> these two interfaces take front-and-center:
>> A.  The presence of a .popAs!(...) operation (mentioned by Wyatt in this
>> thread, IIRC) for simple deserialization, and maybe for other
>> miscellaneous things like structured user interaction.
>
> To me, this is a higher-level function. popAs cannot assume to know how to read what it is reading. If you mean something like reading an entire struct in binary form, that's not difficult to do.
>

I think I understand what you mean.  We are entering the problem domain of serializing and deserializing arbitrary types.

I think what I'd expect is to have the basic language types (ubyte, int, char, string, etc) all covered, and to provide some way (or ways) to integrate with serialization code provided by other types.  So you can do ".popAs!int" out of the box, but ".popAs!MyType" will require MyType to provide a .deserialize member function.  Understandably, this may require some thought (ex: what if MyType is already under constraints from some other API that expects serialization? what does this look like if there are multiple serialization frameworks? etc etc).  I don't have the answer right now and I don't expect it to be solved quickly ;)
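
In other words, roughly this shape; every name below is hypothetical and only meant to show the dispatch, not a real API:

import std.traits : hasIndirections;

T popAs(T, Stream)(ref Stream s)
{
    static if (is(typeof(T.deserialize(s)) : T))
    {
        // The type opted in by providing its own deserialization hook.
        return T.deserialize(s);
    }
    else static if (!hasIndirections!T)
    {
        // Built-in handling for plain value types; popRaw is an invented
        // stand-in for whatever low-level primitive the library provides.
        return s.popRaw!T();
    }
    else
        static assert(0, "Don't know how to deserialize " ~ T.stringof);
}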

>> B.  The ability to attach parsers to streams easily.  This might be as
>> easy as coercing the input stream into the basic encoding that the
>> parser expects (ex: char/wchar/dchar Ranges for compilers, or maybe
>> ubyte Ranges for our PostgreSQL client's network layer), though it might
>> need (A) to help a bit first if the encoding isn't known in advance
>> (text files can be represented in sooo many ways!  isn't it fabulous!).
>
> This is the fundamental goal for my library -- enabling parsers to read data from a "stream" efficiently no matter how that data is sourced. I know your time is limited, but I would invite you to take a look at the convert program example that I created in my library. In it, I handle converting any UTF format to any other UTF format.
>
> https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d
>

Awesome!

>>
>> I understand that most unsuspecting programmers will arrive at a stream
>> library expecting to immediately see an InputRange interface.  This
>> /probably/ is not what they really want at the end of the day.
>>  So, I
>> think it will be very important for any such library to concisely and
>> convincingly explain the design methodology and rationale early and
>> aggressively.  Neglect to do this, and the library and its
>> documentation will become a frustration and a violation of expectations
>> (an "astonishment"). Do it right, and the library's documentation will
>> become a teaching tool that leaves visitors feeling enlightened and
>> empowered.
>
> Good points! I will definitely spend some time explaining this.
>

Best of luck :)

>> Of course, I have to wonder if someone else has contrasting experiences
>> with stream use-cases.  Maybe they really would be frustrated with a
>> range-agnostic design.  I don't want to alienate this hypothetical
>> individual either, so if this is you, then please share your experiences.
>>
>> I hope this helps and is worth making a bunch of you read a wall of text ;)
>
> Thanks for taking the time.
>
> -Steve

Thank you for making progress on this problem!

- Chad