March 06, 2013
07-Mar-2013 00:52, Timon Gehr пишет:
> On 03/05/2013 08:12 PM, Dmitry Olshansky wrote:
>> ...
>>
>> There is one thing I found a nice abstraction while helping out on D's
>> lexer in D and I call it mark-slice range. An extension to forward range
>> it seems.
>>
>> It's all about buffering and defining a position in input such that you
>> don't care for anything up to this point. This means that starting from
>> thusly marked point stuff needs to be kept in buffer, everything prior
>> to it could be discarded. The 2nd operation "slice" is getting a slice
>> of some internal buffer from last mark to the current position.
>> ...
>
> The lexer I have built last year does something similar. It allows the
> parser to save and restore sorted positions in FIFO order with one
> size_t of memory inside the parser's current stack frame (internally,
> the lexer only saves the first position). The data is kept in a circular
> buffer that grows dynamically in case the required lookahead is too large.

Exactly. Nice to see common patterns resurface, would be good to fit it elegantly into a native D i/o subsystem.

-- 
Dmitry Olshansky
March 07, 2013
On 06/03/2013 16:36, Steven Schveighoffer wrote:
> On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768@gmail.com> wrote:
<snip>
>> Create a range operation like "r.takeArray(n)". You can optimize it to
>> take a slice of the buffer when possible.
>
> This is not a good idea.  We want streams to be high performance.
> Accepting any range, such as a dchar range that outputs one dchar at a
> time, is not going to be high performance.
<snip>

That certain specific types of range can't implement a given operation efficiently isn't a reason to reject the idea.

If somebody tries using takeArray on a range that by its very nature can only pick off elements one by one, they should expect it to be as slow as a for loop.  OTOH, when used on a file, array or similar structure, it will perform much better than this.

But thinking about it now, maybe what we need is the concept of a "block input" range, which is an input range with the addition of the takeArray method.  Of course, standard D arrays would be block input ranges.  Then (for example) a library that reads a binary file format can be built to accept a block input range of bytes.

Stewart.
March 07, 2013
On Wednesday, 6 March 2013 at 16:36:38 UTC, Steven Schveighoffer wrote:
> On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768@gmail.com> wrote:
>>
>> Ranges aren't necessarily higher- or lower-level than streams; they're completely orthogonal ways of looking at a data source. It's completely possible to build a stream interface on top of a range of characters, which is what I was suggesting. In that situation, the range is at a lower level of abstraction than the stream is.
>
> I think you misunderstand.  Ranges absolutely can be a source for streams, especially if they are arrays.  The point is that the range *interface* doesn't make a good stream interface.  So we need to invent new methods to access streams.

Although I probably didn't communicate it very well, my idea was that since we already have functions like std.conv.parse that essentially provide parts of a stream interface on top of ranges, the most convenient way to implement a stream might be to build it on top of a range interface so no code duplication is needed.

>> Create a range operation like "r.takeArray(n)". You can optimize it to take a slice of the buffer when possible.
>
> This is not a good idea.  We want streams to be high performance.  Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance.

If the function is optimized, it can essentially bypass the range layer and operate directly on the buffer while using the same interface it would use if it were operating on the range. As I understand it, some of the operations in Phobos do that as well when given arrays.

>
> On top of that, in some cases, the result will be a slice, in some cases it will be a copy.  Generic code will have to figure out that difference if it wants to save the data for later, or else risk double copying.
>

That could definitely be an issue. It should be possible to enforce slicing semantics somehow, but I'd have to think about it.

>>
>> Range operations like std.conv.parse implicitly progress their source ranges.
>
> That's not a range operation.  Range operations are empty, popFront, front.  Anything built on top of ranges must use ONLY these three operations, otherwise you are talking about something else.

I guess that's not the right terminology for what I'm trying to express. I was thinking of "operations that act on ranges."


>> From what I see, at least in terms of the interface, a stream is basically just a generalization of a range that supports more than one type as input/output. There's no reason that such a system couldn't be built on top of a range, especially when the internal representation is of a single type: characters.
>
> streams shouldn't have to support the front/popFront mechanism.
>  empty may be the only commonality.  I think that is an awkward fit for ranges.  Certainly it is possible to take a *specific* range, such as an array, and add a stream-like interface to it.
>  But not ranges in general.

I hadn't considered the case of r.front; I was only thinking about r.popFront. Looks like they're a little more different than I was thinking, but they're still very similar under certain conditions.

Ultimately, we do need some type of a traditional stream interface; I was just thinking about using ranges behind the scenes and using existing pieces of the standard library for stream operations rather than putting all of the operations into a unified data type. I'm not sure if it could really be called an "ideal" design, but I do think that it could provide a good minimalist solution with performance that would be acceptable for at least many applications.

March 07, 2013
>
> That certain specific types of range can't implement a given operation efficiently isn't a reason to reject the idea.
>
> If somebody tries using takeArray on a range that by its very nature can only pick off elements one by one, they should expect it to be as slow as a for loop.  OTOH, when used on a file, array or similar structure, it will perform much better than this.
>
> But thinking about it now, maybe what we need is the concept of a "block input" range, which is an input range with the addition of the takeArray method.  Of course, standard D arrays would be block input ranges.  Then (for example) a library that reads a binary file format can be built to accept a block input range of bytes.
>
> Stewart.

That's basically what my thinking was, but you've expressed it in a better way than I think I could have. I'd definitely like to see this idea implemented; it could be useful for just about anything involving a buffer.
March 07, 2013
On Wed, 06 Mar 2013 19:08:40 -0500, Stewart Gordon <smjg_1998@yahoo.com> wrote:

> On 06/03/2013 16:36, Steven Schveighoffer wrote:
>> On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768@gmail.com> wrote:
> <snip>
>>> Create a range operation like "r.takeArray(n)". You can optimize it to
>>> take a slice of the buffer when possible.
>>
>> This is not a good idea.  We want streams to be high performance.
>> Accepting any range, such as a dchar range that outputs one dchar at a
>> time, is not going to be high performance.
> <snip>
>
> That certain specific types of range can't implement a given operation efficiently isn't a reason to reject the idea.

Sorry, but that is.

If we make it so streams are implicitly built out of low-performance ranges, they will be built out of low performance ranges.

There is always a mechanism to build a stream out of a range, it shouldn't be implicit.  Not every range makes a good stream.

> But thinking about it now, maybe what we need is the concept of a "block input" range, which is an input range with the addition of the takeArray method.  Of course, standard D arrays would be block input ranges.  Then (for example) a library that reads a binary file format can be built to accept a block input range of bytes.

I don't really understand the need to make ranges into streams.  Streams require a completely separate interface.  An object can be a range and a stream (e.g. array), but to say a stream is a specific kind of range, when ranges have nothing significant that streams need (front, popFront), is just "range fever".  Not everything is a range.

The range interface and the stream interface are orthogonal.  There is no overlap.

-Steve
March 07, 2013
Am Thu, 07 Mar 2013 07:07:25 -0500
schrieb "Steven Schveighoffer" <schveiguy@yahoo.com>:

> 
> The range interface and the stream interface are orthogonal.  There is no overlap.
> 
> -Steve

Exactly. There's also some precedent for this: C# Enumerators (IEnumerator) are basically the same thing as input ranges (current/front, moveNext/popFront) but C# has streams nevertheless. C++ also has iterators and streams.
March 07, 2013
On Wed, 06 Mar 2013 20:15:31 -0500, BLM768 <blm768@gmail.com> wrote:

> On Wednesday, 6 March 2013 at 16:36:38 UTC, Steven Schveighoffer wrote:
>> On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768@gmail.com> wrote:
>>>
>>> Ranges aren't necessarily higher- or lower-level than streams; they're completely orthogonal ways of looking at a data source. It's completely possible to build a stream interface on top of a range of characters, which is what I was suggesting. In that situation, the range is at a lower level of abstraction than the stream is.
>>
>> I think you misunderstand.  Ranges absolutely can be a source for streams, especially if they are arrays.  The point is that the range *interface* doesn't make a good stream interface.  So we need to invent new methods to access streams.
>
> Although I probably didn't communicate it very well, my idea was that since we already have functions like std.conv.parse that essentially provide parts of a stream interface on top of ranges, the most convenient way to implement a stream might be to build it on top of a range interface so no code duplication is needed.

My point is, we should not build streams from ranges.  We have to establish terminology here.  A range is an API which provides a way to iterate over each element in a source using the methods front, popFront, and empty.

A basic stream provides a single function: read.  This function reads N bytes into an array, and advances the stream position.  Not a range, an array.  That is the basic building block that the OS gives us.  You can make read out of front, popFront, and empty, but it's going to be horribly low-performing, and I see no benefit to have read sit alongside the range primitives.

On top of that, we provide a buffered stream which manages the array the lower-level stream outputs, and allows access to data a chunk at a time.  What defines that chunk is application-specific.

At a higher level is where ranges and streams meet.  front can provide access to a chunk, popFront can move on to the next chunk, and empty maps to EOF (last read returned 0 bytes).  That is a great mapping, and I expect it will be the preferred interface.  What I want to provide with std.io is an easy way to build ranges on top of streams by defining a mechanism to build the chunk.

But to say that streams are ranges at heart is incorrect.  Streams need the read feature, they don't need range features.

Now, if you want to shoehorn a range into a stream, I certainly can see how it will be possible.  Extremely slow, but possible.  That should be the last resort.  It shouldn't be the foundation.

There is the temptation to say "hey, arrays are ranges, and arrays make good stream sources!  Why can't all ranges make good stream sources?"  But arrays are good stream sources NOT because they are ranges, but because they are arrays.  Reading an array into an array is a noop.

>
>>> Create a range operation like "r.takeArray(n)". You can optimize it to take a slice of the buffer when possible.
>>
>> This is not a good idea.  We want streams to be high performance.  Accepting any range, such as a dchar range that outputs one dchar at a time, is not going to be high performance.
>
> If the function is optimized, it can essentially bypass the range layer and operate directly on the buffer while using the same interface it would use if it were operating on the range. As I understand it, some of the operations in Phobos do that as well when given arrays.

This is the wrong track to take.

There have been quite a few people in the D community that have advocated for the syntax:

int[] arr;

auto p = 5 in arr;

Just like AAs.  It looks great!  Why shouldn't we have a way to search for data with such a concise interface?  The problem is then that diminishes the value of 'in'.  For AAs, this lookup is O(1) amortized, For an array, it's O(n).  This means any time a coder sees x in y, he has to consider whether that is a "slow lookup" or a "quick lookup".  Not only that, but generic code that uses the in operation has to insert caveats "this function is O(n) if T is an array, otherwise it's O(1)".  The situation is not something we want.

But if you still want to find 5 in arr, there is the not-as-nice, but certainly reasonable looking:

auto p = arr.find(5).ptr;

My point is, we don't want any range to substitute for a stream.  I think it might be worth considering accepting random-access ranges, or slice-assignable ranges to be stream sources, but not just any range.  We could provide a "RangeStream" type which shoehorns any range into a stream, but I'd want it tucked in some shadowy corner of Phobos, not to be used except in emergencies when nothing else will do.  It should be discouraged.

>>> Range operations like std.conv.parse implicitly progress their source ranges.
>>
>> That's not a range operation.  Range operations are empty, popFront, front.  Anything built on top of ranges must use ONLY these three operations, otherwise you are talking about something else.
>
> I guess that's not the right terminology for what I'm trying to express. I was thinking of "operations that act on ranges."

What I don't want is to accept ranges as streams.  For example, if we have an isInputStream trait, it should not accept ranges.  But you certainly can use existing phobos functions to shoehorn ranges into a stream-like API.

> Ultimately, we do need some type of a traditional stream interface; I was just thinking about using ranges behind the scenes and using existing pieces of the standard library for stream operations rather than putting all of the operations into a unified data type. I'm not sure if it could really be called an "ideal" design, but I do think that it could provide a good minimalist solution with performance that would be acceptable for at least many applications.

I hope my above comments have made clear that I am not against having ranges be forcibly changed into streams.  What I don't want is ranges implicitly treated as streams.  Certainly, we have a lot of existing range-processing code that could be leveraged.  But streams and ranges are different concepts, different APIs even.  Building bridges between the two should be possible, and ranges will make great interfaces to streams.

-Steve
March 08, 2013
On Thursday, 7 March 2013 at 12:42:23 UTC, Steven Schveighoffer wrote:
>>
>> If the function is optimized, it can essentially bypass the range layer and operate directly on the buffer while using the same interface it would use if it were operating on the range. As I understand it, some of the operations in Phobos do that as well when given arrays.
>
> This is the wrong track to take.
>
> There have been quite a few people in the D community that have advocated for the syntax:
>
> int[] arr;
>
> auto p = 5 in arr;
>
> Just like AAs.  It looks great!  Why shouldn't we have a way to search for data with such a concise interface?  The problem is then that diminishes the value of 'in'.  For AAs, this lookup is O(1) amortized, For an array, it's O(n).  This means any time a coder sees x in y, he has to consider whether that is a "slow lookup" or a "quick lookup".  Not only that, but generic code that uses the in operation has to insert caveats "this function is O(n) if T is an array, otherwise it's O(1)".  The situation is not something we want.

Maybe "takeArray" is a bad design, but it was just an example. The "block input"/"slice-assignable" range idea would still work well, though.

> We could provide a "RangeStream" type which shoehorns any range into a stream, but I'd want it tucked in some shadowy corner of Phobos, not to be used except in emergencies when nothing else will do.  It should be discouraged.
>

One of my main reasons for wanting ranges as the input was to allow this sort of an interface. This looks like a usable solution for that need.

> I hope my above comments have made clear that I am not against having ranges be forcibly changed into streams.  What I don't want is ranges implicitly treated as streams.

I'd say that my idea is more about having ranges implicitly treated as stream sources rather than as true streams, but having a method to explicitly make them stream sources would still be quite usable.

Ultimately, I think that the differences between our designs boil down to having a more monolithic stream interface with an internal stream source or having a lighter-weight but more ad-hoc stream interface with an external and more exposed stream source. At this point, I'd probably be happy with either as long as they have equivalent functionality.
March 08, 2013
On Thu, 07 Mar 2013 20:52:49 -0500, BLM768 <blm768@gmail.com> wrote:


> Ultimately, I think that the differences between our designs boil down to having a more monolithic stream interface with an internal stream source or having a lighter-weight but more ad-hoc stream interface with an external and more exposed stream source. At this point, I'd probably be happy with either as long as they have equivalent functionality.

One thing to remember is that streams need to be runtime swappable.  For instance, I should be able to replace stdout with a stream of my choice.

This isn't possible if we only use compile-time API (i.e. templates).  But that doesn't preclude us from having templates and ranges on TOP of those streams.

When it is all finished, I think it won't be that bad to use.

-Steve
March 09, 2013
On 07/03/2013 12:07, Steven Schveighoffer wrote:
<snip>
> I don't really understand the need to make ranges into streams.
<snip>

Ask Walter - from what I recall it was his idea to have range-based file I/O to replace std.stream.

Thikning about it now, a range-based interface might be good for reading files of certain kinds, but isn't suited to general file I/O.

Stewart.