iopipe alpha 0.0.1 version (page 3)

October 23, 2017

Re: iopipe alpha 0.0.1 version

Posted by Steven Schveighoffer
in reply to Martin Nowak

On 10/21/17 6:33 AM, Martin Nowak wrote:
> On 10/19/2017 03:12 PM, Steven Schveighoffer wrote:
>> On 10/19/17 7:13 AM, Martin Nowak wrote:
>>> On 10/13/2017 08:39 PM, Steven Schveighoffer wrote:
>>>> What would be nice is a mechanism to detect this situation, since the
>>>> above is both un-@safe and incorrect code.
>>>>
>>>> Possibly you could instrument a window with a mechanism to check to see
>>>> if it's still correct on every access, to be used when compiled in
>>>> non-release mode for checking program correctness.
>>>>
>>>> But in terms of @safe code in release mode, I think the only option is
>>>> really to rely on the GC or reference counting to allow the window to
>>>> still exist.
>>>
>>> We should definitely find a @nogc solution to this, but it's a good
>>> litmus test for the RC compiler support I'll work on.
>>> Why does IOPipe have to hand over the window to the caller?
>>> They could just implement the RandomAccessRange interface themselves.
>>>
>>> Instead of
>>> ```d
>>> auto w = f.window();
>>> f.extend(random());
>>> w[0];
>>> ```
>>> you could only do
>>> ```d
>>> f[0];
>>> f.extend(random());
>>> f[0]; // bug, but no memory corruption
>>> ```
>>
>> So the idea here (If I understand correctly) is to encapsulate the
>> window into the pipe, such that you don't need to access the buffer
>> separately? I'm not quite sure because of that last comment. If f[0] is
>> equivalent to previous code f.window[0], then the second f[0] is not a
>> bug, it's valid, and accessing the first element of the window (which
>> may have moved).
> 
> The above sample with the window is a bug and memory corruption because
> of iterator/window invalidation by extend.
> If you didn't think of the invalidation, then the latter example would
> still be a bug to you, but not a memory corruption.

The issue with the original code is that the window may move *within the buffer*. That is, if your current window is looking at the last 1k of a 2M buffer, and you extend, the buffer manager may move the data from the end of the buffer to the beginning, and re-fill the rest of the buffer with new data from the source.

In this case, the old window reference that you saved is pointing at completely different data. That is, f.window[0] may not be the same as w[0]. Still @safe, but not correct.

Whereas in your new code, you are looking at the correct window data every time.
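
A minimal sketch of the stale-window situation described above, assuming a fixed-size buffer that slides live data to the front when it runs out of room at the tail. `ToyPipe` and its fields are hypothetical names for illustration, not the iopipe library's implementation; only the `window`/`extend`/`release` vocabulary comes from this thread.

```d
/// Toy buffered pipe with a fixed-size buffer.
/// Hypothetical stand-in for illustration; this is not the iopipe library's code.
struct ToyPipe
{
    ubyte[] buffer; // backing store
    size_t start;   // the window is buffer[start .. end]
    size_t end;
    ubyte next;     // next value produced by the fake data source

    this(size_t capacity) { buffer = new ubyte[](capacity); }

    /// Current view of the valid data.
    ubyte[] window() { return buffer[start .. end]; }

    /// Drop `n` elements from the front of the window.
    void release(size_t n)
    {
        assert(n <= end - start);
        start += n;
    }

    /// Add up to `n` elements to the window; returns how many were added.
    size_t extend(size_t n)
    {
        if (end + n > buffer.length && start > 0)
        {
            // No room at the tail: slide the live data to the front.
            immutable len = end - start;
            foreach (i; 0 .. len)
                buffer[i] = buffer[start + i];
            start = 0;
            end = len;
        }
        if (n > buffer.length - end)
            n = buffer.length - end;
        foreach (ref b; buffer[end .. end + n])
            b = next++; // "read" fresh data from the source
        end += n;
        return n;
    }
}

void main()
{
    auto f = ToyPipe(8);
    f.extend(8);                // buffer holds 0 .. 7
    f.release(6);               // window is now [6, 7] at the tail of the buffer
    auto w = f.window();        // saved window, as in the first quoted snippet
    assert(w[0] == 6);

    f.extend(6);                // data slides to the front, tail is refilled

    assert(f.window()[0] == 6); // fresh window: still the first unreleased byte
    assert(w[0] == 12);         // stale window: same memory, different data
}
```

The saved slice never leaves the buffer, so nothing is corrupted in this toy, but it no longer refers to the data the pipe considers its window.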

>> Some downsides however:
>>
>> 1. iopipes can be complex and windows are not. They were a fixed view of
>> the current buffer. The idea that I can fetch a window of data from an
>> iopipe and then deal simply with that part of the data was attractive.
> 
> You could still have a window internally and just forward to that.

My concern is really with algorithms that may use the range interface. Using the whole iopipe as a range may be less efficient, and maybe not even correct. My initial approach was to create an abstraction on the data itself, and then build a range on top of it. It's a different way to look at it.

>> 2. The iopipe is generally not copyable once usage begins. In other
>> words, the feature of ranges that you can copy them and they just work,
>> would be difficult to replicate in iopipe.
> 
> That's a general problem. Unique ownership is really useful, but most
> phobos range methods don't care, and assume copying is implicit saving.
> Not too nice and I guess this will bite us again with RC/Unique/Weak.
> 
> The current workaround for this is `refRange`.

There is actually quite a bit of this problem in Phobos. Most range wrapper functions do not take ranges by reference, but by value, making copies everywhere. However, most of the time, this is only during construction, where the copy is a move.

But many of the functions do not actually move the parameters into the wrapper, so disabling postblit would be horrific.

iopipe, unfortunately, follows that precedent. I should probably correct it.
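
A small sketch of the by-value behaviour discussed above. `consume` and `NoCopy` are hypothetical helpers written for this example; `iota` and `refRange` are the Phobos functions from `std.range`. It shows that a generic function taking its range by value works on a copy, that `refRange` makes the caller's range advance instead, and that a range with a disabled postblit cannot be passed as an lvalue to such a function at all.

```d
import std.range : iota, refRange;
import std.range.primitives : isInputRange;

/// Hypothetical helper that takes its range by value, like most Phobos wrappers.
size_t consume(R)(R range, size_t n) if (isInputRange!R)
{
    size_t count;
    for (; count < n && !range.empty; ++count)
        range.popFront();
    return count;
}

/// Hypothetical range with copying disabled, for illustration only.
struct NoCopy
{
    int[] data;
    @disable this(this);
    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

void main()
{
    auto r = iota(0, 10);

    // By value: `consume` advances its own copy, `r` is untouched.
    assert(consume(r, 3) == 3);
    assert(r.front == 0);

    // Through refRange: the wrapper holds a pointer, so `r` itself advances.
    assert(consume(refRange(&r), 3) == 3);
    assert(r.front == 3);

    // With postblit disabled, passing the lvalue by value does not compile.
    auto nc = NoCopy([1, 2, 3]);
    static assert(!__traits(compiles, consume(nc, 1)));
}
```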

>> A possible way forward could be:
>>
>> * iopipe is a random-access range (not necessarily a forward range).
>> * iopipe.window returns a non-extendable window of the buffer itself,
>> which is a forward/random-access range. If backed by the GC or some form
>> of RC, everything is @safe.
>> * Functions which now take iopipes could be adjusted to take
>> random-access ranges, and if they are also iopipes, could use the extend
>> features to get more data.
>> * iopipe.release(size_t) could be hooked by popFrontN. I don't like the
>> idea of supporting slicing on iopipes, for the non-forward aspect of
>> iopipe. Much better to have an internal hook that modifies the range
>> in-place.
>>
>> This would make iopipes fit right into the range hierarchy, and
>> therefore could be integrated easily into Phobos.
> 
> I made an interesting experiment with buffered input ranges quite a
> while ago.
> https://gist.github.com/MartinNowak/1257196
> 
> This would use popFront to fetch new data and ref-counts a list of
> buffers depending on older saved ranges still using earlier buffers.
> With a bit of creative use, the existing Range primitives could be used
> to implement infinite look-ahead.
> 
> auto beg = rng.save;
> auto end = rng.find("bla");
> auto window = beg[0 .. end]; // get a random access window

This is similar to Dmitry's attempt as well (which unfortunately is no longer available, as far as I can see), but I think his did not use the range primitives.

It's solving a different problem than iopipe is solving. I plan on adding iopipe-on-range capability soon as well, since many times, all you have is a range.

-Steve

On Monday, 23 October 2017 at 16:34:19 UTC, Steven Schveighoffer wrote:
> On 10/21/17 6:33 AM, Martin Nowak wrote:
>> On 10/19/2017 03:12 PM, Steven Schveighoffer wrote:
>>> On 10/19/17 7:13 AM, Martin Nowak wrote:
>>>> On 10/13/2017 08:39 PM, Steven Schveighoffer wrote:
>
> It's solving a different problem than iopipe is solving. I plan on adding iopipe-on-range capability soon as well, since many times, all you have is a range.

You mean chunk based processing vs. infinite lookahead for parsing?
They both provide a similar API, sth. to extend the current window and sth. to release data.

The example input here was an input range, but it's read in page sizes and could as well be a socket.

On 10/24/17 5:32 AM, Martin Nowak wrote:
> On Monday, 23 October 2017 at 16:34:19 UTC, Steven Schveighoffer wrote:
>> On 10/21/17 6:33 AM, Martin Nowak wrote:
>>> On 10/19/2017 03:12 PM, Steven Schveighoffer wrote:
>>>> On 10/19/17 7:13 AM, Martin Nowak wrote:
>>>>> On 10/13/2017 08:39 PM, Steven Schveighoffer wrote:
>>
>> It's solving a different problem than iopipe is solving. I plan on adding iopipe-on-range capability soon as well, since many times, all you have is a range.
>
> You mean chunk based processing vs. infinite lookahead for parsing?
> They both provide a similar API, sth. to extend the current window and sth. to release data.

Yes, definitely.

> The example input here was an input range, but it's read in page sizes and could as well be a socket.

iopipe provides "infinite" lookahead, which is central to its purpose. The trouble with bolting that on top of ranges, as you said, is that we have to copy everything out of the range, which necessarily buffers somehow (if it's efficient i/o), so you are double buffering. iopipe's purpose is to get rid of this unnecessary buffering. This is why it's a great fit for being the *base* of a range.

In other words, if you want something to have optional lookahead and range support, it's better to start out with an extendable buffering type like an iopipe, and bolt ranges on top, vs. the other way around.

-Steve
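
To make the "buffer first, range on top" idea concrete, here is a rough sketch of a line range layered over an extendable buffer. `ToyPipe` and `ByLine` are hypothetical types written for this example (the in-memory `source` stands in for a file or socket); they are not the iopipe library's types, and only the `window`/`extend`/`release` vocabulary is taken from the discussion. The point is that `front` is a slice of the pipe's own window, so the range adds no second buffer.

```d
/// Toy extendable buffer over an in-memory source.
/// Hypothetical stand-in for an iopipe, for illustration only.
struct ToyPipe
{
    const(char)[] source; // pretend this is a file or socket
    char[] buffer;        // data currently held by the pipe

    /// Current view of the buffered data.
    const(char)[] window() const { return buffer; }

    /// Pull up to `n` more chars into the window; returns 0 at end of input.
    size_t extend(size_t n)
    {
        immutable got = source.length < n ? source.length : n;
        buffer ~= source[0 .. got];
        source = source[got .. $];
        return got;
    }

    /// Discard the first `n` chars of the window.
    void release(size_t n) { buffer = buffer[n .. $]; }
}

/// Input range of lines; each `front` is valid only until the next popFront.
struct ByLine
{
    private ToyPipe* pipe;
    private size_t lineLen;  // length of the current line, without '\n'
    private bool hasNewline; // whether a '\n' follows the current line
    private bool done;

    this(ToyPipe* p) { pipe = p; findLine(); }

    @property bool empty() const { return done; }
    @property const(char)[] front() const { return pipe.window()[0 .. lineLen]; }

    void popFront()
    {
        // Hand the consumed line back to the pipe, then look for the next one.
        pipe.release(lineLen + (hasNewline ? 1 : 0));
        findLine();
    }

    private void findLine()
    {
        size_t scanned = 0;
        for (;;)
        {
            auto w = pipe.window();
            for (; scanned < w.length; ++scanned)
            {
                if (w[scanned] == '\n')
                {
                    lineLen = scanned;
                    hasNewline = true;
                    return;
                }
            }
            if (pipe.extend(4) == 0) // lookahead exhausted: end of input
            {
                lineLen = w.length;
                hasNewline = false;
                done = (w.length == 0);
                return;
            }
        }
    }
}

void main()
{
    auto pipe = ToyPipe("ab\ncdefg\nh");
    string[] lines;
    foreach (line; ByLine(&pipe))
        lines ~= line.idup; // copy only because we keep the line around
    assert(lines == ["ab", "cdefg", "h"]);
}
```

The tiny `extend(4)` step is deliberate: the second line spans several extends, which shows the lookahead growing the window until a full line is available.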

On Tuesday, 24 October 2017 at 14:47:02 UTC, Steven Schveighoffer wrote:
> iopipe provides "infinite" lookahead, which is central to its purpose. The trouble with bolting that on top of ranges, as you said, is that we have to copy everything out of the range, which necessarily buffers somehow (if it's efficient i/o), so you are double buffering. iopipe's purpose is to get rid of this unnecessary buffering. This is why it's a great fit for being the *base* of a range.
>
> In other words, if you want something to have optional lookahead and range support, it's better to start out with an extendable buffering type like an iopipe, and bolt ranges on top, vs. the other way around.

Arguably it is somewhat hacky to use a range as end marker for slicing sth., but you'd get the same benefit, access to the random buffer with zero-copying.

auto beg = rng.save; // save current position
auto end = rng.find("bla"); // lookahead using popFront
auto window = beg[0 .. end]; // get a random access window to underlying buffer

So basically forward ranges with slicing.
At least that would require to extend all algorithms with `extend` support, though likely you could have a small extender proxy range for IOPipes.

Note that rng could be a wrapper around unbuffered IO reads.

On Tuesday, 24 October 2017 at 19:05:02 UTC, Martin Nowak wrote:
> On Tuesday, 24 October 2017 at 14:47:02 UTC, Steven Schveighoffer wrote:
>> iopipe provides "infinite" lookahead, which is central to its purpose. The trouble with bolting that on top of ranges, as you said, is that we have to copy everything out of the range, which necessarily buffers somehow (if it's efficient i/o), so you are double buffering. iopipe's purpose is to get rid of this unnecessary buffering. This is why it's a great fit for being the *base* of a range.
>>
>> In other words, if you want something to have optional lookahead and range support, it's better to start out with an extendable buffering type like an iopipe, and bolt ranges on top, vs. the other way around.
>
> Arguably it is somewhat hacky to use a range as end marker for slicing sth., but you'd get the same benefit, access to the random buffer with zero-copying.
>
> auto beg = rng.save; // save current position
> auto end = rng.find("bla"); // lookahead using popFront
> auto window = beg[0 .. end]; // get a random access window to underlying buffer

I had a design like that except save returned a “mark” (not full range) and there was a slice primitive. It even worked with patched std.regex, but at a non-zero performance penalty. I think that maintaining the illusion of a full copy of the range when you do “save” on a buffered I/O stream is too costly. Because a user can now legally advance both, you need to RC buffers behind the scenes with separate “pointers” for each range that effectively pin them.

> So basically forward ranges with slicing.
> At least that would require to extend all algorithms with `extend` support, though likely you could have a small extender proxy range for IOPipes.
>
> Note that rng could be a wrapper around unbuffered IO reads.

On Monday, 16 October 2017 at 20:58:43 UTC, Martin Nowak wrote:
> On Friday, 13 October 2017 at 17:08:18 UTC, Steven Schveighoffer wrote:
>>> I keep https://github.com/MartinNowak/bloom also as example/scaffold repo, it's using an automated docs setup with gh-branches.
>>>
>>> Just create a doc deployment token (https://github.com/settings/tokens) with public_repo access and store that encrypted in your .travis-ci.yml.
>>
>> Martin, I would appreciate and I think many people would, a blog/tutorial on how to do this.
>
> Indeed, that already crossed my mind a couple of times ;).

I am searching for a blog like that. Has it been written yet?
