May 30, 2016
On 05/30/2016 11:22 AM, Steven Schveighoffer wrote:
> On Monday, 30 May 2016 at 15:12:41 UTC, Andrei Alexandrescu wrote:
>> On 05/30/2016 09:30 AM, Steven Schveighoffer wrote:
>>> My iopipe library is 2x as fast.
>>
>> Cool! What is the trick to the speedup? -- Andrei
>
> Direct buffer access and not using FILE * as a base.

So what primitives do you use? The lower-level descriptor-based primitives open() and friends? http://linux.die.net/man/2/open

What do you mean by direct buffer access?

What are the relative contributions of these (and possibly other) design decisions to the overall speed?

How can we integrate some of these in std.stdio without breaking backward compatibility, or offer additional artifacts that integrate seamlessly and don't need to be compatible?


Andrei

May 30, 2016
On 05/30/2016 12:22 AM, Jack Stouffer wrote:
> On Sunday, 29 May 2016 at 17:36:24 UTC, Steven Schveighoffer wrote:
>> Wholly disagree. If we didn't cache the element, D would be a
>> laughingstock of performance-minded tests.
>
> byLine already is a laughingstock performance wise:
> https://issues.dlang.org/show_bug.cgi?id=11810

That was fixed a long time ago by https://github.com/dlang/phobos/pull/3089; I just marked it as fixed.

> It's way faster to read the entire file into a buffer and iterate by
> line over that.

Sometimes yes, but that's a change of approach with its own tradeoffs (e.g. what if you want to stop early, or the file is very large, etc.).
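For concreteness, a rough sketch of the two approaches (file name hypothetical; the slurping variant assumes std.file.readText and std.string.lineSplitter):

import std.file : readText;
import std.stdio : File;
import std.string : lineSplitter;

void main()
{
    // Streaming: one reused buffer, bounded memory, easy to stop early.
    foreach (line; File("input.txt").byLine)
    {
        // process line (a transient char[] slice of byLine's buffer)
    }

    // Slurping: read the whole file up front, then split it in place.
    foreach (line; readText("input.txt").lineSplitter)
    {
        // process line (a slice into the whole-file buffer)
    }
}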


Andrei

May 30, 2016
On Mon, May 30, 2016 at 09:26:35AM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> On 5/30/16 12:17 AM, Jonathan M Davis via Digitalmars-d wrote:
[...]
> > Having byLine not copy its buffer is fine. Having it be a range is not.  Algorithms in general just do not play well with that behavior, and I don't think that it's reasonable to expect them to.
> 
> I disagree. Most algorithms in std.algorithm are fine with transient ranges.

Yes, some years ago I submitted a series of PRs to fix poorly-written code in Phobos that unnecessarily depended on .front not changing when popFront is called.  I had found that in the majority of cases it was actually unnecessary to require non-transience; most algorithms in std.algorithm, properly-written, work just fine with transient ranges.

For the few algorithms that don't work with transient ranges, arguably they ought to require forward ranges and use .save instead. The only exception I can think of right now is array().
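For instance, here is a sketch of that pattern (a hypothetical helper, not from Phobos): instead of caching .front, keep a .save'd copy lagging one element behind, so each element is only read through a range that hasn't been popped yet.

import std.range;

// Count adjacent equal elements without caching .front: `prev` is a
// saved copy of the range that lags one element behind `r`.
size_t countAdjacentPairs(R)(R r)
    if (isForwardRange!R)
{
    if (r.empty) return 0;
    size_t n = 0;
    auto prev = r.save;   // points at the element just behind r's front
    r.popFront();
    while (!r.empty)
    {
        if (prev.front == r.front)
            ++n;
        r.popFront();
        prev.popFront();
    }
    return n;
}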


[...]
> Here is how I think about it: the front element is valid and stable until you call popFront. After that, anything goes for the old front.
> 
> This is entirely reasonable, and fits into many many algorithms. This isn't a functional-only language, mutation is valid in D.

I agree, most range algorithms work just fine with transient ranges.  I consider it a reasonable consequence of defensive programming: don't assume anything about the API beyond what it actually guarantees. The current range API says that the current range element is accessible via .front; from this one may conclude that the value returned by .front should be valid immediately afterwards. However, the API says nothing about this value lasting past the next call to .popFront, and in fact it specifies that .front will return something different afterwards, so defensively-written code should assume the worst and regard the previous value of .front as invalidated.

Understandably, writing code this way is more constrained and less convenient, but in a standard library I'd expect that code should be at least of this calibre, if not better.  And as far as I have found, the majority of range algorithms in Phobos *can* be written this way and work perfectly fine with transient ranges.  As I've already said, most of the remaining algorithms can be implemented by requiring forward ranges and using .save instead of assuming that .front is cacheable -- .save is something guaranteed by the API, and therefore defensively written generic code should use it rather than making unfounded assumptions about .front.


[...]
> > If it's a range, then it can be passed around to other algorithms with impunity, and almost nothing is written with the idea that a range's front is transient.

And nothing about the range API guarantees that .front is *not* transient either.  Generic code should not assume either way.  It should be written with the minimal assumptions necessary to work.


[...]
> > There's no way to check for transience, and I don't think that it's even vaguely worth adding yet another range primitive that has to be checked for everywhere just for this case. Transience does _not_ play nicely with algorithms in general.

I disagree. Of the algorithms I surveyed in Phobos at the time, I found that the majority of them can be written in such a way that they are transience-agnostic.  Only a small set of algorithms actually require non-transience.  That's hardly "not play nicely with algorithms in general".


[...]
> > Pretty much no range-based code is written with the idea that front is transient.
> 
> Provably false, see above.
[...]

I'd argue that range-based generic code that assumes non-transience is inherently buggy, because generic code ought not to make any assumptions beyond what the range API guarantees. Currently, the range API does not guarantee non-transience, therefore code that assumes so is broken by definition.  Just because they happen to work most of the time does not change the fact that they're written wrongly.


T

-- 
Just because you can, doesn't mean you should.
May 30, 2016
On Monday, 30 May 2016 at 12:53:07 UTC, Steven Schveighoffer wrote:
> On 5/30/16 5:35 AM, Dicebot wrote:
>> On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer wrote:
>>> What problems are solvable only by not caching the front element? I
>>> can't think of any.
>>
>> As far as I know, currently it is done mostly for performance reasons -
>> if the result fits in a register there is no need to allocate stack
>> space for the cache, or something like that. One of the most annoying
>> examples is map, which calls the lambda on each `front` call:
>> https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590
>
> Maybe my understanding of your post is incorrect. You said "It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact."
>
I'm trying to figure out in which cases caching makes the solution impossible.

One case is wrapping a network stream: a TCP input range that yields ubyte[] chunks of data as they are read from the socket, a joiner to join the blocks together, a map that converts the bytes to characters, and take to consume each message.

The issue is that, in a naive implementation, creating the TCP range requires reading a chunk from the socket to populate front. Since I usually set up my stream objects before using them, the receiver range will try to receive a chunk, but I haven't sent a request to the server yet, so it would block indefinitely.

Same issue with popFront: I need to pop the bytes I've already used for the previous request, but calling popFront would mean waiting for the response for the next request which I haven't sent out yet.

The solution I used was to delay actually receiving the chunk until front was called, which complicates the implementation a bit.
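A minimal sketch of that workaround (names and details hypothetical, receive done via std.socket):

import std.socket;

// A chunk range over a connected socket. The blocking receive is
// deferred until front/empty is first queried, so merely constructing
// the range (before any request has been sent) does not block.
struct SocketChunks
{
    Socket sock;     // an already-connected socket
    ubyte[] buf;     // reusable receive buffer
    ubyte[] chunk;   // slice of buf holding the current chunk
    bool fetched;    // has the current chunk been received yet?
    bool closed;

    private void fetch()
    {
        if (fetched) return;
        auto n = sock.receive(buf);   // blocks here, not in the constructor
        if (n <= 0) { closed = true; chunk = null; }
        else chunk = buf[0 .. n];
        fetched = true;
    }

    // empty also has to fetch: there is no way to know whether more data
    // exists without actually asking the socket.
    @property bool empty() { fetch(); return closed; }
    @property ubyte[] front() { fetch(); return chunk; }
    void popFront() { fetch(); fetched = false; }  // discard current chunk
}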
May 31, 2016
On 5/30/16 2:32 PM, Alex Parrill wrote:
> On Monday, 30 May 2016 at 12:53:07 UTC, Steven Schveighoffer wrote:

>> I'm trying to figure out in which cases caching makes the solution
>> impossible.
>
> One case is wrapping a network stream: a TCP input range that yields
> ubyte[] chunks of data as they are read from the socket, a joiner to
> join the blocks together, a map that converts the bytes to characters,
> and take to consume each message.
>
> The issue is that, in a naive implementation, creating the TCP range
> requires reading a chunk from the socket to populate front. Since I
> usually set up my stream objects before using them, the receiver range
> will try to receive a chunk, but I haven't sent a request to the server
> yet, so it would block indefinitely.
>
> Same issue with popFront: I need to pop the bytes I've already used for
> the previous request, but calling popFront would mean waiting for the
> response for the next request which I haven't sent out yet.
>
> The solution I used was to delay actually receiving the chunk until
> front was called, which complicates the implementation a bit.

This is not the same issue. Note that populating on a call to front, instead of on popFront, is still valid within the definition of a range (though I think the most straightforward way to implement a range is to populate only on popFront).

What makes this awkward is that you need to check whether front has already been populated, which generally requires a separate flag (though you may be able to get by without one). However, you still need a place to put the data, and a buffer is a good fit for that.

I would suggest that custom i/o patterns like this (where calling popFront is coupled to some other operation) do not fit ranges very well. In the iopipe library I wrote, I have two separate operations for "fetch more data" and "forget the oldest data", which are based on the requirements for i/o. These can be cajoled into a range interface, of course, but those operations suit i/o requirements much better than front/popFront/empty.
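As a rough sketch of the shape of that interface (names here are illustrative, not necessarily iopipe's actual API):

// The two-operation model: the consumer sees a window of buffered data,
// asks for more, and releases what it no longer needs. Advancing and
// reading are decoupled, unlike front/popFront.
interface BufferedSource
{
    // The currently buffered data, directly accessible as an array.
    const(ubyte)[] window();

    // "Fetch more data": try to grow the window; returns how many bytes
    // were actually added (0 at end of input).
    size_t extend(size_t minBytes);

    // "Forget the oldest data": drop `count` bytes from the front of the window.
    void release(size_t count);
}

A line iterator built on top of this would scan window() for the terminator, call extend() when it runs out of buffered data, and release() everything up to the line it has already handed out.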

-Steve
May 31, 2016
On 5/30/16 11:46 AM, Andrei Alexandrescu wrote:
> On 05/30/2016 11:22 AM, Steven Schveighoffer wrote:
>> On Monday, 30 May 2016 at 15:12:41 UTC, Andrei Alexandrescu wrote:
>>> On 05/30/2016 09:30 AM, Steven Schveighoffer wrote:
>>>> My iopipe library is 2x as fast.
>>>
>>> Cool! What is the trick to the speedup? -- Andrei
>>
>> Direct buffer access and not using FILE * as a base.
>
> So what primitives do you use? The lower-level descriptor-based
> primitives open() and friends? http://linux.die.net/man/2/open

Yes. That is not the focus of my library, though. I have a very crude and basic I/O type that just wraps the system calls into a nice API.

> What do you mean by direct buffer access?

I mean that instead of pulling data out one character at a time from another buffer into your own, you have direct access to the underlying buffer and can use all the speedups that CPUs provide when accessing arrays.

I'm also allowing the compiler to inline as much as possible because all the code is available.

> What are the relative contributions of these (and possibly other) design
> decisions to the overall speed?

I don't know. I started with a vastly different design and it ended up being twice as fast, so it's difficult to say for certain which factors are the major ones slowing down std.stdio.

At DConf, Adrian Matoga presented his very similar flod library, and I was surprised when it was faster than mine at processing lines, even though he was still using std.stdio.File. So I found a few reasons why mine was slower and fixed them. The things that made it much faster were:

1. Switched to direct pointer access (i.e. doing while(*ptr++ != lineTerminator) {}) when the buffer type was an array.
2. I was stupidly accessing the struct member that held the terminator instead of copying it into a stack variable. After doing that (and making it immutable), the compiler was much better informed about how to optimize.
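Roughly, those two changes combined look something like this sketch (names hypothetical):

// Scan a buffer for the line terminator with a raw pointer (change 1)
// and a stack-local immutable copy of the terminator (change 2), which
// the optimizer handles much better than re-reading a struct member.
size_t findLineEnd(const(char)[] buffer, char lineTerminator)
{
    immutable term = lineTerminator;
    auto ptr = buffer.ptr;
    const(char)* end = buffer.ptr + buffer.length;
    while (ptr < end && *ptr != term)
        ++ptr;
    return ptr - buffer.ptr;   // index of the terminator, or buffer.length if none
}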

There may even be more opportunities for speedup, but I'm pretty satisfied with it at the moment.

> How can we integrate some of these in std.stdio without breaking
> backward compatibility,

There are 2 main issues with FILE *:

1) It does not provide buffer access, so you must rely on things like getline, if they exist. But these have their own problems (e.g. they do not support Unicode and require a C-malloc'd buffer).

2) All calls are opaque, so no inlining can occur. This is especially painful if you have to call fgetc in a tight loop. This goes back to the buffer access issue -- FILE * only lets you look ahead one character.

So I think the only real way to provide a decent performance improvement without breaking C compatibility is to write code that is aware of the implementation details of the opaque FILE * type. Just like getline.

> or offer additional artifacts that integrate
> seamlessly and don't need to be compatible?

I had a grand plan to replace stdio with something that "evolves" to faster D based i/o when you did D-like things. But it's very complex, and each addition of code to stdio makes things more complex.

In reality, I think we only need to be C compatible for stdout (possibly only via write[f][ln]), and possibly stdin. There's no reason we need to use C i/o for any other uses. What makes things difficult is that everything in Phobos uses std.stdio.File for all i/o and that uses C i/o.

-Steve
May 31, 2016
On Monday, 30 May 2016 at 12:53:07 UTC, Steven Schveighoffer wrote:
> On 5/30/16 5:35 AM, Dicebot wrote:
>> On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer wrote:
>>> What problems are solvable only by not caching the front element? I
>>> can't think of any.
>>
>> As far as I know, currently it is done mostly for performance reasons -
>> if the result fits in a register there is no need to allocate stack
>> space for the cache, or something like that. One of the most annoying
>> examples is map, which calls the lambda on each `front` call:
>> https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590
>
> Maybe my understanding of your post is incorrect. You said "It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact."
>
I'm trying to figure out in which cases caching makes the solution impossible.

There are three separate points here:

1) The current definition of an input range (most importantly, the fact that `front` has to be @property-like) implies that `front` always returns the same result until `popFront` is called.

2) For ranges that call predicates on elements to evaluate the next element, this can only be achieved by caching - predicates are never required to be pure.

3) But caching is sub-optimal performance-wise, and thus a bunch of Phobos algorithms violate the `front` consistency/cheapness expectation by evaluating predicates each time it is called (like map).

>> I don't really care about concept of transient ranges, it is the fact
>> there is no guarantee of front stability for plain input ranges which
>> worries me.
>
> But this is inherent in languages which support mutable data. If you want data that doesn't change, require copied/unique data, or immutable data.

This has nothing to do with mutability, look at this totally broken example:

import std.random;
import std.algorithm;
import std.stdio;
import std.range;


void main ( )
{
	auto range = only(0).map!(x => uniform(0, 10));
	writeln(range.front);
	writeln(range.front);
	writeln(range.front);
}
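(For comparison, a cached variant of the same pipeline behaves deterministically; `cache` from std.algorithm.iteration evaluates the lambda once per element and returns the stored value from front, so this prints the same number three times:)

import std.random;
import std.algorithm;
import std.stdio;
import std.range;

void main ( )
{
	// cache runs the lambda once per element and stores the result,
	// so repeated calls to front return the same value.
	auto range = only(0).map!(x => uniform(0, 10)).cache;
	writeln(range.front);
	writeln(range.front);
	writeln(range.front);
}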

May 31, 2016
On Monday, May 30, 2016 09:57:29 H. S. Teoh via Digitalmars-d wrote:
> I'd argue that range-based generic code that assumes non-transience is inherently buggy, because generic code ought not to make any assumptions beyond what the range API guarantees. Currently, the range API does not guarantee non-transience, therefore code that assumes so is broken by definition.  Just because they happen to work most of the time does not change the fact that they're written wrongly.

Technically, the range API doesn't even require that front return the same value every time that it's called, because isInputRange can't possibly test for it. That doesn't mean that we don't expect that ranges behave that way. It's never been the case that we agreed that transient ranges were acceptable. In fact, it's pretty much the opposite. There was some discussion that maybe we'd consider it legitimate for basic input ranges given that we already had byLine doing it, and we didn't want to require that it copy its buffer, but previous discussions on it made it quite clear that most of us (Andrei included) thought that it should not be allowed for forward ranges.

It's been the case for ages now that ranges with a transient front are not explicitly supported. They happen to work with some algorithms (and you made changes to Phobos to make them work with more algorithms there), but they're not checked for, and there are algorithms that they don't and can't work with. The simple fact that std.array.array doesn't work with them is pretty damning IMHO.

We already have too many problems with unspecified behavior in ranges without adding transient front into the mix. It should be killed with fire IMHO.

- Jonathan M Davis

May 31, 2016
On 5/31/16 10:46 AM, Dicebot wrote:
> On Monday, 30 May 2016 at 12:53:07 UTC, Steven Schveighoffer wrote:
>> On 5/30/16 5:35 AM, Dicebot wrote:
>>> On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer wrote:
>>>> What problems are solvable only by not caching the front element? I
>>>> can't think of any.
>>>
>>> As far as I know, currently it is done mostly for performance reasons -
>>> if the result fits in a register there is no need to allocate stack
>>> space for the cache, or something like that. One of the most annoying
>>> examples is map, which calls the lambda on each `front` call:
>>> https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590
>>>
>>
>> Maybe my understanding of your post is incorrect. You said "It is
>> impossible to correctly define input range without caching front which
>> may not be always possible and may have negative performance impact."
>>
>> I'm trying to figure out in which cases caching makes the solution
>> impossible.
>
> There are three separate points here:
>
> 1) The current definition of an input range (most importantly, the fact
> that `front` has to be @property-like) implies that `front` always
> returns the same result until `popFront` is called.

Regardless of whether front is property-like or not, this should be the case. Otherwise, popFront makes no sense.

> 2) For ranges that call predicates on elements to evaluate the next
> element, this can only be achieved by caching - predicates are never
> required to be pure.

Or, by not returning different things from your predicate.

This is like saying RedBlackTree is broken when I give it a predicate of "a == b".

> 3) But caching is sub-optimal performance-wise, and thus a bunch of
> Phobos algorithms violate the `front` consistency/cheapness expectation
> by evaluating predicates each time it is called (like map).

I don't think anything defensively caches front in case the next call to front is different, unless that's specifically the reason for the range.

>>> I don't really care about concept of transient ranges, it is the fact
>>> there is no guarantee of front stability for plain input ranges which
>>> worries me.
>>
>> But this is inherent in languages which support mutable data. If you
>> want data that doesn't change, require copied/unique data, or
>> immutable data.
>
> This has nothing to do with mutability, look at this totally broken
> example:
>
> import std.random;
> import std.algorithm;
> import std.stdio;
> import std.range;
>
>
> void main ( )
> {
>     auto range = only(0).map!(x => uniform(0, 10));
>     writeln(range.front);
>     writeln(range.front);
>     writeln(range.front);
> }
>

I think we should be aware that the range API doesn't prevent bugs of all kinds. There's only so much analysis the compiler can do.

-Steve
May 31, 2016
On 5/31/16 11:45 AM, Jonathan M Davis via Digitalmars-d wrote:
> On Monday, May 30, 2016 09:57:29 H. S. Teoh via Digitalmars-d wrote:
>> I'd argue that range-based generic code that assumes non-transience is
>> inherently buggy, because generic code ought not to make any
>> assumptions beyond what the range API guarantees. Currently, the range
>> API does not guarantee non-transience, therefore code that assumes so is
>> broken by definition.  Just because they happen to work most of the time
>> does not change the fact that they're written wrongly.
>
> Technically, the range API doesn't even require that front return the same
> value every time that it's called, because isInputRange can't possibly test
> for it.

The API doesn't require it mechanically, but the API does require it semantically (what does popFront mean if front changes automatically?). If front returns different things, I'd say that's a bug in your range construction.

> That doesn't mean that we don't expect that ranges behave that way.
> It's never been the case that we agreed that transient ranges were
> acceptable.

I want to note here that transient ranges are different from what you identified above (front returning different values each time). I know that you know that, but the way you said this implies that's what transient ranges are.

> In fact, it's pretty much the opposite. There was some
> discussion that maybe we'd consider it legitimate for basic input ranges
> given that we already had byLine doing it, and we didn't want to require
> that it copy its buffer, but previous discussions on it made it quite clear
> that most of us (Andrei included) thought that it should not be allowed for
> forward ranges.

I can see why forward ranges make it easier to avoid transience, since by definition the data should still be available even if another copy of the range has already popFront'd past it. But I don't think this needs to be a requirement. What I'd say is that as long as you haven't called popFront on the range, front should be stable. What happens elsewhere is implementation defined.

> It's been the case for ages now that ranges with a transient front are not
> explicitly supported. They happen to work with some algorithms (and you made
> changes to Phobos to make them work with more algorithms there), but they're
> not checked for, and there are algorithms that they don't and can't work
> with. The simple fact that std.array.array doesn't work with them is pretty
> damning IMHO.

array can work with them; you just have to copy each element as you iterate (or otherwise transform the data into something that persists when copied).
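For instance, a common idiom with byLine is to dup each buffer before collecting:

import std.algorithm : map;
import std.array : array;
import std.stdio : stdin;

void main()
{
    // Each line arrives as a transient char[] slice of byLine's reused
    // buffer; idup copies it into an immutable string, so the result of
    // array() holds independent lines rather than aliases of one buffer.
    auto lines = stdin.byLine.map!(line => line.idup).array;
}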

But there is no requirement I know of that says a range has to always provide the expected outcome with std.array.array.

> We already have too many problems with unspecified behavior in ranges
> without adding transient front into the mix. It should be killed with fire
> IMHO.

It can't be killed, because it's ALREADY supported. Way too much code would break, and I think you would piss off a lot of D developers (myself included).

But I don't think transient ranges are that horrific. For instance, stdin.byLine.array probably doesn't do what you want. But it does actually work and do something that's defined and reproducible, and it will not corrupt memory. It's a bug that needs to be fixed because you didn't understand what it was doing.

-Steve