May 30, 2016
On Sunday, 29 May 2016 at 17:29:35 UTC, Steven Schveighoffer wrote:
> This doesn't help at all. I can still make a "transient" range with all three range primitives.
>
> There seems to be a misunderstanding about what a transient range is.
>
> byLine is a transient range that requires the front element be cacheable (I have to build the line somewhere, and reusing that buffer provides performance). Shoehorning into a popFront-only style "range" does not solve the problem. Not only that, but now I would have to cache BOTH the front element and the next one.

Ack, yea, I think see the issue.  It's

    auto a = transientRange.front;
    transientRange.popFront() // <=== now a has changed

Even if you go with the only-2-primitives one, you'd have the same problem:

    auto a = transientRange.front;
    auto b = transientRange.front;   // <=== now a has changed

My "empty + front only" idea was more inspired by the case e.g. where the range is wrapping some truly-random signal (like the current observed value of atmospheric noise, for example), it clearly doesn't solve this case.
May 30, 2016
On 05/30/2016 08:21 AM, Joseph Rushton Wakeling wrote:
> On Sunday, 29 May 2016 at 17:29:35 UTC, Steven Schveighoffer wrote:
>> This doesn't help at all. I can still make a "transient" range with
>> all three range primitives.
>>
>> There seems to be a misunderstanding about what a transient range is.
>>
>> byLine is a transient range that requires the front element be
>> cacheable (I have to build the line somewhere, and reusing that buffer
>> provides performance). Shoehorning into a popFront-only style "range"
>> does not solve the problem. Not only that, but now I would have to
>> cache BOTH the front element and the next one.
>
> Ack, yea, I think see the issue.  It's
>
>      auto a = transientRange.front;
>      transientRange.popFront() // <=== now a has changed
>
> Even if you go with the only-2-primitives one, you'd have the same problem:
>
>      auto a = transientRange.front;
>      auto b = transientRange.front;   // <=== now a has changed

That's right. The one approach that works here is to acknowledge the range has no buffering at all and require the user to provide the buffer as an argument. However, that would make for a different primitive that does not fit within the range hierarchy. -- Andrei

May 30, 2016
On 5/30/16 5:35 AM, Dicebot wrote:
> On Sunday, 29 May 2016 at 17:25:47 UTC, Steven Schveighoffer wrote:
>> What problems are solvable only by not caching the front element? I
>> can't think of any.
>
> As far as I know, currently it is done mostly for performance reasons -
> if result is fitting in the register there is no need to allocate stack
> space for the cache, or something like that. One of most annoying
> examples is map which calls lambda on each `front` call :
> https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L587-L590

Maybe my understanding of your post is incorrect. You said "It is impossible to correctly define input range without caching front which may not be always possible and may have negative performance impact."

I'm trying to figure out which cases caching makes the solution impossible.

>
>
>> And there is no way to define "transient" ranges in a way other than
>> explicitly declaring they are transient. There isn't anything inherent
>> or introspectable about such ranges.
>
> I don't really care about concept of transient ranges, it is the fact
> there is no guarantee of front stability for plain input ranges which
> worries me.

But this is inherent in languages which support mutable data. If you want data that doesn't change, require copied/unique data, or immutable data.

-Steve
May 30, 2016
On 5/30/16 8:44 AM, Andrei Alexandrescu wrote:
> On 05/30/2016 08:21 AM, Joseph Rushton Wakeling wrote:
>> On Sunday, 29 May 2016 at 17:29:35 UTC, Steven Schveighoffer wrote:
>>> This doesn't help at all. I can still make a "transient" range with
>>> all three range primitives.
>>>
>>> There seems to be a misunderstanding about what a transient range is.
>>>
>>> byLine is a transient range that requires the front element be
>>> cacheable (I have to build the line somewhere, and reusing that buffer
>>> provides performance). Shoehorning into a popFront-only style "range"
>>> does not solve the problem. Not only that, but now I would have to
>>> cache BOTH the front element and the next one.
>>
>> Ack, yea, I think see the issue.  It's
>>
>>      auto a = transientRange.front;
>>      transientRange.popFront() // <=== now a has changed
>>
>> Even if you go with the only-2-primitives one, you'd have the same
>> problem:
>>
>>      auto a = transientRange.front;
>>      auto b = transientRange.front;   // <=== now a has changed
>
> That's right. The one approach that works here is to acknowledge the
> range has no buffering at all and require the user to provide the buffer
> as an argument. However, that would make for a different primitive that
> does not fit within the range hierarchy. -- Andrei
>

Yes, this is how e.g. Linux getline works. But this leaks even more implementation details to the user, and still doesn't solve the problem, it just pushes it onto the user.

It also would make for some awkward pipelining.

Another solution is to accept a "buffer manager" that is responsible for determining how to handle the buffering.

-Steve
May 30, 2016
On 5/30/16 12:17 AM, Jonathan M Davis via Digitalmars-d wrote:
> On Sunday, May 29, 2016 13:36:24 Steven Schveighoffer via Digitalmars-d wrote:
>> On 5/27/16 9:48 PM, Jonathan M Davis via Digitalmars-d wrote:
>>> On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
>>>> So what about the convention to explicitely declare a
>>>> `.transient` enum member on a range, if the front element value
>>>> can change?
>>>
>>> Honestly, I don't think that supporting transient ranges is worth it.
>>> Every
>>> single range-based function would have to either test that the "transient"
>>> enum wasn't there or take transient ranges into account, and
>>> realistically,
>>> that isn't going to happen. For better or worse, we do have byLine in
>>> std.stdio, which has a transient front, but aside from the performance
>>> benefits, it's been a disaster.
>>
>> Wholly disagree. If we didn't cache the element, D would be a
>> laughingstock of performance-minded tests.
>
> Having byLine not copy its buffer is fine. Having it be a range is not.
> Algorithms in general just do not play well with that behavior, and I don't
> think that it's reasonable to expect them to.

I disagree. Most algorithms in std.algorithm are fine with transient ranges.

>
>>> It's way too error-prone. We now have
>>> byLineCopy to combat that, but of course, byLine is the more obvious
>>> function and thus more likely to be used (plus it's been around longer),
>>> so
>>> a _lot_ of code is going to end up using it, and a good chunk of that code
>>> really should be using byLineCopy.
>>
>> There's nothing actually wrong with using byLine, and copying on demand.
>> Why such a negative connotation?
>
> Because it does not play nicely with ranges, and aside from a few rare
> ranges like byLine that have to deal directly with I/O, transience isn't
> even useful. Having an efficient solution that plays nicely with I/O is
> definitely important, but it doesn't need to be a range, especially when it
> complicates ranges in general. byLine doesn't even work with
> std.array.array, and if even that doesn't work, I don't see how a range
> could be considered well-behaved.

Here is how I think about it: the front element is valid and stable until you call popFront. After that, anything goes for the old front.

This is entirely reasonable, and fits into many many algorithms. This isn't a functional-only language, mutation is valid in D.

>
>>> I'm of the opinion that if you want a transient front, you should just use
>>> opApply and skip ranges entirely.
>>
>> So you want to make this code invalid? Why?
>>
>> foreach(i; map!(a => a.to!int)(stdin.byLine))
>> {
>>     // process each integer
>>     ...
>> }
>>
>> You want to make me copy each line to a heap-allocated string so I can
>> parse it?!!
>
> If it's a range, then it can be passed around to other algorithms with
> impunity, and almost nothing is written with the idea that a range's front
> is transient.

Algorithms which are fine with byLine from std.algorithm.searching (not going to go through all of the submodules):

all, any, balancedParens, count, countUntil, find, canFind, findAmong (with first range being transient), skipOver (second parameter transient), startsWith, until.

algorithms which would would compile with transient ranges, but not work correctly: minCount, maxCount

Now, if minCount and maxCount could introspect transience, it could .dup the elements to make sure they were returned properly.

> There's no way to check for transience, and I don't think
> that it's even vaguely worth adding yet another range primitive that has to
> be checked for everywhere just for this case. Transience does _not_ play
> nicely with algorithms in general.

I think your understanding of this is incorrect. A range is transient by default if the type of the element allows modification via reference. It doesn't have to be checked everywhere, because most algorithms are fine with or without transience.

> Using opApply doesn't completely solve the problem (since the buffer could
> still escape - we'd need some kind of scope attribute or wrapper to fix that
> problem), but it makes it so that you can't pass such a a range around and
> run into problems with all of the algorithms that don't play nicely with it.
> So, instead, you end up with code that looks something like
>
> foreach(line; stdin.byLine())
> {
>     auto i = line.to!int();
>     ...
> }
>
> And yes, it's slightly longer, but it prevents a whole class of bugs by not
> having it be a range with a transient front.

Sure, as long as we're adding new newsgroups, let's at one that's titled "why can't I use byLine as a range", as this will be a popular topic.

>
>>> Allowing for front to be transient -
>>> whether you can check for it or not - simply is not worth the extra
>>> complications. I'd love it if we deprecated byLine's range functions, and
>>> made it use opApply instead and just declare transient ranges to be
>>> completely unsupported. If you want to write your code to have a transient
>>> front, you can obviously take that risk, but you're on your own.
>>
>> There is no way to disallow front from being transient. In fact, it
>> should be assumed that it is the default unless it's wholly a value-type.
>
> Pretty much no range-based code is written with the idea that front is
> transient.

Provably false, see above.

-Steve
May 30, 2016
On 5/30/16 12:22 AM, Jack Stouffer wrote:
> On Sunday, 29 May 2016 at 17:36:24 UTC, Steven Schveighoffer wrote:
>> Wholly disagree. If we didn't cache the element, D would be a
>> laughingstock of performance-minded tests.
>
> byLine already is a laughingstock performance wise:
> https://issues.dlang.org/show_bug.cgi?id=11810

But that's because we base our I/O on C :)

My iopipe library is 2x as fast.

> It's way faster to read the entire file into a buffer and iterate by
> line over that.

Depends on how big the entire file is.

> I have to agree with Jonathan, I see a lot of proposals in this thread
> but I have yet to see a cost/benefit analysis that's pro transient
> support. The amount of changes needed to support them is not
> commensurate to any possible benefits.

Transient ranges are already supported. Removing the ability to have transient ranges would be an unmitigated disaster. e.g. removing range support for byLine.

-Steve
May 30, 2016
On 5/29/16 11:46 PM, Alex Parrill wrote:
> On Sunday, 29 May 2016 at 17:45:00 UTC, Steven Schveighoffer wrote:
>> On 5/27/16 7:42 PM, Seb wrote:
>>
>>> So what about the convention to explicitely declare a `.transient` enum
>>> member on a range, if the front element value can change?
>>
>> enum isTransient(R) = is(typeof(() {
>>    static assert(isInputRange!R);
>>    static assert(hasIndirections(ElementType!R));
>>    static assert(!allIndrectionsImmutable!(ElementType!R)); // need to
>> write this
>> }));
>>
>
> allIndrectionsImmutable could probably just be is(T : immutable) (ie
> implicitly convertible to immutable). Value types without (or with
> immutable only) indirections should be convertible to immutable, since
> the value is being copied.
>

Yes, I think that is correct. Thanks!

-Steve
May 30, 2016
On 05/30/2016 09:30 AM, Steven Schveighoffer wrote:
> My iopipe library is 2x as fast.

Cool! What is the trick to the speedup? -- Andrei

May 30, 2016
On 05/30/2016 09:26 AM, Steven Schveighoffer wrote:
> Here is how I think about it: the front element is valid and stable
> until you call popFront. After that, anything goes for the old front.

Yah, we should have formal language in the library reference about this. -- Andrei

May 30, 2016
On Monday, 30 May 2016 at 15:12:41 UTC, Andrei Alexandrescu wrote:
> On 05/30/2016 09:30 AM, Steven Schveighoffer wrote:
>> My iopipe library is 2x as fast.
>
> Cool! What is the trick to the speedup? -- Andrei

Direct buffer access and not using FILE * as a base.

-Steve