April 30, 2020
On Thursday, 30 April 2020 at 16:21:05 UTC, Steven Schveighoffer wrote:
> I would say part of the issue is that you are doing all your work in front and not popFront.
>
> What happens is that Appender is a pointer-to-implementation struct, and the compiler will allocate the first one shared amongst all initial StreamRange instances.
>
> On your first call to front, it's going to utilize that shared one. Then on the call to popFront, it will reset it to another one.
>
> For the second unittest, in your first call to front, it notices that it's already been filled, so it doesn't do any work (and returns the existing buffer).

Interesting.  I'll take this into account.  I was putting the work into front because I didn't want to do the work until it was requested.  Putting the work in popFront makes more sense in some ways, but the fact you have to call it before getting any records seems like it would break normal range algorithms.  (Please correct me if I'm wrong)  I'm wondering if putting the work into it's own method and calling it one from the constructor and from popFront the rest of the way.

> another problem, your empty condition is based on the input, which violates range expectations. indeed, on the first call to front, the range all of a sudden becomes empty.

I was about to argue the point, but after thinking about the use of popFront to execute the work, that would work.

> 3. You don't need a further Range parameter for the nested struct, it can use the template parameter from the containing function.
> 4. Make it a static struct, or else it will retain a pointer to the function stack frame needlessly.

I'll definitely try that out.  Regarding 3, I think that was in the example code I used, so I just went with it.

April 30, 2020
On Thursday, 30 April 2020 at 16:21:05 UTC, Steven Schveighoffer wrote:
> I would say part of the issue is that you are doing all your work in front and not popFront.
>
[...]
>
> I'd say:
>
> 1. move your work to the popFront function (you then need to call popFront once before returning the range in your factory method).
> 2. change empty to check if the buffer is empty instead of the input.

Doing work in popFront instead of front is usually an anti-pattern, since it forces eager evaluation of the next element even when that element is never used. You should only do this if there's no reasonable way to avoid it.
April 30, 2020
On 4/30/20 2:03 PM, Casey wrote:
> Interesting.  I'll take this into account.  I was putting the work into front because I didn't want to do the work until it was requested.  Putting the work in popFront makes more sense in some ways, but the fact you have to call it before getting any records seems like it would break normal range algorithms.  (Please correct me if I'm wrong)  I'm wondering if putting the work into it's own method and calling it one from the constructor and from popFront the rest of the way.

You don't necessarily have to do it in a separate function, because until you return it from your factory method (the only way you can create such a thing), it technically doesn't need to be "sane".

So what I mean is at the bottom of your function you have:

auto readStream(Range)(auto ref Range r) if (isInputRange!(Unqual!Range))
{
   ... // define your struct

   auto result = StreamRange(r);
   if(!r.empty) result.popFront(); // prime range
   return result;
}

There is a valid argument to be made about not doing any work until requested. You could do this inside front, but in that case, I would say you have an additional boolean that checks if it has ever been run, and run popFront once (setting that bool to false). But I don't necessarily like that, because that puts an extra burden on front. front is supposed to be non-mutating, called as much as you need (ideally it should be inout). Practically, you can make it do work, but really the bulk of the work should be done in popFront.

Note that you will have to check for priming in popFront as well, because if you did for instance:

auto r = readStream(...);
r.popFront(); // skip first element?

This would not do what is expected. In fact, thinking about it some more, you should be able to call popFront as many times as you want and have it skip that many elements without calling front once. Your range will not do that.

-Steve
April 30, 2020
On Thu, Apr 30, 2020 at 06:05:55PM +0000, Paul Backus via Digitalmars-d-learn wrote: [...]
> Doing work in popFront instead of front is usually an anti-pattern, since it forces eager evaluation of the next element even when that element is never used. You should only do this if there's no reasonable way to avoid it.

Really?? IME, usually you *need* to do work in popFront instead of front, because otherwise .empty wouldn't be able to tell whether there is another element or not.  E.g., in filtering a range based on some criterion on each element, you can't defer computing the next element until .front because you can't predict whether there will be another element that won't be dropped by popFront.

Also, for ranges based on generator functions, if .front is lazy then you need to keep extra baggage around your range to indicate whether or not the generator has been invoked yet; it's easier to just always compute the next element eagerly and cache it, and .front just returns the cached data.

Even when the range involves some expensive per-element computation, I find that it's simpler to just compute and cache in .popFront instead of adding extra baggage to .front to know when the computation has already been performed.

I'm hard-pressed to come up with an example where deferring computation to .front is a good idea!


T

-- 
A mathematician is a device for turning coffee into theorems. -- P. Erdos
April 30, 2020
On Thursday, 30 April 2020 at 18:30:14 UTC, H. S. Teoh wrote:
> On Thu, Apr 30, 2020 at 06:05:55PM +0000, Paul Backus via Digitalmars-d-learn wrote: [...]
>> Doing work in popFront instead of front is usually an anti-pattern, since it forces eager evaluation of the next element even when that element is never used. You should only do this if there's no reasonable way to avoid it.
>
> Really?? IME, usually you *need* to do work in popFront instead of front, because otherwise .empty wouldn't be able to tell whether there is another element or not.  E.g., in filtering a range based on some criterion on each element, you can't defer computing the next element until .front because you can't predict whether there will be another element that won't be dropped by popFront.
>
> Also, for ranges based on generator functions, if .front is lazy then you need to keep extra baggage around your range to indicate whether or not the generator has been invoked yet; it's easier to just always compute the next element eagerly and cache it, and .front just returns the cached data.
>
> Even when the range involves some expensive per-element computation, I find that it's simpler to just compute and cache in .popFront instead of adding extra baggage to .front to know when the computation has already been performed.
>
> I'm hard-pressed to come up with an example where deferring computation to .front is a good idea!
>
>
> T

Well, I just remembered that I had a tab open to this: http://ddili.org/ders/d.en/ranges.html

Reading, you see the following:

* empty: specifies whether the range is empty; it must return true when the range is considered to be empty, and false otherwise
* front: provides access to the element at the beginning of the range
* popFront(): shortens the range from the beginning by removing the first element

Looking at that, if popFront shortens the range, then to me it sounds like the work should be done in popFront.
April 30, 2020
On 4/30/20 2:05 PM, Paul Backus wrote:
> On Thursday, 30 April 2020 at 16:21:05 UTC, Steven Schveighoffer wrote:
>> I would say part of the issue is that you are doing all your work in front and not popFront.
>>
> [...]
>>
>> I'd say:
>>
>> 1. move your work to the popFront function (you then need to call popFront once before returning the range in your factory method).
>> 2. change empty to check if the buffer is empty instead of the input.
> 
> Doing work in popFront instead of front is usually an anti-pattern, since it forces eager evaluation of the next element even when that element is never used. You should only do this if there's no reasonable way to avoid it.

In the general scheme of things, this only makes sense for the first front access or empty access. From then on, popFront should do the work. In most cases, you can't tell a range is empty unless you've done the work to determine the next element. After you call popFront, you need to call empty before calling front, or else you may be using an invalid range. So all the work is going to be done anyway.

A good example, byLine. How do you know that there is no more data until you attempt to read the next line from the stream? At which point, you have to do all the work.

-Steve

P.S. I see H.S. Teoh said much of the same thing, but I had already typed this out before reading it ;)
April 30, 2020
On 4/30/20 2:30 PM, H. S. Teoh wrote:
> On Thu, Apr 30, 2020 at 06:05:55PM +0000, Paul Backus via Digitalmars-d-learn wrote:
> [...]
>> Doing work in popFront instead of front is usually an anti-pattern,
>> since it forces eager evaluation of the next element even when that
>> element is never used. You should only do this if there's no
>> reasonable way to avoid it.
> 
> Really?? IME, usually you *need* to do work in popFront instead of
> front, because otherwise .empty wouldn't be able to tell whether there
> is another element or not.  E.g., in filtering a range based on some
> criterion on each element, you can't defer computing the next element
> until .front because you can't predict whether there will be another
> element that won't be dropped by popFront.
> 
> Also, for ranges based on generator functions, if .front is lazy then
> you need to keep extra baggage around your range to indicate whether or
> not the generator has been invoked yet; it's easier to just always
> compute the next element eagerly and cache it, and .front just returns
> the cached data.
> 
> Even when the range involves some expensive per-element computation, I
> find that it's simpler to just compute and cache in .popFront instead of
> adding extra baggage to .front to know when the computation has already
> been performed.
> 
> I'm hard-pressed to come up with an example where deferring computation
> to .front is a good idea!

There could be expensive operations done to construct the element that aren't necessary to iterate the range, but generally you achieve this by using something like map and cache.

But really the only valid use case for this is if you create a range but never use it. Some may need this, but I think a wrapper would be a better solution than enforcing a certain usage pattern.

-Steve
April 30, 2020
On Thursday, 30 April 2020 at 18:30:14 UTC, H. S. Teoh wrote:
> On Thu, Apr 30, 2020 at 06:05:55PM +0000, Paul Backus via Digitalmars-d-learn wrote: [...]
>> Doing work in popFront instead of front is usually an anti-pattern, since it forces eager evaluation of the next element even when that element is never used. You should only do this if there's no reasonable way to avoid it.
>
> Really?? IME, usually you *need* to do work in popFront instead of front, because otherwise .empty wouldn't be able to tell whether there is another element or not.  E.g., in filtering a range based on some criterion on each element, you can't defer computing the next element until .front because you can't predict whether there will be another element that won't be dropped by popFront.

There are certainly cases where it can't be avoided. Filter is one; file I/O (as Steven Schveighoffer pointed out) is another one. Obviously, you gotta do what you gotta do. Still, I think that as long as work *can* be deferred to .front, it should be. That's the essence of lazy evaluation: only do your computation once you're absolutely sure it's necessary.

> Also, for ranges based on generator functions, if .front is lazy then you need to keep extra baggage around your range to indicate whether or not the generator has been invoked yet; it's easier to just always compute the next element eagerly and cache it, and .front just returns the cached data.

std.range.generate is actually a perfect example of the problem with doing work in popFront. Because it has to call popFront both on construction *and* after every element, consuming n elements will call the generator function n+1 times. The final call is completely wasted.
April 30, 2020
On 4/30/20 6:39 PM, Paul Backus wrote:
> On Thursday, 30 April 2020 at 18:30:14 UTC, H. S. Teoh wrote:
>> Also, for ranges based on generator functions, if .front is lazy then you need to keep extra baggage around your range to indicate whether or not the generator has been invoked yet; it's easier to just always compute the next element eagerly and cache it, and .front just returns the cached data.
> 
> std.range.generate is actually a perfect example of the problem with doing work in popFront. Because it has to call popFront both on construction *and* after every element, consuming n elements will call the generator function n+1 times. The final call is completely wasted.

generate used to do everything on front. But that meant it wasn't a true range as multiple calls to front generated different data (popFront was a noop). I fixed it a while ago.

It should be the status quo to do all work in popFront, and then you should wrap if you need different behavior.

I'm ok with something like this (I'm kind of surprised something like this doesn't exist already):

struct LazyRange(R)
{
   R src;
   bool frontEvaluated;
   void sync() {
       if(!frontEvaluated) {
          src.popFront;
          frontEvaluated = true;
       }
   }
   auto front() {
       sync();
       return src.front;
   }
   bool empty() {
       sync();
       return src.empty;
   }
   void popFront() {
       sync();
       frontEvaluated = false;
   }
}

-Steve
1 2
Next ›   Last »