is std.algorithm.joiner lazy? (page 2) - D Programming Language Discussion Forum

April 07, 2016

Re: is std.algorithm.joiner lazy?

Posted by Jonathan M Davis
in reply to Puming

On Friday, April 08, 2016 00:30:05 Puming via Digitalmars-d-learn wrote:
> On Thursday, 7 April 2016 at 18:15:07 UTC, Jonathan M Davis wrote:
> > On Thursday, April 07, 2016 08:47:15 Puming via Digitalmars-d-learn wrote:
> >> On Thursday, 7 April 2016 at 08:27:23 UTC, Edwin van Leeuwen wrote:
> >> > On Thursday, 7 April 2016 at 08:17:38 UTC, Puming wrote:
> >> >> On Thursday, 7 April 2016 at 08:07:12 UTC, Edwin van Leeuwen wrote:
> >> >>
> >> >> OK. Even if it consumes the first two elements, then why does it have to consume them AGAIN when actually used? If the function mkarray has side effects, it could lead to problems.
> >> >
> >> > After some testing, it seems to get each element twice: it calls front on the MapResult twice for each element. The first two mkarray calls are both for the first element, the second two for the second. You can solve this by caching the front call with:
> >> >
> >> > xs.map!(x=>mkarray(x)).cache.joiner;
> >>
> >> Thanks! I added more elements to xs and checked that you are right.
> >>
> >> So EVERY element is accessed twice with joiner. Better add that to the docs, and note the use of cache.
> >
> > I would note that in general, it's not uncommon for an algorithm to access front multiple times. So, this really isn't a joiner-specific issue. If anything, it's map that should get a note in its docs, not joiner. You really should just expect front to be called multiple times. So, if that's a problem, use cache. But joiner is not doing anything abnormal.
>
> But the joiner docs say that joiner is lazy, and accessing front multiple times is not true laziness. I think it would be better to note that after the lazy part: "joiner is lazy, but it will access the front twice".
>
> If many other lazy functions behave like this, I suggest coining a new name for it, like 'semi-lazy', to be more accurate.
>
> Maybe it's my fault; I didn't know what cache does before Edwin told me. So there is a solution, it just is not easy for newbies to find because there is no direct link between these functions.

Lazy means that it's not going to consume the entire range when you call the function. Rather, it's going to return a range that you can iterate over. It may or may not process the first element before returning, depending on how it works, and there's definitely nothing that says whether it's going to access front multiple times or not before calling popFront. And accessing front multiple times without calling popFront is _normal_ whether you're dealing with a lazy range or an eager one. All that lazy means is that you're getting a range from the function rather than it consuming the range before returning.

So, whatever you do with a range, in general, you have to assume that an algorithm might access front multiple times, and the implementation is free to change so that it accesses it more times or fewer times, because the range API says nothing about whether front is accessed multiple times or not. front needs to return equal values every time that it's called before popFront is called, but that doesn't mean that they have to be the same objects, and it doesn't mean that there's any restriction on how many times front is accessed before a call to popFront.

So, I see no reason for joiner to say anything in its docs about how many times it accesses front. It's pretty much irrelevant to how ranges are expected to work, and it could change. If it actually matters for what you're doing, then you need to figure out how to rework your code so that it doesn't matter whether front is accessed multiple times per call to popFront or not. That's just part of working with ranges, though I can certainly understand if you didn't realize that previously.
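
To make the point about repeated front calls concrete, here is a minimal sketch that counts how often the mapped function runs with and without cache. The body of mkarray and the helper countCalls are assumptions made up for this illustration; only map, cache, joiner and array come from Phobos. The uncached count depends on joiner's implementation, so the sketch only relies on it being at least one call per element.

```d
import std.algorithm.iteration : cache, joiner, map;
import std.array : array;

int calls; // how many times mkarray has run

// Stand-in for the mkarray mentioned in the thread; its body is an assumption.
int[] mkarray(int x)
{
    ++calls;
    return [x, x];
}

// Flattens [1, 2, 3] through map and joiner and reports how often mkarray ran.
int countCalls(bool useCache)
{
    calls = 0;
    auto xs = [1, 2, 3];
    int[] flat;
    if (useCache)
        flat = xs.map!(x => mkarray(x)).cache.joiner.array;
    else
        flat = xs.map!(x => mkarray(x)).joiner.array;
    assert(flat == [1, 1, 2, 2, 3, 3]);
    return calls;
}

void main()
{
    import std.stdio : writeln;

    // The exact uncached count is an implementation detail of joiner.
    writeln("without cache: ", countCalls(false));
    // cache evaluates front once per element of the mapped range.
    writeln("with cache:    ", countCalls(true));
}
```

Either way the flattened result is the same; only the amount of repeated work differs, which is why it matters only when the mapped function is expensive or has side effects.
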
>
> There is another problem, map, cache, and joiner don't work when composed multiple times. I've submitted a bug, https://issues.dlang.org/show_bug.cgi?id=15891, can you confirm?

Well, given your example, I would strongly argue that you should write a range that calls read in its constructor and in popFront (so that calling front multiple times doesn't matter) rather than using map. While map can theoretically be used the way that you're trying to use it, it's really intended for converting an element rather than doing stuff like I/O in it. Also, if the range that you give map is random access (like an array would be), then opIndex could be used to access random elements, which _really_ wouldn't work with reading from a file. So, I think that map is just plain a bad choice for what you're trying to do.

It's not obvious to me why your example is failing to compile - the problem appears to be with cache specifically and has nothing to do with joiner - and I am inclined to agree that there's a bug there (be it in cache or in the compiler), but I really think that using map is a bad move for what you're trying to do anyway - especially when you consider what will happen if opIndex is used. I'd strongly encourage you to just write a range that does what you need instead.

- Jonathan M Davis

April 08, 2016

Re: is std.algorithm.joiner lazy?

Posted by Puming
in reply to Jonathan M Davis

On Friday, 8 April 2016 at 01:14:11 UTC, Jonathan M Davis wrote:
> [...]
>
> Lazy means that it's not going to consume the entire range when you call the function. Rather, it's going to return a range that you can iterate over. It may or may not process the first element before returning, depending on how it works, and there's definitely nothing that says whether it's going to access front multiple times or not before calling popFront. And accessing front multiple times without calling popFront is _normal_ whether you're dealing with a lazy range or an eager one. All that lazy means is that you're getting a range from the function rather than it consuming the range before returning.
>
> So, whatever you do with a range, in general, you have to assume that an algorithm might access front multiple times, and the implementation is free to change so that it accesses it more times or fewer times, because the range API says nothing about whether front is accessed multiple times or not. front needs to return equal values every time that it's called before popFront is called, but that doesn't mean that they have to be the same objects, and it doesn't mean that there's any restriction on how many times front is accessed before a call to popFront.
>
> So, I see no reason for joiner to say anything in its docs about how many times it accesses front. It's pretty much irrelevant to how ranges are expected to work, and it could change. If it actually matters for what you're doing, then you need to figure out how to rework your code so that it doesn't matter whether front is accessed multiple times per call to popFront or not. That's just part of working with ranges, though I can certainly understand if you didn't realize that previously.

That makes sense. Thanks for the clarification.

>>
>> There is another problem, map, cache, and joiner don't work when composed multiple times. I've submitted a bug, https://issues.dlang.org/show_bug.cgi?id=15891, can you confirm?
>
> Well, given your example, I would strongly argue that you should write a range that calls read in its constructor and in popFront (so that calling front multiple times doesn't matter) rather than using map. While map can theoretically be used the way that you're trying to use it, it's really intended for converting an element rather than doing stuff like I/O in it. Also, if the range that you give map is random access (like an array would be), then opIndex could be used to access random elements, which _really_ wouldn't work with reading from a file. So, I think that map is just plain a bad choice for what you're trying to do.

So what you mean is to read the front in the constructor and read the further parts in popFront()? That way, multiple accesses to front won't hurt anything. I think it might work; I'll change my code.

So the guideline is: when accessing front is costly, don't use map; use a custom range struct instead. Right?

>
> It's not obvious to me why your example is failing to compile - the problem appears to be with cache specifically and has nothing to do with joiner - and I am inclined to agree that there's a bug there (be it in cache or in the compiler), but I really think that using map is a bad move for what you're trying to do anyway - especially when you consider what will happen if opIndex is used. I'd strongly encourage you to just write a range that does what you need instead.

OK, I hope it gets fixed. I'll try to look into it once I'm able to understand the code in Phobos.

>
> - Jonathan M Davis

April 08, 2016

Re: is std.algorithm.joiner lazy?

Posted by Puming
in reply to Jonathan M Davis

On Friday, 8 April 2016 at 01:14:11 UTC, Jonathan M Davis wrote:
> [...]
>
> Well, given your example, I would strongly argue that you should write a range that calls read in its constructor and in popFront (so that calling front multiple times doesn't matter) rather than using map. While map can theoretically be used the way that you're trying to use it, it's really intended for converting an element rather than doing stuff like I/O in it. Also, if the range that you give map is random access (like an array would be), then opIndex could be used to access random elements, which _really_ wouldn't work with reading from a file. So, I think that map is just plain a bad choice for what you're trying to do.
>

Well, I used map because, when viewing the scenario as a data flow, map seems like an intuitive choice:

What I have: a bunch of large files, each file containing sections of data, and each section composed of many lines of records. For each file, I have a list of indices.

What I want: given a list of files and the indices for each file, I want to construct a lazy stream of records for other programs to use.

Here is the data flow:

query constraints
-> [(filePath, [index])]
-> [(File, [index])] // map, needs cache
-> [[section]] // map, needs cache
-> [[[record]]]  // joiner.joiner
-> Range of record

And after reading cache's docs, I gather that cache is perfect for converting a range with side effects in front into a range with side effects in popFront.

So if cache and map worked together harmoniously, they should do the same trick as manually writing two ranges here.
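
The flattening part of that data flow can be sketched with in-memory data as follows. The function sectionsOf and the record strings are hypothetical stand-ins for opening a file and reading its sections; no real I/O happens here. Because sectionsOf is free of side effects, it does not matter that joiner may evaluate front more than once, so cache is left out of this sketch.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : joiner, map;

// Hypothetical stand-in for "file -> sections -> records":
// every file has two sections holding two and one records.
string[][] sectionsOf(string file)
{
    return [[file ~ ":s0:r0", file ~ ":s0:r1"], [file ~ ":s1:r0"]];
}

// [file] -> [[[record]]] -> lazy range of record
auto records(string[] files)
{
    return files.map!sectionsOf.joiner.joiner;
}

void main()
{
    import std.stdio : writeln;

    foreach (record; records(["a", "b"]))
        writeln(record);
}
```

With real files, the map step would have side effects, which is exactly the situation where the thread suggests either cache or a hand-written range.
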

>
> - Jonathan M Davis

April 07, 2016

Re: is std.algorithm.joiner lazy?

Posted by Jonathan M Davis
in reply to Puming

On Friday, April 08, 2016 02:01:07 Puming via Digitalmars-d-learn wrote:
> So what you mean is to read the front in the constructor and read the further parts in popFront()? That way, multiple accesses to front won't hurt anything. I think it might work; I'll change my code.
>
> So the guideline is: when accessing front is costly, don't use map; use a custom range struct instead. Right?

In general, when you're dealing with a non-random access range, it's best for popFront to do the work of setting up front and then have front return the same object every time. If front is doing the work, then if it gets called multiple times, that work is being repeated every time it gets called. map is a funny case, because it can be a random-access range (if the underlying range it's wrapping is a random-access range). So, fundamentally, it doesn't work in map to do the work in popFront. It pretty much has to be done in front. So, doing stuff like range.map!(a => to!string(a))() is problematic in that a new allocation is going to occur every time that front is called - or when any element is accessed via opIndex. It works so long as the element is equal every time, and calling front multiple times does not affect the rest of the range, but it can be costly. In theory, cache should solve that case (and it would result in a range that wasn't random access, so opIndex wouldn't be called on it), but obviously, you're running into problems with it.
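
A small sketch of that repeated work, assuming a counting wrapper around to!string (the names convert, conversions and demoRepeatedWork are made up for the illustration): every call to front and every opIndex access on the mapped range runs the conversion again.

```d
import std.algorithm.iteration : map;
import std.conv : to;

int conversions; // how many times convert has run

string convert(int a)
{
    ++conversions;
    return a.to!string;
}

// Touches a mapped array three times and reports how many conversions ran.
int demoRepeatedWork()
{
    conversions = 0;
    auto r = [1, 2, 3].map!convert;

    auto a = r.front; // runs convert
    auto b = r.front; // runs convert again for the same element
    auto c = r[1];    // opIndex runs convert too

    assert(a == b);   // equal values, but each one was computed separately
    assert(c == "2");
    return conversions;
}

void main()
{
    import std.stdio : writeln;

    writeln(demoRepeatedWork()); // one conversion per access
}
```
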

In any case, in general, when doing something like reading from a file with a range, it works best to do the work in popFront to avoid issues with multiple calls to front, and the constructor needs to do that work as well (be it by calling popFront or not), because front needs to be valid as soon as the range has been created, and it's not empty. So, you end up with something like

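struct MyRange
{
public:
    // front only returns the stored value, so repeated calls are cheap.
    @property T front() { return _value; }
    @property bool empty() { ... }

    // All of the real work happens here, once per element.
    void popFront()
    {
        _value = readNextValueFromFile();
    }

private:

    this(Something s)
    {
        ...
        // Prime the range so that front is valid right after construction.
        popFront();
    }

    T _value;
}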

It also encapsulates things better than having a function whose only purpose is to be used in map, though there are obviously cases where writing a function just to use in map would make sense.
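
For a version that actually compiles, here is a sketch along the same lines. It makes several assumptions that are not in the skeleton above: the "file" is an in-memory array of lines, the per-element work is parsing an int, and the names ParsedLines, parseCount and readAll are invented for the example.

```d
import std.array : array;
import std.conv : to;
import std.range.primitives : isInputRange;

// Hypothetical range that parses one integer per line.
struct ParsedLines
{
    private string[] _lines; // in-memory stand-in for a file
    private int _value;
    private bool _empty;
    int parseCount;          // how many lines have been parsed so far

    this(string[] lines)
    {
        _lines = lines;
        popFront(); // prime the range so front is valid immediately
    }

    // front and empty do no work; they only report stored state.
    @property int front() const { return _value; }
    @property bool empty() const { return _empty; }

    // The parsing happens here, exactly once per element.
    void popFront()
    {
        if (_lines.length == 0)
        {
            _empty = true;
            return;
        }
        _value = _lines[0].to!int;
        ++parseCount;
        _lines = _lines[1 .. $];
    }
}

static assert(isInputRange!ParsedLines);

// Convenience helper for the demo: drains the range into an array.
int[] readAll(string[] lines)
{
    return ParsedLines(lines).array;
}

void main()
{
    import std.stdio : writeln;

    auto r = ParsedLines(["10", "20", "30"]);
    writeln(r.front, " ", r.front, " parsed: ", r.parseCount);
    writeln(readAll(["10", "20", "30"]));
}
```

Calling front twice in a row leaves parseCount unchanged, because the parsing was already done by the constructor or by the previous popFront.
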

In general, I would only use map for cases where I'm converting something to something else and not for functions that do arbitrary work. A function for map that cannot be pure is a danger sign IMHO. Certainly, if you're going to follow how ranges are expected to work, whatever function you give map needs to return equal values every time front is called between calls to popFront, and multiple calls to front cannot affect the rest of the range. What you did with map doesn't follow those guidelines, though it probably would if cache worked and you always fed it into cache. Still, for something like this, I'd just create my own range and be done with it. You often need to anyway in order to manage extra state. And it tends to be more idiomatic, though I suppose that that's somewhat subjective.

- Jonathan M Davis

April 08, 2016

Re: is std.algorithm.joiner lazy?

Posted by Puming
in reply to Jonathan M Davis

On Friday, 8 April 2016 at 02:49:01 UTC, Jonathan M Davis wrote:
> [...]

Thanks. I'll adopt this idiom. Hopefully it gets used often enough to warrant a Phobos function :-)

April 08, 2016

Re: is std.algorithm.joiner lazy?

Posted by Mike Parker
in reply to Puming

On Friday, 8 April 2016 at 03:20:53 UTC, Puming wrote:
> On Friday, 8 April 2016 at 02:49:01 UTC, Jonathan M Davis wrote:
>> [...]
>
> Thanks. I'll adopt this idiom. Hopefully it gets used often enough to warrant a Phobos function :-)

What would such a function look like? I don't think such a thing could exist. This is more than just an idiom, IMO. It's a basic principle of ranges that, if not followed, is likely to produce a broken range and/or one whose front is more expensive than it needs to be. The trouble is that it isn't necessarily obvious and is easy to overlook when first implementing a custom range.

In Learning D, I used a custom FilteredRange to introduce the concept of ranges. It has a member function called skipNext which does the work of the filtering. It's called once in the constructor to 'prime' the range with the first value that matches the filter, then inside every call to popFront to find the next match. I closed that section with this paragraph:

"It might be tempting to take the filtering logic out of the skipNext method and add
it to front, which is another way to guarantee that it's performed on every element.
Then no work would need to be done in the constructor and popFront would
simply become a wrapper for _source.popFront. The problem with that approach
is that front can potentially be called multiple times without calling popFront in
between, meaning the predicate will be tested on each call. That's unnecessary work.
As a general rule, any work that needs to be done inside a range to prepare a front
element should happen as a result of calling popFront, leaving front to simply
focus on returning the current element."

A lazy range should be advanced in the constructor when it needs to be (usually when there is some criterion for an element to be returned from front) and always in popFront, but never in front.
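
A minimal sketch of that structure follows. It is not the code from Learning D; the names FilteredRange, skipNext and _source follow the description above, while the template signature and the helper function filtered are assumptions made for the example.

```d
import std.algorithm.comparison : equal;
import std.range.primitives : empty, front, isInputRange, popFront;

struct FilteredRange(alias pred, R)
if (isInputRange!R)
{
    private R _source;

    this(R source)
    {
        _source = source;
        skipNext(); // prime: move to the first element that matches
    }

    // Advances _source until its front satisfies pred or it runs out.
    private void skipNext()
    {
        while (!_source.empty && !pred(_source.front))
            _source.popFront();
    }

    @property bool empty() { return _source.empty; }

    // No filtering here: front just returns the current element.
    @property auto front() { return _source.front; }

    void popFront()
    {
        _source.popFront();
        skipNext(); // find the next match
    }
}

// Helper so that the range type can be deduced at the call site.
auto filtered(alias pred, R)(R source)
{
    return FilteredRange!(pred, R)(source);
}

void main()
{
    import std.stdio : writeln;

    auto evens = [1, 2, 3, 4, 5, 6].filtered!(x => x % 2 == 0);
    assert(evens.equal([2, 4, 6]));
    writeln(evens);
}
```

Because the predicate only runs inside skipNext, calling front repeatedly between two popFront calls never re-tests an element.
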

April 08, 2016

Re: is std.algorithm.joiner lazy?

Posted by Puming
in reply to Mike Parker

On Friday, 8 April 2016 at 08:44:36 UTC, Mike Parker wrote:
> On Friday, 8 April 2016 at 03:20:53 UTC, Puming wrote:
>> On Friday, 8 April 2016 at 02:49:01 UTC, Jonathan M Davis wrote:
>>> [...]
>>
>> Thanks. I'll adopt this idiom. Hopefully it gets used often enough to warrant a Phobos function :-)
>
> What would such a function look like? I don't think such a thing could exist. This is more than just an idiom, IMO. It's a basic principle of ranges that, if not followed, is likely to produce a broken range and/or one whose front is more expensive than it needs to be. The trouble is that it isn't necessarily obvious and is easy to overlook when first implementing a custom range.
>

I thought it was just like map!readNext.cache


> [...]

Copyright © 1999-2021 by the D Language Foundation