Jump to page: 1 2
Thread overview
is std.algorithm.joiner lazy?
Apr 07, 2016
Puming
Apr 07, 2016
Edwin van Leeuwen
Apr 07, 2016
Puming
Apr 07, 2016
Edwin van Leeuwen
Apr 07, 2016
Puming
Apr 07, 2016
Jonathan M Davis
Apr 08, 2016
Puming
Apr 08, 2016
Jonathan M Davis
Apr 08, 2016
Puming
Apr 08, 2016
Jonathan M Davis
Apr 08, 2016
Puming
Apr 08, 2016
Mike Parker
Apr 08, 2016
Puming
Apr 08, 2016
Puming
Apr 07, 2016
Puming
Apr 07, 2016
Edwin van Leeuwen
Apr 07, 2016
Puming
April 07, 2016
Hi:

when I use map with joiner, I found that function in map are called. In the document it says joiner is lazy, so why is the function called?

say:

int[] mkarray(int a) {
   writeln("mkarray called!");
   return [a * 2]; // just for test
}

void main() {
   auto xs = [1, 2];
   auto r = xs.map!(x=>mkarray(x)).joiner;
}

running this will get the output:

mkarray called!
mkarray called!

I suppose joiner does not consume?

when I actually consume the result by writlen, I get more output:

mkarray called!
mkarray called!
[2mkarray called!
mkarray called!
, 4]

I don't understand
April 07, 2016
On Thursday, 7 April 2016 at 07:07:40 UTC, Puming wrote:
> Hi:
>
> when I use map with joiner, I found that function in map are called. In the document it says joiner is lazy, so why is the function called?
>
> say:
>
> int[] mkarray(int a) {
>    writeln("mkarray called!");
>    return [a * 2]; // just for test
> }
>
> void main() {
>    auto xs = [1, 2];
>    auto r = xs.map!(x=>mkarray(x)).joiner;
> }
>
> running this will get the output:
>
> mkarray called!
> mkarray called!
>
> I suppose joiner does not consume?
>
> when I actually consume the result by writlen, I get more output:
>
> mkarray called!
> mkarray called!
> [2mkarray called!
> mkarray called!
> , 4]
>
> I don't understand

Apparently it works processing the first two elements at creation. All the other elements will be processed lazily.

Even when a range is lazy the algorithm still often has to "consume" one or two starting elements, just to set initial conditions. It does surprise me that joiner needs to process the first two, would have to look at the implementation why.
April 07, 2016
On Thursday, 7 April 2016 at 08:07:12 UTC, Edwin van Leeuwen wrote:
> On Thursday, 7 April 2016 at 07:07:40 UTC, Puming wrote:
>> [...]
>
> Apparently it works processing the first two elements at creation. All the other elements will be processed lazily.
>
> Even when a range is lazy the algorithm still often has to "consume" one or two starting elements, just to set initial conditions. It does surprise me that joiner needs to process the first two, would have to look at the implementation why.

OK. Even if it consumes the first two elements, then why does it have to consume them AGAIN when actually used? If the function mkarray has side effects, it could lead to problems.
April 07, 2016
On Thursday, 7 April 2016 at 08:17:38 UTC, Puming wrote:
> On Thursday, 7 April 2016 at 08:07:12 UTC, Edwin van Leeuwen wrote:
>
> OK. Even if it consumes the first two elements, then why does it have to consume them AGAIN when actually used? If the function mkarray has side effects, it could lead to problems.

After some testing it seems to get each element twice, calls front on the MapResult twice, on each element. The first two mkarray are both for first element, the second two for the second. You can solve this by caching the front call with:

xs.map!(x=>mkarray(x)).cache.joiner;
April 07, 2016
On Thursday, 7 April 2016 at 08:27:23 UTC, Edwin van Leeuwen wrote:
> On Thursday, 7 April 2016 at 08:17:38 UTC, Puming wrote:
>> On Thursday, 7 April 2016 at 08:07:12 UTC, Edwin van Leeuwen wrote:
>>
>> OK. Even if it consumes the first two elements, then why does it have to consume them AGAIN when actually used? If the function mkarray has side effects, it could lead to problems.
>
> After some testing it seems to get each element twice, calls front on the MapResult twice, on each element. The first two mkarray are both for first element, the second two for the second. You can solve this by caching the front call with:
>
> xs.map!(x=>mkarray(x)).cache.joiner;

Thanks! I added more elements to xs and checked that you are right.

So EVERY element is accessed twice with joiner. Better add that to the docs, and note the use of cache.
April 07, 2016
On Thursday, 7 April 2016 at 08:27:23 UTC, Edwin van Leeuwen wrote:
> On Thursday, 7 April 2016 at 08:17:38 UTC, Puming wrote:
>> On Thursday, 7 April 2016 at 08:07:12 UTC, Edwin van Leeuwen wrote:
>>
>> OK. Even if it consumes the first two elements, then why does it have to consume them AGAIN when actually used? If the function mkarray has side effects, it could lead to problems.
>
> After some testing it seems to get each element twice, calls front on the MapResult twice, on each element. The first two mkarray are both for first element, the second two for the second. You can solve this by caching the front call with:
>
> xs.map!(x=>mkarray(x)).cache.joiner;

There is another problem with cache, that is if I want another level of this map&joiner(which is my code scenario, where I'm reading a bunch of files, with each one I need to read multiple locations with seek and return a bunch of lines with each seek), adding cache will result compiler error:

simplified demo:

auto read(int a) {
   writeln("read called!", a);
   return [0, a]; // second level
}

auto mkarray(int a) {
  writeln("mkarray called!", a);
  return [-a, a].map!(x=>read(x)).cache.joiner; // to avoid calling read twice
}

void main() {
  auto xs = [1,2 ,3, 4];
  auto r = xs.map!(x=>mkarray(x)).cache.joiner; // to avoid calling mkarray twice

  writeln(r);
}

When compiled, I get the error:

Error: open path skips field __caches_field_0
source/app.d(19, 36): Error: template instance std.algorithm.iteration.cache!(MapResult!(__lambda1, int[])) error instantiating
April 07, 2016
On Thursday, 7 April 2016 at 09:55:56 UTC, Puming wrote:
> When compiled, I get the error:
>
> Error: open path skips field __caches_field_0
> source/app.d(19, 36): Error: template instance std.algorithm.iteration.cache!(MapResult!(__lambda1, int[])) error instantiating

That seems like a bug to me and you might want to submit it to the bug tracker. Even converting it to an array first does not seem to work:

import std.stdio : writeln;
import std.algorithm : map, cache, joiner;
import std.array : array;

auto read(int a) {
   return [0, a]; // second level
}

auto mkarray(int a) {
  return [-a, a].map!(x=>read(x)).cache.joiner; // to avoid calling read twice
}

void main() {
  auto xs = [1,2 ,3, 4];
  auto r = xs.map!(x=>mkarray(x)).array;
	
  // Both lines below should be equal, but second does not compile
  [[0, -1, 0, 1], [0, -2, 0, 2], [0, -3, 0, 3], [0, -4, 0, 4]].cache.joiner.writeln;
  r.cache.joiner.writeln;
}

Above results in following error:
/opt/compilers/dmd2/include/std/algorithm/iteration.d(326): Error: one path skips field __caches_field_0
/d617/f62.d(19): Error: template instance std.algorithm.iteration.cache!(Result[]) error instantiating
April 07, 2016
On Thursday, 7 April 2016 at 10:57:25 UTC, Edwin van Leeuwen wrote:
> On Thursday, 7 April 2016 at 09:55:56 UTC, Puming wrote:
>> [...]
>
> That seems like a bug to me and you might want to submit it to the bug tracker. Even converting it to an array first does not seem to work:
>
> [...]

Thanks. I just looked at the joiner code, but didn't find the source of error. I'll submit a bug report.
April 07, 2016
On Thursday, April 07, 2016 08:47:15 Puming via Digitalmars-d-learn wrote:
> On Thursday, 7 April 2016 at 08:27:23 UTC, Edwin van Leeuwen
>
> wrote:
> > On Thursday, 7 April 2016 at 08:17:38 UTC, Puming wrote:
> >> On Thursday, 7 April 2016 at 08:07:12 UTC, Edwin van Leeuwen wrote:
> >>
> >> OK. Even if it consumes the first two elements, then why does it have to consume them AGAIN when actually used? If the function mkarray has side effects, it could lead to problems.
> >
> > After some testing it seems to get each element twice, calls front on the MapResult twice, on each element. The first two mkarray are both for first element, the second two for the second. You can solve this by caching the front call with:
> >
> > xs.map!(x=>mkarray(x)).cache.joiner;
>
> Thanks! I added more elements to xs and checked that you are right.
>
> So EVERY element is accessed twice with joiner. Better add that to the docs, and note the use of cache.

I would note that in general, it's not uncommon for an algorithm to access front multiple times. So, this really isn't a joiner-specific issue. If anything, it's map that should get a note in its docs, not joiner. You really should just expect front to be called multiple times. So, if that's a problem, use cache. But joiner is not doing anything abnormal.

And it's not even the case that it necessarily makes sense to make a rule of thumb that ranges should copy front instead of calling it multiple times, because if front returns by ref, calling front multiple times is likely to be cheapepr, and while we don't properly support non-copyable types (like UniquePtr) with ranges right now, we really should, so if anything, it becomes the case that algorithms should favor calling front multiple times over copying its value.

So, there are pros and cons involved with copying front vs calling it multiple times, and I think that both approaches are both pretty common at this point. So, given how frequently it makes sense for map to allocate (e.g. to!string(a)), map should probably have a note about cache, but overall, it's just something that you need to be aware of. Regardless, I don't think that it makes sense to put anything in joiner's docs about it.

- Jonathan M Davis

April 08, 2016
On Thursday, 7 April 2016 at 18:15:07 UTC, Jonathan M Davis wrote:
> On Thursday, April 07, 2016 08:47:15 Puming via Digitalmars-d-learn wrote:
>> On Thursday, 7 April 2016 at 08:27:23 UTC, Edwin van Leeuwen
>>
>> wrote:
>> > On Thursday, 7 April 2016 at 08:17:38 UTC, Puming wrote:
>> >> On Thursday, 7 April 2016 at 08:07:12 UTC, Edwin van Leeuwen wrote:
>> >>
>> >> OK. Even if it consumes the first two elements, then why does it have to consume them AGAIN when actually used? If the function mkarray has side effects, it could lead to problems.
>> >
>> > After some testing it seems to get each element twice, calls front on the MapResult twice, on each element. The first two mkarray are both for first element, the second two for the second. You can solve this by caching the front call with:
>> >
>> > xs.map!(x=>mkarray(x)).cache.joiner;
>>
>> Thanks! I added more elements to xs and checked that you are right.
>>
>> So EVERY element is accessed twice with joiner. Better add that to the docs, and note the use of cache.
>
> I would note that in general, it's not uncommon for an algorithm to access front multiple times. So, this really isn't a joiner-specific issue. If anything, it's map that should get a note in its docs, not joiner. You really should just expect front to be called multiple times. So, if that's a problem, use cache. But joiner is not doing anything abnormal.

But in the joiner docs, it says joiner is lazy. But accessing front multiple times is not true laziness. I think it better note that after the lazy part: "joiner is lazy, but it will access the front twice".

If there are many other lazy functions behave like this, I suggest to make a new name for it, like 'semi-lazy', to be more accurate.

Maybe its my fault, I didn't know what cache does before Edwin told me.
So there is the solution, it just is not easy for newbies to find out because there is no direct link between these functions.

>
> And it's not even the case that it necessarily makes sense to make a rule of thumb that ranges should copy front instead of calling it multiple times, because if front returns by ref, calling front multiple times is likely to be cheapepr, and while we don't properly support non-copyable types (like UniquePtr) with ranges right now, we really should, so if anything, it becomes the case that algorithms should favor calling front multiple times over copying its value.

Indeed. I think copy is not good. But multiple access is a thing to note. When I want to use lazy things, it usually is that I'm reading files, so accessing twice is not acceptable.

>
> So, there are pros and cons involved with copying front vs calling it multiple times, and I think that both approaches are both pretty common at this point. So, given how frequently it makes sense for map to allocate (e.g. to!string(a)), map should probably have a note about cache, but overall, it's just something that you need to be aware of. Regardless, I don't think that it makes sense to put anything in joiner's docs about it.

There is another problem, map, cache, and joiner don't work when composed multiple times. I've submitted a bug, https://issues.dlang.org/show_bug.cgi?id=15891, can you confirm?

Because of this, now I have to read a file multiple times(using only joiner), or have to eagerly retrieve data in an array (which is too big), or fall back to an imperative way of manually accessing each file. They are all bad.
>
> - Jonathan M Davis


« First   ‹ Prev
1 2