December 20, 2017 Behavior of joining mapresults
When working with JSON data files that were a little bigger than convenient, I stumbled upon a strange behavior when joining map results (I understand that this is more or less flatMap). I mapped input files to JSONValues, from which I took out some arrays whose content I wanted to join. Although the joiner is at the end of the functional pipe, it led to the parsing code being called twice. I tried to reduce the problem:

    #!/usr/bin/env rdmd -unittest
    unittest {
        import std.stdio;
        import std.range;
        import std.algorithm;
        import std.string;

        auto parse(int i) {
            writeln("parsing %s".format(i));
            return [1, 2, 3];
        }

        writeln(iota(1, 5).map!(parse));
        writeln("-------------------------------");
        writeln((iota(1, 5).map!(parse)).joiner);
    }

    void main() {}

As you can see if you run this code, parsing is called two times for each of the inputs 1 to 4 (iota(1, 5) excludes 5). What am I doing wrong here?

Thanks in advance,
Christian

December 20, 2017 Re: Behavior of joining mapresults
Posted in reply to Christian Köstlin
On Wednesday, 20 December 2017 at 15:28:00 UTC, Christian Köstlin wrote:
> When working with json data files, that were a little bigger than
> convenient I stumbled upon a strange behavior with joining of mapresults
> (I understand that this is more or less flatmap).
> I mapped inputfiles, to JSONValues, from which I took out some arrays,
> whose content I wanted to join.
> Although the joiner is at the end of the functional pipe, it led to
> calling of the parsing code twice.
> I tried to reduce the problem:
>
> [...]
You need to memoize, I guess; map is lazy.
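
The laziness mentioned here can be made visible with a small experiment: `map` does not store its results, so every access to `front` runs the mapped function again. The following is only a minimal sketch; the counter `calls` and the helper `square` are hypothetical names introduced for illustration and are not part of the original post.

```d
import std.algorithm : map;
import std.range : iota;

int calls; // hypothetical counter: how often the mapped function runs

int square(int i)
{
    ++calls;
    return i * i;
}

// Reads front twice from the same lazy map and reports how many
// times the mapped function was evaluated.
int countCallsForTwoFronts()
{
    calls = 0;
    auto r = iota(1, 5).map!square;
    auto a = r.front; // evaluates square(1)
    auto b = r.front; // evaluates square(1) again, map does not cache
    assert(a == b);
    return calls;
}

void main()
{
    import std.stdio : writeln;
    writeln(countCallsForTwoFronts());
}
```

Any range that reads `front` more than once per element, as joiner appears to do in the example above, therefore triggers the mapped function more than once.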

December 20, 2017 Re: Behavior of joining mapresults
Posted in reply to Stefan Koch
On 20.12.17 17:19, Stefan Koch wrote:
> On Wednesday, 20 December 2017 at 15:28:00 UTC, Christian Köstlin wrote:
>> When working with json data files, that were a little bigger than
>> convenient I stumbled upon a strange behavior with joining of mapresults
>> (I understand that this is more or less flatmap).
>> I mapped inputfiles, to JSONValues, from which I took out some arrays,
>> whose content I wanted to join.
>> Although the joiner is at the end of the functional pipe, it led to
>> calling of the parsing code twice.
>> I tried to reduce the problem:
>>
>> [...]
>
> You need to memoize, I guess; map is lazy.
That's an idea, thanks a lot, will give it a try ...

December 21, 2017 Re: Behavior of joining mapresults
Posted in reply to Christian Köstlin
On 20.12.17 17:30, Christian Köstlin wrote:
> That's an idea, thanks a lot, will give it a try ...

    #!/usr/bin/env rdmd -unittest
    unittest {
        import std.stdio;
        import std.range;
        import std.algorithm;
        import std.string;
        import std.functional;

        auto parse(int i) {
            writeln("parsing %s".format(i));
            return [1, 2, 3];
        }

        writeln(iota(1, 5).map!(memoize!parse));
        writeln("-------------------------------");
        writeln((iota(1, 5).map!(memoize!parse)).joiner);
    }

    void main() {}

This works, but I fear for the data that is stored in the memoization. At the moment it's not a big issue, as all the data fits comfortably into RAM, but for bigger data another approach is needed (probably even my current JSON parsing must be replaced).

I still wonder whether joiner calls front more often than necessary. It is certainly valid to call front as many times as one sees fit, but with a lazy map in between, it might not be the best solution.
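
One possible alternative to memoize, not proposed in this thread, is `std.algorithm.iteration.cache`, which stores only the current `front` of the range instead of every result. The sketch below assumes that trade-off is acceptable; note that `cache` evaluates `front` eagerly, at construction and on each `popFront`, so it runs one element ahead of what has been consumed. The counter `parseCalls` is a hypothetical name added purely to make the number of calls visible.

```d
import std.algorithm : cache, equal, joiner, map;
import std.range : iota;

int parseCalls; // hypothetical counter, only here to observe the calls

int[] parse(int i)
{
    ++parseCalls;
    return [1, 2, 3];
}

// Flattens the mapped results through cache and reports how often
// parse ran while the whole range was consumed.
int countParsesWithCache()
{
    parseCalls = 0;
    // cache keeps only the current front, so joiner may read it repeatedly
    auto flat = iota(1, 5).map!parse.cache.joiner;
    assert(flat.equal([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]));
    return parseCalls;
}

void main()
{
    import std.stdio : writeln;
    writeln(countParsesWithCache());
}
```

With this arrangement each input is parsed once, and the memory held at any time is limited to a single parsed element rather than a growing memoization table.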

December 21, 2017 Re: Behavior of joining mapresults
Posted in reply to Christian Köstlin
On Thursday, December 21, 2017 07:46:03 Christian Köstlin via Digitalmars-d-learn wrote:
> On 20.12.17 17:30, Christian Köstlin wrote:
> > That's an idea, thanks a lot, will give it a try ...
>
> #!/usr/bin/env rdmd -unittest
> unittest {
>     import std.stdio;
>     import std.range;
>     import std.algorithm;
>     import std.string;
>     import std.functional;
>
>     auto parse(int i) {
>         writeln("parsing %s".format(i));
>         return [1, 2, 3];
>     }
>
>     writeln(iota(1, 5).map!(memoize!parse));
>     writeln("-------------------------------");
>     writeln((iota(1, 5).map!(memoize!parse)).joiner);
> }
>
> void main() {}
>
> This works, but I fear for the data that is stored in the memoization. At the moment it's not a big issue, as all the data fits comfortably into RAM, but for bigger data another approach is needed (probably even my current JSON parsing must be replaced).
>
> I still wonder whether joiner calls front more often than necessary. It is certainly valid to call front as many times as one sees fit, but with a lazy map in between, it might not be the best solution.
I would think that it would make a lot more sense to simply put the whole thing in an array than to use memoize, e.g.

    auto arr = iota(1, 5).map!parse().array();

- Jonathan M Davis
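
The effect of the array suggestion can be sketched as follows: `array` evaluates the whole map eagerly, so every input is parsed exactly once up front, and the later joiner only walks the stored results. This is an illustration under assumptions; the counter `parseCalls` and the `Counts` struct are hypothetical names that do not appear in the thread.

```d
import std.algorithm : equal, joiner, map;
import std.array : array;
import std.range : iota;

int parseCalls; // hypothetical counter, only here to observe the calls

int[] parse(int i)
{
    ++parseCalls;
    return [1, 2, 3];
}

// Hypothetical helper type: call counts at two points in time.
struct Counts
{
    int afterArray; // calls once array() has returned
    int afterJoin;  // calls once the joined range was fully consumed
}

Counts countParsesWithArray()
{
    parseCalls = 0;
    auto arr = iota(1, 5).map!parse.array; // all parsing happens here
    auto afterArray = parseCalls;
    // joiner now iterates over stored arrays, parse is not involved
    assert(arr.joiner.equal([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]));
    return Counts(afterArray, parseCalls);
}

void main()
{
    import std.stdio : writeln;
    writeln(countParsesWithArray());
}
```

The cost of this approach is that all inputs are parsed and held in memory before any flattened element is available, even if only the first few elements are needed.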

December 21, 2017 Re: Behavior of joining mapresults
Posted in reply to Jonathan M Davis
On 21.12.17 08:41, Jonathan M Davis wrote:
> I would think that it would make a lot more sense to simply put the whole thing in an array than to use memoize. e.g.
>
> auto arr = iota(1, 5).map!parse().array();

That's also possible, but I wanted to make use of the laziness, e.g. if I then search over the flattened data, I do not have to parse the 10th file. I replaced joiner with a primitive flatten function like this:

    #!/usr/bin/env rdmd -unittest
    unittest {
        import std.stdio;
        import std.range;
        import std.algorithm;
        import std.string;
        import std.functional;

        auto parse(int i) {
            writeln("parsing %s".format(i));
            return [1, 2, 3];
        }

        writeln(iota(1, 5).map!(parse));
        writeln("-------------------------------");
        writeln((iota(1, 5).map!(parse)).joiner);
        writeln("-------------------------------");
        writeln((iota(1, 5).map!(memoize!parse)).joiner);
        writeln("-------------------------------");
        writeln((iota(1, 5).map!(parse)).flatten);
    }

    auto flatten(T)(T input) {
        import std.range;

        struct Res {
            T input;                // outer range of ranges
            ElementType!T current;  // inner range currently being consumed

            this(T input) {
                this.input = input;
                // Only read front if the outer range has an element;
                // front is read once per outer element.
                if (!this.input.empty) {
                    this.current = this.input.front;
                    advance();
                }
            }

            // Skip exhausted inner ranges until a non-empty one is found
            // or the outer range runs out.
            private void advance() {
                while (current.empty) {
                    if (input.empty) {
                        return;
                    }
                    input.popFront;
                    if (input.empty) {
                        return;
                    }
                    current = input.front;
                }
            }

            bool empty() {
                return current.empty;
            }

            auto front() {
                return current.front;
            }

            void popFront() {
                current.popFront;
                advance();
            }
        }

        return Res(input);
    }

    void main() {}

With this implementation my program behaves as expected (parsing the input data only once).