Behavior of joining mapresults
December 20, 2017
When working with JSON data files that were a little bigger than
convenient, I stumbled upon a strange behavior when joining map results
(I understand that this is more or less flatMap).
I mapped input files to JSONValues, from which I took out some arrays
whose contents I wanted to join.
Although the joiner is at the end of the functional pipeline, it led to
the parsing code being called twice.
I tried to reduce the problem:

#!/usr/bin/env rdmd -unittest
unittest {
    import std.stdio;
    import std.range;
    import std.algorithm;
    import std.string;

    auto parse(int i) {
        writeln("parsing %s".format(i));
        return [1, 2, 3];
    }

    writeln(iota(1, 5).map!(parse));
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(parse)).joiner);
}

void main() {}

As you can see if you run this code, parse is called twice for each of the inputs 1 to 4 in the joiner case. What am I doing wrong here?

Thanks in advance,
Christian

December 20, 2017
On Wednesday, 20 December 2017 at 15:28:00 UTC, Christian Köstlin wrote:
> When working with JSON data files that were a little bigger than
> convenient, I stumbled upon a strange behavior when joining map results
> (I understand that this is more or less flatMap).
> I mapped input files to JSONValues, from which I took out some arrays
> whose contents I wanted to join.
> Although the joiner is at the end of the functional pipeline, it led to
> the parsing code being called twice.
> I tried to reduce the problem:
>
> [...]

You need to memoize, I guess; map is lazy.
December 20, 2017
On 20.12.17 17:19, Stefan Koch wrote:
> On Wednesday, 20 December 2017 at 15:28:00 UTC, Christian Köstlin wrote:
>> When working with JSON data files that were a little bigger than
>> convenient, I stumbled upon a strange behavior when joining map results
>> (I understand that this is more or less flatMap).
>> I mapped input files to JSONValues, from which I took out some arrays
>> whose contents I wanted to join.
>> Although the joiner is at the end of the functional pipeline, it led to
>> the parsing code being called twice.
>> I tried to reduce the problem:
>>
>> [...]
> 
> You need to memoize, I guess; map is lazy.
That's an idea, thanks a lot, I will give it a try ...

December 21, 2017
On 20.12.17 17:30, Christian Köstlin wrote:
> That's an idea, thanks a lot, I will give it a try ...
#!/usr/bin/env rdmd -unittest
unittest {
    import std.stdio;
    import std.range;
    import std.algorithm;
    import std.string;
    import std.functional;

    auto parse(int i) {
        writeln("parsing %s".format(i));
        return [1, 2, 3];
    }

    writeln(iota(1, 5).map!(memoize!parse));
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(memoize!parse)).joiner);
}

void main() {}

This works, but I am worried about the data that is stored by the memoization. At the moment it's not a big issue, as all the data fits comfortably into RAM, but for bigger data another approach is needed (probably even my current JSON parsing must be replaced).

I still wonder whether joiner calls front more often than necessary. It is certainly valid to call front as many times as one sees fit, but with a lazy map in between, it might not be the best solution.
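
The effect of repeated front calls on a lazy map can be checked in isolation. The following is only a minimal sketch: the counter `calls` and the helpers `callsWithPlainMap` and `callsWithCache` are hypothetical names introduced for illustration. It assumes Phobos' `cache` (std.algorithm.iteration), which stores the current front so that repeated reads do not rerun the mapped function.

```d
import std.algorithm : cache, map;
import std.range : iota;

int calls; // hypothetical counter, not part of the original example

int[] parse(int i)
{
    ++calls;
    return [1, 2, 3];
}

// Reads front twice from a plain lazy map and returns how often parse ran.
int callsWithPlainMap()
{
    calls = 0;
    auto r = iota(1, 5).map!parse;
    auto a = r.front; // runs parse(1)
    auto b = r.front; // runs parse(1) again, because map stores nothing
    return calls;
}

// Same two reads, but with cache placed between map and the consumer.
int callsWithCache()
{
    calls = 0;
    auto r = iota(1, 5).map!parse.cache; // construction evaluates front once
    auto a = r.front; // served from the cached value
    auto b = r.front; // served from the cached value
    return calls;
}

void main()
{
    assert(callsWithPlainMap() == 2);
    assert(callsWithCache() == 1);
}
```

Unlike memoize, cache keeps only the current element, so it may be a better fit when the parsed data does not comfortably fit into memory.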

December 21, 2017
On Thursday, December 21, 2017 07:46:03 Christian Köstlin via Digitalmars-d-learn wrote:
> On 20.12.17 17:30, Christian Köstlin wrote:
> > That's an idea, thanks a lot, I will give it a try ...
>
> #!/usr/bin/env rdmd -unittest
> unittest {
>     import std.stdio;
>     import std.range;
>     import std.algorithm;
>     import std.string;
>     import std.functional;
>
>     auto parse(int i) {
>         writeln("parsing %s".format(i));
>         return [1, 2, 3];
>     }
>
>     writeln(iota(1, 5).map!(memoize!parse));
>     writeln("-------------------------------");
>     writeln((iota(1, 5).map!(memoize!parse)).joiner);
> }
>
> void main() {}
>
> This works, but I am worried about the data that is stored by the memoization. At the moment it's not a big issue, as all the data fits comfortably into RAM, but for bigger data another approach is needed (probably even my current JSON parsing must be replaced).
>
> I still wonder whether joiner calls front more often than necessary. It is certainly valid to call front as many times as one sees fit, but with a lazy map in between, it might not be the best solution.

I would think that it would make a lot more sense to simply put the whole thing in an array than to use memoize. e.g.

auto arr = iota(1, 5).map!parse().array();

- Jonathan M Davis


December 21, 2017
On 21.12.17 08:41, Jonathan M Davis wrote:
> I would think that it would make a lot more sense to simply put the whole thing in an array than to use memoize. e.g.
> 
> auto arr = iota(1, 5).map!parse().array();
That's also possible, but I wanted to make use of the laziness, e.g.
if I then search over the flattened data, I do not have to parse the
10th file.
I replaced joiner with a primitive flatten function like this:
#!/usr/bin/env rdmd -unittest
unittest {
    import std.stdio;
    import std.range;
    import std.algorithm;
    import std.string;
    import std.functional;

    auto parse(int i) {
        writeln("parsing %s".format(i));
        return [1, 2, 3];
    }

    writeln(iota(1, 5).map!(parse));
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(parse)).joiner);
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(memoize!parse)).joiner);
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(parse)).flatten);
}

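auto flatten(T)(T input) {
    import std.range;
    struct Res {
        T input;                // outer range (a range of ranges)
        ElementType!T current;  // inner range currently being consumed
        this(T input) {
            this.input = input;
            // Guard against an empty outer range before touching front.
            // In that case current keeps its .init value (empty for arrays).
            if (!this.input.empty) {
                // front is read once per outer element and kept in current.
                this.current = this.input.front;
                advance();
            }
        }
        // Skips empty inner ranges until a non-empty one is found or the
        // outer range is exhausted.
        private void advance() {
            while (current.empty) {
                if (input.empty) {
                    return;
                }
                input.popFront;
                if (input.empty) {
                    return;
                }
                current = input.front;
            }
        }

        bool empty() {
            return current.empty;
        }
        auto front() {
            return current.front;
        }

        void popFront() {
            current.popFront;
            advance();
        }

    }
    return Res(input);
}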

void main() {}

With this implementation my program behaves as expected (parsing the
input data only once).
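
For comparison, here is a rough sketch of the same lazy search using only Phobos, assuming that `cache` (std.algorithm.iteration) placed in front of joiner is acceptable. The counter `calls`, the helper `filesParsedSearching` and the made-up file contents are hypothetical and exist only to make the number of parse calls observable.

```d
import std.algorithm : cache, canFind, joiner, map;
import std.range : iota;

int calls; // hypothetical counter, not part of the original example

// Stands in for parsing one input file; file n yields n*10+1 .. n*10+3.
int[] parse(int file)
{
    ++calls;
    return [file * 10 + 1, file * 10 + 2, file * 10 + 3];
}

// Searches the flattened data of files 1 to 10 for needle and returns
// how many files had to be parsed to answer the question.
int filesParsedSearching(int needle)
{
    calls = 0;
    auto found = iota(1, 11).map!parse.cache.joiner.canFind(needle);
    return calls;
}

void main()
{
    // 32 is in the third file, so files 4 to 10 are never parsed.
    assert(filesParsedSearching(32) == 3);
}
```

Note that cache evaluates one outer element ahead when it is constructed and on each popFront, so a file is parsed as soon as the flattened range advances to it, even if none of its elements are read afterwards.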