Behavior of joining mapresults

Dec 20, 2017

Christian Köstlin

Dec 20, 2017

Stefan Koch


When working with JSON data files that were a little bigger than convenient, I stumbled upon a strange behavior when joining map results (I understand that this is more or less flatMap). I mapped input files to JSONValues, from which I took out some arrays whose contents I wanted to join. Although the joiner is at the end of the functional pipeline, it led to the parsing code being called twice.

I tried to reduce the problem:

#!/usr/bin/env rdmd -unittest
unittest {
    import std.stdio;
    import std.range;
    import std.algorithm;
    import std.string;

    auto parse(int i) {
        writeln("parsing %s".format(i));
        return [1, 2, 3];
    }

    writeln(iota(1, 5).map!(parse));
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(parse)).joiner);
}

void main() {}

As you can see if you run this code, parse is called two times for each input (1 to 4).
What am I doing wrong here?

Thanks in advance,
Christian

On Wednesday, 20 December 2017 at 15:28:00 UTC, Christian Köstlin wrote:
> When working with JSON data files that were a little bigger than
> convenient, I stumbled upon a strange behavior when joining map results
> (I understand that this is more or less flatMap).
> I mapped input files to JSONValues, from which I took out some arrays
> whose contents I wanted to join.
> Although the joiner is at the end of the functional pipeline, it led to
> the parsing code being called twice.
> I tried to reduce the problem:
>
> [...]

You need to memoize, I guess; map is lazy.
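A minimal sketch of what "map is lazy" means in practice: std.algorithm's map does not store results, so the mapped function runs again on every call to front. The counter `calls` and the helper `twice` are hypothetical names used only for this illustration; they stand in for the parsing code from the original post.

```d
import std.algorithm : map;
import std.range : iota;

int calls; // hypothetical counter, only for this illustration

// Hypothetical stand-in for an expensive function such as parse.
int twice(int i) {
    ++calls;
    return i * 2;
}

void main() {
    auto r = iota(1, 3).map!twice;
    assert(calls == 0);       // building the range evaluates nothing
    auto a = r.front;         // first evaluation of element 1
    auto b = r.front;         // same element, evaluated again
    assert(a == 2 && b == 2);
    assert(calls == 2);       // front was computed twice, not cached
}
```

Any consumer that reads front more than once per element will therefore trigger the mapped function more than once.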

On 20.12.17 17:19, Stefan Koch wrote:
> On Wednesday, 20 December 2017 at 15:28:00 UTC, Christian Köstlin wrote:
>> When working with JSON data files that were a little bigger than
>> convenient, I stumbled upon a strange behavior when joining map results
>> (I understand that this is more or less flatMap).
>> I mapped input files to JSONValues, from which I took out some arrays
>> whose contents I wanted to join.
>> Although the joiner is at the end of the functional pipeline, it led to
>> the parsing code being called twice.
>> I tried to reduce the problem:
>>
>> [...]
>
> You need to memoize, I guess; map is lazy.

That's an idea, thanks a lot, will give it a try ...

On 20.12.17 17:30, Christian Köstlin wrote:
> That's an idea, thanks a lot, will give it a try ...

#!/usr/bin/env rdmd -unittest
unittest {
    import std.stdio;
    import std.range;
    import std.algorithm;
    import std.string;
    import std.functional;

    auto parse(int i) {
        writeln("parsing %s".format(i));
        return [1, 2, 3];
    }

    writeln(iota(1, 5).map!(memoize!parse));
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(memoize!parse)).joiner);
}

void main() {}

This works, but I fear for the data that is stored in the memoization. At the moment it's not a big issue, as all the data fits comfortably into RAM, but for bigger data another approach is needed (probably even my current JSON parsing must be replaced).

I still wonder whether the joiner calls front more often than necessary. Of course it is valid to call front as many times as one sees fit, but with a lazy map in between, it might not be the best solution.
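On the worry about how much data the memoization keeps: std.functional.memoize also has an overload that takes a maximum cache size as a second template argument, so older entries may be evicted instead of accumulating. The sketch below assumes a limit of 16 entries is acceptable; `parseFile`, `cachedParse` and `calls` are hypothetical names, not part of the original program.

```d
import std.functional : memoize;

int calls; // hypothetical counter, only for this illustration

// Hypothetical stand-in for the expensive JSON parsing step.
int[] parseFile(int i) {
    ++calls;
    return [i, i + 1];
}

// Keep at most 16 cached results (an assumed limit, tune as needed).
alias cachedParse = memoize!(parseFile, 16);

void main() {
    auto first = cachedParse(1);  // computed
    auto second = cachedParse(1); // served from the bounded cache
    assert(first == [1, 2] && second == [1, 2]);
    assert(calls == 1);
}
```

Since the repeated front calls for one element happen right after each other, a small cache is likely enough to absorb them, though that depends on how the consumer iterates.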

On Thursday, December 21, 2017 07:46:03 Christian Köstlin via Digitalmars-d-learn wrote:
> On 20.12.17 17:30, Christian Köstlin wrote:
> > That's an idea, thanks a lot, will give it a try ...
>
> #!/usr/bin/env rdmd -unittest
> unittest {
>     import std.stdio;
>     import std.range;
>     import std.algorithm;
>     import std.string;
>     import std.functional;
>
>     auto parse(int i) {
>         writeln("parsing %s".format(i));
>         return [1, 2, 3];
>     }
>
>     writeln(iota(1, 5).map!(memoize!parse));
>     writeln("-------------------------------");
>     writeln((iota(1, 5).map!(memoize!parse)).joiner);
> }
>
> void main() {}
>
> This works, but I fear for the data that is stored in the memoization. At the moment it's not a big issue, as all the data fits comfortably into RAM, but for bigger data another approach is needed (probably even my current JSON parsing must be replaced).
>
> I still wonder whether the joiner calls front more often than necessary. Of course it is valid to call front as many times as one sees fit, but with a lazy map in between, it might not be the best solution.

I would think that it would make a lot more sense to simply put the whole thing in an array than to use memoize. e.g.

auto arr = iota(1, 5).map!parse().array();

- Jonathan M Davis
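A short sketch of the array suggestion combined with joiner: materializing the mapped range first means every element is parsed exactly once, and joiner then only walks stored arrays. The trade-off is that everything is parsed up front. The counter `calls` and this simplified `parse` are hypothetical stand-ins for the code in the thread.

```d
import std.algorithm : equal, joiner, map;
import std.array : array;
import std.range : iota;

int calls; // hypothetical counter, only for this illustration

// Hypothetical stand-in for the parsing function from the thread.
int[] parse(int i) {
    ++calls;
    return [i, i * 10];
}

void main() {
    // Eager step: parse is called once per element here.
    auto parsed = iota(1, 5).map!parse.array;
    assert(calls == 4);

    // joiner now iterates plain arrays; parse is not called again.
    auto flat = parsed.joiner;
    assert(flat.equal([1, 10, 2, 20, 3, 30, 4, 40]));
    assert(calls == 4);
}
```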

December 21, 2017

Re: Behavior of joining mapresults

Posted by Christian Köstlin
in reply to Jonathan M Davis

On 21.12.17 08:41, Jonathan M Davis wrote:
> I would think that it would make a lot more sense to simply put the whole thing in an array than to use memoize. e.g.
> 
> auto arr = iota(1, 5).map!parse().array();
That's also possible, but I wanted to make use of the laziness ... e.g.
if I then search over the flattened data and find a match early, I do not
have to parse the 10th file.
I replaced joiner with a primitive flatten function like this:
#!/usr/bin/env rdmd -unittest
unittest {
    import std.stdio;
    import std.range;
    import std.algorithm;
    import std.string;
    import std.functional;

    auto parse(int i) {
        writeln("parsing %s".format(i));
        return [1, 2, 3];
    }

    writeln(iota(1, 5).map!(parse));
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(parse)).joiner);
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(memoize!parse)).joiner);
    writeln("-------------------------------");
    writeln((iota(1, 5).map!(parse)).flatten);
}

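// Lazily flattens a range of ranges. Each outer element is fetched with
// input.front exactly once and kept in `current`.
auto flatten(T)(T input) {
    import std.range;
    struct Res {
        T input;
        ElementType!T current;
        this(T input) {
            this.input = input;
            // Only touch front if the outer range has elements; otherwise
            // `current` keeps its .init value (empty for arrays).
            if (!this.input.empty) {
                this.current = this.input.front;
                advance();
            }
        }
        // Skips empty inner ranges until a non-empty one is found or the
        // outer range is exhausted.
        private void advance() {
            while (current.empty) {
                if (input.empty) {
                    return;
                }
                input.popFront;
                if (input.empty) {
                    return;
                }
                current = input.front;
            }
        }

        bool empty() {
            return current.empty;
        }
        auto front() {
            return current.front;
        }

        void popFront() {
            current.popFront;
            advance();
        }

    }
    return Res(input);
}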

void main() {}

With this implementation my program behaves as expected (parsing the
input data only once).
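As a possible alternative to a hand-written flatten, std.algorithm also provides cache, which stores the current front of a range so that repeated front calls do not re-run the mapped function. The following is only a sketch under the assumption that evaluating one element ahead is acceptable (cache computes the first front when it is constructed and the next one on each popFront); the counter `calls` is a hypothetical name added for illustration.

```d
import std.algorithm : cache, equal, joiner, map;
import std.range : iota;

int calls; // hypothetical counter, only for this illustration

// Same shape as the parse function from the thread, minus the output.
int[] parse(int i) {
    ++calls;
    return [1, 2, 3];
}

void main() {
    // cache sits between the lazy map and joiner and remembers front.
    auto flat = iota(1, 5).map!parse.cache.joiner;
    assert(flat.equal([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]));
    assert(calls == 4); // one parse per input element
}
```

Unlike memoize, this keeps only the current element around, so memory use does not grow with the number of inputs.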
