July 18, 2020
Consider the following code. It counts the number of subdirectories in the directories passed as command-line arguments.

import std.stdio;
import std.algorithm;
import std.file;
import std.range;
import std.exception;

@trusted bool isDirNothrow(string dir) nothrow
{
    bool ok;
    collectException(dir.isDir(), ok);
    return ok;
}

auto subDirs(Range)(Range searchDirs)
if(is(ElementType!Range : string))
{
    return searchDirs
        .filter!(isDirNothrow).map!(function(dir) {
            return dir.dirEntries(SpanMode.shallow)
                .filter!(isDirNothrow);
        }).cache().joiner;
}

int main(string[] args)
{
    auto r = args[1..$].subDirs;
    writefln("Found %s subdirs", count(r));
    return 0;
}

I use strace to check how many stat calls it makes, like this:

strace -C -e stat path/to/binary

strace reports some duplicated stat calls, specifically on the first provided directory and on the first subdirectory of each directory.

How can I avoid the redundant stat calls? Adding array() after each filter() removes the duplicates, but it obviously allocates. Changing cache() to array() before joiner also solves the issue.

I suspect something goes wrong in the combination of the filters and joiner.
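
One way to test that suspicion without strace or the filesystem might be to keep the same pipeline shape and swap in a predicate that counts its own calls. The sketch below is only an illustration: `calls`, `countedPred` and `countKept` are made-up names, and in-memory arrays stand in for the dirEntries ranges, so it may not behave exactly like the real code (arrays are forward ranges, the dirEntries result is not).

```d
import std.algorithm : cache, count, filter, joiner, map;
import std.stdio : writefln;

// Hypothetical counter standing in for the stat calls seen in strace.
int calls;

// Hypothetical stand-in for isDirNothrow: keeps even numbers and counts calls.
bool countedPred(int x) nothrow @nogc
{
    ++calls;
    return x % 2 == 0;
}

// Same shape as subDirs: map each item to a filtered inner range, cache, joiner.
size_t countKept()
{
    calls = 0;
    auto r = [[1, 2, 3, 4], [5, 6, 7, 8]]
        .map!(function(int[] a) { return a.filter!countedPred; })
        .cache()
        .joiner;
    return count(r);
}

void main()
{
    immutable kept = countKept();
    writefln("kept %s of 8 elements, predicate ran %s times", kept, calls);
}
```

With 8 input elements the predicate has to run at least 8 times. Any count above that would mean some element is being tested more than once, which is the same kind of duplication strace shows for stat. Swapping cache() for array() here, as in the real code, should show whether the count changes.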