July 03
On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
> Okay. This is my proposal for an updated input range API for Phobos v3. I previously sent it out to some or the core D folks, and the feedback has largely been positive, but I'm now posting it to the newsgroup for wider feedback.

Tales from the AI
-----------------

General Observations:
Length and Clarity:

The proposal is very long and dense, which might deter some readers from providing thorough feedback. Consider breaking it into more manageable sections or providing a summarized version at the beginning.

Some sections could benefit from more concise language. Removing redundant phrases and focusing on the core points could enhance readability.
Implementation Details:

You mention several times that the exact details need to be sorted out. It would be beneficial to provide more concrete examples or pseudo-code where possible to clarify your ideas.

Specific Observations and Suggestions:

Auto-decoding:

Your point about auto-decoding is clear, but ensure you emphasize the impact on backward compatibility. This change might break a lot of existing code, so highlighting strategies for a smooth transition would be beneficial.
Save Requirement for Forward Ranges:

You propose removing save and instead relying on copy semantics. While this can simplify the API, it's important to provide more detailed guidelines on how to implement forward ranges under this new requirement. Examples of how to handle common use cases without save would help clarify this change.
Copy Semantics for Basic Input Ranges:

Making basic input ranges non-copyable is a significant change. Ensure you clearly articulate the benefits versus the drawbacks. You might also want to provide migration strategies or helper functions for common patterns that currently rely on copying input ranges.

Default Initialization:

Requiring all default-initializable ranges to have a valid init value is a good move towards safety. However, the discussion on whether to disallow finite ranges from disabling default initialization seems unresolved. A more definitive stance would strengthen the proposal. Consider detailing specific examples of how this requirement improves robustness and how to handle exceptions.

Transient Front:

Explicitly disallowing transient fronts is a significant change. It might help to provide more context or examples of how common use cases will be handled under the new rules. For instance, what are the alternatives for ranges that currently rely on transient fronts?

Const Ranges:

The problem with const ranges is well-articulated, but the solution proposed (relying on language improvements) seems uncertain. It might be useful to outline potential workarounds or interim solutions until language changes are implemented.
Renaming Range API Functions:

Renaming functions like front to first and empty to isEmpty could introduce a significant learning curve. Ensure you provide strong justifications for these changes and perhaps include a transition plan or tools to help developers update their codebases.

Backward Compatibility and Transition Plan:

The proposal introduces many breaking changes. A detailed transition plan, including deprecation strategies, versioning schemes, and tooling support, would be crucial. Consider providing scripts or automated tools to help developers migrate their code.

Structural Suggestions:

Summary Section:

Add a summary section at the beginning that lists the key changes and their benefits. This helps readers quickly grasp the main points before diving into the details.
Separate Concerns:

Consider separating the proposal into multiple documents or sections: one for identifying problems with the current API, one for the proposed changes, and one for implementation strategies and examples.

Examples and Use Cases:

Throughout the document, add more concrete examples and use cases. This helps in understanding the practical implications of the changes and provides a clearer picture of the benefits and challenges.

Community Feedback:

Encourage community feedback by posing specific questions or areas where input is particularly needed. This can help focus the discussion and gather more targeted feedback.
Overall, your proposal is comprehensive and addresses many crucial aspects of the range API. By refining the structure and providing more concrete details, you can enhance its clarity and make it easier for the community to provide valuable feedback.
September 23

Hey,

In continuation to the discussion about how 3-method ranges do not apply at the conference, I wanted to refer you to Teodora Serbanescu's work on ranges, showing how a single-method range protocol is better optimized[1] than the 3 method range protocol.

Also, here's some benchmark code[2] of a repeated composition of map/filter, and if you look at the generated assembly, you see how well opApply is optimized vs. the phobos range which even in -release -O4 remains non-inlined.

Results on my laptop:

opApplyRange took 13637 hnsecs
phobosRange took 207873 hnsecs

[1]. https://www.dropbox.com/scl/fi/bo9kj2jhd209ie503s00e/Teodora-Single-method-range-protocol.pdf?rlkey=p5qnai91xpsw1kme7im16zogy&st=l1phb9k5&dl=0

[2].

import std: iota, unaryFun, ForeachType;

struct MapResult(alias _Func, R) {
    alias Func = unaryFun!_Func;
    R underlying;
    int opApply(scope int delegate(ref typeof(Func(ForeachType!R.init))) dg) {
        foreach(ref x; underlying) {
            auto res = Func(x);
            if(int rc = dg(res)) return rc;
        }
        return 0;
    }
}

struct FilterResult(alias _Func, R) {
    alias Func = unaryFun!_Func;
    R underlying;
    int opApply(scope int delegate(ref ForeachType!R) dg) {
        foreach(ref x; underlying) {
            if(Func(x)) if(int rc = dg(x)) return rc;
        }
        return 0;
    }
}

auto map(alias Func, R)(R rng) { return MapResult!(Func, R)(rng); }
auto filter(alias Func, R)(R rng) { return FilterResult!(Func, R)(rng); }

auto benchmark(alias F, Args...)(auto ref Args args) {
    import std: forward, writefln;
    import std.datetime.stopwatch: StopWatch, AutoStart;
    auto sw = StopWatch(AutoStart.yes);
    scope(exit) {
        sw.stop();
        writefln("%s took %s hnsecs", __traits(identifier, F), sw.peek.total!"hnsecs");
    }
    return F(forward!args);
}

auto opApplyRange(ref uint res) {
    foreach(x; iota(1000000).filter!"a%2".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3") res += x;
}

auto phobosRange(ref uint res) {
    import std: map, filter;
    foreach(x; iota(1000000).filter!"a%2".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3") res += x;
}

unittest {
    import std: writeln;
    uint res = 0;
    benchmark!opApplyRange(res);
    benchmark!phobosRange(res);
}
September 23

On Monday, 23 September 2024 at 08:20:59 UTC, Eyal Lotem wrote:

>

showing how a single-method range protocol is better optimized[1] than the 3 method range protocol.
opApplyRange

Is opApply even an iterator interface? Per the talk comparing the function count of iterators

https://youtu.be/d3qY4dZ2r4w?si=sRWCtba_njjqrOCP

>

phobosRange
import std: map, filter;

How would you know the difference between phoboes complexity confusing the optimizer vs the 3-ness?


3 function ranges are good for sane api reasons even if slower, much less optional indexing allowing for binary search, allowing in-place sorts etc.

September 23

On Monday, 23 September 2024 at 08:56:20 UTC, monkyyy wrote:

>

On Monday, 23 September 2024 at 08:20:59 UTC, Eyal Lotem wrote:

>

showing how a single-method range protocol is better optimized[1] than the 3 method range protocol.
opApplyRange

Is opApply even an iterator interface? Per the talk comparing the function count of iterators

https://youtu.be/d3qY4dZ2r4w?si=sRWCtba_njjqrOCP

It's more restrictive because it can't do zip or some other things. I don't suggest using opApply as a replacement. It's merely a proof that Phobos ranges are not optimizing as possible

> >

phobosRange
import std: map, filter;

How would you know the difference between phoboes complexity confusing the optimizer vs the 3-ness?

It's my conclusion from looking at the cgast vs asm. You can take a look and I'm open to other reasons for this.

>

3 function ranges are good for sane api reasons even if slower, much less optional indexing allowing for binary search, allowing in-place sorts etc.

This is all possible if iteration uses one method. A single method for iteration doesn't preclude other methods for other purposes.

September 24

On Monday, 23 September 2024 at 08:20:59 UTC, Eyal Lotem wrote:

>

Hey,

In continuation to the discussion about how 3-method ranges do not apply at the conference, I wanted to refer you to Teodora Serbanescu's work on ranges, showing how a single-method range protocol is better optimized[1] than the 3 method range protocol.

Also, here's some benchmark code[2] of a repeated composition of map/filter, and if you look at the generated assembly, you see how well opApply is optimized vs. the phobos range which even in -release -O4 remains non-inlined.

Thank you for posting this!

-Steve

September 24

On Monday, 23 September 2024 at 08:20:59 UTC, Eyal Lotem wrote:

>

Results on my laptop:

opApplyRange took 13637 hnsecs
phobosRange took 207873 hnsecs

I implemented a quick-and-dirty "getNext" style range as outlined in the post, which is not the greatest, and doesn't support ref ranges.

I also just did a "simple implementation" of Map and Filter as ranges.

The results tell an interesting story.

import std: iota, unaryFun, ForeachType;
import std.datetime.stopwatch : benchmark;

struct MapResult(alias _Func, R) {
    alias Func = unaryFun!_Func;
    R underlying;
    auto getNext(ref bool valid) {
        auto x = underlying.getNext(valid);
        if(valid) return Func(x);
        return typeof(Func(x)).init;
    }
    int opApply(scope int delegate(ref typeof(Func(ForeachType!R.init))) dg) {
        foreach(ref x; underlying) {
            auto res = Func(x);
            if(int rc = dg(res)) return rc;
        }
        return 0;
    }
}

struct MapResult2(alias _Func, R) {
    alias Func = unaryFun!_Func;
    R underlying;
    auto front() => Func(underlying.front);
    void popFront() => underlying.popFront;
    bool empty() => underlying.empty;
}

struct FilterResult(alias _Func, R) {
    alias Func = unaryFun!_Func;
    R underlying;
    auto getNext(ref bool valid) {
        typeof(underlying.getNext(valid)) x;
        do {
            x = underlying.getNext(valid);
        } while(valid && !Func(x));
        return x;
    }
    int opApply(scope int delegate(ref ForeachType!R) dg) {
        foreach(ref x; underlying) {
            if(Func(x)) if(int rc = dg(x)) return rc;
        }
        return 0;
    }
}

struct FilterResult2(alias _Func, R) {
    alias Func = unaryFun!_Func;
    R underlying;
    this(R underlying) {
        while(!underlying.empty && !Func(underlying.front)) underlying.popFront;
        this.underlying = underlying;
    }

    auto front() => underlying.front;
    void popFront() {
        underlying.popFront;
        while(!underlying.empty && !Func(underlying.front)) underlying.popFront;
    }

    bool empty() => underlying.empty;
}

// strawman protocol, no ref support.
auto getNext(R)(ref R range, ref bool valid) {
    typeof(range.front()) x;
    if((valid = !range.empty) == true)
    {
        x = range.front;
        range.popFront;
    }
    return x;
}

auto map(alias Func, R)(R rng) { return MapResult!(Func, R)(rng); }
auto filter(alias Func, R)(R rng) { return FilterResult!(Func, R)(rng); }
auto map2(alias Func, R)(R rng) { return MapResult2!(Func, R)(rng); }
auto filter2(alias Func, R)(R rng) { return FilterResult2!(Func, R)(rng); }

auto opApplyRange() {
    uint res;
    foreach(x; iota(1000000).filter!"a%2".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3") res += x;
    assert(res == 1890804904);
}

auto newRangeStyle() {
    uint res;
    auto r = iota(1000000).filter!"a%2".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3";
    bool valid = false;
    while(true) {
        auto x = r.getNext(valid);
        if(!valid) break;
        res += x;
    }
    assert(res == 1890804904);
}

auto phobosRange() {
    uint res;
    import std: map, filter;
    foreach(x; iota(1000000).filter!"a%2".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3") res += x;
    assert(res == 1890804904);
}

auto simpleRanges() {
    uint res;
    foreach(x; iota(1000000).filter2!"a%2".map2!"a*2".filter2!"a%3".map2!"a*2".filter2!"a%3".map2!"a*2".filter2!"a%3") res += x;
    assert(res == 1890804904);
}


unittest {
    import std: writeln;
    immutable runs = 100;
    auto data = benchmark!(opApplyRange, phobosRange, newRangeStyle, simpleRanges)(runs);
    writeln(i"opApplyRange took $(data[0]) for $(runs) runs");
    writeln(i"phobosRange took $(data[1]) for $(runs) runs");
    writeln(i"newRangeStyle took $(data[2]) for $(runs) runs");
    writeln(i"simpleRanges took $(data[3]) for $(runs) runs");
}

Some notes here:

  1. I used phobos benchmark instead of the custom one. My results were very unstable, so I wanted something that was somewhat immune from random CPU usage on my laptop.
  2. I added an assert to ensure the functions were actually working as expected!

Results:

opApplyRange took 186 ms, 123 μs, and 9 hnsecs for 100 runs
phobosRange took 1 sec, 345 ms, 67 μs, and 5 hnsecs for 100 runs
newRangeStyle took 83 ms, 706 μs, and 2 hnsecs for 100 runs
simpleRanges took 136 ms and 47 μs for 100 runs

Interestingly here, the simple ranges seem properly inlined. So somehow the optimizer could see through those despite having 3 function API, but not the normal ranges? Is there something I did special in there that the normal functions prevent?

But even then, the performance of the getNext API destroys everything. So there is still something to this to be explored.

For sure we should figure out why phobos does so poorly here when clearly it's possible to do much better.

My laptop details:

LDC - the LLVM D compiler (1.39.0):
  based on DMD v2.109.1 and LLVM 17.0.6
  built with LDC - the LLVM D compiler (1.39.0)
  Default target: arm64-apple-darwin23.5.0
  Host CPU: apple-m1

-Steve

September 24

On Tuesday, 24 September 2024 at 14:36:05 UTC, Steven Schveighoffer wrote:

>

For sure we should figure out why phobos does so poorly here when clearly it's possible to do much better.

1000000 lines of code, contracts, bandaids instead of rewrites

September 24

On Tuesday, 24 September 2024 at 14:36:05 UTC, Steven Schveighoffer wrote:

>

Results:

opApplyRange took 186 ms, 123 μs, and 9 hnsecs for 100 runs
phobosRange took 1 sec, 345 ms, 67 μs, and 5 hnsecs for 100 runs
newRangeStyle took 83 ms, 706 μs, and 2 hnsecs for 100 runs
simpleRanges took 136 ms and 47 μs for 100 runs

Interestingly here, the simple ranges seem properly inlined. So somehow the optimizer could see through those despite having 3 function API, but not the normal ranges? Is there something I did special in there that the normal functions prevent?

Some more info, my simple filter range pre-computes the first element, and then avoids the whole "check if it's initialized" dance on front and empty.

If I add that "feature" to my range, it gets a lot slower, like 5x slower. Still not as bad as Phobos, but it explains a bit of it.

This is the cost of "optimizing" ranges just in case someone does something unexpected.

Oh, and someone asked on discord, I am using -O3 on the build.

-Steve

1 2 3
Next ›   Last »