Thread overview
Segmented Ranges?
Jun 09, 2012
bearophile
Jun 09, 2012
bearophile
Jun 09, 2012
Denis Shelomovskij
Jun 17, 2012
bearophile
Jun 18, 2012
Christophe Travert
June 09, 2012
Often enough you have to perform operations on some non-flat sequence, like on a 2D matrix:

import std.stdio;
void main() {
    auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
    foreach (row; m1)
        foreach (x; row)
            writeln(x);
}


Such multi-level ranges are common. But a Range (below an Input Range) that works on a generic multi level Range is slower than the two nested for loops because it needs to keep attention to the exhaustion of the sub-ranges at every iteration (see popFront):


import std.stdio, std.traits, std.array, std.range, std.algorithm;

struct BilevelScan(Range) {
    private Range range;
    typeof(range.front) currentSegment;

    this(Range range_) {
        this.range = range_;
        _nextItem();
    }

    @property bool empty() {
        return range.empty && currentSegment.empty;
    }

    @property auto front() {
        return currentSegment.front;
    }

    void popFront() {
        currentSegment.popFront();
        _nextItem();
    }

    private void _nextItem() {
        while (!range.empty && currentSegment.empty) {
            currentSegment = range.front;
            range.popFront();
        }
    }
}

void main() { // some test code
    auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];

    foreach (x; BilevelScan!(typeof(m1))(m1))
        writeln(x);
    writeln(m1, "\n");

    auto m2 = [[10.5]];
    foreach (x; BilevelScan!(typeof(m2))(m2))
        writeln(x);
    writeln(m2, "\n");

    auto m3 = 6.iota().map!(i => iota(10 ^^ i + i, 10 ^^ i + i + 3))();
    foreach (x; BilevelScan!(typeof(m3))(m3))
        writeln(x);
    writeln(m3, "\n");
}



In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew H. Austern offers a solution to regain the lost efficiency:
http://


Some quotations from the paper:

>segmented data structures are common. Within the STL, for example, the deque container is typically implemented as a segmented array, and hash tables [6] are typically implemented as arrays of buckets. Other examples of segmentation include two-dimensional matrices, graphs, files in record-based file systems, and, on modern NUMA (non-uniform memory architecture) multiprocessor machines, even ordinary memory.<

>A segmented iterator has the same associated types as any other iterator (a value type, for example), and, additionally, it has an associated segment iterator type and local iterator type. Writing a generic algorithm that uses segmented iterators requires the ability to name those types. Additionally, a fully generic algorithm ought to be usable with both segmented and nonsegmented iterators.<

>The distinction between segmented and nonsegmented iterators is orthogonal to the distinction between Forward Iterators, Bidirectional Iterators, and Random Access Iterators.<

>Multiple levels of segmentation can be used with no extra effort. Nothing in this discussion, or this implementation of fill, assumes that the local iterator is nonsegmented.<

>Segmented iterators, an extension of ordinary STL iterators, make it possible for generic algorithms to treat a segmented data structure as something other than a uniform range of elements. This allows existing algorithms to operate more efficiently on segmented data structures, and provides a natural decomposition for parallel computation. Segmented iterators enable new kinds of generic algorithms that explicitly depend on segmentation.<


In D the concept of Input Range is defined as a type that statically supports (there is also some necessary semantics that can't be expressed in the D code of such concepts):

R r;              // can define a range object
if (r.empty) {}   // can test for empty
r.popFront();     // can invoke popFront()
auto h = r.front; // can get the front of the range of non-void type



So an idea is to introduce in D the multi-level Ranges:
SegmentedInputRange
SegmentedOutputRange
SegmentedForwardRange
SegmentedBidirectionalRange
SegmentedRandomAccessRange

I think a Segmented Input Range is more complex than an Input Range. Inventing such ranges is like finding the miminal set of axioms that allow to write down a theorem, that is to efficiently implement the normal algorithms. This is not an easy for me, I don't know much about this topic.

Bye,
bearophile
June 09, 2012
On 09-06-2012 12:45, bearophile wrote:
> Often enough you have to perform operations on some non-flat sequence,
> like on a 2D matrix:
>
> import std.stdio;
> void main() {
> auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
> foreach (row; m1)
> foreach (x; row)
> writeln(x);
> }
>
>
> Such multi-level ranges are common. But a Range (below an Input Range)
> that works on a generic multi level Range is slower than the two nested
> for loops because it needs to keep attention to the exhaustion of the
> sub-ranges at every iteration (see popFront):
>
>
> import std.stdio, std.traits, std.array, std.range, std.algorithm;
>
> struct BilevelScan(Range) {
> private Range range;
> typeof(range.front) currentSegment;
>
> this(Range range_) {
> this.range = range_;
> _nextItem();
> }
>
> @property bool empty() {
> return range.empty && currentSegment.empty;
> }
>
> @property auto front() {
> return currentSegment.front;
> }
>
> void popFront() {
> currentSegment.popFront();
> _nextItem();
> }
>
> private void _nextItem() {
> while (!range.empty && currentSegment.empty) {
> currentSegment = range.front;
> range.popFront();
> }
> }
> }
>
> void main() { // some test code
> auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
>
> foreach (x; BilevelScan!(typeof(m1))(m1))
> writeln(x);
> writeln(m1, "\n");
>
> auto m2 = [[10.5]];
> foreach (x; BilevelScan!(typeof(m2))(m2))
> writeln(x);
> writeln(m2, "\n");
>
> auto m3 = 6.iota().map!(i => iota(10 ^^ i + i, 10 ^^ i + i + 3))();
> foreach (x; BilevelScan!(typeof(m3))(m3))
> writeln(x);
> writeln(m3, "\n");
> }
>
>
>
> In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew
> H. Austern offers a solution to regain the lost efficiency:
> http://

I think your NNTP client ate a link. ;)

>
>
> Some quotations from the paper:
>
>> segmented data structures are common. Within the STL, for example, the
>> deque container is typically implemented as a segmented array, and
>> hash tables [6] are typically implemented as arrays of buckets. Other
>> examples of segmentation include two-dimensional matrices, graphs,
>> files in record-based file systems, and, on modern NUMA (non-uniform
>> memory architecture) multiprocessor machines, even ordinary memory.<
>
>> A segmented iterator has the same associated types as any other
>> iterator (a value type, for example), and, additionally, it has an
>> associated segment iterator type and local iterator type. Writing a
>> generic algorithm that uses segmented iterators requires the ability
>> to name those types. Additionally, a fully generic algorithm ought to
>> be usable with both segmented and nonsegmented iterators.<
>
>> The distinction between segmented and nonsegmented iterators is
>> orthogonal to the distinction between Forward Iterators, Bidirectional
>> Iterators, and Random Access Iterators.<
>
>> Multiple levels of segmentation can be used with no extra effort.
>> Nothing in this discussion, or this implementation of fill, assumes
>> that the local iterator is nonsegmented.<
>
>> Segmented iterators, an extension of ordinary STL iterators, make it
>> possible for generic algorithms to treat a segmented data structure as
>> something other than a uniform range of elements. This allows existing
>> algorithms to operate more efficiently on segmented data structures,
>> and provides a natural decomposition for parallel computation.
>> Segmented iterators enable new kinds of generic algorithms that
>> explicitly depend on segmentation.<
>
>
> In D the concept of Input Range is defined as a type that statically
> supports (there is also some necessary semantics that can't be expressed
> in the D code of such concepts):
>
> R r; // can define a range object
> if (r.empty) {} // can test for empty
> r.popFront(); // can invoke popFront()
> auto h = r.front; // can get the front of the range of non-void type
>
>
>
> So an idea is to introduce in D the multi-level Ranges:
> SegmentedInputRange
> SegmentedOutputRange
> SegmentedForwardRange
> SegmentedBidirectionalRange
> SegmentedRandomAccessRange
>
> I think a Segmented Input Range is more complex than an Input Range.
> Inventing such ranges is like finding the miminal set of axioms that
> allow to write down a theorem, that is to efficiently implement the
> normal algorithms. This is not an easy for me, I don't know much about
> this topic.
>
> Bye,
> bearophile

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
June 09, 2012
> In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew H. Austern offers a solution to regain the lost efficiency:
> http://

Sorry:
http://lafstern.org/matt/segmented.pdf

Bye,
bearophile
June 09, 2012
09.06.2012 19:06, bearophile пишет:
>> In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew
>> H. Austern offers a solution to regain the lost efficiency:
>...
> http://lafstern.org/matt/segmented.pdf

Wow, didn't know there is an article about this. Looks like we all think about same things. Personally I was too lazy to propose in this NG similar but less advanced idea. So my small (but valuable, IMHO) addition:
Add explicit vectorisation for some common predicates (`a == b` e.g.) for sequential memory blocks (arrays) and ranges with elements of such blocks (called "segmented" in the article).

-- 
Денис В. Шеломовский
Denis V. Shelomovskij


June 09, 2012
On 6/9/12 5:45 AM, bearophile wrote:
> Often enough you have to perform operations on some non-flat sequence,

Would joiner help?

Andrei


June 17, 2012
Andrei Alexandrescu:

> Would joiner help?

I don't know. I think the point of those segmented ranges is to not hide their segmented nature (as joiner does).

- - - - - - - - - - - - - -

This paper shows a variant of the stream fusion, applied on chunked arrays that represent strings:

"Rewriting Haskell Strings" (2007) by Duncan Coutts, Don Stewart, Roman Leshchinskiy:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.3166


In D it's easy to create a range that segments a given lazy range into a lazy iterable of arrays:


import std.range: ElementType;
import std.array: empty, front, popFront;

struct Segmenter(size_t chunkLength, Range) {
    alias ElementType!Range elementT;
    private elementT[chunkLength] chunk;
    private Range r;
    private size_t len;

    this(Range rin) {
        this.r = rin;
        popFront();
    }

    @property bool empty() const {
        return len == 0 && r.empty;
    }

    @property elementT[] front() pure nothrow {
        return chunk[0 .. len];
    }

    void popFront() {
        len = 0;
        while (len < chunkLength && !r.empty) {
            chunk[len] = r.front;
            len++;
            r.popFront();
        }
    }
}


But Phobos ranges don't manage segments automatically (so you can't just apply a normal map on a Segmenter, because this map will work on the chunks), and there aren't the optimizations described in that paper. On the other hand some range methods become inlined, because D ranges are quite different from Haskell lazy lists, so maybe some of those optimizations are not needed.

Bye,
bearophile
June 18, 2012
"bearophile" , dans le message (digitalmars.D:169611), a écrit :
> struct BilevelScan(Range) {

This is basically std.algorithm.joiner

> So an idea is to introduce in D the multi-level Ranges:
> SegmentedInputRange
> SegmentedOutputRange
> SegmentedForwardRange
> SegmentedBidirectionalRange
> SegmentedRandomAccessRange
> 
> I think a Segmented Input Range is more complex than an Input Range. Inventing such ranges is like finding the miminal set of axioms that allow to write down a theorem, that is to efficiently implement the normal algorithms. This is not an easy for me, I don't know much about this topic.

I would only use this axiom:
- a SegmentedRange defines a bySegment property, returning a Range of
Range bearing the same element type as SegmentedRange.

and define BySegment!Range as the type returned by range.bySegment, and SegmentType!Range as the type returned by range.bySegment.front.

Then it is up to the implementer of the algorithm to test the type of a SegmentedRange (InputRange, OutputRange, etc.), the type of BySegment!Range, and the type of SegmentType!Range when needed. There are too many combinaisons to make specific range types for all of them. Moreover SegmentedRandomAccessRange would be ambiguous: is it a RandomAccessRange, a Range whose BySegment method returns a RandomAccessRange, a Range whose SegmentType is a RandomAccessRange, or all of them ?

-- 
Christophe