View mode: basic / threaded / horizontal-split · Log in · Help
June 09, 2012
Segmented Ranges?
Often enough you have to perform operations on some non-flat 
sequence, like on a 2D matrix:

import std.stdio;
void main() {
    auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
    foreach (row; m1)
        foreach (x; row)
            writeln(x);
}


Such multi-level ranges are common. But a Range (below an Input 
Range) that works on a generic multi level Range is slower than 
the two nested for loops because it needs to keep attention to 
the exhaustion of the sub-ranges at every iteration (see 
popFront):


import std.stdio, std.traits, std.array, std.range, std.algorithm;

struct BilevelScan(Range) {
    private Range range;
    typeof(range.front) currentSegment;

    this(Range range_) {
        this.range = range_;
        _nextItem();
    }

    @property bool empty() {
        return range.empty && currentSegment.empty;
    }

    @property auto front() {
        return currentSegment.front;
    }

    void popFront() {
        currentSegment.popFront();
        _nextItem();
    }

    private void _nextItem() {
        while (!range.empty && currentSegment.empty) {
            currentSegment = range.front;
            range.popFront();
        }
    }
}

void main() { // some test code
    auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];

    foreach (x; BilevelScan!(typeof(m1))(m1))
        writeln(x);
    writeln(m1, "\n");

    auto m2 = [[10.5]];
    foreach (x; BilevelScan!(typeof(m2))(m2))
        writeln(x);
    writeln(m2, "\n");

    auto m3 = 6.iota().map!(i => iota(10 ^^ i + i, 10 ^^ i + i + 
3))();
    foreach (x; BilevelScan!(typeof(m3))(m3))
        writeln(x);
    writeln(m3, "\n");
}



In the paper "Segmented Iterators and Hierarchical Algorithms" 
Matthew H. Austern offers a solution to regain the lost 
efficiency:
http://


Some quotations from the paper:

>segmented data structures are common. Within the STL, for 
>example, the deque container is typically implemented as a 
>segmented array, and hash tables [6] are typically implemented 
>as arrays of buckets. Other examples of segmentation include 
>two-dimensional matrices, graphs, files in record-based file 
>systems, and, on modern NUMA (non-uniform memory architecture) 
>multiprocessor machines, even ordinary memory.<

>A segmented iterator has the same associated types as any other 
>iterator (a value type, for example), and, additionally, it has 
>an associated segment iterator type and local iterator type. 
>Writing a generic algorithm that uses segmented iterators 
>requires the ability to name those types. Additionally, a fully 
>generic algorithm ought to be usable with both segmented and 
>nonsegmented iterators.<

>The distinction between segmented and nonsegmented iterators is 
>orthogonal to the distinction between Forward Iterators, 
>Bidirectional Iterators, and Random Access Iterators.<

>Multiple levels of segmentation can be used with no extra 
>effort. Nothing in this discussion, or this implementation of 
>fill, assumes that the local iterator is nonsegmented.<

>Segmented iterators, an extension of ordinary STL iterators, 
>make it possible for generic algorithms to treat a segmented 
>data structure as something other than a uniform range of 
>elements. This allows existing algorithms to operate more 
>efficiently on segmented data structures, and provides a natural 
>decomposition for parallel computation. Segmented iterators 
>enable new kinds of generic algorithms that explicitly depend on 
>segmentation.<


In D the concept of Input Range is defined as a type that 
statically supports (there is also some necessary semantics that 
can't be expressed in the D code of such concepts):

R r;              // can define a range object
if (r.empty) {}   // can test for empty
r.popFront();     // can invoke popFront()
auto h = r.front; // can get the front of the range of non-void 
type



So an idea is to introduce in D the multi-level Ranges:
SegmentedInputRange
SegmentedOutputRange
SegmentedForwardRange
SegmentedBidirectionalRange
SegmentedRandomAccessRange

I think a Segmented Input Range is more complex than an Input 
Range. Inventing such ranges is like finding the miminal set of 
axioms that allow to write down a theorem, that is to efficiently 
implement the normal algorithms. This is not an easy for me, I 
don't know much about this topic.

Bye,
bearophile
June 09, 2012
Re: Segmented Ranges?
On 09-06-2012 12:45, bearophile wrote:
> Often enough you have to perform operations on some non-flat sequence,
> like on a 2D matrix:
>
> import std.stdio;
> void main() {
> auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
> foreach (row; m1)
> foreach (x; row)
> writeln(x);
> }
>
>
> Such multi-level ranges are common. But a Range (below an Input Range)
> that works on a generic multi level Range is slower than the two nested
> for loops because it needs to keep attention to the exhaustion of the
> sub-ranges at every iteration (see popFront):
>
>
> import std.stdio, std.traits, std.array, std.range, std.algorithm;
>
> struct BilevelScan(Range) {
> private Range range;
> typeof(range.front) currentSegment;
>
> this(Range range_) {
> this.range = range_;
> _nextItem();
> }
>
> @property bool empty() {
> return range.empty && currentSegment.empty;
> }
>
> @property auto front() {
> return currentSegment.front;
> }
>
> void popFront() {
> currentSegment.popFront();
> _nextItem();
> }
>
> private void _nextItem() {
> while (!range.empty && currentSegment.empty) {
> currentSegment = range.front;
> range.popFront();
> }
> }
> }
>
> void main() { // some test code
> auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
>
> foreach (x; BilevelScan!(typeof(m1))(m1))
> writeln(x);
> writeln(m1, "\n");
>
> auto m2 = [[10.5]];
> foreach (x; BilevelScan!(typeof(m2))(m2))
> writeln(x);
> writeln(m2, "\n");
>
> auto m3 = 6.iota().map!(i => iota(10 ^^ i + i, 10 ^^ i + i + 3))();
> foreach (x; BilevelScan!(typeof(m3))(m3))
> writeln(x);
> writeln(m3, "\n");
> }
>
>
>
> In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew
> H. Austern offers a solution to regain the lost efficiency:
> http://

I think your NNTP client ate a link. ;)

>
>
> Some quotations from the paper:
>
>> segmented data structures are common. Within the STL, for example, the
>> deque container is typically implemented as a segmented array, and
>> hash tables [6] are typically implemented as arrays of buckets. Other
>> examples of segmentation include two-dimensional matrices, graphs,
>> files in record-based file systems, and, on modern NUMA (non-uniform
>> memory architecture) multiprocessor machines, even ordinary memory.<
>
>> A segmented iterator has the same associated types as any other
>> iterator (a value type, for example), and, additionally, it has an
>> associated segment iterator type and local iterator type. Writing a
>> generic algorithm that uses segmented iterators requires the ability
>> to name those types. Additionally, a fully generic algorithm ought to
>> be usable with both segmented and nonsegmented iterators.<
>
>> The distinction between segmented and nonsegmented iterators is
>> orthogonal to the distinction between Forward Iterators, Bidirectional
>> Iterators, and Random Access Iterators.<
>
>> Multiple levels of segmentation can be used with no extra effort.
>> Nothing in this discussion, or this implementation of fill, assumes
>> that the local iterator is nonsegmented.<
>
>> Segmented iterators, an extension of ordinary STL iterators, make it
>> possible for generic algorithms to treat a segmented data structure as
>> something other than a uniform range of elements. This allows existing
>> algorithms to operate more efficiently on segmented data structures,
>> and provides a natural decomposition for parallel computation.
>> Segmented iterators enable new kinds of generic algorithms that
>> explicitly depend on segmentation.<
>
>
> In D the concept of Input Range is defined as a type that statically
> supports (there is also some necessary semantics that can't be expressed
> in the D code of such concepts):
>
> R r; // can define a range object
> if (r.empty) {} // can test for empty
> r.popFront(); // can invoke popFront()
> auto h = r.front; // can get the front of the range of non-void type
>
>
>
> So an idea is to introduce in D the multi-level Ranges:
> SegmentedInputRange
> SegmentedOutputRange
> SegmentedForwardRange
> SegmentedBidirectionalRange
> SegmentedRandomAccessRange
>
> I think a Segmented Input Range is more complex than an Input Range.
> Inventing such ranges is like finding the miminal set of axioms that
> allow to write down a theorem, that is to efficiently implement the
> normal algorithms. This is not an easy for me, I don't know much about
> this topic.
>
> Bye,
> bearophile

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
June 09, 2012
Re: Segmented Ranges?
> In the paper "Segmented Iterators and Hierarchical Algorithms" 
> Matthew H. Austern offers a solution to regain the lost 
> efficiency:
> http://

Sorry:
http://lafstern.org/matt/segmented.pdf

Bye,
bearophile
June 09, 2012
Re: Segmented Ranges?
09.06.2012 19:06, bearophile пишет:
>> In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew
>> H. Austern offers a solution to regain the lost efficiency:
>...
> http://lafstern.org/matt/segmented.pdf

Wow, didn't know there is an article about this. Looks like we all think 
about same things. Personally I was too lazy to propose in this NG 
similar but less advanced idea. So my small (but valuable, IMHO) addition:
Add explicit vectorisation for some common predicates (`a == b` e.g.) 
for sequential memory blocks (arrays) and ranges with elements of such 
blocks (called "segmented" in the article).

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
June 09, 2012
Re: Segmented Ranges?
On 6/9/12 5:45 AM, bearophile wrote:
> Often enough you have to perform operations on some non-flat sequence,

Would joiner help?

Andrei
June 17, 2012
Re: Segmented Ranges?
Andrei Alexandrescu:

> Would joiner help?

I don't know. I think the point of those segmented ranges is to 
not hide their segmented nature (as joiner does).

- - - - - - - - - - - - - -

This paper shows a variant of the stream fusion, applied on 
chunked arrays that represent strings:

"Rewriting Haskell Strings" (2007) by Duncan Coutts, Don Stewart, 
Roman Leshchinskiy:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.3166


In D it's easy to create a range that segments a given lazy range 
into a lazy iterable of arrays:


import std.range: ElementType;
import std.array: empty, front, popFront;

struct Segmenter(size_t chunkLength, Range) {
    alias ElementType!Range elementT;
    private elementT[chunkLength] chunk;
    private Range r;
    private size_t len;

    this(Range rin) {
        this.r = rin;
        popFront();
    }

    @property bool empty() const {
        return len == 0 && r.empty;
    }

    @property elementT[] front() pure nothrow {
        return chunk[0 .. len];
    }

    void popFront() {
        len = 0;
        while (len < chunkLength && !r.empty) {
            chunk[len] = r.front;
            len++;
            r.popFront();
        }
    }
}


But Phobos ranges don't manage segments automatically (so you 
can't just apply a normal map on a Segmenter, because this map 
will work on the chunks), and there aren't the optimizations 
described in that paper. On the other hand some range methods 
become inlined, because D ranges are quite different from Haskell 
lazy lists, so maybe some of those optimizations are not needed.

Bye,
bearophile
June 18, 2012
Re: Segmented Ranges?
"bearophile" , dans le message (digitalmars.D:169611), a écrit :
> struct BilevelScan(Range) {

This is basically std.algorithm.joiner

> So an idea is to introduce in D the multi-level Ranges:
> SegmentedInputRange
> SegmentedOutputRange
> SegmentedForwardRange
> SegmentedBidirectionalRange
> SegmentedRandomAccessRange
> 
> I think a Segmented Input Range is more complex than an Input 
> Range. Inventing such ranges is like finding the miminal set of 
> axioms that allow to write down a theorem, that is to efficiently 
> implement the normal algorithms. This is not an easy for me, I 
> don't know much about this topic.

I would only use this axiom:
- a SegmentedRange defines a bySegment property, returning a Range of 
Range bearing the same element type as SegmentedRange.

and define BySegment!Range as the type returned by range.bySegment, and 
SegmentType!Range as the type returned by range.bySegment.front.

Then it is up to the implementer of the algorithm to test the type of 
a SegmentedRange (InputRange, OutputRange, etc.), the type of 
BySegment!Range, and the type of SegmentType!Range when needed. There 
are too many combinaisons to make specific range types for all of them.
Moreover SegmentedRandomAccessRange would be ambiguous: is it a 
RandomAccessRange, a Range whose BySegment method returns a 
RandomAccessRange, a Range whose SegmentType is a RandomAccessRange, 
or all of them ?

-- 
Christophe
Top | Discussion index | About this forum | D home