Segmented Ranges? - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Segmented Ranges?

Thread overview

Segmented Ranges?
Jun 09, 2012 bearophile
Jun 09, 2012 Alex Rønne Petersen
Jun 09, 2012 bearophile
Jun 09, 2012 Denis Shelomovskij
Jun 09, 2012 Andrei Alexandrescu
Jun 17, 2012 bearophile
Jun 18, 2012 Christophe Travert

June 09, 2012

Segmented Ranges?

Posted by bearophile

bearophile

Often enough you have to perform operations on some non-flat sequence, like on a 2D matrix:

import std.stdio;
void main() {
    auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
    foreach (row; m1)
        foreach (x; row)
            writeln(x);
}


Such multi-level ranges are common. But a Range (below an Input Range) that works on a generic multi level Range is slower than the two nested for loops because it needs to keep attention to the exhaustion of the sub-ranges at every iteration (see popFront):


import std.stdio, std.traits, std.array, std.range, std.algorithm;

struct BilevelScan(Range) {
    private Range range;
    typeof(range.front) currentSegment;

    this(Range range_) {
        this.range = range_;
        _nextItem();
    }

    @property bool empty() {
        return range.empty && currentSegment.empty;
    }

    @property auto front() {
        return currentSegment.front;
    }

    void popFront() {
        currentSegment.popFront();
        _nextItem();
    }

    private void _nextItem() {
        while (!range.empty && currentSegment.empty) {
            currentSegment = range.front;
            range.popFront();
        }
    }
}

void main() { // some test code
    auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];

    foreach (x; BilevelScan!(typeof(m1))(m1))
        writeln(x);
    writeln(m1, "\n");

    auto m2 = [[10.5]];
    foreach (x; BilevelScan!(typeof(m2))(m2))
        writeln(x);
    writeln(m2, "\n");

    auto m3 = 6.iota().map!(i => iota(10 ^^ i + i, 10 ^^ i + i + 3))();
    foreach (x; BilevelScan!(typeof(m3))(m3))
        writeln(x);
    writeln(m3, "\n");
}



In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew H. Austern offers a solution to regain the lost efficiency:
http://


Some quotations from the paper:

>segmented data structures are common. Within the STL, for example, the deque container is typically implemented as a segmented array, and hash tables [6] are typically implemented as arrays of buckets. Other examples of segmentation include two-dimensional matrices, graphs, files in record-based file systems, and, on modern NUMA (non-uniform memory architecture) multiprocessor machines, even ordinary memory.<

>A segmented iterator has the same associated types as any other iterator (a value type, for example), and, additionally, it has an associated segment iterator type and local iterator type. Writing a generic algorithm that uses segmented iterators requires the ability to name those types. Additionally, a fully generic algorithm ought to be usable with both segmented and nonsegmented iterators.<

>The distinction between segmented and nonsegmented iterators is orthogonal to the distinction between Forward Iterators, Bidirectional Iterators, and Random Access Iterators.<

>Multiple levels of segmentation can be used with no extra effort. Nothing in this discussion, or this implementation of fill, assumes that the local iterator is nonsegmented.<

>Segmented iterators, an extension of ordinary STL iterators, make it possible for generic algorithms to treat a segmented data structure as something other than a uniform range of elements. This allows existing algorithms to operate more efficiently on segmented data structures, and provides a natural decomposition for parallel computation. Segmented iterators enable new kinds of generic algorithms that explicitly depend on segmentation.<


In D the concept of Input Range is defined as a type that statically supports (there is also some necessary semantics that can't be expressed in the D code of such concepts):

R r;              // can define a range object
if (r.empty) {}   // can test for empty
r.popFront();     // can invoke popFront()
auto h = r.front; // can get the front of the range of non-void type



So an idea is to introduce in D the multi-level Ranges:
SegmentedInputRange
SegmentedOutputRange
SegmentedForwardRange
SegmentedBidirectionalRange
SegmentedRandomAccessRange

I think a Segmented Input Range is more complex than an Input Range. Inventing such ranges is like finding the miminal set of axioms that allow to write down a theorem, that is to efficiently implement the normal algorithms. This is not an easy for me, I don't know much about this topic.

Bye,
bearophile

June 09, 2012

Re: Segmented Ranges?

Posted by Alex Rønne Petersen
in reply to bearophile

Alex Rønne Petersen

Posted in reply to bearophile

On 09-06-2012 12:45, bearophile wrote:
> Often enough you have to perform operations on some non-flat sequence,
> like on a 2D matrix:
>
> import std.stdio;
> void main() {
> auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
> foreach (row; m1)
> foreach (x; row)
> writeln(x);
> }
>
>
> Such multi-level ranges are common. But a Range (below an Input Range)
> that works on a generic multi level Range is slower than the two nested
> for loops because it needs to keep attention to the exhaustion of the
> sub-ranges at every iteration (see popFront):
>
>
> import std.stdio, std.traits, std.array, std.range, std.algorithm;
>
> struct BilevelScan(Range) {
> private Range range;
> typeof(range.front) currentSegment;
>
> this(Range range_) {
> this.range = range_;
> _nextItem();
> }
>
> @property bool empty() {
> return range.empty && currentSegment.empty;
> }
>
> @property auto front() {
> return currentSegment.front;
> }
>
> void popFront() {
> currentSegment.popFront();
> _nextItem();
> }
>
> private void _nextItem() {
> while (!range.empty && currentSegment.empty) {
> currentSegment = range.front;
> range.popFront();
> }
> }
> }
>
> void main() { // some test code
> auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
>
> foreach (x; BilevelScan!(typeof(m1))(m1))
> writeln(x);
> writeln(m1, "\n");
>
> auto m2 = [[10.5]];
> foreach (x; BilevelScan!(typeof(m2))(m2))
> writeln(x);
> writeln(m2, "\n");
>
> auto m3 = 6.iota().map!(i => iota(10 ^^ i + i, 10 ^^ i + i + 3))();
> foreach (x; BilevelScan!(typeof(m3))(m3))
> writeln(x);
> writeln(m3, "\n");
> }
>
>
>
> In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew
> H. Austern offers a solution to regain the lost efficiency:
> http://

I think your NNTP client ate a link. ;)

>
>
> Some quotations from the paper:
>
>> segmented data structures are common. Within the STL, for example, the
>> deque container is typically implemented as a segmented array, and
>> hash tables [6] are typically implemented as arrays of buckets. Other
>> examples of segmentation include two-dimensional matrices, graphs,
>> files in record-based file systems, and, on modern NUMA (non-uniform
>> memory architecture) multiprocessor machines, even ordinary memory.<
>
>> A segmented iterator has the same associated types as any other
>> iterator (a value type, for example), and, additionally, it has an
>> associated segment iterator type and local iterator type. Writing a
>> generic algorithm that uses segmented iterators requires the ability
>> to name those types. Additionally, a fully generic algorithm ought to
>> be usable with both segmented and nonsegmented iterators.<
>
>> The distinction between segmented and nonsegmented iterators is
>> orthogonal to the distinction between Forward Iterators, Bidirectional
>> Iterators, and Random Access Iterators.<
>
>> Multiple levels of segmentation can be used with no extra effort.
>> Nothing in this discussion, or this implementation of fill, assumes
>> that the local iterator is nonsegmented.<
>
>> Segmented iterators, an extension of ordinary STL iterators, make it
>> possible for generic algorithms to treat a segmented data structure as
>> something other than a uniform range of elements. This allows existing
>> algorithms to operate more efficiently on segmented data structures,
>> and provides a natural decomposition for parallel computation.
>> Segmented iterators enable new kinds of generic algorithms that
>> explicitly depend on segmentation.<
>
>
> In D the concept of Input Range is defined as a type that statically
> supports (there is also some necessary semantics that can't be expressed
> in the D code of such concepts):
>
> R r; // can define a range object
> if (r.empty) {} // can test for empty
> r.popFront(); // can invoke popFront()
> auto h = r.front; // can get the front of the range of non-void type
>
>
>
> So an idea is to introduce in D the multi-level Ranges:
> SegmentedInputRange
> SegmentedOutputRange
> SegmentedForwardRange
> SegmentedBidirectionalRange
> SegmentedRandomAccessRange
>
> I think a Segmented Input Range is more complex than an Input Range.
> Inventing such ranges is like finding the miminal set of axioms that
> allow to write down a theorem, that is to efficiently implement the
> normal algorithms. This is not an easy for me, I don't know much about
> this topic.
>
> Bye,
> bearophile

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org

June 09, 2012

Re: Segmented Ranges?

Posted by bearophile
in reply to bearophile

bearophile

Posted in reply to bearophile

> In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew H. Austern offers a solution to regain the lost efficiency:
> http://

Sorry:
http://lafstern.org/matt/segmented.pdf

Bye,
bearophile

June 09, 2012

Re: Segmented Ranges?

Posted by Denis Shelomovskij
in reply to bearophile

Denis Shelomovskij

Posted in reply to bearophile

09.06.2012 19:06, bearophile пишет:
>> In the paper "Segmented Iterators and Hierarchical Algorithms" Matthew
>> H. Austern offers a solution to regain the lost efficiency:
>...
> http://lafstern.org/matt/segmented.pdf

Wow, didn't know there is an article about this. Looks like we all think about same things. Personally I was too lazy to propose in this NG similar but less advanced idea. So my small (but valuable, IMHO) addition:
Add explicit vectorisation for some common predicates (`a == b` e.g.) for sequential memory blocks (arrays) and ranges with elements of such blocks (called "segmented" in the article).

-- 
Денис В. Шеломовский
Denis V. Shelomovskij

June 09, 2012

Re: Segmented Ranges?

Posted by Andrei Alexandrescu
in reply to bearophile

Andrei Alexandrescu

Posted in reply to bearophile

On 6/9/12 5:45 AM, bearophile wrote:
> Often enough you have to perform operations on some non-flat sequence,

Would joiner help?

Andrei

June 17, 2012

Re: Segmented Ranges?

Posted by bearophile
in reply to Andrei Alexandrescu

bearophile

Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu:

> Would joiner help?

I don't know. I think the point of those segmented ranges is to not hide their segmented nature (as joiner does).

- - - - - - - - - - - - - -

This paper shows a variant of the stream fusion, applied on chunked arrays that represent strings:

"Rewriting Haskell Strings" (2007) by Duncan Coutts, Don Stewart, Roman Leshchinskiy:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.3166


In D it's easy to create a range that segments a given lazy range into a lazy iterable of arrays:


import std.range: ElementType;
import std.array: empty, front, popFront;

struct Segmenter(size_t chunkLength, Range) {
    alias ElementType!Range elementT;
    private elementT[chunkLength] chunk;
    private Range r;
    private size_t len;

    this(Range rin) {
        this.r = rin;
        popFront();
    }

    @property bool empty() const {
        return len == 0 && r.empty;
    }

    @property elementT[] front() pure nothrow {
        return chunk[0 .. len];
    }

    void popFront() {
        len = 0;
        while (len < chunkLength && !r.empty) {
            chunk[len] = r.front;
            len++;
            r.popFront();
        }
    }
}


But Phobos ranges don't manage segments automatically (so you can't just apply a normal map on a Segmenter, because this map will work on the chunks), and there aren't the optimizations described in that paper. On the other hand some range methods become inlined, because D ranges are quite different from Haskell lazy lists, so maybe some of those optimizations are not needed.

Bye,
bearophile

June 18, 2012

Re: Segmented Ranges?

Posted by Christophe Travert
in reply to bearophile

Christophe Travert

Posted in reply to bearophile

"bearophile" , dans le message (digitalmars.D:169611), a écrit :
> struct BilevelScan(Range) {

This is basically std.algorithm.joiner

> So an idea is to introduce in D the multi-level Ranges:
> SegmentedInputRange
> SegmentedOutputRange
> SegmentedForwardRange
> SegmentedBidirectionalRange
> SegmentedRandomAccessRange
> 
> I think a Segmented Input Range is more complex than an Input Range. Inventing such ranges is like finding the miminal set of axioms that allow to write down a theorem, that is to efficiently implement the normal algorithms. This is not an easy for me, I don't know much about this topic.

I would only use this axiom:
- a SegmentedRange defines a bySegment property, returning a Range of
Range bearing the same element type as SegmentedRange.

and define BySegment!Range as the type returned by range.bySegment, and SegmentType!Range as the type returned by range.bySegment.front.

Then it is up to the implementer of the algorithm to test the type of a SegmentedRange (InputRange, OutputRange, etc.), the type of BySegment!Range, and the type of SegmentType!Range when needed. There are too many combinaisons to make specific range types for all of them. Moreover SegmentedRandomAccessRange would be ambiguous: is it a RandomAccessRange, a Range whose BySegment method returns a RandomAccessRange, a Range whose SegmentType is a RandomAccessRange, or all of them ?

-- 
Christophe

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation