Ranges usages

bearophile (Nov 22, 2012)
Rob T (Nov 22, 2012)
monarch_dodra (Nov 22, 2012)
Timon Gehr (Nov 23, 2012)
Jonathan M Davis (Nov 23, 2012)
Timon Gehr (Nov 23, 2012)
monarch_dodra (Nov 22, 2012)
Artur Skawina (Nov 22, 2012)
```(This is a partial repost from D.learn.)

Some time ago, after Andrei Alexandrescu's talk "Iterators Must Go", someone asked about implementing the Dutch national flag sort algorithm with D ranges. This is a version of mine (unfortunately I currently can't try it on the DList of Phobos); it looks short, but it wasn't immediate for me to write:

import std.algorithm, std.range; // for swap and the range traits

void dutchNationalFlagSort(Range, T)(Range secondLast,
                                     in T lowVal, in T highVal)
pure nothrow if (isBidirectionalRange!Range &&
                 hasSwappableElements!Range &&
                 is(ElementType!Range == T)) {
    for (auto first = secondLast; !secondLast.empty; )
        if (secondLast.front == lowVal) {
            swap(first.front, secondLast.front);
            first.popFront();
            secondLast.popFront();
        } else if (secondLast.front == highVal) {
            swap(secondLast.front, secondLast.back);
            secondLast.popBack();
        } else
            secondLast.popFront();
}
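For example, applied to an int array (arrays are bidirectional ranges with swappable elements):

auto data = [1, 0, 2, 1, 0, 2, 1];
data.dutchNationalFlagSort(0, 2);
// data is now [0, 0, 1, 1, 1, 2, 2]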

In the following program there are two versions of the compress function, which implement the same bare-bones version of the LZW (http://en.wikipedia.org/wiki/Lempel-Ziv-Welch) compression algorithm.

compress1 is the simpler version, while compress2 avoids some heap allocations using a Slice that is able to grow (in some of my benchmarks compress2 is about twice as fast as compress1).

Is it possible to write similar code nicely & efficiently with ranges?

import std.stdio, std.array;

alias T = ubyte;
alias Ta = immutable(T)[];

int[] compress1(immutable T[] original) {
    int[Ta] dictionary;
    foreach (i; 0 .. 256)
        dictionary[cast(Ta)[i]] = i;

    Ta w;
    int[] result;
    foreach (ch; original) {
        auto wc = w ~ ch;
        if (wc in dictionary) {
            w = wc;
        } else {
            result ~= dictionary[w];
            dictionary[wc] = dictionary.length - 1;
            w = [ch];
        }
    }

    return w.empty ? result : (result ~ dictionary[w]);
}

int[] compress2(immutable T[] original) {
    int[Ta] dictionary;
    foreach (i; 0 .. 256)
        dictionary[cast(Ta)[i]] = i;

    struct Slice {
        size_t start, end;
        @property opSlice() {
            return original[start .. end];
        }
        alias this = opSlice;
    }

    Slice w;
    int[] result;
    foreach (i; 0 .. original.length) {
        auto wc = Slice(w.start, w.end + 1);
        if (wc in dictionary) {
            w = wc;
        } else {
            result ~= dictionary[w];
            dictionary[wc] = dictionary.length - 1;
            w = Slice(i, i + 1);
        }
    }

    return w.empty ? result : (result ~ dictionary[w]);
}

void main() {
    auto txt = cast(Ta)"TOBEORNOTTOBEORTOBEORNOT";
    writeln(compress1(txt));
    writeln(compress2(txt));
}

Thank you,
bye,
bearophile
```
```FYI

I'm getting these compile errors, must be the compiler flags I'm using?

source/main.d(19): Error: cannot implicitly convert expression (dictionary.length() - 1LU) of type ulong to int
source/main.d(48): Error: cannot implicitly convert expression (dictionary.length() - 1LU) of type ulong to int

17        } else {
18            result ~= dictionary[w];
19**          dictionary[wc] = dictionary.length - 1;
20            w = [ch];
21        }

46        } else {
47            result ~= dictionary[w];
48**          dictionary[wc] = dictionary.length - 1;
49            w = Slice(i, i + 1);
50        }

--rt

```
```On Thursday, 22 November 2012 at 01:14:23 UTC, bearophile wrote:
>
> [SNIP]
>
> Thank you,
> bye,
> bearophile

I'd argue that your first "compress1" isn't range correct either anyways, because "w ~ ch;" (which will allocate) is not part of the range interface.

Neither is "w = [ch];" (which will also allocate)

Given the allocations in the first example, I'd hardly consider the performance comparison fair.

int[] compress3(immutable T[] original) {
    int[Ta] dictionary;
    foreach (i; 0 .. 256)
        dictionary[cast(Ta)[i]] = i;

    int[] result;
    size_t start, end;
    Ta w;
    foreach (i, ch; original) {
        auto wc = original[start .. end + 1];
        if (wc in dictionary) {
            w = wc;
            ++end;
        } else {
            w = original[start .. end];
            result ~= dictionary[w];
            dictionary[wc] = dictionary.length - 1;
            start = i;
            end = i + 1;
        }
    }

    return w.empty ? result : (result ~ dictionary[w]);
}

This is "range correct". Being range correct, it doesn't allocate either.

I would have preferred writing "wc = wc.ptr[0 .. wc.length + 1]"... but that's not range correct...

I'd be thrilled if you (quickly) benched my compress3.

----
Also: I'd suggest you avoid code such as:

int[] result;
for (...)
    result ~= xxx;

Unless you take the time to manually call assumeSafeAppend, this may reallocate on appends. You are better off using an actual appender:

auto result = appender!(int[])();
for (...)
    result.put(x);
return result.data;

My benching experiences have shown performance gains with appender as early as the 2nd insertion.
--------

My limited range experience has shown me that:
1. They are *incredibly* easy to chain and adapt.
2. RA ranges are (mostly) just as powerful as iterators or raw arrays.
3. If all you want to do is iterate forward, then any flavor of range will rock.

The "HUGE" flaw I see with ranges though is in regard to bidirectional ranges, and their ability to "only shrink, never grow". In particular, it is not possible to "slice" a Bidir right through the middle: "iterate until you find x, and then return the slice *up-to* x". Iterators can do it just fine.

The (IMO) tell-tale symptom are all the algorithms that return a "take" for bidirectional ranges. "take" is nice and all, but in that case, I think it is subverting your range's type.

In particular, one of the great powers of a bidirectional range is to define a sequence as "all elements between a start and an end point". What's cool about this definition is that if you insert new elements between the bounds, they will be taken into account by the range.

"Take" subverts that in that the sequence becomes a "start + count", so inserting new items in the underlying _container_ will just bump the last items out of your take range.
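For instance, a Take is basically a source range plus a count:

import std.range;
auto r = iota(0, 10);
auto firstFive = r.takeExactly(5); // stores "source + count", not "begin .. end"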

--------
Well, that's my rant anyways: Ranges are an incredibly powerful abstract notion, but when you want to get down and dirty with them, they tend to create a bit of friction compared to some lower-level abstractions...

But overall: Worth it.
```
```On Thursday, 22 November 2012 at 06:52:55 UTC, Rob T wrote:
> FYI
>
> I'm getting these compile errors, must be the compiler flags I'm using?

Must be that bearophile and I are on Windows (ergo 32-bit), so "length" is of type "size_t", which is a "uint". That implicitly converts to the dictionary's int values.

If you are on Linux, then you are probably on 64-bit, so the type of "dictionary.length - 1" is "ulong", which can't be implicitly narrowed back into the dictionary. I don't think this has anything to do with flags.

Placing a cast here "cast(uint)" would be the correct workaround I think.
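i.e. something like this, on the offending line (using cast(int) to match the dictionary's value type):

dictionary[wc] = cast(int)(dictionary.length - 1);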
```
```On 11/22/12 02:14, bearophile wrote:
> In the following program there are two versions of the compress function, that implement the same bare-bones version of the LZW (http://en.wikipedia.org/wiki/Lempel-Ziv-Welch ) compression algorithm.
>
> compress1 is the simpler version, while compress2 avoids some heap allocations using a Slice that is able to grow (in some of my benchmarks compress2 is about twice faster than compress1).

> [SNIP]

> Is it possible to write similar code nicely & efficiently with ranges?

First, the code above is practically a textbook example of how /not/ to write readable and efficient code. So let's do a saner implementation:

import std.stdio, std.array, std.range;

OUT[] compress(OUT=int, R)(const R original) {
    alias typeof(original[0..1]) Elems;
    alias typeof(cast()Elems[0]) Elem;

    IDAA!(OUT, const Elems) dictionary;
    OUT[] result;
    size_t start, end = 1;

    while (end <= original.length) {
        if (original[start..end] in dictionary)
            { ++end; continue; }
        result ~= dictionary[original[start..end-1]];
        dictionary[original[start..end]] = cast(OUT)dictionary.length;
        start = end - 1;
    }

    return original[start..end-1].empty ? result : (result ~ dictionary[original[start..end-1]]);
}

struct IDAA(VAL, IDX) {
    VAL[IDX] map;

    enum M = IDX[0].max;
    static immutable VAL[M+1] idmap = {
        VAL[M+1] a;
        foreach (VAL c; 0..M+1)
            a[c] = c;
        return a;
    }();

    auto opIndex(const IDX idx) inout {
        return idx.length==1 ? idmap[idx[0]] : map[idx];
    }
    auto opIndexAssign(const VAL val, const IDX idx) {
        return idx.length==1 ? idx[0] : (map[idx] = val);
    }
    const opBinaryRight(string op:"in")(const IDX idx) /*inout*/ {
        return idx.length==1 ? &idmap[idx[0]] : idx in map;
    }
    @property length() /*inout*/ { return map.length + idmap.length; }
}

This code is about an order of magnitude faster than both of your versions (which performed ~ equally here w/ gdc), as the initialization, allocations, copies and AA ops were completely dominating everything else.

Now, when we have something to compare against, rewriting it to use ranges is trivial:

OUT[] compress(OUT=int, R)(R original) {
    IDAA!(OUT, const typeof(cast()original.front)[]) dictionary;
    OUT[] result;
    size_t l = 1;

    while (original.length >= l) {
        if (original.takeExactly(l) in dictionary)
            { ++l; continue; }
        result ~= dictionary[original.takeExactly(l-1)];
        dictionary[original.takeExactly(l)] = cast(OUT)dictionary.length;
        original.popFrontN(l-1);
        l = 1;
    }

    return original.empty ? result : (result ~ dictionary[original]);
}

And the (surprising, for me) result is that this code performs just as well as the "saner" version above.

So I guess the answer is that it is possible.
For some definition of "nicely", as I don't like takeExactly and popFrontN. ;)

artur
```
```On 11/22/2012 11:25 AM, monarch_dodra wrote:
> On Thursday, 22 November 2012 at 06:52:55 UTC, Rob T wrote:
>> FYI
>>
>> I'm getting these compile errors, must be the compiler flags I'm using?
>
> Must be that bearophile and I are on windows (ergo 32), so "length" is
> of type "size_t", which is a "uint". This matches the key of the
> dictionary.
>
> If you are linux, then you are probably 64, and the type of
> "dictionary.length - 1" gets promoted to ulong, and can't be placed back
> into the dictionary. I don't think this has anything to do with flags.
>
> Placing a cast here "cast(uint)" would be the correct workaround I think.

The flag is -m32.
```
```On Friday, November 23, 2012 01:52:51 Timon Gehr wrote:
> On 11/22/2012 11:25 AM, monarch_dodra wrote:
> > On Thursday, 22 November 2012 at 06:52:55 UTC, Rob T wrote:
> >> FYI
> >>
> >> I'm getting these compile errors, must be the compiler flags I'm using?
> >
> > Must be that bearophile and I are on windows (ergo 32), so "length" is of type "size_t", which is a "uint". This matches the key of the dictionary.
> >
> > If you are linux, then you are probably 64, and the type of "dictionary.length - 1" gets promoted to ulong, and can't be placed back into the dictionary. I don't think this has anything to do with flags.
> >
> > Placing a cast here "cast(uint)" would be the correct workaround I think.
>
> The flag is -m32.

That would be a very poor solution if the problem is that you're trying to assign a size_t to a uint. Sure, -m32 will make it compile a 32-bit binary which would then work, but well-written code doesn't generally care whether it's on a 32-bit or 64-bit machine, so if a variable of type size_t is being assigned to anything smaller than long, a cast or std.conv.to would be the correct way to handle it. Unless you really have to target a specific architecture, writing code which assumes what architecture it's in is generally bad practice. It should be as cross-platform as is reasonably possible, and D and Phobos are well set up to do that.
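For instance, std.conv.to does the narrowing check at runtime (a minimal sketch; the variable names are just for illustration):

import std.conv;
size_t n = 300;
int v = to!int(n);  // fine: 300 fits in an int
// to!int(size_t.max) would instead throw a ConvOverflowException,
// where a plain cast would silently truncate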

- Jonathan M Davis
```
```On 11/23/2012 03:01 AM, Jonathan M Davis wrote:
> On Friday, November 23, 2012 01:52:51 Timon Gehr wrote:
>> On 11/22/2012 11:25 AM, monarch_dodra wrote:
>>> On Thursday, 22 November 2012 at 06:52:55 UTC, Rob T wrote:
>>>> FYI
>>>>
>>>> I'm getting these compile errors, must be the compiler flags I'm using?
>>>
>>> Must be that bearophile and I are on windows (ergo 32), so "length" is
>>> of type "size_t", which is a "uint". This matches the key of the
>>> dictionary.
>>>
>>> If you are linux, then you are probably 64, and the type of
>>> "dictionary.length - 1" gets promoted to ulong, and can't be placed back
>>> into the dictionary. I don't think this has anything to do with flags.
>>>
>>> Placing a cast here "cast(uint)" would be the correct workaround I think.
>>
>> The flag is -m32.
>
> That would be a very poor solution if the problem is that you're trying to
> assign a size_t to a uint. Sure, -m32 will make it compile a 32-bit binary
> which would then work, but well-written code doesn't generally care whether
> it's on a 32-bit or 64-bit machine, so if a variable of type size_t is being
> assigned to anything smaller than long, a cast or std.conv.to would be the
> correct way to handle it. Unless you really have to target a specific
> architecture, writing code which assumes what architecture it's in is
> generally bad practice. It should be as cross-platform as is reasonably
> possible, and D and Phobos are well set up to do that.
>
> - Jonathan M Davis
>

Sure, but that is up to the author.
```