September 09, 2008
Sean Kelly wrote:
> Andrei Alexandrescu wrote:
> ...
>>
>> After much thought and discussions among Walter, Bartosz and myself, I defined a range design and reimplemented all of std.algorithm and much of std.stdio in terms of ranges alone.
> 
> Yup.  This is why I implemented all of Tango's algorithms specifically for arrays from the start--slices represent a reasonable approximation of ranges, and this seems far preferable to the iterator approach of C++.  Glad to hear that this is what you've decided as well.

That's great to hear, but I should warn you that moving from arrays to "the lowest range that will do" is not quite easy. Think of std::rotate for example.

>> This is quite a thorough test because the algorithms are diverse and stress-test the expressiveness and efficiency of the range design. Along the way I made the interesting realization that certain union/difference operations are needed as primitives for ranges. There are also a few bugs in the compiler and some needed language enhancements (e.g. returning a reference from a function); Walter is committed to implement them.
> 
> Very nice.  The inability to return a reference has been a thorn in my side for ages.
> 
>> I put together a short document for the range design. I definitely missed about a million things and have been imprecise about another million, so feedback would be highly appreciated. See:
>>
>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
> 
> It seems workable from a 1000' view.  I'll have to try and apply the approach to some algorithms and see if anything comes up.  So far, dealing with bidirectional ranges seems a bit weird, but that's likely more related to the syntax (ie. 'pop') than anything.
> 
> 
> Sean
> 
> P.S. This decision has interesting implications for D2+, given the functional tendencies already present in the language :-)

Wait until you see the generators! An efficient generator of Fibonacci numbers in one line...

auto fib = generate!("a[0] + a[1]")(1, 1);

I'm so excited I can hardly stand myself. :o)


Andrei
September 09, 2008
Denis Koroskin wrote:
> 1) There is a typo:
> 
> // Copies a range to another
> void copy(R1, R2)(R1 src, R2 tgt)
> {
>     while (!src.isEmpty)
>     {
>         tgt.putNext(r.getNext);  // should be tgt.putNext(src.getNext);
>     }
> }

Thanks! Fixed.

> 2) R.next and R.pop could have better names. I mean, they represent similar operations yet names are so different.

I agree. Next was a natural choice. I stole pop from Perl. Any symmetric and short operation names would be welcome.

> 3) Walter mentioned that built-in array could be re-implemented using a pair of pointers instead of ptr+length. Will it ever get a green light? It fits range concept much better.

Walter told me to first implement my design, and if it works, he'll do the change. Yes, it does fit ranges much better because the often-used next and, um, pop will only touch one word instead of two.

> 4) We need some way of supporting dollar notation in user containers. The hack of using __dollar is bad (although it works).

It doesn't work for multiple dimensions. There should be an opDollar(uint dim) that gives the library information on which argument count it occured in. Consider:

auto x = matrix[$-1, $-1];

Here the dollar's occurrences have different meanings. A good start would be to expand the above into:

auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

> 5) I don't quite like names left and right! :) I think they should represent limits (pointers to begin and end, in case of array) rather that values. In this case, built-in arrays could be implemented as follows:
> 
> struct Array(T)
> {
>     T* left;
>     T* right;
>     size_t length() { return right-left; }
>     ref T opIndex(size_t index) { return left[index]; }
>     // etc
> }
> 
> The rationale behind having access to range limits is to allow operations on them. For example,
> R.left-=n;

I disagree. Defining operations on range limits opens a box that would make Pandora jealous:

1. What is the type of left in general? Um, let's define Iterator!(R) for each range R.

2. What are the primitives of an iterator? Well, -= sounds good. How do you check it for correctness? In fact, how do you check any operation of a naked iterator for correctness?

3. I want to play with some data. What should I use here, ranges or iterators? ...

Much of the smarts of the range design is that it gets away WITHOUT having to answer embarrassing questions such as the above. Ranges are rock-solid, and part of them being rock-solid is that they expose enough primitives to be complete, but at the same time do not expose dangerous internals.

> could be used instead of
> foreach(i; 0..n) {
>     R.pop();
> }
> 
> which is more efficient in many cases.

Stop right there. That's not a primitive. It is an algorithm that gets implemented in terms of a primitive. I disagree that such an algorithm is an operator and does not have a name such as popN.

> Other that that - great, I like it.

Thanks for your comments.


Andrei
September 09, 2008
Manfred_Nowak wrote:
> Andrei Alexandrescu wrote:
> 
>>  feedback would be highly appreciated
> 
> 1) Example in "4. Bidirectional range"
> 
> Reversing of ranges can be done in constant runtime, but the example exposes runtime linear in the number of elements.
> 
> This might be a hint, that a "6. Reversable Range" might be required, because a range reversable in constant time requires more space.

There are numerous collections and ranges to be defined, of course. The five-kinds taxonomy is time-tested and allows implementation of a great many algorithms. Beyond that, users can define many containers and ranges with additional operations or with improved properties of existing operations.

> 2) [left,right][Diff,Union]
> 
> Ranges are not sets; therefore not only me might have problems to capture the idea behind "difference" and "union" on ranges.

I am opened to better names. Bartosz talked me into the ones above. I used these:

leftToLeft
leftToRight
rightToRight

> Of course one can define whatever one wants, but I would prefer
> [sub,snip,cut,split,...][B,E][B,E] (r,s)
> 
> I.e. `subBB(r,s)' is the subrange of `r' starting at the beginning of `r' and ending at the beginning of `s' (including the beginning of `r', but not including the beginning of `s').
> 
> It my be of some worth to include the `B' or `E' as parameters to the  choosen keyword(?) to enable algorithmically accesses:
> 
> | sub(B,B,r,s)
> 
> instead of `leftDiff( r, s)'

I find these too cryptic, but to each their own. I predict that primitive names will become a bicycle shed. In the end we'll have to use enum :o).


Andrei

September 09, 2008
On Mon, Sep 8, 2008 at 8:24 PM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Sergey Gromov wrote:
>>
>> Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>>>
>>> Walter, Bartosz and myself have been hard at work trying to find the
>>> right abstraction for iteration. That abstraction would replace the infamous
>>> opApply and would allow for external iteration, thus paving the way to
>>> implementing real generic algorithms.
>>
>> opApply() wasn't my hero either. :)  Your article really looks like something I'd expect to find in D.  It only requires foreach support, and yeah, return by reference.
>
> Indeed. Both are in the works.

Quick question about this one -- how will iterators get foreach support?  Are they classes or structs?  If they're structs, how will the compiler know something is an iterator?  Or will it be based on duck typing (if it looks like an iterator, it must be an iterator)?

And if this support involves "blessing" certain types within the runtime, what will this mean for other runtime libraries?
September 09, 2008
Andrei Alexandrescu wrote:
> Sean Kelly wrote:
>> Andrei Alexandrescu wrote:
>> ...
>>>
>>> After much thought and discussions among Walter, Bartosz and myself, I defined a range design and reimplemented all of std.algorithm and much of std.stdio in terms of ranges alone.
>>
>> Yup.  This is why I implemented all of Tango's algorithms specifically for arrays from the start--slices represent a reasonable approximation of ranges, and this seems far preferable to the iterator approach of C++.  Glad to hear that this is what you've decided as well.
> 
> That's great to hear, but I should warn you that moving from arrays to "the lowest range that will do" is not quite easy. Think of std::rotate for example.

I'll admit that I find some of the features <algorithm> provides to be pretty weird.  Has anyone ever actually wanted to sort something other than a random-access range in C++?  Or rotate one, for example?  These operations are allowed, but to me they fall outside the realm of useful functionality.  I suppose there may be some relation here to Stepanov's idea of a computational basis.  Should an algorithm operate on a range if it cannot do so efficiently?  And even if it does, will anyone actually use it?


Sean
September 09, 2008
"Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message news:ga46ok$2s77$1@digitalmars.com...
> I put together a short document for the range design. I definitely missed about a million things and have been imprecise about another million, so feedback would be highly appreciated. See:
>
> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

This is just awesome. Thank you for tackling this issue.

I agree with others that some names are not so obvious. Left/right? How do Arabic speakers feel about this : ) Begin/end seems more intuitive.

Can you explain this please. From Input range:

e=r.getNext Reads an element and moves to the next one. Returns the read element, which is of type ElementType!(R). The call is defined only right after r.isEmpty returned false.

That last part: The call is defined only right after r.isEmpty returned false.

First of all, is* functions always sound "const" to me, but the way you describe isEmpty it sounds like it actually changes something, advancing a pointer or something like that. What happens if isEmpty is called twice? Will it skip 1 element?

The input range behaves like C#'s IEnumerator, but at least the C# names are more obvious: while (i.MoveNext()) e = i.Current; But isEmpty is common to all ranges, so I understand why it's the way it is. I just hope it could stay "const", not modifying the internal state. Perhaps add "next" to input ranges as well?

L. 

September 09, 2008
Sean Kelly wrote:
> Andrei Alexandrescu wrote:
>> Sean Kelly wrote:
>>> Andrei Alexandrescu wrote: ...
>>>> 
>>>> After much thought and discussions among Walter, Bartosz and
>>>> myself, I defined a range design and reimplemented all of
>>>> std.algorithm and much of std.stdio in terms of ranges alone.
>>> 
>>> Yup.  This is why I implemented all of Tango's algorithms specifically for arrays from the start--slices represent a
>>> reasonable approximation of ranges, and this seems far preferable
>>> to the iterator approach of C++.  Glad to hear that this is what
>>> you've decided as well.
>> 
>> That's great to hear, but I should warn you that moving from arrays
>> to "the lowest range that will do" is not quite easy. Think of std::rotate for example.
> 
> I'll admit that I find some of the features <algorithm> provides to
> be pretty weird.  Has anyone ever actually wanted to sort something
> other than a random-access range in C++?  Or rotate one, for example?

Great questions. I don't recall having needed to sort a list lately, but rotate is a great function that has an undeservedly odd name. What rotate does is to efficiently transform this:

a, b, c, d, e, f, A, B, C, D

into this:

A, B, C, D, a, b, c, d, e, f

I use that all the time because it's really a move-to-front operation. In fact my algorithm2 implementation does not call it rotate anymore, it calls it moveToFront and allows you to move any subrange of a range to the front of that range efficiently. It's a useful operation in a great deal of lookup strategies.

> These operations are allowed, but to me they fall outside the realm
> of useful functionality.  I suppose there may be some relation here
> to Stepanov's idea of a computational basis.  Should an algorithm
> operate on a range if it cannot do so efficiently?  And even if it
> does, will anyone actually use it?

I think it all depends on what one's day-to-day work consists of. I was chatting to Walter about it and he confessed that, although he has a great deal of respect for std.algorithm, he's not using much of it. I told him back that I need 80% of std.algorithm on a daily basis. In fact that's why I wrote it - otherwise I wouldn't have had the time to put into it.

This is because I make next to no money so I can afford to work on basic research, which is "important" in a long-ranging way. Today's computing is quite disorganized and great energy is expended on gluing together various pieces, protocols, and interfaces. I've worked in that environment quite a lot, and dealing with glue can easily become 90% of a day's work, leaving only little time to get occupied with a real problem, such as making a computer genuinely smarter or at least more helpful towards its user. All too often we put a few widgets on a window and the actual logic driving those buttons - the "smarts", the actual "work" gets drowned by details taking care of making that logic stick to the buttons.

I mentioned in a talk once that any programmer should know how to multiply two matrices. Why? Because if you don't, you can't tackle a variety of problems that can be easily expressed in terms of matrix multiplication, even though they have nothing to do with algebra (rotating figures, machine learning, fractals, fast series...). A person in the audience said that she never actually needs to multiply two matrices, so why bother? I gave an evasive response, but the reality was that that was a career-limiting state of affairs for her.


Andrei
September 09, 2008
On Mon, 08 Sep 2008 20:24:27 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> About the "collection should be a range itself" mantra, I've had a micro-epiphany. Since D's slices so nicely model at the same time arrays and their ranges, it is very seductive to think of carrying that to other collection types. But I got disabused of that notion as soon as I wanted to define a less simple data structure. Consider a matrix:
>
> auto a = BlockMatrix!(float, 3)(100, 200, 300);
>
> defines a block contiguous matrix of three dimensions with the respective sizes. Now a should be the matrix AND its range at the same time. But what's "the range" of a matrix? Oops. As soon as you start to think of it, so many darn ranges come to mind.
>
> * flat: all elements in one shot in an arbitrary order
>
> * dimension-wise: iterate over a given dimension
>
> * subspace: iterate over a "slice" of the matrix with fewer dimensions
>
> * diagonal: scan the matrix from one corner to the opposite corner
>
> I guess there are some more. So before long I realized that the most gainful design is this:
>
> a) A matrix owns its stuff and is preoccupied with storage internals, allocation, and the such.
>
> b) The matrix defines as many range types as it wants.
>
> c) Users use the ranges.
>
> For example:
>
> foreach (ref e; a.flat) e *= 1.1;
> foreach (row; a.dim(0)) row[0, 0] = 0;
> foreach (col; a.dim(1)) col[1, 1] *= 5;

I'd recommend a more clear cut example. Three of the ranges are very well defined in array languages and libraries. Essentially a slice of a matrix is another matrix that may have less or more dimensions and therefore may be a collection in addition to a range. The dimension-wise range is the only operation which is more complex, due to the type and dimensions of the returned array changing float[x,y,z] -> float[x,y][z]. And the main argument is that a float[x,y][z] is large, slow to create and unwanted, so a separate range/generator is better (Also note a generator can provide implicit the head const, tail mutable nature of the range). Even given this, it doesn't contratict the "collection should be a range itself" mantra, since there is a very well defined range which encompasses the data, its just that some ranges are more optimal if they're only views, and not a root collection.
September 09, 2008
On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Denis Koroskin wrote:
>> 3) Walter mentioned that built-in array could be re-implemented using a pair of pointers instead of ptr+length. Will it ever get a green light? It fits range concept much better.
>
> Walter told me to first implement my design, and if it works, he'll do the change. Yes, it does fit ranges much better because the often-used next and, um, pop will only touch one word instead of two.

I'd warn that changing away from ptr+length would create logical incosistencies between 1D arrays and 2D/3D/ND arrays.

>> 4) We need some way of supporting dollar notation in user containers. The hack of using __dollar is bad (although it works).
>
> It doesn't work for multiple dimensions. There should be an opDollar(uint dim) that gives the library information on which argument count it occured in. Consider:
>
> auto x = matrix[$-1, $-1];
>
> Here the dollar's occurrences have different meanings. A good start would be to expand the above into:
>
> auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];

I'd also add that multiple dimension slicing should be supported. i.e.
auto x = matrix[2..5,0..$,3]
would become
auto x = matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(0)),3)
with
struct Slice (T) { T start; T end; }
Strided slices would also be nice. i.e. matrix[0..$:10] // decimate the array
September 09, 2008
Robert Jacques wrote:
> On Mon, 08 Sep 2008 20:24:27 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> 
>> About the "collection should be a range itself" mantra, I've had a micro-epiphany. Since D's slices so nicely model at the same time arrays and their ranges, it is very seductive to think of carrying that to other collection types. But I got disabused of that notion as soon as I wanted to define a less simple data structure. Consider a matrix:
>>
>> auto a = BlockMatrix!(float, 3)(100, 200, 300);
>>
>> defines a block contiguous matrix of three dimensions with the respective sizes. Now a should be the matrix AND its range at the same time. But what's "the range" of a matrix? Oops. As soon as you start to think of it, so many darn ranges come to mind.
>>
>> * flat: all elements in one shot in an arbitrary order
>>
>> * dimension-wise: iterate over a given dimension
>>
>> * subspace: iterate over a "slice" of the matrix with fewer dimensions
>>
>> * diagonal: scan the matrix from one corner to the opposite corner
>>
>> I guess there are some more. So before long I realized that the most gainful design is this:
>>
>> a) A matrix owns its stuff and is preoccupied with storage internals, allocation, and the such.
>>
>> b) The matrix defines as many range types as it wants.
>>
>> c) Users use the ranges.
>>
>> For example:
>>
>> foreach (ref e; a.flat) e *= 1.1;
>> foreach (row; a.dim(0)) row[0, 0] = 0;
>> foreach (col; a.dim(1)) col[1, 1] *= 5;
> 
> I'd recommend a more clear cut example. Three of the ranges are very well defined in array languages and libraries. Essentially a slice of a matrix is another matrix that may have less or more dimensions and therefore may be a collection in addition to a range.

There are two problems with the view that a slice of a matrix is also a matrix:

1. If ownership is desired, then the slice does not own anything in the matrix, so that does not put it on equal footing with the matrix it started with.

2. Slicing a block matrix on a hyperplane will result in a strided range. That is not a block matrix at all. So again it is more useful to think of the block matrix as the store, and of various ranges crawling over it as ways to look at the matrix.

I agree that ranges could be shoehorned into working. But then all ownership is out of the window, and also it creates more confusion than it clears. Now for n possible ranges over a block matrix, you have a daunting task ahead. You'd need to be able to construct any from any other, otherwise you could get stuck with a range that's a sort of dead end.

That could be solved in a number of ways (e.g. force all range2range conversions to go through some "central" range) but by that time you start to realize that that design is not quite gainful. Besides, it only takes care of matrices, but not of various other nonlinear structures

The approach in which the container is preoccupied with storage and ownership, and it offers various ranges that view the container in various ways, sounds like the better design to me.

> The dimension-wise range is the only operation which is more complex, due to the type and dimensions of the returned array changing float[x,y,z] -> float[x,y][z]. And the main argument is that a float[x,y][z] is large, slow to create and unwanted, so a separate range/generator is better (Also note a generator can provide implicit the head const, tail mutable nature of the range). Even given this, it doesn't contratict the "collection should be a range itself" mantra, since there is a very well defined range which encompasses the data, its just that some ranges are more optimal if they're only views, and not a root collection.

Again, I agree a range-only view can be shoehorned into working. I just think it would be a bad design.


Andrei