September 08, 2008
Hello,


Walter, Bartosz and myself have been hard at work trying to find the right abstraction for iteration. That abstraction would replace the infamous opApply and would allow for external iteration, thus paving the way to implementing real generic algorithms.

We considered an STL-style container/iterator design. Containers would use the newfangled value semantics to enforce ownership of their contents. Iterators would span containers in various ways.

The main problem with that approach was integrating built-in arrays into the design. STL's iterators are generalized pointers; D's built-in arrays are, however, not pointers, they are "pairs of pointers" that cover contiguous ranges in memory. Most people who've used D gained the intuition that slices are superior to pointers in many ways, such as easier checking for validity, higher-level compact primitives, streamlined and safe interface. However, if STL iterators are generalized pointers, what is the corresponding generalization of D's slices? Intuitively that generalization should also be superior to iterators.

In a related development, the Boost C++ library has defined ranges as pairs of two iterators and implemented a series of wrappers that accept ranges and forward their iterators to STL functions. The main outcome of Boost ranges been to decrease the verboseness and perils of naked iterator manipulation (see http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a C++ application using Boost could avail itself of containers, ranges, and iterators. The Boost notion of range is very close to a generalization of D's slice.

We have considered that design too, but that raised a nagging question. In most slice-based D programming, using bare pointers is not necessary. Could then there be a way to use _only_ ranges and eliminate iterators altogether? A container/range design would be much simpler than one also exposing iterators.

All these questions aside, there are several other imperfections in the STL, many caused by the underlying language. For example STL is incapable of distinguishing between input/output iterators and forward iterators. This is because C++ cannot reasonably implement a type with destructive copy semantics, which is what would be needed to make said distinction. We wanted the Phobos design to provide appropriate answers to such questions, too. This would be useful particularly because it would allow implementation of true and efficient I/O integrated with iteration. STL has made an attempt at that, but istream_iterator and ostream_iterator are, with all due respect, a joke that builds on another joke, the iostreams.

After much thought and discussions among Walter, Bartosz and myself, I defined a range design and reimplemented all of std.algorithm and much of std.stdio in terms of ranges alone. This is quite a thorough test because the algorithms are diverse and stress-test the expressiveness and efficiency of the range design. Along the way I made the interesting realization that certain union/difference operations are needed as primitives for ranges. There are also a few bugs in the compiler and some needed language enhancements (e.g. returning a reference from a function); Walter is committed to implement them.

I put together a short document for the range design. I definitely missed about a million things and have been imprecise about another million, so feedback would be highly appreciated. See:

http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html


Andrei
September 08, 2008
On Mon, Sep 8, 2008 at 5:50 PM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Hello,
>
>
> Walter, Bartosz and myself have been hard at work trying to find the right abstraction for iteration. That abstraction would replace the infamous opApply and would allow for external iteration, thus paving the way to implementing real generic algorithms.
>
> We considered an STL-style container/iterator design. Containers would use the newfangled value semantics to enforce ownership of their contents. Iterators would span containers in various ways.
>
> The main problem with that approach was integrating built-in arrays into the design. STL's iterators are generalized pointers; D's built-in arrays are, however, not pointers, they are "pairs of pointers" that cover contiguous ranges in memory. Most people who've used D gained the intuition that slices are superior to pointers in many ways, such as easier checking for validity, higher-level compact primitives, streamlined and safe interface. However, if STL iterators are generalized pointers, what is the corresponding generalization of D's slices? Intuitively that generalization should also be superior to iterators.
>
> In a related development, the Boost C++ library has defined ranges as pairs of two iterators and implemented a series of wrappers that accept ranges and forward their iterators to STL functions. The main outcome of Boost ranges been to decrease the verboseness and perils of naked iterator manipulation (see http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a C++ application using Boost could avail itself of containers, ranges, and iterators. The Boost notion of range is very close to a generalization of D's slice.
>
> We have considered that design too, but that raised a nagging question. In most slice-based D programming, using bare pointers is not necessary. Could then there be a way to use _only_ ranges and eliminate iterators altogether? A container/range design would be much simpler than one also exposing iterators.
>
> All these questions aside, there are several other imperfections in the STL, many caused by the underlying language. For example STL is incapable of distinguishing between input/output iterators and forward iterators. This is because C++ cannot reasonably implement a type with destructive copy semantics, which is what would be needed to make said distinction. We wanted the Phobos design to provide appropriate answers to such questions, too. This would be useful particularly because it would allow implementation of true and efficient I/O integrated with iteration. STL has made an attempt at that, but istream_iterator and ostream_iterator are, with all due respect, a joke that builds on another joke, the iostreams.
>
> After much thought and discussions among Walter, Bartosz and myself, I defined a range design and reimplemented all of std.algorithm and much of std.stdio in terms of ranges alone. This is quite a thorough test because the algorithms are diverse and stress-test the expressiveness and efficiency of the range design. Along the way I made the interesting realization that certain union/difference operations are needed as primitives for ranges. There are also a few bugs in the compiler and some needed language enhancements (e.g. returning a reference from a function); Walter is committed to implement them.
>
> I put together a short document for the range design. I definitely missed about a million things and have been imprecise about another million, so feedback would be highly appreciated. See:
>
> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>
>
> Andrei
>

I like!
September 08, 2008
Reply to Andrei,

> Hello,
> 
> Walter, Bartosz and myself have been hard at work trying to find the
> right abstraction for iteration. That abstraction would replace the
> infamous opApply and would allow for external iteration, thus paving
> the way to implementing real generic algorithms.

First of all, I /Like/ opApply. I know there are issues with it so I'd rather see it supplemented rather than replaced.


September 08, 2008
Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Walter, Bartosz and myself have been hard at work trying to find the right abstraction for iteration. That abstraction would replace the infamous opApply and would allow for external iteration, thus paving the way to implementing real generic algorithms.

opApply() wasn't my hero either. :)  Your article really looks like something I'd expect to find in D.  It only requires foreach support, and yeah, return by reference.

> I put together a short document for the range design. I definitely missed about a million things and have been imprecise about another million, so feedback would be highly appreciated. See:
> 
> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

- next operation not mentioned in section 3, forward ranges.

- the union operations look... weird.  Unobvious.  I'm too sleepy now to propose anything better but I'll definitely give it a try.  The rest of the interface seems very natural.

- what's a collection?  How do you get a range out of there?  Collection should be a range itself, like an array.  But it shouldn't be destroyed after passing it to foreach().  How to achieve this if foreach() essentially uses getNext()?

I'd really love to have this design in D though.
September 08, 2008
On Tue, 09 Sep 2008 01:50:54 +0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Hello,
>
>
> Walter, Bartosz and myself have been hard at work trying to find the right abstraction for iteration. That abstraction would replace the infamous opApply and would allow for external iteration, thus paving the way to implementing real generic algorithms.
>
> We considered an STL-style container/iterator design. Containers would use the newfangled value semantics to enforce ownership of their contents. Iterators would span containers in various ways.
>
> The main problem with that approach was integrating built-in arrays into the design. STL's iterators are generalized pointers; D's built-in arrays are, however, not pointers, they are "pairs of pointers" that cover contiguous ranges in memory. Most people who've used D gained the intuition that slices are superior to pointers in many ways, such as easier checking for validity, higher-level compact primitives, streamlined and safe interface. However, if STL iterators are generalized pointers, what is the corresponding generalization of D's slices? Intuitively that generalization should also be superior to iterators.
>
> In a related development, the Boost C++ library has defined ranges as pairs of two iterators and implemented a series of wrappers that accept ranges and forward their iterators to STL functions. The main outcome of Boost ranges been to decrease the verboseness and perils of naked iterator manipulation (see http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a C++ application using Boost could avail itself of containers, ranges, and iterators. The Boost notion of range is very close to a generalization of D's slice.
>
> We have considered that design too, but that raised a nagging question. In most slice-based D programming, using bare pointers is not necessary. Could then there be a way to use _only_ ranges and eliminate iterators altogether? A container/range design would be much simpler than one also exposing iterators.
>
> All these questions aside, there are several other imperfections in the STL, many caused by the underlying language. For example STL is incapable of distinguishing between input/output iterators and forward iterators. This is because C++ cannot reasonably implement a type with destructive copy semantics, which is what would be needed to make said distinction. We wanted the Phobos design to provide appropriate answers to such questions, too. This would be useful particularly because it would allow implementation of true and efficient I/O integrated with iteration. STL has made an attempt at that, but istream_iterator and ostream_iterator are, with all due respect, a joke that builds on another joke, the iostreams.
>
> After much thought and discussions among Walter, Bartosz and myself, I defined a range design and reimplemented all of std.algorithm and much of std.stdio in terms of ranges alone. This is quite a thorough test because the algorithms are diverse and stress-test the expressiveness and efficiency of the range design. Along the way I made the interesting realization that certain union/difference operations are needed as primitives for ranges. There are also a few bugs in the compiler and some needed language enhancements (e.g. returning a reference from a function); Walter is committed to implement them.
>
> I put together a short document for the range design. I definitely missed about a million things and have been imprecise about another million, so feedback would be highly appreciated. See:
>
> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>
>
> Andrei

1) There is a typo:

// Copies a range to another
void copy(R1, R2)(R1 src, R2 tgt)
{
    while (!src.isEmpty)
    {
        tgt.putNext(r.getNext);  // should be tgt.putNext(src.getNext);
    }
}

2) R.next and R.pop could have better names. I mean, they represent similar operations yet names are so different.

3) Walter mentioned that built-in array could be re-implemented using a pair of pointers instead of ptr+length. Will it ever get a green light? It fits range concept much better.

4) We need some way of supporting dollar notation in user containers. The hack of using __dollar is bad (although it works).

5) I don't quite like names left and right! :) I think they should represent limits (pointers to begin and end, in case of array) rather that values. In this case, built-in arrays could be implemented as follows:

struct Array(T)
{
    T* left;
    T* right;
    size_t length() { return right-left; }
    ref T opIndex(size_t index) { return left[index]; }
    // etc
}

The rationale behind having access to range limits is to allow operations on them. For example,
R.left-=n;

could be used instead of
foreach(i; 0..n) {
    R.pop();
}

which is more efficient in many cases.

Other that that - great, I like it.
September 08, 2008
BCS wrote:
> Reply to Andrei,
> 
>> Hello,
>>
>> Walter, Bartosz and myself have been hard at work trying to find the
>> right abstraction for iteration. That abstraction would replace the
>> infamous opApply and would allow for external iteration, thus paving
>> the way to implementing real generic algorithms.
> 
> First of all, I /Like/ opApply. I know there are issues with it so I'd rather see it supplemented rather than replaced.

We all like the way it looks. Ranges will preserve its syntax within a much more efficient and expressive implementation.

Andrei
September 08, 2008
Andrei Alexandrescu wrote:
...
> 
> After much thought and discussions among Walter, Bartosz and myself, I defined a range design and reimplemented all of std.algorithm and much of std.stdio in terms of ranges alone.

Yup.  This is why I implemented all of Tango's algorithms specifically for arrays from the start--slices represent a reasonable approximation of ranges, and this seems far preferable to the iterator approach of C++.  Glad to hear that this is what you've decided as well.

> This is quite a thorough test because the algorithms are diverse and stress-test the expressiveness and efficiency of the range design. Along the way I made the interesting realization that certain union/difference operations are needed as primitives for ranges. There are also a few bugs in the compiler and some needed language enhancements (e.g. returning a reference from a function); Walter is committed to implement them.

Very nice.  The inability to return a reference has been a thorn in my side for ages.

> I put together a short document for the range design. I definitely missed about a million things and have been imprecise about another million, so feedback would be highly appreciated. See:
> 
> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

It seems workable from a 1000' view.  I'll have to try and apply the approach to some algorithms and see if anything comes up.  So far, dealing with bidirectional ranges seems a bit weird, but that's likely more related to the syntax (ie. 'pop') than anything.


Sean

P.S. This decision has interesting implications for D2+, given the functional tendencies already present in the language :-)
September 08, 2008
Andrei Alexandrescu wrote:

>  feedback would be highly appreciated

1) Example in "4. Bidirectional range"

Reversing of ranges can be done in constant runtime, but the example exposes runtime linear in the number of elements.

This might be a hint, that a "6. Reversable Range" might be required, because a range reversable in constant time requires more space.


2) [left,right][Diff,Union]

Ranges are not sets; therefore not only me might have problems to capture the idea behind "difference" and "union" on ranges.

Of course one can define whatever one wants, but I would prefer [sub,snip,cut,split,...][B,E][B,E] (r,s)

I.e. `subBB(r,s)' is the subrange of `r' starting at the beginning of `r' and ending at the beginning of `s' (including the beginning of `r', but not including the beginning of `s').

It my be of some worth to include the `B' or `E' as parameters to the choosen keyword(?) to enable algorithmically accesses:

| sub(B,B,r,s)

instead of `leftDiff( r, s)'

-manfred
-- 
If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)

September 09, 2008
Sergey Gromov wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>> Walter, Bartosz and myself have been hard at work trying to find the right abstraction for iteration. That abstraction would replace the infamous opApply and would allow for external iteration, thus paving the way to implementing real generic algorithms.
> 
> opApply() wasn't my hero either. :)  Your article really looks like something I'd expect to find in D.  It only requires foreach support, and yeah, return by reference.

Indeed. Both are in the works.

>> I put together a short document for the range design. I definitely missed about a million things and have been imprecise about another million, so feedback would be highly appreciated. See:
>>
>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
> 
> - next operation not mentioned in section 3, forward ranges.

Thanks! Fixed.

> - the union operations look... weird.  Unobvious.  I'm too sleepy now to propose anything better but I'll definitely give it a try.  The rest of the interface seems very natural.

I agree I hadn't known what primitives would be needed when I sat down. Clearly there was a need for some since individual iterators are not available anymore. New ideas would be great; I suggest you validate them by implementing some nontrivial algorithms in std.algorithm with your, um, computational basis of choice :o).

> - what's a collection?  How do you get a range out of there?  Collection should be a range itself, like an array.  But it shouldn't be destroyed after passing it to foreach().  How to achieve this if foreach() essentially uses getNext()?

These are great questions, I'm glad you asked. The way I see things, D2 ranges can be of two kinds: owned and unowned. For example D1's ranges are all unowned:

auto a = new int[100];
...

This array is unowned because it going out of scope leaves the underlying array in place. Now consider:

scope a = new int[100];

In this case the array is owned by the current scope. Scoped data is a very crude form of ownership that IMHO brought more trouble than it solved. It's a huge hole smack in the middle of everything, and we plan to revisit it as soon as we can.

A better design would be to define collections that own their contents. For example:

Array!(int) a(100);

This time a does own the underlying array. You'd be able to get ranges all over it:

int[] b = a.all;

So now we have two nice notions: Arrays own the data. Ranges walk over that data. An array can have many ranges crawling over it. But two arrays never overlap. The contents of the array will be destroyed (BUT NOT DELETED) when a goes out of scope.

What's the deal with destroyed but not deleted? Consider:

int[] a;
if (condition) {
   Array!(int) b;
   a = b.all;
}
writeln(a);

This is a misuse of the array in that a range crawling on its back has survived the array itself. What should happen now? Looking at other languages:

1) All Java objects are unowned, meaning the issue does not appear in the first place, which is an advantage. The disadvantage is that scarce resources must be carefully managed by hand in client code.

2) C++ makes the behavior undefined because it destroys data AND recycles memory as soon as the array goes out of scope. Mayhem ensues.

We'd like:

1.5) Allow object ownership but also make the behavior of incorrect code well-defined so it can be reasoned about, reproduced, and debugged.

That's why I think an Array going out of scope should invoke destructors for its data, and then obliterate contents with ElementType.init. That way, an Array!(File) will properly close all files AND put them in a "closed" state. At the same time, the memory associated with the array will NOT be deallocated, so a range surviving the array will never crash unrelated code, but instead will see closed files all over.

In the case of int, there is no destructor so none of that happens. Surviving ranges will continue looking at the contents, which is now unowned.

So there is a difference in the way data with destructors and data without destructors is handled. I don't like that, but this is the most reasonably effective design I came up with so far.

About the "collection should be a range itself" mantra, I've had a micro-epiphany. Since D's slices so nicely model at the same time arrays and their ranges, it is very seductive to think of carrying that to other collection types. But I got disabused of that notion as soon as I wanted to define a less simple data structure. Consider a matrix:

auto a = BlockMatrix!(float, 3)(100, 200, 300);

defines a block contiguous matrix of three dimensions with the respective sizes. Now a should be the matrix AND its range at the same time. But what's "the range" of a matrix? Oops. As soon as you start to think of it, so many darn ranges come to mind.

* flat: all elements in one shot in an arbitrary order

* dimension-wise: iterate over a given dimension

* subspace: iterate over a "slice" of the matrix with fewer dimensions

* diagonal: scan the matrix from one corner to the opposite corner

I guess there are some more. So before long I realized that the most gainful design is this:

a) A matrix owns its stuff and is preoccupied with storage internals, allocation, and the such.

b) The matrix defines as many range types as it wants.

c) Users use the ranges.

For example:

foreach (ref e; a.flat) e *= 1.1;
foreach (row; a.dim(0)) row[0, 0] = 0;
foreach (col; a.dim(1)) col[1, 1] *= 5;

and so on.

Inevitably naysayers will, well, naysay: D defined a built-in array, but it also needs Array, so built-in arrays turned out to be useless. So how is that better than C++ which has pointers and vector? Walter has long feared such naysaying and opposed addition of user-defined array types to Phobos. But now I am fully prepared to un-naysay the naysayers: built-in slices are a superior building block to naked pointers. They are in fact embodying a powerful concept, that of a range. With ranges everything there is can be built efficiently and safely. Finally, garbage collection helps by ensuring object ownership while preserving well-definedness of incorrect code.


Andrei
September 09, 2008
Reply to Andrei,

> BCS wrote:
> 
>> Reply to Andrei,
>> 
>>> Hello,
>>> 
>>> Walter, Bartosz and myself have been hard at work trying to find the
>>> right abstraction for iteration. That abstraction would replace the
>>> infamous opApply and would allow for external iteration, thus paving
>>> the way to implementing real generic algorithms.
>>> 
>> First of all, I /Like/ opApply. I know there are issues with it so
>> I'd rather see it supplemented rather than replaced.
>> 
> We all like the way it looks. Ranges will preserve its syntax within a
> much more efficient and expressive implementation.
> 
> Andrei
> 

I was referring to the implementation as visible from the called code's side


« First   ‹ Prev
1 2 3 4 5 6 7 8 9 10 11
Top | Discussion index | About this forum | D home