View mode: basic / threaded / horizontal-split · Log in · Help
September 08, 2008
RFC on range design for D2
Hello,


Walter, Bartosz and myself have been hard at work trying to find the 
right abstraction for iteration. That abstraction would replace the 
infamous opApply and would allow for external iteration, thus paving the 
way to implementing real generic algorithms.

We considered an STL-style container/iterator design. Containers would 
use the newfangled value semantics to enforce ownership of their 
contents. Iterators would span containers in various ways.

The main problem with that approach was integrating built-in arrays into 
the design. STL's iterators are generalized pointers; D's built-in 
arrays are, however, not pointers, they are "pairs of pointers" that 
cover contiguous ranges in memory. Most people who've used D gained the 
intuition that slices are superior to pointers in many ways, such as 
easier checking for validity, higher-level compact primitives, 
streamlined and safe interface. However, if STL iterators are 
generalized pointers, what is the corresponding generalization of D's 
slices? Intuitively that generalization should also be superior to 
iterators.

In a related development, the Boost C++ library has defined ranges as 
pairs of two iterators and implemented a series of wrappers that accept 
ranges and forward their iterators to STL functions. The main outcome of 
Boost ranges been to decrease the verboseness and perils of naked 
iterator manipulation (see 
http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a 
C++ application using Boost could avail itself of containers, ranges, 
and iterators. The Boost notion of range is very close to a 
generalization of D's slice.

We have considered that design too, but that raised a nagging question. 
In most slice-based D programming, using bare pointers is not necessary. 
Could then there be a way to use _only_ ranges and eliminate iterators 
altogether? A container/range design would be much simpler than one also 
exposing iterators.

All these questions aside, there are several other imperfections in the 
STL, many caused by the underlying language. For example STL is 
incapable of distinguishing between input/output iterators and forward 
iterators. This is because C++ cannot reasonably implement a type with 
destructive copy semantics, which is what would be needed to make said 
distinction. We wanted the Phobos design to provide appropriate answers 
to such questions, too. This would be useful particularly because it 
would allow implementation of true and efficient I/O integrated with 
iteration. STL has made an attempt at that, but istream_iterator and 
ostream_iterator are, with all due respect, a joke that builds on 
another joke, the iostreams.

After much thought and discussions among Walter, Bartosz and myself, I 
defined a range design and reimplemented all of std.algorithm and much 
of std.stdio in terms of ranges alone. This is quite a thorough test 
because the algorithms are diverse and stress-test the expressiveness 
and efficiency of the range design. Along the way I made the interesting 
realization that certain union/difference operations are needed as 
primitives for ranges. There are also a few bugs in the compiler and 
some needed language enhancements (e.g. returning a reference from a 
function); Walter is committed to implement them.

I put together a short document for the range design. I definitely 
missed about a million things and have been imprecise about another 
million, so feedback would be highly appreciated. See:

http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html


Andrei
September 08, 2008
Re: RFC on range design for D2
On Mon, Sep 8, 2008 at 5:50 PM, Andrei Alexandrescu
<SeeWebsiteForEmail@erdani.org> wrote:
> Hello,
>
>
> Walter, Bartosz and myself have been hard at work trying to find the right
> abstraction for iteration. That abstraction would replace the infamous
> opApply and would allow for external iteration, thus paving the way to
> implementing real generic algorithms.
>
> We considered an STL-style container/iterator design. Containers would use
> the newfangled value semantics to enforce ownership of their contents.
> Iterators would span containers in various ways.
>
> The main problem with that approach was integrating built-in arrays into the
> design. STL's iterators are generalized pointers; D's built-in arrays are,
> however, not pointers, they are "pairs of pointers" that cover contiguous
> ranges in memory. Most people who've used D gained the intuition that slices
> are superior to pointers in many ways, such as easier checking for validity,
> higher-level compact primitives, streamlined and safe interface. However, if
> STL iterators are generalized pointers, what is the corresponding
> generalization of D's slices? Intuitively that generalization should also be
> superior to iterators.
>
> In a related development, the Boost C++ library has defined ranges as pairs
> of two iterators and implemented a series of wrappers that accept ranges and
> forward their iterators to STL functions. The main outcome of Boost ranges
> been to decrease the verboseness and perils of naked iterator manipulation
> (see http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a
> C++ application using Boost could avail itself of containers, ranges, and
> iterators. The Boost notion of range is very close to a generalization of
> D's slice.
>
> We have considered that design too, but that raised a nagging question. In
> most slice-based D programming, using bare pointers is not necessary. Could
> then there be a way to use _only_ ranges and eliminate iterators altogether?
> A container/range design would be much simpler than one also exposing
> iterators.
>
> All these questions aside, there are several other imperfections in the STL,
> many caused by the underlying language. For example STL is incapable of
> distinguishing between input/output iterators and forward iterators. This is
> because C++ cannot reasonably implement a type with destructive copy
> semantics, which is what would be needed to make said distinction. We wanted
> the Phobos design to provide appropriate answers to such questions, too.
> This would be useful particularly because it would allow implementation of
> true and efficient I/O integrated with iteration. STL has made an attempt at
> that, but istream_iterator and ostream_iterator are, with all due respect, a
> joke that builds on another joke, the iostreams.
>
> After much thought and discussions among Walter, Bartosz and myself, I
> defined a range design and reimplemented all of std.algorithm and much of
> std.stdio in terms of ranges alone. This is quite a thorough test because
> the algorithms are diverse and stress-test the expressiveness and efficiency
> of the range design. Along the way I made the interesting realization that
> certain union/difference operations are needed as primitives for ranges.
> There are also a few bugs in the compiler and some needed language
> enhancements (e.g. returning a reference from a function); Walter is
> committed to implement them.
>
> I put together a short document for the range design. I definitely missed
> about a million things and have been imprecise about another million, so
> feedback would be highly appreciated. See:
>
> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>
>
> Andrei
>

I like!
September 08, 2008
Re: RFC on range design for D2
Reply to Andrei,

> Hello,
> 
> Walter, Bartosz and myself have been hard at work trying to find the
> right abstraction for iteration. That abstraction would replace the
> infamous opApply and would allow for external iteration, thus paving
> the way to implementing real generic algorithms.

First of all, I /Like/ opApply. I know there are issues with it so I'd rather 
see it supplemented rather than replaced.
September 08, 2008
Re: RFC on range design for D2
Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Walter, Bartosz and myself have been hard at work trying to find the 
> right abstraction for iteration. That abstraction would replace the 
> infamous opApply and would allow for external iteration, thus paving the 
> way to implementing real generic algorithms.

opApply() wasn't my hero either. :)  Your article really looks like 
something I'd expect to find in D.  It only requires foreach support, 
and yeah, return by reference.

> I put together a short document for the range design. I definitely 
> missed about a million things and have been imprecise about another 
> million, so feedback would be highly appreciated. See:
> 
> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

- next operation not mentioned in section 3, forward ranges.

- the union operations look... weird.  Unobvious.  I'm too sleepy now to 
propose anything better but I'll definitely give it a try.  The rest of 
the interface seems very natural.

- what's a collection?  How do you get a range out of there?  Collection 
should be a range itself, like an array.  But it shouldn't be destroyed 
after passing it to foreach().  How to achieve this if foreach() 
essentially uses getNext()?

I'd really love to have this design in D though.
September 08, 2008
Re: RFC on range design for D2
On Tue, 09 Sep 2008 01:50:54 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:

> Hello,
>
>
> Walter, Bartosz and myself have been hard at work trying to find the  
> right abstraction for iteration. That abstraction would replace the  
> infamous opApply and would allow for external iteration, thus paving the  
> way to implementing real generic algorithms.
>
> We considered an STL-style container/iterator design. Containers would  
> use the newfangled value semantics to enforce ownership of their  
> contents. Iterators would span containers in various ways.
>
> The main problem with that approach was integrating built-in arrays into  
> the design. STL's iterators are generalized pointers; D's built-in  
> arrays are, however, not pointers, they are "pairs of pointers" that  
> cover contiguous ranges in memory. Most people who've used D gained the  
> intuition that slices are superior to pointers in many ways, such as  
> easier checking for validity, higher-level compact primitives,  
> streamlined and safe interface. However, if STL iterators are  
> generalized pointers, what is the corresponding generalization of D's  
> slices? Intuitively that generalization should also be superior to  
> iterators.
>
> In a related development, the Boost C++ library has defined ranges as  
> pairs of two iterators and implemented a series of wrappers that accept  
> ranges and forward their iterators to STL functions. The main outcome of  
> Boost ranges been to decrease the verboseness and perils of naked  
> iterator manipulation (see  
> http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a  
> C++ application using Boost could avail itself of containers, ranges,  
> and iterators. The Boost notion of range is very close to a  
> generalization of D's slice.
>
> We have considered that design too, but that raised a nagging question.  
> In most slice-based D programming, using bare pointers is not necessary.  
> Could then there be a way to use _only_ ranges and eliminate iterators  
> altogether? A container/range design would be much simpler than one also  
> exposing iterators.
>
> All these questions aside, there are several other imperfections in the  
> STL, many caused by the underlying language. For example STL is  
> incapable of distinguishing between input/output iterators and forward  
> iterators. This is because C++ cannot reasonably implement a type with  
> destructive copy semantics, which is what would be needed to make said  
> distinction. We wanted the Phobos design to provide appropriate answers  
> to such questions, too. This would be useful particularly because it  
> would allow implementation of true and efficient I/O integrated with  
> iteration. STL has made an attempt at that, but istream_iterator and  
> ostream_iterator are, with all due respect, a joke that builds on  
> another joke, the iostreams.
>
> After much thought and discussions among Walter, Bartosz and myself, I  
> defined a range design and reimplemented all of std.algorithm and much  
> of std.stdio in terms of ranges alone. This is quite a thorough test  
> because the algorithms are diverse and stress-test the expressiveness  
> and efficiency of the range design. Along the way I made the interesting  
> realization that certain union/difference operations are needed as  
> primitives for ranges. There are also a few bugs in the compiler and  
> some needed language enhancements (e.g. returning a reference from a  
> function); Walter is committed to implement them.
>
> I put together a short document for the range design. I definitely  
> missed about a million things and have been imprecise about another  
> million, so feedback would be highly appreciated. See:
>
> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>
>
> Andrei

1) There is a typo:

// Copies a range to another
void copy(R1, R2)(R1 src, R2 tgt)
{
    while (!src.isEmpty)
    {
        tgt.putNext(r.getNext);  // should be tgt.putNext(src.getNext);
    }
}

2) R.next and R.pop could have better names. I mean, they represent  
similar operations yet names are so different.

3) Walter mentioned that built-in array could be re-implemented using a  
pair of pointers instead of ptr+length. Will it ever get a green light? It  
fits range concept much better.

4) We need some way of supporting dollar notation in user containers. The  
hack of using __dollar is bad (although it works).

5) I don't quite like names left and right! :) I think they should  
represent limits (pointers to begin and end, in case of array) rather that  
values. In this case, built-in arrays could be implemented as follows:

struct Array(T)
{
    T* left;
    T* right;
    size_t length() { return right-left; }
    ref T opIndex(size_t index) { return left[index]; }
    // etc
}

The rationale behind having access to range limits is to allow operations  
on them. For example,
R.left-=n;

could be used instead of
foreach(i; 0..n) {
    R.pop();
}

which is more efficient in many cases.

Other that that - great, I like it.
September 08, 2008
Re: RFC on range design for D2
BCS wrote:
> Reply to Andrei,
> 
>> Hello,
>>
>> Walter, Bartosz and myself have been hard at work trying to find the
>> right abstraction for iteration. That abstraction would replace the
>> infamous opApply and would allow for external iteration, thus paving
>> the way to implementing real generic algorithms.
> 
> First of all, I /Like/ opApply. I know there are issues with it so I'd 
> rather see it supplemented rather than replaced.

We all like the way it looks. Ranges will preserve its syntax within a 
much more efficient and expressive implementation.

Andrei
September 08, 2008
Re: RFC on range design for D2
Andrei Alexandrescu wrote:
...
> 
> After much thought and discussions among Walter, Bartosz and myself, I 
> defined a range design and reimplemented all of std.algorithm and much 
> of std.stdio in terms of ranges alone.

Yup.  This is why I implemented all of Tango's algorithms specifically 
for arrays from the start--slices represent a reasonable approximation 
of ranges, and this seems far preferable to the iterator approach of 
C++.  Glad to hear that this is what you've decided as well.

> This is quite a thorough test 
> because the algorithms are diverse and stress-test the expressiveness 
> and efficiency of the range design. Along the way I made the interesting 
> realization that certain union/difference operations are needed as 
> primitives for ranges. There are also a few bugs in the compiler and 
> some needed language enhancements (e.g. returning a reference from a 
> function); Walter is committed to implement them.

Very nice.  The inability to return a reference has been a thorn in my 
side for ages.

> I put together a short document for the range design. I definitely 
> missed about a million things and have been imprecise about another 
> million, so feedback would be highly appreciated. See:
> 
> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

It seems workable from a 1000' view.  I'll have to try and apply the 
approach to some algorithms and see if anything comes up.  So far, 
dealing with bidirectional ranges seems a bit weird, but that's likely 
more related to the syntax (ie. 'pop') than anything.


Sean

P.S. This decision has interesting implications for D2+, given the 
functional tendencies already present in the language :-)
September 08, 2008
Re: RFC on range design for D2
Andrei Alexandrescu wrote:

>  feedback would be highly appreciated

1) Example in "4. Bidirectional range"

Reversing of ranges can be done in constant runtime, but the example 
exposes runtime linear in the number of elements.

This might be a hint, that a "6. Reversable Range" might be required, 
because a range reversable in constant time requires more space.


2) [left,right][Diff,Union]

Ranges are not sets; therefore not only me might have problems to 
capture the idea behind "difference" and "union" on ranges.

Of course one can define whatever one wants, but I would prefer
[sub,snip,cut,split,...][B,E][B,E] (r,s)

I.e. `subBB(r,s)' is the subrange of `r' starting at the beginning of 
`r' and ending at the beginning of `s' (including the beginning of `r', 
but not including the beginning of `s').

It my be of some worth to include the `B' or `E' as parameters to the  
choosen keyword(?) to enable algorithmically accesses:

| sub(B,B,r,s)

instead of `leftDiff( r, s)'

-manfred
-- 
If life is going to exist in this Universe, then the one thing it 
cannot afford to have is a sense of proportion. (Douglas Adams)
September 09, 2008
Re: RFC on range design for D2
Sergey Gromov wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>> Walter, Bartosz and myself have been hard at work trying to find the 
>> right abstraction for iteration. That abstraction would replace the 
>> infamous opApply and would allow for external iteration, thus paving the 
>> way to implementing real generic algorithms.
> 
> opApply() wasn't my hero either. :)  Your article really looks like 
> something I'd expect to find in D.  It only requires foreach support, 
> and yeah, return by reference.

Indeed. Both are in the works.

>> I put together a short document for the range design. I definitely 
>> missed about a million things and have been imprecise about another 
>> million, so feedback would be highly appreciated. See:
>>
>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
> 
> - next operation not mentioned in section 3, forward ranges.

Thanks! Fixed.

> - the union operations look... weird.  Unobvious.  I'm too sleepy now to 
> propose anything better but I'll definitely give it a try.  The rest of 
> the interface seems very natural.

I agree I hadn't known what primitives would be needed when I sat down. 
Clearly there was a need for some since individual iterators are not 
available anymore. New ideas would be great; I suggest you validate them 
by implementing some nontrivial algorithms in std.algorithm with your, 
um, computational basis of choice :o).

> - what's a collection?  How do you get a range out of there?  Collection 
> should be a range itself, like an array.  But it shouldn't be destroyed 
> after passing it to foreach().  How to achieve this if foreach() 
> essentially uses getNext()?

These are great questions, I'm glad you asked. The way I see things, D2 
ranges can be of two kinds: owned and unowned. For example D1's ranges 
are all unowned:

auto a = new int[100];
...

This array is unowned because it going out of scope leaves the 
underlying array in place. Now consider:

scope a = new int[100];

In this case the array is owned by the current scope. Scoped data is a 
very crude form of ownership that IMHO brought more trouble than it 
solved. It's a huge hole smack in the middle of everything, and we plan 
to revisit it as soon as we can.

A better design would be to define collections that own their contents. 
For example:

Array!(int) a(100);

This time a does own the underlying array. You'd be able to get ranges 
all over it:

int[] b = a.all;

So now we have two nice notions: Arrays own the data. Ranges walk over 
that data. An array can have many ranges crawling over it. But two 
arrays never overlap. The contents of the array will be destroyed (BUT 
NOT DELETED) when a goes out of scope.

What's the deal with destroyed but not deleted? Consider:

int[] a;
if (condition) {
   Array!(int) b;
   a = b.all;
}
writeln(a);

This is a misuse of the array in that a range crawling on its back has 
survived the array itself. What should happen now? Looking at other 
languages:

1) All Java objects are unowned, meaning the issue does not appear in 
the first place, which is an advantage. The disadvantage is that scarce 
resources must be carefully managed by hand in client code.

2) C++ makes the behavior undefined because it destroys data AND 
recycles memory as soon as the array goes out of scope. Mayhem ensues.

We'd like:

1.5) Allow object ownership but also make the behavior of incorrect code 
well-defined so it can be reasoned about, reproduced, and debugged.

That's why I think an Array going out of scope should invoke destructors 
for its data, and then obliterate contents with ElementType.init. That 
way, an Array!(File) will properly close all files AND put them in a 
"closed" state. At the same time, the memory associated with the array 
will NOT be deallocated, so a range surviving the array will never crash 
unrelated code, but instead will see closed files all over.

In the case of int, there is no destructor so none of that happens. 
Surviving ranges will continue looking at the contents, which is now 
unowned.

So there is a difference in the way data with destructors and data 
without destructors is handled. I don't like that, but this is the most 
reasonably effective design I came up with so far.

About the "collection should be a range itself" mantra, I've had a 
micro-epiphany. Since D's slices so nicely model at the same time arrays 
and their ranges, it is very seductive to think of carrying that to 
other collection types. But I got disabused of that notion as soon as I 
wanted to define a less simple data structure. Consider a matrix:

auto a = BlockMatrix!(float, 3)(100, 200, 300);

defines a block contiguous matrix of three dimensions with the 
respective sizes. Now a should be the matrix AND its range at the same 
time. But what's "the range" of a matrix? Oops. As soon as you start to 
think of it, so many darn ranges come to mind.

* flat: all elements in one shot in an arbitrary order

* dimension-wise: iterate over a given dimension

* subspace: iterate over a "slice" of the matrix with fewer dimensions

* diagonal: scan the matrix from one corner to the opposite corner

I guess there are some more. So before long I realized that the most 
gainful design is this:

a) A matrix owns its stuff and is preoccupied with storage internals, 
allocation, and the such.

b) The matrix defines as many range types as it wants.

c) Users use the ranges.

For example:

foreach (ref e; a.flat) e *= 1.1;
foreach (row; a.dim(0)) row[0, 0] = 0;
foreach (col; a.dim(1)) col[1, 1] *= 5;

and so on.

Inevitably naysayers will, well, naysay: D defined a built-in array, but 
it also needs Array, so built-in arrays turned out to be useless. So how 
is that better than C++ which has pointers and vector? Walter has long 
feared such naysaying and opposed addition of user-defined array types 
to Phobos. But now I am fully prepared to un-naysay the naysayers: 
built-in slices are a superior building block to naked pointers. They 
are in fact embodying a powerful concept, that of a range. With ranges 
everything there is can be built efficiently and safely. Finally, 
garbage collection helps by ensuring object ownership while preserving 
well-definedness of incorrect code.


Andrei
September 09, 2008
Re: RFC on range design for D2
Reply to Andrei,

> BCS wrote:
> 
>> Reply to Andrei,
>> 
>>> Hello,
>>> 
>>> Walter, Bartosz and myself have been hard at work trying to find the
>>> right abstraction for iteration. That abstraction would replace the
>>> infamous opApply and would allow for external iteration, thus paving
>>> the way to implementing real generic algorithms.
>>> 
>> First of all, I /Like/ opApply. I know there are issues with it so
>> I'd rather see it supplemented rather than replaced.
>> 
> We all like the way it looks. Ranges will preserve its syntax within a
> much more efficient and expressive implementation.
> 
> Andrei
> 

I was referring to the implementation as visible from the called code's side
« First   ‹ Prev
1 2 3 4 5
Top | Discussion index | About this forum | D home