RFC on range design for D2 (page 5) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Announce » RFC on range design for D2 (page 5)

September 09, 2008

Re: RFC on range design for D2

Posted by Sergey Gromov
in reply to Andrei Alexandrescu

Sergey Gromov

Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Sergey Gromov wrote:
> > - what's a collection?  How do you get a range out of there?  Collection should be a range itself, like an array.  But it shouldn't be destroyed after passing it to foreach().  How to achieve this if foreach() essentially uses getNext()?
> 
> These are great questions, I'm glad you asked. The way I see things, D2 ranges can be of two kinds: owned and unowned. For example D1's ranges are all unowned:
> 
> [snip]
> 
> A better design would be to define collections that own their contents. For example:
> 
> Array!(int) a(100);
> 
> This time a does own the underlying array. You'd be able to get ranges all over it:
> 
> int[] b = a.all;

I really don't like to have basic language constructs implemented as templates.  It's like Tuple!() which is sorta basic type but requires template trickery to really work with it.

> So now we have two nice notions: Arrays own the data. Ranges walk over that data. An array can have many ranges crawling over it. But two arrays never overlap. The contents of the array will be destroyed (BUT NOT DELETED) when a goes out of scope.

This invalidates the idea of safe manipulations with data no matter where you've got that data from.

> About the "collection should be a range itself" mantra, I've had a micro-epiphany. Since D's slices so nicely model at the same time arrays and their ranges, it is very seductive to think of carrying that to other collection types. But I got disabused of that notion as soon as I wanted to define a less simple data structure. Consider a matrix:
> 
> auto a = BlockMatrix!(float, 3)(100, 200, 300);
> 
> defines a block contiguous matrix of three dimensions with the respective sizes. Now a should be the matrix AND its range at the same time. But what's "the range" of a matrix? Oops. As soon as you start to think of it, so many darn ranges come to mind.

If you cannot think of a natural default range for your collection--- well, it's your decision not to implement range interface for it.  But if it does have a natural iteration semantics, it should be possible to implement:

auto a = new File("name");
auto b = new TreeSet!(char);

foreach(ch; a)
  b.insert(ch);

foreach(ch; b)
    writeln("unique char ", ch);

Here is the problem.  First foreach() naturally and expectedly changes the state of an object a.  Second foreach() naturally and expectedly does not make changes to object b.

Solution:

File is an Input range in your notion.  It supports isEmpty() and getNext(), it is non-copyable (but, note, referenceable).

TreeSet is a Collection, which you don't discuss.  It implements opSlice () without arguments, which is required and sufficient to define a collection.  opSlice() must return a range that, at least, supports input range operations.

foreach() checks if a passed object implements opSlice() so that it can iterate non-destructively. If no opSlice() is provided, it falls back to getNext().

> a) A matrix owns its stuff and is preoccupied with storage internals, allocation, and the such.
> 
> b) The matrix defines as many range types as it wants.
> 
> c) Users use the ranges.

No problem.  The matrix lives as long as a range refers to it.  As expected.

> Inevitably naysayers will, well, naysay: D defined a built-in array, but it also needs Array, so built-in arrays turned out to be useless. So how is that better than C++ which has pointers and vector? Walter has long feared such naysaying and opposed addition of user-defined array types to Phobos. But now I am fully prepared to un-naysay the naysayers: built-in slices are a superior building block to naked pointers. They are in fact embodying a powerful concept, that of a range. With ranges everything there is can be built efficiently and safely. Finally, garbage collection helps by ensuring object ownership while preserving well-definedness of incorrect code.

Slices need to implement random access range interface, that's all.

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Sergey Gromov

Andrei Alexandrescu

Posted in reply to Sergey Gromov

Sergey Gromov wrote:
> Denis Koroskin <2korden@gmail.com> wrote:
>> 5) I don't quite like names left and right! :) I think they should  represent limits (pointers to begin and end, in case of array) rather that  values. In this case, built-in arrays could be implemented as follows:
>>
>> struct Array(T)
>> {
>>      T* left;
>>      T* right;
>>      size_t length() { return right-left; }
>>      ref T opIndex(size_t index) { return left[index]; }
>>      // etc
>> }
>>
>> The rationale behind having access to range limits is to allow operations  on them. For example,
>> R.left-=n;
>>
>> could be used instead of
>> foreach(i; 0..n) {
>>      R.pop();
>> }
> 
> Now you stepped onto your own landmine.  :)  "R.left-=n" extends the range beyond its beginning with unpredictable consequences.  That's why such operations shouldn't be easily accessible.

Oh I thought it's R.right -= n.

It has become clear to me that a range never increases. It always shrinks. It can increase if fused to another range (I'm thinking of relaxing the fusion operations to allow for overlapping/adjacent ranges, not only ranges that include one another). But without extra info from the container a range can never grow.


Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Sergey Gromov
in reply to Andrei Alexandrescu

Sergey Gromov

Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Sergey Gromov wrote:
> > Denis Koroskin <2korden@gmail.com> wrote:
> >> 5) I don't quite like names left and right! :) I think they should represent limits (pointers to begin and end, in case of array) rather that values. In this case, built-in arrays could be implemented as follows:
> >>
> >> struct Array(T)
> >> {
> >>      T* left;
> >>      T* right;
> >>      size_t length() { return right-left; }
> >>      ref T opIndex(size_t index) { return left[index]; }
> >>      // etc
> >> }
> >>
> >> The rationale behind having access to range limits is to allow operations
> >> on them. For example,
> >> R.left-=n;
> >>
> >> could be used instead of
> >> foreach(i; 0..n) {
> >>      R.pop();
> >> }
> > 
> > Now you stepped onto your own landmine.  :)  "R.left-=n" extends the range beyond its beginning with unpredictable consequences.  That's why such operations shouldn't be easily accessible.
> 
> Oh I thought it's R.right -= n.
> 
> It has become clear to me that a range never increases. It always shrinks. It can increase if fused to another range (I'm thinking of relaxing the fusion operations to allow for overlapping/adjacent ranges, not only ranges that include one another). But without extra info from the container a range can never grow.

It was obviously a typo, but a very dangerous typo indeed.

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Sergey Gromov

Andrei Alexandrescu

Posted in reply to Sergey Gromov

Sergey Gromov wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>> Sergey Gromov wrote:
>>> - what's a collection?  How do you get a range out of there?  Collection should be a range itself, like an array.  But it shouldn't be destroyed after passing it to foreach().  How to achieve this if foreach() essentially uses getNext()?
>> These are great questions, I'm glad you asked. The way I see things, D2 ranges can be of two kinds: owned and unowned. For example D1's ranges are all unowned:
>>
>> [snip]
>>
>> A better design would be to define collections that own their contents. For example:
>>
>> Array!(int) a(100);
>>
>> This time a does own the underlying array. You'd be able to get ranges all over it:
>>
>> int[] b = a.all;
> 
> I really don't like to have basic language constructs implemented as templates.  It's like Tuple!() which is sorta basic type but requires template trickery to really work with it.

Well I guess we disagree on a number of issues here. The problem with
"sorta basic" types is that the list could go on forever. I'd rather use
a language that allows creation of good types from a small core, instead
of one that tries to supplant all sorta basic types it could think of.

>> So now we have two nice notions: Arrays own the data. Ranges walk over that data. An array can have many ranges crawling over it. But two arrays never overlap. The contents of the array will be destroyed (BUT NOT DELETED) when a goes out of scope.
> 
> This invalidates the idea of safe manipulations with data no matter where you've got that data from.

Manipulation remains typesafe. The problem is that sometimes we want to
ensure timely termination of certain resources.

>> About the "collection should be a range itself" mantra, I've had a micro-epiphany. Since D's slices so nicely model at the same time arrays and their ranges, it is very seductive to think of carrying that to other collection types. But I got disabused of that notion as soon as I wanted to define a less simple data structure. Consider a matrix:
>>
>> auto a = BlockMatrix!(float, 3)(100, 200, 300);
>>
>> defines a block contiguous matrix of three dimensions with the respective sizes. Now a should be the matrix AND its range at the same time. But what's "the range" of a matrix? Oops. As soon as you start to think of it, so many darn ranges come to mind.
> 
> If you cannot think of a natural default range for your collection---
> well, it's your decision not to implement range interface for it.  But if it does have a natural iteration semantics, it should be possible to implement:

It's not that I can't think of one. It's that I think of too many.

> auto a = new File("name");
> auto b = new TreeSet!(char);
> 
> foreach(ch; a)
>   b.insert(ch);
> 
> foreach(ch; b)
>     writeln("unique char ", ch);
> 
> Here is the problem.  First foreach() naturally and expectedly changes the state of an object a.  Second foreach() naturally and expectedly does not make changes to object b.
> 
> Solution:
> 
> File is an Input range in your notion.  It supports isEmpty() and getNext(), it is non-copyable (but, note, referenceable).

You left a crucial detail out. What does getNext() return?

In the new std.stdio design, a File is preoccupied with opening,
closing, and transferring data for the underlying file. On top of it
several input ranges can be constructed - that read lines, blocks, parse
text, format text, and so on. (One thing I want is to allow
std.algorithm to work with I/O easily and naturally.)

I fully understand there can be so many design choices in handling all
this stuff, it's not even funny. I can't get excited about an equivalent
solution that to me has no obvious advantages. I can even less get
excited about a solution that I have objections with. That doesn't mean
someone else can't get excited over it, and probably rightly so.

> TreeSet is a Collection, which you don't discuss.  It implements opSlice
> () without arguments, which is required and sufficient to define a collection.  opSlice() must return a range that, at least, supports input range operations.
> 
> foreach() checks if a passed object implements opSlice() so that it can iterate non-destructively. If no opSlice() is provided, it falls back to getNext().
> 
>> a) A matrix owns its stuff and is preoccupied with storage internals, allocation, and the such.
>>
>> b) The matrix defines as many range types as it wants.
>>
>> c) Users use the ranges.
> 
> No problem.  The matrix lives as long as a range refers to it.  As expected.

I, too, wish reference counting is a solution to everything.


Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Alix Pexton

Andrei Alexandrescu

Posted in reply to Alix Pexton

Alix Pexton wrote:
> Andrei Alexandrescu wrote:
>> (why the heck don't they call it structural typing...)
>> Andrei
> 
> If it were the norm to call it "structural typing" and someone asked why it was called
> so, the one being asked would likely have to resort to diagrams and gesticulation in
> order to adequately convey the reasoning.
> 
> With "duck typing" a simple and widespread saying* is all that is required.
> 
> A...
> 
> * "If it walks like a duck and it quacks like a duck, then it's a duck!"

Yeah, I know. It's a good point. Yet I'm somehow weary of cultural references in formalisms.

Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Bill Baxter

Andrei Alexandrescu

Posted in reply to Bill Baxter

Bill Baxter wrote:
> On Tue, Sep 9, 2008 at 6:50 AM, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>> I put together a short document for the range design. I definitely missed
>> about a million things and have been imprecise about another million, so
>> feedback would be highly appreciated. See:
>>
>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
> 
> Small typo:
> 
> "which opens forward ranges to much more many algorithms"

How do I fix it?

Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Robert Jacques

Andrei Alexandrescu

Posted in reply to Robert Jacques

Robert Jacques wrote:
> On Mon, 08 Sep 2008 23:53:17 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> 
>> Robert Jacques wrote:
>>> On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>>>> Denis Koroskin wrote:
>>>>> 3) Walter mentioned that built-in array could be re-implemented using a pair of pointers instead of ptr+length. Will it ever get a green light? It fits range concept much better.
>>>>
>>>> Walter told me to first implement my design, and if it works, he'll do the change. Yes, it does fit ranges much better because the often-used next and, um, pop will only touch one word instead of two.
>>>  I'd warn that changing away from ptr+length would create logical incosistencies between 1D arrays and 2D/3D/ND arrays.
>>
>> How so?
> 
> An ND array is typically defined as a fat pointer like so:
> struct array(T,size_t N) {
>     T* ptr;
>     size_t[N]    lengths;      // of each dimension
>     ptrdiff_t[N] byte_strides; // of each dimension
> }
> So a 1D array is
> {
>     T* ptr;
>     size_t lengths;
>     ptrdiff_t byte_strides = T.sizeof; //Currently a compile time constant in the built-in array
>     size_t length() { return lengths; }
> }
> which is logically consistent with a general dense matrix and aside from some name change and the stride being a compile time constant, is identical to the current D arrays.
> However, { T* first; T* last } may not be logically extended to ND arrays, particularly sliced ND arrays, as T* last no longer has any meaning.

Hmmm, I see. That could become a problem if we wanted lower-dimensional matrices to be prefixes of higher-dimensional matrices. This is a worthy goal, but one that my matrices don't pursue.

>>>>> 4) We need some way of supporting dollar notation in user containers. The hack of using __dollar is bad (although it works).
>>>>
>>>> It doesn't work for multiple dimensions. There should be an opDollar(uint dim) that gives the library information on which argument count it occured in. Consider:
>>>>
>>>> auto x = matrix[$-1, $-1];
>>>>
>>>> Here the dollar's occurrences have different meanings. A good start would be to expand the above into:
>>>>
>>>> auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];
>>>  I'd also add that multiple dimension slicing should be supported. i.e.
>>> auto x = matrix[2..5,0..$,3]
>>> would become
>>> auto x = matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(0)),3) 
>>>
>>> with
>>> struct Slice (T) { T start; T end; }
>>> Strided slices would also be nice. i.e. matrix[0..$:10] // decimate the array
>>
>> Multidimensional slicing can be implemented with staggered indexing:
>>
>> matrix[2..5][0..$][3]
> 
> Yes, but doing so utilizes expression templates and is relatively slow:
> matrix_row_slice temp1 = matrix.opSlice(2,5);
> matrix_col_slice temp2 = temp1.opSlice(0,$);
> matrix                 = temp2.opIndex(3);
> 
> And causes code bloat. Worst matrix[2..5] by itself would be an unstable type. Either foo(matrix[2..5]) would not compile or it would generate code bloat and hard to find logic bugs. (Due to the fact that you've embedded the dimension of the slice operation into the type).

What is an unstable type?

There is no use of expression templates, but indeed multiple slices are created. This isn't as bad as it seems because the desire was to access several elements, so the slice is supposed to be around for long enough to justify its construction cost.

I agree it would be onerous to access a single element with e.g. matrix[1][1][2].

>> means: first, take a slice 2..5 that returns a matrix range one dimension smaller. Then, for that type take a slice from 0 to $. And so on.
>>
>> This works great for row-wise storage. I'm not sure how efficient it would be for other storage schemes.
> 
> No it doesn't. It works great for standard C arrays of arrays, but these are not matrices and have a large number of well documented performance issues when used as such. In general, multi-dimentional data structures relatively common and should be cleanly supported.

[Citation needed]

>> Note how nice the distinction between the container and its views works: there is only one matrix. But there are many ranges and subranges within it, bearing various relationships with one another.
> 
> Yes, Data+View (i.e MVC for data structures) is a good thing(TM). But generally, matrices have been views into data and not the data themselves. (unless needed for memory management, etc)

Well if better terminology comes along I'm all for it. I want to define the "matrix storage" as a owning container, and several "matrix ranges" that access the data stored by the storage.


Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Bill Baxter
in reply to Andrei Alexandrescu

Bill Baxter

Posted in reply to Andrei Alexandrescu

On Tue, Sep 9, 2008 at 7:53 PM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Bill Baxter wrote:
>>
>> On Tue, Sep 9, 2008 at 6:50 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>>>
>>> I put together a short document for the range design. I definitely missed about a million things and have been imprecise about another million, so feedback would be highly appreciated. See:
>>>
>>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>>
>> Small typo:
>>
>> "which opens forward ranges to much more many algorithms"
>
> How do I fix it?

Just make it "to many more algorithms" instead of "to much more many algorithms".

--bb

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Bill Baxter

Andrei Alexandrescu

Posted in reply to Bill Baxter

Bill Baxter wrote:
> On Tue, Sep 9, 2008 at 1:06 PM, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>> Manfred_Nowak wrote:
>>> Andrei Alexandrescu wrote:
>>>
>>>> maybe "nextTo" or something could be more suggestive.
>>> r.tillBeg(s), r.tillEnd(s),
>>> r.fromBeg(s), r.fromEnd(s) ?
>> Sounds good! Walter doesn't like abbreviations, so probably *Begin would
>> please him more.
> 
> But till and until are synonyms.  They both sound like iteration.
> Although it might be unavoidable since all prepositions that give a
> destination seem to imply going to that destination.  till, until,
> toward, to, up to, etc.  So might as well go with the shortest one,
> "to".
> 
> r.toBegin(s), r.toEnd(s)
> r.fromBegin(s), r.fromEnd(s)

These are the names that I find most appealing at the moment. They required the fewest neurons to fire when glancing over them and mapping them to the needed operations.

Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Bill Baxter

Andrei Alexandrescu

Posted in reply to Bill Baxter

Bill Baxter wrote:
> On Tue, Sep 9, 2008 at 7:53 PM, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>> Bill Baxter wrote:
>>> On Tue, Sep 9, 2008 at 6:50 AM, Andrei Alexandrescu
>>> <SeeWebsiteForEmail@erdani.org> wrote:
>>>> I put together a short document for the range design. I definitely missed
>>>> about a million things and have been imprecise about another million, so
>>>> feedback would be highly appreciated. See:
>>>>
>>>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>>> Small typo:
>>>
>>> "which opens forward ranges to much more many algorithms"
>> How do I fix it?
> 
> Just make it "to many more algorithms" instead of "to much more many
> algorithms".

Thanks, fixed.

Andrei

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation