C++ / Why Iterators Got It All Wrong (page 2) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » C++ / Why Iterators Got It All Wrong (page 2)

September 02, 2017

Re: C++ / Why Iterators Got It All Wrong

Posted by Robert M. Münch
in reply to Mark

Robert M. Münch

Posted in reply to Mark

On 2017-09-01 20:34:32 +0000, Mark said:

> On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
>> Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1
>> 
>> I haven't read everything, so not sure if it worth to take a look.
> 
> Iterators have many problems. Andrei's talk some years ago, titled "Iterators Must Go", points out many of them in a clear fashion. I think most of the problems stem from the fact that they are inherently a leaky abstraction. Iterators basically treat all data structures as a singly/doubly linked list or as arrays (if they allow random access). There is no natural or intuitive way of describing, say, a tree or an arbitrary graph as a list/array. Ranges and cursors try to solve some of the issues but they still have this fundamental problem.

Iterators are not the silver bullet. But IIRC you can specify if you want to iterate over a graph BF or DF. If you just need to "iterate" over the elements things work pretty good IMO. If you want to select a sub-set and then iterate things get much complicater anyway.

> It may be unrealisic to expect such a nice abstraction of iteration for other data structures, not to mention a universal one which will work for all interesting data structures.

That's true, hence I didn't ever expected this at all. Some pretty simple & basic building blocks and the rest is done on-demand fitting the use-case.

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster

September 02, 2017

Re: C++ / Why Iterators Got It All Wrong

Posted by Moritz Maxeiner
in reply to Robert M. Münch

Moritz Maxeiner

Posted in reply to Robert M. Münch

On Saturday, 2 September 2017 at 20:17:40 UTC, Robert M. Münch wrote:
> On 2017-08-31 07:13:55 +0000, drug said:
>
>> Interesting. How is it comparable with iterators and ranges ideas?
>
> Well, see: http://www.rebol.com/docs/core23/rebolcore-6.html#section-6

Thanks for your post about Rebol, I didn't know it before.
After reading through the series chaper, though, AFAICT Rebol series *are* iterators (begin+end), just with really nice, functional (read: LISP) syntax?

September 03, 2017

Re: C++ / Why Iterators Got It All Wrong

Posted by Ilya Yaroshenko
in reply to Robert M. Münch

Ilya Yaroshenko

Posted in reply to Robert M. Münch

On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
> Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1
>
> I haven't read everything, so not sure if it worth to take a look.

Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].

User level API is ndslice [4] (D n-dimensional random access range) , can be combined with Phobos. In the same time library uses low level unbounded random access iterator [2] representation to implement 90% of topology logic. This approach allows to maintain few times smaller, simpler and less error-prone code comparing with std.range. The best example it `bitwise` functions implementation in Phobos and Mir Algorithm. Iterators are very good for type composition when one need its own D range that can not be constructed using existing mir.ndslice.topology / std.range API.

[1] https://github.com/libmir/mir-algorithm
[2] http://docs.algorithm.dlang.io/latest/mir_ndslice_topology.html
[3] http://docs.algorithm.dlang.io/latest/mir_ndslice_iterator.html
[4] http://docs.algorithm.dlang.io/latest/mir_ndslice_slice.html#Slice

Best Regards,
Ilya

September 03, 2017

Re: C++ / Why Iterators Got It All Wrong

Posted by Moritz Maxeiner
in reply to Ilya Yaroshenko

Moritz Maxeiner

Posted in reply to Ilya Yaroshenko

On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko wrote:
> On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
>> Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1
>>
>> I haven't read everything, so not sure if it worth to take a look.
>
> Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].

Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?

September 03, 2017

Re: C++ / Why Iterators Got It All Wrong

Posted by Robert M. Münch
in reply to Moritz Maxeiner

Robert M. Münch

Posted in reply to Moritz Maxeiner

On 2017-09-02 21:27:58 +0000, Moritz Maxeiner said:

> Thanks for your post about Rebol, I didn't know it before.

As said, the official Rebol-2 version is a dead-end. Even our main product is still based on it :-) 15 years old technology, but still working and we know all problemes. So very few surprises. And, it's very productive.

There exists a half-done official Rebol-3 version as open-source. It was picked up by some and continued. And than there is a thing called Red, which uses a lot of ideas but compiles. Worth a look too. It's really cool, because it compiles native apps for Android without any SDK for example.

Overall, it's worth to spend some time with Rebol. I'm sure you won't your time back and can learn a lot. Things to look at: VIEW (GUI), PARSE (for parsing) and after this using PARSE to create DSL with Rebol. Very cool feature.


> After reading through the series chaper, though, AFAICT Rebol series *are* iterators (begin+end), just with really nice, functional (read: LISP) syntax?

There is no difference between code & data in Rebol. And it has a very rich set of datatpye, IIRC about 35 native ones. And many of them are series, which can be operated in the same way.

From my experience, traversing datastructures with a functional syntax and concept is really natural to work with. It's mostly what you would tell someone to do using english.

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster

September 03, 2017

Re: C++ / Why Iterators Got It All Wrong

Posted by Ilya Yaroshenko
in reply to Moritz Maxeiner

Ilya Yaroshenko

Posted in reply to Moritz Maxeiner

On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner wrote:
> On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko wrote:
>> On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
>>> Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1
>>>
>>> I haven't read everything, so not sure if it worth to take a look.
>>
>> Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].
>
> Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?

There are three general kinds of dynamic dense tensors. Mir implements all of them:

1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between subs-tensors can be computed from lengths.

2. Canonical tensors. Only data for one dimension is dense, other dimensions has strides that can not be computed from lengths. BLAS matrixes are canonical tensors: they have two lengths and one stride.

3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors.

Finite random access range Issues:

1. Main API issue is that full featured random access range (RAR) API (with slicing, and all opIndex* operators like `[]*=` ) is much larger comparing with unbounded random access iterator (the same API as pointer).

2. Random access range holds its length. Yes, Phobos has infinity ranges (mir hasn't), but the practice is that almost any RAR constructed using Phobos has length and you can not strip when type is constructed. Anyway, infinity RAR is just a pointer/iterator that can move forward but can not move backward. Tensors hold their own lengths for each dimensions, so range's length is just a useless payload.

3. Ranges can not move backward (auto b = a[-4 .. $].). This means one can not implement dynamic `reversed`(flip) [1, 2] operation for dimensions that have strides (universal or canonical tensors). _Dynamic_ means that instead of construct a new type with reversed order for a dimension, we just negate the corresponding stride and move the cursor.

[1] http://docs.algorithm.dlang.io/latest/mir_ndslice_dynamic.html#.reversed
[2]  https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.flip.html

Best regards,
Ilya

September 03, 2017

Re: C++ / Why Iterators Got It All Wrong

Posted by Moritz Maxeiner
in reply to Ilya Yaroshenko

Moritz Maxeiner

Posted in reply to Ilya Yaroshenko

On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko wrote:
> On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner wrote:
>> On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko wrote:
>>> On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
>>>> Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1
>>>>
>>>> I haven't read everything, so not sure if it worth to take a look.
>>>
>>> Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].
>>
>> Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?
>
> There are three general kinds of dynamic dense tensors. Mir implements all of them:
>
> 1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between subs-tensors can be computed from lengths.
>
> 2. Canonical tensors. Only data for one dimension is dense, other dimensions has strides that can not be computed from lengths. BLAS matrixes are canonical tensors: they have two lengths and one stride.
>
> 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors.
>
> Finite random access range Issues:
>
> 1. Main API issue is that full featured random access range (RAR) API (with slicing, and all opIndex* operators like `[]*=` ) is much larger comparing with unbounded random access iterator (the same API as pointer).

Won't you have to implement opIndex* operators either way in order to use the `a[]` syntax? The main difference I can see should be that you'd also have to implement the InputRange (front,popFront,empty), ForwardRange (save), and BidirectionalRange (back,popBack) members, which if you don't need them, is indeed understandably too much?

>
> 2. Random access range holds its length. Yes, Phobos has infinity ranges (mir hasn't), but the practice is that almost any RAR constructed using Phobos has length and you can not strip when type is constructed. Anyway, infinity RAR is just a pointer/iterator that can move forward but can not move backward. Tensors hold their own lengths for each dimensions, so range's length is just a useless payload.

I'm not sure what you mean here. Is this still about accessing the elements of one tensor? If so, what do Phobos' ranges have to do with your tensor type's API?

>
> 3. Ranges can not move backward (auto b = a[-4 .. $].). This means one can not implement dynamic `reversed`(flip) [1, 2] operation for dimensions that have strides (universal or canonical tensors). _Dynamic_ means that instead of construct a new type with reversed order for a dimension, we just negate the corresponding stride and move the cursor.

Right, that is indeed a limitation of the range abstraction (though a type could expose both functionality to move backwards _and_ the range API) and if you need that, it makes sense not to use ranges.

Thanks for the summary of issues :)

September 03, 2017

Re: C++ / Why Iterators Got It All Wrong

Posted by Moritz Maxeiner
in reply to Robert M. Münch

Moritz Maxeiner

Posted in reply to Robert M. Münch

On Sunday, 3 September 2017 at 08:37:36 UTC, Robert M. Münch wrote:
> On 2017-09-02 21:27:58 +0000, Moritz Maxeiner said:
>
>> Thanks for your post about Rebol, I didn't know it before.
>
> As said, the official Rebol-2 version is a dead-end. Even our main product is still based on it :-) 15 years old technology, but still working and we know all problemes. So very few surprises. And, it's very productive.
>
> There exists a half-done official Rebol-3 version as open-source. It was picked up by some and continued. And than there is a thing called Red, which uses a lot of ideas but compiles. Worth a look too. It's really cool, because it compiles native apps for Android without any SDK for example.
>
> Overall, it's worth to spend some time with Rebol. I'm sure you won't your time back and can learn a lot. Things to look at: VIEW (GUI), PARSE (for parsing) and after this using PARSE to create DSL with Rebol. Very cool feature.

I'll put in on my ever growing list of things to check out in depth, thanks :p

>
>
>> After reading through the series chaper, though, AFAICT Rebol series *are* iterators (begin+end), just with really nice, functional (read: LISP) syntax?
>
> There is no difference between code & data in Rebol. And it has a very rich set of datatpye, IIRC about 35 native ones. And many of them are series, which can be operated in the same way.

Sounds like LISP :)

>
> From my experience, traversing datastructures with a functional syntax and concept is really natural to work with. It's mostly what you would tell someone to do using english.

I agree, though I was talking about what the abstract data type of a "series" is, i.e. what operations is exposes. From my observation:
A D input range exposes via empty/front/popFront.
A classic iterator exposes via begin/end.
A Rebol series seems to be a safer form of iterator, as it doesn't expose begin/end directly, but exposes restricted operations that are defined as manipulating begin/end.

September 03, 2017

Re: C++ / Why Iterators Got It All Wrong

Posted by Ilya Yaroshenko
in reply to Moritz Maxeiner

Ilya Yaroshenko

Posted in reply to Moritz Maxeiner

On Sunday, 3 September 2017 at 12:35:16 UTC, Moritz Maxeiner wrote:
> On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko wrote:
>> On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner wrote:
>>> On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko wrote:
>>>> On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
>>>>> Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1
>>>>>
>>>>> I haven't read everything, so not sure if it worth to take a look.
>>>>
>>>> Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].
>>>
>>> Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?
>>
>> There are three general kinds of dynamic dense tensors. Mir implements all of them:
>>
>> 1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between subs-tensors can be computed from lengths.
>>
>> 2. Canonical tensors. Only data for one dimension is dense, other dimensions has strides that can not be computed from lengths. BLAS matrixes are canonical tensors: they have two lengths and one stride.
>>
>> 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors.
>>
>> Finite random access range Issues:
>>
>> 1. Main API issue is that full featured random access range (RAR) API (with slicing, and all opIndex* operators like `[]*=` ) is much larger comparing with unbounded random access iterator (the same API as pointer).
>
> Won't you have to implement opIndex* operators either way in order to use the `a[]` syntax? The main difference I can see should be that you'd also have to implement the InputRange (front,popFront,empty), ForwardRange (save), and BidirectionalRange (back,popBack) members, which if you don't need them, is indeed understandably too much?

front,
popFront,
empty,
save,
back,
popBack,
opIndexOpAssign,
2 x opIndex(.opSlice) for rhs scalars and ranges
2 x opIndexAssign(.opSlice) for rhs scalars and ranges
2 x opIndexOpAssign(.opSlice) for rhs scalars and ranges

Plus, because of range topology slicing may be slower good to add this ones:

popFrontN
popFrontExactly,
popBackN
popBackExactly,

and maybe i forgot something ...

>>
>> 2. Random access range holds its length. Yes, Phobos has infinity ranges (mir hasn't), but the practice is that almost any RAR constructed using Phobos has length and you can not strip when type is constructed. Anyway, infinity RAR is just a pointer/iterator that can move forward but can not move backward. Tensors hold their own lengths for each dimensions, so range's length is just a useless payload.
>
> I'm not sure what you mean here. Is this still about accessing the elements of one tensor? If so, what do Phobos' ranges have to do with your tensor type's API?

ndslice is:
1. Slice structure with a huge amount of features from mir algorithm
2. multidimensional RAR: .front!0, front!1, .front!(N - 1) and all other generalized RAR primitves including multidimensional index [1, 2, 3, ...] and slicing [1 .. 4, 3 .. $].
3. Common one dimensional RAR on top of outer most dimension. So one can use Phobos funcitons for Slice structure. For example, a matrix will be iterated row-by-row.

But this paragraph was about another issue:

struct BLASMatrixTypeBasedOnRanges
{
   double[] _data; // arrays is the simplest RAR
   size_t rows, cols;
   size_t stride;
}

sizeof(YourBLASMatrixTypeBasedOnRanges) == size_t.sizeof * 5;

in other hand:

sizeof(Slice!(Canonical, [2], double*)) == size_t.sizeof * 4;

Ranges requires for 25% more space for canonical matrixes then iterators.

>>
>> 3. Ranges can not move backward (auto b = a[-4 .. $].). This means one can not implement dynamic `reversed`(flip) [1, 2] operation for dimensions that have strides (universal or canonical tensors). _Dynamic_ means that instead of construct a new type with reversed order for a dimension, we just negate the corresponding stride and move the cursor.
>
> Right, that is indeed a limitation of the range abstraction (though a type could expose both functionality to move backwards _and_ the range API) and if you need that, it makes sense not to use ranges.
>
> Thanks for the summary of issues :)

September 03, 2017

Re: C++ / Why Iterators Got It All Wrong

Posted by Moritz Maxeiner
in reply to Ilya Yaroshenko

Moritz Maxeiner

Posted in reply to Ilya Yaroshenko

On Sunday, 3 September 2017 at 14:19:19 UTC, Ilya Yaroshenko wrote:
> On Sunday, 3 September 2017 at 12:35:16 UTC, Moritz Maxeiner wrote:
>> On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko wrote:
>>> On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner wrote:
>>>> On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko wrote:
>>>>> On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:
>>>>>> Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1
>>>>>>
>>>>>> I haven't read everything, so not sure if it worth to take a look.
>>>>>
>>>>> Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].
>>>>
>>>> Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?
>>>
>>> There are three general kinds of dynamic dense tensors. Mir implements all of them:
>>>
>>> 1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between subs-tensors can be computed from lengths.
>>>
>>> 2. Canonical tensors. Only data for one dimension is dense, other dimensions has strides that can not be computed from lengths. BLAS matrixes are canonical tensors: they have two lengths and one stride.
>>>
>>> 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors.
>>>
>>> Finite random access range Issues:
>>>
>>> 1. Main API issue is that full featured random access range (RAR) API (with slicing, and all opIndex* operators like `[]*=` ) is much larger comparing with unbounded random access iterator (the same API as pointer).
>>
>> Won't you have to implement opIndex* operators either way in order to use the `a[]` syntax? The main difference I can see should be that you'd also have to implement the InputRange (front,popFront,empty), ForwardRange (save), and BidirectionalRange (back,popBack) members, which if you don't need them, is indeed understandably too much?
>
> front,
> popFront,
> empty,
> save,
> back,
> popBack,
> opIndexOpAssign,
> 2 x opIndex(.opSlice) for rhs scalars and ranges
> 2 x opIndexAssign(.opSlice) for rhs scalars and ranges
> 2 x opIndexOpAssign(.opSlice) for rhs scalars and ranges
>
> Plus, because of range topology slicing may be slower good to add this ones:
>
> popFrontN
> popFrontExactly,
> popBackN
> popBackExactly,
>
> and maybe i forgot something ...

Didn't know about the *N *Exactly ones, but yeah, it's a lot. Though unless you aren't interesting in the `a[]` syntax, you can't avoid opIndex*.

>
>>>
>>> 2. Random access range holds its length. Yes, Phobos has infinity ranges (mir hasn't), but the practice is that almost any RAR constructed using Phobos has length and you can not strip when type is constructed. Anyway, infinity RAR is just a pointer/iterator that can move forward but can not move backward. Tensors hold their own lengths for each dimensions, so range's length is just a useless payload.
>>
>> I'm not sure what you mean here. Is this still about accessing the elements of one tensor? If so, what do Phobos' ranges have to do with your tensor type's API?
>
> ndslice is:
> 1. Slice structure with a huge amount of features from mir algorithm
> 2. multidimensional RAR: .front!0, front!1, .front!(N - 1) and all other generalized RAR primitves including multidimensional index [1, 2, 3, ...] and slicing [1 .. 4, 3 .. $].
> 3. Common one dimensional RAR on top of outer most dimension. So one can use Phobos funcitons for Slice structure. For example, a matrix will be iterated row-by-row.
>
> But this paragraph was about another issue:
>
> struct BLASMatrixTypeBasedOnRanges
> {
>    double[] _data; // arrays is the simplest RAR
>    size_t rows, cols;
>    size_t stride;
> }

Now I get what you mean, you were talking about not using the dynamic arrays built into the language (which indeed carry their length for safety purposes), not about exposing range API here; you're right, you don't need to use a dynamic array here, as you already have the length encoded another way (it would be wasteful), but strictly speaking D's builtin dynamic arrays are not ranges (as they are neither aggregate types nor do they have the range functions baked into the language). They only become ranges when you import the appropriate free functions from Phobos (or define some yourself).

>
> sizeof(YourBLASMatrixTypeBasedOnRanges) == size_t.sizeof * 5;
>
> in other hand:
>
> sizeof(Slice!(Canonical, [2], double*)) == size_t.sizeof * 4;
>
> Ranges requires for 25% more space for canonical matrixes then iterators.

We do have to differentiate between a container and the API with which to iterate over its elements.
D's dynamic arrays (a pointer+length) require more space then using only a raw pointer, of course. If that's more than an iterator depends on the type of iterator you use. Most common iterator implementations I know consist of a begin and an end pointer, essentially requiring the same space as D's dynamic arrays.
In the case you describe, though, we aren't talking about the iteration API, we are talking about what's used internally for storage.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation