September 04, 2017
On Sunday, 3 September 2017 at 16:33:04 UTC, Moritz Maxeiner wrote:
> On Sunday, 3 September 2017 at 14:19:19 UTC, Ilya Yaroshenko wrote:
>> Ranges require 25% more space for canonical matrices than iterators.
>
> We do have to differentiate between a container and the API with which to iterate over its elements.
> D's dynamic arrays (a pointer+length) require more space than using only a raw pointer, of course. Whether that's more than an iterator depends on the type of iterator you use. Most common iterator implementations I know consist of a begin and an end pointer, essentially requiring the same space as D's dynamic arrays.
> In the case you describe, though, we aren't talking about the iteration API, we are talking about what's used internally for storage.

Maybe I should call it cursors or generic pointers instead of iterators.
September 04, 2017
On Monday, 4 September 2017 at 04:29:36 UTC, Ilya wrote:
>
> Maybe I should call it cursors or generic pointers instead of iterators.

dcollections uses a cursor approach that seems different from yours
https://github.com/schveiguy/dcollections/blob/master/concepts.txt

Maybe "generic pointers" makes more sense? Unless I'm mistaken, mir iterators are more like pointers with specific operations defined for how they are incremented, assigned, etc.

I had been a little confused by the name iterators because I don't see any functions in mir that use begin/end. But I don't know if that's more about C++'s implementation of iterators or more core to the concept.


September 04, 2017
On 2017-09-03 12:46:05 +0000, Moritz Maxeiner said:

> I'll put it on my ever-growing list of things to check out in depth, thanks :p

You're welcome. Don't wait too long ;-)

>> There is no difference between code & data in Rebol. And it has a very rich set of datatypes, IIRC about 35 native ones. And many of them are series, which can be operated on in the same way.
> 
> Sounds like LISP :)

Yes, some ideas are the same. But the syntax is way more beautiful and usable.

> I agree, though I was talking about what the abstract data type of a "series" is, i.e. what operations it exposes. From my observation:
> A D input range exposes via empty/front/popFront.
> A classic iterator exposes via begin/end.
> A Rebol series seems to be a safer form of iterator, as it doesn't expose begin/end directly, but exposes restricted operations that are defined as manipulating begin/end.

The series has elements and a "current pointer" which can be manipulated. Begin & end are just that, begin and end of the series. You don't manipulate them directly. Of course you can change a series by adding/removing, cutting etc., which then changes begin/end accordingly.

>> a: [a b c d e f g]
== [a b c d e f g]
>> a/5
== e
>> skip a 5
== [f g]
>> a
== [a b c d e f g]
>> b: a/5
== e
>> type? b
== word!
>> type? a
== block!
>> b: skip a 5
== [f g]
>> type? b
== block!
>> index? b
== 6
>> head b
== [a b c d e f g]

You can even treat such a block as fixed-size record and operate on this. Very handy.
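Loosely speaking, a series behaves like shared underlying data plus a current position. Here is a minimal Python sketch of the skip/index?/head behavior shown in the session above (the class and method names are mine, not Rebol's, and this glosses over Rebol's richer semantics):

```python
class Series:
    """Minimal sketch of a Rebol-style series: shared data plus a position."""

    def __init__(self, data, pos=0):
        self.data = data  # shared underlying block, never copied
        self.pos = pos    # current position (0-based internally)

    def skip(self, n):
        # Returns a new series view advanced by n; the data is shared.
        return Series(self.data, self.pos + n)

    def head(self):
        # Rewind to the start of the underlying block.
        return Series(self.data, 0)

    def index(self):
        # Rebol's index? is 1-based.
        return self.pos + 1

    def rest(self):
        # What the series "looks like" from its current position.
        return self.data[self.pos:]


a = Series(list("abcdefg"))
b = a.skip(5)
print(b.rest())          # ['f', 'g']
print(b.index())         # 6
print(b.head().rest())   # the full block again
```

The point is that begin/end are never handed out; the user only gets restricted operations that move the current position.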

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster

September 04, 2017
On Sunday, 3 September 2017 at 12:46:05 UTC, Moritz Maxeiner wrote:
> I agree, though I was talking about what the abstract data type of a "series" is, i.e. what operations it exposes. From my observation:
> A D input range exposes via empty/front/popFront.
> A classic iterator exposes via begin/end.
> A Rebol series seems to be a safer form of iterator, as it doesn't expose begin/end directly, but exposes restricted operations that are defined as manipulating begin/end.

They are all pretty low-level abstractions. C++ has posited that iterators should be the bridge connecting generic data structures and generic algorithms. But they are pretty awful at that. They make it incredibly easy to destroy one of your structure's invariants, or to use it in a way that ignores some of its invariants (leading to inferior performance and sometimes unnecessarily bulky code). Ironically iterators are frequently used in OO code even though they clearly violate the "Tell, don't ask" principle, as do D ranges (and presumably also Rebol series, though I only skimmed over the documentation).
September 05, 2017
On Saturday, 2 September 2017 at 20:22:44 UTC, Robert M. Münch wrote:
> Iterators are not the silver bullet. But IIRC you can specify if you want to iterate over a graph BF or DF. If you just need to "iterate" over the elements things work pretty well IMO. If you want to select a sub-set and then iterate, things get much more complicated anyway.

There isn't really some finite (or small, anyway) choice here. I may want to iterate over a binary tree in preorder/inorder/postorder DF order. I may want to iterate over it in BF order. I may want to handle the left son before the right one, or vice versa. This is already 8 iterators. In C++ land I also need to provide const and non-const versions of each iterator, which brings it up to 16 iterators (I'm not sure what the situation is in D - can you have const ranges and still be able to call popFront on them?)
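To make the combinatorics concrete, here is a Python sketch of just a few of these traversal variations as generators; each order/direction combination (times const/non-const) would be one more iterator type in C++:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def preorder(n, left_first=True):
    # Depth-first, node before children; direction is a parameter.
    if n is None:
        return
    yield n.value
    first, second = (n.left, n.right) if left_first else (n.right, n.left)
    yield from preorder(first, left_first)
    yield from preorder(second, left_first)


def inorder(n, left_first=True):
    # Depth-first, node between children.
    if n is None:
        return
    first, second = (n.left, n.right) if left_first else (n.right, n.left)
    yield from inorder(first, left_first)
    yield n.value
    yield from inorder(second, left_first)


# tree:    2
#         / \
#        1   3
t = Node(2, Node(1), Node(3))
print(list(preorder(t)))                    # [2, 1, 3]
print(list(inorder(t)))                     # [1, 2, 3]
print(list(inorder(t, left_first=False)))   # [3, 2, 1]
```

Generators sidestep the const/non-const duplication, but the explosion of traversal orders remains.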

From the implementor's side, this is a headache. However, it's not that bad in a language like D, since "static if" saves you a lot of code in cases like this.

The bigger problem is on the user's side: Which iterator should she use? For most applications more than one type of iterator will be adequate, but the "right" choice will often be implementation-dependent. For instance, if the tree in question is a complete binary tree, then it is often implemented as an array that contains the tree's elements in BF order. Therefore, if some operation on the tree can be done by traversing it BF then the user should probably use a BF iterator, for performance reasons. But the user doesn't know how the tree is implemented, so she can't make that choice...
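For illustration, a sketch of the representation described above: when a complete binary tree is stored as an array in BF order, breadth-first traversal degenerates to a linear scan of the backing array, while depth-first preorder has to jump around in it:

```python
# A complete binary tree stored as an array in breadth-first order;
# the children of index i live at 2*i+1 and 2*i+2.
tree = [2, 1, 3, 0, 1, 2, 4]

# Breadth-first traversal is just a linear scan over the backing array,
# the cheapest possible iteration for this representation:
bf = list(tree)


def preorder(i=0):
    # Depth-first preorder over the same array jumps non-sequentially.
    if i >= len(tree):
        return
    yield tree[i]
    yield from preorder(2 * i + 1)
    yield from preorder(2 * i + 2)


print(bf)                # [2, 1, 3, 0, 1, 2, 4]
print(list(preorder()))  # [2, 1, 0, 1, 3, 2, 4]
```

Only the implementor knows that `bf` is the cache-friendly choice here; the abstraction hides exactly the fact the user would need.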

If the implementor informs the user about the implementation in the documentation, then his data structure is a cost-free abstraction (assuming that the user always chooses wisely...) but also a leaky one. If he doesn't, then the abstraction doesn't leak but it is no longer cost-free. He can't have both.

A simpler example is a class interface for a (two-dimensional) matrix. You would expect the interface to provide a row-major order iterator and a column-major order one. Again, from the user's POV, the right choice (performance wise) will be dependent on the implementation. Asymptotic complexity is sometimes considered an essential detail of a "true" abstraction but this doesn't help here since the difference in performance is only observed in practice.
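A numpy sketch of the same point: for a row-major (C-order) matrix, row-wise iteration walks memory contiguously, while column-wise iteration strides across full rows, which is the slower access pattern for this storage layout. The user can't tell which is which from the interface alone:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # row-major storage: [[0, 1, 2], [3, 4, 5]]

# Row-major iteration visits elements in storage order:
rows = [list(row) for row in a]

# Column-major iteration (via the transposed view) strides through memory,
# jumping one full row per element:
cols = [list(col) for col in a.T]

print(rows)  # [[0, 1, 2], [3, 4, 5]]
print(cols)  # [[0, 3], [1, 4], [2, 5]]
```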

This is why I find the presentation in the OP - "Why iterators got it all wrong" - a bit iffy. Iterators did get a lot of things wrong, but the presentation focuses on relatively mundane issues - ugly syntax and somewhat annoying semantics.
September 06, 2017
On Monday, 4 September 2017 at 04:29:36 UTC, Ilya wrote:
> Maybe I should call it cursors or generic pointers instead of iterators.

Well, «C++ iterators» are table pointer abstractions, so you need a pair.

An «iterator» would be a possibly heavy object that is used for traversing a possibly complex and heterogeneous data-structure without exposing any notion of size or end a priori (e.g. before it arrives at the end, if there is an end).


September 06, 2017
On Wednesday, 6 September 2017 at 06:57:25 UTC, Ola Fosheim Grøstad wrote:
>
> Well, «C++ iterators» are table pointer abstractions, so you need a pair.
>
> An «iterator» would be a possibly heavy object that is used for traversing a possibly complex and heterogeneous data-structure without exposing any notion of size or end a priori (e.g. before it arrives at the end, if there is an end).

mir-ndslices are like a generalization of D's slices. Instead of a pointer and a length, they are an iterator plus lengths/strides. So the size is exposed.
September 06, 2017
On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko wrote:
> 1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between sub-tensors can be computed from lengths.
>
> 2. Canonical tensors. Only the data for one dimension is dense; the other dimensions have strides that cannot be computed from lengths. BLAS matrices are canonical tensors: they have two lengths and one stride.
>
> 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors.

Can you elaborate?
September 06, 2017
On Wednesday, 6 September 2017 at 20:24:05 UTC, Enamex wrote:
> On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko wrote:
>> 1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between sub-tensors can be computed from lengths.
>>
>> 2. Canonical tensors. Only the data for one dimension is dense; the other dimensions have strides that cannot be computed from lengths. BLAS matrices are canonical tensors: they have two lengths and one stride.
>>
>> 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors.
>
> Can you elaborate?

IMO, it's something that still needs to get explained better in the documentation. I haven't tried to because I'm not 100% on it.

Below is as best as I have figured things out:

All Slices in mir can have an iterator, lengths, and strides.

The lengths are always N-dimensional and contain information on the shape of the Slice. So for instance, if the lengths are [3, 4], then N=2 and it is a 2-dimensional slice, with 3 rows and 4 columns.

I left out packs...which are an added complication. Packs can be used to make slices of slices. For the above Slice, the default would simply be that the packs are [1], which means that there is no slicing of slices going on. If the packs were instead [1, 1] (the sum of packs must equal N), then that is like saying you now have a slice of slices. In this case, slice[0] would be a row instead of a scalar. This is what allows you to iterate by row or by column.

So when you're thinking about contiguous, canonical, and universal slices, the way that lengths and packs work is the same for all of them. The difference is in the strides. Contiguous slices don't have a strides vector. Canonical slices have a strides vector with a length of N-1. Universal slices have a strides vector of length N.

So how are the strides used and why do they matter? I'm not sure I grok this part 100%, so I'll do my best. Strides tell you how far apart, in elements, consecutive entries along each dimension are in memory. So for instance, if my slice (call it a) has lengths [2, 3, 4] with strides [12, 4, 1], then a[0] is a [3, 4] matrix, which is where the 12 comes from (3 × 4 = 12). To move the pointer to the start of the next [3, 4] matrix (a[1]) requires moving forward 12 elements of whatever the type is. This would be a universal slice because it has N=3 lengths and N=3 strides. So if you call a._strides, it would give you [12, 4, 1]. If a were canonical, calling _strides would give you [12, 4], because _strides for canonical slices has length N-1. Now if a were contiguous instead of universal and you called _strides on it, it would give you [], because contiguous slices have no strides.
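This arithmetic can be checked with numpy, whose ndarrays (as noted earlier in the thread) correspond to mir's universal slices in that they carry a full stride per dimension. One caveat: numpy reports strides in bytes, so they are divided by the item size below to get element strides:

```python
import numpy as np

a = np.arange(24, dtype=np.int64).reshape(2, 3, 4)

# numpy strides are in bytes; divide by itemsize for element strides.
elem_strides = tuple(s // a.itemsize for s in a.strides)
print(a.shape)       # (2, 3, 4)
print(elem_strides)  # (12, 4, 1): one step along axis 0 skips a 3x4 matrix

# a[0] is a [3, 4] matrix with element strides (4, 1):
sub = a[0]
print(tuple(s // sub.itemsize for s in sub.strides))  # (4, 1)
```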

The memory footprint of the slice itself differs accordingly, with a and a[0] of a universal slice being larger than their canonical and contiguous counterparts. This largely reflects the storage of the strides data.

Similarly, a[0] has _strides [4, 1] for universal, [4] for canonical, and [] for contiguous. Mir is written in such a way that a[0] behaves the same regardless of the SliceKind. For the most part, this means that it isn't really obvious that there is a difference between them. It matters in some underlying functions, but I haven't needed to do much other than sometimes convert a contiguous slice to universal (though it's not always clear to me why, I just do it).
September 07, 2017
On Wednesday, 6 September 2017 at 21:44:01 UTC, jmh530 wrote:
> On Wednesday, 6 September 2017 at 20:24:05 UTC, Enamex wrote:
> Similarly, a[0] has _strides [4, 1] for universal, [4] for canonical, and [] for contiguous. Mir is written in such a way that a[0] behaves the same regardless of the SliceKind. For the most part, this means that it isn't really obvious that there is a difference between them. It matters in some underlying functions, but I haven't needed to do much other than sometimes convert a contiguous slice to universal (though it's not always clear to me why, I just do it).

For example, let's take the `transposed` function. It does not transpose the data. Instead, it swaps dimensions.
Assume you have a canonical matrix with _lengths = [3, 4]. So its strides are [4]. Now we want to swap dimensions, but to do that we need to swap both lengths and strides. So first we need to convert the slice to universal, so it will have both strides we want to swap: [4, 1]. The transposed slice will have _lengths = [4, 3] and _strides = [1, 4].
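The same swap can be observed in numpy (whose ndarrays play the role of mir's universal slices); transposing moves no data, it just swaps shape and strides. As before, numpy's byte strides are divided by the item size to get element strides:

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)

def elem_strides(x):
    # numpy strides are in bytes; convert to element strides.
    return tuple(s // x.itemsize for s in x.strides)

print(a.shape, elem_strides(a))   # (3, 4) (4, 1)

t = a.T  # a view: no data is moved, shape and strides are simply swapped
print(t.shape, elem_strides(t))   # (4, 3) (1, 4)
print(t[1, 2] == a[2, 1])         # True: same underlying element
```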

Best Regards,
Ilya