March 10, 2017
On Fri, Mar 10, 2017 at 07:41:31AM -0800, Jonathan M Davis via Digitalmars-d wrote:
> On Friday, March 10, 2017 14:15:45 Nick Treleaven via Digitalmars-d wrote:
> > On Friday, 10 March 2017 at 01:10:21 UTC, H. S. Teoh wrote:
[...]
> > > Using opSlice() for slicing (i.e., arr[]) is old,
> > > backward-compatible behaviour.
> >
> > This seems non-intuitive to me (at least for single dimension containers) - when you see var[], do you think var is being indexed or do you think var is being sliced like an array (equivalent to var[0..$])?
> 
> Yeah, I've never understood how it made any sense for opIndex to be used for slicing, and I've never used it that way.

It's very simple, really.  Under the old behaviour, you have:

	arr[]		--->	arr.opSlice()
	arr[x]		--->	arr.opIndex(x)
	arr[x..y]	--->	arr.opSlice(x,y)
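
As a minimal sketch of that old scheme (the `IntList` type and its field are made up for illustration, not taken from any library), a 1-dimensional container only needs these three methods:

```d
// Hypothetical 1-D container written against the original operator scheme.
struct IntList
{
    int[] items;

    // arr[]      --->  arr.opSlice()
    int[] opSlice() { return items; }

    // arr[x]     --->  arr.opIndex(x)
    int opIndex(size_t i) { return items[i]; }

    // arr[x..y]  --->  arr.opSlice(x, y)
    int[] opSlice(size_t lo, size_t hi) { return items[lo .. hi]; }
}

void main()
{
    auto s = IntList([10, 20, 30, 40]);
    assert(s[] == [10, 20, 30, 40]);
    assert(s[1] == 20);
    assert(s[1 .. 3] == [20, 30]);
}
```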

This made higher-dimensional slicing operators hard to define, especially if you want mixed slicing and indexing (aka subdimensional slicing):

	arr[x, y]	--->	arr.opIndex(x, y)
	arr[x, y..z]	--->	?
	arr[x..y, z]	--->	?
	arr[w..x, y..z]	--->	arr.opSlice(w, x, y, z)  // ?

Kenji's insight was that we can solve this problem by homogenizing opSlice and opIndex, such that [] *always* translates to opIndex, and .. always translates to opSlice.

So, under the new behaviour:

	arr[]		--->	arr.opIndex()
	arr[x]		--->	arr.opIndex(x)
	arr[x,y]	--->	arr.opIndex(x,y)

	arr[x..y]	--->	arr.opIndex(arr.opSlice(x,y))
	arr[x, y..z]	--->	arr.opIndex(x, arr.opSlice(y,z))
	arr[x..y, z]	--->	arr.opIndex(arr.opSlice(x,y), z)

This allows mixed-indexing / subdimensional slicing to consistently use opIndex, with opSlice returning objects representing index ranges, so that in a multidimensional user type, you could unify all the cases under a single definition of opIndex:

	IndexRange opSlice(int x, int y) { ... }

	auto opIndex(I...)(I indices)
	{
		foreach (idx; indices) {
			static if (is(typeof(idx) == IndexRange))
			{
				// this index is a slice
			}
			else
			{
				// this index is a single index
			}
		}
	}

Without this unification, you'd have to implement 2^n different overloads of opIndex / opSlice in order to handle all cases of subdimensional slicing in n dimensions.
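
To make the single-opIndex idea concrete, here is a rough, self-contained sketch for the 2-dimensional case. The `Matrix` and `IndexRange` types, the row-major layout, and the choice to return a flat `int[]` are all assumptions made for illustration. Note also that the lowering in the language spec passes the dimension as a template argument (`arr.opSlice!0(x,y)`, `arr.opSlice!1(y,z)`), which the summary above leaves out, so opSlice is declared as a template here:

```d
// Hypothetical marker type: the result of `lo..hi` in one dimension.
struct IndexRange { size_t lo, hi; }

// Hypothetical 2-D matrix stored row-major in a flat buffer.
struct Matrix
{
    int[] data;
    size_t rows, cols;

    // `lo..hi` in dimension `dim` becomes an IndexRange.
    IndexRange opSlice(size_t dim)(size_t lo, size_t hi)
    {
        return IndexRange(lo, hi);
    }

    // Every m[...] expression with two arguments lands here.
    int[] opIndex(I...)(I indices) if (I.length == 2)
    {
        IndexRange[2] r;
        foreach (i, idx; indices)
        {
            static if (is(typeof(idx) == IndexRange))
                r[i] = idx;                       // this index is a slice
            else
                r[i] = IndexRange(idx, idx + 1);  // this index is a single index
        }

        // Collect the selected elements in row-major order.
        int[] result;
        foreach (row; r[0].lo .. r[0].hi)
            foreach (col; r[1].lo .. r[1].hi)
                result ~= data[row * cols + col];
        return result;
    }
}

void main()
{
    auto m = Matrix([1, 2, 3,
                     4, 5, 6], 2, 3);
    assert(m[1, 2] == [6]);                    // opIndex(1, 2)
    assert(m[0, 1 .. 3] == [2, 3]);            // opIndex(0, opSlice!1(1, 3))
    assert(m[0 .. 2, 1] == [2, 5]);            // opIndex(opSlice!0(0, 2), 1)
    assert(m[0 .. 2, 0 .. 2] == [1, 2, 4, 5]); // both arguments are slices
}
```

A real implementation would more likely return a view type than a copied array; the flat array just keeps the sketch short.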

So you can think of it simply as:

	[]	==	opIndex
	..	==	opSlice

in all cases.

It is more uniform this way, and makes perfect sense to me.


> I generally forget that that change was even made precisely because it makes no sense to me, whereas using opSlice for slicing makes perfect sense. I always use opIndex for indexing and opSlice for slicing just like they were originally designed.
[...]

This is probably why Kenji didn't deprecate the original use of opSlice, since for the 1-dimensional case the homogenization of opSlice / opIndex is probably unnecessary and adds extra work for the programmer: if you want to implement arr[x..y] you have to write both opSlice and an opIndex overload that accepts what opSlice returns, as opposed to just writing a single opSlice.
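
For comparison, this is roughly what the extra work looks like in the 1-dimensional case under the new scheme. It is only a sketch: `IntList` is a made-up type, and using `size_t[2]` as the thing opSlice returns is one possible choice among many.

```d
// Hypothetical 1-D container written against the new operator scheme.
struct IntList
{
    int[] items;

    // Step 1: `lo..hi` is turned into a pair of endpoints.
    size_t[2] opSlice(size_t dim)(size_t lo, size_t hi)
    {
        return [lo, hi];
    }

    // Step 2: opIndex needs an overload accepting what opSlice returns.
    int[] opIndex(size_t[2] r) { return items[r[0] .. r[1]]; }

    // arr[x] and arr[] also go through opIndex.
    int opIndex(size_t i) { return items[i]; }
    int[] opIndex() { return items; }
}

void main()
{
    auto s = IntList([10, 20, 30, 40]);
    assert(s[1 .. 3] == [20, 30]);   // s.opIndex(s.opSlice!0(1, 3))
    assert(s[1] == 20);
    assert(s[] == [10, 20, 30, 40]);
}
```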

So probably we should leave it the way it is (and perhaps clarify that in the spec), as deprecating the "old" use of opSlice in the 1-dimensional case would cause problems.


T

-- 
Chance favours the prepared mind. -- Louis Pasteur
March 10, 2017
On Friday, 10 March 2017 at 18:43:43 UTC, H. S. Teoh wrote:
>
> So probably we should leave it the way it is (and perhaps clarify that in the spec), as deprecating the "old" use of opSlice in the 1-dimensional case would cause problems.
>

ndslice just recently added an indexed function
http://docs.algorithm.dlang.io/latest/mir_ndslice_topology.html#.indexed
that is like slicing based on some index. Other languages have something similar.

However, it's not something that's built-in in D. Thus, given the indexed example:

auto source = [1, 2, 3, 4, 5];
auto indexes = [4, 3, 1, 2, 0, 4].sliced;
auto ind = source.indexed(indexes);

there's no way to instead write

auto source = [1, 2, 3, 4, 5];
auto indexes = [4, 3, 1, 2, 0, 4].sliced;
auto ind = source[indexes];

So to me, there does seem scope for some changes, even if they aren't the changes you mentioned in your post.
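
For what it's worth, a user-defined type can already accept an array of indexes in opIndex, so a wrapper gets close to that syntax today. The sketch below uses a made-up `Gather` struct over plain D arrays; it is not the ndslice API and does nothing for built-in arrays themselves:

```d
// Hypothetical wrapper: indexing with an array of indexes gathers elements.
struct Gather
{
    int[] source;

    // source[i]: ordinary single-element indexing.
    int opIndex(size_t i) { return source[i]; }

    // source[indexes]: pick elements in the order the indexes are given.
    int[] opIndex(const size_t[] indexes)
    {
        int[] result;
        foreach (i; indexes)
            result ~= source[i];
        return result;
    }
}

void main()
{
    auto source = Gather([1, 2, 3, 4, 5]);
    size_t[] indexes = [4, 3, 1, 2, 0, 4];
    assert(source[indexes] == [5, 4, 2, 3, 1, 5]);
    assert(source[2] == 3);
}
```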

March 10, 2017
On Friday, March 10, 2017 10:43:43 H. S. Teoh via Digitalmars-d wrote:
> On Fri, Mar 10, 2017 at 07:41:31AM -0800, Jonathan M Davis via Digitalmars-d wrote:
> > On Friday, March 10, 2017 14:15:45 Nick Treleaven via Digitalmars-d wrote:
> > > On Friday, 10 March 2017 at 01:10:21 UTC, H. S. Teoh wrote:
> [...]
>
> > > > Using opSlice() for slicing (i.e., arr[]) is old,
> > > > backward-compatible behaviour.
> > >
> > > This seems non-intuitive to me (at least for single dimension containers) - when you see var[], do you think var is being indexed or do you think var is being sliced like an array (equivalent to var[0..$])?
> >
> > Yeah, I've never understood how it made any sense for opIndex to be used for slicing, and I've never used it that way.
>
> It's very simple, really.  Under the old behaviour, you have:
>
>   arr[]       --->    arr.opSlice()
>   arr[x]      --->    arr.opIndex(x)
>   arr[x..y]   --->    arr.opSlice(x,y)
>
> This made higher-dimensional slicing operators hard to define, especially if you want mixed slicing and indexing (aka subdimensional slicing):
>
>   arr[x, y]   --->    arr.opIndex(x, y)
>   arr[x, y..z]    --->    ?
>   arr[x..y, z]    --->    ?
>   arr[w..x, y..z] --->    arr.opSlice(w, x, y, z)  // ?
>
> Kenji's insight was that we can solve this problem by homogenizing opSlice and opIndex, such that [] *always* translates to opIndex, and .. always translates to opSlice.
>
> So, under the new behaviour:
>
>   arr[]       --->    arr.opIndex()
>   arr[x]      --->    arr.opIndex(x)
>   arr[x,y]    --->    arr.opIndex(x,y)
>
>   arr[x..y]   --->    arr.opIndex(arr.opSlice(x,y))
>   arr[x, y..z]    --->    arr.opIndex(x, arr.opSlice(y,z))
>   arr[x..y, z]    --->    arr.opIndex(arr.opSlice(x,y), z)
>
> This allows mixed-indexing / subdimensional slicing to consistently use opIndex, with opSlice returning objects representing index ranges, so that in a multidimensional user type, you could unify all the cases under a single definition of opIndex:
>
>   IndexRange opSlice(int x, int y) { ... }
>
>   auto opIndex(I...)(I indices)
>   {
>       foreach (idx; indices) {
>           static if (is(typeof(idx) == IndexRange))
>           {
>               // this index is a slice
>           }
>           else
>           {
>               // this index is a single index
>           }
>       }
>   }
>
> Without this unification, you'd have to implement 2^n different overloads of opIndex / opSlice in order to handle all cases of subdimensional slicing in n dimensions.
>
> So you can think of it simply as:
>
>   []  ==  opIndex
>   ..  ==  opSlice
>
> in all cases.
>
> It is more uniform this way, and makes perfect sense to me.

Well, thanks for the explanation, but I'm sure that part of the problem here is that an operation like arr[x, y..z] doesn't even make sense to me. I have no idea what that does. But I don't normally do anything with multidimensional arrays, and in the rare case that I do, I certainly don't need to overload anything for them. I just slap together a multidimensional array of whatever type it is I want in a multidimensional array. I can certainly understand that there are folks who really do care about this stuff, but it's completely outside of what I deal with, and for anything I've ever dealt with, making opIndex be for _slicing_ makes no sense whatsoever, and the added functionality to the language with regards to multi-dimensional arrays is useless. So, this whole mess has always felt like I've had something nonsensical thrown at me because of a use case that I don't even properly understand.

> > I generally forget that that change was even made precisely because it makes no sense to me, whereas using opSlice for slicing makes perfect sense. I always use opIndex for indexing and opSlice for slicing just like they were originally designed.
>
> [...]
>
> This is probably why Kenji didn't deprecate the original use of opSlice, since for the 1-dimensional case the homogenization of opSlice / opIndex is probably unnecessary and adds extra work for the programmer: if you want to implement arr[x..y] you have to write both opSlice and an opIndex overload that accepts what opSlice returns, as opposed to just writing a single opSlice.
>
> So probably we should leave it the way it is (and perhaps clarify that in the spec), as deprecating the "old" use of opSlice in the 1-dimensional case would cause problems.

Well, I'd prefer that the original way be left, since that's all I've ever needed. If the new way makes life easier for the scientific programmers and whatnot, then great, but from the standpoint of anyone not trying to provide multi-dimensional overloads, using opIndex for slicing is just plain bizarre.

That being said, I'm fine with the compiler detecting if opIndex and opSlice are declared in a way that they conflict and then giving an error. I just don't want to be forced to use opIndex for slicing.

- Jonathan M Davis

March 10, 2017
On Friday, 10 March 2017 at 20:36:35 UTC, Jonathan M Davis wrote:
>
> problem here is that an operation like arr[x, y..z] doesn't even make sense to me. I have no idea what that does.

https://www.mathworks.com/help/matlab/math/matrix-indexing.html#f1-85544

You can stop reading as soon as it starts talking about "linear indexing". However if you're also curious what that means: https://www.mathworks.com/help/matlab/math/matrix-indexing.html#f1-85511
March 10, 2017
On Fri, Mar 10, 2017 at 12:36:35PM -0800, Jonathan M Davis via Digitalmars-d wrote:
> On Friday, March 10, 2017 10:43:43 H. S. Teoh via Digitalmars-d wrote:
[...]
> Well, thanks for the explanation, but I'm sure that part of the problem here is that an operation like arr[x, y..z] doesn't even make sense to me. I have no idea what that does.

That's a subdimensional slice. In this case, we're dealing with a 2D array -- you can think of it as a matrix -- and extracting the y'th to z'th elements from column x.  Conversely, arr[x..y, z] extracts the x'th to y'th elements from row z.  This kind of subdimensional slicing is pretty common when you work with things like tensors.


> But I don't normally do anything with multidimensional arrays, and in the rare case that I do, I certainly don't need to overload anything for them. I just slap together a multidimensional array of whatever type it is I want in a multidimensional array.

If by "multidimensional arrays" you mean arrays of arrays, then I can understand your sentiment.

But when dealing with high-dimensional tensors, storing them explicitly may not always be the best approach. Think sparse matrices, for example. You want to be able to provide array indexing / slicing operations to user types apart from the built-in arrays.

Not to mention that there are many problems with using arrays of arrays as "multidimensional" arrays, besides storage issues. One being that you can't easily represent a slice of an array of arrays across the minor dimension (i.e., a slice of every i'th element of each array in an int[][]).  For things like that, you *really* want to be able to write arr[x, y..z] and arr[x..y, z] rather than arr[x][y..z] and arr[x..y][z]. Doing it the latter way means you need to implement arr.opSlice that returns a proxy type that implements opIndex.  Kenji's design allows you to implement all of these cases (and more) by just implementing a single type with a single opSlice and single opIndex, and no proxy types, to boot. It's clean and elegant.
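
A small illustration of the minor-dimension problem with arrays of arrays (the data is arbitrary): chaining a slice and an index on an int[][] selects a row, not a column, so the column has to be gathered by hand.

```d
void main()
{
    int[][] a = [[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]];

    // a[0 .. 2] is the first two rows; [1] then picks the second of those
    // rows, not element 1 of each row.
    assert(a[0 .. 2][1] == [4, 5, 6]);

    // Element 1 of rows 0..2 has to be collected manually.
    int[] column;
    foreach (row; a[0 .. 2])
        column ~= row[1];
    assert(column == [2, 5]);
}
```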


> I can certainly understand that there are folks who really do care about this stuff, but it's completely outside of what I deal with, and for anything I've ever dealt with, making opIndex be for _slicing_ makes no sense whatsoever, and the added functionality to the language with regards to multi-dimensional arrays is useless. So, this whole mess has always felt like I've had something nonsensical thrown at me because of a use case that I don't even properly understand.

Please don't denigrate something as useless without at least trying to understand it first.


[...]
> Well, I'd prefer that the original way be left, since that's all I've ever needed. If the new way makes life easier for the scientific programmers and whatnot, then great, but from the standpoint of anyone not trying to provide multi-dimensional overloads, using opIndex for slicing is just plain bizarre.

Actually, it's the distinction between opSlice and opIndex in the old scheme that's bizarre. It's like saying that to implement userType(x) you need to declare userType.opSingleArgCall and to implement userType(x,y) you need to declare userType.opTwoArgCall, just because there happen to be 2 arguments instead of 1.  Why not just unify the two under a single opCall, with two overloads depending on what arguments you want to pass to it?
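
This is how opCall already works, for what it's worth. A tiny sketch, with a made-up `Scaler` type, where one operator name covers both argument counts:

```d
// Hypothetical callable type: one opCall name, two overloads.
struct Scaler
{
    int factor;

    // s(x)
    int opCall(int x) { return factor * x; }

    // s(x, y)
    int opCall(int x, int y) { return factor * (x + y); }
}

void main()
{
    // Declaring opCall disables the Scaler(3) struct-literal syntax,
    // so the field is set explicitly here.
    Scaler s;
    s.factor = 3;

    assert(s(2) == 6);      // s.opCall(2)
    assert(s(2, 5) == 21);  // s.opCall(2, 5)
}
```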

In the same vein, requiring two different methods to implement arr[x] vs. arr[x..y] is bizarre.  They should be unified under a single method -- I don't care what you call it, maybe opIndex is a bad name because it gives the wrong connotation for what it does, maybe it should be named opSquareBrackets or something. But the point is that this distinction between how arr[x] and arr[x..y] are handled is artificial and needless, and does not easily generalize to higher dimensions.  Kenji's design is far superior.


> That being said, I'm fine with the compiler detecting if opIndex and opSlice are declared in a way that they conflict and then giving an error. I just don't want to be forced to use opIndex for slicing.
[...]

Nobody is forcing you to use opIndex for slicing right now, because the compiler currently accepts the old syntax for 1-dimensional arrays. And I already said it's probably a bad idea to deprecate the old syntax.


T

-- 
What do you mean the Internet isn't filled with subliminal messages? What about all those buttons marked "submit"??
March 10, 2017
On Friday, March 10, 2017 14:07:59 H. S. Teoh via Digitalmars-d wrote:
> On Fri, Mar 10, 2017 at 12:36:35PM -0800, Jonathan M Davis via Digitalmars-d wrote:
> > I can certainly understand that there are folks who really do care about this stuff, but it's completely outside of what I deal with, and for anything I've ever dealt with, making opIndex be for _slicing_ makes no sense whatsoever, and the added functionality to the language with regards to multi-dimensional arrays is useless. So, this whole mess has always felt like I've had something nonsensical thrown at me because of a use case that I don't even properly understand.
>
> Please don't denigrate something as useless without at least trying to understand it first.

As I said, for what _I_ deal with, it's useless. It's obviously useful to some subset of programmers - particularly folks doing scientific stuff, judging by the posts I've seen about multi-dimensional arrays - but it's useless to me. I did not mean to denigrate anything, so sorry if that wasn't clear. My point is that I don't want to have to worry about it or be affected by it when it is useless to me - particularly when I'd have to spend some time studying it to understand it properly. About the only time I've dealt with matrices in any real way was when I took linear algebra, and I've forgotten almost everything from that class. It simply has nothing to do with anything that I've ever needed to program, and I'd just as soon avoid any kind of math that would require it. So, as long as the multi-dimensional slicing stuff sits in its own little corner of the language where I don't have to deal with it, I'm fine. I just want to keep using opSlice for slicing and opIndex for indexing, because that makes sense to me and my needs, whereas using opIndex for a slicing operation just seems wrong, much as it apparently has a benefit for generic code dealing with multi-dimensional indexing and slicing.

And as long as the current situation with opSlice working for slicing like it always has continues, I don't care what the subset of programmers who care about multi-dimensional arrays do with opIndex. Unfortunately, it comes up periodically that someone pushes for everything to change to use opIndex for slicing or even to deprecate opSlice for slicing even when the code has nothing to do with multi-dimensional indexing or slicing, and I do object to that. If no multi-dimensional indexing or slicing is involved, I think that opSlice should be used for slicing, not opIndex. Fortunately though, there hasn't been a real push to move everything to use opIndex instead of opSlice and get rid of the original behavior, and I hope that that stays the case.

Regardless, thank you for your thorough explanation as to why it was changed so that opIndex could be used for a slicing operation.

- Jonathan M Davis

March 12, 2017
On Friday, 10 March 2017 at 18:43:43 UTC, H. S. Teoh wrote:
>
> 	IndexRange opSlice(int x, int y) { ... }

Why is opSlice a method-operator?

Wouldn't it be better if you could do something more general

auto idx_selection = (1,3,5); // select element 1, 3 and 5
auto idx_range = (0..4); // select element 0,1,2,3
auto idx_mixed = (0..3,4,5); // select 0,1,2,4,5
auto idx2d_range = [(0..4,5), (0..6)];  // select intersections of rows and columns

etc...