December 22, 2006
Don Clugston wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> Another way out of it is to ban "length" but stick with "$". But "$" has another bunch of problems. It's a special character used only once, and only in a very particular situation. There is no general concept standing behind its usage: it sticks out like a sore thumb. "$" isn't the last index in an array. It's that only when used inside a slice, and refers only to the innermost index of the array. Quite a waste of a special character out there, and to little usefulness.
>>
>> But if we made "$" into an operator identifying the last element of _any_ array, which could refer to the last element of _the left-hand side_ array if we so want, then all of a sudden it becomes useful in a myriad of situations:
> 
> Provided that some such expansion path for "$" exists, it would seem to be adequate for D 1.0 to just remove "length". And this could be done by Jan 1.

That is correct. One advantage of the unary/nullary $ is that it's a strict extension to today's $. Thus, no existing code will be invalidated by the new semantics of $.

Andrei
December 22, 2006
Bill Baxter wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
> 
>> But if we made "$" into an operator identifying the last element of _any_ array, which could refer to the last element of _the left-hand side_ array if we so want, then all of a sudden it becomes useful in a myriad of situations:
>>
>> int i = a[$ - 1]; // get last element
>> int i = a[$b - 1]; // get a's element at position b.length - 1
>> if (a[$ - 1] == x) { ... }
>> if ($a > 0) { ... }
>> if ($a == $b) { ... }
>> swap(a[0], a[$ - 1]); // swap first and last element
> 
> Please give some thought to the case where a and b are of types not easily characterized by a single '.length'.  Matrix classes, or more generally multidimensional array classes being the canonical examples.  For those cases it is desirable to be able to have a '$' with different meaning "per axis".

I did. The thing with language design is that it's easy to either
underdo or overdo it, and that where underdoing or overdoing starts is
highly subjective.

IMHO the current meaning of "$" is a good example of underdoing. The
"$expression" meaning "(expression).length" is (again IMHO) just right. I
use collection.size() all the time in C++, and scalar(@array) or $#array
all the time in Perl, inside and outside index expressions. So I'd be
happy to have that. Taking it to the next step of meaning any
subdimension of a multidimensional (or fractal, heh) structure is, IMHO,
overdoing because I can think of few use examples that are both frequent
enough and interesting enough.


Andrei
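
The "$expression means (expression).length" reading can be approximated in D as it stands with an ordinary function. This is only a sketch: `len` is a hypothetical helper standing in for the proposed prefix `$`, which is not part of the language, and the array contents are made up for illustration.

```d
import std.stdio;

// Hypothetical stand-in for the proposed prefix `$`:
// len(x) plays the role of `$x`, i.e. `(x).length`.
size_t len(T)(T[] arr)
{
    return arr.length;
}

void main()
{
    int[] a = [10, 20, 30, 40];
    int[] b = [1, 2, 3];

    // Existing `$`: only valid inside the brackets, and it refers to `a`.
    assert(a[$ - 1] == 40);

    // Proposed `a[$b - 1]`: index `a` using the length of another array.
    assert(a[len(b) - 1] == 30);

    // Proposed `if ($a > 0)` and `if ($a == $b)`, outside any index expression.
    assert(len(a) > 0);
    assert(len(a) != len(b));

    writeln("last element of a: ", a[$ - 1]);
}
```

The helper costs a few more characters than the proposed operator, which is the typing overhead the proposal is trying to remove.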

December 22, 2006
Andrei Alexandrescu (See Website For Email) wrote:
> Pragma wrote:
>> b[0 .. $.length];
>> a[0 .. $.getIndexOf(';')];
>>
>> So in essence, every use of '$' would be a 'nullary' operator - an alias if you will.
> 
> This isn't going to be agreeable to most since the purpose of $ in the first place was to save typing.
> 
>> I'd imagine that extending things in this manner would simplify things grammatically while allowing for a wider category of uses.  However, it doesn't solve the issue that you brought up, and that I've quoted above.
>>
>> c[$-1];
>>
>> It looks like it should be an implicit cast of the '$' to a size_t (length), via its use in an expression.  Any thoughts on this?
> 
> I'd rather have $ defined everywhere to mean length, which is useful outside [] as well.

Understood.  I just figured I'd throw that out there in case it had any merit in the current discussion.

> 
> Andrei
> 
> P.S. Maybe there's a misunderstanding? The grammar I sent does not have a problem w.r.t. unary vs. nullary; it's just a tad more complicated to avoid ambiguity.

Ah, I understand then.  The way you explained the grammar changes, it looked to me as though there was still an issue.

-- 
- EricAnderton at yahoo
December 22, 2006
Andrei Alexandrescu (See Website for Email) wrote:
> Bill Baxter wrote:
> 
>> Andrei Alexandrescu (See Website For Email) wrote:
>>
>>> But if we made "$" into an operator identifying the last element of _any_ array, which could refer to the last element of _the left-hand side_ array if we so want, then all of a sudden it becomes useful in a myriad of situations:
>>>
>>> int i = a[$ - 1]; // get last element
>>> int i = a[$b - 1]; // get a's element at position b.length - 1
>>> if (a[$ - 1] == x) { ... }
>>> if ($a > 0) { ... }
>>> if ($a == $b) { ... }
>>> swap(a[0], a[$ - 1]); // swap first and last element
>>
>>
>> Please give some thought to the case where a and b are of types not easily characterized by a single '.length'.  Matrix classes, or more generally multidimensional array classes being the canonical examples.  For those cases it is desirable to be able to have a '$' with different meaning "per axis".
> 
> 
> I did. The thing with language design is that it's easy to either
> underdo or overdo it, and that where underdoing or overdoing starts is
> highly subjective.
> 
> IMHO the current meaning of "$" is a good example of underdoing. The
> "$expression" meaning "(expression).length" is (again IMHO) just right. I
> use collection.size() all the time in C++, and scalar(@array) or $#array
> all the time in Perl, inside and outside index expressions. So I'd be
> happy to have that. Taking it to the next step of meaning any
> subdimension of a multidimensional (or fractal, heh) structure is, IMHO,
> overdoing because I can think of few use examples that are both frequent
> enough and interesting enough.

Maybe so.  Multidimensional arrays seem as common as air from where I sit, but I can see that not everyone works with such things every day.

If $ really did become synonymous with .length (or preferably .size) then one could have .length return an array rather than a simple number.  In that case multi-dim'ers could have
   M[5..$[0]-1, 0..$[1]-5].
Eh.  Not so pretty.  For comparison, in Python that would be M[5:,:-5], and in Matlab that would be M(6:end, 1:end-4).  Both of those look much better to me.  If the index to go with $ could be supplied automatically by the compiler then D could have M[5..$-1, 0..$-5].

Taking a different tack, I wonder if repeated indexing can be made as efficient (or nearly) as a single multi-index?
    M[5..$-1][0..$-5]

That's much easier to look at than M[5..$[0]-1, 0..$[1]-5], at least. And it's more general in the sense that from looking at the expression, M looks just like a standard Type[][] array.

Hmm.  I'll play with that.  I think it's at least technically possible, now that D has the ability to override opAssign.

--bb
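
A per-axis "$" supplied by the compiler, as wished for above, can be sketched with the operator overloading hooks that later versions of D provide: `opDollar` and `opSlice` templated on the argument position, plus a multi-argument `opIndex`. None of this was available at the time of the thread. The `Matrix` and `Span` types and the copying sub-matrix are assumptions made for illustration, not anyone's actual implementation.

```d
// Hypothetical 2-D matrix over a flat, row-major buffer.
struct Matrix
{
    double[] data;
    size_t rows, cols;

    // Half-open interval produced by `a .. b` inside the brackets.
    static struct Span { size_t lo, hi; }

    // `$` in argument position `dim` resolves to the extent of that axis.
    size_t opDollar(size_t dim)() const
    {
        static assert(dim < 2, "Matrix has only two axes");
        return dim == 0 ? rows : cols;
    }

    // `a .. b` in argument position `dim` is turned into a Span.
    Span opSlice(size_t dim)(size_t a, size_t b) const
    {
        return Span(a, b);
    }

    // M[i, j]: a single element.
    double opIndex(size_t i, size_t j) const
    {
        return data[i * cols + j];
    }

    // M[r0 .. r1, c0 .. c1]: a copied sub-matrix.
    Matrix opIndex(Span r, Span c) const
    {
        Matrix result;
        result.rows = r.hi - r.lo;
        result.cols = c.hi - c.lo;
        foreach (i; r.lo .. r.hi)
            result.data ~= data[i * cols + c.lo .. i * cols + c.hi];
        return result;
    }
}

void main()
{
    // 3 x 4 matrix holding 0 .. 11 in row-major order.
    double[] buf = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11];
    auto m = Matrix(buf, 3, 4);

    // Each `$` is resolved against its own axis.
    assert(m[$ - 1, $ - 1] == 11);

    // Rows 1 .. 3 and columns 0 .. 2.
    auto s = m[1 .. $, 0 .. $ - 2];
    assert(s.rows == 2 && s.cols == 2);
    assert(s[0, 0] == 4 && s[1, 1] == 9);
}
```

Mixed forms such as M[i, a .. b] would need further `opIndex` overloads (for example one taking a `size_t` and a `Span`); they are left out to keep the sketch short.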
December 23, 2006
On Wed, 20 Dec 2006 06:24:28 -0800, "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail@erdani.org> wrote:

>Long story short, "length" introduces a keyword through the back door, effectively making any use of "length" anywhere unrecommended and highly fragile.

I've always had a strong feeling that all-lowercase words should be reserved as potential future keywords anyway, barring a few special cases like 'i'.

I suppose a lot of that comes down to style, though. If you have a convention where most identifiers start with a capital (excluding prefixes like 'l_', 'p_' or 'm_'), that only really leaves common short non-descriptive names like 'i' as a special case.

It's one of the things that bugs me about the normal Java style - it allows all-lowercase identifiers (only using capitals as word separators), which look like potential keywords if you're not too familiar with Java. Besides, from natural language convention, capitals should appear at the start of a sentence - having them in the middle of a sentence-like identifier but not at the start just looks wrong to me.

-- 
Remove 'wants' and 'nospam' from e-mail.
December 24, 2006
Benji Smith wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> Let me illustrate further why ident is important and what solution we should have for it. Consider C's response to ident:
>>
>> #define IDENT(e) (e)
>>
>  > ...
>  >
>> ...leading to the following implementation of ident:
>>
>> auto ident(auto x) {
>>   return x;
>> }
> 
> I don't get it.
> 
> Why is it necessary (or even desirable) for functions to return lvalues?
> 
> I can see how it'd be an interesting trick, and I can appreciate the experimental curiosity about how the language (and the implementation) should cope with the explicit handling of lvalues.
> 
> But I can't think of a real-world use case.
> 
> Are there languages where this is currently possible? How do they implement it? And, much more importantly, what do people use it for?
> 
> --benji

There have been numerous cases here in the NG of people griping with this problem when using property-methods, or operator overloads such as opIndex, which both cannot return lvalues and thus have usage limitations.

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
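
To make the limitation concrete, here is a sketch of what lvalue-returning functions allow, written with the `ref` return type that later versions of D support (it did not exist when this was posted). The `Counters` type and its field are hypothetical; `ident` follows the name used earlier in the thread.

```d
// Hypothetical wrapper type, used only for illustration.
struct Counters
{
    int[4] slots;

    // Returning by `ref` hands back the element itself rather than a copy,
    // so the result of indexing can be assigned to or incremented.
    // The trailing `return` marks that the reference points into `this`.
    ref int opIndex(size_t i) return
    {
        return slots[i];
    }
}

// A by-reference identity function in the spirit of `ident`.
ref T ident(T)(return ref T x)
{
    return x;
}

void main()
{
    Counters c;
    c[2] = 5;       // plain assignment through the returned reference
    c[2] += 3;      // compound assignment
    ++c[2];         // increment
    assert(c.slots[2] == 9);

    int n = 1;
    ident(n) = 42;  // ident yields an lvalue, so it can be assigned to
    assert(n == 42);
}
```

With a by-value opIndex, each of the three statements on `c[2]` would need its own dedicated overload or would act on a temporary copy.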
December 25, 2006
Andrei Alexandrescu (See Website For Email) wrote:
> A simpler grammar would have been to simply allow:
> 
> UnaryExpression:
>     PostfixExpression
>     & UnaryExpression
>     ... etc. etc. ...
>     $ PostfixExpression
> 
> But this would have been ambiguous. If the compiler sees "$-1", then the bad grammar says that's a unary use of $ because -1 is a PostfixExpression. But that's not what we wanted! We wanted $ to be nullary. That's why I needed to put all the cases in UnaryExpression.

Wouldn't that ambiguity be fixed by making $ a postfix unary operator instead?  Or would that just introduce different ambiguities?

--bb
December 25, 2006
Oskar Linde wrote:
> Bill Baxter wrote:
>> After trying to write a multi-dimensional array class, my opinion is that D slice support could use some upgrades overall.
> 
> I'd be very interested in looking at what you've come up with. With my own implementation of a multi-dimensional array type a couple of months ago, I came to the same conclusion. I posted about it in:
> 
> news://news.digitalmars.com:119/edrv0n$hth$1@digitaldaemon.com
> http://www.digitalmars.com/d/archives/digitalmars/D/announce/4717.html
> 
>> What I'd like to see:
>>
>> --MultiRange Slice--
>> * A way to have multiple ranges in a slice, and a mix of slice and non-slice indices:
>>     A[i..j, k..m]
>>     A[i..j, p, k..m]
> (snip)
>  >      A[0..$,3..$]
> 
> Yes, I would too. It is quite frustrating having the syntax in the language but not being allowed to utilize it... :)
> 
> I work around this by using a custom slice syntax instead:
> 
> A[range(i,j), range(k,m)]
> A[range(i,j), p, range(k,m)]
> A[range(0,end), range(3,end)]
> A[end-1, p % end]

Yeh, that's similar to what I'm doing too.  But it's pretty ugly.  So I guess that means you're using opIndex for everything and leaving opSlice alone.  Are you able to have ranges return arrays and specific indexes return scalar values that way?  That seems to me a big reason for having opSlice exist in the first place.  The .. in the brackets not only means you're slicing, it also means the function should return another array, versus returning an element.  That seems like a nice distinction to have to me.

> 
> Basically, the transformation is:
> 
> $ => end
> a..b => range(a,b)
> 
> I briefly described this in:
> news://news.digitalmars.com:119/eft9id$2aq3$1@digitaldaemon.com

Thanks for the link.  The 'end' thing isn't so bad, at least for a former Matlab user. :-)


--bb
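
One way the range(a,b) and 'end' workaround could be put together, and one possible answer to the question of whether ranges can yield arrays while plain indexes yield scalars, is to overload opIndex on the argument type. This is a guess at the general shape, not Oskar's actual code: `End`, `Range`, `range`, `toIndex` and `Vec` are all hypothetical names, the example is one-dimensional, and only `end - n` is supported.

```d
// Symbolic stand-in for `$`: "length + offset", resolved at indexing time.
struct End
{
    ptrdiff_t offset;

    // Supports `end - n`.
    End opBinary(string op : "-")(ptrdiff_t n) const
    {
        return End(offset - n);
    }

    size_t resolve(size_t length) const
    {
        return cast(size_t)(cast(ptrdiff_t) length + offset);
    }
}

enum end = End(0);

// Stand-in for `lo .. hi`; each bound is either a number or an End.
struct Range(L, H)
{
    L lo;
    H hi;
}

auto range(L, H)(L lo, H hi)
{
    return Range!(L, H)(lo, hi);
}

// Turns a bound into a concrete index once the length is known.
size_t toIndex(B)(B bound, size_t length)
{
    static if (is(B : const End))
        return bound.resolve(length);
    else
        return bound;
}

struct Vec
{
    int[] data;

    // Plain or end-relative index: yields one element.
    int opIndex(B)(B i)
        if (is(B : const End) || is(B : size_t))
    {
        return data[toIndex(i, data.length)];
    }

    // range(...) argument: yields a sub-array.
    int[] opIndex(L, H)(Range!(L, H) r)
    {
        return data[toIndex(r.lo, data.length) .. toIndex(r.hi, data.length)];
    }
}

void main()
{
    auto v = Vec([10, 20, 30, 40, 50]);

    assert(v[1] == 20);
    assert(v[end - 1] == 50);                      // like v[$ - 1]
    assert(v[range(1, end - 1)] == [20, 30, 40]);  // like v[1 .. $ - 1]
    assert(v[range(0, end)].length == 5);          // like v[0 .. $]
}
```

The return type is chosen by which overload matches, so the array-versus-element distinction that ".." gives opSlice is recovered from the argument types instead.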
December 25, 2006
Bill Baxter wrote:
> Oskar Linde wrote:
>> Bill Baxter wrote:
>>> After trying to write a multi-dimensional array class, my opinion is that D slice support could use some upgrades overall.
>>
>> I'd be very interested in looking at what you've come up with. With my own implementation of a multi-dimensional array type a couple of months ago, I came to the same conclusion. I posted about it in:
>>
>> news://news.digitalmars.com:119/edrv0n$hth$1@digitaldaemon.com
>> http://www.digitalmars.com/d/archives/digitalmars/D/announce/4717.html
>>
>>> What I'd like to see:
>>>
>>> --MultiRange Slice--
>>> * A way to have multiple ranges in a slice, and a mix of slice and non-slice indices:
>>>     A[i..j, k..m]
>>>     A[i..j, p, k..m]
>> (snip)
>>  >      A[0..$,3..$]
>>
>> Yes, I would too. It is quite frustrating having the syntax in the language but not being allowed to utilize it... :)
>>
>> I work around this by using a custom slice syntax instead:
>>
>> A[range(i,j), range(k,m)]
>> A[range(i,j), p, range(k,m)]
>> A[range(0,end), range(3,end)]
>> A[end-1, p % end]
> 
> Yeh, that's similar to what I'm doing too.  But it's pretty ugly.  So I guess that means you're using opIndex for everything and leaving opSlice alone.  Are you able to have ranges return arrays and specific indexes return scalar values that way?  That seems to me a big reason for having opSlice exist in the first place.  The .. in the brackets not only means you're slicing, it also means the function should return another array, versus returning an element.  That seems like a nice distinction to have to me.
> 
Is there anything particularly wrong with having foo[a..b,c,d..$] being syntactical sugar for foo[a..b][c][d..$] ? That way, I would imagine you could implement slicing and indexing of multidimensional arrays quite easily because, as Norbert Nemec said, indexing just returns an array with dimension reduced by one, and slicing returns an array of the same dimension, but perhaps different size. It also seems to allow any combination of slicing and indexing without needing variadic functions.
December 25, 2006
Reiner Pope wrote:
> Bill Baxter wrote:
>> Oskar Linde wrote:
>>> Bill Baxter wrote:
>>>> After trying to write a multi-dimensional array class, my opinion is that D slice support could use some upgrades overall.
>>>
>>> I'd be very interested in looking at what you've come up with. With my own implementation of a multi-dimensional array type a couple of months ago, I came to the same conclusion. I posted about it in:
>>>
>>> news://news.digitalmars.com:119/edrv0n$hth$1@digitaldaemon.com
>>> http://www.digitalmars.com/d/archives/digitalmars/D/announce/4717.html
>>>
>>>> What I'd like to see:
>>>>
>>>> --MultiRange Slice--
>>>> * A way to have multiple ranges in a slice, and a mix of slice and non-slice indices:
>>>>     A[i..j, k..m]
>>>>     A[i..j, p, k..m]
>>> (snip)
>>>  >      A[0..$,3..$]
>>>
>>> Yes, I would too. It is quite frustrating having the syntax in the language but not being allowed to utilize it... :)
>>>
>>> I work around this by using a custom slice syntax instead:
>>>
>>> A[range(i,j), range(k,m)]
>>> A[range(i,j), p, range(k,m)]
>>> A[range(0,end), range(3,end)]
>>> A[end-1, p % end]
>>
>> Yeh, that's similar to what I'm doing too.  But it's pretty ugly.  So I guess that means you're using opIndex for everything and leaving opSlice alone.  Are you able to have ranges return arrays and specific indexes return scalar values that way?  That seems to me a big reason for having opSlice exist in the first place.  The .. in the brackets not only means you're slicing, it also means the function should return another array, versus returning an element.  That seems like a nice distinction to have to me.
>>
> Is there anything particularly wrong with having foo[a..b,c,d..$] being syntactical sugar for foo[a..b][c][d..$] ? That way, I would imagine you could implement slicing and indexing of multidimensional arrays quite easily because, as Norbert Nemec said, indexing just returns an array with dimension reduced by one, and slicing returns an array of the same dimension, but perhaps different size. It also seems to allow any combination of slicing and indexing without needing variadic functions.

Generally speaking, indexing isn't free.  And doing it three times is 3x more expensive than doing it once.  At the very least if the thing being indexed is a class then a new instance of that class has to be created for each index op.   But likely there's also some internal state that has to be copied and adjusted as well.  And one expects that indexing operations will often appear in inner loops, so they should be as fast as possible.

However, if you use some sort of proxy structs to represent the intermediate indexing expressions and only do the indexing when it's really needed, it may be ok to use [][][].  Basically it's expression templates all over again, just here the expressions are limited to indexing or slice operations.

That may work, and it may even be as efficient as a real multi-index slice with compiler optimizations, but I think it will result in code that's far less clear and probably not as fast.

If it does pan out, though, then there are certainly advantages as you say to only having to worry about two cases ever -- single index and slice index.  Not the any-possible-combination-of-index-and-slice, which admittedly requires some slick vararg template trickery itself.

Anyway, I plan to try implementing that soon (with all the brackets, naturally, since the syntactic sugar you mention currently doesn't exist).

--bb
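
The proxy-struct idea described above might look roughly like the following. It is a sketch under several assumptions: `Matrix` and `RowView` are hypothetical types, the storage is a flat row-major buffer, and it relies on the `opDollar` hook from later versions of D so that "$" works on user-defined types. The first bracket pair only records the row range in a small struct; elements are touched when the second bracket pair is applied.

```d
// Proxy returned by the first bracket pair: a few words of bookkeeping,
// no element copies and no class allocation.
struct RowView
{
    double[] data;
    size_t cols;
    size_t r0, r1;

    // `$` in the second bracket pair is the number of columns.
    size_t opDollar() const { return cols; }

    // Second bracket pair: only now is the result built,
    // one slice of the original buffer per selected row.
    double[][] opSlice(size_t c0, size_t c1)
    {
        double[][] result;
        foreach (r; r0 .. r1)
            result ~= data[r * cols + c0 .. r * cols + c1];
        return result;
    }
}

// Hypothetical row-major matrix.
struct Matrix
{
    double[] data;
    size_t rows, cols;

    // `$` in the first bracket pair is the number of rows.
    size_t opDollar() const { return rows; }

    // First bracket pair: remember the row range and defer the work.
    RowView opSlice(size_t r0, size_t r1)
    {
        return RowView(data, cols, r0, r1);
    }
}

void main()
{
    // 3 x 4 matrix holding 0 .. 11 in row-major order.
    double[] buf = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11];
    auto m = Matrix(buf, 3, 4);

    // Chained brackets: rows first, then columns.
    auto sub = m[1 .. $][0 .. $ - 2];
    assert(sub == [[4.0, 5.0], [8.0, 9.0]]);
}
```

Whether this ends up as cheap as a single multi-index call depends on the compiler inlining the intermediate struct away, which is the open question raised in the post.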