December 21, 2006
Andrei Alexandrescu (See Website For Email) wrote:
> Another way out of it is to ban "length" but stick with "$". But "$" has another bunch of problems. It's a special character used only once, and only in a very particular situation. There is no general concept standing behind its usage: it sticks out like a sore thumb. "$" isn't the last index in an array. It's that only when used inside a slice, and refers only to the innermost index of the array. Quite a waste of a special character out there, and to little usefulness.
> 
> But if we made "$" into an operator identifying the last element of _any_ array, which could refer to the last element of _the left-hand side_ array if we so want, then all of a sudden it becomes useful in a myriad of situations:

Provided that some such expansion path for "$" exists, it would seem to be adequate for D 1.0 to just remove "length". And this could be done by Jan 1.
December 21, 2006
Andrei Alexandrescu (See Website For Email) wrote:

> But if we made "$" into an operator identifying the last element of _any_ array, which could refer to the last element of _the left-hand side_ array if we so want, then all of a sudden it becomes useful in a myriad of situations:
> 
> int i = a[$ - 1]; // get last element
> int i = a[$b - 1]; // get a's element at position b.length - 1
> if (a[$ - 1] == x) { ... }
> if ($a > 0) { ... }
> if ($a == $b) { ... }
> swap(a[0], a[$ - 1]); // swap first and last element

Please give some thought to the case where a and b are of types not easily characterized by a single '.length'.  Matrix classes, or more generally multidimensional array classes being the canonical examples.  For those cases it is desirable to be able to have a '$' with different meaning "per axis".

For those cases we could have a small extension to your proposal. Have $b translate to b.length, yes, but also have $[3]b and $(1)b translate to b.length[3] and b.length(1), respectively.  Seeing that, it makes me think perhaps $ would be better as a post-fix unary operator.  Then we'd have b$ --> b.length  and b$[3] --> b.length[3].

Then of course the next step is to have a parameter number automatically passed to the length method given an expression like a[$-1,$-1] so that
    a[$-1,$-1]
    ==>  a[$[0]-1,$[1]-1]
    ==>  a[a$[0]-1,a$[1]-1]
    ==>  a[a.length[0],a.length[1]]

The compiler can decide whether to do indexing or not based on whether .length results in an indexable value.
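
To make the per-axis idea concrete, here is a minimal sketch in D. It assumes a compiler that supports a per-dimension `opDollar` template, which later versions of D provide; the `Matrix` type and its `rows`/`cols` fields are hypothetical names chosen only for illustration, not part of the proposal above.

```d
import std.stdio;

// Hypothetical 2-D matrix type, used only for illustration.
struct Matrix
{
    size_t rows, cols;
    double[] data; // row-major storage

    this(size_t rows, size_t cols)
    {
        this.rows = rows;
        this.cols = cols;
        data = new double[rows * cols];
        data[] = 0.0;
    }

    // '$' in index position `dim` is rewritten to opDollar!dim.
    size_t opDollar(size_t dim)() const
    {
        static assert(dim < 2, "Matrix has only two axes");
        return dim == 0 ? rows : cols;
    }

    // Read access: m[r, c]
    double opIndex(size_t r, size_t c) const
    {
        return data[r * cols + c];
    }

    // Write access: m[r, c] = value
    void opIndexAssign(double value, size_t r, size_t c)
    {
        data[r * cols + c] = value;
    }
}

void main()
{
    auto m = Matrix(3, 4);
    m[$ - 1, $ - 1] = 42.0;   // same element as m[m.rows - 1, m.cols - 1]
    assert(m[2, 3] == 42.0);
    writeln(m[$ - 1, $ - 1]);
}
```

Here each `$` picks up the extent of the axis it appears in, which is the "different meaning per axis" asked for above, without `length` itself having to be indexable.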

Finally, in general I think the choice of name 'length' is unfortunate because of its implication of linearity.  But it's not too late.  If $ becomes associated with .size rather than .length in user types then everything will be ok.  For built-in arrays .length can become a synonym for .size, just as it is with std::string in C++.  C++/STL got this one right.  For generic containers .size is a much better name.

--bb
December 21, 2006
Benji Smith wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> Let me illustrate further why ident is important and what solution we should have for it. Consider C's response to ident:
>>
>> #define IDENT(e) (e)
>>
>  > ...
>  >
>> ...leading to the following implementation of ident:
>>
>> auto ident(auto x) {
>>   return x;
>> }
> 
> I don't get it.
> 
> Why is it necessary (or even desirable) for functions to return lvalues?
> 
> I can see how it'd be an interesting trick, and I can appreciate the experimental curiosity about how the language (and the implementation) should cope with the explicit handling of lvalues.
> 
> But I can't think of a real-world use case.
> 
> Are there languages where this is currently possible? How do they implement it? And, much more importantly, what do people use it for?
> 
> --benji
FWIW,
In PL/1 the substring function could be used as an lvalue. (Analogously, in D one can use array slices as lvalues.)

I can't remember whether PL/1 allowed one to use this feature to delete characters, or only to alter them, but in Python one can use this feature to delete characters (or replace them with something that isn't a character?).
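
As a rough illustration of the D half of that analogy, the sketch below assigns through array slices. The variable name and the strings are made up for the example.

```d
void main()
{
    char[] s = "hello world".dup;   // mutable copy of the literal

    // A slice on the left-hand side is an lvalue: this copies
    // element-wise into the first five characters (lengths must match).
    s[0 .. 5] = "HELLO";
    assert(s == "HELLO world");

    // Assigning a single value fills every element of the slice.
    s[6 .. $] = '-';
    assert(s == "HELLO -----");
}
```

Note that this only alters elements in place; a D slice assignment does not change the length of the underlying array, so it cannot delete characters the way the Python feature mentioned above can.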

December 21, 2006
Pragma wrote:
> Stewart Gordon wrote:
>> Pragma wrote:
>> <snip>
>>> But I'd like to echo the other comments in this thread regarding structs.  IMO, we're not there yet.  I think folks are looking for a solution that does this:
>>>
>>> - A ctor like syntax for creating a new struct
>>> - No more forced copy of the entire struct on creation
>>
>> What do you mean by this?
> 
> I'm glad you asked. :)
> 
> Static opCall() is not a ctor.  It never was.  People have been clamoring to be able to use this() inside of a struct, much like they can with classes and modules.  But the desire here goes beyond mere symmetry between type definitions.
> 
> The forced copy issue is something that is an artifact of emulating a constructor for a struct.  Take the standard approach for example:
> 
> struct Foo{
>   int a,b,c;
> }
> 
> Foo f = {a:1, b:2, c:3};
> Foo f = {1,2,3}; // more succinct version
> 
> So here we create a struct in place, and break encapsulation in the process.  What we really want is an opaque type, that has a little more smarts on creation.  Taking advantage of in/body/out would be nice too.  No problem, we'll just use opCall():
> 
> struct Foo{
>   int a,b,c;
>   static Foo opCall(int a,int b,int c){
>     Foo _this;
>     _this.a = a;
>     _this.b = b;
>     _this.c = c;
>     return _this;
>   }
> }
> 
> Foo f = Foo(1,2,3);
> 
> That's better, but look at what's really happening here.  Inlining and compiler optimization aside, the 'constructor' here creates a Foo on the stack which is then returned and *copied* to the destination 'f'.

If that's not compiled into a direct write (even to the point of keeping the value virtual unless it's actually needed in contiguous memory) then there's something wrong with the optimiser.

> To most, that won't ever seem like a problem. But for folks who are working with Vector types or Matrix implementations, that's something to scream about.  In a nutshell, any struct wider than a register that is populated in the 100's to 1000's is wasting cycles needlessly.

I've never liked doing that - if you're going to have very large vectors or matrices, it's usually better just to switch to a programmatic model (run-time sized) rather than keeping it parametric (templated). At some point you're spending more time compiling than you are executing.

This requires duplication of a lot of code, which is unacceptable. I've been thinking about this problem for a long time and I still have no solution to it (mixins are so very not the right way); perhaps we're misconsidering how parametric and programmatic types should interact.

Let's say that the values used in the constructor of a type are recorded. So we might have (excuse the language):

	#!/usr/bin/moki --version=1

	#using: #moki.(size, static array);

	Matrix := #class
	{
		data : static array of (type, rows, cols);

		// No content needed.
		#this (type, rows : size, cols : size);
	};

Now "rows" and "cols" are both automatically constant properties of any created Matrix. If our algorithm requires certain limitations on the matrix, we can "specialise" it:

	transpose (matrix : Matrix (type, rows, rows)) : Matrix (type, rows, rows)
	...

And the compiler can generally optimise the Matrix like it were parametric - it could even apply discretionary optimisation so that it doesn't waste compile time on what doesn't even matter - but you could still use it programmatically.

> So that brings us to something like this:
> 
> struct Foo{
>   int a,b,c;
>   this(int a,int b,int c){
>     this.a = a;
>     this.b = b;
>     this.c = c;
>   }
> }
> 
> Foo f = Foo(1,2,3);
> 
> Ambiguity aside, this fixes encapsulation, gives a familiar syntax, and almost fixes the allocation issues.  (see below)

There's still an implied copy, but again, that shouldn't be relevant for any properly-working optimiser.
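
For reference, the two styles under discussion can be put side by side as below. This is only a sketch: it assumes a compiler that accepts this() inside a struct, which later versions of D do, and the names FooCall and FooCtor are hypothetical, chosen so both variants fit in one file.

```d
// Variant 1: emulated constructor via static opCall.
struct FooCall
{
    int a, b, c;

    // Builds a local value and returns it by value; whether the copy
    // is elided is left to the optimiser.
    static FooCall opCall(int a, int b, int c)
    {
        FooCall result;
        result.a = a;
        result.b = b;
        result.c = c;
        return result;
    }
}

// Variant 2: a real struct constructor (assumes compiler support).
struct FooCtor
{
    int a, b, c;

    this(int a, int b, int c)
    {
        this.a = a;
        this.b = b;
        this.c = c;
    }
}

void main()
{
    auto x = FooCall(1, 2, 3);   // calls static opCall
    auto y = FooCtor(1, 2, 3);   // calls the constructor
    assert(x.a == 1 && x.b == 2 && x.c == 3);
    assert(y.a == 1 && y.b == 2 && y.c == 3);
}
```

The call sites are identical, which is exactly the S(a,b,c) ambiguity raised further down.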

>>
>>> - Something that is disambiguated from static opCall
>> <snip>
>>
>> Do you mean that constructors for structs should have a notation
>> distinct from S(...)?
>>
> 
> Well, I think it's one of the reasons why we don't have ctors for structs right now.  The preferred syntax for a "struct ctor" would probably be this:
> 
> S foo = S(a,b,c);
> 
> Which is indistinct from "static opCall".  Throwing 'new' in there wouldn't work either, since that would be a dynamic allocation:
> 
> S* foo = new S(a,b,c);

"new <struct>" didn't use to parse, and I argued then that it shouldn't because everywhere else the "new" operator exactly described the type it would create. If this special case were removed, it could work as a constructor call.

But really I think static opCall should just be killed off.

> So that leaves us with "something else" that provides both a way to invoke a ctor, yet allocates the data on the stack and doesn't force you to create an additional copy:
> 
> S foo(a,b,c);  // c++ style
> S foo = stackalloc S(a,b,c); // alloca() style (in place of new)
> S foo = new(stack) S(a,b,c); // another idea
> 
December 21, 2006
Pragma wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>>
>> A simpler grammar would have been to simply allow:
>>
>> UnaryExpression:
>>     PostfixExpression
>>     & UnaryExpression
>>     ... etc. etc. ...
>>     $ PostfixExpression
>>
>> But this would have been ambiguous. If the compiler sees "$-1", then the bad grammar says that's a unary use of $ because -1 is a PostfixExpression. But that's not what we wanted! We wanted $ to be nullary. That's why I needed to put all the cases in UnaryExpression.
>>
> 
> Nice post, and one heck of an argument!
> 
> FWIW, I advocated something similar during the last round of debates before the '$' operator was introduced.  What I wanted to see was '$' to become like 'this' within slice and array expressions, so that the issues regarding 'length' could be resolved.  In essence one could simply say '$.length' and mean 'the length of the current array':
> 
> b[0 .. $.length];
> a[0 .. $.getIndexOf(';')];
> 
> So in essence, every use of '$' would be a 'nullary' operator - an alias if you will.
> 

I rather like this.  And I think I liked it then, too... if not, oh well.

> I'd imagine that extending things in this manner would simplify things grammatically while allowing for a wider category of uses.  However, it doesn't solve the issue that you brought up, and that I've quoted above.
> 
> c[$-1];
> 
> It looks like it should be an implicit cast of the '$' to a size_t (length), via its use in an expression.  Any thoughts on this?

If $ is like a 'this', then it ought to behave semantically the same, so if $ is a class/struct with an opCast to size_t defined, the obvious happens.  If it's anything else, it ought to be a compile time error,  perhaps suggesting you had meant '$.length' instead.

-- Chris Nicholson-Sauls

December 21, 2006
Russ Lewis wrote:
> Walter Bright wrote:
> 
>> More ABI changes, and implicit [] => * no longer allowed.
>>
>> http://www.digitalmars.com/d/changelog.html
>>
>> http://ftp.digitalmars.com/dmd.175.zip
> 
> Looks like casts from void* to struct* is broken.
> 
> Russ

Not sure what I did wrong or what I'm doing right now, but they seem to be working.  Sorry for any confusion I caused.
December 22, 2006
Bill Baxter wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
> 

> Then of course the next step is to have a parameter number automatically passed to the length method given an expression like a[$-1,$-1] so that
>     a[$-1,$-1]
>     ==>  a[$[0]-1,$[1]-1]
>     ==>  a[a$[0]-1,a$[1]-1]
>     ==>  a[a.length[0],a.length[1]]

Slight typo there.  Last line should of course have been:

      ==>  a[a.length[0]-1,a.length[1]-1]

> The compiler can decide whether to do indexing or not based on whether .length results in an indexable value.
> 
> Finally, in general I think the choice of name 'length' is unfortunate because of its implication of linearity.  But it's not too late.  If $ becomes associated with .size rather than .length in user types then everything will be ok.  For built-in arrays .length can become a synonym for .size, just as it is with std::string in C++.  C++/STL got this one right.  For generic containers .size is a much better name.

Another thing which occurred to me is that if the meaning of $ becomes tied to "size" rather than "length", then you also have the mnemonic of $ looking like an 's' as in 'size'.

I also still think making it a postfix operator makes sense.

--bb
December 22, 2006
Chris Nicholson-Sauls wrote:
> Pragma wrote:
>> Andrei Alexandrescu (See Website For Email) wrote:
>>>
>>> A simpler grammar would have been to simply allow:
>>>
>>> UnaryExpression:
>>>     PostfixExpression
>>>     & UnaryExpression
>>>     ... etc. etc. ...
>>>     $ PostfixExpression
>>>
>>> But this would have been ambiguous. If the compiler sees "$-1", then the bad grammar says that's a unary use of $ because -1 is a PostfixExpression. But that's not what we wanted! We wanted $ to be nullary. That's why I needed to put all the cases in UnaryExpression.
>>>
>>
>> Nice post, and one heck of an argument!
>>
>> FWIW, I advocated something similar during the last round of debates before the '$' operator was introduced.  What I wanted to see was '$' to become like 'this' within slice and array expressions, so that the issues regarding 'length' could be resolved.  In essence one could simply say '$.length' and mean 'the length of the current array':
>>
>> b[0 .. $.length];
>> a[0 .. $.getIndexOf(';')];
>>
>> So in essence, every use of '$' would be a 'nullary' operator - an alias if you will.

In both of those cases the use seems rather silly to me because a and b are both single characters to begin with.  Might as well just type
  b[0 .. b.length];
  a[0 .. a.getIndexOf(';')];
instead.  But I get the point.  Sometimes you have
  g_openSocketHandles[0 .. g_openSocketHandles.getIndexOf()]

But maybe just allowing 'this' in the brackets is enough there, without going on and abbreviating it to $.  The $==.length proposal at least has the advantage of being backwards compatible.

>>
> 
> I rather like this.  And I think I liked it then, too... if not, oh well.
> 
>> I'd imagine that extending things in this manner would simplify things grammatically while allowing for a wider category of uses.  However, it doesn't solve the issue that you brought up, and that I've quoted above.
>>
>> c[$-1];
>>
>> It looks like it should be an implicit cast of the '$' to a size_t (length), via its use in an expression.  Any thoughts on this?
> 
> If $ is like a 'this', then it ought to behave semantically the same, so if $ is a class/struct with an opCast to size_t defined, the obvious happens.  If it's anything else, it ought to be a compile time error,  perhaps suggesting you had meant '$.length' instead.
> 
> -- Chris Nicholson-Sauls

Not sure I like $==this as much as $==.length.  I have pressing need for a brief syntax for specifying the length, but no such thing for a shorter form of 'this'.  But anyway, if you're going to allow '$' to mean 'this' inside brackets, you first need the language feature that allows 'this' to be used inside brackets in the first place.  And maybe if you have that you'll find it's sufficient.

Another thing is if you're going to allow 'this' in brackets, then you should take the idea to its logical conclusions and allow it in member function call parameter lists too.  That might be nice for things like enum parameters.

Of course if $ gets translated into a call to a method/property, you could have it your way if you prefer for your classes.  Just use
    opDollar() { return this; }
and voila! You can use your $.getIndexOf(';').

--bb

December 22, 2006
Bill Baxter wrote:
> Chris Nicholson-Sauls wrote:
>> Pragma wrote:
>>> Andrei Alexandrescu (See Website For Email) wrote:
>>>>
>>>> A simpler grammar would have been to simply allow:
>>>>
>>>> UnaryExpression:
>>>>     PostfixExpression
>>>>     & UnaryExpression
>>>>     ... etc. etc. ...
>>>>     $ PostfixExpression
>>>>
>>>> But this would have been ambiguous. If the compiler sees "$-1", then the bad grammar says that's a unary use of $ because -1 is a PostfixExpression. But that's not what we wanted! We wanted $ to be nullary. That's why I needed to put all the cases in UnaryExpression.
>>>>
>>>
>>> Nice post, and one heck of an argument!
>>>
>>> FWIW, I advocated something similar during the last round of debates before the '$' operator was introduced.  What I wanted to see was '$' to become like 'this' within slice and array expressions, so that the issues regarding 'length' could be resolved.  In essence one could simply say '$.length' and mean 'the length of the current array':
>>>
>>> b[0 .. $.length];
>>> a[0 .. $.getIndexOf(';')];
>>>
>>> So in essence, every use of '$' would be a 'nullary' operator - an alias if you will.
> 
> In both of those cases the use seems rather silly to me because a and b are both single characters to begin with.  Might as well just type
>   b[0 .. b.length];
>   a[0 .. a.getIndexOf(';')];
> instead.  But I get the point.  Sometimes you have
>   g_openSocketHandles[0 .. g_openSocketHandles.getIndexOf()]
> 
> But maybe just allowing 'this' in the brackets is enough there, without going on and abbreviating it to $.  The $==.length proposal at least has the advantage of being backwards compatible.
> 
>>>
>>
>> I rather like this.  And I think I liked it then, too... if not, oh well.
>>
>>> I'd imagine that extending things in this manner would simplify things grammatically while allowing for a wider category of uses.  However, it doesn't solve the issue that you brought up, and that I've quoted above.
>>>
>>> c[$-1];
>>>
>>> It looks like it should be an implicit cast of the '$' to a size_t (length), via its use in an expression.  Any thoughts on this?
>>
>> If $ is like a 'this', then it ought to behave semantically the same, so if $ is a class/struct with an opCast to size_t defined, the obvious happens.  If it's anything else, it ought to be a compile time error,  perhaps suggesting you had meant '$.length' instead.
>>
>> -- Chris Nicholson-Sauls
> 
> Not sure I like $==this as much as $==.length.  I have pressing need for a brief syntax for specifying the length, but no such thing for a shorter form of 'this'.  But anyway, if you're going to allow '$' to mean 'this' inside brackets, you first need the language feature that allows 'this' to be used inside brackets in the first place.  And maybe if you have that you'll find it's sufficient.

The problem with actually using the 'this' keyword in place of $ is one of ambiguity. Given a collection class 'Set' and some other class 'Foo', what to do if a 'this' is used within a slice of a 'Set' instance within a member of 'Foo'?  Does it evaluate to the Foo reference it would in all other cases?  Or to a Set reference?  And if the latter, how to get the Foo reference if that really is what I wanted?

The $ would have to be different from 'this' in the classes' sense.  Perhaps it would be better to call it a 'self' or even a 'with' than a 'this'.
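
A small sketch of the situation, assuming a compiler with user-defined `opDollar` (later versions of D have it). The members of Set and Foo here are invented for the example; only the class names come from the paragraph above.

```d
// Hypothetical collection class.
class Set
{
    int[] items;

    this(int[] items) { this.items = items; }

    int opIndex(size_t i) const { return items[i]; }

    // '$' inside s[...] is rewritten to s.opDollar().
    size_t opDollar() const { return items.length; }
}

// Hypothetical unrelated class that indexes a Set from a member function.
class Foo
{
    size_t length = 1;

    int pick(Set s)
    {
        // Inside the brackets, 'this' still names the Foo, so
        // this.length is 1, while '$' refers to the Set being indexed.
        return s[this.length] + s[$ - 1];
    }
}

void main()
{
    auto s = new Set([10, 20, 30]);
    auto f = new Foo;
    assert(f.pick(s) == 20 + 30);
}
```

With a separate symbol both references stay reachable in the same expression, which is the distinction being argued for.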

-- Chris Nicholson-Sauls
December 22, 2006
Pragma wrote:
> b[0 .. $.length];
> a[0 .. $.getIndexOf(';')];
> 
> So in essence, every use of '$' would be a 'nullary' operator - an alias if you will.

This isn't going to be agreeable to most since the purpose of $ in the first place was to save typing.

> I'd imagine that extending things in this manner would simplify things grammatically while allowing for a wider category of uses.  However, it doesn't solve the issue that you brought up, and that I've quoted above.
> 
> c[$-1];
> 
> It looks like it should be an implicit cast of the '$' to a size_t (length), via its use in an expression.  Any thoughts on this?

I'd rather have $ defined everywhere to mean length, which is useful outside [] as well.

Andrei

P.S. Maybe there's a misunderstanding? The grammar I sent does not have a problem w.r.t. unary vs. nullary; it's just a tad more complicated to avoid ambiguity.
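
For comparison, here is a sketch of how the examples from earlier in the thread have to be spelled while $ is only defined inside brackets. The proposed prefix forms ($a, $b) appear only in comments, since they are not valid D; the array contents are made up.

```d
import std.algorithm.mutation : swap;

void main()
{
    int[] a = [1, 2, 3, 4];
    int[] b = [10, 20];

    int last = a[$ - 1];          // '$' means a.length inside a[...]
    assert(last == 4);

    int atB = a[b.length - 1];    // proposed spelling: a[$b - 1]
    assert(atB == 2);

    bool nonEmpty = a.length > 0;            // proposed spelling: $a > 0
    bool sameSize = a.length == b.length;    // proposed spelling: $a == $b
    assert(nonEmpty && !sameSize);

    swap(a[0], a[$ - 1]);         // swap first and last element
    assert(a == [4, 2, 3, 1]);
}
```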