December 21, 2006
Re: DMD 0.177 release [Length in slice expressions]
Andrei Alexandrescu (See Website For Email) wrote:
> Don Clugston wrote:
>> Andrei Alexandrescu (See Website For Email) wrote:
>>> Similarly, let's say that a group of revolutionaries convinces Walter 
>>> (as I understand happened in case of using "length" and "$" inside 
>>> slice expressions, which is a shame and an absolute disaster that 
>>> must be undone at all costs) to implement "auto"
>>
>> This off-hand remark worries me. I presume that you mean being able to 
>> reference the length of a string, from inside the slice? (rather than 
>> simply the notation).
>> And the problem being that it requires a sliceable entity to know its 
>> length? Or is the problem more serious than that?
>> It's worrying because any change would break an enormous amount of code.
> 
> It would indeed break an enormous amount of code, but "all costs" 
> includes "enormous costs". :o) A reasonable migration path is to 
> deprecate them soon and make them illegal over the course of one year.
> 
> A small book could be written on just how bad language design is using 
> "length" and "$" to capture slice size inside a slice expression. I 
> managed to write two lengthy emails to Walter about them, and just 
> barely got started. Long story short, "length" introduces a keyword 
> through the back door, effectively making any use of "length" anywhere 
> unrecommended and highly fragile. 

That hadn't occurred to me, but you're right.  I never use length in 
that context precisely because it does look like it could be a local 
identifier, whereas I know it'll be clear it's not if I use $.  Also 
"length" is just too long to be of much use to me as a shortcut.  If I'm 
going to be that verbose I might as well type out the whole 
"varname.length".

> Using "$" is a waste of symbolic real 
> estate to serve a narrow purpose; the semantics isn't naturally 
> generalized to its logical conclusion; 

I do use this one, but I agree.  It is unnecessarily special cased for 
built-in array types.  For user-defined types, in 'myvar[0..$]' the $ 
does not expand to 'myvar.length' as one would naturally expect it to. 
Or any sort of opLength() call.  It's just a syntax error.

> and the choice of symbol itself 
> as a reminiscent of Perl's regexp is at best dubious ("#" would have 
> been vastly better as it has count connotation in natural language, and 
> making it into an operator would have fixed the generalization issue). 

I think you'll have to admit that's just your personal taste there. 
Using $ to indicate 'end' is a regexp thing, but regexp's go way beyond 
Perl.

I don't really care what it is as long as there's a terse way to 
specify 'the end' in an indexing expression.

> As things stand now, the rules governing the popping up of "length" and 
> "$" constitute a sudden boo-boo on an otherwise carefully designed 
> expression landscape.


After trying to write a multi-dimensional array class, my opinion is 
that D slice support could use some upgrades overall.  What I'd like to see:

--MultiRange Slice--
* A way to have multiple ranges in a slice, and a mix of slice and 
non-slice indices:
    A[i..j, k..m]
    A[i..j, p, k..m]

  I'm not saying built-in arrays like int[] should allow the above 
expressions, but that at least user types should be allowed to have such 
opSlice methods.  (Currently opSlice's are limited to having 2 arguments 
that represent the values that appear on either side of a single '..' 
token. You can only have two arguments max, but the arguments can be of 
any type.)

The problem is that opSlice has to look like opSlice(T1 lo, T2 hi) right 
now -- just two parameters (or zero).

One possible solution is to turn a single i..j into a single int[2] 
argument (or a mytype[2], for the general case).  But that means one 
won't be able to distinguish A[[1,3]] from A[1..3].  It also means more 
interesting extensions to slice syntax, like adding a stepsize on a 
range, will be ruled out.

Another solution is a built-in slice type.  Ranges like a..b would get 
converted to slice instances automatically.  It would basically be a 
struct with two ints in the simplest case, but to support user types as 
indexes it would need to be template-like, i.e. slice!(type).  A slice 
would look basically like
    struct slice(T=int) { T lo,hi; }
It could also have a .step property.  With the above, lo and hi would 
have to be of the same type, but really it makes sense to let them 
differ, so slice!(T1,T2).  For a range with stepsize, 
slice!(Tlo,Thi,Tstep).

To make writing opSlice methods sane, a single number like the p above 
should be converted to a slice also.  So all arguments passed to opSlice 
would be of type slice, and in the simple case of integer indices, it 
would just be:
    Type opSlice(slice s) { return x[s.lo..s.hi]; }
since integers would be the default types for slice.
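
For illustration only, here is a rough sketch of how the slice-type idea could look if written by hand. Everything in it is an assumption: `Slice`, the helpers `span` and `single` (standing in for the automatic conversion of `a..b` and of a plain index `p`), and the toy `Matrix` type are hypothetical names, not part of DMD 0.177.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the proposed built-in slice type.
struct Slice(T = int)
{
    T lo, hi;
}

// Assumed helpers that mimic the conversions the compiler would perform:
// span(a, b) plays the role of a..b, single(p) wraps a plain index p.
Slice!int span(int lo, int hi) { return Slice!int(lo, hi); }
Slice!int single(int p) { return Slice!int(p, p + 1); }

// Illustrative row-major 2-D array; the type and its layout are assumptions.
struct Matrix
{
    int rows, cols;
    int[] data;

    // Every index arrives as a Slice, so one signature covers
    // A[i..j, k..m] as well as A[i..j, p].
    Matrix opIndex(Slice!int r, Slice!int c)
    {
        auto result = Matrix(r.hi - r.lo, c.hi - c.lo);
        foreach (i; r.lo .. r.hi)
            foreach (j; c.lo .. c.hi)
                result.data ~= data[i * cols + j];
        return result;
    }
}

void main()
{
    auto m = Matrix(3, 4, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
    auto sub = m[span(1, 3), single(2)]; // stands in for m[1..3, 2]
    writeln(sub.rows, "x", sub.cols, " ", sub.data);
}
```

The point of the sketch is that the method body never needs to know whether a given argument started life as a range or as a single index.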


--User Definable '$'--
* A way to specify 'the end' in user types.  In the general case the 
meaning of '$' in a slice cannot be known (because any type can be used 
as an index), nor can it be simply substituted with something like a 
.length property, because it may depend on context.  Consider a 
multi-dimensional array class --

     A[0..$,3..$]

The first $ means one thing, and the second one means another.

One solution - make an opLength that gets called with the parameter 
number in which the $ appears.  [My hypothesis is that the param# is the 
only context that ever matters in determining the meaning of $.]  So in 
the above int opLength(int i) would get called twice, once with i==0, 
once with i==1.   opLength can be made to return any type if the user 
just wants it to get 'passed through' to the opSlice call.  If you don't 
need the context you can define it as opLength().
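
As a hedged illustration of the per-position idea: later versions of D provide a hook along these lines, spelled `opDollar` and templated on the dimension rather than taking it as a run-time argument. The sketch below assumes such a compiler; the `Grid` type is hypothetical and only reports the shape of the selected region.

```d
import std.stdio : writeln;

// Hypothetical 2-D container that only tracks its extents.
struct Grid
{
    size_t rows, cols;

    // Invoked with the position in which `$` appears:
    // dim == 0 for the first index, dim == 1 for the second.
    size_t opDollar(size_t dim)() const
    {
        static if (dim == 0)
            return rows;
        else
            return cols;
    }

    // Packs `lo..hi` written in position `dim` into a pair.
    size_t[2] opSlice(size_t dim)(size_t lo, size_t hi) const
    {
        return [lo, hi];
    }

    // Receives one pair per dimension and returns the selected shape.
    size_t[2] opIndex(size_t[2] r, size_t[2] c) const
    {
        return [r[1] - r[0], c[1] - c[0]];
    }
}

void main()
{
    auto g = Grid(4, 6);
    // The first `$` resolves to rows, the second to cols.
    writeln(g[0 .. $, 3 .. $]);
}
```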

--Step sizes--
This is a handy feature of Python slices.  The general syntax for a 
slice in Python is lo:hi:step, meaning go from 'lo' to 'hi', stepping by 
'step' at a time.   But any of the 3 components can be left out.
lo:hi means step=1.
lo::2 means go to the end, stepping by 2.
:hi means 0 to hi.  Negative steps are also allowed:
hi:lo:-1 means go backwards from hi to lo
::-1 go backwards from the last to first element

D syntax could be something like lo..hi:step.  I like the omission part 
of Python's syntax.  If D had that then most uses of $ would go away 
since we'd have A[3..] as an alternative to A[3..$].
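
To make the Python comparison concrete, here is a small sketch of how the same selections can be spelled with library helpers instead of new syntax. It assumes a later Phobos that ships `std.range.stride` and `std.range.retro`; the function names `stepped` and `reversed` are made up for the example.

```d
import std.array : array;
import std.range : retro, stride;

// Elements of a[lo .. hi], taking every step-th one (step must be positive).
int[] stepped(int[] a, size_t lo, size_t hi, size_t step)
{
    return a[lo .. hi].stride(step).array;
}

// Elements of a[lo .. hi] in reverse order.
int[] reversed(int[] a, size_t lo, size_t hi)
{
    return a[lo .. hi].retro.array;
}

void main()
{
    int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

    assert(stepped(a, 2, 8, 2) == [2, 4, 6]);           // Python a[2:8:2]
    assert(stepped(a, 3, a.length, 2) == [3, 5, 7, 9]); // Python a[3::2]
    assert(reversed(a, 3, 8) == [7, 6, 5, 4, 3]);       // Python a[7:2:-1]
    assert(reversed(a, 0, a.length)
           == [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]);          // Python a[::-1]
}
```

This is noticeably wordier than lo..hi:step, which is rather the argument for having the syntax.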



--bb
December 21, 2006
Re: DMD 0.177 release [Length in slice expressions]
== Quote from Andrei Alexandrescu (See Website For Email)
(SeeWebsiteForEmail@erdani.org)'s article
> Don Clugston wrote:
> > Andrei Alexandrescu (See Website For Email) wrote:
> >> Similarly, let's say that a group of revolutionaries convinces Walter
> >> (as I understand happened in case of using "length" and "$" inside
> >> slice expressions, which is a shame and an absolute disaster that must
> >> be undone at all costs) to implement "auto"
> >
> > This off-hand remark worries me. I presume that you mean being able to
> > reference the length of a string, from inside the slice? (rather than
> > simply the notation).
> >
> > And the problem being that it requires a sliceable entity to know its
> > length? Or is the problem more serious than that?
> > It's worrying because any change would break an enormous amount of code.
>
> It would indeed break an enormous amount of code, but "all costs"
> includes "enormous costs". :o) A reasonable migration path is to
> deprecate them soon and make them illegal over the course of one year.
>
> A small book could be written on just how bad language design is using
> "length" and "$" to capture slice size inside a slice expression. I
> managed to write two lengthy emails to Walter about them, and just
> barely got started. Long story short, "length" introduces a keyword
> through the back door, effectively making any use of "length" anywhere
> unrecommended and highly fragile. Using "$" is a waste of symbolic real
> estate to serve a narrow purpose; the semantics isn't naturally
> generalized to its logical conclusion; and the choice of symbol itself
> as a reminiscent of Perl's regexp is at best dubious ("#" would have
> been vastly better as it has count connotation in natural language, and
> making it into an operator would have fixed the generalization issue).
> As things stand now, the rules governing the popping up of "length" and
> "$" constitute a sudden boo-boo on an otherwise carefully designed
> expression landscape.

I guess the question is, what is the best alternative.  I agree about
'length', and I usually don't use "length" in this way, but I do things
like x[$-2..$] all the time.  Some proposals:

1. Symbols

Going in the symbol direction, it might also make sense to *add* something
like "^" for the start of a container.  This would be useful with AAs and
user defined types.  We could use both: a[^+2..$-2].  This would only really
be useful with containers that did not index from 0, i.e. non-integer or AA
indices.

char[char[]] words;
words[^.."brink"]; // all words in dictionary before 'brink'
words["brack"..$] // instead of symbols

Which could translate to:
words.opSlice(words.opBegin(), "brink")
words.opSlice("brack", words.opEnd())

2. I like this better: I call it "with without with"

In order to maximize the dollar value :) of syntax symbol real estate, the
meaning of $ could be expanded as follows:

Something like X[$begin..$end] could be a shortcut for either X[0..X.length]
for arrays, or X[X.opBegin()..X.opEnd()] for user types.

I think the above solves the problem, doesn't it?  The "$end" phrase is terse
enough for most coders, unique enough to avoid namespace conflicts, avoids the
problem of keywords ghosting in and out of existence in mid-expression, and
avoids ruining $ (or # if #end is used instead) for the symbol space.

We can stop right there... or go on to something for post-1.0:

Other applications of $ could be:

A. Syntax reduction for enumerated types and fields:

 struct Colors {
   enum { red, green, blue };
   void set(int c);
 };

 Colors c;
 c.set($red);

 This use of enumerated type is becoming more common, having "$" be a
 shortcut for <context>.X might make a lot of code more readable.  The
 question then becomes, "Which contexts are searched for .X?"

B. Reserved for language features.

 Leave this open for language designer use.  All $xyz expressions are context
 dependent keywords.  This allows much shorter words to be used, and allows
 language features to be named intelligently without worrying about crashing
 into user-defined names.  For example, C could never introduce a new keyword
 called "begin" or "end", since it would break nearly every C program, but we
 can easily add a keyword called $begin which will not conflict with anything,
 since the $ saves us from conflicts.

 Most of the discussions for new features here have at least some arguments on
 how to add the new syntax for the feature, what other uses those symbols could
 be used for, etc.  The $xyz route allows Walter to introduce lots of language
 concepts in the future without conflicts.  It could even be used to prototype
 keywords that are experimental.  They can even be removed or promoted to non-$
 status later if desired.

 NOTE that if # was used instead of $, it would dovetail nicely with the
 "#line" and "#file" quasi-keywords.

> > These issues you're raising seem to be far too fundamental to be fixed
> > in the next few days, casting grave doubts on whether a D1.0 release on
> > Jan 1 is a good idea.
>
> The lvalue/rvalue issue is fundamental. I'm not in the position to
> assess whether it's a maker or breaker of D 1.0.
>
> The "length"/"$" issue is not fundamental the same way that C's
> declaration syntax, Java's throw specifications, C++'s use of "<" and
> ">" for templates, and Mao Zedong's refusal to use a toothbrush are not
> fundamental. It will "just" go down in history as a huge embarrassment
> and a good resource for cheap shooters and naysayers. If I understand
> its genesis, it will also be a canonical example of why design by
> committee is bad.
>
> Andrei

I like the terseness of "$" but I'm willing to do away with it if it
really is that bad.  What I'm wondering, is how far do you think we
need to roll back the syntax, before it's "The Right Thing" (tm) again?

Do we really need to go all the way to myarray[0..myarray.length], or
can some intermediate solution work?

Kevin
December 21, 2006
Re: DMD 0.177 release
Thomas Kuehne wrote:
> enum S{ FOO }
> template Templ(S T) { }
> mixin Templ!(S.FOO) bar;
> 
> Do you consider S a keyword here?

You're right, it makes parsing dependent on the symbol table, breaking a 
nice property of D. Back to the drawing board.

Andrei
December 21, 2006
Re: DMD 0.177 release
Chris Nicholson-Sauls wrote:
> Benji Smith wrote:
>> Are there languages where this is currently possible?
> 
> C++, by returning a reference.

Perl 5 too.

Andrei
December 21, 2006
Re: DMD 0.177 release
Benji Smith wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> Let me illustrate further why ident is important and what solution we 
>> should have for it. Consider C's response to ident:
>>
>> #define IDENT(e) (e)
>>
>  > ...
>  >
>> ...leading to the following implementation of ident:
>>
>> auto ident(auto x) {
>>   return x;
>> }
> 
> I don't get it.
> 
> Why is it necessary (or even desirable) for functions to return lvalues?

Methods might want to return lvalues, but indeed the need is not 
overwhelming. (They could return pointers after all.) But the point is 
different. You want to have a grip on all types, and ident shows that 
you can't. For example, in current D you can't (barring a hack that I 
saw in a post around here) have a template that takes a function and 
creates one of the exact signature. That is a vastly useful and 
desirable thing to want; think e.g. of a function that memoizes any 
other function.
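
For readers who want to picture the memoization case, here is a minimal sketch. It assumes a much later D library than the one under discussion (one with `std.traits.Parameters`, `std.traits.ReturnType` and `std.typecons.Tuple`), and it only handles parameters passed by value, so it does not settle the lvalue question raised above. The names `memoized`, `slowSquare` and `calls` are invented for the example.

```d
import std.stdio : writeln;
import std.traits : Parameters, ReturnType;
import std.typecons : Tuple;

int calls; // counts how often the wrapped function really runs

int slowSquare(int x)
{
    ++calls;
    return x * x;
}

// memoized!fun takes the same parameter types as fun and returns the
// same type; results are cached per distinct argument tuple.
ReturnType!fun memoized(alias fun)(Parameters!fun args)
{
    alias Key = Tuple!(Parameters!fun);
    static ReturnType!fun[Key] cache;

    auto key = Key(args);
    if (auto hit = key in cache)
        return *hit;
    auto result = fun(args);
    cache[key] = result;
    return result;
}

void main()
{
    writeln(memoized!slowSquare(7));
    writeln(memoized!slowSquare(7)); // answered from the cache
    writeln(calls);
}
```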


Andrei
December 21, 2006
Re: DMD 0.177 release [Length in slice expressions]
Derek Parnell wrote:
> On Wed, 20 Dec 2006 06:24:28 -0800, Andrei Alexandrescu (See Website For
> Email) wrote:
> 
>  
>> A small book could be written on just how bad language design is using 
>> "length" and "$" to capture slice size inside a slice expression. I 
>> managed to write two lengthy emails to Walter about them, and just 
>> barely got started. 
> 
> Please share your thoughts here if you can too.

Gladly; I dug through my email, so let me share a couple of excerpts.

---------

int length = 5;
int[] a = new int[length * 2];
int[] b = a[length .. length * 2];
int[] c = a[length - 1 .. (b[0 .. length])[0]];

In each of its uses, length has a different semantics. The behavior is 
well-defined for all cases, but nonintuitive and about as pleasant as 
nails on the blackboard.

Now D has a compile-time option to ban the "length" name in scopes in 
which the slice operator is used. That would render the example above 
illegal. There is also a rule that identifiers in nested scopes cannot 
mask one another. So length will be banned from *any* scope that nests a 
scope using a slice:

int length;
if (a) {
  foreach (b; c) {
    while (d) {
      switch (e) {
        case f: g = h[0 .. length - 1];
        ...
      }
    }
  }
}

This code will not compile. Worse, it *will* compile until you add the 
slice operation. Combining the two rules and taking them to their 
logical conclusion, any code using "length" is frail because there's 
always a risk that somebody might insert a slice, rendering the entire 
function uncompilable. What happened is that now "length" has become a 
backdoor-introduced keyword. Books will advise users to never use it 
even when it works, coding standards will ban it, language lawyers will 
use it to detract D, and users of other languages will smile 
condescendingly and stay with their languages.

There are a few ways out of it. "length" could be actually made a 
keyword. But even that one isn't very uniform, and steals yet another 
good identifier name.

Another way out of it is to ban "length" but stick with "$". But "$" has 
another bunch of problems. It's a special character used only once, and 
only in a very particular situation. There is no general concept 
standing behind its usage: it sticks out like a sore thumb. "$" isn't 
the last index in an array. It's that only when used inside a slice, and 
refers only to the innermost index of the array. Quite a waste of a 
special character out there, and to little usefulness.

But if we made "$" into an operator identifying the last element of 
_any_ array, which could refer to the last element of _the left-hand 
side_ array if we so want, then all of a sudden it becomes useful in a 
myriad of situations:

int i = a[$ - 1]; // get last element
int i = a[$b - 1]; // get a's element at position b.length - 1
if (a[$ - 1] == x) { ... }
if ($a > 0) { ... }
if ($a == $b) { ... }
swap(a[0], a[$ - 1]); // swap first and last element

---------------

Grammar for nullary/unary $:

---------------

I think I nailed down the way the count operator $ can work in a manner 
that's terse, expressive, and safe.

My basic goal is to enable the operator $ to be unary (applying to an 
array) to return its size, and also nullary (applying to nothing) to 
implicitly mean "fetch the size of the innermost array in the 
expression". So this code should work:

int[] foo;
foo[$ - 1]; // refers to foo's last element
foo[$foo - 1]; // same
int[][] bar;
bar[foo[$]]; // refers to bar indexed with foo's last element
bar[foo[$bar]]; // refers to bar indexed with foo's element at $bar

To insert my operator $ within D's grammar, go to the grammar page: 
http://www.digitalmars.com/d/expression.html#UnaryExpression and scroll 
down to Unary Expression. There, add the following rules:

UnaryExpression:
    PostfixExpression
    & UnaryExpression
    ... etc. etc. ...
    $ Identifier
    $ PostfixExpression . Identifier
    $ PostfixExpression ( )
    $ PostfixExpression ( ArgumentList )
    $ IndexExpression
    $ SliceExpression
    $ ArrayLiteral
    $ ( Expression )

Now a unary expression can be the $ operator followed by an identifier, 
a member access, a function call, an array access, or a slice expression 
(awesome! pick the size of the slice!), a literal array (for 
conformity), or a parenthesized expression. Perfect!

But we haven't yet filled the role of $ as a nullary operator. To do so, 
let's go in the grammar to 
http://www.digitalmars.com/d/expression.html#PrimaryExpression and 
append one more rule to the PrimaryExpression rule:

PrimaryExpression:
    Identifier
    .Identifier
    ... etc. etc. ...
    $

Now the grammar is unambiguous and will properly distinguish unary and 
nullary uses of the $ operator.

This is more elegant than the current crap with "$" and "length" popping 
up. Besides, you can now use $ in many more places than inside []s. 
However, the grammar size does increase quite a bit, which is more fuss 
than I hoped for just one operator.

A simpler grammar would have been to simply allow:

UnaryExpression:
    PostfixExpression
    & UnaryExpression
    ... etc. etc. ...
    $ PostfixExpression

But this would have been ambiguous. If the compiler sees "$-1", then the 
bad grammar says that's a unary use of $ because -1 is a 
PostfixExpression. But that's not what we wanted! We wanted $ to be 
nullary. That's why I needed to put all the cases in UnaryExpression.



Andrei
December 21, 2006
Re: DMD 0.177 release [Length in slice expressions]
Andrei Alexandrescu (See Website For Email) wrote:
> Derek Parnell wrote:
>> On Wed, 20 Dec 2006 06:24:28 -0800, Andrei Alexandrescu (See Website For
>> Email) wrote:
>>
>>  
>>> A small book could be written on just how bad language design is 
>>> using "length" and "$" to capture slice size inside a slice 
>>> expression. I managed to write two lengthy emails to Walter about 
>>> them, and just barely got started. 
>>
>> Please share your thoughts here if you can too.
> 
> Gladly; I dug my email and let me share a couple of excerpts.
<snipped excerpts>

Wow, I understand it now. I only hope that at least 'length' will be 
deprecated before 1.0.

I like your dollars. I'm not so good with grammars, will your proposal 
also work for user defined types?
December 21, 2006
Re: DMD 0.177 release [Length in slice expressions]
Lutger wrote:
> Wow, I understand it now. I only hope that at least 'length' will be 
> deprecated before 1.0.
> 
> I like your dollars.

Well, just don't take'em away from my bank account :o).

> I'm not so good with grammars, will your proposal 
> also work for user defined types?

The plan is that $expression is rewritten into (expression).length. The
consistent thing to do is to make that into an onXyz() function, but I
don't find this name inconsistency jarring.


Andrei
December 21, 2006
Re: DMD 0.177 release [Length in slice expressions]
Bill Baxter wrote:
> After trying to write a multi-dimensional array class, my opinion is 
> that D slice support could use some upgrades overall.

I'd be very interested in looking at what you've come up with. With my 
own implementation of a multi-dimensional array type a couple of months 
ago, I came to the same conclusion. I posted about it in:

news://news.digitalmars.com:119/edrv0n$hth$1@digitaldaemon.com
http://www.digitalmars.com/d/archives/digitalmars/D/announce/4717.html

> What I'd like to see:
> 
> --MultiRange Slice--
> * A way to have multiple ranges in a slice, and a mix of slice and 
> non-slice indices:
>     A[i..j, k..m]
>     A[i..j, p, k..m]
(snip)
>      A[0..$,3..$]

Yes, I would too. It is quite frustrating having the syntax in the 
language but not being allowed to utilize it... :)

I work around this by using a custom slice syntax instead:

A[range(i,j), range(k,m)]
A[range(i,j), p, range(k,m)]
A[range(0,end), range(3,end)]
A[end-1, p % end]

Basically, the transformation is:

$ => end
a..b => range(a,b)

I briefly described this in:
news://news.digitalmars.com:119/eft9id$2aq3$1@digitaldaemon.com

The resulting code becomes quite optimal without the need for a position 
dependent opLength type of operator, but handling all the cases puts a 
larger burden on the implementor of opIndex.

> The problem is that opSlice has to look like opSlice(T1 lo, T2 hi) right 
> now -- just two parameters (or zero).
[snip]
> Another solution is a built-in slice type.  Ranges like a..b would get 
> converted to slice instances automatically.  

Yes, this would be my suggestion too. Adding an opApply to one such 
built in range type would also have the nice side effect of allowing the 
syntactical sugar:

foreach(i; 5..10)
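
A hand-written sketch of that idea, assuming a user-defined struct stands in for the proposed built-in range type (the names `IntRange` and `sumOf` are hypothetical). Later versions of D do accept a bare `lo .. hi` in foreach directly, so this only shows what the opApply route would look like.

```d
import std.stdio : writeln;

// Hypothetical half-open integer range [lo, hi).
struct IntRange
{
    int lo, hi;

    // foreach calls this with the loop body wrapped in a delegate;
    // a nonzero return from the delegate means "stop iterating".
    int opApply(scope int delegate(ref int) dg)
    {
        for (int i = lo; i < hi; ++i)
        {
            int current = i; // pass a copy so the body cannot disturb the counter
            if (auto stop = dg(current))
                return stop;
        }
        return 0;
    }
}

int sumOf(IntRange r)
{
    int total = 0;
    foreach (i; r)
        total += i;
    return total;
}

void main()
{
    writeln(sumOf(IntRange(5, 10))); // sums 5, 6, 7, 8, 9
}
```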

> --User Definable '$'--
[snip]
> One solution - make an opLength that gets called with the parameter 
> number in which the $ appears. 

Yes, that is probably the cleanest solution. And if no such 
opLength(int) overload exists, return the result of opLength() (or 
possibly .length)

/Oskar
December 21, 2006
Re: DMD 0.177 release [Length in slice expressions]
Andrei Alexandrescu (See Website For Email) wrote:
> 
> A simpler grammar would have been to simply allow:
> 
> UnaryExpression:
>     PostfixExpression
>     & UnaryExpression
>     ... etc. etc. ...
>     $ PostfixExpression
> 
> But this would have been ambiguous. If the compiler sees "$-1", then the 
> bad grammar says that's a unary use of $ because -1 is a 
> PostfixExpression. But that's not what we wanted! We wanted $ to be 
> nullary. That's why I needed to put all the cases in UnaryExpression.
> 

Nice post, and one heck of an argument!

FWIW, I advocated something similar during the last round of debates 
before the '$' operator was introduced.  What I wanted to see was '$' to 
become like 'this' within slice and array expressions, so that the 
issues regarding 'length' could be resolved.  In essence one could 
simply say '$.length' and mean 'the length of the current array':

b[0 .. $.length];
a[0 .. $.getIndexOf(';')];

So in essence, every use of '$' would be a 'nullary' operator - an alias 
if you will.

I'd imagine that extending things in this manner would simplify things 
grammatically while allowing for a wider category of uses.  However, it 
doesn't solve the issue that you brought up, and that I've quoted above.

c[$-1];

It looks like it should be an implicit cast of the '$' to a size_t 
(length), via its use in an expression.  Any thoughts on this?

-- 
- EricAnderton at yahoo