comments on m..n array index syntax. make it m through n inclusive
August 16, 2001

On the whole, it looks pretty good.  I have already given my thoughts on generic programming, but I wanted to make a comment on your array index range notation.


Quoted from your document:
   In general, (a[n..m] op e) is defined as:

        for (i = n; i < m; i++)
            a[i] op e;


        s[] = t[];              the 3 elements of t[3] are copied into s[3]
        s[1..2] = t[0..1];      same as s[1] = t[0]
        s[0..2] = t[1..3];      same as s[0] = t[1], s[1] = t[2]
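
As a rough sketch of how the quoted rule plays out in present-day D (the 2001 draft syntax may have differed; the helper names `addToRange` and `specExamples` and all sample values are assumptions made up for illustration):

```d
import std.stdio;

// Spelled-out loop from the quoted definition, with op chosen as "+=".
// addToRange is a hypothetical name used only for this sketch.
void addToRange(int[] a, size_t n, size_t m, int e)
{
    for (size_t i = n; i < m; i++)   // i stops before m
        a[i] += e;
}

// The two slice copies from the quoted examples, on made-up values.
int[] specExamples()
{
    int[] s = [10, 20, 30];
    int[] t = [1, 2, 3];
    s[1 .. 2] = t[0 .. 1];   // same as s[1] = t[0]
    s[0 .. 2] = t[1 .. 3];   // same as s[0] = t[1], s[1] = t[2]
    return s;
}

void main()
{
    int[] a = [0, 0, 0, 0];
    addToRange(a, 1, 3, 5);   // touches a[1] and a[2] only
    writeln(a);
    writeln(specExamples());
}
```

In both cases the element at the end bound itself is left untouched.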


While I can see how this came from C/C++, I think it's very confusing.  I think it would make a whole lot more sense to read the [m..n] notation as being the range of indices which are covered.  This would then be identical to the behaviour of math programs such as Maple.  Plus, it has the added advantage of being syntactically similar to accessing a single array element.

Thus,

a[1] = b[1];       obvious
a[1..3] = b[1..3];      same as a[1]=b[1], a[2]=b[2], a[3]=b[3]
a[1..3] = b[6..8];      same as a[1]=b[6], a[2]=b[7], a[3]=b[8]
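
The inclusive reading can be approximated on top of an end-exclusive slice. This is only a sketch: `inclusive` is a hypothetical helper, not part of D or its library, and the array contents are made up.

```d
import std.stdio;

// Hypothetical helper: the slice covering indices m through n, both included.
int[] inclusive(int[] a, size_t m, size_t n)
{
    return a[m .. n + 1];   // add 1 because the built-in end bound is excluded
}

void main()
{
    int[] a = new int[10];
    int[] b = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90];

    // Intended effect: a[1]=b[6], a[2]=b[7], a[3]=b[8]
    int[] dst = inclusive(a, 1, 3);
    int[] src = inclusive(b, 6, 8);
    dst[] = src[];

    writeln(a);
}
```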

Translating to English, the m..n notation converts to "take elements m through n", which I think makes a lot more sense than "take elements m through n-1".

As a final piece of syntactical sugar, what about something like

a[1,4,7] = b[3,2,8];   same as a[1]=b[3], a[4]=b[2], a[7]=b[8]

where you specify a list of indices to copy?
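
Without dedicated syntax, the same effect could be had from a small function. A minimal sketch, assuming a hypothetical helper named `copyAt` and made-up data:

```d
import std.stdio;

// Hypothetical helper: performs dst[di[k]] = src[si[k]] for every k.
void copyAt(int[] dst, const size_t[] di, const int[] src, const size_t[] si)
{
    assert(di.length == si.length, "index lists must have the same length");
    foreach (k; 0 .. di.length)
        dst[di[k]] = src[si[k]];
}

void main()
{
    int[] a = new int[10];
    int[] b = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90];
    copyAt(a, [1, 4, 7], b, [3, 2, 8]);   // a[1]=b[3], a[4]=b[2], a[7]=b[8]
    writeln(a);
}
```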


Chris





-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com
August 17, 2001
"Chris Friesen" <cfriesen@nortelnetworks.com> wrote in message news:3B7C20B2.E2B8595F@nortelnetworks.com...
> On the whole, it looks pretty good.  I have already given my thoughts on generic programming, but I wanted to make a comment on your array index range notation.
> Quoted from your document:
>    In general, (a[n..m] op e) is defined as:
>         for (i = n; i < m; i++)
>             a[i] op e;
>         s[] = t[];              the 3 elements of t[3] are copied into s[3]
>         s[1..2] = t[0..1];      same as s[1] = t[0]
>         s[0..2] = t[1..3];      same as s[0] = t[1], s[1] = t[2]
> While I can see how this came from C/C++, I think it's very confusing.  I think it would make a whole lot more sense to read the [m..n] notation as being the range of indices which are covered.  This would then be identical behaviour to math programs such as maple.  Plus, it has the added advantage of being syntactically similar to accessing a single array element.
>
> Thus,
>
> a[1] = b[1];       obvious
> a[1..3] = b[1..3];      same as a[1]=b[1], a[2]=b[2], a[3]=b[3]
> a[1..3] = b[6..8];      same as a[1]=b[6], a[2]=b[7], a[3]=b[8]
>
> Translating to English, the m..n notation converts to "take elements m through n", which I think makes a lot more sense than "take elements m through n-1".

That's a good point. But I am so used to writing loops that go from n to m-1 that diverging from that will cause a lot of inadvertent bugs.


> As a final piece of syntactical sugar, what about something like a[1,4,7] = b[3,2,8];   same as a[1]=b[3], a[4]=b[2], a[7]=b[8] where you specify a list of indices to copy?

That does work pretty neat, but are there enough uses of this to justify the feature?



August 17, 2001
Walter wrote:
> "Chris Friesen" <cfriesen@nortelnetworks.com> wrote in message

> > Translating to English, the m..n notation converts to "take elements m through n", which I think makes a lot more sense than "take elements m through n-1".
> 
> That's a good point. But I am so used to writing loops that go from n to m-1 that diverging from that will cause a lot of inadvertent bugs.

Sure, but then you just write
a[m..n-1] = b[m..n-1]

Doesn't that make more sense than writing

a[0..n] = b[0..n]

when you only have n elements to begin with?  Since it's a whole new syntax anyway, I would like to make it something logical and obvious to a new user. Thinking back to my old programming days when loops were "for i = 1 to 10 do"...

I think that having it obvious in the statement what the range of values is will be clearer in the end.  I think the concept of ranges would be useful in switch statements as well, but I'll address that in another thread.

> > As a final piece of syntactical sugar, what about something like a[1,4,7] = b[3,2,8];   same as a[1]=b[3], a[4]=b[2], a[7]=b[8] where you specify a list of indices to copy?
> 
> That does work pretty neat, but are there enough uses of this to justify the feature?

I kind of doubt it.  Like I said, syntactical sugar.

Chris
August 17, 2001
In article <9lhqng$d9n$1@digitaldaemon.com>, "Walter" <walter@digitalmars.com> wrote:

> "Chris Friesen" <cfriesen@nortelnetworks.com> wrote in message news:3B7C20B2.E2B8595F@nortelnetworks.com...
>> On the whole, it looks pretty good.  I have already given my thoughts on generic programming, but I wanted to make a comment on your array index range notation.
>> Quoted from your document:
>>    In general, (a[n..m] op e) is defined as:
>>         for (i = n; i < m; i++)
>>             a[i] op e;
>>         s[] = t[];              the 3 elements of t[3] are copied into s[3]
>>         s[1..2] = t[0..1];      same as s[1] = t[0]
>>         s[0..2] = t[1..3];      same as s[0] = t[1], s[1] = t[2]
>> While I can see how this came from C/C++, I think it's very confusing.  I think it would make a whole lot more sense to read the [m..n] notation as being the range of indices which are covered.  This would then be identical behaviour to math programs such as maple.  Plus, it has the added advantage of being syntactically similar to accessing a single array element.
>>
>> Thus,
>>
>> a[1] = b[1];       obvious
>> a[1..3] = b[1..3];      same as a[1]=b[1], a[2]=b[2], a[3]=b[3]
>> a[1..3] = b[6..8];      same as a[1]=b[6], a[2]=b[7], a[3]=b[8]
>>
>> Translating to English, the m..n notation converts to "take elements m through n", which I think makes a lot more sense than "take elements m through n-1".
> 
> That's a good point. But I am so used to writing loops that go from n to m-1 that diverging from that will cause a lot of inadvertent bugs.

I'm very used to writing loops like that too, but this notation in the D document really confused me at first. I think it's completely counterintuitive and agree with Chris 100%.

-- 
Sheldon Simms / sheldon@semanticedge.com
August 17, 2001
Yes, this is counterintuitive. But it should not be in the language in the first place. The definition of something like indexing should be in the library.

What if I want range-checked indexes? What if I don't want them? What if I want a stride (that is, elements are A[0], A[4], A[8], but A[1] and A[0] are the same)? What if I want a different base (indexes in range 1..100 rather than 0..99)?
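
To make the "different base" case concrete, here is a minimal sketch of indexing defined by a library type rather than by the language. It uses present-day D operator overloading; `BasedArray` is a hypothetical type invented for this example, and whether the 2001 draft of D could express it is not something this thread establishes.

```d
import std.stdio;

// Hypothetical library type: an int array whose first valid index is `base`.
struct BasedArray
{
    int[] data;
    size_t base;

    // Read access, range-checked against base .. base + data.length.
    int opIndex(size_t i) const
    {
        assert(i >= base && i - base < data.length, "index out of range");
        return data[i - base];
    }

    // Write access with the same check.
    void opIndexAssign(int value, size_t i)
    {
        assert(i >= base && i - base < data.length, "index out of range");
        data[i - base] = value;
    }
}

void main()
{
    auto v = BasedArray(new int[100], 1);   // valid indexes are 1 through 100
    v[1] = 7;
    v[100] = 9;
    writeln(v[1], " ", v[100]);
}
```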


Chris Friesen wrote:

> On the whole, it looks pretty good.  I have already given my thoughts on generic programming, but I wanted to make a comment on your array index range notation.
>
> Quoted from your document:
>    In general, (a[n..m] op e) is defined as:
>
>         for (i = n; i < m; i++)
>             a[i] op e;
>
>         s[] = t[];              the 3 elements of t[3] are copied into s[3]
>         s[1..2] = t[0..1];      same as s[1] = t[0]
>         s[0..2] = t[1..3];      same as s[0] = t[1], s[1] = t[2]
>
> While I can see how this came from C/C++, I think it's very confusing.  I think it would make a whole lot more sense to read the [m..n] notation as being the range of indices which are covered.  This would then be identical behaviour to math programs such as maple.  Plus, it has the added advantage of being syntactically similar to accessing a single array element.

I agree there. This notation is confusing in the extreme. If you want to have this behavior, use the mathematical notation:

    s[1..2[ = t[0..1[

For mathematicians, this means 1 to 2, 2 excluded. I can see that this might be difficult to parse, though...


> As a final piece of syntactical sugar, what about something like
>
> a[1,4,7] = b[3,2,8];   same as a[1]=b[3], a[4]=b[2], a[7]=b[8]

This is problematic in the presence of multi-dimensional arrays.



January 11, 2002
(First, pardon me if the array slicing syntax debate is over.  I just found out about D a few days ago, and just started looking at the spec seriously today.)

On Thu, 16 Aug 2001 15:36:18 -0400, Chris Friesen <cfriesen@nortelnetworks.com> wrote:

>Quoted from your document:
>   In general, (a[n..m] op e) is defined as:
>
>        for (i = n; i < m; i++)
>            a[i] op e;
>
>
>        s[] = t[];              the 3 elements of t[3] are copied into s[3]
>        s[1..2] = t[0..1];      same as s[1] = t[0]
>        s[0..2] = t[1..3];      same as s[0] = t[1], s[1] = t[2]
>
>
>While I can see how this came from C/C++, I think it's very confusing.  I think

Just to throw another vote in here - when I first read the description of slicing in the D spec, I assumed it was a typo.  I then read a later line where a slice was described as:

	int a[10];
	int b[];

	b = a;
	b = a[];
	b = a[0 .. a.length];
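
A small sketch of the same idea in present-day D syntax (which writes `int[10] a` rather than `int a[10]`); the function name `sliceLengths` is made up for illustration. It also shows what happens if a reader assumes the end bound is inclusive and subtracts 1:

```d
import std.stdio;

// Lengths of three slices taken from a 10-element array.
size_t[3] sliceLengths()
{
    int[10] a;
    int[] whole = a[];                       // the whole array
    int[] sameAsWhole = a[0 .. a.length];    // also the whole array
    int[] oneShort = a[0 .. a.length - 1];   // leaves off a[9]
    return [whole.length, sameAsWhole.length, oneShort.length];
}

void main()
{
    writeln(sliceLengths());
}
```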

This explains WHY the syntax is the way it is, but I must strenuously agree that it does not justify it.  Since "off by one" errors are second only to pointer handling errors in programming, any new syntax should be very clear and intuitive in its use.

I would also agree that some form of exclusive bound would be acceptable, though hard to parse:

		b = a[0 .. a.length-1];
replaced by:
		b = a[0 .. a.length);

If the currently described exclusive ending bound remains in D, I would simply have to remove the slicing syntax from my set of tools, because I would always get it wrong -- I've switched from Basic to C/C++ enough times to know that much.

Mac Reiter
January 12, 2002
"Mac Reiter" <reiter@nomadics.com> wrote in message news:3c3f702f.27813613@news.digitalmars.com...

> This explains WHY the syntax is the way it is, but I must strenuously agree that it does not justify it.  Since "off by one" errors are second only to pointer handling errors in programming, any new syntax should be very clear and intuitive in its use.
...
> If the currently described exclusive ending bound remains in D, I would simply have to remove the slicing syntax from my set of tools, because I would always get it wrong -- I've switched from Basic to C/C++ enough times to know that much.

I thought the same when I argued on the topic.
Now, after using it for a while, I have to agree with Walter that
the end-exclusive form is what you need in 90% of cases. It's not as
counter-intuitive as one might think; in fact, I haven't made
any mistakes with this syntax so far! Just try to write something
that uses slices heavily and you'll see for yourself....
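
One case where the end-exclusive form tends to need no adjustment is splitting an array at an index. The post gives no example, so the following is only an assumed illustration, with a hypothetical helper `splitAt`:

```d
import std.stdio;

// Hypothetical helper: split `a` into the part before index k and the rest.
int[][] splitAt(int[] a, size_t k)
{
    // The same k ends the first slice and starts the second; no +1 or -1.
    return [a[0 .. k], a[k .. a.length]];
}

void main()
{
    int[] a = [1, 2, 3, 4, 5];
    auto parts = splitAt(a, 2);
    writeln(parts[0], " ", parts[1]);
}
```

With inclusive bounds the first part would be written with k-1 as its end, and an empty first part would need an end bound below the start.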




January 12, 2002
"Pavel Minayev" <evilone@omen.ru> wrote in message news:a1o00d$31ed$1@digitaldaemon.com...
> "Mac Reiter" <reiter@nomadics.com> wrote in message news:3c3f702f.27813613@news.digitalmars.com...
>
> > This explains WHY the syntax is the way it is, but I must strenuously agree that it does not justify it.  Since "off by one" errors are second only to pointer handling errors in programming, any new syntax should be very clear and intuitive in its use.
> ...
> > If the currently described exclusive ending bound remains in D, I would simply have to remove the slicing syntax from my set of tools, because I would always get it wrong -- I've switched from Basic to C/C++ enough times to know that much.
>
> I thought the same when I argued on the topic.
> Now, after using it for a while, I have to agree with Walter that
> the end-exclusive form is what you need in 90% of cases. It's not as
> counter-intuitive as one might think; in fact, I haven't made
> any mistakes with this syntax so far! Just try to write something
> that uses slices heavily and you'll see for yourself....

Look at the string.d code for examples!


January 14, 2002
On Sat, 12 Jan 2002 03:30:04 +0300, "Pavel Minayev" <evilone@omen.ru> wrote:

I apologize up front for the length of this posting.  Unfortunately, I do not have the time necessary to edit it down while maintaining the points I am trying to make.

>"Mac Reiter" <reiter@nomadics.com> wrote in message news:3c3f702f.27813613@news.digitalmars.com...
>
>> This explains WHY the syntax is the way it is, but I must strenuously agree that it does not justify it.  Since "off by one" errors are second only to pointer handling errors in programming, any new syntax should be very clear and intuitive in its use.
>...
>> If the currently described exclusive ending bound remains in D, I would simply have to remove the slicing syntax from my set of tools, because I would always get it wrong -- I've switched from Basic to C/C++ enough times to know that much.
>
>I thought the same when I argued on the topic.
>Now, after using it for a while, I have to agree with Walter that
>the end-exclusive form is what you need in 90% of cases. It's not as
>counter-intuitive as one might think; in fact, I haven't made
>any mistakes with this syntax so far! Just try to write something
>that uses slices heavily and you'll see for yourself....

The mere thought causes me to wake up at night in a cold sweat from maintenance nightmares.

How many times do C programmers blow up stacks and heaps because they forget to allocate enough space for the terminating NUL at the end of a C string?

How many programmers are going to assume they need the -1 on the final bound and end up one element short all the time?

How many programmers are going to think they copied the entire array and wonder why their code explodes or throws an exception when they try to access that last element?

Any decision will work for people who program exclusively in the given language.  Java people got used to January being 0 and December being 11, eventually.  But a lot of programs got bad dates and lots of exceptions thrown regarding December, too.  Experienced C programmers don't have problems remembering that scanf needs the address for all variable types EXCEPT strings (char arrays), but *EVERY* new and some intermediate C programmers have blown up programs because of it.

This form saves typing "-1" 90% of the time.  But it generates a giant blind spot when you have an off-by-one error the other 10% of the time, because when you're trying to do the code review you look at it and it *looks* like it does the right thing, but in reality it is leaving the last element off.  Code should do what it says.

I don't mind an end-exclusive form.  I just don't think it should use the end-INCLUSIVE syntax.  Most of us had some math thrown at us along the way, and we know that [] includes both endpoints and [) does not include the last endpoint.  If nothing else, seeing the ) at the end of the range will make you stop and think about what you are looking at.

	a[5..7] should be a[5], a[6], and a[7]
	a[5..7) should be a[5] and a[6]

If your newsreader font is really small, the second line used a closing parenthesis instead of a closing square bracket.

This might be difficult to parse, but it really shouldn't be.  I would expect some kind of "grouping stack" that keeps track of the most recent outstanding opening symbol.  If that is the case, all you have to do is accept a closing parenthesis as a valid match to an opening square bracket.

Even conversion of existing code to the new format *shouldn't* be too hard (says the non-compiler-writer, possibly with no ground to stand on).  Make an intermediate version of the compiler that accepts either form, and treats both of them as end-exclusive.  Have that compiler dump out a new file with the closing ] converted to a ).  Users can use this compiler to convert code files and prepare for the new version of the compiler that supports end-exclusive AND end-inclusive. A conversion like this needs to be done as early as possible, because if you think it will be hard now, imagine how hard it will be as more and more code accumulates.

Alternatively, some kind of #pragma-like device could be used, but then you have to choose which behavior is default, and code reviewers have to check for #pragmas to understand the code they are reading, and it all just gets nasty.

Ultimately, since D is Mr. Bright's language, it will do whatever he wants it to do.  I do know that I have had to do a LOT of maintenance programming and code reviews of other people's code, and that I rarely get the opportunity to work exclusively in one language for extended periods of time.  My comments about blind spots and the principle of least astonishment -- A system and its commands should behave the way most people would predict, that is, the system should operate with "least astonishment" -- come from experience and practice.  I realize that Mr. Bright also has tremendous experience, having viewed his substantial list of commercial programming successes.

But if [] remains end-exclusive, and if our company eventually started using it for production work, the style policy would have to require that array slicing not be used unless no alternative was available, and require specific boilerplate commentary when it was used.  This would be necessary to avoid astonishing new programmers and/or reviewers who came across the code.  I already do similar things when I need to use <= in a for loop instead of <, especially if it is a nested loop and one loop uses < but the other uses <=.  That looks like a bug, so I comment it to explain why it is that way.  Every use of array slicing looks like a bug to me, so every use would require a comment explaining its purpose.

Again, I apologize for the length of this posting.
Mac

January 14, 2002
"Mac Reiter" <reiter@nomadics.com> wrote in message news:3c42f52d.258466765@news.digitalmars.com...

> How many programmers are going to assume they need the -1 on the final bound and end up one element short all the time?

These were my words... earlier.

> This form saves typing "-1" 90% of the time.  But it generates a giant blind spot when you have an off-by-one error the other 10% of the time, because when you're trying to do the code review you look at it and it *looks* like it does the right thing, but in reality it is leaving the last element off.  Code should do what it says.

My reply is simple: RTFM first. Always!
On the other hand, Walter should probably have written it in red and
bold, all capitals: "array slices are end-exclusive!", at the beginning
of the reference =)



