Thread overview
Yet another include/exclusive slicing thread
Oct 22, 2004
Tomer Altman
Oct 23, 2004
Russ Lewis
Oct 23, 2004
Sjoerd van Leent
Oct 31, 2004
Walter
Oct 24, 2004
David Medlock
Oct 24, 2004
Derek
Oct 24, 2004
Ben Hinkle
Oct 25, 2004
David Medlock
October 22, 2004
First of all, I know and understand that things are hard to change, that Walter has absolute authority over the language specification, and that this topic has been brought up many times already. However, I believe (hope) that this post will shed more light on the subject and help settle the matter for the best.

WARNING: LONG POST!

The main advantage of having inclusive start and end indices for slicing is
that it is intuitive and makes clear what the programmer meant, since ".." is
a universal, symmetric notation for a "range":
A[0..2] affects cells 0,1,2.

Now the points against it, for each of which I will give a counter-opinion:
1. To denote that the slice runs to the end of the array, one must use (length - 1).

While this is correct numerically, one logically thinks about a range in terms
of start and end and not in terms of "length". Therefore a new property with
a name such as "last", "end", or "maxIndex" could be introduced, equal
to (length - 1). In this case, A[0..last] can be written instead of
A[0..length-1], keeping the clarity.
Adding a "last" property can also help in case one wants to simply change the
last cell in the array:
A[last] = 5;
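
A minimal sketch of point 1, written against current D rather than the 2004 compiler. The helpers `last` and `inclusive` are hypothetical names made up for illustration; they are not part of D or its library. The inclusive slice is expressed through the built-in exclusive one.

```d
import std.stdio;

// Hypothetical helper, not part of D: index of the final element.
// Precondition: the array is non-empty.
size_t last(T)(T[] a)
{
    assert(a.length > 0, "an empty array has no last index");
    return a.length - 1;
}

// Hypothetical helper: slice with an inclusive end index,
// implemented on top of the built-in exclusive slice.
T[] inclusive(T)(T[] a, size_t first, size_t lastIdx)
{
    return a[first .. lastIdx + 1];
}

void main()
{
    int[] A = [10, 20, 30, 40];
    A[A.last] = 5;                    // the post's A[last] = 5
    writeln(A.inclusive(0, 2));       // cells 0, 1, 2: prints [10, 20, 30]
    writeln(A.inclusive(0, A.last));  // whole array: prints [10, 20, 30, 5]
}
```

In current D the symbol `$` stands for the array length inside the brackets, so the last cell can also be written `A[$ - 1]` without any helper.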


2. Programmers are used to the for(i=0;i<length;++i) idiom.

While it is true that right now this is the case, I believe it became this way for historical reasons, as shorthand for writing code such as "for(i=0;i<=length-1;++i)", which is certainly confusing and long.

From my experience, even though "for(i=0;i<length;++i)" is better for simple loops, almost always whenever I have a more complex loop that goes from a variable index "n" to a variable index "m", I prefer using the <= notation, as it gives a clear idea of where the loop stops.

I then realized that the reason the <= notation isn't the common one is that the "length" of the array is not itself an index that fits in a loop over indices; it is just a number from which the maximum index can be derived.

Since the system is structured, adding a "last" property as in point 1 would
diminish the shorthand motive for using "for(i=0;i<length;++i)". Instead it
could be "for(i=0;i<=last;++i)"

Moreover, in case you DO refer to length, the usage of ".." is counter-intuitive. As other people noted, adding an index-and-length notation can be useful regardless of a start..end notation.

The reasoning behind D is to make a language which is like C and C++ but better designed, reducing the amount of thought one needs to read/write code that does what s/he intuitively expects. In this case, it means removing the need for the exclusive < notation and instead using a symmetric <= notation with a new language feature ("lastIndex").

Note: In both of my points, the code can be compiled by translating the new version to the old version and compiling it just the same; efficiency isn't hindered.
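
The translation between the two loop forms can be sketched as below, in current D. The function names are made up for illustration. One assumption worth flagging: in current D the array length is unsigned (`size_t`), so the inclusive form needs a guard for empty arrays, because `length - 1` wraps around instead of becoming -1.

```d
import std.stdio;

// Sum using the usual exclusive bound: i < length.
int sumExclusive(int[] a)
{
    int total = 0;
    for (size_t i = 0; i < a.length; ++i)
        total += a[i];
    return total;
}

// Sum using an inclusive bound, as in "for(i=0;i<=last;++i)".
// The guard is needed because a.length is unsigned: for an empty
// array, a.length - 1 wraps around to size_t.max.
int sumInclusive(int[] a)
{
    int total = 0;
    if (a.length == 0)
        return total;
    immutable size_t last = a.length - 1;
    for (size_t i = 0; i <= last; ++i)
        total += a[i];
    return total;
}

void main()
{
    int[] data = [1, 2, 3, 4];
    int[] empty;
    writeln(sumExclusive(data));   // prints 10
    writeln(sumInclusive(data));   // prints 10
    writeln(sumInclusive(empty));  // prints 0, thanks to the guard
}
```

Both loops visit the same cells for any non-empty array; only the empty case needs separate handling in the inclusive form.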


3. Creating a zero-length array should be easy, so A[0..0] is a good way.

If an index-and-length notation were added (for example with the syntax A[4#10], since "#" implies a number of elements), then creating a zero-sized array would be trivial: A[0#0] would work.
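
As a rough illustration, the proposed A[start#count] can be emulated in current D with a small function. `countSlice` is a hypothetical name; the `#` syntax itself does not exist in D.

```d
import std.stdio;

// Hypothetical stand-in for the proposed A[start#count] notation:
// `count` elements beginning at index `start`.
T[] countSlice(T)(T[] a, size_t start, size_t count)
{
    return a[start .. start + count];
}

void main()
{
    int[] A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];
    writeln(A.countSlice(4, 10));        // prints [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    writeln(A.countSlice(0, 0).length);  // the post's A[0#0]: prints 0
}
```

The sketch relies on D's normal bounds checking to reject a count that runs past the end of the array.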


4. Thousands of lines of D code have already been written; we shouldn't change it now, for legacy reasons.

While this is the same reason C and C++ stayed backwards compatible, the
proportions are completely different: thousands versus hundreds of millions.
That compatibility caused many of the faults these languages now suffer from and is the
reason new languages (Java, D, etc.) were designed.
The language is still fresh and evolving. Changing it now might prove much
more fruitful than waiting until it gets popular and actually has millions of lines of
complex code written.

On this note, I suggest adding a mandatory "language version" declaration at the top of source files so that the compiler can act accordingly as the language and libraries evolve. For example, "version 0.56" has a certain feature, while "version 0.6" still has that feature but maybe with a different meaning (such as this post's topic). That way, even if the language changes the meaning of expressions, old code can be compiled appropriately. Of course, if the version declaration doesn't appear, the compiler will fall back to the version just before versioning was added.


I hope very much that this will help make D a better language for all of us.


October 23, 2004
I used to be a vocal proponent of this very same thing.  Walter's response to me, basically, was that it was designed that way because it made the code easier to write, more often.  He said that it's more common for you to know the index of the element after the last than it is for you to know the index of the last one.  For instance, say you want to slice an array into two pieces, and you know the index where the second piece should start.  With Walter's current design, the code looks like this:
	char[] array = <whatever>;
	int sliceIndx = <whatever>;
	char[] slice1 = array[0..sliceIndx];
	char[] slice2 = array[sliceIndx..length];
With inclusive ranges, you have to add two extra "-1"s to the code:
	char[] slice1 = array[0..sliceIndx-1];
	char[] slice2 = array[sliceIndx..length-1];

At the time, I didn't believe him.  However, in my D experience since, I have to say that I think he is right.  It is far more common to use non-inclusive ranges than inclusive ones.

So, I'm a convert.  Yes, it looks confusing, and takes a little to learn.  But it's probably the best way to do things after all.
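
The two-piece split described in this post can be sketched in current D as follows. The concrete string and index are made-up values standing in for the `<whatever>` placeholders, `splitAt` is a hypothetical helper, and `$` is the current spelling of `length` inside the brackets.

```d
import std.stdio;

// Hypothetical helper: split an array at index i using exclusive-end slices.
// The same index i ends the first piece and starts the second.
T[][2] splitAt(T)(T[] a, size_t i)
{
    return [a[0 .. i], a[i .. $]];
}

void main()
{
    char[] array = "hello world".dup;  // assumed sample data
    size_t sliceIndx = 5;              // assumed split point

    auto parts = splitAt(array, sliceIndx);

    // With an exclusive end, lengths fall out without any "-1":
    assert(parts[0].length == sliceIndx);
    assert(parts[1].length == array.length - sliceIndx);
    assert(parts[0] ~ parts[1] == array);

    // An empty slice is simply start == end.
    assert(array[sliceIndx .. sliceIndx].length == 0);

    writeln(parts[0], "|", parts[1]);  // prints hello| world
}
```

The length of any slice a[m .. n] is n - m, which is why adjacent pieces share a boundary index with no adjustment.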

October 23, 2004
Russ Lewis wrote:
> I used to be a vocal proponent of this very same thing.  Walter's response to me, basically, was that it was designed that way because it made the code easier to write, more often.  He said that it's more common for you to know the index of the element after the last than it is for you to know the index of the last one.  For instance, say you want to slice an array into two pieces, and you know the index where the second piece should start.  With Walter's current design, the code looks like this:
>     char[] array = <whatever>;
>     int sliceIndx = <whatever>;
>     char[] slice1 = array[0..sliceIndx];
>     char[] slice2 = array[sliceIndx..length];
> With inclusive ranges, you have to add two extra "-1"s to the code:
>     char[] slice1 = array[0..sliceIndx-1];
>     char[] slice2 = array[sliceIndx..length-1];
> 
> At the time, I didn't believe him.  However, in my D experience since, I have to say that I think he is right.  It is far more common to use non-inclusive ranges than inclusive ones.
> 
> So, I'm a convert.  Yes, it looks confusing, and takes a little to learn.  But it's probably the best way to do things after all.
> 

I think that a solution to this should be possible. Why not let people decide for themselves whether to use inclusive or exclusive notation for slicing? The following should be possible to implement:

char[] slice1 = array[0 .. length];	// Exclusive end, the same thing as today
char[] slice2 = array[start : end];	// Inclusive end, a different operator

Could such a solution be the one you're looking for?

Regards,
Sjoerd
October 23, 2004
Tomer Altman wrote:

> While this is the same reason C and C++ stayed backwards compatible, the
> proportions are completely different: thousands versus hundreds of millions.
> That compatibility caused many of the faults these languages now suffer from and is the
> reason new languages (Java, D, etc.) were designed.
> The language is still fresh and evolving. Changing it now might prove much
> more fruitful than waiting until it gets popular and actually has millions of lines of
> complex code written.

Java chose exclusive ranges too, if that helps...

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#substring(int,%20int)
> public String substring(int beginIndex, int endIndex)
>   Returns a new string that is a substring of this string.
>   The substring begins at the specified beginIndex and extends to  the character
>   at index endIndex - 1. Thus the length of the substring is endIndex-beginIndex.

Some of us think that it's a *good thing*, just as
we like arrays to start from zero and not from one ?

--anders
October 24, 2004
I think the whole issue stems from the ridiculous notion that we start counting things at zero in programming languages.

It's completely counterintuitive unless you have been writing compilers and you know that:
char *p;
p[2] == *(p + 2)


With one-based indexes, the inclusive idea has more merit.
No need for a[length-1], just a[length] for the last item.

for( i=1; i<=length; i++ ) ... looks more readable to me than
for( i=0; i<length; i++)

This is definitely *not* a critique of Walter by any means, since he has made C familiarity a priority.  It's more that this has come to pass through the legacy of C.


My 0.02$ spend it wisely.

October 24, 2004
I see where you are coming from - Fortran and MATLAB both include the endpoint in slices (and they both use 1-based indexing instead of 0-based). Non-programmers tend to like that more than C-style. One area where including the endpoint makes sense is with custom containers like a sorted associative array - why should a slice of such an array need to know the key for the element after the desired slice? Similarly in a linked list the slice from one node to another should probably include the endpoint. The difference becomes important when items are added to the list - do they go into the slice or after the slice? In MinTL slicing by integers will exclude the endpoint and slicing by key or node will include the endpoint.
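
A minimal sketch of the key-based case, using a plain sorted array in current D. This is not MinTL's API; `sliceByKey` is a hypothetical helper built on `std.range.assumeSorted`. It shows that slicing by key can include the endpoint without the caller knowing which key comes after it.

```d
import std.range : assumeSorted;
import std.stdio;

// Hypothetical helper (not MinTL): slice a sorted key array by key,
// including both endpoints.
int[] sliceByKey(int[] sortedKeys, int firstKey, int lastKey)
{
    auto sorted = assumeSorted(sortedKeys);
    // The count of keys strictly below firstKey is the start index.
    size_t lo = sorted.lowerBound(firstKey).length;
    // Keys strictly above lastKey are cut off the end.
    size_t hi = sortedKeys.length - sorted.upperBound(lastKey).length;
    return lo <= hi ? sortedKeys[lo .. hi] : null;
}

void main()
{
    int[] keys = [10, 20, 30, 40, 50];
    writeln(sliceByKey(keys, 20, 40));  // by key, inclusive: prints [20, 30, 40]
    writeln(keys[1 .. 3]);              // by index, exclusive: prints [20, 30]
}
```

Internally the helper still produces an exclusive index range; the inclusive behaviour exists only at the key level.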

I think people in this newsgroup are a bit worn out right now, though, so I don't expect this topic to get much debate.

-Ben
October 24, 2004
David Medlock wrote:

> With one-based indexes, the inclusive idea has more merit.
> No need for a[length-1], just a[length] for the last item.
> 
> for( i=1; i<=length; i++ ) ... looks more readable to me than
> for( i=0; i<length; i++)

I think you meant: "for i := 1 to length do" as readable :-)

Since D uses C-style arrays, its exclusive indexing makes sense ?
(just as inclusive indexing would make sense with Pascal arrays)

And of course, for array loops, the "foreach" is excellent...

--anders
October 24, 2004
On Sat, 23 Oct 2004 21:38:46 -0400, David Medlock wrote:

> I think the whole issue stems from the ridiculous notion that we start counting things at zero in programming languages.
> 
> It's completely counterintuitive unless you have been writing compilers
> and you know that:
> char *p;
> p[2] == *(p + 2)
> 
> 
> With one-based indexes, the inclusive idea has more merit. No need for a[length-1], just a[length] for the last item.
> 
> for( i=1; i<=length; i++ ) ... looks more readable to me than
> for( i=0; i<length; i++)
> 
> This is definitely *not* a critique of Walter by any means, since he has made C familiarity a priority.  It's more that this has come to pass through the legacy of C.
> 
> 
> My 0.02$ spend it wisely.

I'm with you here too. I know that D's heritage does not permit it to use 1-based indexing so I'm not debating its pros and cons here.

I think of 0-based indexes as not really indexes at all but offsets from the beginning of the array to the element. I've been programming for more than 25 years, and a large part of that has been with C, and yet 1-based indexing always seems more natural to me. I now do a lot of programming with Euphoria and with Progress, both of which use 1-based indexing, and it is just easier to read/comprehend and to explain to normal people (not programmers!).

-- 
Derek
Melbourne, Australia
October 25, 2004
Ben Hinkle wrote:
> I see where you are coming from - Fortran and MATLAB both include the
> endpoint in slices (and they both use 1-based indexing instead of
> 0-based). Non-programmers tend to like that more than C-style. One
> area where including the endpoint makes sense is with custom
> containers like a sorted associative array - why should a slice of
> such an array need to know the key for the element after the desired
> slice? Similarly in a linked list the slice from one node to another
> should probably include the endpoint. The difference becomes important
> when items are added to the list - do they go into the slice or after
> the slice? In MinTL slicing by integers will exclude the endpoint and
> slicing by key or node will include the endpoint.
> 
> I think people in this newsgroup are a bit worn out right now, though,
> so I don't expect this topic to get much debate.
> 
> -Ben

I like the (often downtrodden) Pascal language a lot, because it allows you to set the range of your array.

I found some old Delphi code I wrote about two years ago, and it was perfectly readable.

-dm
October 31, 2004
"Sjoerd van Leent" <svanleent@wanadoo.nl> wrote in message news:cldcc1$2hue$1@digitaldaemon.com...
> I think that a solution to this should be possible. Why not let people decide for themselves whether to use inclusive or exclusive notation for slicing? The following should be possible to implement:
>
> char[] slice1 = array[0 .. length]; // The same thing
> char[] slice2 = array[start : end]; // Different operator
>
> Could such a solution be the one you're looking for?

While that would technically work, I suspect that there would be constant confusion over which was which.