November 08, 2002 Slicing

Hi,

As posted before, I'm working on a front end to zlib using the stream framework provided in phobos. I successfully managed to get gnu’s zlib library working using Digital Mars’s compiler on Win2k.

One can assume lots of binary array manipulation when dealing with compression and streams; however, I have constantly got strange results depending on the approach I used for distributing memory during compression. When using a static array and repeatedly calling the C library functions, the less memory I used the more my byte count was off (exactly one byte for each call).

After several hours of research, I determined this was due to the method of copying I chose… slicing. My understanding of arrays (I come from a strong C background) for D was that they are 0 based as in C and that slicing was done based on the array operations as defined in the snippet I copied into the posting. I determined that when utilizing slicing operations, arrays are in fact both 0 and 1 based for the upper bound but always 0 based for the lower bound?

So I decided to write a quick test application (below) to test the theory and, well, the results are below.

So, can anybody tell me whether this is intentional, just the way slicing is really meant to be, and if so, what are the rules for the bounds of arrays, because one byte can throw off a cyclic redundancy check pretty quickly.

This is the first time I’ve actually used slicing in my work, so maybe I am just dumb to the way it’s supposed to work. My two cents, just keep it 0 based all the way!

Not trying to flame, just trying to get it right,
Karim Sharif

Snippet from the D lang. spec:

Array Operations

In general, (a[n..m] op e) is defined as:

    for (i = n; i < m; i++)
        a[i] op e;

So, for the expression:

    a[] = b[] + 3;

the result is equivalent to:

    for (i = 0; i < a.length; i++)
        a[i] = b[i] + 3;

Simple test application to test theory (test.d):

    import c.stdio;

    void main(){
        // regular char array
        char [] a = "This is a test string";
        char [] b;

        b = a[0..a.length];
        puts(b);
        b = a[];
        puts(b);
        b = a[0..21];    // <- But then why would this work ?
        puts(b);
        b = a[0..20];
        puts(b);
        puts(&a[20]);
        puts(&a[21]);    // <- This would be the over bound error
    }

Compiled and produced output of cmd:

    C:\dmd\src\dzlib>dmd test.d
    link short,,,user32+kernel32/noi

    C:\dmd\src\dzlib>test
    This is a test string
    This is a test string
    This is a test string
    This is a test string
    g
    Error: ArrayBoundsError short.d(18)
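For readers following along, here is a minimal sketch of the spec snippet and the bounds rules it implies. It assumes a current D compiler and Phobos (`std.stdio`), not the 2002 toolchain used in the post, and `addThreeLoop` is a hypothetical helper name introduced only for illustration.

```d
import std.stdio : writeln;

// Loop form from the spec snippet, with "op e" taken to be "= b[i] + 3".
// addThreeLoop is a hypothetical helper name used only for this sketch.
void addThreeLoop(int[] a, const int[] b, size_t n, size_t m)
{
    for (size_t i = n; i < m; i++)
        a[i] = b[i] + 3;
}

void main()
{
    int[] b = [10, 20, 30, 40];
    auto viaLoop = new int[b.length];
    auto viaOp = new int[b.length];

    addThreeLoop(viaLoop, b, 0, b.length);
    viaOp[] = b[] + 3;             // array operation over the whole array
    assert(viaLoop == viaOp);

    string s = "This is a test string";
    assert(s.length == 21);
    assert(s[0 .. s.length] == s); // the upper bound may equal the length
    assert(s[0 .. 21] == s[]);     // same slice, written with a literal bound
    assert(s[0 .. 20] == "This is a test strin"); // index 20 is left out
    assert(s[20] == 'g');          // the last valid index is length - 1
    // s[21] is rejected by the bounds check: an index must be < length,
    // while a slice's upper bound only has to be <= length.

    writeln(viaOp);
}
```

The loop condition `i < m` is the whole story: `m` itself is never visited, which is why `a[0..21]` is legal on a 21-element array while `a[21]` is not.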
November 08, 2002 Re: Slicing

Posted in reply to Karim Sharif

In article <aqgk59$19l2$1@digitaldaemon.com>, Karim Sharif says...
>// regular char array
>char [] a = "This is a test string";
>char [] b;

I think your problem with this example lies in the fact that D will null terminate string constants.
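A hedged sketch of what that null termination means for the test program: a C function such as `puts` receives only a pointer, so it ignores the slice length and reads up to the terminator that follows the literal. This assumes a current D compiler with `core.stdc` and `std.string.toStringz`; `lengthSeenByC` is a hypothetical helper, and calling it on a slice that is not backed by a null-terminated buffer would be undefined behavior.

```d
import core.stdc.stdio : puts;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: how many chars a C function would read from s,
// given only its pointer (the slice length is not passed along).
size_t lengthSeenByC(const(char)[] s)
{
    return strlen(s.ptr);
}

void main()
{
    string a = "This is a test string"; // literal: a '\0' follows the 21 chars
    string b = a[0 .. 20];               // shares a's memory, no terminator of its own

    assert(b.length == 20);              // D sees 20 chars
    assert(lengthSeenByC(b) == 21);      // C reads on until the literal's '\0'

    puts(b.ptr);                         // prints all 21 chars
    puts(b.toStringz);                   // copy with its own '\0': prints 20 chars
    assert(strlen(b.toStringz) == 20);
}
```

Under these assumptions, the four identical lines in the posted output say nothing about the slice bounds; they only show that `puts` never saw the lengths.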
November 08, 2002 Re: Slicing

Posted in reply to Karim Sharif

>After several hours of research, I determined this was due to the method of copying I chose… slicing. My understanding of arrays (I come from a strong C background) for D was that they are 0 based as in C and that slicing was done based on the array operations as defined in the snippet I copied into the posting.
First off, I didn't look terribly closely at your code (at work and in a hurry). So if the following doesn't apply, I apologize.
Keep in mind that the slicing syntax uses a "half open" range. The right hand side is actually non-inclusive. To get away from the pseudo-mathematician terms and get concrete:
array[5..10]
DOES contain a[5], a[6], a[7], a[8], and a[9]
does NOT contain a[10]
I think that would explain the feeling that the slice is 1 based for the upper bound. It isn't. It is 0 based, it just doesn't include the upper bound.
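The half-open rule can be checked directly. This is a small sketch assuming a current D compiler; the array contents are chosen so that each element equals its own index.

```d
import std.stdio : writeln;

void main()
{
    // Each element equals its index, so the slice contents show the indices taken.
    int[] array = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11];

    int[] s = array[5 .. 10];
    assert(s.length == 5);          // 10 - 5 elements
    assert(s == [5, 6, 7, 8, 9]);   // array[10] is not part of the slice
    assert(s[0] == array[5]);       // the lower bound is included
    assert(s[$ - 1] == array[9]);   // the last element is at index 9

    writeln(s);
}
```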
I personally hate this syntax. I find it to be misleading purely for the sake of convenience. But I've had the week-long war over it and lost, so that's just the way it is. It pretty much means that I won't ever use slices, but that's OK, I suppose.
The precise reason why I hate this syntax is that, when discussing ranges, there is an old and amazingly well established syntax for describing them. We learned it in grade school when we were learning the number line:
[5..10) means 5,6,7,8,9
[5..10] means 5,6,7,8,9,10
(5..10] means 6,7,8,9,10
(5..10) means 6,7,8,9
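For comparison, all four interval forms in this list can be expressed with D's existing half-open slice by shifting an endpoint by one. The sketch below assumes a current D compiler and Phobos; the four helper names are hypothetical and are not part of D or Phobos.

```d
import std.array : array;
import std.range : iota;

// Hypothetical helpers, one per interval form from the list above.
T[] closedOpen(T)(T[] a, size_t lo, size_t hi)   { return a[lo .. hi]; }         // [lo..hi)
T[] closedClosed(T)(T[] a, size_t lo, size_t hi) { return a[lo .. hi + 1]; }     // [lo..hi]
T[] openClosed(T)(T[] a, size_t lo, size_t hi)   { return a[lo + 1 .. hi + 1]; } // (lo..hi]
T[] openOpen(T)(T[] a, size_t lo, size_t hi)     { return a[lo + 1 .. hi]; }     // (lo..hi)

void main()
{
    int[] a = iota(12).array; // a[i] == i for i in 0 .. 11

    assert(closedOpen(a, 5, 10)   == [5, 6, 7, 8, 9]);
    assert(closedClosed(a, 5, 10) == [5, 6, 7, 8, 9, 10]);
    assert(openClosed(a, 5, 10)   == [6, 7, 8, 9, 10]);
    assert(openOpen(a, 5, 10)     == [6, 7, 8, 9]);
}
```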
I don't care whether or not D supports all 4 forms -- some of those would be impossible to distinguish from function calls without taking symbol table context into account. But D ONLY supports the first version, semantically, so I would prefer if it used the accepted standard way of expressing those semantics. And since the lead symbol is still '[', it should still be easy to parse and lex out.
Mac
November 08, 2002 Re: Slicing

Posted in reply to Patrick Down

Thanks, nice try, however…

Although my example code used a char[], the class I am working on only uses byte[] (except for the toString() method, but I'm not currently using that), and the phenomenon continues. I can only see this happening in char * or char[] casts, because the compiler is trying to make up for the fact that D strings and C strings are not really the same (and my code doesn’t use any constants in this respect either).

I would have been happy to post the entire file, but that seemed a little excessive in terms of asking readers to trudge through it all. Rather, send me an email and I'll send the code to you if you're really interested.

Thanks for the thought though,
Karim
Karim@SharifClan.com

In article <aqgltd$1bi3$1@digitaldaemon.com>, Patrick Down says...
>
>In article <aqgk59$19l2$1@digitaldaemon.com>, Karim Sharif says...
>>// regular char array
>>char [] a = "This is a test string";
>>char [] b;
>
>I think your problem with this example lies in the
>fact that D will null terminate string constants.
November 08, 2002 Re: Slicing

Posted in reply to Mac Reiter

> array[5..10]
> DOES contain a[5], a[6], a[7], a[8], and a[9]
> does NOT contain a[10]
Hi,
I'm a new user.
Anyway, I do agree with you; I found this syntax quite strange and counterintuitive.
It took me twice as long as it should have to understand this syntax.
I found this is very different from
a[N], which includes a[0], a[1], ... a[N-1].
a[i..j] should, in my mind, either contain all elements from i to j inclusive, or begin at i and include j elements.
I think more voices should complain about that.
November 09, 2002 Re: Slicing

Posted in reply to Mac Reiter
>array[5..10]
>
>DOES contain a[5], a[6], a[7], a[8], and a[9]
>
>does NOT contain a[10]
>
>I personally hate this syntax.
Yeah, it really stinks badly. I have to open my window now...
Mark
November 09, 2002 Re: Slicing

Posted in reply to Lloyd Dupont

We did, and it got vetoed by Walter. Check the older threads.

Sean

"Lloyd Dupont" <lloyd@galador.net> wrote in message news:aqhfjn$27n6$1@digitaldaemon.com...
> > array[5..10]
> > DOES contain a[5], a[6], a[7], a[8], and a[9]
> > does NOT contain a[10]
> Hi,
> I'm a new user.
> Anyway, I do agree with you; I found this syntax quite strange and counterintuitive.
> It took me twice as long as it should have to understand this syntax.
> I found this is very different from
> a[N], which includes a[0], a[1], ... a[N-1].
> a[i..j] should, in my mind, either contain all elements from i to j inclusive, or begin at i and include j elements.
> I think more voices should complain about that.
November 09, 2002 Re: Slicing

Posted in reply to Lloyd Dupont

I actually think that it is right and expected it to work as it does, because D has zero based arrays. It is a matter of convenience.

    int[10] a; // creates an array of ten items, 0..(10-1)

a[n..n+len] creates a slice of length 'len', much the same as java.lang.String's substring method.

It's just another one of those "depends who you are" questions that separate programmers from each other and programmers from mathematicians, along with whether arrays should start from index 0 or 1 and whether post-increment operators should be in the language.

"Lloyd Dupont" <lloyd@galador.net> wrote in message news:aqhfjn$27n6$1@digitaldaemon.com...
> > array[5..10]
> > DOES contain a[5], a[6], a[7], a[8], and a[9]
> > does NOT contain a[10]
> Hi,
> I'm a new user.
> Anyway, I do agree with you; I found this syntax quite strange and counterintuitive.
> It took me twice as long as it should have to understand this syntax.
> I found this is very different from
> a[N], which includes a[0], a[1], ... a[N-1].
> a[i..j] should, in my mind, either contain all elements from i to j inclusive, or begin at i and include j elements.
> I think more voices should complain about that.
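The convenience argument in this post can be illustrated with a short sketch, assuming a current D compiler. The values of `n` and `len` are arbitrary and chosen only for the example.

```d
void main()
{
    int[10] a;                    // ten items, valid indices 0 .. 9
    foreach (i, ref x; a)
        x = cast(int) i;          // make a[i] == i

    size_t n = 3, len = 4;
    int[] s = a[n .. n + len];
    assert(s.length == len);      // upper bound minus lower bound
    assert(s == [3, 4, 5, 6]);

    // The whole array uses its length as the upper bound, with no "- 1".
    assert(a[0 .. a.length].length == 10);

    // Same convention as Java's "abcdefghij".substring(3, 7), which yields "defg".
    assert("abcdefghij"[3 .. 7] == "defg");
}
```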
November 11, 2002 Re: Slicing (new argument, I think)

Posted in reply to Mike Wynn

In article <aqjl6t$214u$1@digitaldaemon.com>, Mike Wynn says...
>
>I actually think that it is right and expected it to work as it does,
>because D has zero based arrays.
>it is a matter of convenience.
>int[10] a; // creates an array of ten items, 0..(10-1)
>a[n..n+len] creates a slice of length 'len'
>much the same as java.lang.String's substring method.

But if what interests you is creating a slice based on its length, why not:

    a[n:len];
    a[n#len]; // arguably better - kinda looks like a +, also has denotation of
              // "number of things"

That saves the extra "n+" part. It also focuses on the length of the slice, rather than the endpoints, so it is a closer match to what all of the people who like the current slicing syntax seem to use.

What bothers me is that ".." denotes a range, not a length, so you shouldn't argue that "it works well for lengths". If a length-based syntax (like one of the above) is introduced, I would have absolutely *no* complaints, since it would be doing what it is supposed to do. But ranges have their own syntax (".."), and are orthogonal to concerns of length or of 0 or 1 based arrays.

The other thing that bothers me is that the justification for this slicing system *always* comes back as "it works well for lengths". Which means that novices are going to be taught that they can pull out a subarray of length n with the syntax:

    a[0..n]

and they're going to misunderstand what really happened. Because the explanation will focus on length, and how a substring of length n in a 0 based array will not include [n] itself, they will start thinking in terms of lengths, not ranges. Then, when they need to extract a subarray from further into the array, they're going to try:

    a[5..n]

and not understand why it failed. They didn't see the "full" version of the first syntax:

    a[0..0+n]

because *nobody* is going to write that. So they weren't aware of having to add the starting location to the length to get what they wanted.
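The a[5..n] pitfall described here can be reproduced in a few lines. This is a sketch assuming a current D compiler and Phobos; `sliceLen` is a hypothetical helper that stands in for the proposed a[start#len] form, which is not actual D syntax.

```d
import std.array : array;
import std.range : iota;

// Hypothetical stand-in for the proposed a[start#len]: a slice given by
// start and length instead of by two endpoints.
T[] sliceLen(T)(T[] a, size_t start, size_t len)
{
    return a[start .. start + len];
}

void main()
{
    int[] a = iota(12).array; // a[i] == i for i in 0 .. 11
    size_t n = 7;

    assert(a[0 .. n].length == n);     // length n only because the start is 0
    assert(a[5 .. n].length == 2);     // endpoints 5 and 7: two elements, not n
    assert(a[5 .. 5 + n].length == n); // the start must be added to the length
    assert(sliceLen(a, 5, n) == [5, 6, 7, 8, 9, 10, 11]);

    // With n < 5, a[5 .. n] has its lower bound above its upper bound,
    // which a bounds-checked build rejects at run time.
}
```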
I vote for switching to length-based slices, with the a[start#len] syntax. Anyone else in favor?

Mac

>It's just another one of those "depends who you are" questions that separate programmers from each other and programmers from mathematicians, along with whether arrays should start from index 0 or 1 and whether post-increment operators should be in the language.
>
>"Lloyd Dupont" <lloyd@galador.net> wrote in message news:aqhfjn$27n6$1@digitaldaemon.com...
>> > array[5..10]
>> > DOES contain a[5], a[6], a[7], a[8], and a[9]
>> > does NOT contain a[10]
>> Hi,
>> I'm a new user.
>> Anyway, I do agree with you; I found this syntax quite strange and counterintuitive.
>> It took me twice as long as it should have to understand this syntax.
>> I found this is very different from
>> a[N], which includes a[0], a[1], ... a[N-1].
>> a[i..j] should, in my mind, either contain all elements from i to j inclusive, or begin at i and include j elements.
>> I think more voices should complain about that.
November 11, 2002 Re: Slicing (new argument, I think)

Posted in reply to Mac Reiter

I'm in favor of that, but I'd also like range slices to use the a[first..last] syntax.

I hope to God they don't switch over from the "it works well for lengths" argument to the "there's too much legacy code written in D to change it now" argument. ;)

I think it's bad the way it is now, and needs changing. Anyone from any background is going to look at a[1..3] and assume it's talking about a range including entry 1, entry 3, and everything in between. That's intuitively how people think about ranges; that's how they work in other languages.

Sean

"Mac Reiter" <Mac_member@pathlink.com> wrote in message news:aqoobq$tmn$1@digitaldaemon.com...
> But if what interests you is creating a slice based on its length, why not:
>
> a[n:len];
> a[n#len]; // arguably better - kinda looks like a +, also has denotation of
> // "number of things"
>
> That saves the extra "n+" part. It also focuses on the length of the slice, rather than the endpoints, so it is a closer match to what all of the people who like the current slicing syntax seem to use.
>
> What bothers me is that ".." denotes a range, not a length, so you shouldn't argue that "it works well for lengths". If a length-based syntax (like one of the above) is introduced, I would have absolutely *no* complaints, since it would be doing what it is supposed to do. But ranges have their own syntax (".."), and are orthogonal to concerns of length or of 0 or 1 based arrays.
>
> The other thing that bothers me is that the justification for this slicing system *always* comes back as "it works well for lengths". Which means that novices are going to be taught that they can pull out a subarray of length n with the syntax:
>
> a[0..n]
>
> and they're going to misunderstand what really happened. Because the explanation will focus on length, and how a substring of length n in a 0 based array will not include [n] itself, they will start thinking in terms of lengths, not ranges. Then, when they need to extract a subarray from further into the array, they're going to try:
>
> a[5..n]
>
> and not understand why it failed. They didn't see the "full" version of the first syntax:
>
> a[0..0+n]
>
> because *nobody* is going to write that. So they weren't aware of having to add the starting location to the length to get what they wanted.
>
> I vote for switching to length-based slices, with the a[start#len] syntax.
> Anyone else in favor?
> Mac
Copyright © 1999-2021 by the D Language Foundation