Arrays, Slices, Cases (page 5) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » D » Arrays, Slices, Cases (page 5)

March 27, 2003

Re: Arrays, Slices, Cases

Posted by Helmut Leitner
in reply to Sean L. Palmer

Helmut Leitner

Posted in reply to Sean L. Palmer


"Sean L. Palmer" wrote:
> There are situations where it would cost you performance, but not many. "Helmut Leitner" <helmut.leitner@chello.at> wrote in message news:3E82FBF5.75FDADF4@chello.at...
> >
> >
> > "Sean L. Palmer" wrote:
> > >
> > > I disagree.  Nonzero-based arrays buy you convenience.  Now you don't
> have
> > > to remember to subtract the base, it'll do it automatically.
> >
> > Yes, but it will cost any user of an array a little bit of performance for the potential base correction. If D is seeking to be a successor to C and C++ in system and game programming, one has to be very careful with this.
> 
> As in C++, you don't pay for what you don't use.  It doesn't necessarily cost any performance anyway.
> 
> If your array is from 1..10, and starts at address 0x80000000, and each entry is 4 bytes long, the compiler just takes your index and does this to compute the address:
> 
> (index*4)+0x7ffffffc
> 
> If it were zero based, it'd do this:
> 
> (index*4)+0x80000000

And when this address in not available at compile time (e. g. an element of an dynamically allocate part of an object) or passed through function call interfaces - how will you do it then without loss of performance? I think that's impossible.

-- 
Helmut Leitner    leitner@hls.via.at
Graz, Austria   www.hls-software.com

March 27, 2003

Re: Arrays, Slices, Cases

Posted by Mike Wynn
in reply to Helmut Leitner

Mike Wynn

Posted in reply to Helmut Leitner

"Helmut Leitner" <leitner@hls.via.at> wrote in message news:3E833158.D568C839@hls.via.at...
>
>
> "Sean L. Palmer" wrote:
> > There are situations where it would cost you performance, but not many. "Helmut Leitner" <helmut.leitner@chello.at> wrote in message news:3E82FBF5.75FDADF4@chello.at...
> > >
> > >
> > > "Sean L. Palmer" wrote:
> > > >
> > > > I disagree.  Nonzero-based arrays buy you convenience.  Now you
don't
> > have
> > > > to remember to subtract the base, it'll do it automatically.
> > >
> > > Yes, but it will cost any user of an array a little bit of performance for the potential base correction. If D is seeking to be a successor
to
> > > C and C++ in system and game programming, one has to be very careful with this.
> >
> > As in C++, you don't pay for what you don't use.  It doesn't necessarily cost any performance anyway.
> >
> > If your array is from 1..10, and starts at address 0x80000000, and each entry is 4 bytes long, the compiler just takes your index and does this
to
> > compute the address:
> >
> > (index*4)+0x7ffffffc
> >
> > If it were zero based, it'd do this:
> >
> > (index*4)+0x80000000
>
> And when this address in not available at compile time (e. g. an element of an dynamically allocate part of an object) or passed through function call interfaces - how will you do it then without loss of performance? I think that's impossible.
>

in C++ any array can be rebased thus;

inline template<T> T * rebase( T * ar, int base ) {
    return &(ar[-base]);
}

or C example
int * myarray = malloc( sizeof(int) * 80 );
....
int * onebase =  &myarray[-1]; // or myarray - 1;

onebase[1] is now myarray[0] :)

in a func call
int func( int * ar ) {
    ar -= 1;  // ar now 1 based.
    .....
}

March 28, 2003

Re: Arrays, Slices, Cases

Posted by Mark Evans
in reply to Helmut Leitner

Mark Evans

Posted in reply to Helmut Leitner

Helmut Leitner says...

>And when this address in not available at compile time ...
> - how will you do it then without
>loss of performance? I think that's impossible.

That tradeoff the programmer should be allowed to make.  I suspect it's wrong anyway.  The base value needs computation only once - dynamically or otherwise - and thereafter may be stored.  Each (random) array access involves at minimum one addition to find the desired element's memory address.  Using a different base adds not a whit of extra work.

We had a similar debate about negative indices.  Walter was against them for performance reasons that are easily addressed if not completely fictitious. Sean's motto 'pay for what you use' is apropos.

Farmer says...
>In D, there are many ways to express concepts, e.g. functions, classes, D-structs, templates. I believe that non-zero based arrays are not really required to express concepts in away that suitable for the problems, programmers have to solve.

Functional languages owe much of their fabulous productivity to array (list) handling capabilities.  An array can hold virtually anything, not just numbers. It can have more than one dimension to associate objects on different axes en masse.  C++ folks unfamiliar with such paradigms know little of what they're missing, so I understand these counteroffers, but there is no substitute.  The ability to pick apart, rearrange, index, map across, thread, and otherwise sling arrays around - and morph them into new forms - is a truly expressive and compact way to write tons of code with performance results comparable to C and even better, depending on your C programmer and his available time for optimizing nested inner loops and chasing down off-by-one errors.

Mark

March 28, 2003

Re: Arrays, Slices, Cases

Posted by Ilya Minkov
in reply to Helmut Leitner

Ilya Minkov

Posted in reply to Helmut Leitner

Helmut Leitner wrote:
> 
> "Sean L. Palmer" wrote:
> 
>>I disagree.  Nonzero-based arrays buy you convenience.  Now you don't have
>>to remember to subtract the base, it'll do it automatically.
> 
> 
> Yes, but it will cost any user of an array a little bit of performance
> for the potential base correction. If D is seeking to be a successor to
> C and C++ in system and game programming, one has to be very careful
> with this. 

No, it won't. In Pascal, functions disliked taking arrays of unknown size. You could only make them take an array[SomeConstant..] of SomeType, then you could process this array as SomeConstant-based. This function would only have a run-time specification of an array length, but not of a base, and thus wouldn't probably be any slower than a 0-based array. I'm not sure whether it was allowed to assume that arrays had the same base, or type checking had caught such misuse.

-i.

March 28, 2003

Re: Arrays, Slices, Cases

Posted by Helmut Leitner
in reply to Ilya Minkov

Helmut Leitner

Posted in reply to Ilya Minkov

Ilya Minkov wrote:
> 
> Helmut Leitner wrote:
> >
> > "Sean L. Palmer" wrote:
> >
> >>I disagree.  Nonzero-based arrays buy you convenience.  Now you don't have to remember to subtract the base, it'll do it automatically.
> >
> >
> > Yes, but it will cost any user of an array a little bit of performance for the potential base correction. If D is seeking to be a successor to C and C++ in system and game programming, one has to be very careful with this.
> 
> No, it won't. In Pascal, functions disliked taking arrays of unknown size. You could only make them take an array[SomeConstant..] of SomeType, then you could process this array as SomeConstant-based. This function would only have a run-time specification of an array length, but not of a base, and thus wouldn't probably be any slower than a 0-based array. I'm not sure whether it was allowed to assume that arrays had the same base, or type checking had caught such misuse.

You are right about Pascal. It was a major PITA that it was impossible even to write a generic function to e.g . calculate the average of an array of arbitrary size because the array size was part of the parameter type and therefore part of the function definition.

But I don't think that you can compare this. If you want a base != 0, than you have to pay for it. Maybe not much, but you have to pay.

If you don't offset the array pointer you will have to add the offset when accessing the array elements.

If you offset the array pointer you need to reset it before you free it's dynamical memory. So you have to store either the offset or the pointer to the allocated memory. If you want to implement range checking, you will have to either use the offset, or store separate hi and lo bounds. Anyway things become more complicated and it won't make a single C, C++ or Java programmer feel better about D.

-- 
Helmut Leitner    leitner@hls.via.at
Graz, Austria   www.hls-software.com

March 28, 2003

Re: Arrays, Slices, Cases

Posted by Ilya Minkov
in reply to Helmut Leitner

Ilya Minkov

Posted in reply to Helmut Leitner

Helmut Leitner wrote:
> 
> Ilya Minkov wrote:
>>No, it won't. In Pascal, functions disliked taking arrays of unknown
>>size. You could only make them take an array[SomeConstant..] of
>>SomeType, then you could process this array as SomeConstant-based. This
>>function would only have a run-time specification of an array length,
>>but not of a base, and thus wouldn't probably be any slower than a
>>0-based array. I'm not sure whether it was allowed to assume that arrays
>>had the same base, or type checking had caught such misuse.
> 
> 
> You are right about Pascal. It was a major PITA that it was impossible
> even to write a generic function to e.g . calculate the average of an array of arbitrary size because the array size was part of the parameter
> type and therefore part of the function definition.

It was possible. Only the low bound was fixed, and the length was passed implicitly to the function.

> But I don't think that you can compare this. If you want a base != 0,
> than you have to pay for it. Maybe not much, but you have to pay.

Let the base be in the calling code, and let the function accept a dynamic array "as if" it was placed on the certain offset. Consider: this all offset thingy is good for program readability. And while length usually depends on the run-time condition, the base usually depends upon readability considerations for algorithms. For example, in a string you usually have to iterate from 0 to length-1. That's what all the C guys do, they even have a "for" which allows to perfectly hide this fact or the oppsite so that a bug is not too easy to see. But isn't it neater to reference the string as 1-based?

> If you don't offset the array pointer you will have to add the offset
> when accessing the array elements.

Pointer offset can only be done as a short-term optimisation. Actually, i don't even think it's requiered, since almost every memory acess gets additive constants in algorithms.

Another example where it is useful, is a syntactic sugar for acessing arrays of constant size, like in Sean's example.

You can also make the function programmer handle it: in the upper example, he specifies the lower bound, but does not specify the upper bound. It is his responsibility to retrieve the upper bound and take it into account. The same can be done with a lower bound. The bad thing is that it would probably mean expanding the dynamic array specification, or would requiere extended function annotation by the compiler.

However, the problem may be of a more general nature, and might be better solved in a more generic way, as was pointed out by Mark. I'm not sure i exactly understand what specific features/ solution he means though.

-i.

March 28, 2003

Re: Arrays, Slices, Cases

Posted by Burton Radons
in reply to Carlos Santander B.

Burton Radons

Posted in reply to Carlos Santander B.

Carlos Santander B. wrote:
> "Pavel Minayev" <evilone@omen.ru> escribió en el mensaje
> news:a5jben$16o8$1@digitaldaemon.com...
> | "Walter" <walter@digitalmars.com> wrote in message
> | news:a56tc4$1534$2@digitaldaemon.com...
> |
> | > Having lower bounds specifiable will work with D (and even with C), but
> in
> | > my decades (!) of programming I've never found a use for it. I came to C
> | > from Basic, FORTRAN, and Pascal. I had some initial trouble getting used
> | to
> | > 0 based rather than 1 based, but never looked back. 0 based looked more
> | > 'right' to me.
> |
> | Well, there are actually cases where you'd prefer some base other than 0.
> | As you've seen, many people here consider 1 to be more suitable, and
> | I understand them... also there are some other cases, for example, suppose
> | you have an array of year income for 1990-2000, in Pascal you'd probably
> | declare it as "array[1990 .. 2000] of integer", and then index it like
> | income[1995], letting the compiler do his job and insert all the necessary
> | decrements; the result is clean code, easy to read and maintain. In C,
> | you have to do it all yourself, and probably define some const base =
> 1990,
> | and clutter all your code with things like income[1995 - base].
> |
> | After all, it's as simple as subtracting the base from the index,
> | a single SUB... ain't it worth the thing?
> |
> |
> 
> No one ever answered to this one. It seems very clever to me.

Putting aside the issue of implementation details, nobody has produced any practical examples for it, and I've never seen any good use of it in Pascal, either; it's always used static assumptions about the environment code is to be used in, as with Pavel's pseudo-example above.

March 28, 2003

Re: Arrays, Slices, Cases

Posted by Mark Evans
in reply to Helmut Leitner

Mark Evans

Posted in reply to Helmut Leitner

The array record can include both a real base pointer and a pseudo-base pointer (incorporating the offset). The pseudo-base pointer is computed only once, whether dynamically or statically.  Array access cost is identical with either pointer.

Beyond that, the array record could include an 'end' pointer making negative indexing equally simple.  It would be adjusted every time the length property is adjusted.

Mark


Helmut Leitner says...
>
>If you want a base != 0,
>than you have to pay for it. Maybe not much, but you have to pay.
>
>If you don't offset the array pointer you will have to add the offset when accessing the array elements.
>
>If you offset the array pointer you need to reset it before you free it's dynamical memory. So you have to store either the offset or the pointer to the allocated memory. If you want to implement range checking, you will have to either use the offset, or store separate hi and lo bounds. Anyway things become more complicated and it won't make a single C, C++ or Java programmer feel better about D.
>
>-- 
>Helmut Leitner    leitner@hls.via.at
>Graz, Austria   www.hls-software.com

March 29, 2003

Re: Arrays, Slices, Cases

Posted by Karl Bochert
in reply to Farmer

Karl Bochert

Posted in reply to Farmer

On Wed, 26 Mar 2003 23:45:19 +0000 (UTC), Farmer <itsFarmer.@freenet.de> wrote:
> "Carlos Santander B." <carlos8294@msn.com> wrote in news:b5k815$11ia$2@digitaldaemon.com:
> 
> > "Pavel Minayev" <evilone@omen.ru> escribió en el mensaje news:a5jben$16o8$1@digitaldaemon.com...
> >| "Walter" <walter@digitalmars.com> wrote in message
> >| news:a56tc4$1534$2@digitaldaemon.com...
> >|
> >| > Having lower bounds specifiable will work with D (and even with C),
> >| > but
> > in
> >| > my decades (!) of programming I've never found a use for it. I came
> >| > to C from Basic, FORTRAN, and Pascal. I had some initial trouble
> >| > getting used
> >| to
> >| > 0 based rather than 1 based, but never looked back. 0 based looked
> >| > more 'right' to me.
> >|
> >| Well, there are actually cases where you'd prefer some base other
> >| than 0. As you've seen, many people here consider 1 to be more
> >| suitable, and I understand them... also there are some other cases,
> >| for example, suppose you have an array of year income for 1990-2000,
> >| in Pascal you'd probably declare it as "array[1990 .. 2000] of
> >| integer", and then index it like income[1995], letting the compiler
> >| do his job and insert all the necessary decrements; the result is
> >| clean code, easy to read and maintain. In C, you have to do it all
> >| yourself, and probably define some const base =
> > 1990,
> >| and clutter all your code with things like income[1995 - base].
> 
> Having arrays with base 1 and zero-based index is likely to become a maintainer's nightmare: Whenever you see an array, you must look at it's declaration (or wait for a tooltip from you IDE) and check whether is zero- based or not. Array operations become much more bug prone. Once your brain adjusted to zero-based indices, you can easily write bug-free code for arrays or verify that a given code is bug-free. But switching between different bases is very difficult.
> 
> Zero-based indices are favoured over 1-based indices for implementations reasons. E.g. with a Byte you can address 256 array elements instead of only 255.
> 
> In D, there are many ways to express concepts, e.g. functions, classes, D- structs, templates. I believe that non-zero based arrays are not really required to express concepts in away that suitable for the problems, programmers have to solve.
> 
> 
> Farmer.

I always get annoyed when people refer to '1-based' arrays. An array whose first element is arr[1] is a very special beast. Its elements are being labled with their position in the array. Access is by ordinal rather than cardinal.

A quick test: Where is the character 's' in the word 'test'?

If you answered "the character offset 2 from the start" then maybe '0-based' arrays make sense. If you said "the third character" you ought to consider having arrays accessed by ordinals.

Karl


It is highly misleading to think of '0-based' and '1-based' arrays.  What they really are
is arrays accessed by index, and arrays accessed by position. An array may have
any arrangment of indices, but it always has a first position.
I am against 0-based arrays. I am against 1-based arrays. I am for positional arrays.

Last I looked, D had come up with an absolutely horrible approach to specifiying slices, brought on, I'm sure,  by this 'based' concept.

March 29, 2003

Re: Arrays, Slices, Cases

Posted by Mark Evans
in reply to Karl Bochert

Mark Evans

Posted in reply to Karl Bochert

Karl Bochert complains correctly:
>If you answered "the character offset 2 from the start" then maybe '0-based' arrays make sense.

Karl - welcome to the universe of design flaws we call the C programming language.  Confounding arrays with pointers with strings produces ... C.  Pull one string, and the whole building collapses.

Disentangling these concepts invites more than one answer to your question.  The Icon language defines string positions between characters, while general arrays are "1-based."  Icon is exceptionally adept at string processing.

From the Icon Handbook, "Icon's strings and tables make text processing much more convenient than in languages that only provide characters and character arrays....Unlike most languages where strings are implemented as arrays of characters, Icon provides strings as a primitive data type. They can be of any length. There are extensive facilities for searching and editing strings."

Unicode with its variable-byte-length encodings makes disentangling strings from arrays more urgent still.  D should promote strings to primitive type status with dedicated constructs.  Icon had it right a long time ago.  D is struggling with Unicode because the confused C model is not amenable to Unicode.  As a primitive type, Unicode string complexities would vanish under the hood.  I'm not holding my breath, but if you ask me, that is how to do strings right. (Those in love with C strings could still declare arrays of char.)

http://www.toolsofcomputing.com/IconHandbook/ http://unicon.sourceforge.net/index.html

Mark

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation