Possible rewrite of array operation spec - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Possible rewrite of array operation spec

Thread overview

Possible rewrite of array operation spec
Feb 15, 2005 Stewart Gordon
Feb 15, 2005 xs0
Feb 15, 2005 Stewart Gordon
Feb 15, 2005 xs0
Feb 16, 2005 Stewart Gordon
Feb 16, 2005 xs0
Feb 16, 2005 Stewart Gordon
Feb 16, 2005 xs0
Feb 17, 2005 Stewart Gordon
Feb 17, 2005 xs0
Feb 17, 2005 Stewart Gordon
Feb 17, 2005 xs0
Feb 18, 2005 Stewart Gordon
Feb 18, 2005 xs0
Feb 18, 2005 Stewart Gordon
Feb 18, 2005 xs0
Feb 21, 2005 Georg Wrede
Feb 18, 2005 Stewart Gordon
Feb 21, 2005 Stewart Gordon
Feb 17, 2005 Regan Heath
Feb 15, 2005 xs0
Feb 15, 2005 Regan Heath
Feb 16, 2005 Stewart Gordon
Feb 16, 2005 Regan Heath
Feb 17, 2005 Stewart Gordon
Feb 17, 2005 Regan Heath
Feb 17, 2005 Regan Heath
Feb 15, 2005 pragma
Feb 16, 2005 Unknown W. Brackets
Feb 16, 2005 Stewart Gordon
Feb 16, 2005 pragma
Feb 16, 2005 Regan Heath
Feb 17, 2005 Norbert Nemec
Feb 17, 2005 Regan Heath
Feb 18, 2005 Stewart Gordon
Feb 18, 2005 Norbert Nemec
Feb 16, 2005 Norbert Nemec
Feb 16, 2005 Stewart Gordon
Feb 17, 2005 Norbert Nemec
Feb 17, 2005 Dave
Feb 17, 2005 Norbert Nemec
Feb 17, 2005 Dave

February 15, 2005

Possible rewrite of array operation spec

Posted by Stewart Gordon

Stewart Gordon

This'll probably get people asking the prospect of array operations for 1.0 to be resurrected, but still....

Here is a possible specification for array operations that I feel is better-defined than the one in the current out-of-date spec.  Of course, there are still some open questions, which I've put at the bottom.


Array operations
----------------
Arithmetic and bitwise operators are defined on array operands.  An expression involving an array evaluates to a new array in which the operator has been applied to the elements of the operands in turn.

In essence, when an expression contains more than one array operation, a new array is created to hold the result of each operation.  However, a quality implementation will optimize the evaluation of the expression to eliminate temporaries where possible.

Unary operations
~~~~~~~~~~~~~~~~
For the unary operators +, - and ~, the expression evaluates to a new array containing the result of applying the operator to each element. For example, with the declaration

	int[] x, y;

then the statement

	y = -x;

is simply equivalent to

	y = new int[x.length];
	for (int i = 0; i < y.length; i++) {
		y[i] = -x[i];
	}

Binary operations
~~~~~~~~~~~~~~~~~
The binary operations supported are +, -, *, /, %, &, |, ^, <<, >> and >>>.

If the two arrays are of the same dimension and of compatible types, then the expression evaluates to a new array in which each element is the result of applying the operator to corresponding elements of the operands.  For example, with the declarations

    int[] x, y, z;

the statement

    z = x + y;

is equivalent to

	z = new int[x.length];
	for (int i = 0; i < z.length; i++) {
		z[i] = x[i] + y[i];
	}

Both operands must be of the same length.  If they are not, an ArrayBoundsError is thrown.

For higher dimensions, this definition is applied recursively.  For example, with

    int[][] x, y, z;

the statement

    z = x * y;

is equivalent to

	z = new int[x.length];
	for (int i = 0; i < z.length; i++) {
		z[i] = x[i] * y[i];
	}

which is in turn equivalent to

	z = new int[x.length];
	for (int i = 0; i < z.length; i++) {
		z[i] = new int[x[i].length];
		for (int j = 0; j < z[i].length; j++) {
			z[i][j] = x[i][j] * y[i][j];
		}
	}

If the operands do not match in dimension, then the operator is applied to each element of the higher-dimension operation with the whole of the lower-dimension one.  For example, with

	int[] x, z;
	int y;

the statement

	z = x - y;

is equivalent to

	z = new int[x.length];
	for (int i = 0; i < z.length; i++) {
		z[i] = x[i] - y;
	}

Similarly,

	z = y - x;

is equivalent to

	z = new int[x.length];
	for (int i = 0; i < z.length; i++) {
		z[i] = y - x[i];
	}

This definition is applied recursively if the dimensions differ by two or more.

Assignment operations
~~~~~~~~~~~~~~~~~~~~~
When x is an array, the assignment

	x op= y;

is taken as equivalent to

	x = x op y;

whether y is an array of matching dimension, an array of lower dimension or a scalar.  Thus the operation creates a new array and assigns it to x.  If a sliced lvalue is used, the array is modified in place, so that

	x[] op= y;

is equivalent to

	x[] = x[] op y;

The preincrement and predecrement operators are handled in the same way.

User-defined types
~~~~~~~~~~~~~~~~~~
A class, struct or union type may have operators overloaded with array types as parameters.  To avoid conflicts between overloaded operators and array operations, binary operations involving both array and user-defined types are resolved as follows:

1. The normal operator overloading rules are applied.
2. If no match is found, the array operation rules are applied until both operands are reduced to scalar type; operator overloading rules are then applied to the result.
3. If the expression still does not resolve, it is an error.


Open questions
~~~~~~~~~~~~~~
Should postincrement and postdecrement be allowed?  How should they be handled?

Should we generalise the concept to function calls?  If so, I guess that overload resolution would work in much the same way as for operations on user-defined types.

If we do allow it on function calls, should we allow it to work on functions of three or more parameters?  In this case, the highest-dimension argument would be reduced to the dimension of the second highest, and then these two reduced together to match the third highest, and so on.

Of course, these questions raise one more: how easy or hard would these ideas be to implement?


Any thoughts?

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on the 'group where everyone may benefit.

February 15, 2005

Re: Possible rewrite of array operation spec

Posted by xs0
in reply to Stewart Gordon

xs0

Posted in reply to Stewart Gordon

Nice job on the spec (especially the different dimensions handling). I do feel, however, that it should not be necessary to always create new arrays for result -- I'd say that if the left-hand side is non-null and of the appropriate size (i.e. exactly equal to size of result), it need not be reallocated. Consider just iterating through a list of vectors to find a specific one (for example, by multiplying each with some constant vector (i.e. dot product))- you wouldn't want a gazillion arrays created..

> Open questions
> ~~~~~~~~~~~~~~
> Should postincrement and postdecrement be allowed?  How should they be handled?

I don't think so.. It would look weird and you can achieve the same by typing a+=1..


> Should we generalise the concept to function calls?  If so, I guess that overload resolution would work in much the same way as for operations on user-defined types.

Well, AFAIK, the whole point of having language-supported ops is run-time efficiency - you can otherwise easily wrap your arrays in classes like Vectors and Matrices and overload ops. Functions just don't make much sense in that respect (assuming you mean calling func(int, int) for each element of int[], int[]?)

> Of course, these questions raise one more: how easy or hard would these ideas be to implement?

No idea :)

If this is done, I'd also suggest adding a few util functions to arrays, like .min, .minIndex, .max, .maxIndex, .sum and .vectorLength (as in sqrt(a[0]^2 + a[1]^2 + ...)). Dot product is then simply (a*b).sum, and the compiler can detect this is being done and not even produce the result array..


xs0

February 15, 2005

Re: Possible rewrite of array operation spec

Posted by Stewart Gordon
in reply to xs0

Stewart Gordon

Posted in reply to xs0

xs0 wrote:
> Nice job on the spec (especially the different dimensions handling). I do feel, however, that it should not be necessary to always create new arrays for result -- I'd say that if the left-hand side is non-null and of the appropriate size (i.e. exactly equal to size of result), it need not be reallocated. Consider just iterating through a list of vectors to find a specific one (for example, by multiplying each with some constant vector (i.e. dot product))- you wouldn't want a gazillion arrays created..

That's basically what I said here:

>> However, a quality implementation will optimize the evaluation of the expression to eliminate temporaries where possible.

The essence of the essence is to keep the spec relatively simple and the semantics clear.

<snip>
> If this is done, I'd also suggest adding a few util functions to arrays, like .min, .minIndex, .max, .maxIndex, .sum and .vectorLength (as in sqrt(a[0]^2 + a[1]^2 + ...)). Dot product is then simply (a*b).sum, and the compiler can detect this is being done and not even produce the result array..

I started to mention this idea before.

http://www.digitalmars.com/drn-bin/wwwnews?D/21671

But calling them .min and .max is asking for confusion with the minimum and maximum allowable values of a type.  Maybe .minValue and .maxValue, making them parallel with the Index counterparts?

And if there is a tie for maximum or minimum, which index should minIndex and maxIndex return?  The first?  The last?  Any old one?  What would be worth any difference in computational cost?

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on the 'group where everyone may benefit.

February 15, 2005

Re: Possible rewrite of array operation spec

Posted by Regan Heath
in reply to xs0

Regan Heath

Posted in reply to xs0

On Tue, 15 Feb 2005 19:15:45 +0100, xs0 <xs0@xs0.com> wrote:
>> Open questions
>> ~~~~~~~~~~~~~~
>> Should postincrement and postdecrement be allowed?  How should they be handled?
>
> I don't think so.. It would look weird and you can achieve the same by typing a+=1..

I think they should be allowed, I don't think they look weird, and I think they're useful eg.

int[] a;
int[] b;
int[] c;

b[] = a[]--;       //assigns b[x] = a[x], then a[x] = a[x]-1;
b[] = c[] + a[]--; //assigns b[x] = c[x] + a[x], then a[x] = a[x]-1;

Basically it's done as 2 operations, same as:

int i;
int j;

j = i--;

Regan

February 15, 2005

Re: Possible rewrite of array operation spec

Posted by xs0
in reply to Stewart Gordon

xs0

Posted in reply to Stewart Gordon


>> [snip - don't reallocate result array if it is already the 
>> correct size (and not null, of course)]
> 
> That's basically what I said here:
> 
>>> However, a quality implementation will optimize the evaluation of the expression to eliminate temporaries where possible.

But it's not just an issue of optimization - if the spec says it must be allocated, you can't change the semantics of that by optimization. Consider two objects holding a reference to the same array; if one of them does a vector op with the result stored in the variable holding the reference and if the spec says you must reallocate, they mustn't have the same reference after the op is finished (or actually just after it starts), and that can't be optimized away (at least according to such spec)..


> But calling them .min and .max is asking for confusion with the minimum and maximum allowable values of a type.  Maybe .minValue and .maxValue, making them parallel with the Index counterparts?

or .minEl and .maxEl (El for element)? and perhaps min/maxElIndex would be more obvious, although verbosity doesn't seem to be preferred in D :)


> And if there is a tie for maximum or minimum, which index should minIndex and maxIndex return?  The first?  The last?  Any old one?  What would be worth any difference in computational cost?

When implemented by hand, this is usually the first matching index, I think, so that'd be an acceptable spec. Any old one is not very good - if it's the first (or the last), you can easily check for multiples with the slice syntax, otherwise you can't..


xs0

February 15, 2005

Re: Possible rewrite of array operation spec

Posted by pragma
in reply to Stewart Gordon

pragma

Posted in reply to Stewart Gordon

In article <cussnt$fe8$1@digitaldaemon.com>, Stewart Gordon says...
>
>Should we generalise the concept to function calls?  If so, I guess that overload resolution would work in much the same way as for operations on user-defined types.

That makes sense to me.  Just allow arrays of function pointers, delegates and anything with opCall() defined to be callable en masse.

At a minimum, it would make event-based (1 to n messaging) programming a snap.

I would also think that it could be extended to object members and methods as well?

> interface Foo{
>  int foo();
>  void bar();
> }

> Foo[] test;
> test.foo(); // calls .foo() for each object
> int[] result = test.bar(); // calls .bar() for each object, result is an array.

Seems exotic at first, but it does lend to D's mantra of being 'context free'.

>If we do allow it on function calls, should we allow it to work on functions of three or more parameters?  In this case, the highest-dimension argument would be reduced to the dimension of the second highest, and then these two reduced together to match the third highest, and so on.

I'm not sure I follow you here.  Sounds like you're talking about using the function syntax for mapping to the array dimension space... is this correct?

>
>Of course, these questions raise one more: how easy or hard would these ideas be to implement?
>

Not hard, but would require a certain familiarity with the frontend that only Walter and an intrepid few programmers share.  There are also some holes, that would have to be handled, if one adhered to just this spec by itself.

It would be much easier to implement the above as templates first (as a proof-of-concept) simply because each of your examples are merely transforms based on type and dimension.  It could help with refining the spec.

One question: What about associative arrays?

- EricAnderton at yahoo

February 15, 2005

Re: Possible rewrite of array operation spec

Posted by xs0
in reply to Stewart Gordon

xs0

Posted in reply to Stewart Gordon

One more thing I just thought of - there's quite a difference between these:

a = b;  // copy _reference_
a = -b; // copy _contents_ and negate

Which might not be good.. However, using slice syntax doesn't quite look that good:

a[] = b[] + c[] * d[]; // abcd are somewhat lost in there

Perhaps a slightly modified syntax could be used? Something like

a <-- b + c * d;

I'm sure not many people will like that :) It does clearly differentiate between the usual meaning of = and the looping on elements done with arrays, though.. Any thoughts?


xs0

February 16, 2005

Re: Possible rewrite of array operation spec

Posted by Unknown W. Brackets
in reply to pragma

Unknown W. Brackets

Posted in reply to pragma

>>Foo[] test;
>>test.foo(); // calls .foo() for each object
>>int[] result = test.bar(); // calls .bar() for each object, result is an array.
> 
> 
> Seems exotic at first, but it does lend to D's mantra of being 'context free'.  

That, and the events, looks very nice in my opinion.  Then again, would it conflict with the current array overloading?

int[] bar(Foo[] test);

That's the only flaw I can see in it at this point.  But it would make arrays of objects much more useful imho.

> One question: What about associative arrays?

Well, if the two had matching sets of keys it might be possible:

int[int] x, y;

x = -y;

foreach (int /* or char, etc. */ k; x.keys)
   x[k] = -y[k];

The question is, does this have to check that the keys all exist in both arrays, too?  Meaning, x.keys.length == y.keys.length and that none of the evaluations cause an exception?  Seems rather dangerous, to me...

-[Unknown]

February 16, 2005

Re: Possible rewrite of array operation spec

Posted by Stewart Gordon
in reply to xs0

Stewart Gordon

Posted in reply to xs0

xs0 wrote:
>> That's basically what I said here:
>>
>>>> However, a quality implementation will optimize the evaluation of the expression to eliminate temporaries where possible.
> 
> But it's not just an issue of optimization - if the spec says it must be allocated, you can't change the semantics of that by optimization.

Well, the spec doesn't say that it _must_ be allocated.  The spec states what rather than how.  So what matters is that the resulting behaviour is consistent.

> Consider two objects holding a reference to the same array; if one of them does a vector op with the result stored in the variable holding the reference and if the spec says you must reallocate, they mustn't have the same reference after the op is finished (or actually just after it starts), and that can't be optimized away (at least according to such spec)..

The distinction between reference assignment and copying is already covered by the current spec.  In-place vector modification follows directly from copying.  For example, consider

    int[] x, y, z;
    ...
    y = x;
    ...
    y = z * 2;

then y refers to a new array, and x still refers to the original.  If you want to modify/repopulate y in place, you would do

    y[] = z * 2;

by which you are indicating that you want to preserve the state of x and y referring to the same data.  The same applies if op= is used instead of just =.

Of course, if x is never used again and has never been assigned to anything else by reference, the two would be effectively equivalent.

<snip>
>> And if there is a tie for maximum or minimum, which index should minIndex and maxIndex return?  The first?  The last?  Any old one?  What would be worth any difference in computational cost?
> 
> When implemented by hand, this is usually the first matching index, I think, so that'd be an acceptable spec. Any old one is not very good - if it's the first (or the last), you can easily check for multiples with the slice syntax, otherwise you can't..

How could I check for multiples with the slice syntax?

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on the 'group where everyone may benefit.

February 16, 2005

Re: Possible rewrite of array operation spec

Posted by Norbert Nemec
in reply to Stewart Gordon

Norbert Nemec

Posted in reply to Stewart Gordon

Nice work, Stewart! This is definitely much more than I have come up with over the past months. If only, I could get my head together, sit down and write up the complete proposal on arrays, array operations, etc. that I have in my mind...

The proposal that you bring up seems mostly similar to the one that was in the specs over a long time - except of course, that it is a lot clearer and more detailed. For the time being, I see no fundamental flaw with it. However, the question that are left for me are:


* How does one extend these array operations?

Interpreting every function call on an array as an elementwise operation is not a good idea. The other extreme would be to limit array operations to the fixed set, predefined by the language. This would give you a toy solution that is nice for the occasional user but does not hold up for someone digging in deeper.


* How does this extend to multidimensional arrays?

The kind of nested arrays that you mention (like int[][]) is all we have at the moment, but I still feel that D needs true multidimensional (i.e. rectangular) arrays. And for these, there should be some more flexibility to specify which dimension should be looped over etc.

I know that rectangular arrays are a thing of the future, but I still think that they are needed, and I see a certain danger in specifying array operations without rectangular arrays in mind.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation