Possible rewrite of array operation spec (page 2) - D Programming Language Discussion Forum

February 16, 2005

Re: Possible rewrite of array operation spec

Posted by Stewart Gordon
in reply to Regan Heath

Regan Heath wrote:
> On Tue, 15 Feb 2005 19:15:45 +0100, xs0 <xs0@xs0.com> wrote:
> 
>>> Open questions
>>> ~~~~~~~~~~~~~~
>>> Should postincrement and postdecrement be allowed?  How should they be handled?
<snip>
> I think they should be allowed; I don't think they look weird, and I think they're useful, e.g.
> 
> int[] a;
> int[] b;
> int[] c;
> 
> b[] = a[]--;       //assigns b[x] = a[x], then a[x] = a[x]-1;
> b[] = c[] + a[]--; //assigns b[x] = c[x] + a[x], then a[x] = a[x]-1;
<snip>

If you've got the []s to indicate in-place assignment, then this makes sense.  But what about

    b = a--;

?

Two possible interpretations:

    b = a;
    a = a.dup;
    foreach (inout x; a) x--;

or

    b = a.dup;
    foreach (inout x; a) x--;
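
To make the difference between the two readings concrete, here is a minimal sketch in present-day D, where the `inout` loop variable above is spelled `ref`. The helper names `postDecRebind` and `postDecInPlace` are made up for illustration and are not proposed syntax; the only thing the sketch shows is what a second reference to the original data ends up seeing.

```d
import std.stdio;

// Hypothetical helper, first reading: b keeps the original array,
// and a is rebound to a fresh, decremented copy.
int[] postDecRebind(ref int[] a)
{
    int[] b = a;              // b aliases the original data
    a = a.dup;                // a now refers to a separate copy
    foreach (ref x; a) x--;   // only the copy is decremented
    return b;
}

// Hypothetical helper, second reading: b gets a snapshot,
// and a is decremented in place.
int[] postDecInPlace(ref int[] a)
{
    int[] b = a.dup;          // b is a copy of the old values
    foreach (ref x; a) x--;   // the original data is decremented
    return b;
}

void main()
{
    int[] a = [3, 2, 1];
    int[] other = a;          // a second reference to the same data
    int[] b = postDecRebind(a);
    writeln(b, " ", a, " ", other);    // other still holds the old values

    int[] c = [3, 2, 1];
    int[] other2 = c;
    int[] d = postDecInPlace(c);
    writeln(d, " ", c, " ", other2);   // other2 holds the decremented values
}
```

In both cases b ends up with the old values and a with the new ones; the readings differ only in whether other references to the original array observe the decrement.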

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on the 'group where everyone may benefit.

February 16, 2005

Re: Possible rewrite of array operation spec

Posted by Stewart Gordon
in reply to pragma

pragma wrote:
<snip>
> That makes sense to me.  Just allow arrays of function pointers, delegates and anything with opCall() defined to be callable en masse.  
> 
> At a minimum, it would make event-based (1 to n messaging) programming a snap.
> 
> I would also think that it could be extended to object members and methods as well?
> 
>> interface Foo{
>>     void foo();
>>     int bar();
>> }
> 
> 
>> Foo[] test;
>> test.foo(); // calls .foo() for each object
>> int[] result = test.bar(); // calls .bar() for each object, result is an array.

That would lead to such troubles as

    class Foo {
        int length;
    }

    Foo[] test;

and then is test.length the length of the array, or an array of lengths of the Foo objects?

<snip>
> I'm not sure I follow you here.  Sounds like you're talking about using the function syntax for mapping to the array dimension space... is this correct?

I'm not sure I follow you either.

You seem to be talking about the ability to call arrays of functions.  I was actually talking about the ability to pass array arguments to functions defined with scalar parameters.

<snip>
> One question: What about associative arrays?

Good question.  I guess we could extend the concept to AAs.  An operation on an AA would return a new AA in which the keys remain the same and the operation is applied to the values.  Trying to do it on binary ops, let alone functions of three or more parameters, would require that the two arrays have the same set of keys ... but is this likely to happen in the real world?
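
As a rough sketch of what that could mean, written as ordinary D functions rather than operators: `negateValues` and `addValues` are hypothetical names, the key and value types are fixed to `string` and `int` only to keep the example short, and throwing on mismatched key sets is just one possible policy.

```d
import std.stdio;

// Hypothetical: a unary operation on an AA keeps the keys
// and applies the operator to each value.
int[string] negateValues(int[string] aa)
{
    int[string] result;
    foreach (key, value; aa)
        result[key] = -value;
    return result;
}

// Hypothetical: a binary operation that insists on identical key sets.
int[string] addValues(int[string] x, int[string] y)
{
    if (x.length != y.length)
        throw new Exception("key sets differ");
    int[string] result;
    foreach (key, value; x)
    {
        auto p = key in y;            // pointer to y's value, or null
        if (p is null)
            throw new Exception("key sets differ");
        result[key] = value + *p;
    }
    return result;
}

void main()
{
    writeln(negateValues(["a": 1, "b": -2]));
    writeln(addValues(["a": 1, "b": 2], ["a": 10, "b": 20]));
}
```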

Stewart.


February 16, 2005

Re: Possible rewrite of array operation spec

Posted by Stewart Gordon
in reply to Stewart Gordon

Another open question: should we allow all this on char arrays?

I can imagine someone wanting to do Caesar or Vigenère cipher stuff with this.  But does it really make any sense to do it under the UTF constraints?
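
A small sketch of the concern, assuming the operation would act on each UTF-8 code unit of a char array. `shiftCodeUnits` and `isValidUtf8` are hypothetical helpers; `std.utf.validate` is the Phobos check that throws on malformed UTF.

```d
import std.stdio;
import std.utf : validate, UTFException;

// Hypothetical: shift every UTF-8 code unit by `shift`,
// as an element-wise `text[] + shift` might do.
char[] shiftCodeUnits(const(char)[] text, int shift)
{
    auto result = text.dup;
    foreach (ref c; result)           // iterates code units, not code points
        c = cast(char)(c + shift);
    return result;
}

// Hypothetical wrapper around std.utf.validate.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    // Plain ASCII survives a small shift: "abc" becomes "def".
    writeln(isValidUtf8(shiftCodeUnits("abc", 3)));
    // U+00E9 is the two code units 0xC3 0xA9. A shift of 23 turns the
    // second one into 0xC0, which is not a continuation byte.
    writeln(isValidUtf8(shiftCodeUnits("\u00E9", 23)));
}
```

Smaller shifts can leave a multi-byte sequence well-formed and still turn it into an unrelated character, so validity alone is not the whole problem.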

Stewart.


February 16, 2005

Re: Possible rewrite of array operation spec

Posted by xs0
in reply to Stewart Gordon

> Well, the spec doesn't say that it _must_ be allocated.  The spec states what rather than how.  So what matters is that the resulting behaviour is consistent.

Well, I read that in "a new array is created to hold the result"; however, having re-read the paragraph, I see that it can be read both ways.. It should be clear, though, and I think that it should explicitly say that the result array will be reused if it exists (is non-null) and is exactly the right size. If you want a new array, you can always use (a+b*c).dup or set result=null before the expression (or we can introduce a new syntax: result = new(a+b*c) :). You should have the option of reusing the array, though, so it should be supported in the spec.


> The distinction between reference assignment and copying is already covered by the current spec.  In-place vector modification follows directly from copying.  For example, consider
> 
>     int[] x, y, z;
>     ...
>     y = x;
>     ...
>     y = z * 2;
> 
> then y refers to a new array, and x still refers to the original.  If you want to modify/repopulate y in place, you would do
> 
>     y[] = z * 2;
> 
> by which you are indicating that you want to preserve the state of x and y referring to the same data.  The same applies if op= is used instead of just =.

But why would you prefer a new array instead of in-place (when possible)? Considering arrays are references (with static arrays, you can't even create a new one, right?), it should be exactly the same if you say y=z*2 or x=z*2, as it is with objects, otherwise a whole lot of confusion and bugs will result from this, if you ask me..



>> When implemented by hand, this is usually the first matching index, I think, so that'd be an acceptable spec. Any old one is not very good - if it's the first (or the last), you can easily check for multiples with the slice syntax, otherwise you can't..
> 
> 
> How could I check for multiples with the slice syntax?

int[] data;

int idx=data.minElIndex;

if (data[idx+1..length].minEl == data[idx]) {
   // more than one
}
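
For anyone who wants to try the idiom today, here is a sketch in which the proposed properties are stood in for by ordinary functions, so that UFCS allows the same `data.minElIndex` spelling. `minElIndex`, `minEl` and `minIsRepeated` are assumptions made for this example, not existing array properties, and `$` is the current spelling of `length` inside a slice.

```d
import std.stdio;

// Hypothetical stand-in for the proposed property:
// index of the first minimum element.
size_t minElIndex(const(int)[] data)
{
    assert(data.length > 0);
    size_t best = 0;
    foreach (i, v; data)
        if (v < data[best]) best = i;   // strict <, so the first minimum wins
    return best;
}

// Hypothetical stand-in for the proposed property: the minimum value.
int minEl(const(int)[] data)
{
    return data[data.minElIndex];
}

// The slice idiom: look for the minimum again after its first occurrence.
bool minIsRepeated(const(int)[] data)
{
    auto idx = data.minElIndex;
    auto rest = data[idx + 1 .. $];
    return rest.length > 0 && rest.minEl == data[idx];
}

void main()
{
    writeln(minIsRepeated([3, 1, 2, 1]));   // the minimum 1 appears twice
    writeln(minIsRepeated([3, 1, 2]));      // the minimum 1 appears once
}
```

The idiom only works because the index returned is that of the first minimum; with "any old one" the remaining slice could miss earlier duplicates.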


xs0

February 16, 2005

Re: Possible rewrite of array operation spec

Posted by pragma
in reply to Stewart Gordon

In article <cuvgv4$4fd$1@digitaldaemon.com>, Stewart Gordon says...
>
>pragma wrote:
>>> Foo[] test;
>>> test.foo(); // calls .foo() for each object
>>> int[] result = test.bar(); // calls .bar() for each object, result is an array.
>
>That would lead to such troubles as
>
>     class Foo {
>         int length;
>     }
>
>     Foo[] test;
>
>and then is test.length the length of the array, or an array of lengths of the Foo objects?

That is a problem.  On the one hand, this is obviously a potential source of error, and could be flagged down by the compiler quite easily.  On the other, it could be handled via some precedence given to array properties over array expression resolution (not a very good idea IMO).

Perhaps an additional array pseudo-property could be added to avoid such conflicts? How about 'each' or 'every'?

> Foo[] test;
> test.each.foo(); // calls .foo() for each object
> int[] result = test.each.bar();

>> One question: What about associative arrays?
>
>Good question.  I guess we could extend the concept to AAs.  An operation on an AA would return a new AA in which the keys remain the same and the operation is applied to the values.  Trying to do it on binary ops, let alone functions of three or more parameters, would require that the two arrays have the same set of keys ... but is this likely to happen in the real world?

I see what you mean.  It reminds me of database operations, when performing arithmetic in SQL statements.  Perhaps the standard ruleset for dealing with null (non-existent) values might come into play here: any op performed against null is null.

So far, you've applied the spec to associative arrays already.  The dimensions thus far have merely been ints, so this much is done. I think that if you were to rewrite your translations based on AAs of int keys, you'd see that it's most of the way there.

All that's really left is set-notation, which would be necessary the instant you get away from using scalars for your dimensions.  Overloading '+' for superset, or union, for example would be a bad move because objects may have operator overloads that are applicable.

So you'd need to extend the array properties to include operations apart from minValue() and maxValue(), like union(arr), superset(arr), exclusion(arr) and so forth.

For example, assume that ValueType and KeyType can be *anything*, not just int or char[].

> alias ValueType[KeyType] ExampleAA;
> ExampleAA a,b,c;
> a = b + c;

Which translates to:
> a = b.dup;
> foreach(KeyType key,ValueType  value; c){
>   a[key] += value;
> }

The above fails if ValueType is an object w/o opAdd() or a string, so the compiler would have to see the types involved ahead of time and generate a compiler error (Cannot add char[][int] to char[][int]).

This syntax would give the needed behavior:

> a = b.superset(c);

Which translates to:
> a = b.dup;
> foreach(KeyType key,ValueType  value; c){
>   a[key] = value; // subtle, but important
> }
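
The two translations can be sketched as plain functions to show where they diverge. `sumMerge` and `overwriteMerge` are hypothetical names standing in for `b + c` and `b.superset(c)`, the types are fixed to `int[string]` for brevity, and keys that exist only in c are handled explicitly rather than relying on what `+=` does for a missing key.

```d
import std.stdio;

// Hypothetical stand-in for `a = b + c`: values under shared keys are added.
int[string] sumMerge(int[string] b, int[string] c)
{
    int[string] a;
    foreach (key, value; b) a[key] = value;   // plays the role of a = b.dup
    foreach (key, value; c)
    {
        if (key in a) a[key] += value;        // shared key: add
        else a[key] = value;                  // key only in c: copy
    }
    return a;
}

// Hypothetical stand-in for `a = b.superset(c)`: c's values win.
int[string] overwriteMerge(int[string] b, int[string] c)
{
    int[string] a;
    foreach (key, value; b) a[key] = value;
    foreach (key, value; c) a[key] = value;   // plain assignment, no addition
    return a;
}

void main()
{
    int[string] b = ["x": 1, "y": 2];
    int[string] c = ["y": 10, "z": 5];
    writeln(sumMerge(b, c)["y"]);         // 12
    writeln(overwriteMerge(b, c)["y"]);   // 10
}
```

Both results have the same key set; only the values under the shared keys differ, which is the "subtle, but important" part.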

- EricAnderton at yahoo

February 16, 2005

Re: Possible rewrite of array operation spec

Posted by Stewart Gordon
in reply to xs0

xs0 wrote:
<snip>
> Well, I read that in "a new array is created to hold the result"; however, having re-read the paragraph, I see that it can be read both ways.. It should be clear, though, and I think that it should explicitly say that the result array will be reused if it exists (is non-null) and is exactly the right size. If you want a new array, you can always use (a+b*c).dup or set result=null before the expression (or we can introduce a new syntax: result = new(a+b*c) :). You should have the option of reusing the array, though, so it should be supported in the spec.

It's already supported.  At the moment,

    int[] a, b;
    ...
    a = b;

does a reference assignment.  To reuse the result array, you use

    a[] = b;

This would remain the same when array operations are involved.
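
A sketch of the same distinction in present-day D, where array operations need a destination slice and the array operands are written with []. The function names are made up for the example; each returns x so that the effect on the other reference is visible.

```d
import std.stdio;

// In place: y aliases x, and the element-wise result is written
// into the data they share.
int[] doubledInPlace(int[] x, const(int)[] z)
{
    int[] y = x;              // reference assignment, no copy
    y[] = z[] * 2;            // x sees the new values too
    return x;
}

// Rebinding: y is pointed at a fresh array first, so x keeps its data.
int[] doubledIntoNew(int[] x, const(int)[] z)
{
    int[] y = x;
    y = new int[z.length];    // y no longer refers to x's data
    y[] = z[] * 2;
    return x;
}

void main()
{
    writeln(doubledInPlace([10, 20, 30], [1, 2, 3]));   // [2, 4, 6]
    writeln(doubledIntoNew([10, 20, 30], [1, 2, 3]));   // [10, 20, 30]
}
```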

>> The distinction between reference assignment and copying is already covered by the current spec.  In-place vector modification follows directly from copying.  For example, consider
>>
>>     int[] x, y, z;
>>     ...
>>     y = x;
>>     ...
>>     y = z * 2;
>>
>> then y refers to a new array, and x still refers to the original.  If you want to modify/repopulate y in place, you would do
>>
>>     y[] = z * 2;
>>
>> by which you are indicating that you want to preserve the state of x and y referring to the same data.  The same applies if op= is used instead of just =.
> 
> But why would you prefer a new array instead of in-place (when possible)?

Because you want x to still contain the same old data, of course.

> Considering arrays are references (with static arrays, you can't even create a new one, right?), it should be exactly the same if you say y=z*2 or x=z*2, as it is with objects, otherwise a whole lot of confusion and bugs will result from this, if you ask me..

Of course not.  Why would you declare two references to the same array in the same scope if (x === y) is going to remain true throughout?

<snip>
>> How could I check for multiples with the slice syntax?
> 
> int[] data;
> 
> int idx=data.minElIndex;
> 
> if (data[idx+1..length].minEl == data[idx]) {
>    // more than one
> }

Oh yes, that makes perfect sense.  But can the idiom be made efficient?

A further idea might be to add minElCount and maxElCount.  The compiler could optimise so that minEl, minElIndex and minElCount are calculated with one pass (a bit like common subexps) if two or more of them are used without the array being modified between retrievals.  (Or maybe there could be something that returns a structure of value, index and count.)
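
The structure variant could look roughly like this when written by hand. `MinInfo` and `minInfo` are hypothetical names and the element type is fixed to int; the point is only that value, first index and count fall out of a single pass.

```d
import std.stdio;

// Hypothetical result type: minimum value, index of its first
// occurrence, and how many times it occurs.
struct MinInfo
{
    int value;
    size_t index;
    size_t count;
}

// Computes all three in one pass over the array.
MinInfo minInfo(const(int)[] data)
{
    assert(data.length > 0);
    auto r = MinInfo(data[0], 0, 1);
    foreach (i, v; data[1 .. $])
    {
        if (v < r.value)
            r = MinInfo(v, i + 1, 1);   // new minimum; i is relative to the slice
        else if (v == r.value)
            r.count++;                  // another occurrence of the current minimum
    }
    return r;
}

void main()
{
    auto info = minInfo([3, 1, 2, 1]);
    writeln(info.value, " ", info.index, " ", info.count);   // 1 1 2
}
```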

Stewart.


February 16, 2005

Re: Possible rewrite of array operation spec

Posted by xs0
in reply to Stewart Gordon

Stewart Gordon wrote:
> To reuse the result array, you use
> 
>     a[] = b;
> 
> This would remain the same when array operations are involved.

but a[]=b doesn't look like a can be [re]allocated (and it can be, if it is the wrong size or null). anyhow, it's not that important :)

>> But why would you prefer a new array instead of in-place (when possible)?
> 
> Because you want x to still contain the same old data, of course.

But in this case you'd normally do

x=y.dup;
y=...;

Except in case of array expressions, where you suggest

x=y;
y=...; // y is new

> Of course not.  Why would you declare two references to the same array in the same scope if (x === y) is going to remain true throughout?

Who said anything about the same scope? And anyhow, when I asked about your preference, I meant why do you think it's better to lose references as easily as possible, than the opposite? If there's only one reference, there is no argument - it's obviously better to reuse the array to avoid allocation/gc costs. If there's more than one reference, I'd definitely argue that the intention is usually to have them point to the same something (array or class) than to different somethings. If a snapshot of data is required, there's .dup (which is there in any case and serves exactly this purpose)

> Oh yes, that makes perfect sense.  But can the idiom be made efficient?
> 
> A further idea might be to add minElCount and maxElCount.  The compiler could optimise so that minEl, minElIndex and minElCount are calculated with one pass (a bit like common subexps) if two or more of them are used without the array being modified between retrievals.  (Or maybe there could be something that returns a structure of value, index and count.)

Sure, a struct would work, it might needlessly complicate things, though.. In most cases you just want the min/max value and checking for multiples is much rarer. I'd then prefer a .count(el) method which counts the number of occurrences of _any_ value, not just min/max. The compiler can still easily optimize occurrences like

data.count(data.maxEl)

or even

int firstMinIdx;
int minValue;
int reps=data.count(minValue=data[firstMinIdx=data.minElIndex]);

while always avoiding counting and/or keeping min value/index when it's not necessary (and having these done as fast as possible is the whole point of compiler support - writing functions that do these things is trivial; and just consider how much faster a .count(1) can be on a bit[] than fetching each bit separately :). There's another problem with a struct - if you'd want the whole struct, it would have to be templatized to hold any kind of value, producing a new template instance (costing disk, memory, compilation time etc..) on each type of array where you'd want access to these properties.

xs0

February 16, 2005

Re: Possible rewrite of array operation spec

Posted by Regan Heath
in reply to Stewart Gordon

On Wed, 16 Feb 2005 12:58:46 +0000, Stewart Gordon <smjg_1998@yahoo.com> wrote:
> Regan Heath wrote:
>> On Tue, 15 Feb 2005 19:15:45 +0100, xs0 <xs0@xs0.com> wrote:
>>
>>>> Open questions
>>>> ~~~~~~~~~~~~~~
>>>> Should postincrement and postdecrement be allowed?  How should they be  handled?
> <snip>
>> I think they should be allowed, I don't think they look weird, and I think they're useful eg.
>>
>> int[] a;
>> int[] b;
>> int[] c;
>>
>> b[] = a[]--;       //assigns b[x] = a[x], then a[x] = a[x]-1;
>> b[] = c[] + a[]--; //assigns b[x] = c[x] + a[x], then a[x] = a[x]-1;
> <snip>
>
> If you've got the []s to indicate in-place assignment, then this makes sense.

Which is exactly why I had them :)

> But what about
>
>      b = a--;
>
> ?

I probably should have thought about those too...

> Two possible interpretations:
>
>      b = a;
>      a = a.dup;
>      foreach (inout x; a) x--;

Not this, because "b = a" is a copy i.e. "b = a.dup"

> or
>
>      b = a.dup;
>      foreach (inout x; a) x--;

This is probably the most sensible.

  or

Error: postdecrement is not allowed on 'a' did you mean 'a[]'

Regan

February 16, 2005

Re: Possible rewrite of array operation spec

Posted by Regan Heath
in reply to pragma

On Wed, 16 Feb 2005 16:10:40 +0000 (UTC), pragma <pragma_member@pathlink.com> wrote:
> In article <cuvgv4$4fd$1@digitaldaemon.com>, Stewart Gordon says...
>>
>> pragma wrote:
>>>> Foo[] test;
>>>> test.foo(); // calls .foo() for each object
>>>> int[] result = test.bar(); // calls .bar() for each object, result is an array.
>>
>> That would lead to such troubles as
>>
>>     class Foo {
>>         int length;
>>     }
>>
>>     Foo[] test;
>>
>> and then is test.length the length of the array, or an array of lengths
>> of the Foo objects?
>
> That is a problem.  On the one hand, this is obviously a potential source of
> error, and could be flagged down by the compiler quite easily.  On the other, it
> could be handled via some precedence given to array properties over array
> expression resolution (not a very good idea IMO).
>
> Perhaps an additional array pseudo-property could be added to avoid such
> conflicts? How about 'each' or 'every'?

What about:

test.length    //length of array
test[].length  //length of each element in array

I think this idea could possibly be a source of bugs.

Regan

February 17, 2005

Re: Possible rewrite of array operation spec

Posted by Norbert Nemec
in reply to Stewart Gordon

One more detail: it should be clear that the order of evaluation is not defined and that the temporary array is not guaranteed to be created. Any code that depends on the order of evaluation or the existence of temporaries is to be considered erroneous, like:

    a[1..9] = a[0..8]+1;

(The compiler may not always be able to reliably detect such errors. Maybe a case for warnings?)
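
To illustrate why such code cannot have one agreed meaning, here is a sketch with three hand-written loops that an implementation might plausibly generate for an overlapping shift-and-add. The function names are invented for the example, and none of them is claimed to be what a compiler actually emits.

```d
import std.stdio;

// Evaluate the right-hand side into a temporary first, then store.
int[] viaTemporary(int[] a)
{
    if (a.length < 2) return a;
    int[] tmp = a[0 .. $ - 1].dup;                   // snapshot of the source slice
    foreach (i, v; tmp) a[i + 1] = v + 1;
    return a;
}

// No temporary, ascending index order: each store feeds the next load.
int[] forwardInPlace(int[] a)
{
    foreach (i; 1 .. a.length) a[i] = a[i - 1] + 1;
    return a;
}

// No temporary, descending index order.
int[] backwardInPlace(int[] a)
{
    foreach_reverse (i; 1 .. a.length) a[i] = a[i - 1] + 1;
    return a;
}

void main()
{
    writeln(viaTemporary([0, 0, 0, 0]));      // [0, 1, 1, 1]
    writeln(forwardInPlace([0, 0, 0, 0]));    // [0, 1, 2, 3]
    writeln(backwardInPlace([0, 0, 0, 0]));   // [0, 1, 1, 1]
}
```

For this particular shift the descending loop happens to agree with the temporary version, but a spec that promised either result would rule out the ascending loop, which is the kind of restriction on optimization at issue here.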

If the order is unnecessarily defined in the specs, this might seriously limit the optimizability of the code. Currently, the wording "is equivalent to" sounds very dangerous in that respect.

Ciao,
Norbert



Stewart Gordon schrieb:
> This'll probably get people asking for the prospect of array operations for 1.0 to be resurrected, but still....
> 
> Here is a possible specification for array operations that I feel is better-defined than the one in the current out-of-date spec.  Of course, there are still some open questions, which I've put at the bottom.
> 
> 
> Array operations
> ----------------
> Arithmetic and bitwise operators are defined on array operands.  An expression involving an array evaluates to a new array in which the operator has been applied to the elements of the operands in turn.
> 
> In essence, when an expression contains more than one array operation, a new array is created to hold the result of each operation.  However, a quality implementation will optimize the evaluation of the expression to eliminate temporaries where possible.
> 
> Unary operations
> ~~~~~~~~~~~~~~~~
> For the unary operators +, - and ~, the expression evaluates to a new array containing the result of applying the operator to each element. For example, with the declaration
> 
>     int[] x, y;
> 
> then the statement
> 
>     y = -x;
> 
> is simply equivalent to
> 
>     y = new int[x.length];
>     for (int i = 0; i < y.length; i++) {
>         y[i] = -x[i];
>     }
> 
> Binary operations
> ~~~~~~~~~~~~~~~~~
> The binary operations supported are +, -, *, /, %, &, |, ^, <<, >> and >>>.
> 
> If the two arrays are of the same dimension and of compatible types, then the expression evaluates to a new array in which each element is the result of applying the operator to corresponding elements of the operands.  For example, with the declarations
> 
>     int[] x, y, z;
> 
> the statement
> 
>     z = x + y;
> 
> is equivalent to
> 
>     z = new int[x.length];
>     for (int i = 0; i < z.length; i++) {
>         z[i] = x[i] + y[i];
>     }
> 
> Both operands must be of the same length.  If they are not, an ArrayBoundsError is thrown.
> 
> For higher dimensions, this definition is applied recursively.  For example, with
> 
>     int[][] x, y, z;
> 
> the statement
> 
>     z = x * y;
> 
> is equivalent to
> 
>     z = new int[][x.length];
>     for (int i = 0; i < z.length; i++) {
>         z[i] = x[i] * y[i];
>     }
> 
> which is in turn equivalent to
> 
>     z = new int[][x.length];
>     for (int i = 0; i < z.length; i++) {
>         z[i] = new int[x[i].length];
>         for (int j = 0; j < z[i].length; j++) {
>             z[i][j] = x[i][j] * y[i][j];
>         }
>     }
> 
> If the operands do not match in dimension, then the operator is applied to each element of the higher-dimension operand with the whole of the lower-dimension one.  For example, with
> 
>     int[] x, z;
>     int y;
> 
> the statement
> 
>     z = x - y;
> 
> is equivalent to
> 
>     z = new int[x.length];
>     for (int i = 0; i < z.length; i++) {
>         z[i] = x[i] - y;
>     }
> 
> Similarly,
> 
>     z = y - x;
> 
> is equivalent to
> 
>     z = new int[x.length];
>     for (int i = 0; i < z.length; i++) {
>         z[i] = y - x[i];
>     }
> 
> This definition is applied recursively if the dimensions differ by two or more.
> 
> Assignment operations
> ~~~~~~~~~~~~~~~~~~~~~
> When x is an array, the assignment
> 
>     x op= y;
> 
> is taken as equivalent to
> 
>     x = x op y;
> 
> whether y is an array of matching dimension, an array of lower dimension or a scalar.  Thus the operation creates a new array and assigns it to x.  If a sliced lvalue is used, the array is modified in place, so that
> 
>     x[] op= y;
> 
> is equivalent to
> 
>     x[] = x[] op y;
> 
> The preincrement and predecrement operators are handled in the same way.
> 
> User-defined types
> ~~~~~~~~~~~~~~~~~~
> A class, struct or union type may have operators overloaded with array types as parameters.  To avoid conflicts between overloaded operators and array operations, binary operations involving both array and user-defined types are resolved as follows:
> 
> 1. The normal operator overloading rules are applied.
> 2. If no match is found, the array operation rules are applied until both operands are reduced to scalar type; operator overloading rules are then applied to the result.
> 3. If the expression still does not resolve, it is an error.
> 
> 
> Open questions
> ~~~~~~~~~~~~~~
> Should postincrement and postdecrement be allowed?  How should they be handled?
> 
> Should we generalise the concept to function calls?  If so, I guess that overload resolution would work in much the same way as for operations on user-defined types.
> 
> If we do allow it on function calls, should we allow it to work on functions of three or more parameters?  In this case, the highest-dimension argument would be reduced to the dimension of the second highest, and then these two reduced together to match the third highest, and so on.
> 
> Of course, these questions raise one more: how easy or hard would these ideas be to implement?
> 
> 
> Any thoughts?
> 
> Stewart.
>
