Thread overview
Expression eval order, why unordered ?
Feb 06, 2003: Mike Wynn
Feb 06, 2003: Sean L. Palmer
Feb 07, 2003: Robert M. Münch
Feb 08, 2003: Scott Wood
Feb 08, 2003: Mike Wynn
Feb 09, 2003: Sean L. Palmer
Feb 14, 2003: Walter
Feb 09, 2003: Burton Radons
Feb 14, 2003: Walter
Feb 09, 2003: Burton Radons
Feb 09, 2003: Mike Wynn
Feb 13, 2003: Scott Wood
Feb 09, 2003: Sean L. Palmer
Feb 11, 2003: Burton Radons
Feb 11, 2003: Burton Radons
Feb 14, 2003: Walter
February 06, 2003
Walter,

I can see why you might (from the implementer's point of view) have the
following in the docs:
Evaluation Order
Unless otherwise specified, the implementation is free to evaluate the
components of an expression in any order. It is an error to depend on order
of evaluation when it is not specified. For example, the following are
illegal:
	i = ++i;
	c = a + (a = b);
	func(++i, ++i);

If the compiler can determine that the result of an expression is illegally dependent on the order of evaluation, it can issue an error (but is not required to). The ability to detect these kinds of errors is a quality of implementation issue.

Have you read http://java.sun.com/docs/books/jls/first_edition/html/15.doc.html ?

the above is all valid Java

public class optest
{
 public static void func( int a, int b )
 {
  System.out.println("func(" + a + ", " + b + ")" );
 }

 public static void
 main( String[] argv )
 {
  int i = 1;
  int a, b, c;
  a = 10; b = 20; c = 30;
  System.out.println( "a:" + a + " b:" + b + " c:" + c + " i:" + i );
  i = ++i;
  c = a + (a = b);
  System.out.println( "a:" + a + " b:" + b + " c:" + c + " i:" + i );
  func(++i, ++i);
  System.out.println( "a:" + a + " b:" + b + " c:" + c + " i:" + i );
 }
}

which outputs
a:10 b:20 c:30 i:1
a:20 b:20 c:30 i:2
func(3, 4)
a:20 b:20 c:30 i:4

 i = ++i;
same as ++i; i is assigned the result of ++i (i+1)
 c = a + (a = b);
c is old a + b; a is set to b

 func(++i, ++i);
same as func( i+1, i+2 ); i += 2; or: inc i; push i; inc i; push i; call func;
if `i` is visible to func then `i` must be updated before the call (if not,
who notices? [exception] :)

this defined eval order for expressions is one of Java's features that I
personally love,
not that I use
  i = (a = b) + (b = c) + (c = i);
but if you put that in the above code just after the func call the result
is, and will always be:
a:20 b:30 c:4 i:54
it really does make the programmer's life easier. I believe undefined
behaviours are bad; things should either be defined or disallowed.

Mike.


February 06, 2003
Yes!

I'm a firm believer in what you just said. ("things should either be defined
or disallowed")

Think about it.  It makes the compiler vendors' lives a little easier. It makes every other programmer's life a living hell.

Vendors usually only have to worry about one platform.

Other programmers may have to worry about several, or using the same code on several vendors' compilers.

We need to be able to depend on the fact that our programs will always work.

Embedding subtle gotchas in the language (depending on order of evaluation between sequence points) just serves to make certain programs ill-formed and certain programs well-formed, with no clear distinction. Usually the programmer cannot even tell that he has written an ill-formed program, because the compiler is not required to tell him about it.

I'm all for more things being defined behavior, and fewer things being undefined behavior.  Undefined behavior gives implementors lots of rope with which to hang the other programmers.

Sean

"Mike Wynn" <mike.wynn@l8night.co.uk> wrote in message news:b1u8gn$206n$1@digitaldaemon.com...
> [full quote trimmed]
>
> it really does make the programmer's life easier. I believe undefined
> behaviours are bad; things should either be defined or disallowed.
>
> Mike.


February 07, 2003
"Sean L. Palmer" <seanpalmer@directvinternet.com> schrieb im Newsbeitrag news:b1ua7i$2158$1@digitaldaemon.com...

> I'm a firm believer in what you just said. ("things should either be
> defined or disallowed")
>
> Think about it.  It makes the compiler vendors' lives a little easier. It makes every other programmer's life a living hell.

Hi, I can only second this!

I never understood what this "the implementation is free..." is good for. To me it's a sign that the language specification people didn't finish their task...

Robert


February 08, 2003
Robert M. Münch <robert.muench@robertmuench.de> wrote:
> "Sean L. Palmer" <seanpalmer@directvinternet.com> schrieb im Newsbeitrag news:b1ua7i$2158$1@digitaldaemon.com...
> 
>> I'm a firm believer in what you just said. ("things should either be
> defined
>> or disallowed")
>>
>> Think about it.  It makes the compiler vendors' lives a little easier. It makes every other programmer's life a living hell.
> 
> Hi, I only can second this!
> 
> I never understood what this "the implementation is free..." is good for. To me it's a sign that the language specification people didn't finished their task... Robert

What it's good for is optimization.  If the compiler has one and only one way to compile any given piece of code, it can't do much to speed it up.  Obviously, there must be some restrictions on the compiler's behavior, in order to make it possible to write programs with fixed semantics.  However, that does not mean that any random string of tokens that happens to parse must have the exact same semantic behavior on every implementation imaginable.

For example, take pointer/array aliasing.  If you don't have a way to make the results undefined when you have multiple arrays that overlap, there's no way to parallelize certain loops, as the compiler generally can't figure out for itself whether the arrays overlap (especially if they're provided by another function/module/whatever the optimization scope of the compiler is).  C's inability to loosen the semantics of overlapping arrays led to a significant performance degradation compared to FORTRAN, especially on vector machines.  This led to the adoption of the "restrict" keyword in C99.

With modern CPUs not only having significant superscalar capabilities, but also explicit SIMD instructions, the language must give the compiler sufficient room to extract parallelism from the code without having to worry about what it might do to broken code (and any code that depends on the order of subexpression evaluation is broken, even if the language provides well-defined semantics; people have to read the stuff too...).  This will become even more important with chips like Itanium, where all parallelism must be explicitly specified by the compiler.

BTW, I didn't see anything in the D spec about pointer/array aliasing (though I may have missed it), apart from a prohibition on overlapping array slice copies; what are D's semantics for this?

-Scott
February 08, 2003
"Scott Wood" <scott@buserror.net> wrote in message news:slrnb4afi7.2gm.scott@ti.buserror.net...
> [...]
> What it's good for is optimization.  If the compiler has one and only one way to compile any given piece of code, it can't do much to speed it up.  Obviously, there must be some restrictions on the compiler's behavior, in order to make it possible to write programs with fixed semantics.  However, that does not mean that any random string of tokens that happens to parse must have the exact same semantic behavior on every implementation imaginable.

This is not right; what we are asking for is that a given section of code
ALWAYS gives the same result.
How the compiler achieves that is up to the compiler.

consider (given that Java/C#-style expression ordering were implemented):
int func( int * ptr ) { return *ptr++ + *ptr++ + foo( ptr ); }

this could be compiled as
int tmp = *ptr; ptr ++;
tmp += *ptr; ptr++;
int rv = foo( ptr );
return tmp + rv;

or

int tmp = ptr[0] + ptr[1];
return tmp + foo( ptr+2 );

They are semantically the same; one is faster than the other.

Lots of chances to optimise, even though the eval order is conceptually fixed.

Mike.



February 09, 2003
I realize that the compiler must be free to reorder certain things for performance, but in cases where it's only a matter of implementation convenience, I'd rather give the programmer that convenience than the implementor.  In general, the implementation is free to do whatever it wants so long as it ends up with the correct end result.

I believe any random string of input for which behavior is defined should always give the same result, on any platform, given the same external circumstances (disk files, external calls behaving the same way, etc).  In particular I think "i = ++i + ++i++;" should always evaluate to the same thing, on any platform.  Another situation which I encounter frequently is something like:

vector myvec(file.ReadFloat(), file.ReadFloat(), file.ReadFloat());

which is undefined behavior since you aren't guaranteed what order it will evaluate the function arguments to the vector constructor.

Anyway I'll live with it.  Not a deal-breaker.  ;)

Array aliasing is a big optimization issue.  At least in C you can tell the compiler to assume no aliasing.  Once you find all the places your code has aliasing, and remove it, the code generally runs much much faster.

This is why I like D's (to-be-implemented-in-future) array operation feature, where operations can be performed on entire arrays.  This doesn't guarantee any order of operation, and is implicitly parallelizable.  The more we get away from pointers and start telling the compiler what we want the end results to be, in broad terms, rather than telling it every little detail of how to perform the operation, the better the compiler will be able to generate optimal code.  The restrictions on array slice overlap behavior give the compiler the option of parallelizing slice operations.

When you write a for loop, you're building in dependence on the order of traversal, which is bad for parallelism.  The compiler would have to prove that no iteration depends on the results of any other iteration before it can parallelize the loop.

Sean

"Scott Wood" <scott@buserror.net> wrote in message news:slrnb4afi7.2gm.scott@ti.buserror.net...
> [...]
>
> BTW, I didn't see anything in the D spec about pointer/array aliasing (though I may have missed it), apart from a prohibition on overlapping array slice copies; what are D's semantics for this?
>
> -Scott


February 09, 2003
Scott Wood wrote:
> BTW, I didn't see anything in the D spec about pointer/array aliasing
> (though I may have missed it), apart from a prohibition on
> overlapping array slice copies; what are D's semantics for this?

There's no support at this point.  One thing we could do is figure out which sets of arrays being operated on could have a much faster loop if they weren't aliased.  Compile both versions, and select the SIMD version at runtime if they don't overlap.  This could use profiling information to see whether it was beneficial, whether other sets of arrays used in the loop were unaliased instead of the ones we optimised, and whether we always got small arrays so shouldn't do any SIMD or loop optimisation.

I don't know what Walter's thoughts on the matter are, though; I hope he weighs in, as I've wanted to know what he's thought of "restrict" for some time now.

February 09, 2003
Sean L. Palmer wrote:
> I'm a firm believer in what you just said. ("things should either be defined
> or disallowed")

So then you agree with the way D is currently set up: code which is undefined is invalid code and should not compile.  When Walter writes that identifying these errors is a "quality of implementation issue", he doesn't mean noticing that "func (c ++, ++ c);" is shit code (which is easy); he means that "*c = d ++;" is conditionally undefined.  Incredibly complex compilers might be able to recognise that it's undefined even though there's no casual way to do so, or could make runtime checks; asking all compilers to do this would be unreasonable and a good way to ensure that there are no conforming implementations of the language.

There should be a level of mandatory errors here.  If a variable in one sequence range is used for both reading and writing, there must be an error, just as surely as if there was a syntax error.

> Think about it.  It makes the compiler vendors' lives a little easier.
> It makes every other programmer's life a living hell.

I don't think that's why we have the undefined laws.  I think they're there because code like "func (++ c, ++ c);" is literally nonsensical: it's trying to perform two operations on the same piece of data simultaneously.  There are eight possible handlings of that code, none of which is more right than the others.  Hence it's undefined.

I'm not going to defend K&R's decision, regardless of why they made it (I'll ask Ritchie).  It was the right one to say that this code was bad, it was the wrong one to say that this code was C.  It was before lint was a part of the compiler, after all.  Now that those dark days are behind us, the language should reflect the current art.

February 09, 2003
> I don't think that's why we have the undefined laws.  I think they're there because code like "func (++ c, ++ c);" is literally nonsensical: it's trying to perform two operations on the same piece of data simultaneously.  There's eight possible handlings of that code, none of which are more right than the other.  Hence it's undefined.

That view depends on what you think operators do.

By the same logic the code `func() + other()` is also undefined: `other()`
may be called before `func()`
(there is nothing in the docs to say the order is defined, thus it is
undefined (that is in the docs)).

In a() + b() * c() the only thing defined is that b() * c() is performed
before the +, and both function calls before the *;
but c() may be called before b().

Java and, I believe, C# take the human approach (well, in fact it's an
in-order walk of the basic AST).

I know the , operator and the , in a func call are different, because C
pushes params in reverse order.
I don't believe that is enough of a reason to make the eval order undefined;
the , operator, when outside a func call, does force order: it is defined as
such.

Basically, to write robust code that will perform correctly no matter which
compiler is used, the programmer has to break the program up into simple
statements:
 func() * func2(); has to become tmp = func(); tmp *= func2();

I believe it actually makes code harder to optimise, because you cannot write what you want done; you have to write so that it compiles to the right code.

It may be my simple view of the world, and use of single-CPU SISD machines,
that leads me to view all programs as a set of single threads of execution.
I do not see what is gained by thinking of func( c++, c++ ) as two
concurrent increments; it has no value, so why consider it? The whole rest
of the program has a defined order; why treat expressions so differently?
The whole point about eval order is: when there is a dependency, order is
defined.
func( c++, b++ ) is conceptually concurrent (no dependency) even if eval
order is defined.

I find it natural to read and understand the eval order of Java code, and cannot understand why people are averse to such a simple concept, especially as it is beneficial. Unordered eval is detrimental; by its nature you cannot know what is happening, and that is never good.





February 09, 2003
I'm ok with that too.  If it really is shit code, don't compile it.  Make it an error.  Otherwise some wise guy will use it and wonder how come it won't port to compiler X if it works on compiler Y.

If it causes that many problems, or is inherently nonportable, people shouldn't be using it anyway.  It's a good idea to formalize that opinion.

What I don't agree with is that "func (++ c, ++ c);" is literally nonsensical.  If there is a defined order of evaluation of function arguments, it is perfectly rational and useful code.  It's the fact that C doesn't specify the order of evaluation of function arguments that makes it junk.  You're thinking that a function call is an atomic operation, but you know better than most that there are a lot of steps involved in making a function call.  Adding a sequence point between each argument, or otherwise specifying evaluation order, could be useful; if nothing else it would aid portability in a language which encourages side effects.  I understand why they left it unspecified, but don't necessarily agree with it.  There aren't any machines anymore that only have one register, or that can't address relative to the stack pointer, so on modern CPUs it's really a non-issue.  May as well specify evaluation order.

Sean

"Burton Radons" <loth@users.sourceforge.net> wrote in message news:b24q49$1p1f$1@digitaldaemon.com...
> [...]
>
> I don't think that's why we have the undefined laws.  I think they're there because code like "func (++ c, ++ c);" is literally nonsensical: it's trying to perform two operations on the same piece of data simultaneously.  There's eight possible handlings of that code, none of which are more right than the other.  Hence it's undefined.
