December 12, 2012
Re: OT (partially): about promotion of integers
On 12/12/2012 10:35 PM, Walter Bright wrote:
> On 12/12/2012 12:01 PM, Timon Gehr wrote:
>> That is certainly fixable. It is a mere QOI issue.
>
> When you have a language that fundamentally disallows mutation,

It does not.

> some algorithms are doomed to be slower.

Here's a (real) quicksort:
http://stackoverflow.com/questions/5268156/how-do-you-do-an-in-place-quicksort-in-haskell
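
For reference, here is a minimal in-place version in the spirit of the 
answers there, using a mutable STUArray inside the ST monad. (This is my 
own sketch; the names qsort/sortList and the sample list are illustrative, 
not taken from the linked answer.)

```haskell
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed (UArray, elems)

-- Swap two elements of a mutable array.
swap :: STUArray s Int Int -> Int -> Int -> ST s ()
swap arr i j = do
  x <- readArray arr i
  y <- readArray arr j
  writeArray arr i y
  writeArray arr j x

-- In-place quicksort (Lomuto partition) on a mutable unboxed array.
qsort :: STUArray s Int Int -> Int -> Int -> ST s ()
qsort a lo hi
  | lo >= hi  = return ()
  | otherwise = do
      p <- readArray a hi          -- pivot: last element
      let go i j
            | j >= hi   = swap a i hi >> return i
            | otherwise = do
                x <- readArray a j
                if x < p
                  then swap a i j >> go (i + 1) (j + 1)
                  else go i (j + 1)
      m <- go lo lo                -- final pivot position
      qsort a lo (m - 1)
      qsort a (m + 1) hi

-- Pure wrapper: the mutation is invisible from the outside.
sortList :: [Int] -> [Int]
sortList xs = elems $ runSTUArray $ do
  let n = length xs
  a <- newListArray (0, n - 1) xs
  qsort a 0 (n - 1)
  return a

main :: IO ()
main = print (sortList [3,1,4,1,5,9,2,6])
```

Note that sortList has an ordinary pure type; runSTUArray guarantees the 
mutable array cannot escape.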

> I asked Erik Meijer, one of the developers of
> Haskell, if the implementation does mutation "under the hood" to make
> things go faster.

"Under the hood", obviously there must be mutation, as this is how the 
machine works.

>  He assured me that it does not, that it follows the
> "no mutation" all the way.
>

Maybe he misunderstood; GHC does not mutate immutable data, but then 
neither does DMD. However, Control.Monad.ST allows in-place mutation of 
data, e.g. via Data.STRef and Data.Array.ST. Such operations are 
sequenced, and crosstalk between multiple such 'threads' is excluded by 
the type system, as long as only safe operations are used.

It is somewhat similar to (the still quite broken) 'pure' in D, but 
stronger. (E.g. it is possible to pass mutable references into the rough 
equivalent of 'strongly pure' code, but that code won't be able to read 
their values; the references can appear as part of the return type, and 
the caller will be able to access them again. This is done using basically 
nothing but parametric polymorphism, which D lacks.)


Eg:

> runST $ do           -- ()pure{
    x <- newSTRef 0    -- auto x = 0;
    writeSTRef x 2     -- x = 2; // mutate x
    y <- readSTRef x   -- auto y = x;
    writeSTRef x 3     -- x = 3; // mutate x
    z <- readSTRef x   -- auto z = x;
    return (y,z)       -- return tuple(y,z);}();
(2,3)                  -- tuple(2,3)
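
The parametricity point can be illustrated with a toy example (my own, 
not from the thread): a function polymorphic in the element type can 
shuffle the given references around, but can never branch on or fabricate 
the values inside them.

```haskell
import Control.Monad.ST
import Data.STRef

-- Polymorphic in 'a': by parametricity this function can only move
-- values between the given references; it cannot inspect or invent them.
swapRefs :: STRef s a -> STRef s a -> ST s ()
swapRefs r1 r2 = do
  x <- readSTRef r1
  y <- readSTRef r2
  writeSTRef r1 y
  writeSTRef r2 x

-- Pure from the outside, despite the mutation inside.
swapDemo :: (Int, Int)
swapDemo = runST $ do
  a <- newSTRef 1
  b <- newSTRef 2
  swapRefs a b
  (,) <$> readSTRef a <*> readSTRef b

main :: IO ()
main = print swapDemo
```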


This paper describes how this is implemented in GHC (in-place mutation)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.3299

The only reason I can see why this is not as fast as D is implementation 
simplicity on the compiler side.

Here is some of the library code. It makes use of primitives (intrinsics):
http://www.haskell.org/ghc/docs/latest/html/libraries/base/src/GHC-ST.html#ST

>
>> I think the factor GHC/DMD cannot be more than about 2 or 3 for roughly
>> equivalently written imperative code.
>
> A factor of 2 or 3 is make or break for a large class of programs.
>
> Consider running a server farm. If you can make your code 5% faster, you
> need 5% fewer servers. That translates into millions of dollars.

Provided the code is correct.
December 12, 2012
Re: OT (partially): about promotion of integers
On 12/12/2012 10:25 PM, Walter Bright wrote:
> On 12/12/2012 4:51 AM, Araq wrote:
>> ...
>> So how does D improve on C's model? If signed integers are required to
>> wrap
>> around in D (no undefined behaviour), you also prevent some otherwise
>> possible
>> optimizations (there is a reason it's still undefined behaviour in C).
>
> D requires 2's complement arithmetic, it does not support 1's complement
> as C does.

I think what he is talking about is that in C, if after a few steps of 
inlining and constant propagation you end up with something like:

int x;
// ...
if(x>x+1) {
  // lots and lots of code
}else return 0;

Then a C compiler will assume that the addition does not overflow and 
reduce the code to 'return 0;', whereas a D compiler will not apply this 
optimization as it might change the semantics of valid D programs.
December 12, 2012
Re: OT (partially): about promotion of integers
On 12/12/2012 3:23 PM, Timon Gehr wrote:
> It is somewhat similar to (the still quite broken) 'pure' in D,

Broken how?

> Provided the code is correct.

No language or compiler can prove code correct.
December 12, 2012
Re: OT (partially): about promotion of integers
On 12/12/2012 3:23 PM, Timon Gehr wrote:
> On 12/12/2012 10:35 PM, Walter Bright wrote:
>> some algorithms are doomed to be slower.
>
> Here's a (real) quicksort:
> http://stackoverflow.com/questions/5268156/how-do-you-do-an-in-place-quicksort-in-haskell

Ok, I'll bite.

Here's a program in Haskell and D that reads from standard in, splits into 
lines, sorts the lines, and writes the result to standard out:

==============================
import Data.List
import qualified Data.ByteString.Lazy.Char8 as L
main = L.interact $ L.unlines . sort . L.lines
==============================
import std.stdio;
import std.array;
import std.algorithm;
void main() {
  stdin.byLine(KeepTerminator.yes).
  map!(a => a.idup).
  array.
  sort.
  copy(
    stdout.lockingTextWriter());
}
===============================

The D version runs twice as fast as the Haskell one. Note that there's nothing 
heroic going on with the D version - it's straightforward dumb code.
December 12, 2012
Re: OT (partially): about promotion of integers
On 12/12/2012 3:29 PM, Timon Gehr wrote:
> On 12/12/2012 10:25 PM, Walter Bright wrote:
>> On 12/12/2012 4:51 AM, Araq wrote:
>>> ...
>>> So how does D improve on C's model? If signed integers are required to
>>> wrap
>>> around in D (no undefined behaviour), you also prevent some otherwise
>>> possible
>>> optimizations (there is a reason it's still undefined behaviour in C).
>>
>> D requires 2's complement arithmetic, it does not support 1's complement
>> as C does.
>
> I think what he is talking about is that in C, if after a few steps of inlining
> and constant propagation you end up with something like:
>
> int x;
> // ...
> if(x>x+1) {
>    // lots and lots of code
> }else return 0;
>
> Then a C compiler will assume that the addition does not overflow and reduce the
> code to 'return 0;', whereas a D compiler will not apply this optimization as it
> might change the semantics of valid D programs.

You're right in that the D optimizer does not take advantage of C "undefined 
behavior" in its optimizations. The article mentioned that many bugs were caused 
not by the actual wraparound behavior, but by aggressive C optimizers that 
interpreted "undefined behavior" as not having to account for those cases.
December 13, 2012
Re: OT (partially): about promotion of integers
On Wednesday, 12 December 2012 at 23:47:26 UTC, Walter Bright 
wrote:
> On 12/12/2012 3:23 PM, Timon Gehr wrote:
>> On 12/12/2012 10:35 PM, Walter Bright wrote:
>>> some algorithms are doomed to be slower.
>>
>> Here's a (real) quicksort:
>> http://stackoverflow.com/questions/5268156/how-do-you-do-an-in-place-quicksort-in-haskell
>
> Ok, I'll bite.
>
> Here's a program in Haskell and D that reads from standard in, 
> splits into lines, sorts the lines, and writes the result to 
> standard out:
>
> ==============================
> import Data.List
> import qualified Data.ByteString.Lazy.Char8 as L
> main = L.interact $ L.unlines . sort . L.lines
> ==============================
> import std.stdio;
> import std.array;
> import std.algorithm;
> void main() {
>   stdin.byLine(KeepTerminator.yes).
>   map!(a => a.idup).
>   array.
>   sort.
>   copy(
>     stdout.lockingTextWriter());
> }
> ===============================
>
> The D version runs twice as fast as the Haskell one. Note that 
> there's nothing heroic going on with the D version - it's 
> straightforward dumb code.

You'll find a lot of traps like that in D, some of which can kill your 
performance. For instance:

stdin.byLine(KeepTerminator.yes).
  map!(a => a.idup).
  filter!(a => a.length).
  array

And bazinga, you just doubled the number of memory allocations made 
(after the filter, the range no longer knows its length, so array can 
no longer preallocate).
December 13, 2012
Re: OT (partially): about promotion of integers
On Wednesday, 12 December 2012 at 06:19:14 UTC, Walter Bright 
wrote:
> You're not going to get performance with overflow checking even 
> with the best compiler support. For example, much arithmetic 
> code is generated for the x86 using addressing mode 
> instructions, like:
>
>     LEA EAX,16[8*EBX][ECX]  for 16+8*b+c
>
> The LEA instruction does no overflow checking. If you wanted 
> it, the best code would be:
>
>     MOV EAX,16
>     IMUL EBX,8
>     JO overflow
>     ADD EAX,EBX
>     JO overflow
>     ADD EAX,ECX
>     JO overflow
>
> Which is considerably less efficient. (The LEA is designed to 
> run in one cycle). Plus, often more registers are modified 
> which impedes good register allocation.

Thanks for the tip. Of course, I don't need and wouldn't use 
overflow checking all the time--in fact, since I've written a big 
system in a language that can't do overflow checking, you might 
say I "never need" overflow checking, in the same way that C 
programmers "never need" constructors, destructors, generics or 
exceptions as demonstrated by the fact that they can and do build 
large systems without them.

Still, the cost of overflow checking is a lot bigger, and 
requires a lot more cleverness, without compiler support. Hence I 
work harder to avoid the need for it.

> If you desire overflows to be programming errors, then you want 
> an abort, not a thrown exception. I am perplexed by your desire 
> to continue execution when overflows happen regularly.

I explicitly said I want to handle overflows quickly, and you 
conclude that I want an unrecoverable abort? WTF! No, I think 
overflows should be handled efficiently, and should be nonfatal.

Maybe it would be better to think in terms of the carry flag: it 
seems to me that a developer needs access to the carry flag in 
order to do 128+bit arithmetic efficiently. I have written code 
to "make do" without the carry flag, it's just more efficient if 
it can be used. So imagine an intrinsic that gets the value of 
the carry flag*--obviously it wouldn't throw an exception. I just 
think overflow should be handled the same way. If the developer 
wants to react to overflow with an exception/abort, fine, but it 
should not be mandatory as it is in .NET.

* Yes, I know you'd usually just ADC instead of retrieving the 
actual value of the flag, but sometimes you do want to just get 
the flag.

Usually when there is an overflow I just want to discard one data 
point and move on, or set the result to the maximum/minimum 
integer, possibly make a note in a log, but only occasionally do 
I want the debugger to break.
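
The "set the result to the maximum/minimum integer" policy is easy to 
express as a library function. A sketch (my own, written in Haskell since 
that is the other language in this thread; the same shape works in D):

```haskell
-- Saturating addition: on overflow, clamp to maxBound/minBound instead
-- of wrapping, trapping, or throwing. The guard comparisons are
-- rearranged algebraically so that they themselves cannot overflow.
addSat :: Int -> Int -> Int
addSat x y
  | y > 0 && x > maxBound - y = maxBound
  | y < 0 && x < minBound - y = minBound
  | otherwise                 = x + y

main :: IO ()
main = print (addSat maxBound 1, addSat 2 3)
```

Such a function compiles to a compare and a branch per operation, with no 
exception machinery involved.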
December 13, 2012
Re: OT (partially): about promotion of integers
On 12/13/2012 12:43 AM, Walter Bright wrote:
> On 12/12/2012 3:23 PM, Timon Gehr wrote:
>> It is somewhat similar to (the still quite broken) 'pure' in D,
>
> Broken how?
>

- There is no way to specify that a delegate is strongly pure without 
resorting to type deduction, because
  - Member functions/local functions are handled inconsistently.
    - Delegate types legally obtained from certain member functions are 
illegal to declare.
    - 'pure' means 'weakly pure' for member functions and 'strongly 
pure' for local functions. Therefore it means 'weakly pure' for 
delegates, as those can be obtained from both.
- Delegates may break the transitivity of immutable, and by extension, 
shared.

A good first step in fixing up immutable/shared would be to make 
everything that is annotated 'error' pass, and the line annotated 'ok' 
should fail:

import std.stdio;

struct S{
	int x;
	int foo()pure{
		return x++;
	}
	int bar()immutable pure{
		// return x++; // error
		return 2;
	}
}

int delegate()pure s(){
	int x;
	int foo()pure{
		// return x++; // error
		return 2;
	}
	/+int bar()immutable pure{ // error
		return 2;
	}+/
	return &foo;
}

void main(){
	S s;
	int delegate()pure dg = &s.foo;
	// int delegate()pure immutable dg2 = &s.bar; // error
	writeln(dg(), dg(), dg(), dg()); // 0123
	immutable int delegate()pure dg3 = dg; // ok
	writeln(dg3(), dg3(), dg3(), dg3()); // 4567
	// static assert(is(typeof(cast()dg3)==int delegate() immutable pure)); 
// error
	auto bar = &s.bar;
	pragma(msg, typeof(bar)); // "int delegate() immutable pure"
}


>> Provided the code is correct.
>
> No language or compiler can prove code correct.

Sometimes it can. Certainly a compiler can check a user-provided proof.
eg: http://coq.inria.fr/

A minor issue with proving code correct is of course that the proven 
specification might contain an error. The formal specification is often 
far more explicit and easier to verify manually than the program though.
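
As a toy illustration of a machine-checked proof (shown in Lean here 
rather than Coq, purely for illustration): the compiler rejects the 
theorem unless the supplied proof term actually establishes it.

```lean
-- The checker verifies this proof of commutativity of addition on Nat;
-- replacing the proof term with a wrong one is a compile-time error.
theorem add_comm' (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```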
December 13, 2012
Re: OT (partially): about promotion of integers
Walter Bright:

> Java makes no attempt to detect integer overflows.

There are various kinds of code. In some kinds of programs you 
want to be more sure that the result is correct, while in other 
kinds this need is less strong.


> I personally know people who write high speed trading software. 
> These people are concerned with nanosecond delays. They write 
> code in C++. They even hack on the compiler trying to get it to 
> generate faster code.
>
> It doesn't surprise me a bit that some people who operate 
> server farms use slow languages like Ruby, Python, and Perl on 
> them. This does cost them money for extra hardware. There are 
> always going to be businesses that have inefficient operations, 
> poorly allocated resources, and who leave a lot of money on the 
> table.

One "important" firm uses OCaml for high speed trading because 
it's both very fast (C++-class fast, faster than Java on certain 
kinds of code, if used well) and apparently quite a bit safer to 
use than C/C++, even though it's harder to find OCaml programmers 
than C++ ones.

Bye,
bearophile
December 13, 2012
Re: OT (partially): about promotion of integers
On 12/13/2012 12:47 AM, Walter Bright wrote:
> On 12/12/2012 3:23 PM, Timon Gehr wrote:
>> On 12/12/2012 10:35 PM, Walter Bright wrote:
>>> some algorithms are doomed to be slower.
>>
>> Here's a (real) quicksort:
>> http://stackoverflow.com/questions/5268156/how-do-you-do-an-in-place-quicksort-in-haskell
>>
>
> Ok, I'll bite.
>
> Here's a program in Haskell and D that reads from standard in, splits
> into lines, sorts the lines, and writes the result to standard out:
>
> ==============================
> import Data.List
> import qualified Data.ByteString.Lazy.Char8 as L
> main = L.interact $ L.unlines . sort . L.lines
> ==============================
> import std.stdio;
> import std.array;
> import std.algorithm;
> void main() {
>    stdin.byLine(KeepTerminator.yes).
>    map!(a => a.idup).
>    array.
>    sort.
>    copy(
>      stdout.lockingTextWriter());
> }
> ===============================
>
> The D version runs twice as fast as the Haskell one.

You are testing some standard library functions that are implemented in 
wildly different ways in the two languages. They are not the same 
algorithms. For example, looking at just the first element of the sorted 
list will run in O(length) in Haskell. If you build a sort function 
with that property in D, it will be slower as well. (If a rather 
Haskell-inspired implementation strategy is chosen, it will be a lot 
slower.) The key difference is that the D version operates in a strict 
fashion on arrays, while the Haskell version operates in a lazy fashion 
on lazy lists.
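
The laziness point is easy to see in a small sketch (my own example): 
Data.List.sort produces its result incrementally, so demanding only the 
first element forces only part of the merge work rather than the full 
sort.

```haskell
import Data.List (sort)

-- Thanks to laziness, only enough of the merge sort is forced to
-- produce the first element; the rest of the sorted list is never built.
smallest :: [Int] -> Int
smallest = head . sort

main :: IO ()
main = print (smallest [5,3,8,1,4])
```

A strict array sort, as in the D version, has no analogous shortcut.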

This just means that Data.List.sort is inadequate for high-performance 
code in cases where the entire contents of the list get looked at.

This is a good treatment of the matter:
http://stackoverflow.com/questions/11481675/using-vectors-for-performance-improvement-in-haskell/11485331#11485331

You are using Data.List.sort. The best implementations shown there seem 
to be around 5 times faster. I do not know how large the I/O overhead is.

Certainly, you can argue that the faster version should be in a 
prominent place in the standard library, but the fact that it is not 
does not indicate a fundamental performance problem in the Haskell 
language. Also, note that I am completely ignoring what kind of code is 
idiomatic in both languages. Fast Haskell code often looks similar to C 
code.

> Note that there's
> nothing heroic going on with the D version - it's straightforward dumb
> code.

A significant part of the D code is spent arranging data into the right 
layout, while the Haskell code does nothing like that.