December 12, 2012
On 12/12/2012 10:35 PM, Walter Bright wrote:
> On 12/12/2012 12:01 PM, Timon Gehr wrote:
>> That is certainly fixable. It is a mere QOI issue.
>
> When you have a language that fundamentally disallows mutation,

It does not.

> some algorithms are doomed to be slower.

Here's a (real) quicksort:
http://stackoverflow.com/questions/5268156/how-do-you-do-an-in-place-quicksort-in-haskell

> I asked Erik Meijer, one of the developers of
> Haskell, if the implementation does mutation "under the hood" to make
> things go faster.

"under the hood", obviously there must be mutation as this is how the machine works.

>  He assured me that it does not, that it follows the
> "no mutation" all the way.
>

Maybe he misunderstood; i.e., DMD does not do this to immutable data either. E.g., Control.Monad.ST allows in-place mutation of data types, e.g. from Data.STRef and Data.Array.ST. Such operations are sequenced, and crosstalk between multiple such 'threads' is excluded by the type system, as long as only safe operations are used.

It is somewhat similar to (the still quite broken) 'pure' in D, but stronger. (E.g. it is possible to pass mutable references into the rough equivalent of 'strongly pure' code, but that code won't be able to read their values; the references can appear as part of the return type, and the caller will be able to access them again. This is done using basically nothing but parametric polymorphism, which D lacks.)
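
For concreteness, here is the usual textbook illustration of that encapsulation (sumST is just an ad-hoc name):

import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef, readSTRef)

-- runST :: (forall s. ST s a) -> a
-- The rank-2 quantification over 's' is what makes this safe: an
-- STRef s Int mentions 's', so it can neither leak out of runST nor
-- be smuggled in from a different ST 'thread'.
sumST :: [Int] -> Int
sumST xs = runST $ do
    acc <- newSTRef 0                       -- local mutable cell
    mapM_ (\x -> modifySTRef acc (+x)) xs   -- in-place accumulation
    readSTRef acc                           -- observationally pure result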


E.g. a GHCi session, with a rough D equivalent in the comments:

> runST $ do           -- ()pure{
    x <- newSTRef 0    -- auto x = 0;
    writeSTRef x 2     -- x = 2; // mutate x
    y <- readSTRef x   -- auto y = x;
    writeSTRef x 3     -- x = 3; // mutate x
    z <- readSTRef x   -- auto z = x;
    return (y,z)       -- return tuple(y,z);}();
(2,3)                  -- tuple(2,3)


This paper describes how this is implemented in GHC (with in-place mutation):
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.3299

The only reason I can see why this would not be as fast as D is implementation simplicity on the compiler side.

Here is some of the library code. It makes use of primitives (intrinsics):
http://www.haskell.org/ghc/docs/latest/html/libraries/base/src/GHC-ST.html#ST
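
The heart of that module is roughly the following (a simplified sketch; see the linked source for the real definitions):

{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (State#)

-- An ST computation is just a function threading an abstract state
-- token. The token has no runtime representation, so after inlining,
-- a chain of >>= compiles down to straight-line imperative code.
newtype ST s a = ST (State# s -> (# State# s, a #))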

>
>> I think the factor GHC/DMD cannot be more than about 2 or 3 for roughly
>> equivalently written imperative code.
>
> A factor of 2 or 3 is make or break for a large class of programs.
>
> Consider running a server farm. If you can make your code 5% faster, you
> need 5% fewer servers. That translates into millions of dollars.

Provided the code is correct.
December 12, 2012
On 12/12/2012 10:25 PM, Walter Bright wrote:
> On 12/12/2012 4:51 AM, Araq wrote:
>> ...
>> So how does D improve on C's model? If signed integers are required to
>> wrap
>> around in D (no undefined behaviour), you also prevent some otherwise
>> possible
>> optimizations (there is a reason it's still undefined behaviour in C).
>
> D requires 2's complement arithmetic, it does not support 1's complement
> as C does.

I think what he is talking about is that in C, if after a few steps of inlining and constant propagation you end up with something like:

int x;
// ...
if(x>x+1) {
  // lots and lots of code
}else return 0;

Then a C compiler will assume that the addition does not overflow and reduce the code to 'return 0;', whereas a D compiler will not apply this optimization as it might change the semantics of valid D programs.
December 12, 2012
On 12/12/2012 3:23 PM, Timon Gehr wrote:
> It is somewhat similar to (the still quite broken) 'pure' in D,

Broken how?

> Provided the code is correct.

No language or compiler can prove code correct.
December 12, 2012
On 12/12/2012 3:23 PM, Timon Gehr wrote:
> On 12/12/2012 10:35 PM, Walter Bright wrote:
>> some algorithms are doomed to be slower.
>
> Here's a (real) quicksort:
> http://stackoverflow.com/questions/5268156/how-do-you-do-an-in-place-quicksort-in-haskell

Ok, I'll bite.

Here's a program in Haskell and D that reads from standard in, splits it into lines, sorts the lines, and writes the result to standard out:

==============================
import Data.List
import qualified Data.ByteString.Lazy.Char8 as L
main = L.interact $ L.unlines . sort . L.lines
==============================
import std.stdio;
import std.array;
import std.algorithm;
void main() {
  stdin.byLine(KeepTerminator.yes).
  map!(a => a.idup).
  array.
  sort.
  copy(
    stdout.lockingTextWriter());
}
===============================

The D version runs twice as fast as the Haskell one. Note that there's nothing heroic going on with the D version - it's straightforward dumb code.
December 12, 2012
On 12/12/2012 3:29 PM, Timon Gehr wrote:
> On 12/12/2012 10:25 PM, Walter Bright wrote:
>> On 12/12/2012 4:51 AM, Araq wrote:
>>> ...
>>> So how does D improve on C's model? If signed integers are required to
>>> wrap
>>> around in D (no undefined behaviour), you also prevent some otherwise
>>> possible
>>> optimizations (there is a reason it's still undefined behaviour in C).
>>
>> D requires 2's complement arithmetic, it does not support 1's complement
>> as C does.
>
> I think what he is talking about is that in C, if after a few steps of inlining
> and constant propagation you end up with something like:
>
> int x;
> // ...
> if(x>x+1) {
>    // lots and lots of code
> }else return 0;
>
> Then a C compiler will assume that the addition does not overflow and reduce the
> code to 'return 0;', whereas a D compiler will not apply this optimization as it
> might change the semantics of valid D programs.

You're right in that the D optimizer does not take advantage of C "undefined behavior" in its optimizations. The article mentioned that many bugs were caused not by the actual wraparound behavior, but by aggressive C optimizers that interpreted "undefined behavior" as not having to account for those cases.
December 13, 2012
On Wednesday, 12 December 2012 at 23:47:26 UTC, Walter Bright wrote:
> On 12/12/2012 3:23 PM, Timon Gehr wrote:
>> On 12/12/2012 10:35 PM, Walter Bright wrote:
>>> some algorithms are doomed to be slower.
>>
>> Here's a (real) quicksort:
>> http://stackoverflow.com/questions/5268156/how-do-you-do-an-in-place-quicksort-in-haskell
>
> Ok, I'll bite.
>
> Here's a program in Haskell and D that reads from standard in, splits it into lines, sorts the lines, and writes the result to standard out:
>
> ==============================
> import Data.List
> import qualified Data.ByteString.Lazy.Char8 as L
> main = L.interact $ L.unlines . sort . L.lines
> ==============================
> import std.stdio;
> import std.array;
> import std.algorithm;
> void main() {
>   stdin.byLine(KeepTerminator.yes).
>   map!(a => a.idup).
>   array.
>   sort.
>   copy(
>     stdout.lockingTextWriter());
> }
> ===============================
>
> The D version runs twice as fast as the Haskell one. Note that there's nothing heroic going on with the D version - it's straightforward dumb code.

You'll find a lot of traps like that in D, some of which can kill your performance. For instance:

stdin.byLine(KeepTerminator.yes).
  map!(a => a.idup).
  filter!(a => a.length).  // keep non-empty lines
  array

And bazinga, you just doubled the number of memory allocations made.
December 13, 2012
On Wednesday, 12 December 2012 at 06:19:14 UTC, Walter Bright wrote:
> You're not going to get performance with overflow checking even with the best compiler support. For example, much arithmetic code is generated for the x86 using addressing mode instructions, like:
>
>     LEA EAX,16[8*EBX][ECX]  for 16+8*b+c
>
> The LEA instruction does no overflow checking. If you wanted it, the best code would be:
>
>     MOV EAX,16
>     IMUL EBX,8
>     JO overflow
>     ADD EAX,EBX
>     JO overflow
>     ADD EAX,ECX
>     JO overflow
>
> Which is considerably less efficient. (The LEA is designed to run in one cycle). Plus, often more registers are modified which impedes good register allocation.

Thanks for the tip. Of course, I don't need and wouldn't use overflow checking all the time--in fact, since I've written a big system in a language that can't do overflow checking, you might say I "never need" overflow checking, in the same way that C programmers "never need" constructors, destructors, generics or exceptions as demonstrated by the fact that they can and do build large systems without them.

Still, without compiler support the cost of overflow checking is a lot bigger, and it requires a lot more cleverness. Hence I work harder to avoid the need for it.

> If you desire overflows to be programming errors, then you want an abort, not a thrown exception. I am perplexed by your desire to continue execution when overflows happen regularly.

I explicitly say I want to handle overflows quickly, and you conclude that I want an unrecoverable abort? WTF! No, I think overflows should be handled efficiently, and should be nonfatal.

Maybe it would be better to think in terms of the carry flag: it seems to me that a developer needs access to the carry flag in order to do 128+bit arithmetic efficiently. I have written code to "make do" without the carry flag, it's just more efficient if it can be used. So imagine an intrinsic that gets the value of the carry flag*--obviously it wouldn't throw an exception. I just think overflow should be handled the same way. If the developer wants to react to overflow with an exception/abort, fine, but it should not be mandatory as it is in .NET.

* Yes, I know you'd usually just ADC instead of retrieving the actual value of the flag, but sometimes you do want to just get the flag.
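
For example, here is a sketch (in Haskell; add128 and its two-word representation are ad-hoc) of a 128-bit add built from 64-bit halves, recovering the carry by comparison where the hardware would provide it for free via ADC:

import Data.Word (Word64)

-- (high, low) pairs; Word64 arithmetic wraps modulo 2^64.
add128 :: (Word64, Word64) -> (Word64, Word64) -> (Word64, Word64)
add128 (ahi, alo) (bhi, blo) = (hi, lo)
  where
    lo    = alo + blo                  -- low words, wrapping
    carry = if lo < alo then 1 else 0  -- 1 iff the low add overflowed
    hi    = ahi + bhi + carry          -- propagate the carry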

Usually when there is an overflow I just want to discard one data point and move on, or set the result to the maximum/minimum integer, possibly make a note in a log, but only occasionally do I want the debugger to break.
December 13, 2012
On 12/13/2012 12:43 AM, Walter Bright wrote:
> On 12/12/2012 3:23 PM, Timon Gehr wrote:
>> It is somewhat similar to (the still quite broken) 'pure' in D,
>
> Broken how?
>

- There is no way to specify that a delegate is strongly pure without resorting to type deduction, because
  - Member functions/local functions are handled inconsistently.
    - Delegate types legally obtained from certain member functions are illegal to declare.
    - 'pure' means 'weakly pure' for member functions and 'strongly pure' for local functions. Therefore it means 'weakly pure' for delegates, as those can be obtained from both.
- Delegates may break the transitivity of immutable, and by extension, shared.

A good first step in fixing up immutable/shared would be to make everything that is annotated 'error' pass, and to make the line annotated 'ok' fail:

import std.stdio;

struct S{
	int x;
	int foo()pure{
		return x++;
	}
	int bar()immutable pure{
		// return x++; // error
		return 2;
	}
}

int delegate()pure s(){
	int x;
	int foo()pure{
		// return x++; // error
		return 2;
	}
	/+int bar()immutable pure{ // error
		return 2;
	}+/
	return &foo;
}

void main(){
	S s;
	int delegate()pure dg = &s.foo;
	// int delegate()pure immutable dg2 = &s.bar; // error
	writeln(dg(), dg(), dg(), dg()); // 0123
	immutable int delegate()pure dg3 = dg; // ok
	writeln(dg3(), dg3(), dg3(), dg3()); // 4567
	// static assert(is(typeof(cast()dg3)==int delegate() immutable pure)); // error
	auto bar = &s.bar;
	pragma(msg, typeof(bar)); // "int delegate() immutable pure"
}


>> Provided the code is correct.
>
> No language or compiler can prove code correct.

Sometimes it can. Certainly a compiler can check a user-provided proof.
E.g.: http://coq.inria.fr/

A minor issue with proving code correct is of course that the proven specification might itself contain an error. The formal specification is often far more explicit and easier to verify manually than the program, though.
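
To give a flavour of what checking a user-provided proof looks like, here is a toy example (sketched in Lean; Coq works the same way in principle):

-- The type checker verifies that the proof term really establishes
-- the stated proposition; a wrong proof is a compile-time error.
theorem add_comm' (m n : Nat) : m + n = n + m :=
  Nat.add_comm m n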
December 13, 2012
Walter Bright:

> Java makes no attempt to detect integer overflows.

There are various kinds of code. In some kinds of programs you want to be more sure that the result is correct, while in other kinds of programs this need is less strong.


> I personally know people who write high speed trading software. These people are concerned with nanosecond delays. They write code in C++. They even hack on the compiler trying to get it to generate faster code.
>
> It doesn't surprise me a bit that some people who operate server farms use slow languages like Ruby, Python, and Perl on them. This does cost them money for extra hardware. There are always going to be businesses that have inefficient operations, poorly allocated resources, and who leave a lot of money on the table.

One "important" firm uses OcaML for high speed trading because it's both very fast (C++-class fast, faster than Java on certain kinds of code, if well used) and apparently quite safer to use than C/C++. And it's harder to find OcaML programmers than C++ ones.

Bye,
bearophile
December 13, 2012
On 12/13/2012 12:47 AM, Walter Bright wrote:
> On 12/12/2012 3:23 PM, Timon Gehr wrote:
>> On 12/12/2012 10:35 PM, Walter Bright wrote:
>>> some algorithms are doomed to be slower.
>>
>> Here's a (real) quicksort:
>> http://stackoverflow.com/questions/5268156/how-do-you-do-an-in-place-quicksort-in-haskell
>>
>
> Ok, I'll bite.
>
> Here's a program in Haskell and D that reads from standard in, splits
> it into lines, sorts the lines, and writes the result to standard out:
>
> ==============================
> import Data.List
> import qualified Data.ByteString.Lazy.Char8 as L
> main = L.interact $ L.unlines . sort . L.lines
> ==============================
> import std.stdio;
> import std.array;
> import std.algorithm;
> void main() {
>    stdin.byLine(KeepTerminator.yes).
>    map!(a => a.idup).
>    array.
>    sort.
>    copy(
>      stdout.lockingTextWriter());
> }
> ===============================
>
> The D version runs twice as fast as the Haskell one.

You are testing some standard library functions that are implemented in wildly different ways in the two languages. They are not the same algorithms. For example, looking at just the first element of the sorted list will run in O(length) in Haskell. If you build a sort function with that property in D, it will be slower as well (and if a rather Haskell-inspired implementation strategy is chosen, it will be a lot slower). The key difference is that the D version operates in a strict fashion on arrays, while the Haskell version operates in a lazy fashion on lazy lists.
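
To make the laziness point concrete (lazyMin is an ad-hoc name):

import Data.List (sort)

-- sort produces its output lazily, so taking only the head forces
-- just enough of the underlying merge to find the minimum: O(n)
-- work rather than the full O(n log n).
lazyMin :: Ord a => [a] -> a
lazyMin = head . sort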

This just means that Data.List.sort is inadequate for high-performance code in cases where the entire contents of the list are examined.

This is a good treatment of the matter:
http://stackoverflow.com/questions/11481675/using-vectors-for-performance-improvement-in-haskell/11485331#11485331

You are using Data.List.sort. The best implementations shown there seem to be around 5 times faster. I do not know how large the I/O overhead is.
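
E.g. something along the following lines, assuming the vector and vector-algorithms packages (sortVec is an ad-hoc name):

import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Intro as Intro
import Control.Monad.ST (runST)

-- In-place introsort on an unboxed vector, with the mutation fenced
-- off by runST, so the function is observationally pure.
sortVec :: V.Vector Int -> V.Vector Int
sortVec v = runST $ do
    mv <- V.thaw v        -- copy the input into a mutable vector
    Intro.sort mv         -- sort it in place
    V.unsafeFreeze mv     -- reuse the buffer as an immutable vector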

Certainly, you can argue that the faster version should be in a prominent place in the standard library, but the fact that it is not does not indicate a fundamental performance problem in the Haskell language. Also, note that I am completely ignoring what kind of code is idiomatic in both languages. Fast Haskell code often looks similar to C code.

> Note that there's
> nothing heroic going on with the D version - it's straightforward dumb
> code.

A significant part of the D code is spent arranging data into the right layout, while the Haskell code does nothing like that.