December 07, 2014
08-Dec-2014 00:36, John Colvin wrote:
> On Sunday, 7 December 2014 at 19:56:49 UTC, Dmitry Olshansky wrote:
>> 06-Dec-2014 18:33, H. S. Teoh via Digitalmars-d wrote:
>>> On Sat, Dec 06, 2014 at 03:26:08PM +0000, Russel Winder via
>>> Digitalmars-d wrote:
>>> [...]
>>>>>    primitive are passed by value; arrays and user defined types are
>>>>> passed by reference only (killing memory usage)
>>>>
>>>> Primitive types are scheduled for removal, leaving only reference
>>>> types.
>>> [...]
>>>
>>> Whoa. So they're basically going to rely on JIT to convert those boxed
>>> Integers into hardware ints for performance?
>>
>> With great success.
>>
>>> Sounds like I will never
>>> consider Java for computation-heavy tasks then...
>>
>> Interestingly, after working with the JVM for the last 2 years, the only
>> problem I've found is the memory overhead of collections and non-trivial
>> objects. In my tests, performance of simple numeric code was actually
>> better with Scala (not even plain Java) than with D (LDC), for instance.
>
> Got an example? I'd be interested to see a numerical-code example where
> the JVM can beat the llvm/gcc backends on a real calculation (even if
> it's a small one).

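It was trivial numerical integration. The code below is really the trapezoidal rule, a simpler quadrature rule than the Gaussian one:
http://en.wikipedia.org/wiki/Gaussian_quadrature

I do not claim the code is optimal or anything, but the two versions match line for line.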

// D version
import std.algorithm, std.stdio, std.datetime;

auto integrate(double function(double) f, double a, double b, int n){
    auto step = (b-a)/n;
    auto sum = 0.0;
    auto x = a;
    while(x<b)
    {
        sum += (f(x) + f(x+step))*step/2;
        x += step;
    }
    return sum;
}

long timeIt(){
    StopWatch sw;
    sw.start();
    auto r = integrate(x => x*x*x, 0.0, 1.0, 1000000);
    sw.stop();
    return sw.peek().usecs;
}

void main(){
    auto estimate = timeIt;
    foreach(_; 0..1000)
        estimate = min(estimate, timeIt);
    writef("%s sec\n", estimate/1e6);
}


// Scala version

def integrate(f: Double => Double, a: Double, b: Double, n : Int): Double = {
    val step = (b-a)/n;
    var sum = 0.0;
    var x = a;
    while(x<b)
    {
        sum += (f(x) + f(x+step))*step/2;
        x += step;
    }
    sum
}

def timeIt() = {
    val start = System.nanoTime();
    val r = integrate(x => x*x*x, 0.0, 1.0, 1000000);
    val end = System.nanoTime();
    end - start
}

var estimate = timeIt;
for ( _ <- 1 to 1000 )
    estimate = Math.min(estimate, timeIt)
printf("%s sec\n", estimate/1e9);



-- 
Dmitry Olshansky
December 07, 2014
On Sunday, 7 December 2014 at 22:13:50 UTC, Dmitry Olshansky wrote:
> [snip]

On my machine (Haswell i5), the Scala version takes 1.6x as long as the LDC version.

I don't know Scala, though. I compiled with -optimise; are there other arguments I should be using?
December 07, 2014
08-Dec-2014 01:38, John Colvin wrote:

[snip]

> on my machine (Haswell i5) I get scala as taking 1.6x as long as the ldc
> version.
>
> I don't know scala though, I compiled using -optimise, are there other
> arguments I should be using?

There is no point in -optimise; at least, I do not recall using it.
What's your JVM? It should be Oracle's HotSpot, not OpenJDK.

-- 
Dmitry Olshansky
December 08, 2014
On Sun, 2014-12-07 at 21:36 +0000, John Colvin via Digitalmars-d wrote:
> […]

> Got an example? I'd be interested to see a numerical-code example where the JVM can beat the llvm/gcc backends on a real calculation (even if it's a small one).

π by quadrature (it's just a single loop) shows the effect very well, though only anecdotally so far, since I haven't set up proper benchmarking even after 7 years of tinkering.
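
For readers who have not seen it, here is a minimal sketch of that single loop in D. It assumes the usual formulation, a midpoint-rule estimate of the integral of 4/(1+x²) over [0, 1]; the function name piByQuadrature and the slice count are illustrative and not taken from the repository.

```d
import std.stdio : writefln;

// Midpoint-rule estimate of pi as the integral of 4 / (1 + x^2) over [0, 1].
// The name and the parameter n are illustrative assumptions.
double piByQuadrature(int n)
{
    immutable delta = 1.0 / n;
    auto sum = 0.0;
    foreach (i; 0 .. n)
    {
        immutable x = (i + 0.5) * delta; // midpoint of slice i
        sum += 1.0 / (1.0 + x * x);
    }
    return 4.0 * delta * sum;
}

void main()
{
    writefln("%.10f", piByQuadrature(1_000_000));
}
```

The loop body is one division and a few multiply-adds per slice, so the benchmark mostly measures how well the compiler or JIT handles a tight floating-point loop.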

https://github.com/russel/Pi_Quadrature

Of course, the JVM suffers a JIT warm-up that native-code languages do not, so you have to be careful with single-data-point comparisons.

As in any of these situations, convoluted code hardcoded for a specific processor, especially assembly language, will always win. I don't care about that; I care about the fastest comprehensible code that is portable simply by compilation or execution. On that basis, Java does well, so does some Groovy (perhaps surprisingly), and also Scala. C++ does well, especially with TBB (though as an API it leaves a lot to be desired). D is OK, but only using ldc2 or gdc; dmd sucks. Go has issues using gc, but gccgo is fine. Rust does very well, but if you use Cargo for the build you have to be careful to use --release. A big winner here is Python, but only if you can get Numba working; Cython and Pythran are a bit icky for me. On the outside rail is Chapel, which would probably wipe the floor with all other languages if it could get some traction outside HPC, with X10 a good runner-up.

Of course, this is just a trivial microbenchmark; you may be looking for more real-world code.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

December 08, 2014
On Sun, 2014-12-07 at 13:57 -0800, Ziad Hatahet via Digitalmars-d wrote:
> 
> Are you referring to: http://openjdk.java.net/jeps/169 ?

That is one part of it, but it alone will not achieve the goal.

-- 
Russel.

December 08, 2014
On Sunday, 7 December 2014 at 22:46:02 UTC, Dmitry Olshansky wrote:
> 08-Dec-2014 01:38, John Colvin wrote:
>
> [snip]
>
>> on my machine (Haswell i5) I get scala as taking 1.6x as long as the ldc
>> version.
>>
>> I don't know scala though, I compiled using -optimise, are there other
>> arguments I should be using?
>
> There is no point in -optimise at least I do not recall using it.
> What's your JVM ? It should be Oracle's HotSpot not OpenJDK.

HotSpot.

After changing the benchmark to measure the integration function more carefully (LDC was unfairly taking advantage of knowing a and b at compile time), Scala does indeed win by a small margin.

I wonder what it's managing to achieve here? AFAICT there really isn't much scope for optimisation in that while loop without breaking IEEE-754 guarantees.
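
The modified benchmark was not posted, so the following is only a sketch of one way to keep the bounds away from the optimiser. It assumes a and b are read from the command line (defaulting to 0 and 1), and it uses the current std.datetime.stopwatch API rather than the 2014 StopWatch from the original code.

```d
import std.algorithm : min;
import std.conv : to;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Same trapezoidal loop as in the original D version.
double integrate(double function(double) f, double a, double b, int n)
{
    immutable step = (b - a) / n;
    auto sum = 0.0;
    auto x = a;
    while (x < b)
    {
        sum += (f(x) + f(x + step)) * step / 2;
        x += step;
    }
    return sum;
}

void main(string[] args)
{
    // Assumption: bounds come from the command line, so their values
    // are only known at run time.
    immutable a = args.length > 1 ? args[1].to!double : 0.0;
    immutable b = args.length > 2 ? args[2].to!double : 1.0;

    long best = long.max;
    double result;
    foreach (_; 0 .. 1000)
    {
        auto sw = StopWatch(AutoStart.yes);
        result = integrate(x => x * x * x, a, b, 1_000_000);
        sw.stop();
        best = min(best, sw.peek.total!"usecs");
    }
    // Printing the result keeps the computation from being discarded.
    writefln("%s sec (result %s)", best / 1e6, result);
}
```

Whether this matches the change actually made is a guess; the point is only that the bounds and the result both have to be visible at run time for the timing to mean much.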
December 08, 2014
On Monday, 8 December 2014 at 10:31:46 UTC, John Colvin wrote:
> On Sunday, 7 December 2014 at 22:46:02 UTC, Dmitry Olshansky wrote:
>> 08-Dec-2014 01:38, John Colvin wrote:
>>
>> [snip]
>>
>>> on my machine (Haswell i5) I get scala as taking 1.6x as long as the ldc
>>> version.
>>>
>>> I don't know scala though, I compiled using -optimise, are there other
>>> arguments I should be using?
>>
>> There is no point in -optimise at least I do not recall using it.
>> What's your JVM ? It should be Oracle's HotSpot not OpenJDK.
>
> hotspot.
>
> After changing the benchmark to more carefully measure the integration function (ldc was unfairly taking advantage of knowing a and b at compile-time), scala does indeed win by a small margin.
>
> I wonder what it's managing to achieve here? AFAICT there really isn't much scope for optimisation in that while loop without breaking IEEE-754 guarantees.

I don't think 'f' will be inlined in the D version. What happens if you make it an alias instead?
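
A minimal sketch of what the alias version might look like, assuming the rest of the benchmark stays the same. The integrand becomes a compile-time template parameter, so each instantiation calls it directly instead of through a function pointer.

```d
import std.stdio : writeln;

// f is a compile-time alias parameter; integrate!(f) is instantiated
// separately for each integrand.
double integrate(alias f)(double a, double b, int n)
{
    immutable step = (b - a) / n;
    auto sum = 0.0;
    auto x = a;
    while (x < b)
    {
        sum += (f(x) + f(x + step)) * step / 2;
        x += step;
    }
    return sum;
}

void main()
{
    writeln(integrate!(x => x * x * x)(0.0, 1.0, 1_000_000));
}
```

This does not guarantee inlining, but it removes the indirect call that the function-pointer signature forces when integrate itself is not inlined.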
December 08, 2014
On Monday, 8 December 2014 at 11:02:21 UTC, Rene Zwanenburg wrote:
> On Monday, 8 December 2014 at 10:31:46 UTC, John Colvin wrote:
>> On Sunday, 7 December 2014 at 22:46:02 UTC, Dmitry Olshansky wrote:
>>> 08-Dec-2014 01:38, John Colvin wrote:
>>>
>>> [snip]
>>>
>>>> on my machine (Haswell i5) I get scala as taking 1.6x as long as the ldc
>>>> version.
>>>>
>>>> I don't know scala though, I compiled using -optimise, are there other
>>>> arguments I should be using?
>>>
>>> There is no point in -optimise at least I do not recall using it.
>>> What's your JVM ? It should be Oracle's HotSpot not OpenJDK.
>>
>> hotspot.
>>
>> After changing the benchmark to more carefully measure the integration function (ldc was unfairly taking advantage of knowing a and b at compile-time), scala does indeed win by a small margin.
>>
>> I wonder what it's managing to achieve here? AFAICT there really isn't much scope for optimisation in that while loop without breaking IEEE-754 guarantees.
>
> I don't think 'f' will be inlined in the D version. What happens if you make it an alias instead?

The delegate is inlined, after the whole integrate function is inlined into timeIt.
December 08, 2014
On Monday, 8 December 2014 at 11:40:25 UTC, John Colvin wrote:
> On Monday, 8 December 2014 at 11:02:21 UTC, Rene Zwanenburg wrote:
>> On Monday, 8 December 2014 at 10:31:46 UTC, John Colvin wrote:
>>> On Sunday, 7 December 2014 at 22:46:02 UTC, Dmitry Olshansky wrote:
>>>> 08-Dec-2014 01:38, John Colvin wrote:
>>>>
>>>> [snip]
>>>>
>>>>> on my machine (Haswell i5) I get scala as taking 1.6x as long as the ldc
>>>>> version.
>>>>>
>>>>> I don't know scala though, I compiled using -optimise, are there other
>>>>> arguments I should be using?
>>>>
>>>> There is no point in -optimise at least I do not recall using it.
>>>> What's your JVM ? It should be Oracle's HotSpot not OpenJDK.
>>>
>>> hotspot.
>>>
>>> After changing the benchmark to more carefully measure the integration function (ldc was unfairly taking advantage of knowing a and b at compile-time), scala does indeed win by a small margin.
>>>
>>> I wonder what it's managing to achieve here? AFAICT there really isn't much scope for optimisation in that while loop without breaking IEEE-754 guarantees.
>>
>> I don't think 'f' will be inlined in the D version. What happens if you make it an alias instead?
>
> The delegate is inlined, after the whole integrate function is inlined into timeIt.

Sorry, f is a function, not a delegate.
December 08, 2014
On Monday, 8 December 2014 at 10:31:46 UTC, John Colvin wrote:
> On Sunday, 7 December 2014 at 22:46:02 UTC, Dmitry Olshansky wrote:
>> 08-Dec-2014 01:38, John Colvin wrote:
>>
>> [snip]
>>
>>> on my machine (Haswell i5) I get scala as taking 1.6x as long as the ldc
>>> version.
>>>
>>> I don't know scala though, I compiled using -optimise, are there other
>>> arguments I should be using?
>>
>> There is no point in -optimise at least I do not recall using it.
>> What's your JVM ? It should be Oracle's HotSpot not OpenJDK.
>
> hotspot.
>
> After changing the benchmark to more carefully measure the integration function (ldc was unfairly taking advantage of knowing a and b at compile-time), scala does indeed win by a small margin.
>
> I wonder what it's managing to achieve here? AFAICT there really isn't much scope for optimisation in that while loop without breaking IEEE-754 guarantees.

You can check it, if you wish to do so.

With the Oracle JVM and OpenJDK you have three options:

- JITWatch, https://github.com/AdoptOpenJDK/jitwatch/

- Oracle Solaris Studio on Solaris, http://www.oracle.com/technetwork/articles/servers-storage-dev/profiling-java-studio-perf-2293553.html

- Plain text tools, https://wikis.oracle.com/display/HotSpotInternals/PrintAssembly

Other JVMs offer similar tooling.

--
Paulo