floating point performance
September 18, 2001
I'm writing a very numerically intensive application that mainly involves integration.  Using the trapezoidal rule (source at the end of this message), I got widely different execution times with different compilers (times are in seconds, and the OS is Win2k, except for gcc running on Linux, where specified):

gcc-2.95.2 Debian GNU/Linux => 81
bcc 5.5.1 => 176
gcc-2.95.3 (MinGW 1.0) => 255
gcc-2.95.3 (Cygwin) => 119
sc 8.1d (Win32) => 316
sc 8.1d (X32) => 383
lcc-win32 => 326

I'm using a 1.1GHz Athlon with 256 MB RAM.  It seems that
DigitalMars is not the best choice for numerical applications, or
maybe I just hit a particular case in which DM behaves
poorly?  The flags used for compiling are "-o+all -6 -ff" (plus -mn
-WA for Win32 and -mx for the DOS extended version, of course).

It's strange to see such big differences between different
flavors of gcc.  Maybe performance is mainly affected by the
run-time libraries, which are more or less optimized?  MinGW is using
MSVCRT, so... ;)  Another thing: why the difference between
the Win32 and X32 versions of the DM generated exe?  It's only
pure calculations, no i/o calls that might involve switching
between protected and real mode...  Under real DOS (with EMM386, so
VCPI is involved) it's even slower!

Laurentiu

// integrate.cpp
#include <stdio.h>
#include <math.h>
#include <time.h>

double fn(double x)
{
    return 0.5 * exp (-x*x/2.0);
}

double integrate(double a, double b, double eps, double(*f)(double))
{
    time_t before, after;
    time(&before);
    unsigned points = 4;
    register unsigned i;
    double previous, x, dx;
    register double current = 0.0;
    do
    {
        previous = current;
        current = ((*f)(a) + (*f)(b)) / 2.0;
        x = a;
        dx = (b - a) / (points - 1);
        for (i = points - 3; i--; x += dx)
        {
            current += (*f)(x);
        }
        points <<= 1;
        current *= dx;
    }
    while(fabs((current - previous) / current) >= eps);
    time(&after);
    printf("value = %g\tpoints = %u\ttime = %g\n", current,
           points, difftime(after, before));
    return current;
}

int main()
{
    //fesetprec(FE_DBLPREC); // no speedup
    integrate(0.0, 1.0, 1e-9, fn);
    return 0;
}
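
A note on the posted loop: it starts at x = a and runs points - 3 times, so the left end point is counted a second time and the last two interior samples are never reached. The sum still converges, but with an error proportional to dx rather than dx squared, which may be part of why the run needs so many points. Below is a minimal sketch of the composite trapezoidal rule with the interior samples visited exactly once. The names trapezoid and integrate_fixed are hypothetical and not from the original post, and timing is left out.

```cpp
#include <cmath>

// Same integrand as in the post.
double fn(double x)
{
    return 0.5 * std::exp(-x * x / 2.0);
}

// Composite trapezoidal rule on `points` equally spaced samples (points >= 2).
// Hypothetical helper, not part of the original post.
double trapezoid(double a, double b, unsigned points, double (*f)(double))
{
    const double dx = (b - a) / (points - 1);
    double sum = (f(a) + f(b)) / 2.0;        // end points carry weight 1/2
    for (unsigned i = 1; i + 1 < points; ++i)
        sum += f(a + i * dx);                // interior samples 1 .. points-2
    return sum * dx;
}

// Doubles the number of samples until the relative change drops below eps.
double integrate_fixed(double a, double b, double eps, double (*f)(double))
{
    unsigned points = 4;
    double previous;
    double current = trapezoid(a, b, points, f);
    do {
        previous = current;
        points <<= 1;
        current = trapezoid(a, b, points, f);
    } while (std::fabs((current - previous) / current) >= eps);
    return current;
}
```

Computing each sample as a + i * dx instead of accumulating x += dx also avoids rounding drift over a long loop. With this version the same call, integrate_fixed(0.0, 1.0, 1e-9, fn), should settle after far fewer samples, so it is not a drop-in replacement if the goal is to reproduce the timings above.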
September 19, 2001
DMC has significantly more accurate floating point than other compilers do. This is particularly apparent in the floating point library, exp() included. It involves correctly handling things like NaNs and infinities, which requires some extra code to be executed. Many C compilers simply ignore those cases.

-Walter
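
For readers who want to see which cases are presumably meant, here is a small sketch that checks how a library exp() treats the IEEE 754 special values. The expected results are the ones C99 Annex F specifies; the function name is hypothetical, and the sketch says nothing about how DMC's library implements these cases internally.

```cpp
#include <cmath>
#include <limits>

// Returns true when the library exp() treats infinities and NaN the way
// C99 Annex F describes: exp(+inf) = +inf, exp(-inf) = 0, exp(NaN) = NaN.
bool exp_handles_special_values()
{
    const double inf = std::numeric_limits<double>::infinity();
    const double nan = std::numeric_limits<double>::quiet_NaN();
    return std::exp(inf) == inf
        && std::exp(-inf) == 0.0
        && std::isnan(std::exp(nan));
}
```

A library that skips these checks can be a little faster on ordinary arguments, which is the trade-off described above.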




September 21, 2001
I completely rewrote all the numerically intensive functions,
and I was amazed by the speed of DMC-generated code: it's the
best compiler on Win32!!  Borland's free compiler generates a
crashing EXE, while Cygwin and MinGW generated code with about
half the speed of DMC's code - unbelievable!!  It seems that the
"-ff" switch is very effective (it almost doubles execution speed
in this case).  What's more, after this rewrite, the X32
version is exactly as fast as the Win32 version (which is what I'd
expect; I must have done some stupid things in the first version).  Only
gcc-2.95.2 on Debian GNU/Linux beats DMC, but the difference
is not that large (about 9% faster code)...

Congratulations, Walter!!  DMC is really great, and the ability to handle Infinity and NaN without inline assembly is extremely useful for mathematical applications.


Laurentiu


September 21, 2001
Thanks! - but I have to ask, what is gcc-2.95.2 doing to the code that DMC is not? -Walter



September 21, 2001
Laurentiu Pancescu wrote:

> I completely rewrote all the numerically intensive functions, and I was amazed by the speed of DMC-generated code: it's the best compiler on Win32!!

No kidding!
Another winner!

Jan


September 21, 2001
"Walter" <walter@digitalmars.com> wrote:

>Thanks! - but I have to ask, what is gcc-2.95.2 doing to the code that DMC is not? -Walter
>

I don't know; the GCC-generated assembly code is too large for
me to follow... :(  But I did more tests (tweaking compiler options
only, I didn't touch the code), and managed to get the code compiled by
gcc-2.95.2 on GNU/Linux to run 22% faster than DMC's code.
Maybe I could get even more with pgcc (Pentium Compiler Group's
patch to gcc, see www.goof.com/pcg).

Actually, I think it depends heavily on the runtime libs: GNU/Linux has a very highly optimized math library (like most system code on GNU systems), which also handles Infinity, NaN and other oddities.  I also used gcc-2.95.2 in the DJGPP flavor, which has its own libm, and the code is just 50% slower than DMC's, not about 100% slower, as with MinGW and Cygwin.  Cygwin uses Cygnus' library, while MinGW uses Microsoft's MSVCRT, and it's a little slower than Cygwin at exp() and friends.

To get a fair comparison, one should probably use "pure" user code, without any lib calls, so that a weak compiler isn't given an advantage by a highly optimized library (MSVC generates much slower code than DMC or gcc, but the MSVC build of the first version of my app ran 139% faster than the DMC-compiled version and 52% faster than the MinGW one, probably due to a very good math library).


Regards,
  Laurentiu
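
One way to separate the two effects is to time the library call and a piece of "pure" user code through the same harness. The following is only a sketch under stated assumptions: time_calls, square and lib_exp are hypothetical names, and clock() is used instead of time() because it usually has finer resolution than one second.

```cpp
#include <cmath>
#include <ctime>

// Hypothetical harness: calls f on `count` different arguments in [0, 1),
// stores the elapsed CPU time in *seconds, and returns the sum of the
// results so the calls have an observable effect.
double time_calls(double (*f)(double), unsigned count, double *seconds)
{
    const std::clock_t start = std::clock();
    double sum = 0.0;
    for (unsigned i = 0; i < count; ++i)
        sum += f(static_cast<double>(i) / count);
    *seconds = static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
    return sum;
}

// "Pure" user code: no run-time library involved.
double square(double x) { return x * x; }

// Thin wrapper so the library exp() can be passed as a plain function pointer.
double lib_exp(double x) { return std::exp(x); }
```

Calling time_calls(square, n, &t1) and time_calls(lib_exp, n, &t2) with the same n gives a rough idea of how much of a run is spent inside the math library rather than in compiler-generated code. The call overhead through the function pointer is included in both numbers.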

September 22, 2001
If you have a billion dollars to spend on engineers, you can task them with coding the entire rtl in optimized assembly language!

You're right that you have to check if you're testing the rtl speed or the generated code speed. I was losing a benchmark to gcc once, and couldn't figure out why because in every case dmc generated better code. Turns out the time was all being sucked up in a strcpy() of a constant which gcc had inlined and essentially eliminated.

-Walter



September 29, 2001
"Walter" <walter@digitalmars.com> wrote:

>You're right that you have to check if you're testing the rtl speed or the generated code speed.

I implemented my own exp function using a Maclaurin series expansion, and summed 10 million values calculated this way (just to make sure no rtl gets in the way).  Here are the results (maximum optimization on all compilers):

- bcc32 does it in 92 seconds (I also noticed that bcc32 doesn't handle INFINITY properly, so I modified the test not to run into any Inf or NaN)
- DMC produces the correct result in 75 seconds
- GCC-2.95.3-6 (MinGW-special) gives the correct result in 22 seconds.

I had different arguments for my exp(), so that no smart
compiler optimizes something away.
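
The test code itself was not posted, so the following is only a hypothetical re-creation of the idea: an exp() built from the Maclaurin series, summed over many different arguments. The names my_exp and sum_of_exps, the argument range [0, 1) and the 200-term cap are all assumptions.

```cpp
// Hypothetical re-creation; the poster's actual function was not shown.
// Sums the Maclaurin series exp(x) = 1 + x + x^2/2! + x^3/3! + ...
// until a further term no longer changes the running sum
// (or 200 terms have been added).
double my_exp(double x)
{
    double term = 1.0;   // holds x^n / n!, starting with n = 0
    double sum = 1.0;
    for (unsigned n = 1; n < 200; ++n) {
        term *= x / n;
        const double next = sum + term;
        if (next == sum)
            break;
        sum = next;
    }
    return sum;
}

// Adds up my_exp over `count` different arguments in [0, 1), so each call
// sees a new input and the total depends on every call.
double sum_of_exps(unsigned count)
{
    double total = 0.0;
    for (unsigned i = 0; i < count; ++i)
        total += my_exp(static_cast<double>(i) / count);
    return total;
}
```

Something like sum_of_exps(10000000) would correspond to the 10 million evaluations mentioned above. The series is only practical for small |x|; for large negative arguments it loses accuracy to cancellation.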

However, I don't think my code has any relevance as a benchmark; it's too simple...  DMC seems to be by far the best commercial compiler for Win32, no matter which code I try (a 33% improvement over BCC 5.5.1 isn't something just any compiler can achieve; usually MSVC-generated code is noticeably slower than bcc32's).

Laurentiu