December 15, 2008
On Mon, Dec 15, 2008 at 2:13 PM, Walter Bright <newshound1@digitalmars.com> wrote:
> Jason House wrote:
>>
>> I have already hit long division related speed issues in my D code. Sometimes simple things can dominate a benchmark, but those same simple things can dominate user code too!
>
> I completely agree, and I'm in the process of fixing the long division. My point was it has nothing to do with the code generator, and that drawing conclusions from a benchmark result can be tricky.
>

That was fast! http://www.dsource.org/projects/phobos/changeset/884

--bb
December 15, 2008
Christian Kamm wrote:
>> Speaking of LDC, any chance that the exception handling on Win32 gets
>> fixed in the near future?  
> 
> No, unfortunately.
> 
> It's a problem with LLVM only supporting Dwarf2 exception handling. I'm
> pretty sure it'd work if we used ELF for the object files and GCC for
> linking, but Windows people tell me this is hardly acceptable.
> 
> We won't get 'real' exceptions working on Windows until someone adds SEH
> support to LLVM.
> 
> Volunteers?
> 
SEH support is in progress for GCC, so maybe that will help get it into LLVM.
December 15, 2008
dsimcha wrote:
> == Quote from Christian Kamm (kamm-incasoftware@removethis.de)'s article
>>> Speaking of LDC, any chance that the exception handling on Win32 gets
>>> fixed in the near future?
>> No, unfortunately.
>> It's a problem with LLVM only supporting Dwarf2 exception handling. I'm
>> pretty sure it'd work if we used ELF for the object files and GCC for
>> linking, but Windows people tell me this is hardly acceptable.
> 
> I think this solution is much better than nothing.  I assume it would at least
> work ok on standalone-type projects.
> 

Yeah, those were my thoughts too.

Also, maybe there are third-party object file converters that the "Windows people" could use as a workaround?

BR
Marcin Kuszczak
(aarti_pl)
December 16, 2008
Jarrett Billingsley wrote:
> On Sat, Dec 13, 2008 at 11:16 AM, Tomas Lindquist Olsen
> <tomas@famolsen.dk> wrote:
>> I tried this out with Tango + DMD 1.033, Tango + LDC r847 and GCC 4.3.2, my
>> timings are as follows, best of three:
>>
>> $ dmd bench.d -O -release -inline
>> long arith:  55630 ms
>> nested loop:  5090 ms
>>
>>
>> $ ldc bench.d -O3 -release -inline
>> long arith:  13870 ms
>> nested loop:   120 ms
>>
>>
>> $ gcc bench.c -O3 -s -fomit-frame-pointer
>> long arith: 13600 ms
>> nested loop:  170 ms
>>
>>
>> My cpu is: Athlon64 X2 3800+
>>
> 
> Go LDC!
> 
> I hope bearophile will eventually understand that DMD is not good at
> optimizing code, and so comparing its output to GCC's is ultimately
> meaningless.

I must have missed the memo. How is dmd not good at optimizing code? Without knowing many details about it, my understanding is that dmd performs common optimization reasonably well and that this particular problem has to do with the long division routine.
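To make the long-division point concrete, here is a minimal sketch in C of the kind of loop that stresses 64-bit division. The benchmark source is not shown in this thread, so the function name `long_arith` and the loop body are assumptions for illustration only. On a 32-bit x86 target a 64-bit divide is typically carried out by a runtime helper routine rather than by a single instruction, so the speed of that helper can dominate a loop like this regardless of how good the optimizer is.

```c
#include <stdint.h>

/* Hypothetical stand-in for a "long arith" workload. The thread's bench.d
   is not shown, so this loop body is an assumption, not the real benchmark. */
uint64_t long_arith(uint32_t iterations)
{
    uint64_t acc = 0;
    for (uint32_t i = 1; i <= iterations; ++i) {
        /* 2^40 - 1 plus i: deliberately wider than 32 bits. */
        uint64_t numerator = 0xFFFFFFFFFFull + i;
        acc += numerator / i;   /* 64-bit divide */
        acc += numerator % i;   /* 64-bit remainder */
    }
    return acc;
}
```

Timing a call such as `long_arith(100000000u)` once with each compiler would show whether the gap comes from the division routine or from the surrounding code.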

Andrei
December 16, 2008
On Tue, Dec 16, 2008 at 11:09 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Jarrett Billingsley wrote:
>>
>> On Sat, Dec 13, 2008 at 11:16 AM, Tomas Lindquist Olsen <tomas@famolsen.dk> wrote:
>>>
>>> I tried this out with Tango + DMD 1.033, Tango + LDC r847 and GCC 4.3.2,
>>> my
>>> timings are as follows, best of three:
>>>
>>> $ dmd bench.d -O -release -inline
>>> long arith:  55630 ms
>>> nested loop:  5090 ms
>>>
>>>
>>> $ ldc bench.d -O3 -release -inline
>>> long arith:  13870 ms
>>> nested loop:   120 ms
>>>
>>>
>>> $ gcc bench.c -O3 -s -fomit-frame-pointer
>>> long arith: 13600 ms
>>> nested loop:  170 ms
>>>
>>>
>>> My cpu is: Athlon64 X2 3800+
>>>
>>
>> Go LDC!
>>
>> I hope bearophile will eventually understand that DMD is not good at optimizing code, and so comparing its output to GCC's is ultimately meaningless.
>
> I must have missed the memo. How is dmd not good at optimizing code? Without knowing many details about it, my understanding is that dmd performs common optimization reasonably well and that this particular problem has to do with the long division routine.

It's pretty well proven that for floating point code, DMD tends to generate code about 50% slower than GCC.

--bb
December 16, 2008
On Tue, 16 Dec 2008 05:28:16 +0300, Bill Baxter <wbaxter@gmail.com> wrote:

> On Tue, Dec 16, 2008 at 11:09 AM, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>> Jarrett Billingsley wrote:
>>>
>>> On Sat, Dec 13, 2008 at 11:16 AM, Tomas Lindquist Olsen
>>> <tomas@famolsen.dk> wrote:
>>>>
>>>> I tried this out with Tango + DMD 1.033, Tango + LDC r847 and GCC 4.3.2,
>>>> my
>>>> timings are as follows, best of three:
>>>>
>>>> $ dmd bench.d -O -release -inline
>>>> long arith:  55630 ms
>>>> nested loop:  5090 ms
>>>>
>>>>
>>>> $ ldc bench.d -O3 -release -inline
>>>> long arith:  13870 ms
>>>> nested loop:   120 ms
>>>>
>>>>
>>>> $ gcc bench.c -O3 -s -fomit-frame-pointer
>>>> long arith: 13600 ms
>>>> nested loop:  170 ms
>>>>
>>>>
>>>> My cpu is: Athlon64 X2 3800+
>>>>
>>>
>>> Go LDC!
>>>
>>> I hope bearophile will eventually understand that DMD is not good at
>>> optimizing code, and so comparing its output to GCC's is ultimately
>>> meaningless.
>>
>> I must have missed the memo. How is dmd not good at optimizing code? Without
>> knowing many details about it, my understanding is that dmd performs common
>> optimization reasonably well and that this particular problem has to do with
>> the long division routine.
>
> It's pretty well proven that for floating point code, DMD tends to
> generate code about 50% slower than GCC.
>
> --bb

But other than that it is pretty good.
And man, it is so fast!
December 16, 2008
On Tue, Dec 16, 2008 at 12:00 PM, Denis Koroskin <2korden@gmail.com> wrote:
> On Tue, 16 Dec 2008 05:28:16 +0300, Bill Baxter <wbaxter@gmail.com> wrote:
>
>> On Tue, Dec 16, 2008 at 11:09 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>>>
>>> Jarrett Billingsley wrote:
>>>>
>>>> On Sat, Dec 13, 2008 at 11:16 AM, Tomas Lindquist Olsen <tomas@famolsen.dk> wrote:
>>>>>
>>>>> I tried this out with Tango + DMD 1.033, Tango + LDC r847 and GCC
>>>>> 4.3.2,
>>>>> my
>>>>> timings are as follows, best of three:
>>>>>
>>>>> $ dmd bench.d -O -release -inline
>>>>> long arith:  55630 ms
>>>>> nested loop:  5090 ms
>>>>>
>>>>>
>>>>> $ ldc bench.d -O3 -release -inline
>>>>> long arith:  13870 ms
>>>>> nested loop:   120 ms
>>>>>
>>>>>
>>>>> $ gcc bench.c -O3 -s -fomit-frame-pointer
>>>>> long arith: 13600 ms
>>>>> nested loop:  170 ms
>>>>>
>>>>>
>>>>> My cpu is: Athlon64 X2 3800+
>>>>>
>>>>
>>>> Go LDC!
>>>>
>>>> I hope bearophile will eventually understand that DMD is not good at optimizing code, and so comparing its output to GCC's is ultimately meaningless.
>>>
>>> I must have missed the memo. How is dmd not good at optimizing code?
>>> Without
>>> knowing many details about it, my understanding is that dmd performs
>>> common
>>> optimization reasonably well and that this particular problem has to do
>>> with
>>> the long division routine.
>>
>> It's pretty well proven that for floating point code, DMD tends to generate code about 50% slower than GCC.
>>
>> --bb
>
> But other than that it is pretty good.

Yep, it's more than 100x faster than a straightforward Python port of similar code, for instance.  (I did some benchmarking using a D port of the Laplace solver here http://www.scipy.org/PerformancePython  -- I think bearophile did these comparisons again himself more recently, too).  There I saw DMD about 50% slower than g++.  And I've seen figures in the neighborhood of 50% come up a few times since then in other float-intensive benchmarks, like the raytracer that someone ported from C++.

So it is certainly fast.  But one of the draws of D is precisely that it is fast.  If you're after code that runs as fast as possible, 50% slower than the competition is plenty of justification to go look elsewhere for your high-performance language.  A 50% hit may not really be relevant at the end of the day, but I know I used to avoid g++ like the plague because even its output isn't that fast compared to MSVC++ or Intel's compiler, even though the difference is maybe only 10% or so.  I was working on interactive fluid simulation, so I wanted every bit of speed I could get out of the processor.  With interactive stuff, a 10% difference really can matter, I think.
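For anyone who wants to put numbers like "50% slower" on the same footing, here is a small C sketch. It assumes "50% slower" means "takes 1.5x as long as the baseline", which is only one possible reading; the helper names `slowdown_percent` and `report` are made up for this example.

```c
#include <stdio.h>

/* Percentage by which time_ms exceeds baseline_ms.
   Assumption: "50% slower" means "takes 1.5x as long". */
double slowdown_percent(double time_ms, double baseline_ms)
{
    return (time_ms / baseline_ms - 1.0) * 100.0;
}

/* Print one comparison line; the label is free text. */
void report(const char *label, double time_ms, double baseline_ms)
{
    printf("%-24s %+8.1f%%\n", label,
           slowdown_percent(time_ms, baseline_ms));
}
```

Fed the timings quoted earlier in this thread, `report("dmd long arith vs gcc", 55630, 13600)` comes out at roughly +309%, while `report("ldc nested loop vs gcc", 120, 170)` is negative, meaning LDC was faster than GCC on that test.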

> And man, it is so fast!

You mean compile times?

--bb
December 16, 2008
On Tue, 16 Dec 2008 06:23:14 +0300, Bill Baxter <wbaxter@gmail.com> wrote:

> On Tue, Dec 16, 2008 at 12:00 PM, Denis Koroskin <2korden@gmail.com> wrote:
>> On Tue, 16 Dec 2008 05:28:16 +0300, Bill Baxter <wbaxter@gmail.com> wrote:
>>
>>> On Tue, Dec 16, 2008 at 11:09 AM, Andrei Alexandrescu
>>> <SeeWebsiteForEmail@erdani.org> wrote:
>>>>
>>>> Jarrett Billingsley wrote:
>>>>>
>>>>> On Sat, Dec 13, 2008 at 11:16 AM, Tomas Lindquist Olsen
>>>>> <tomas@famolsen.dk> wrote:
>>>>>>
>>>>>> I tried this out with Tango + DMD 1.033, Tango + LDC r847 and GCC
>>>>>> 4.3.2,
>>>>>> my
>>>>>> timings are as follows, best of three:
>>>>>>
>>>>>> $ dmd bench.d -O -release -inline
>>>>>> long arith:  55630 ms
>>>>>> nested loop:  5090 ms
>>>>>>
>>>>>>
>>>>>> $ ldc bench.d -O3 -release -inline
>>>>>> long arith:  13870 ms
>>>>>> nested loop:   120 ms
>>>>>>
>>>>>>
>>>>>> $ gcc bench.c -O3 -s -fomit-frame-pointer
>>>>>> long arith: 13600 ms
>>>>>> nested loop:  170 ms
>>>>>>
>>>>>>
>>>>>> My cpu is: Athlon64 X2 3800+
>>>>>>
>>>>>
>>>>> Go LDC!
>>>>>
>>>>> I hope bearophile will eventually understand that DMD is not good at
>>>>> optimizing code, and so comparing its output to GCC's is ultimately
>>>>> meaningless.
>>>>
>>>> I must have missed the memo. How is dmd not good at optimizing code?
>>>> Without
>>>> knowing many details about it, my understanding is that dmd performs
>>>> common
>>>> optimization reasonably well and that this particular problem has to do
>>>> with
>>>> the long division routine.
>>>
>>> It's pretty well proven that for floating point code, DMD tends to
>>> generate code about 50% slower than GCC.
>>>
>>> --bb
>>
>> But other than that it is pretty good.
>
> Yep, it's more than 100x faster than a straightforward Python port of
> similar code, for instance.  (I did some benchmarking using a D port
> of the Laplace solver here http://www.scipy.org/PerformancePython  --
> I think bearophile did these comparisons again himself more recently,
> too).  There I saw DMD about 50% slower than g++.  But I've seen
> figures in the neighborhood of 50% come up a few times since then in
> other float-intensive benchmarks, like the raytracer that someone
> ported from c++.
>
> So it is certainly fast.  But one of the draws of D is precisely that,
> that it is fast.  If you're after code that runs as fast as possible,
> 50% slower than the competition is plenty of justification to go look
> elsewhere for your high-performance language.  A 50% hit may not
> really be relevant at the end of the day, but I know I used to avoid
> g++ like the plague because even its output isn't that fast compared
> to MSVC++ or Intel's compiler, even though the difference is maybe
> only 10% or so.  I was working on interactive fluid simulation, so I
> wanted every bit of speed I could get out of the processor.  With
> interactive stuff, a 10% difference really can matter, I think.
>
>> And man, it is so fast!
>
> You mean compile times?
>
> --bb

Yeah.
December 16, 2008
Aarti_pl wrote:
> dsimcha wrote:
>> == Quote from Christian Kamm (kamm-incasoftware@removethis.de)'s article
>>>> Speaking of LDC, any chance that the exception handling on Win32 gets
>>>> fixed in the near future?
>>> No, unfortunately.
>>> It's a problem with LLVM only supporting Dwarf2 exception handling. I'm
>>> pretty sure it'd work if we used ELF for the object files and GCC for
>>> linking, but Windows people tell me this is hardly acceptable.
>>
>> I think this solution is much better than nothing.  I assume it would at least
>> work ok on standalone-type projects.
>>
> 
> Yeah... Also my thoughts...
> 
> Also, maybe there are third-party object file converters that the "Windows people" could use as a workaround?
> 
> BR
> Marcin Kuszczak
> (aarti_pl)

I found such a converter (GPL licensed):
http://agner.org/optimize/#objconv

Can anyone comment on whether such a workaround would solve the initial problem (at least temporarily)?

If the answer is yes, can we expect exception handling to be added to LDC on Windows? :-)

BR
Marcin Kuszczak
(aarti_pl)
December 16, 2008
Bill Baxter wrote:
> Anyway, all that said,  it's not clear that we really do have that
> mythical "uber backend" available right now.
> 
> According to my conversations on the clang mailing list, the current
> target is for LLVM to be able to fully support a C++ compiler by 2010.
>  I'm not quite sure what all that involves, but apparently it includes
> things like making exceptions work on Windows.

I wonder if there's any chance of getting an LLVM D compiler working before the LLVM C++ compiler works? <g>