August 21, 2015
On 8/20/2015 6:06 PM, H. S. Teoh via Digitalmars-d wrote:
> https://issues.dlang.org/show_bug.cgi?id=14943

Thanks!

August 21, 2015
On Friday, 21 August 2015 at 01:20:27 UTC, jmh530 wrote:
> On Friday, 21 August 2015 at 00:00:09 UTC, H. S. Teoh wrote:
>>
>> The gdc version, by contrast, inlines *everything*,
>
> This could be why I've observed performance differentials in dmd when doing some manual for loops rather than using the stuff in std.algorithm.

ldc and gdc typically produce output for ranges that is nearly identical to handwritten loops.
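
For example (a contrived micro-benchmark of my own, not code from the thread), the range pipeline below only collapses into the handwritten loop if the compiler inlines the lambdas and range adapters all the way through, which ldc and gdc at -O2/-O3 generally do and dmd often doesn't:

import std.algorithm : filter, map, sum;
import std.range : iota;

// Range-based version: every adapter and lambda has to be inlined
// for this to compile down to the loop below.
long sumOfEvenSquaresRanges(long n)
{
    return iota(0, n)
        .filter!(x => x % 2 == 0)
        .map!(x => x * x)
        .sum;
}

// Handwritten loop doing the same work.
long sumOfEvenSquaresLoop(long n)
{
    long total = 0;
    foreach (x; 0 .. n)
        if (x % 2 == 0)
            total += x * x;
    return total;
}

unittest
{
    assert(sumOfEvenSquaresRanges(10) == sumOfEvenSquaresLoop(10));
}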
August 21, 2015
On Friday, 21 August 2015 at 01:29:12 UTC, H. S. Teoh wrote:
> Have you tried using gdc -O3 (or ldc) to see if there's a big
> difference?

How do -Os and -march=native change the picture?
August 21, 2015
On 21 August 2015 at 10:49, Kagamin via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> On Friday, 21 August 2015 at 01:29:12 UTC, H. S. Teoh wrote:
>
>> Have you tried using gdc -O3 (or ldc) to see if there's a big
>> difference?
>>
>
> How do -Os and -march=native change the picture?
>

There's a paper somewhere about optimisations on Intel processors that says that -O2 produces overall better results than -O3 (I'll have to dig it out).

In any case, -Ofast may give you better benchmark results because it permits cutting corners on IEEE and other standards.

Also, -march=native is cheating, as it will give you the least portable binary possible.  ;-)
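
As a contrived illustration (not real benchmark code) of the kind of corner those modes are allowed to cut: floating-point addition is not associative, so once the optimiser is free to reassociate a summation, results can legitimately change.

import std.stdio : writeln;

void main()
{
    // Floating-point addition is not associative; an optimiser that is
    // allowed to reassociate (as -Ofast/fast-math modes are) may turn one
    // of these expressions into the other and change the result.
    double a = 1e16, b = -1e16, c = 1.0;
    writeln((a + b) + c);  // prints 1
    writeln(a + (b + c));  // prints 0
}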


August 21, 2015
On Friday, 21 August 2015 at 09:17:28 UTC, Iain Buclaw wrote:
> There's a paper somewhere about optimisations on Intel processors that says that -O2 produces overall better results than -O3 (I'll have to dig it out).

That being said, I recently compared the performance of the datetime library using different algorithms. One function of interest computes the year from a raw time: D1 had a loop-based implementation that iterated over years until it matched the source raw time, while current Phobos has a loop-free implementation that carefully reduces the time to a year. I wrote two tests that iterate over days and call the date-time conversion functions. The test that invoked yearFromDays directly showed the loop-free implementation to be faster, but the bigger test that performed the full conversion between date and time showed the loop-based version to be faster by 5%. Quite unintuitive. Could it be due to cache problems? The function with the loop is smaller, but the whole executable is only 15 KB, so it should fit in the processor cache entirely.
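
Roughly, the two approaches look like this (a simplified sketch of my own, not the actual D1 or Phobos code, with the epoch handling glossed over):

bool isLeapYear(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Loop-based (roughly the D1 approach): walk forward a year at a time
// until the remaining day count fits inside one year.
int yearFromDaysLoop(int days)
{
    int year = 1;
    while (true)
    {
        immutable daysInYear = isLeapYear(year) ? 366 : 365;
        if (days < daysInYear)
            return year;
        days -= daysInYear;
        ++year;
    }
}

// Loop-free (roughly the current Phobos approach): reduce the day count
// directly using the 400/100/4-year cycle lengths.
int yearFromDaysDirect(int days)
{
    enum daysPer400Years = 400 * 365 + 97;
    enum daysPer100Years = 100 * 365 + 24;
    enum daysPer4Years   = 4 * 365 + 1;

    int year = 1 + 400 * (days / daysPer400Years);
    days %= daysPer400Years;

    int centuries = days / daysPer100Years;
    if (centuries == 4) centuries = 3;   // last day of a 400-year cycle
    year += 100 * centuries;
    days -= centuries * daysPer100Years;

    year += 4 * (days / daysPer4Years);
    days %= daysPer4Years;

    int years = days / 365;
    if (years == 4) years = 3;           // last day of a 4-year cycle
    return year + years;
}

unittest
{
    foreach (d; 0 .. 146_097)            // one full 400-year cycle
        assert(yearFromDaysLoop(d) == yearFromDaysDirect(d));
}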
August 21, 2015
On Wednesday, 19 August 2015 at 17:25:13 UTC, deadalnix wrote:
> Apple is invested in LLVM. As for the other thing you mention, WebAssembly is an AST representation, which is both dumb and does not look anything like LLVM IR.

For the time being, asm.js is a starting point. Nobody knows what WebAssembly will look like, but if emscripten is anything to go by, it will most likely pay off to be in the LLVM ecosystem.

>> Replicating a scalar SSA like LLVM does not make a lot of sense. What would make a lot of sense would be to start work

> WAT ?

When simplifying over scalars you make a trade-off. With a simplifier that is optimized for keeping everything in vector units, you can get better results for some code sections.

>> Or a high-level, compile-time oriented IR for D that can boost template semantics and compilation speed.
>>
>
> That's impossible in the current state of templates (I know, I've been there and dropped it as the return on investment was too low).

What is it about D template mechanics that makes JITing difficult?

August 21, 2015
On Friday, 21 August 2015 at 02:02:57 UTC, rsw0x wrote:
> On Friday, 21 August 2015 at 01:20:27 UTC, jmh530 wrote:
>> On Friday, 21 August 2015 at 00:00:09 UTC, H. S. Teoh wrote:
>>>
>>> The gdc version, by contrast, inlines *everything*,
>>
>> This could be why I've observed performance differentials in dmd when doing some manual for loops rather than using the stuff in std.algorithm.
>
> ldc and gdc typically produce output for ranges that is nearly identical to handwritten loops.

Which is really what we need to be happening with ranges. The fact that they make code so much more idiomatic helps a _lot_, making code faster to write and easier to understand and maintain, but if we're taking performance hits from it, then we start losing out to C++ code pretty quickly, which is _not_ what we want.

- Jonathan M Davis
August 21, 2015
On Friday, 21 August 2015 at 01:29:12 UTC, H. S. Teoh wrote:
> On Fri, Aug 21, 2015 at 01:20:25AM +0000, jmh530 via Digitalmars-d wrote:
>> On Friday, 21 August 2015 at 00:00:09 UTC, H. S. Teoh wrote:
>>>
>>> The gdc version, by contrast, inlines *everything*,
>> 
>> This could be why I've observed performance differentials in dmd when doing some manual for loops rather than using the stuff in std.algorithm.
>
> Very likely, I'd say. IME dmd tends to give up inlining rather easily. This is very much something that needs to improve, since ranges in D are supposed to be a big selling point. Wouldn't want them to perform poorly compared to hand-written loops.

Yeah, ranges should ideally be a zero-cost abstraction, at least in trivial cases.
August 21, 2015
On Fri, Aug 21, 2015 at 03:09:42PM +0000, Ivan Kazmenko via Digitalmars-d wrote:
> On Friday, 21 August 2015 at 01:29:12 UTC, H. S. Teoh wrote:
> >On Fri, Aug 21, 2015 at 01:20:25AM +0000, jmh530 via Digitalmars-d wrote:
> >>On Friday, 21 August 2015 at 00:00:09 UTC, H. S. Teoh wrote:
> >>>
> >>>The gdc version, by contrast, inlines *everything*,
> >>
> >>This could be why I've observed performance differentials in dmd when doing some manual for loops rather than using the stuff in std.algorithm.
> >
> >Very likely, I'd say. IME dmd tends to give up inlining rather easily.  This is very much something that needs to improve, since ranges in D are supposed to be a big selling point. Wouldn't want them to perform poorly compared to hand-written loops.
> 
> Yeah, ranges should ideally be a zero-cost abstraction, at least in trivial cases.

Definitely. Fortunately, gdc (and probably ldc) seems quite capable of achieving this. It's just dmd that needs some improvement in this area. This will quickly become a major issue once we switch to ddmd and start making use of range-based code in the compiler, esp. since compiler performance has been one of the selling points of D.


T

-- 
Democracy: The triumph of popularity over principle. -- C.Bond
August 21, 2015
On Friday, 21 August 2015 at 10:11:52 UTC, Ola Fosheim Grøstad wrote:
>>> Replicating a scalar SSA like LLVM does not make a lot of sense. What would make a lot of sense would be to start work
>
>> WAT ?
>
> When simplifying over scalars you make a trade-off. With a simplifier that is optimized for keeping everything in vector units, you can get better results for some code sections.
>

That still does not make any sense.

>>> Or a high-level, compile-time oriented IR for D that can boost template semantics and compilation speed.
>>>
>>
>> That's impossible in the current state of templates (I know, I've been there and dropped it as the return on investment was too low).
>
> What is it about D template mechanics that makes JITing difficult?

"Or a high level compile-time oriented IR for D that can boost templates semantics and compilation speed."

"What is it about D template mechanics that make JITing difficult?"

You are not even trying to make sense, are you?