Speed kills

Every time there is a D thread on reddit, it feels like new users are expecting mind-blowing speed from D. https://www.reddit.com/r/programming/comments/45v03g/porterstemmerd_an_implementation_of_the_porter/ is the most recent one, where John Colvin provided some pointers to speed it up significantly. Walter has done some good work picking the low-hanging fruit to speed up DMD code, and there is a lot of effort going into reference-counting machinery, but I wondered whether some of the common mistakes people make that slow down D code could be addressed. Literals used to be a hidden speed bump, but I think that was improved; now the append operator is one of the most common culprits. Can it not be enhanced behind the scenes to work more like std.array.Appender? Do others notice common pitfalls between the article code and what the D community then suggests, where we could bridge the gap so naive users get faster code?
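A minimal sketch of the append difference, assuming the usual advice of reaching for std.array.Appender when appending many elements in a loop. The function names `buildWithConcat` and `buildWithAppender` are hypothetical. Both build the same string; the second keeps its own capacity bookkeeping and reserves the buffer up front instead of going through the runtime on every `~=`.

```d
import std.array : appender;

// Appends one char at a time with the built-in ~= operator. Each append
// goes through the druntime array-append path, which has to look up the
// remaining capacity of the underlying GC block.
string buildWithConcat(size_t n)
{
    string s;
    foreach (i; 0 .. n)
        s ~= 'x';
    return s;
}

// Same result with std.array.Appender, which tracks capacity itself and
// can reserve the whole buffer before the loop.
string buildWithAppender(size_t n)
{
    auto app = appender!string();
    app.reserve(n);
    foreach (i; 0 .. n)
        app.put('x');
    return app.data;
}
```

How much this matters depends on the compiler and the allocation pattern, so it is worth measuring before rewriting existing code.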

On Monday, 15 February 2016 at 13:51:38 UTC, ixid wrote:
> This is the most recent one where John Colvin provided some pointers to speed it up significantly. Walter has done some good work taking the low-hanging fruit to speed up DMD code and there is a lot of effort going on with reference counting machinery but I wondered if some of the common errors people make that slow down D code can be addressed?

Something that annoyed me a bit is floating-point comparisons: DMD does not seem to be able to handle them from SSE registers; it converts to the x87 FPU and does the comparison there, IIRC.

On Monday, 15 February 2016 at 14:16:02 UTC, Guillaume Piolat wrote:
> Something that annoyed me a bit is floating-point comparisons, DMD does not seem to be able to handle them from SSE registers, it will convert to FPU and do the comparison there IIRC.

I feel like this point comes up often, and that a lot of people have argued x87 FP should just not happen anymore.

-Wyatt

On Monday, 15 February 2016 at 13:51:38 UTC, ixid wrote:
> Every time there is a D thread on reddit it feels like the new user is expecting mind-blowing speed from D.
>
> [...]

If you want better codegen, don't use DMD. Use LDC; it's usually only a version or so behind DMD.

February 15, 2016

Re: Speed kills

Posted by Basile B. in reply to Guillaume Piolat

On Monday, 15 February 2016 at 14:16:02 UTC, Guillaume Piolat wrote:
> On Monday, 15 February 2016 at 13:51:38 UTC, ixid wrote:
>> This is the most recent one where John Colvin provided some pointers to speed it up significantly. Walter has done some good work taking the low-hanging fruit to speed up DMD code and there is a lot of effort going on with reference counting machinery but I wondered if some of the common errors people make that slow down D code can be addressed?
>
> Something that annoyed me a bit is floating-point comparisons, DMD does not seem to be able to handle them from SSE registers, it will convert to FPU and do the comparison there IIRC.

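Same for std.math.lround: it uses the x87 FPU path, while for float and double it's only one SSE instruction. Typically, with six functions similar to this one:

int round(float value)
{
    // x86-64 only: the float argument arrives in XMM0 and the int result
    // is returned in EAX. cvtss2si rounds using the current MXCSR mode
    // (round-to-nearest-even by default); cvttss2si would truncate instead.
    asm
    {
        naked;
        cvtss2si EAX, XMM0;
        ret;
    }
}

we could get ceil/trunc/round/floor, and almost as easily fmod and hypot.
Classic, but I don't get why they're not in std.math.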

Goddamnit, we're in 2016.

On Monday, 15 February 2016 at 22:29:00 UTC, Basile B. wrote:
> we could get ceil/trunc/round/floor, also almost as easily fmod, hypoth.
> classic but I dont get why thery're not in std.math.

Seems like you know a lot about the subject, and I know you've contributed to Phobos before, so how about making a PR for this? :)

On Monday, 15 February 2016 at 22:29:00 UTC, Basile B. wrote:
> Same for std.math.lround
>
> they use the FP way while for float and double it's only one sse instruction. Typically with 6 functions similar to this one:
>
> [...]
>
> Goddamnit, we're in 2016.

lround and friends have been a big performance problem at times. Whenever you can use cast(int) instead, it's way faster.

On Monday, 15 February 2016 at 23:19:44 UTC, Guillaume Piolat wrote:
> lround and friends have been a big performance problem at times.
> Everytime you can use cast(int) instead, it's way faster.

I didn't know this trick. It generates almost the same SSE instruction (it truncates) and has the advantage of being inlinable.

Is it documented somewhere? If not, it should be.
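The cast(int) trick is not a drop-in replacement for lround, because the two round differently. A small sketch of the semantics, with hypothetical function names: cast(int) truncates toward zero, while std.math.lround rounds half-way cases away from zero and returns a long.

```d
import std.math : lround;

// cast(int) truncates toward zero; on x86-64 compilers typically emit a
// single cvttsd2si for it, and the call can be inlined.
int truncToInt(double x)
{
    return cast(int) x;
}

// std.math.lround rounds to the nearest integer, with half-way cases
// rounded away from zero, and returns a long.
long nearestAwayFromZero(double x)
{
    return lround(x);
}
```

So cast(int) is the cheap choice when truncation is the behavior you actually want; for values like 2.7 or -2.7 it gives a different answer than lround.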

On Monday, 15 February 2016 at 23:35:54 UTC, Basile B. wrote:
> On Monday, 15 February 2016 at 23:19:44 UTC, Guillaume Piolat wrote:
>> lround and friends have been a big performance problem at times.
>> Everytime you can use cast(int) instead, it's way faster.
>
> I didn't know this trick. It generates almost the same sse intruction (it truncates) and has the advantage to be inline-able.
>
> Is it documented somewhere ? If not it should.

In SSE3 you also get an instruction that does this without messing with the x87 control word: FISTTP.

On Monday, 15 February 2016 at 23:13:13 UTC, Jack Stouffer wrote:
> On Monday, 15 February 2016 at 22:29:00 UTC, Basile B. wrote:
>> we could get ceil/trunc/round/floor, also almost as easily fmod, hypoth.
>> classic but I dont get why thery're not in std.math.
>
> Seems like you know a lot about the subject, and I know you contributed to phobos before, so how about making a PR for this :)

In the meantime: https://github.com/BBasile/iz/blob/master/import/iz/math.d

Actually, when I joined this conversation I didn't remember that it's not good on 32-bit x86. Using SSE rounding is really only good on AMD64; otherwise loading the input parameter "sucks" a lot (even for a 32-bit float, since it's not directly in EAX or XMM0).

Anyway, it's not good for Phobos. Why? When looking for documentation last night, I landed on a post by Walter explaining that the library for a systems programming language shouldn't be specific to an architecture.
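One way to reconcile Walter's point with the speed concern is to keep the architecture-specific path behind a `version` block and fall back to portable Phobos code elsewhere. A minimal sketch, assuming a compiler that accepts DMD-style inline assembly when `D_InlineAsm_X86_64` is defined (DMD and LDC do); the name `roundToInt` is hypothetical. Under the default rounding mode both branches round to nearest with ties to even, so they should agree.

```d
import std.math : lrint;

// Rounds to the nearest int using the current rounding mode
// (round-to-nearest-even by default).
int roundToInt(float value)
{
    version (D_InlineAsm_X86_64)
    {
        // x86-64 fast path: one SSE conversion instruction.
        // The parameter is referenced by name, so this does not rely on
        // which register the calling convention uses.
        int result;
        asm
        {
            movss XMM0, value;
            cvtss2si EAX, XMM0;
            mov result, EAX;
        }
        return result;
    }
    else
    {
        // Portable fallback for other targets or compilers.
        return cast(int) lrint(value);
    }
}
```

Note that DMD generally does not inline functions containing asm blocks, so the call overhead remains even on the fast path.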
