January 30, 2014 Looking for more library optimization patterns
Hi all!

LDC includes the SimplifyDRuntimeCalls pass, which should replace D library calls with more efficient ones. This is a clone of a former LLVM pass which did the same for C runtime calls.

An example is that printf("foo\n") is replaced by puts("foo").

Do you know of similar patterns in D which are worth implementing? E.g. replacing std.stdio.writefln("foo") with std.stdio.writeln("foo").

Regards,
Kai
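For illustration, the two rewrites mentioned above can be written out at the source level like this (the pass itself works on LLVM IR, so this is only a sketch of the pattern, not of how the pass is implemented):

    import core.stdc.stdio : printf, puts;
    import std.stdio : writefln, writeln;

    void main()
    {
        // C-runtime pattern: a constant format string with no '%' and a
        // trailing '\n' can be replaced by a puts call.
        printf("foo\n");
        puts("foo");      // same output, cheaper

        // The analogous D-level pattern asked about here:
        writefln("foo");  // goes through format-string handling
        writeln("foo");   // same output, no format-string handling
    }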

January 30, 2014 Re: Looking for more library optimization patterns
Posted in reply to Kai Nacke

Kai Nacke:
> LDC includes the SimplifyDRuntimeCalls pass, which should replace D library calls with more efficient ones. This is a clone of a former LLVM pass which did the same for C runtime calls.
>
> An example is that printf("foo\n") is replaced by puts("foo").
>
> Do you know of similar patterns in D which are worth implementing?
> E.g. replacing std.stdio.writefln("foo") with std.stdio.writeln("foo").

A de-optimization I'd really like in LDC (and dmd) is to not call the run-time functions when you perform a vector operation on arrays statically known to be very short:

void main() {
    int[3] a, b;
    a[] += b[];
}

And just rewrite that with a for loop (as done for cases where the run-time function is not available), and let ldc2 compile it on its own. See a thread in D.learn:
http://forum.dlang.org/thread/lqmqsnucadaqlkxkoffc@forum.dlang.org

I'd like to replace code like:

size_t i = 0;
foreach (immutable j, const ref bj; bodies) {
    foreach (const ref bk; bodies[j + 1 .. $]) {
        foreach (immutable m; TypeTuple!(0, 1, 2))
            r[i][m] = bj.x[m] - bk.x[m];
        i++;
    }
}

With:

size_t i = 0;
foreach (immutable j, const ref bj; bodies)
    foreach (const ref bk; bodies[j + 1 .. $])
        r[i++][] = bj.x[] - bk.x[];

And keep the same performance, instead of seeing a significant slowdown.

Bye,
bearophile
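For reference, a hand-written sketch of the lowering asked for above, assuming the length 3 is known at compile time (the idea is that the compiler would emit an equivalent loop instead of calling the druntime array-op helper):

    void main()
    {
        int[3] a, b;

        // What `a[] += b[]` could be lowered to when the length is
        // statically known, instead of a call into the D runtime:
        foreach (i; 0 .. a.length)
            a[i] += b[i];
    }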

January 31, 2014 Re: Looking for more library optimization patterns
Posted in reply to bearophile

Being somewhat involved in the aforementioned thread, I second that. There seems to be no logical reason to insert a call to dSliceOpAssignOrWhateverItIsCalled when all the data is actually there. At least in -release -O3 builds :)

Granted, this may be tricky for arbitrarily-typed arrays, but for builtins with statically known array sizes? That's a definitive win.

February 10, 2014 Re: Looking for more library optimization patterns
Posted in reply to Stanislav Blinov

Thanks! Those are good hints!

I am not sure if I can implement them soon, but I have put them on my list for the pass.

If you have more hints like these, please post!

Regards,
Kai

February 11, 2014 Re: Looking for more library optimization patterns
Posted in reply to Kai Nacke

On Monday, 10 February 2014 at 19:51:33 UTC, Kai Nacke wrote:
> Thanks! Those are good hints!
> I am not sure if I can implement them soon, but I have put them on my list for the pass.
>
> If you have more hints like these, please post!
>
> Regards,
> Kai

I don't know if it is already the case, but what about using appender instead of concatenation (for example, for concatenations inside loops)?

Andrea
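A hand-written sketch of the kind of rewrite suggested here, using std.array.appender (the two function names are made up for the example; whether a compiler pass could prove such a transformation safe in general is a separate question):

    import std.array : appender;

    // Illustrative helpers, not library code.
    string joinByConcat(string[] parts)
    {
        string result;
        foreach (p; parts)
            result ~= p;   // each ~= may reallocate and copy
        return result;
    }

    string joinByAppender(string[] parts)
    {
        auto app = appender!string();
        foreach (p; parts)
            app.put(p);    // amortized growth, fewer reallocations
        return app.data;
    }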

February 11, 2014 Re: Looking for more library optimization patterns
Posted in reply to Kai Nacke

Kai Nacke:
> If you have more hints like these, please post!

There are plenty of them to find. But I think you need a strategy like this one to find them:
http://en.wikipedia.org/wiki/Brainstorming
D conferences should be good for this purpose.

Bye,
bearophile

February 15, 2014 Re: Looking for more library optimization patterns
Posted in reply to bearophile

On Thursday, 30 January 2014 at 18:36:09 UTC, bearophile wrote:
> A de-optimization I'd really like in LDC (and dmd) is to not call the run-time functions when you perform a vector operation on arrays statically known to be very short:
>
> void main() {
>     int[3] a, b;
>     a[] += b[];
> }
>
> And just rewrite that with a for loop (as done for cases where the run-time function is not available), and let ldc2 compile it on its own.

For short loops, an unrolled version like

a[0] += b[0];
a[1] += b[1];
a[2] += b[2];

may well be faster than a simple loop such as the following one:

foreach (immutable i; 0..3) {
    a[i] += b[i];
}

At least on x86/64. Will that optimization happen too, at a later stage?

Ivan Kazmenko.

February 15, 2014 Re: Looking for more library optimization patterns
Posted in reply to Ivan Kazmenko

Ivan Kazmenko:
> For short loops, an unrolled version like
>
> a[0] += b[0];
> a[1] += b[1];
> a[2] += b[2];
>
> may well be faster than a simple loop such as the following one:
>
> foreach (immutable i; 0..3) {
>     a[i] += b[i];
> }
>
> At least on x86/64.

Yes, but ldc is plenty able to unroll small loops with length known at compile time.

Bye,
bearophile

April 18, 2014 Re: Looking for more library optimization patterns
Posted in reply to bearophile

What about SSE? Can

float[16] a, b;
a[] += b[];

be optimized by using only 4 addps?
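Not an answer about what the compilers emit today, but the intent can be spelled out by hand with core.simd, assuming an x86-64 target where float4 is available; sixteen floats become four vector additions, each of which should map to a single addps:

    import core.simd;

    void main()
    {
        // Four float4 lanes hold the same data as a float[16];
        // each += below is one 4-wide SSE addition.
        float4[4] a, b;
        foreach (i; 0 .. 4)
            a[i] += b[i];
    }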

July 30, 2014 Re: Looking for more library optimization patterns
Posted in reply to bearophile

On Thursday, 30 January 2014 at 18:36:09 UTC, bearophile wrote:
> A de-optimization I'd really like in LDC (and dmd) is to not call the run-time functions when you perform a vector operation on arrays statically known to be very short:

I just learned about this issue while looking at http://rosettacode.org/wiki/Hamming_numbers#D (Alternative version 2, opEquals). It makes me sad to learn that slice / array operations get turned into monstrosities by the compiler. There are some huge wins to be had in this area.