May 28, 2015 drastic slowdown for copies
I'm currently investigating the difference in speed between passing by reference and passing by copy. It seems that copies suffer an immense slowdown once they reach a size of 20 bytes or more. In the code below you can see that if my struct is smaller than 20 bytes (e.g. 4 ints = 16 bytes), a copy is cheaper than a reference. But with 5 ints (= 20 bytes), the copy becomes about 3 times slower.

I got these results:

16 bytes:
by ref: 49
by copy: 34
by move: 32

20 bytes:
by ref: 51
by copy: 104
by move: 103

My question is: why?

My system is Win 8.1, 64 bit, and I'm using dmd 2.067.1 (32 bit).

Code:

import std.stdio;
import std.datetime;

struct S
{
    int[4] values; // 4 ints = 16 bytes; int[5] gives the 20-byte case
}

pragma(msg, S.sizeof);

void by_ref(ref const S s) { }
void by_copy(const S s) { }

enum size_t Loops = 10_000_000;

void main()
{
    StopWatch sw;

    // lvalue passed by reference
    sw.start();
    for (size_t i = 0; i < Loops; i++) {
        S s = S();
        by_ref(s);
    }
    sw.stop();
    writeln("by ref: ", sw.peek().msecs);
    sw.reset();

    // lvalue passed by value (copied)
    sw.start();
    for (size_t i = 0; i < Loops; i++) {
        S s = S();
        by_copy(s);
    }
    sw.stop();
    writeln("by copy: ", sw.peek().msecs);
    sw.reset();

    // rvalue (temporary) passed by value, which can be moved
    sw.start();
    for (size_t i = 0; i < Loops; i++) {
        by_copy(S());
    }
    sw.stop();
    writeln("by move: ", sw.peek().msecs);
}
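For anyone rerunning this with a current compiler: the StopWatch from std.datetime used above dates from the 2.067 era, and current Phobos provides it in std.datetime.stopwatch instead. Below is a sketch of the same three loops against that module, templated on the number of ints so the 16-byte and 20-byte cases run in one program. The names Payload, byRef, byCopy and measure are illustrative, not from the original post. Since the called functions are empty, an optimizing compiler is free to drop the calls altogether, so treat the timings as indicative only.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

// Hypothetical payload type: N ints, so Payload!(4) is 16 bytes and Payload!(5) is 20 bytes.
struct Payload(size_t N)
{
    int[N] values;
}

void byRef(T)(ref const T s) { }
void byCopy(T)(const T s) { }

enum size_t Loops = 10_000_000;

// Times the three loops from the original post for Payload!(N).
// Returns elapsed milliseconds as [by ref, by copy, by move].
long[3] measure(size_t N)()
{
    alias S = Payload!N;
    long[3] ms;
    auto sw = StopWatch(AutoStart.yes);

    foreach (i; 0 .. Loops)
    {
        S s = S();
        byRef(s);      // lvalue bound by reference
    }
    ms[0] = sw.peek().total!"msecs";

    sw.reset();        // back to zero; the watch keeps running
    foreach (i; 0 .. Loops)
    {
        S s = S();
        byCopy(s);     // lvalue copied into the parameter
    }
    ms[1] = sw.peek().total!"msecs";

    sw.reset();
    foreach (i; 0 .. Loops)
        byCopy(S());   // rvalue passed by value
    ms[2] = sw.peek().total!"msecs";

    return ms;
}

void main()
{
    writeln(Payload!(4).sizeof, " bytes: ", measure!4());
    writeln(Payload!(5).sizeof, " bytes: ", measure!5());
}
```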
May 28, 2015 Re: drastic slowdown for copies
Posted in reply to Momo

16 bytes is 64 bit - the same size as a reference. So copying it is overall a bit less work: sending a 64 bit struct is as small as sending a 64 bit reference, and you don't have to go through the pointer.

So up to that size, it is a bit faster.

Add another byte and now the copy is too big to fit in a register, so it needs to spill over somewhere else, which means a bunch more work for the CPU.
May 28, 2015 Re: drastic slowdown for copies
Posted in reply to Adam D. Ruppe

On 05/28/2015 11:27 PM, Adam D. Ruppe wrote:
> 16 bytes is 64 bit
It's actually 128 bits.
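Spelling out the size arithmetic for the structs in this thread: a compile-time check, assuming the int[4] and int[5] payloads from the first post (the names S4, S5 and bitsOf are illustrative). The pointer size depends on the target: 32 bits with the 32-bit dmd from the first post, 64 bits on x86-64.

```d
import std.stdio : writeln;

struct S4 { int[4] values; } // the 16-byte struct from the first post
struct S5 { int[5] values; } // the 20-byte variant

// Size of T in bits.
enum size_t bitsOf(T) = T.sizeof * 8;

static assert(bitsOf!S4 == 128); // 16 bytes = 128 bits
static assert(bitsOf!S5 == 160); // 20 bytes = 160 bits

void main()
{
    // A ref parameter is passed as a pointer under the hood:
    // 32 bits for a 32-bit target, 64 bits for x86-64.
    writeln("S4:      ", bitsOf!S4, " bits");
    writeln("S5:      ", bitsOf!S5, " bits");
    writeln("pointer: ", bitsOf!(void*), " bits");
}
```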
May 28, 2015 Re: drastic slowdown for copies
Posted in reply to Adam D. Ruppe

On Thursday, 28 May 2015 at 21:27:42 UTC, Adam D. Ruppe wrote:
> 16 bytes is 64 bit - the same size as a reference. So copying it is overall a bit less work - sending a 64 bit struct is as small as a 64 bit reference and you don't go through the pointer.
>
> So up to that size, it is a bit faster.
>
>
> Add another byte and now the copy is too big to fit in a register, so it needs to spill over into somewhere else which means a bunch more work for the cpu.
But even in release mode (and with optimizations turned on), it is more than 3 times slower. Can I somehow enforce passing by reference, like in C++? I already tried in ref, const ref and immutable ref; nothing works.
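A ref (or const ref) parameter in D already passes the argument by reference in the C++ sense, with no copy made. A small sketch that confirms this by comparing the address the callee sees with the caller's object; the function names are illustrative.

```d
import std.stdio : writeln;

struct S { int[5] values; } // the 20-byte case from the thread

// Each function reports whether its parameter is the caller's object,
// by comparing the parameter's address with the address passed in.
bool sameObjectByRef(ref const S s, const(S)* original)
{
    return &s == original;
}

bool sameObjectByCopy(const S s, const(S)* original)
{
    return &s == original;
}

void main()
{
    S s;
    writeln("ref:  ", sameObjectByRef(s, &s));  // true: no copy was made
    writeln("copy: ", sameObjectByCopy(s, &s)); // false: the callee got its own copy
}
```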
May 29, 2015 Re: drastic slowdown for copies
Posted in reply to Momo

On Thursday, 28 May 2015 at 21:23:11 UTC, Momo wrote:
> I'm currently investigating the difference of speed between references and copies. And it seems that copies got a immense slowdown if they reach a size of >= 20 bytes.
This is processor-specific; on different CPU models you might get different results. Here's what I see running your program with 4 and 5 ints in the struct:
C:\prog\D>dmd copyref.d -ofcopyref.exe -release -O -inline
16u
C:\prog\D>copyref.exe
by ref: 18
by copy: 85
by move: 84
C:\prog\D>copyref.exe
by ref: 18
by copy: 72
by move: 72
C:\prog\D>copyref.exe
by ref: 16
by copy: 72
by move: 72
C:\prog\D>dmd copyref.d -ofcopyref.exe -release -O -inline
20u
C:\prog\D>copyref.exe
by ref: 23
by copy: 98
by move: 91
C:\prog\D>copyref.exe
by ref: 20
by copy: 91
by move: 102
C:\prog\D>copyref.exe
by ref: 23
by copy: 91
by move: 91
I see these numbers on an old Core 2 Quad, and very similar ones on a Core i3. So I cannot reproduce your findings: here passing by reference is faster for both sizes.
May 29, 2015 Re: drastic slowdown for copies
Posted in reply to Momo

On Thursday, 28 May 2015 at 21:23:11 UTC, Momo wrote:

Ah, actually it's more complicated, as it depends a lot on inlining. Indeed, without -O and -inline I was able to get by_ref to be slightly slower than by_copy for a struct of 4 ints. But when inlining is turned on, the numbers move in different directions. And for 5 ints the influence of inlining is quite different:

                     4 ints:   5 ints:
-release
  by ref:            53        53
  by copy:           57        137
  by move:           54        137
-release -O
  by ref:            38        34
  by copy:           54        137
  by move:           49        137
-release -O -inline
  by ref:            15        20
  by copy:           72        91
  by move:           72        91
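Because inlining changes the picture so much, one option is to forbid inlining of the functions being measured, so that every iteration pays for a real call even when building with -O -inline. A sketch assuming a D compiler recent enough to support pragma(inline, false); note that other optimizations may still remove calls to empty functions.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

struct S { int[5] values; } // the 20-byte case

// pragma(inline, false) asks the compiler never to inline these functions.
pragma(inline, false) void by_ref(ref const S s) { }
pragma(inline, false) void by_copy(const S s) { }

enum size_t Loops = 10_000_000;

void main()
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. Loops)
    {
        S s = S();
        by_ref(s);
    }
    writeln("by ref: ", sw.peek().total!"msecs");

    sw.reset(); // restart timing from zero while still running
    foreach (i; 0 .. Loops)
    {
        S s = S();
        by_copy(s);
    }
    writeln("by copy: ", sw.peek().total!"msecs");
}
```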
May 29, 2015 Re: drastic slowdown for copies
Posted in reply to thedeemon

On Friday, 29 May 2015 at 07:51:31 UTC, thedeemon wrote:

The results above were measured on a Core 2 Quad; here are the numbers for a Core i3:

                     4 ints:   5 ints:
-release
  by ref:            67        66
  by copy:           44        142
  by move:           45        137
-release -O
  by ref:            29        29
  by copy:           41        141
  by move:           40        142
-release -O -inline
  by ref:            16        20
  by copy:           83        104
  by move:           83        104
May 29, 2015 Re: drastic slowdown for copies
Posted in reply to thedeemon

On Friday, 29 May 2015 at 07:51:31 UTC, thedeemon wrote:
> On Thursday, 28 May 2015 at 21:23:11 UTC, Momo wrote:
>
> Ah, actually it's more complicated, as it depends on inlining a lot.

Yes. And real functions are more complex, so inlining is not a reliable option.

> Indeed, without -O and -inline I was able to get by_ref to be slightly slower than by_copy for struct of 4 ints. But when inlining turns on, the numbers change in different directions. And for 5 ints inlining influence is quite different:
>
>                      4 ints:   5 ints:
> -release
>   by ref:            53        53
>   by copy:           57        137
>   by move:           54        137
>
> -release -O
>   by ref:            38        34
>   by copy:           54        137
>   by move:           49        137
>
> -release -O -inline
>   by ref:            15        20
>   by copy:           72        91
>   by move:           72        91

So as you can see, it is 2-3 times slower. Is there an alternative?
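One alternative in D is an auto ref parameter on a function template: lvalues are then bound by reference and rvalues such as S() are passed by value, without writing two overloads by hand. A sketch; the name passedByRef is illustrative, and __traits(isRef, s) is used only to report which instantiation the compiler picked.

```d
import std.stdio : writeln;

struct S { int[5] values; } // the 20-byte case

// auto ref only works on templates, hence the empty template parameter list ().
// __traits(isRef, s) is true in the instantiation that takes s by reference.
bool passedByRef()(auto ref const S s)
{
    return __traits(isRef, s);
}

void main()
{
    S s;
    writeln(passedByRef(s));   // true: lvalue bound by reference
    writeln(passedByRef(S())); // false: rvalue passed by value
}
```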
May 29, 2015 Re: drastic slowdown for copies
Posted in reply to Adam D. Ruppe

Perhaps you can give me another detailed answer. I get a slowdown for all parts (ref, copy and move) if I use uninitialized floats. I got these results from the following code:

by ref: 2369
by copy: 2335
by move: 2341

Code:

struct vec2f
{
    float x;
    float y;
}

But if I assign 0 to them, I get these results:

by ref: 49
by copy: 22
by move: 25

Why?
May 29, 2015 Re: drastic slowdown for copies
Posted in reply to Momo

On 05/29/2015 06:55 AM, Momo wrote:
> Perhaps you can give me another detailed answer.
> I get a slowdown for all parts (ref, copy and move) if I use
> uninitialized floats.

Floating-point variables are default-initialized to the .nan value of their type (e.g. float.nan). Apparently, the CPU is slow when operating on those special values:

http://stackoverflow.com/questions/3606054/how-slow-is-nan-arithmetic-in-the-intel-x64-fpu

Ali
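Following Ali's explanation, the two measurements differ in how the fields start out: a float field without an initializer begins as float.nan, while giving the fields a default of 0 avoids NaN values from the start. A sketch of both declarations; the name vec2fZero is illustrative.

```d
import std.math : isNaN;
import std.stdio : writeln;

// As in the post: no initializers, so both fields default to float.nan.
struct vec2f
{
    float x;
    float y;
}

// Variant with explicit zero defaults, matching "assign 0 to them".
struct vec2fZero
{
    float x = 0;
    float y = 0;
}

void main()
{
    vec2f a;
    vec2fZero b;
    writeln(a.x.isNaN, " ", a.y.isNaN); // true true
    writeln(b.x, " ", b.y);             // 0 0
}
```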
Copyright © 1999-2021 by the D Language Foundation