September 07, 2017 performance cost of sample conversion
If I have a non-double buffer and temporarily convert to double then convert back, do I save many cycles rather than just using a double buffer? I know it will be a lot more memory, but I'm specifically talking about the cycles in converting to and from vs no conversion.

Using a double for everything gives the highest precision and makes things much easier, but is that the way to go or does it cost quite a bit in performance?
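For readers following along, the two strategies being compared can be sketched as below. This is only an illustration: `process` is a hypothetical stand-in for whatever per-sample work is done in double precision, and the function names are not from the thread.

```d
import std.stdio;

// Hypothetical per-sample operation, carried out in double precision.
double process(double d)
{
    return d * d / 7;
}

// Strategy 1: keep a float buffer, widen each sample to double for the
// computation, then narrow the result back to float.
void processFloatBuffer(float[] samples)
{
    foreach (ref s; samples)
        s = cast(float) process(s); // the cast makes the narrowing step visible
}

// Strategy 2: keep a double buffer, so no conversion takes place.
void processDoubleBuffer(double[] samples)
{
    foreach (ref s; samples)
        s = process(s);
}

void main()
{
    float[] f = [0.0f, 3.5f, 7.0f, -14.0f];
    double[] d = [0.0, 3.5, 7.0, -14.0];
    processFloatBuffer(f);
    processDoubleBuffer(d);
    writeln(f);
    writeln(d);
}
```

The sample values were picked so that every result is exactly representable in both types. With arbitrary data the float buffer would hold rounded results.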
September 06, 2017 Re: performance cost of sample conversion
Posted in reply to Psychological Cleanup | On 09/06/2017 07:06 PM, Psychological Cleanup wrote:
> if I have a non-double buffer and temporarily convert to double then
> convert back, do I save many cycles rather than just using a double
> buffer? I know it will be a lot more memory, but I'm specifically talking
> about the cycles in converting to and from vs no conversion.
>
> Using a double for everything gives the highest precision and makes
> things much easier but is that the way to go or does it cost quite a
> bit in performance?
You have to measure. Here's a start:

import std.conv;
import std.range;
import std.datetime;
import std.stdio;

double workWithDouble(double d) {
    return d * d / 7;
}

void workWithFloats(float[] floats) {
    foreach (ref f; floats) {
        f = workWithDouble(f).to!float;
    }
}

void workWithDoubles(double[] doubles) {
    foreach (ref d; doubles) {
        d = workWithDouble(d);
    }
}

void main() {
    foreach (n; [ 1_000, 1_000_000, 10_000_000 ]) {
        const beg = -1f;
        const end = 1f;
        const step = (end - beg) / n;
        auto floats = iota(beg, end, step).array;
        auto doubles = iota(double(beg), end, step).array;
        {
            auto sw = StopWatch(AutoStart.yes);
            workWithDoubles(doubles);
            writefln("%10s no conversion: %10s usecs", n, sw.peek().usecs);
        }
        {
            auto sw = StopWatch(AutoStart.yes);
            workWithFloats(floats);
            writefln("%10s with conversion: %10s usecs", n, sw.peek().usecs);
        }
    }
}
Conversion seems to be more costly:
1000 no conversion: 27 usecs
1000 with conversion: 40 usecs
1000000 no conversion: 1715 usecs
1000000 with conversion: 5412 usecs
10000000 no conversion: 16280 usecs
10000000 with conversion: 47190 usecs
Ali
September 07, 2017 Re: performance cost of sample conversion
Posted in reply to Ali Çehreli | On Thursday, 7 September 2017 at 05:45:58 UTC, Ali Çehreli wrote:
> On 09/06/2017 07:06 PM, Psychological Cleanup wrote:
>> if I have a non-double buffer and temporarily convert to double then
>> convert back, do I save many cycles rather than just using a double
>> buffer? I know it will be a lot more memory, but I'm specifically talking
>> about the cycles in converting to and from vs no conversion.
>>
>> Using a double for everything gives the highest precision and makes
>> things much easier but is that the way to go or does it cost quite a
>> bit in performance?
>
> You have to measure. Here's a start:
>
> import std.conv;
> import std.range;
> import std.datetime;
> import std.stdio;
>
> double workWithDouble(double d) {
>     return d * d / 7;
> }
>
> void workWithFloats(float[] floats) {
>     foreach (ref f; floats) {
>         f = workWithDouble(f).to!float;
>     }
> }
>
> void workWithDoubles(double[] doubles) {
>     foreach (ref d; doubles) {
>         d = workWithDouble(d);
>     }
> }
>
> void main() {
>     foreach (n; [ 1_000, 1_000_000, 10_000_000 ]) {
>         const beg = -1f;
>         const end = 1f;
>         const step = (end - beg) / n;
>         auto floats = iota(beg, end, step).array;
>         auto doubles = iota(double(beg), end, step).array;
>         {
>             auto sw = StopWatch(AutoStart.yes);
>             workWithDoubles(doubles);
>             writefln("%10s no conversion: %10s usecs", n, sw.peek().usecs);
>         }
>         {
>             auto sw = StopWatch(AutoStart.yes);
>             workWithFloats(floats);
>             writefln("%10s with conversion: %10s usecs", n, sw.peek().usecs);
>         }
>     }
> }
>
> Conversion seems to be more costly:
>
> 1000 no conversion: 27 usecs
> 1000 with conversion: 40 usecs
> 1000000 no conversion: 1715 usecs
> 1000000 with conversion: 5412 usecs
> 10000000 no conversion: 16280 usecs
> 10000000 with conversion: 47190 usecs
>
> Ali
Thanks. My results:

dmd x86 debug:
asserts on the line `auto floats = iota(beg, end, step).array;`

dmd x64 debug:
1000 no conversion: 15 usecs
1000 with conversion: 5 usecs
1000000 no conversion: 2824 usecs
1000000 with conversion: 5689 usecs
10000000 no conversion: 24148 usecs
10000000 with conversion: 56335 usecs

dmd x86 release:
1000 no conversion: 1 usecs
1000 with conversion: 1 usecs
1000000 no conversion: 1903 usecs
1000000 with conversion: 1262 usecs
10000000 no conversion: 19156 usecs
10000000 with conversion: 12831 usecs

dmd x64 release:
1000 no conversion: 4 usecs
1000 with conversion: 17 usecs
1000000 no conversion: 4531 usecs
1000000 with conversion: 4516 usecs
10000000 no conversion: 45928 usecs
10000000 with conversion: 46080 usecs

ldc x86 debug:
1000 no conversion: 3 usecs
1000 with conversion: 32 usecs
1000000 no conversion: 3563 usecs
1000000 with conversion: 19240 usecs
10000000 no conversion: 35986 usecs
10000000 with conversion: 192025 usecs

ldc x64 debug:
1000 no conversion: 2 usecs
1000 with conversion: 10 usecs
1000000 no conversion: 2855 usecs
1000000 with conversion: 10309 usecs
10000000 no conversion: 28254 usecs
10000000 with conversion: 101380 usecs

ldc x86 release:
1000 no conversion: 0 usecs
1000 with conversion: 0 usecs
1000000 no conversion: 1280 usecs
1000000 with conversion: 532 usecs
10000000 no conversion: 10403 usecs
10000000 with conversion: 5752 usecs

ldc x64 release:
1000 no conversion: 0 usecs
1000 with conversion: 1 usecs
1000000 no conversion: 887 usecs
1000000 with conversion: 550 usecs
10000000 no conversion: 10730 usecs
10000000 with conversion: 5482 usecs

The results are strange: sometimes the conversion wins.
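One possible contributor, offered here as an assumption rather than something measured in this thread, is memory traffic: a float buffer is half the size of a double buffer, so the loop over floats reads and writes half as many bytes. The sketch below only prints the buffer sizes for the three sample counts used above; `bufferBytes` is a hypothetical helper.

```d
import std.stdio;

// Bytes occupied by n elements of type T (hypothetical helper).
size_t bufferBytes(T)(size_t n)
{
    return n * T.sizeof;
}

void main()
{
    foreach (n; [1_000, 1_000_000, 10_000_000])
    {
        writefln("%10s samples: float %10s bytes, double %10s bytes",
                 n, bufferBytes!float(n), bufferBytes!double(n));
    }
}
```

At 10,000,000 samples that is 40,000,000 bytes against 80,000,000 bytes, which is likely larger than the CPU caches on typical hardware. Whether this explains the release-build numbers would itself have to be measured.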
September 07, 2017 Re: performance cost of sample conversion
Posted in reply to Ali Çehreli | On Thursday, 7 September 2017 at 05:45:58 UTC, Ali Çehreli wrote:
>
> You have to measure.

Indeed.

> Here's a start:

The program has way too many things pre-defined, and the semantics are such that workWithDoubles can be completely eliminated... So you are not measuring what you want to be measuring.

Make stuff depend on argc, and print the result of calculations or do something else such that the calculation must be performed. When measuring without LTO, probably attaching @weak onto the workWith* functions will work too. (pragma(inline, false) does not prevent reasoning about the function)

-Johan
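A minimal sketch of what that advice might look like in practice follows. Several things here are assumptions and not from the thread: the inputs are scaled by `args.length` so they depend on a run-time value, a checksum of each buffer is printed so the work stays observable, and the timing uses `std.datetime.stopwatch` from current Phobos instead of the 2017 `std.datetime` API. The samples are generated by index, which also sidesteps the floating-point `iota` call that was reported to assert on one configuration. The `@weak` suggestion is not applied.

```d
import std.algorithm : map, sum;
import std.array : array;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.range : iota;
import std.stdio;

double workWithDouble(double d)
{
    return d * d / 7;
}

// Widens each float to double, computes, and narrows the result back.
void workWithFloats(float[] floats)
{
    foreach (ref f; floats)
        f = cast(float) workWithDouble(f);
}

void workWithDoubles(double[] doubles)
{
    foreach (ref d; doubles)
        d = workWithDouble(d);
}

void main(string[] args)
{
    // args.length is only known at run time, so the inputs cannot be
    // folded into constants. With no extra arguments, scale is 1.
    const scale = cast(double) args.length;

    foreach (n; [1_000, 1_000_000, 10_000_000])
    {
        // Samples in [-scale, scale), generated by index.
        auto doubles = iota(n).map!(i => scale * (2.0 * i / n - 1.0)).array;
        auto floats = doubles.map!(d => cast(float) d).array;

        auto sw = StopWatch(AutoStart.yes);
        workWithDoubles(doubles);
        const tDouble = sw.peek.total!"usecs";

        sw.reset(); // resetting a running StopWatch restarts it from zero
        workWithFloats(floats);
        const tFloat = sw.peek.total!"usecs";

        // Printing the checksums means the loops cannot be dropped as dead code.
        writefln("%10s   no conversion: %10s usecs (checksum %s)",
                 n, tDouble, doubles.sum);
        writefln("%10s with conversion: %10s usecs (checksum %s)",
                 n, tFloat, floats.sum);
    }
}
```

Passing extra command-line arguments changes `scale` and therefore the data. The optimizer can still inline and vectorize the loops, so this measures the optimized loops and not the cost of a function call per sample.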