September 29, 2010 Re: Switch implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bioinfornatics | I have a AMD Phenom(tm) II X4 955 Processor and this script works only on one core of my quad (800Mhz) | |||
September 29, 2010 Re: Switch implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bioinfornatics | bioinfornatics:
> with ldc and tango (up to date)
> $ ldc -O5 -release -enable-inlining test.d
> $ time ./test
> 1500000 1300000 497200000
>
> real 0m4.376s
> user 0m4.373s
> sys 0m0.001s
LDC has llvm back-end, that doesn't share that dmd problem.
But to give us meaningful data, I suggest you to time the C program too on your machine, so we can compare :-)
Bye,
bearophile
| |||
September 29, 2010 Re: Switch implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bearophile | == Quote from bearophile (bearophileHUGS@lycos.com)'s article > Iain Buclaw: > > Oh, it was my *original* intention to test runtime speed. However, the time to compile just stood out little more like a sore thumb than what I anticipated. > If your purpose is to test runtime speed, use a more natural number of cases like 10 or 20 or even 50 :-) > So I have done a better test, see below for the code. > Timings, NLOOPS = 100_000, best of 6, seconds: > DMD: 7.70 > GCC: 2.42 > gcc 4.5.1, -Wall -O3 -s > dmd 2.049, -O -release -inline --snip-- > Bye, > bearophile Well, this should give you a good comparison between case jump tables on and off in GCC. gdc-4.4 -O0 real 0m9.994s user 0m9.957s sys 0m0.028s gdc-4.4 -O0 -fno-jump-tables real 0m10.222s user 0m10.197s sys 0m0.012s gdc-4.4 -O0 -funroll-loops real 0m10.004s user 0m9.965s sys 0m0.032s gdc-4.4 -O0 -funroll-loops -fno-jump-tables real 0m10.011s user 0m9.981s sys 0m0.020s gdc-4.4 -O3 real 0m7.107s user 0m7.092s sys 0m0.008s gdc-4.4 -O3 -fno-jump-tables real 0m7.136s user 0m7.112s sys 0m0.008s gdc-4.4 -O3 -funroll-loops real 0m7.127s user 0m7.104s sys 0m0.008s gdc-4.4 -O3 -funroll-loops -fno-jump-tables real 0m7.237s user 0m7.184s sys 0m0.044s Differences are pretty minimal... In comparison to DMD: dmd -O real 0m15.049s user 0m14.989s sys 0m0.044s Iain | |||
September 29, 2010 Re: Switch implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Tue, 28 Sep 2010 20:45:27 -0400, bearophile <bearophileHUGS@lycos.com> wrote:
> retard:
>
>> Instead of O(n) linear search or O(ln n) binary search, why not use O(1)
>> jump tables in this case?
>
> I don't exactly know. But you must take into account the constants too, it's not just a matter of worst-case computational complexity. Probably when the density of a large jump table becomes too much low, its experimental performance on modern CPUs gets worse than a binary search among few entries. But I am not sure, I have not written&run benchmarks on this.
>
> Bye,
> bearophile
Well there are 28 labeled cases and ~16kb of jump table address space. (32kb on 64-bit platforms)
| |||
September 29, 2010 Re: Switch implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Robert Jacques | Robert Jacques:
> Well there are 28 labeled cases and ~16kb of jump table address space. (32kb on 64-bit platforms)
32 kb are enough to fill the code part of the L1 cache on most CPUs.
Bye,
bearophile
| |||
September 29, 2010 Re: Switch implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Iain Buclaw | Iain Buclaw: > gdc-4.4 -O3 -fno-jump-tables > real 0m7.136s > user 0m7.112s > sys 0m0.008s ... > gdc-4.4 -O3 -funroll-loops -fno-jump-tables > real 0m7.237s > user 0m7.184s > sys 0m0.044s > > Differences are pretty minimal... Probably because gcc/gdc is not using a jump table here, but a binary search tree :-) Bye, bearophile | |||
September 29, 2010 Re: Switch implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bearophile | Tue, 28 Sep 2010 22:00:06 -0400, bearophile wrote:
> Robert Jacques:
>> Well there are 28 labeled cases and ~16kb of jump table address space.
>> (32kb on 64-bit platforms)
>
> 32 kb are enough to fill the code part of the L1 cache on most CPUs.
28 cases and 32 kB of space seems like a waste. I'd use some sort of hashing before the jump to eliminate unnecessary blank slots. However, even if this kind of solution was cache friendly, I have no idea how this would affect the operation of modern branch predictors.
| |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply