August 10, 2008 Re: DMD 1.034 and 2.018 releases
Walter Bright:
> Can you make it faster?
Lots of people today have 2 (or even 4) cores; the order of computation of those ops is arbitrary, so a major (nearly linear, hopefully) speedup will probably come as soon as all the cores are used. This job splitting is probably an advantage even when the ops aren't computed by asm code.
I've taken a look at my code, and so far I don't see many spots where the array operations (once they actually give some speedup) would be useful (there are many other things I'd find much more useful than such ops; see my wish lists). But if the array ops are useful to enough people, then it may be worth spending some programming time to make them use all 2-4+ cores.
Bye,
bearophile
August 10, 2008 Re: DMD 1.034 and 2.018 releases
bearophile wrote:
> Walter Bright:
>> Can you make it faster?
>
> Lots of people today have 2 (or even 4) cores; the order of computation of those ops is arbitrary, so a major (nearly linear, hopefully) speedup will probably come as soon as all the cores are used. This job splitting is probably an advantage even when the ops aren't computed by asm code.
The overhead of creating a new thread for this would be significant. You'd probably be better off using a regular loop for arrays that are not huge.
August 10, 2008 Re: DMD 1.034 and 2.018 releases
"Christopher Wright" <dhasenan@gmail.com> wrote in message news:g7ljal$2i84$1@digitalmars.com...
> bearophile wrote:
>> Walter Bright:
>>> Can you make it faster?
>>
>> Lots of people today have 2 (or even 4) cores; the order of computation of those ops is arbitrary, so a major (nearly linear, hopefully) speedup will probably come as soon as all the cores are used. This job splitting is probably an advantage even when the ops aren't computed by asm code.
>
> The overhead of creating a new thread for this would be significant. You'd probably be better off using a regular loop for arrays that are not huge.

I think we could see a lot more improvement from using vector ops to perform SIMD operations. They are just begging for it.
August 10, 2008 Re: DMD 1.034 and 2.018 releases
Christopher Wright wrote:
> bearophile wrote:
>> Walter Bright:
>>> Can you make it faster?
>>
>> Lots of people today have 2 (or even 4) cores; the order of computation of those ops is arbitrary, so a major (nearly linear, hopefully) speedup will probably come as soon as all the cores are used. This job splitting is probably an advantage even when the ops aren't computed by asm code.
>
> The overhead of creating a new thread for this would be significant. You'd probably be better off using a regular loop for arrays that are not huge.
I agree. I think a lot of profiling would be in order to see at what point each approach becomes an advantage. Then use a branch to jump to the best algorithm for the particular case (platform + array length). Hopefully the compiler could inline the chosen algorithm so that constant-sized arrays don't pay the additional overhead.
There would be a small cost for the extra branch on small dynamic arrays. If that branch ever becomes a performance bottleneck, the program must be doing many operations on many small arrays; the user could then group those small arrays into one larger array to get the performance they want.
-Joel
September 07, 2008 Re: DMD 1.034 and 2.018 releases
Christopher Wright wrote:
> bearophile wrote:
>> Walter Bright:
>>> Can you make it faster?
>>
>> Lots of people today have 2 (or even 4) cores; the order of computation of those ops is arbitrary, so a major (nearly linear, hopefully) speedup will probably come as soon as all the cores are used. This job splitting is probably an advantage even when the ops aren't computed by asm code.
>
> The overhead of creating a new thread for this would be significant.

Well, for this kind of scheme you wouldn't start a new set of threads each time! Just start a set of worker threads (one per CPU, pinned to that CPU) at program startup; they do nothing until they are woken up because there is an operation that can be accelerated through parallelism.

> You'd probably be better off using a regular loop for arrays that are not huge.

Sure. Even with pre-created threads, using several CPUs adds startup and teardown cost, so this would be worthwhile only for loops that are "big enough". Another pitfall is making sure that two CPUs don't write to the same cache line; otherwise that "false sharing" will reduce performance.

renoX
Copyright © 1999-2021 by the D Language Foundation