March 23, 2012
parallel optimizations based on number of memory controllers vs cpus
I believe the current std.parallelism default thread pool size is 
the number of CPUs minus 1, according to some documentation. When I was 
testing some concurrent vs. thread pool parallel implementations, I 
was seeing improvements in the concurrent version up to about 
14 threads. I didn't try to figure out how to change the 
thread pool size.

While reading the article linked below, I noticed someone who reported 
similar improvements up to 14 threads on memory-related operations and 
explained it by the number of memory controllers being the 
limiting factor. See his item number 4, where significant gains 
were made in memory processing up to 14 threads.

So, I wonder if it wouldn't be good to have a couple of different 
built-in threadpool types ... one meant for memory operations, 
and one primarily for cpu crunching ... with different sizes.

http://stackoverflow.com/questions/4260602/how-to-increase-performance-of-memcpy
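
A rough sketch of what two differently sized pools could look like with what std.parallelism already offers. The pool names and the workload are hypothetical, and 14 is only the figure reported above, not a tuned value:

```d
import std.algorithm : min;
import std.parallelism : TaskPool, totalCPUs;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    // Hypothetical sizing: more workers than cores for memory-bound work,
    // roughly one per core (minus the main thread) for CPU-bound work.
    auto memPool = new TaskPool(14);
    auto cpuPool = new TaskPool(totalCPUs > 1 ? totalCPUs - 1 : 1);
    scope (exit)
    {
        // Let the worker threads terminate so the program can exit.
        memPool.finish();
        cpuPool.finish();
    }

    // Memory-bound example: copy a large buffer in 1 MiB chunks.
    enum size_t chunk = 1 << 20;
    auto src = new ubyte[](1 << 24);
    auto dst = new ubyte[](src.length);
    foreach (lo; memPool.parallel(iota(size_t(0), src.length, chunk), 1))
    {
        immutable hi = min(lo + chunk, src.length);
        dst[lo .. hi] = src[lo .. hi];
    }

    // CPU-bound example: independent arithmetic per element.
    auto squares = new double[](1_000);
    foreach (i, ref x; cpuPool.parallel(squares))
        x = cast(double) i * i;

    writeln("memPool workers: ", memPool.size,
            ", cpuPool workers: ", cpuPool.size);
}
```

Whether the larger pool actually helps would have to be measured per machine, since the benefit depends on the memory subsystem rather than the core count.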
March 23, 2012
Re: parallel optimizations based on number of memory controllers vs cpus
On 03/23/2012 02:46 PM, Jay Norwood wrote:
> I believe the current std.parallelism default threadpool count is number
> of cpus-1, according to some documentation. When I was testing some
> concurrent vs threadpool parallel implementations I was seeing
> improvements on the concurrent operation up to about 14 threads. I
> didn't try to figure out how to change the threadpool.
>
> While reading this article I noticed someone who reported similar
> improvements up to 14 threads on memory related operations, and
> explained it by the number of memory controllers being the limiting
> issue. See his item number 4 where significant gains were made in memory
> processing up to 14 threads.
>
> So, I wonder if it wouldn't be good to have a couple of different
> built-in threadpool types ... one meant for memory operations, and one
> primarily for cpu crunching ... with different sizes.
>
> http://stackoverflow.com/questions/4260602/how-to-increase-performance-of-memcpy
>

On program startup, do:

defaultPoolThreads(14); // or 13; a module-level property of std.parallelism
March 26, 2012
Re: parallel optimizations based on number of memory controllers vs cpus
On Friday, 23 March 2012 at 13:56:09 UTC, Timon Gehr wrote:
> On program startup, do:
>
> defaultPoolThreads(14); // or 13

Yes, thank you. I just tried adding that. The gains aren't 
scalable in this particular test, which is apparently dominated 
by CPU processing, but even here you can see incremental 
improvements at 13 vs. 7 threads on all the numbers. I'd probably 
have to identify operations that are limited by memory 
accesses in order to see the kind of gains reported in that other 
app.

This is with the default 7 threads

finished wcp_wcPointer! time: 98 ms
finished wcp_wcCtRegex! time: 1300 ms
finished wcp_wcRegex! time: 2946 ms
finished wcp_wcRegex2! time: 2687 ms
finished wcp_wcSlices! time: 157 ms
finished wcp_wcStdAscii! time: 225 ms


This is processing the same data with 1 thread

finished wcp_wcPointer! time: 188 ms
finished wcp_wcCtRegex! time: 2219 ms
finished wcp_wcRegex! time: 5951 ms
finished wcp_wcRegex2! time: 5502 ms
finished wcp_wcSlices! time: 318 ms
finished wcp_wcStdAscii! time: 446 ms

And this is processing the same data with 13 threads

finished wcp_wcPointer! time: 93 ms
finished wcp_wcCtRegex! time: 1110 ms
finished wcp_wcRegex! time: 2531 ms
finished wcp_wcRegex2! time: 2321 ms
finished wcp_wcSlices! time: 136 ms
finished wcp_wcStdAscii! time: 200 ms
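
To make the comparison easier to read, here is a small sketch that turns the three listings above into speedups relative to the 1-thread run. The timings are copied from the listings; the array names are just for illustration:

```d
import std.stdio : writefln;

void main()
{
    static immutable string[] names =
        ["wcPointer", "wcCtRegex", "wcRegex", "wcRegex2", "wcSlices", "wcStdAscii"];

    // Times in ms, taken from the listings above.
    static immutable double[] t1  = [188.0, 2219, 5951, 5502, 318, 446];
    static immutable double[] t7  = [ 98.0, 1300, 2946, 2687, 157, 225];
    static immutable double[] t13 = [ 93.0, 1110, 2531, 2321, 136, 200];

    foreach (i, name; names)
    {
        // Speedup = single-thread time / multi-thread time.
        writefln("%-12s 7 threads: %.2fx   13 threads: %.2fx",
                 name, t1[i] / t7[i], t1[i] / t13[i]);
    }
}
```

By this measure the 7-thread runs come out at roughly 1.7x to 2.1x and the 13-thread runs at roughly 2.0x to 2.4x, which fits the picture of a CPU-dominated test with only incremental gains from the extra threads.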

These were from the tests uploaded at 
https://github.com/jnorwood/wc_test.

The only change to the uploaded program is the suggested 
defaultPoolThreads(13); 
added at the start of main to change the default thread count 
of the task pool.
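
For reference, a minimal sketch of that kind of change. This is not the code from wc_test; the body of main is a placeholder that only shows where the call goes and how to check that it took effect:

```d
import std.parallelism : defaultPoolThreads, taskPool;
import std.stdio : writeln;

void main()
{
    // Set this before the first use of taskPool: the default pool is
    // created lazily on first access and keeps the size it started with.
    defaultPoolThreads(13);

    // Placeholder check: report how many workers the default pool has.
    writeln("default pool workers: ", taskPool.size);
}
```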