Thread overview: parallel optimizations based on number of memory controllers vs cpus
Jay Norwood, Mar 23, 2012
Timon Gehr, Mar 23, 2012
Jay Norwood, Mar 26, 2012
March 23, 2012
I believe the current std.parallelism default thread pool size is the number of CPUs minus 1, according to some documentation. When I was testing some concurrent versus thread pool parallel implementations, I saw the concurrent version keep improving up to about 14 threads. I didn't try to figure out how to change the thread pool size.

While reading the article linked below, I noticed someone who reported similar improvements up to 14 threads on memory-related operations and attributed them to the number of memory controllers being the limiting factor. See his item 4, where significant gains in memory processing were made up to 14 threads.

So I wonder whether it would be good to have a couple of different built-in thread pool types with different sizes: one meant for memory operations and one primarily for CPU crunching.

http://stackoverflow.com/questions/4260602/how-to-increase-performance-of-memcpy
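A minimal sketch of the idea with what std.parallelism already offers, assuming only the TaskPool constructor and the parallel foreach API. Phobos does not ship separate "CPU" and "memory" pools, so cpuPool and memPool below are hypothetical, user-created pools; the size of 14 is only borrowed from the linked report. The program also prints totalCPUs and defaultPoolThreads, which lets you check the "CPUs minus 1" default on your own machine.

```d
import std.parallelism : TaskPool, defaultPoolThreads, totalCPUs;
import std.stdio : writeln;

void main()
{
    // Check the default: defaultPoolThreads starts out as totalCPUs - 1.
    immutable uint cpus = totalCPUs;
    immutable uint defaultWorkers = defaultPoolThreads;
    writeln("totalCPUs = ", cpus, ", defaultPoolThreads = ", defaultWorkers);

    // Hypothetical pools: both are ordinary TaskPool instances with assumed sizes.
    auto cpuPool = new TaskPool(defaultWorkers); // intended for CPU-bound work
    auto memPool = new TaskPool(14);             // intended for memory-bound work

    // Manually created pools use non-daemon threads, so shut them down before exit.
    scope (exit) memPool.finish(true);
    scope (exit) cpuPool.finish(true);

    // CPU-style work on cpuPool: compute a value per element.
    auto data = new double[1_000_000];
    foreach (i, ref x; cpuPool.parallel(data))
        x = i * 0.5;

    // Memory-style work on memPool: a plain element-wise copy.
    auto copy = new double[data.length];
    foreach (i, ref x; memPool.parallel(copy))
        x = data[i];

    writeln("copy matches: ", copy == data);
}
```

Keeping the pools as separate objects means a CPU-heavy job and a copy-heavy job can each be sized independently without touching the global taskPool.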
March 23, 2012
On 03/23/2012 02:46 PM, Jay Norwood wrote:
> I believe the current std.parallelism default threadpool count is number
> of cpus-1, according to some documentation. When I was testing some
> concurrent vs threadpool parallel implementations I was seeing
> improvements on the concurrent operation up to about 14 threads. I
> didn't try to figure out how to change the threadpool.
>
> While reading this article I noticed someone who reported similar
> improvements up to 14 threads on memory related operations, and
> explained it by the number of memory controllers being the limiting
> issue. See his item number 4 where significant gains were made in memory
> processing up to 14 threads.
>
> So, I wonder if it wouldn't be good to have a couple of different
> built-in threadpool types ... one meant for memory operations, and one
> primarily for cpu crunching ... with different sizes.
>
> http://stackoverflow.com/questions/4260602/how-to-increase-performance-of-memcpy
>

On program startup, do:

defaultPoolThreads(14); // or 13
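For reference, a self-contained sketch of that suggestion. It assumes the module-level defaultPoolThreads property from std.parallelism, which has to be set before the first use of taskPool: once the global pool has been created, changing the value no longer affects it. The count of 13 just mirrors the numbers discussed in this thread.

```d
import std.parallelism : defaultPoolThreads, taskPool;
import std.stdio : writeln;

void main()
{
    // Must run before anything touches taskPool; later calls do not resize it.
    defaultPoolThreads(13);

    writeln("worker threads: ", taskPool.size); // 13 with the setting above

    // Any parallel foreach over taskPool now uses the 13-worker pool.
    auto squares = new ulong[1_000];
    foreach (i, ref s; taskPool.parallel(squares))
        s = cast(ulong) i * i;
}
```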
March 26, 2012
On Friday, 23 March 2012 at 13:56:09 UTC, Timon Gehr wrote:
> On program startup, do:
>
> defaultPoolThreads(14); // or 13

Yes, thank you. I just tried adding that. The gains don't scale in this particular test, which is apparently dominated by CPU processing, but even here every timing improves slightly at 13 threads compared with 7. I'd probably have to identify operations that are limited by memory accesses in order to see the kind of gains reported for that other application.
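One way to look for memory-bound gains is to time an operation that does little more than move bytes. The sketch below is a hypothetical benchmark, not part of the wc_test code: parallelCopy is an illustrative helper that copies a large buffer in blocks using separate TaskPool instances of 1, 7 and 13 workers. The 64 MiB buffer and 1 MiB block sizes are assumptions; the buffer should be well larger than the CPU caches for the copy to be limited by memory rather than by the cores.

```d
import std.algorithm.comparison : min;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.parallelism : TaskPool;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical helper: copies src into dst in blocks of blockSize bytes,
// spreading the blocks over the given pool (a memcpy-style, memory-bound job).
void parallelCopy(TaskPool pool, const(ubyte)[] src, ubyte[] dst, size_t blockSize)
{
    assert(src.length == dst.length && blockSize > 0);
    size_t nBlocks = (src.length + blockSize - 1) / blockSize;
    foreach (b; pool.parallel(iota(nBlocks), 1))
    {
        immutable size_t lo = b * blockSize;
        immutable size_t hi = min(lo + blockSize, src.length);
        dst[lo .. hi] = src[lo .. hi];
    }
}

void main()
{
    // Assumed sizes: 64 MiB buffers, copied in 1 MiB blocks.
    enum size_t totalBytes = 64 * 1024 * 1024;
    enum size_t blockBytes = 1024 * 1024;
    auto src = new ubyte[totalBytes];
    auto dst = new ubyte[totalBytes];
    src[] = 1;

    static immutable size_t[] workerCounts = [1, 7, 13];
    foreach (nWorkers; workerCounts)
    {
        auto pool = new TaskPool(nWorkers);
        auto sw = StopWatch(AutoStart.yes);
        parallelCopy(pool, src, dst, blockBytes);
        sw.stop();
        pool.finish(true); // manually created pools must be shut down explicitly
        writefln("%2d workers: %s ms", nWorkers, sw.peek.total!"msecs");
    }
}
```

If memory controllers are the bottleneck, this kind of loop is where continued improvement past the default 7 workers would be expected to show; a CPU-dominated loop like the word counts would not show it.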

This is with the default of 7 threads

finished wcp_wcPointer! time: 98 ms
finished wcp_wcCtRegex! time: 1300 ms
finished wcp_wcRegex! time: 2946 ms
finished wcp_wcRegex2! time: 2687 ms
finished wcp_wcSlices! time: 157 ms
finished wcp_wcStdAscii! time: 225 ms


This is processing the same data with 1 thread

finished wcp_wcPointer! time: 188 ms
finished wcp_wcCtRegex! time: 2219 ms
finished wcp_wcRegex! time: 5951 ms
finished wcp_wcRegex2! time: 5502 ms
finished wcp_wcSlices! time: 318 ms
finished wcp_wcStdAscii! time: 446 ms

And this is processing the same data with 13 threads

finished wcp_wcPointer! time: 93 ms
finished wcp_wcCtRegex! time: 1110 ms
finished wcp_wcRegex! time: 2531 ms
finished wcp_wcRegex2! time: 2321 ms
finished wcp_wcSlices! time: 136 ms
finished wcp_wcStdAscii! time: 200 ms

These were from the tests uploaded at https://github.com/jnorwood/wc_test.

The only change from the uploaded program is adding the suggested defaultPoolThreads(13) call at the start of main to change the default thread pool size.
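To put numbers on the scaling, the speedups can be worked out directly from the three runs above: going from 1 to 7 threads gives roughly 1.7x to 2.05x, and going from 7 to 13 threads adds roughly 1.05x to 1.17x. The small program below is only a sketch that redoes this arithmetic from the posted timings; the Timing struct is an illustrative helper.

```d
import std.stdio : writefln;

// Timings in ms, copied from the 1-, 7- and 13-thread runs posted above.
struct Timing { string name; double t1, t7, t13; }

immutable Timing[] timings = [
    Timing("wcp_wcPointer",   188,   98,   93),
    Timing("wcp_wcCtRegex",  2219, 1300, 1110),
    Timing("wcp_wcRegex",    5951, 2946, 2531),
    Timing("wcp_wcRegex2",   5502, 2687, 2321),
    Timing("wcp_wcSlices",    318,  157,  136),
    Timing("wcp_wcStdAscii",  446,  225,  200),
];

void main()
{
    writefln("%-15s %8s %8s", "test", "1->7", "7->13");
    foreach (t; timings)
        writefln("%-15s %7.2fx %7.2fx", t.name, t.t1 / t.t7, t.t7 / t.t13);
}
```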