May 04, 2014 Re: Running Phobos unit tests in threads: I have data | ||||
---|---|---|---|---|
| ||||
Posted in reply to Atila Neves | On Sun, 2014-05-04 at 08:47 +0000, Atila Neves via Digitalmars-d wrote:
> Like I mentioned afterwards, I tried a different number of threads. On my machine, at least, std.parallelism.totalCPUs returns 8, the number of virtual cores. As it should.
If you can create a small example of the problem, and I can remember how to run std.parallelism as a separate module, I can try and take a look at this later next week.
--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
|
May 04, 2014 Re: Running Phobos unit tests in threads: I have data | ||||
---|---|---|---|---|
| ||||
Posted in reply to Atila Neves | On 5/4/14, 1:44 AM, Atila Neves wrote:
> On Saturday, 3 May 2014 at 22:46:03 UTC, Andrei Alexandrescu wrote:
>> On 5/3/14, 2:42 PM, Atila Neves wrote:
>>> gdc gave _very_ different results. I had to use different modules
>>> because at some point tests started failing, but with gdc the threaded
>>> version runs ~3x faster.
>>>
>>> On my own unit-threaded benchmarks, running the UTs for Cerealed over
>>> and over again was only slightly slower with threads than without. With
>>> dmd the threaded version was nearly 3x slower.
>>
>> Sounds like a severe bug in dmd or dependents. -- Andrei
>
> Seems like it. Just to be sure I swapped ld.gold for ld.bfd and the
> problem was still there.
>
> I'm not entirely sure how to file this bug: with just my simple example
> above?
The simpler the better. -- Andrei
|
May 04, 2014 Re: Running Phobos unit tests in threads: I have data | ||||
---|---|---|---|---|
| ||||
Posted in reply to Russel Winder | On 5/4/14, 3:06 AM, Russel Winder via Digitalmars-d wrote:
> On Sun, 2014-05-04 at 08:47 +0000, Atila Neves via Digitalmars-d wrote:
>> Like I mentioned afterwards, I tried a different number of
>> threads. On my machine, at least, std.parallelism.totalCPUs
>> returns 8, the number of virtual cores. As it should.
>
> If you can create a small example of the problem, and I can remember how
> to run std.parallelism as a separate module, I can try and take a look
> at this later next week.
This is an awesome offer, Russel. Thanks! -- Andrei
|
May 04, 2014 Re: Running Phobos unit tests in threads: I have data | ||||
---|---|---|---|---|
| ||||
On 04/05/14 09:49, Russel Winder via Digitalmars-d wrote:
> (*) Physical cores are not necessarily the number reported by the OS due
> to core hyperthreads. Quad core no hyperthreads, and dual core, two
> hyperthreads per core, both get reported as four processor systems.
> However if you benchmark them you get very, very different performance
> characteristics.
Yup. That bit me with a new laptop the first time I tried parallel programming with D :-)
|
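For readers who want to check what the footnote above means on their own machine, here is a minimal D sketch. `totalCPUs`, `defaultPoolThreads` and `taskPool.size` are real `std.parallelism` symbols; `guessPhysicalCores` and its default of two hardware threads per core are assumptions made purely for illustration, since std.parallelism only reports logical CPUs.

```d
import std.algorithm.comparison : max;
import std.parallelism : defaultPoolThreads, taskPool, totalCPUs;
import std.stdio : writeln;

// Hypothetical helper: estimates physical cores from the logical CPU count.
// ASSUMPTION: every core exposes `threadsPerCore` hardware threads, which is
// not something std.parallelism can tell us.
uint guessPhysicalCores(uint logical, uint threadsPerCore = 2)
{
    return max(1u, logical / threadsPerCore);
}

void main()
{
    // Logical CPUs as reported by the OS (hyperthreads included).
    writeln("logical CPUs:          ", totalCPUs);

    // By default the global pool gets totalCPUs - 1 worker threads.
    writeln("default pool workers:  ", defaultPoolThreads);

    // The setter only has an effect before taskPool is first used.
    immutable guess = guessPhysicalCores(totalCPUs);
    defaultPoolThreads = guess;
    writeln("workers after resize:  ", taskPool.size);
}
```

On a machine like the one mentioned in the thread, where `totalCPUs` returns 8, the helper would guess 4 physical cores. Benchmarking with both pool sizes is the only way to tell which one suits a given workload.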
May 04, 2014 Re: Running Phobos unit tests in threads: I have data | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Saturday, 3 May 2014 at 22:46:03 UTC, Andrei Alexandrescu wrote:
> On 5/3/14, 2:42 PM, Atila Neves wrote:
>> gdc gave _very_ different results. I had to use different modules
>> because at some point tests started failing, but with gdc the threaded
>> version runs ~3x faster.
>>
>> On my own unit-threaded benchmarks, running the UTs for Cerealed over
>> and over again was only slightly slower with threads than without. With
>> dmd the threaded version was nearly 3x slower.
>
> Sounds like a severe bug in dmd or dependents. -- Andrei
This reminds me of when I was parallelizing a Project Euler solution: atomic access was so much slower on DMD that it made performance worse than the single-threaded version for one stage of the program.
I know that std.parallelism does make use of core.atomic under the hood, so this may be a factor when using DMD.
|
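The effect described in the post above can be probed with a small experiment. The following is only a sketch, not the poster's Project Euler code: `sumWithAtomics` and `sumWithReduce` are hypothetical names, and the workload (summing a range of integers) is an assumption chosen to make contention on a single atomic variable as visible as possible.

```d
import core.atomic : atomicLoad, atomicOp;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.parallelism : parallel, taskPool;
import std.range : iota;
import std.stdio : writeln;

// Every iteration performs an atomic read-modify-write on the same
// shared location, so all worker threads contend for it.
long sumWithAtomics(int n)
{
    shared long total;
    foreach (i; parallel(iota(n)))
        atomicOp!"+="(total, i);
    return atomicLoad(total);
}

// Same result, but each work unit accumulates privately and the
// partial sums are combined only at the end.
long sumWithReduce(int n)
{
    return taskPool.reduce!"a + b"(0L, iota(n));
}

void main()
{
    enum n = 5_000_000;

    auto sw = StopWatch(AutoStart.yes);
    immutable a = sumWithAtomics(n);
    writeln("atomics: ", a, " in ", sw.peek);

    sw.reset();
    immutable b = sumWithReduce(n);
    writeln("reduce:  ", b, " in ", sw.peek);
}
```

Building the same file with dmd, gdc and ldc and comparing the two timings is one way to check whether atomic operations are the bottleneck on a particular compiler. No timings are claimed here, since they depend on the compiler, flags and hardware.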
May 05, 2014 Re: Running Phobos unit tests in threads: I have data | ||||
---|---|---|---|---|
| ||||
Posted in reply to safety0ff | On Sunday, 4 May 2014 at 17:01:23 UTC, safety0ff wrote:
> On Saturday, 3 May 2014 at 22:46:03 UTC, Andrei Alexandrescu wrote:
>> On 5/3/14, 2:42 PM, Atila Neves wrote:
>>> gdc gave _very_ different results. I had to use different modules
>>> because at some point tests started failing, but with gdc the threaded
>>> version runs ~3x faster.
>>>
>>> On my own unit-threaded benchmarks, running the UTs for Cerealed over
>>> and over again was only slightly slower with threads than without. With
>>> dmd the threaded version was nearly 3x slower.
>>
>> Sounds like a severe bug in dmd or dependents. -- Andrei
>
> This reminds me of when I was parallelizing a Project Euler solution: atomic access was so much slower on DMD that it made performance worse than the single-threaded version for one stage of the program.
>
> I know that std.parallelism does make use of core.atomic under the hood, so this may be a factor when using DMD.
Funny you should say that, a friend of mine tried porting a lock-free algorithm of his from Java to D a few weeks ago. The D version ran 3 orders of magnitude slower. Then I tried gdc and ldc on his code. ldc produced code running at around 80% of the speed of the Java version, gdc was around 30%. But dmd...
|
May 05, 2014 Re: Running Phobos unit tests in threads: I have data | ||||
---|---|---|---|---|
| ||||
Posted in reply to Rikki Cattermole | On Saturday, 3 May 2014 at 12:26:13 UTC, Rikki Cattermole wrote:
> On Saturday, 3 May 2014 at 12:24:59 UTC, Atila Neves wrote:
>>> Out of curiosity are you on Windows?
>>
>> No, Arch Linux 64-bit. I also just noticed a glaring threading bug in my code as well that somehow's never turned up. This is not a good day.
>>
>> Atila
>
> I'm surprised. Threads should be cheap on Linux. Something funky is definitely going on I bet.
Threads are never cheap.
|
May 05, 2014 Re: Running Phobos unit tests in threads: I have data | ||||
---|---|---|---|---|
| ||||
Posted in reply to Atila Neves | Going to take a wild guess, but as core.atomic.casImpl will never be inlined anywhere with DMD, due to its inline assembly, you have the cost of building and destroying a stack frame, the cost of passing the args in, moving them into registers, saving potentially trashed registers, etc. every time it even attempts to acquire a lock, and the GC uses a single global lock for just about everything. As you can imagine, I suspect this is far from optimal, and, if I remember right, GDC uses intrinsics for the atomic operations.
On 5/5/14, Atila Neves via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Sunday, 4 May 2014 at 17:01:23 UTC, safety0ff wrote:
>> On Saturday, 3 May 2014 at 22:46:03 UTC, Andrei Alexandrescu wrote:
>>> On 5/3/14, 2:42 PM, Atila Neves wrote:
>>>> gdc gave _very_ different results. I had to use different
>>>> modules
>>>> because at some point tests started failing, but with gdc the
>>>> threaded
>>>> version runs ~3x faster.
>>>>
>>>> On my own unit-threaded benchmarks, running the UTs for
>>>> Cerealed over
>>>> and over again was only slightly slower with threads than
>>>> without. With
>>>> dmd the threaded version was nearly 3x slower.
>>>
>>> Sounds like a severe bug in dmd or dependents. -- Andrei
>>
>> This reminds me of when I was parallelizing a Project Euler solution: atomic access was so much slower on DMD that it made performance worse than the single-threaded version for one stage of the program.
>>
>> I know that std.parallelism does make use of core.atomic under the hood, so this may be a factor when using DMD.
>
> Funny you should say that, a friend of mine tried porting a lock-free algorithm of his from Java to D a few weeks ago. The D version ran 3 orders of magnitude slower. Then I tried gdc and ldc on his code. ldc produced code running at around 80% of the speed of the Java version, gdc was around 30%. But dmd...
>
|
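To make the guess above concrete, here is a sketch of a lock whose every acquisition attempt goes through `core.atomic.cas`. `SpinLock`, `acquire`, `release` and `countUnderLock` are hypothetical names invented for this illustration; this is not the lock druntime's GC uses, only the general pattern in which a non-inlined `cas` would be paid for on each attempt.

```d
import core.atomic : atomicStore, cas;
import core.thread : Thread;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical spin lock, for illustration only.
struct SpinLock
{
    private bool locked;

    void acquire() shared
    {
        // Each failed attempt is one more call to cas. If the compiler
        // cannot inline it, the call overhead is paid on every spin.
        while (!cas(&locked, false, true))
            Thread.yield();
    }

    void release() shared
    {
        atomicStore(locked, false);
    }
}

// Increments a plain counter n times from the worker threads,
// taking the lock around every single increment.
long countUnderLock(int n)
{
    shared SpinLock guard;
    long total; // not atomic; only touched while holding guard

    foreach (i; parallel(iota(n)))
    {
        guard.acquire();
        scope (exit) guard.release();
        ++total;
    }
    return total;
}

void main()
{
    writeln(countUnderLock(100_000));
}
```

Since one global lock is taken for every increment, the run time of this loop is dominated by `cas`, which makes it a reasonable candidate for comparing compilers when filing a reduced bug report.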
May 05, 2014 Re: Running Phobos unit tests in threads: I have data | ||||
---|---|---|---|---|
| ||||
On 5 May 2014 19:07, Orvid King via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> Going to take a wild guess, but as core.atomic.casImpl will never be inlined anywhere with DMD, due to its inline assembly, you have the cost of building and destroying a stack frame, the cost of passing the args in, moving them into registers, saving potentially trashed registers, etc. every time it even attempts to acquire a lock, and the GC uses a single global lock for just about everything. As you can imagine, I suspect this is far from optimal, and, if I remember right, GDC uses intrinsics for the atomic operations.
>
Aye, and atomic intrinsics though they may be, it could even be improved by switching over to C++ atomic intrinsics, which map directly to core.atomic. :)
|
May 05, 2014 Re: Running Phobos unit tests in threads: I have data | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dicebot | On Monday, 5 May 2014 at 17:56:11 UTC, Dicebot wrote:
> On Saturday, 3 May 2014 at 12:26:13 UTC, Rikki Cattermole wrote:
>> On Saturday, 3 May 2014 at 12:24:59 UTC, Atila Neves wrote:
>>>> Out of curiosity are you on Windows?
>>>
>>> No, Arch Linux 64-bit. I also just noticed a glaring threading bug in my code as well that somehow's never turned up. This is not a good day.
>>>
>>> Atila
>>
>> I'm surprised. Threads should be cheap on Linux. Something funky is definitely going on I bet.
>
> Threads are never cheap.
Regarding this, I found this talk interesting: https://www.youtube.com/watch?v=KXuZi9aeGTw
|