[phobos] State of std.parallelism unit tests
May 28, 2011
I was wondering what the current state of the std.parallelism unit tests is. Everything is green on the autotester. And when they run properly, they pass on my system. But I find that they frequently freeze after printing out

totalCPUs = 6

(I have a 6-core Phenom II), and when they don't freeze, a segfault occurs at least some of the time. I'm running a pure 64-bit build stack (dmd, druntime, and Phobos are all 64-bit) on Linux. Is this expected at all? Should I do anything to try to track this down? I'm not actually using std.parallelism at the moment, so it doesn't really impact me beyond the irritation with the unit tests, but if there's a bug, then obviously it needs to be dealt with. Is this well-known, or is it something that I should be reporting? David, what is the situation with this?

- Jonathan M Davis
May 28, 2011
Thanks for letting me know.  I have no idea why this is happening.  I was seeing some weirdness on FreeBSD only, but I gave up trying to solve that until the FreeBSD port is more stable or I can reproduce it on some other OS.  I've run the unittests for std.parallelism on my (old) Athlon 64 X2 tens of thousands of times on Linux 64 without any issues, so unfortunately this is going to be hard to debug.  Two questions:

1.  Does it happen on the latest Git version or on the 2.053 release version?

2.  Could you try to figure out which test is failing?


May 28, 2011
On 2011-05-28 18:41, David Simcha wrote:
> Thanks for letting me know.  I have no idea why this is happening.  I was seeing some weirdness on FreeBSD only, but I gave up trying to solve that until the FreeBSD port is more stable or I can reproduce it on some other OS.  I've run the unittests for std.parallelism on my (old) Athlon 64 X2 tens of thousands of times on Linux 64 without any issues, so unfortunately this is going to be hard to debug.  Two questions:
> 
> 1.  Does it happen on the latest Git version or on the 2.053 release version?

It definitely happens on the latest Git version. I don't know if it happens on 2.053. I'll have to check.

> 2.  Could you try to figure out which test is failing?

I'll look into it. Unfortunately, when it comes to actually coming up with a fix, I'll be of minimal help at the moment, since I have yet to even really look at std.parallelism's API, let alone the code. I should be able to figure out which test is segfaulting, though.

- Jonathan M Davis
May 31, 2011
On Sat, May 28, 2011 at 6:41 PM, David Simcha <dsimcha at gmail.com> wrote:

> Thanks for letting me know.  I have no idea why this is happening.  I was seeing some weirdness on FreeBSD only, but I gave up trying to solve that until the FreeBSD port is more stable or I can reproduce it on some other OS.
>

I have a test case that reliably segfaults on 64-bit Fedora 14.  I can trim it down to a minimal failing case if you're interested.  The particular problem I observe occurs when trying to spread many fibers over more than one thread.

Regards,
-steve


June 01, 2011
On Jun 1, 2011, at 5:24 AM, David Simcha wrote:
> 
> I'm not sure this is supposed to work.  std.parallelism uses thread-local (i.e. physical thread-local) storage in a few places. I'm not even sure how threads and fibers are supposed to interact.

If you use a thread-local variable in a fiber it will be the one local to the executing thread.  So at the moment, moving an executing fiber between threads is inadvisable.  It's fine to let it finish and restart it on another thread though.
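
To make that concrete: a thread-local variable touched from inside a fiber resolves to the copy belonging to whichever thread resumes it at that moment. A minimal sketch, assuming only core.thread.Fiber and D's default thread-local module variables:

    import core.thread : Fiber;
    import std.stdio : writeln;

    int tls;  // module-level variables are thread-local in D

    void fiberFunc()
    {
        tls = 1;         // hits the TLS copy of the thread that resumed us
        Fiber.yield();
        tls = 2;         // hits the TLS copy of whichever thread resumes us next
    }

    void main()
    {
        auto fib = new Fiber(&fiberFunc);
        fib.call();      // first resume, on the main thread
        fib.call();      // second resume, also on the main thread here
        writeln(tls);    // prints 2; had the second resume happened on a
                         // different thread, main's copy would still hold 1
    }

So a fiber that migrates between threads in the middle of its execution silently switches which copy of its "thread-local" data it is using, which is why moving a running fiber is inadvisable.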

June 02, 2011
On 2011-05-28 18:41, David Simcha wrote:
> Thanks for letting me know.  I have no idea why this is happening.  I was seeing some weirdness on FreeBSD only, but I gave up trying to solve that until the FreeBSD port is more stable or I can reproduce it on some other OS.  I've run the unittests for std.parallelism on my (old) Athlon 64 X2 tens of thousands of times on Linux 64 without any issues, so unfortunately this is going to be hard to debug.  Two questions:
> 
> 1.  Does it happen on the latest Git version or on the 2.053 release version?
> 
> 2.  Could you try to figure out which test is failing?

I added statements at the beginning and end of each unit test block, and when it segfaults, I get this:

std/parallelism.d(223)
std/parallelism.d(236)
std/parallelism.d(3155)
totalCPUs = 6
std/parallelism.d(3378)

So, that's the unittest block that starts with

    poolInstance = new TaskPool(2);
    scope(exit) poolInstance.stop();

3378 is the line at the end of that unittest block, so I assume that there's either a destructor that's causing the problem or that poolInstance.stop() is causing the problem. Also, if it matters, it was running the unit tests in release mode when it died. However, I _have_ seen that test freeze while in debug mode (though it's quite possible that the program freezing and the segfault are completely separate). Also, the freeze definitely happens sometimes with 2.053, but I don't know if the segfault does. It happens infrequently enough that it's hard to tell. I expect that it does, but I haven't seen it yet. However, running dmd 2.053 (with the latest druntime and Phobos), I did get this test failure once:

core.exception.AssertError at std/parallelism.d(3244): [2, 4, 5, 6]

(which is probably 3241 if you don't have the extra print statements that I added at the beginning and end of the unittest blocks). But again, std.parallelism _usually_ succeeds, so it's kind of hard to know what's going on with the tests. I've seen them freeze after printing out totalCPUs = 6 (and before it gets to the end of that unittest block). I've seen it segfault. And I've seen that AssertError. So, it definitely has intermittent problems on my 64-bit, 6-core AMD system with a pure 64-bit stack. How much that has to do with my system or the architecture, I have no idea, but _something_ in std.parallelism still needs to be ironed out.
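
The instrumentation itself is nothing fancy, roughly a pair of prints per block along these lines:

    unittest
    {
        import std.stdio : stderr;

        // Printed on entry, e.g. std/parallelism.d(223).
        stderr.writefln("%s(%s)", __FILE__, __LINE__);

        // ... original body of the test ...

        // Printed at the end of the block; if this never shows up, this
        // is the block that hung or crashed.
        stderr.writefln("%s(%s)", __FILE__, __LINE__);
    }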

- Jonathan M Davis
June 02, 2011
Thanks.  A second look with fresher eyes revealed a subtle but serious bug in the ResubmittingTasks base mixin (which affects parallel foreach, map and amap) and a relatively minor corner-case bug in reduce().  Both were introduced after the 2.053 release as a result of over-aggressive optimizations.  I've checked in fixes for these.  I'm not sure whether they're the root cause of the issues you're seeing, though.  If it's not too much work, please try to reproduce your bugs again.
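
For anyone trying to reproduce this, here is a minimal sketch of how the affected operations (parallel foreach, map, amap, and reduce) are typically exercised; it is purely illustrative and not the code that was changed:

    import std.parallelism : taskPool, parallel;
    import std.math : sqrt;
    import std.range : iota;
    import std.array : array;

    void main()
    {
        auto nums = iota(1.0, 1_000.0).array;

        // parallel foreach over a range
        foreach (ref x; parallel(nums))
            x = sqrt(x);

        // amap: eager parallel map into a newly allocated array
        auto roots = taskPool.amap!sqrt(nums);

        // map: lazy, buffered parallel map (100 = buffer size)
        auto lazyRoots = taskPool.map!sqrt(nums, 100);

        // reduce: parallel reduction
        auto sum = taskPool.reduce!"a + b"(nums);
    }

All of these funnel work units through the pool, so a bug in the resubmission logic could plausibly surface as any of the hangs, segfaults, or wrong results reported above.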


June 02, 2011
On 2011-06-02 06:30, David Simcha wrote:
> Thanks.  A second look with fresher eyes revealed a subtle but serious bug in the ResubmittingTasks base mixin (which affects parallel foreach, map and amap) and a relatively minor corner-case bug in reduce().  Both were introduced after the 2.053 release as a result of over-aggressive optimizations.  I've checked in fixes for these.  I'm not sure whether they're the root cause of the issues you're seeing, though.  If it's not too much work, please try to reproduce your bugs again.

Well, I have yet to see the failure again. Unfortunately, that doesn't guarantee anything given the intermittent nature of the failure, but I think that there's a good chance that it's been fixed. I'll post about it if it happens again, but I think that for the moment, we can assume that it's been fixed.

- Jonathan M Davis
June 03, 2011
On 2011-06-02 23:52, Jonathan M Davis wrote:
> On 2011-06-02 06:30, David Simcha wrote:
> > Thanks.  A second look with fresher eyes revealed a subtle but serious bug in the ResubmittingTasks base mixin (which affects parallel foreach, map and amap) and a relatively minor corner-case bug in reduce().  Both were introduced after the 2.053 release as a result of over-aggressive optimizations.  I've checked in fixes for these.  I'm not sure whether they're the root cause of the issues you're seeing, though.  If it's not too much work, please try to reproduce your bugs again.
> 
> Well, I have yet to see the failure again. Unfortunately, that doesn't guarantee anything given the intermittent nature of the failure, but I think that there's a good chance that it's been fixed. I'll post about it if it happens again, but I think that for the moment, we can assume that it's been fixed.

I take that back. It just happened again. The std.parallelism tests segfaulted.

- Jonathan M Davis