May 28, 2011 [phobos] State of std.parallelism unit tests
I was wondering what the current state of the std.parallelism unit tests is. Everything is green on the autotester. And when they run properly, they pass on my system. But I find that they frequently freeze after printing out

totalCPUs = 6

(I have a 6-core Phenom II), and when they don't freeze, a segfault occurs at least some of the time. I'm running a pure 64-bit build stack (dmd, druntime, and Phobos are all 64-bit) on Linux. Is this expected at all? Should I do anything to try and track this down? I'm not actually using std.parallelism at the moment, so it doesn't really impact me beyond the irritation with the unit tests, but if there's a bug, then obviously it needs to be dealt with. Is this well-known, or is it something that I should be reporting? David, what is the situation with this?

- Jonathan M Davis
May 28, 2011 [phobos] State of std.parallelism unit tests
Posted in reply to Jonathan M Davis | Thanks for letting me know. I have no idea why this is happening. I was seeing some weirdness on FreeBSD only, but I gave up trying to solve that until the FreeBSD port is more stable or I can reproduce it on some other OS. I've run the unittests for std.parallelism on my (old) Athlon 64 X2 tens of thousands of times on Linux 64 without any issues, so unfortunately this is going to be hard to debug. Two questions:

1. Does it happen on the latest Git version or on the 2.053 release version?

2. Could you try to figure out which test is failing?

On Sat, May 28, 2011 at 9:34 PM, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> I was wondering what the current state of the std.parallelism unit tests is. Everything is green on the autotester. And when they run properly, they pass on my system. But I find that they frequently freeze after printing out
>
> totalCPUs = 6
>
> (I have a 6-core Phenom II), and when they don't freeze, a segfault occurs at least some of the time. I'm running a pure 64-bit build stack (dmd, druntime, and Phobos are all 64-bit) on Linux. Is this expected at all? Should I do anything to try and track this down? I'm not actually using std.parallelism at the moment, so it doesn't really impact me beyond the irritation with the unit tests, but if there's a bug, then obviously it needs to be dealt with. Is this well-known, or is it something that I should be reporting? David, what is the situation with this?
>
> - Jonathan M Davis
May 28, 2011 [phobos] State of std.parallelism unit tests
Posted in reply to David Simcha | On 2011-05-28 18:41, David Simcha wrote:
> Thanks for letting me know. I have no idea why this is happening. I was seeing some weirdness on FreeBSD only, but I gave up trying to solve that until the FreeBSD port is more stable or I can reproduce it on some other OS. I've run the unittests for std.parallelism on my (old) Athlon 64 X2 tens of thousands of times on Linux 64 without any issues, so unfortunately this is going to be hard to debug. Two questions:
>
> 1. Does it happen on the latest Git version or on the 2.053 release version?

It definitely happens on the latest Git version. I don't know if it happens on 2.053. I'll have to check.

> 2. Could you try to figure out which test is failing?

I'll look into it. Unfortunately, as far as actually coming up with a fix goes, I'll be of minimal help at the moment, since I have yet to even really look at std.parallelism's API, let alone the code. I should be able to figure out which test is segfaulting though.

- Jonathan M Davis
May 31, 2011 [phobos] State of std.parallelism unit tests
Posted in reply to David Simcha | On Sat, May 28, 2011 at 6:41 PM, David Simcha <dsimcha at gmail.com> wrote:
> Thanks for letting me know. I have no idea why this is happening. I was seeing some weirdness on FreeBSD only, but I gave up trying to solve that until the FreeBSD port is more stable or I can reproduce it on some other OS.

I have a test case that reliably segfaults on 64-bit Fedora 14. I can trim it down to the minimal failing case if you're interested. The particular problem I observe occurs when trying to spread many fibers over more than one thread.

Regards,
-steve
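A minimal sketch of the pattern Steve describes here — fibers resumed from std.parallelism worker threads — might look something like the following. This is a hypothetical reduction for illustration only, not his actual test case:

    import core.thread : Fiber;
    import std.parallelism : TaskPool;

    void main()
    {
        auto pool = new TaskPool(2);
        scope(exit) pool.stop();

        // A batch of fibers; each yields once, so it has to be resumed a second time.
        auto fibers = new Fiber[](100);
        foreach (ref f; fibers)
            f = new Fiber({ Fiber.yield(); });

        // First resume: runs on whichever worker thread picks up each element.
        foreach (f; pool.parallel(fibers))
            f.call();

        // Second resume: may well happen on a different worker thread, which is
        // the "many fibers spread over more than one thread" pattern in question.
        foreach (f; pool.parallel(fibers))
            f.call();
    }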
June 01, 2011 [phobos] State of std.parallelism unit tests
Posted in reply to SK | I'm not sure this is supposed to work. std.parallelism uses thread-local (i.e. physical thread-local) storage in a few places. I'm not even sure how threads and fibers are supposed to interact.
June 01, 2011 [phobos] State of std.parallelism unit tests
Posted in reply to David Simcha | On Jun 1, 2011, at 5:24 AM, David Simcha wrote:
>
> I'm not sure this is supposed to work. std.parallelism uses thread-local (i.e. physical thread-local) storage in a few places. I'm not even sure how threads and fibers are supposed to interact.
If you use a thread-local variable in a fiber it will be the one local to the executing thread. So at the moment, moving an executing fiber between threads is inadvisable. It's fine to let it finish and restart it on another thread though.
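A small illustration of that thread-local point, relying only on the fact that module-level variables in D are thread-local by default:

    import core.thread : Fiber;
    import std.stdio : writeln;

    int perThread;  // module-level variables in D are thread-local by default

    void main()
    {
        perThread = 42;

        auto fib = new Fiber({
            // Reads the copy of perThread belonging to whichever thread is
            // currently executing the fiber -- here, the main thread, so 42.
            // If the same fiber were later resumed from a worker thread, it
            // would see that thread's (default-initialized) copy instead.
            writeln(perThread);
        });

        fib.call();
    }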
June 02, 2011 [phobos] State of std.parallelism unit tests
Posted in reply to David Simcha | On 2011-05-28 18:41, David Simcha wrote:
> Thanks for letting me know. I have no idea why this is happening. I was seeing some weirdness on FreeBSD only, but I gave up trying to solve that until the FreeBSD port is more stable or I can reproduce it on some other OS. I've run the unittests for std.parallelism on my (old) Athlon 64 X2 tens of thousands of times on Linux 64 without any issues, so unfortunately this is going to be hard to debug. Two questions:
>
> 1. Does it happen on the latest Git version or on the 2.053 release version?
>
> 2. Could you try to figure out which test is failing?
I added statements at the beginning and end of each unit test block, and when it segfaults, I get this:
std/parallelism.d(223)
std/parallelism.d(236)
std/parallelism.d(3155)
totalCPUs = 6
std/parallelism.d(3378)
So, that's the unittest block that starts with
poolInstance = new TaskPool(2);
scope(exit) poolInstance.stop();
3378 is the line at the end of that unittest block, so I assume that there's either a destructor that's causing the problem or that poolInstance.stop() is causing the problem. Also, if it matters, it was running the unit tests in release mode when it died. However, I _have_ seen that test freeze while in debug mode (though it's quite possible that the program freezing and the segfault are completely separate). Also, the freeze definitely happens sometimes with 2.053, but I don't know if the segfault does. It happens infrequently enough that it's hard to tell. I expect that it does, but I haven't seen it yet. However, running dmd 2.053 (with the latest druntime and Phobos), I did get this test failure once:
core.exception.AssertError@std/parallelism.d(3244): [2, 4, 5, 6]
(which is probably 3241 if you don't have the extra print statements that I added at the beginning and end of the unittest blocks). But again, std.parallelism _usually_ succeeds, so it's kind of hard to know what's going on with the tests. I've seen them freeze after printing out totalCPUs = 6 (and before it gets to the end of that unittest block). I've seen it segfault. And I've seen that AssertError. So, it definitely has intermittent problems on my 64-bit, 6-core AMD system with a pure 64-bit stack. How much that has to do with my system or the architecture, I have no idea, but _something_ in std.parallelism still needs to be ironed out.
- Jonathan M Davis
June 02, 2011 [phobos] State of std.parallelism unit tests
Posted in reply to Jonathan M Davis | Thanks. A second look with fresher eyes revealed a subtle but serious bug in the ResubmittingTasks base mixin (which affects parallel foreach, map and amap) and a relatively minor corner-case bug in reduce(). Both were introduced after the 2.053 release as a result of over-aggressive optimizations. I've checked in fixes for these. I'm not sure whether they're the root cause of the issues you're seeing, though. If it's not too much work, please try to reproduce your bugs again.
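For context, a rough usage sketch of the operations those fixes touch — parallel foreach, map/amap, and reduce — assuming nothing about the actual unit tests involved:

    import std.parallelism : taskPool;
    import std.range : iota;

    void main()
    {
        // amap: eager parallel map over a random-access range.
        auto squares = taskPool.amap!"a * a"(iota(1_000));

        // reduce: parallel reduction.
        auto total = taskPool.reduce!"a + b"(iota(1_000));

        // parallel foreach over an array, writing through ref elements.
        auto buf = new int[1_000];
        foreach (i, ref x; taskPool.parallel(buf))
            x = cast(int) (i * 2);
    }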
On 6/2/2011 3:10 AM, Jonathan M Davis wrote:
> On 2011-05-28 18:41, David Simcha wrote:
>> Thanks for letting me know. I have no idea why this is happening. I was seeing some weirdness on FreeBSD only, but I gave up trying to solve that until the FreeBSD port is more stable or I can reproduce it on some other OS. I've run the unittests for std.parallelism on my (old) Athlon 64 X2 tens of thousands of times on Linux 64 without any issues, so unfortunately this is going to be hard to debug. Two questions:
>>
>> 1. Does it happen on the latest Git version or on the 2.053 release version?
>>
>> 2. Could you try to figure out which test is failing?
> I added statements at the beginning and end of each unit test block, and when it segfaults, I get this:
>
> std/parallelism.d(223)
> std/parallelism.d(236)
> std/parallelism.d(3155)
> totalCPUs = 6
> std/parallelism.d(3378)
>
> So, that's the unittest block that starts with
>
> poolInstance = new TaskPool(2);
> scope(exit) poolInstance.stop();
>
> 3378 is the line at the end of that unittest block, so I assume that there's either a destructor that's causing the problem or that poolInstance.stop() is causing the problem. Also, if it matters, it was running the unit tests in release mode when it died. However, I _have_ seen that test freeze while in debug mode (though it's quite possible that the program freezing and the segfault are completely separate). Also, the freeze definitely happens sometimes with 2.053, but I don't know if the segfault does. It happens infrequently enough that it's hard to tell. I expect that it does, but I haven't seen it yet. However, running dmd 2.053 (with the latest druntime and Phobos), I did get this test failure once:
>
> core.exception.AssertError@std/parallelism.d(3244): [2, 4, 5, 6]
>
> (which is probably 3241 if you don't have the extra print statements that I added at the beginning and end of the unittest blocks). But again, std.parallelism _usually_ succeeds, so it's kind of hard to know what's going on with the tests. I've seen them freeze after printing out totalCPUs = 6 (and before it gets to the end of that unittest block). I've seen it segfault. And I've seen that AssertError. So, it definitely has intermittent problems on my 64-bit, 6-core AMD system with a pure 64-bit stack. How much that has to do with my system or the architecture, I have no idea, but _something_ in std.parallelism still needs to be ironed out.
>
> - Jonathan M Davis
June 02, 2011 [phobos] State of std.parallelism unit tests
Posted in reply to David Simcha | On 2011-06-02 06:30, David Simcha wrote:
> Thanks. A second look with fresher eyes revealed a subtle but serious bug in the ResubmittingTasks base mixin (which affects parallel foreach, map and amap) and a relatively minor corner-case bug in reduce(). Both were introduced after the 2.053 release as a result of over-aggressive optimizations. I've checked in fixes for these. I'm not sure whether they're the root cause of the issues you're seeing, though. If it's not too much work, please try to reproduce your bugs again.
Well, I have yet to see the failure again. Unfortunately, that doesn't guarantee anything given the intermittent nature of the failure, but I think that there's a good chance that it's been fixed. I'll post about it if it happens again, but I think that for the moment, we can assume that it's been fixed.
- Jonathan M Davis
June 03, 2011 [phobos] State of std.parallelism unit tests
Posted in reply to Jonathan M Davis | On 2011-06-02 23:52, Jonathan M Davis wrote:
> On 2011-06-02 06:30, David Simcha wrote:
> > Thanks. A second look with fresher eyes revealed a subtle but serious bug in the ResubmittingTasks base mixin (which affects parallel foreach, map and amap) and a relatively minor corner-case bug in reduce(). Both were introduced after the 2.053 release as a result of over-aggressive optimizations. I've checked in fixes for these. I'm not sure whether they're the root cause of the issues you're seeing, though. If it's not too much work, please try to reproduce your bugs again.
>
> Well, I have yet to see the failure again. Unfortunately, that doesn't guarantee anything given the intermittent nature of the failure, but I think that there's a good chance that it's been fixed. I'll post about it if it happens again, but I think that for the moment, we can assume that it's been fixed.
I take that back. It just happened again. The std.parallelism tests segfaulted.
- Jonathan M Davis