June 03, 2011 [phobos] State of std.parallelism unit tests | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | Ok, I'm running the main unit test in a loop in a debugger right now trying to reproduce this thing. 64-bit Ubuntu, 64-bit DMD, building 64-bit binary, release mode. So far I've been through ~175 runs and have nothing but successes. Can you give me a rough percentage of the time that it fails for you?
One thing you could do to help is to do basically the same thing I'm doing but on your setup. Execute the main unit test in a loop (just put a loop around the whole thing) inside GDB. If/when it crashes, send me a backtrace of all the threads (thread apply all bt), the register info (info registers) and the disassembly of the function where it crashed (disass).
On 6/3/2011 3:05 AM, Jonathan M Davis wrote:
> On 2011-06-02 23:52, Jonathan M Davis wrote:
>> On 2011-06-02 06:30, David Simcha wrote:
>>> Thanks. A second look with fresher eyes revealed a subtle but serious bug in the ResubmittingTasks base mixin (which affects parallel foreach, map and amap) and a relatively minor corner-case bug in reduce(). Both were introduced after the 2.053 release as a result of over-aggressive optimizations. I've checked in fixes for these. I'm not sure whether they're the root cause of the issues you're seeing, though. If it's not too much work, please try to reproduce your bugs again.
>> Well, I have yet to see the failure again. Unfortunately, that doesn't guarantee anything given the intermittent nature of the failure, but I think that there's a good chance that it's been fixed. I'll post about it if it happens again, but I think that for the moment, we can assume that it's been fixed.
> I take that back. It just happened again. The std.parallelism tests segfaulted.
>
> - Jonathan M Davis
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
>
|
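The "put a loop around the whole thing" idea from the post above can be sketched in D roughly as follows. This is a minimal sketch, not the actual std.parallelism unit test: `stressOnce`, the array size and the run count are hypothetical choices made for illustration. The body only exercises parallel foreach, amap and reduce, which are the primitives named in this thread.

```d
import std.parallelism : parallel, taskPool;
import std.stdio : writeln;

// Hypothetical stand-in for the real unit test body (an assumption,
// not the code that lives in std.parallelism itself).
long stressOnce(size_t n)
{
    auto squares = new long[](n);

    // Parallel foreach: each iteration writes only its own element.
    foreach (i, ref s; parallel(squares))
        s = cast(long) i * cast(long) i;

    // amap: maps eagerly in parallel and returns a new array.
    auto doubled = taskPool.amap!"a * 2"(squares);
    foreach (i; 0 .. n)
        assert(doubled[i] == 2 * squares[i]);

    // reduce: parallel sum with an explicit seed.
    return taskPool.reduce!"a + b"(0L, squares);
}

void main()
{
    enum runs = 1_000; // arbitrary run count chosen for this sketch
    foreach (run; 0 .. runs)
    {
        immutable sum = stressOnce(1_000);
        // Sum of i*i for i in 0 .. 1000.
        assert(sum == 332_833_500, "unexpected sum");
    }
    writeln("completed ", runs, " runs");
}
```

Running the resulting binary under GDB means an intermittent crash stops in the debugger, where the backtrace, register dump and disassembly requested above can be collected.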
June 03, 2011 [phobos] State of std.parallelism unit tests | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | Ok, I left the unit tests running in a loop on my home computer all day while I was at work. They ran about 13,000 times without failing once. Something about your setup is enabling the failures or at least making them drastically more likely.
On 6/3/2011 3:05 AM, Jonathan M Davis wrote:
> On 2011-06-02 23:52, Jonathan M Davis wrote:
>> On 2011-06-02 06:30, David Simcha wrote:
>>> Thanks. A second look with fresher eyes revealed a subtle but serious bug in the ResubmittingTasks base mixin (which affects parallel foreach, map and amap) and a relatively minor corner-case bug in reduce(). Both were introduced after the 2.053 release as a result of over-aggressive optimizations. I've checked in fixes for these. I'm not sure whether they're the root cause of the issues you're seeing, though. If it's not too much work, please try to reproduce your bugs again.
>> Well, I have yet to see the failure again. Unfortunately, that doesn't guarantee anything given the intermittent nature of the failure, but I think that there's a good chance that it's been fixed. I'll post about it if it happens again, but I think that for the moment, we can assume that it's been fixed.
> I take that back. It just happened again. The std.parallelism tests segfaulted.
>
> - Jonathan M Davis
|
June 03, 2011 [phobos] State of std.parallelism unit tests | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | On 2011-06-03 15:24, David Simcha wrote:
> Ok, I left the unit tests running in a loop on my home computer all day while I was at work. They ran about 13,000 times without failing once. Something about your setup is enabling the failures or at least making them drastically more likely.
Well, unfortunately, this does seem to be one of those modules where a difference in CPU or architecture can have a large impact on behavior. In any case, I'll look at setting up the tests to run in a loop and see how well they do. And I guess that I'll have to spend some time debugging std.parallelism to at least figure out exactly _what_ is causing a segfault, though I question that I'll be able to actually fix the problem without spending a considerable amount of time learning how std.parallelism works.
- Jonathan M Davis
|
June 03, 2011 [phobos] State of std.parallelism unit tests | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On 6/3/2011 6:43 PM, Jonathan M Davis wrote:
> On 2011-06-03 15:24, David Simcha wrote:
>> Ok, I left the unit tests running in a loop on my home computer all day while I was at work. They ran about 13,000 times without failing once. Something about your setup is enabling the failures or at least making them drastically more likely.
> Well, unfortunately, this does seem to be one of those modules where a difference in CPU or architecture can have a large impact on behavior. In any case, I'll look at setting up the tests to run in a loop and see how well they do. And I guess that I'll have to spend some time debugging std.parallelism to at least figure out exactly _what_ is causing a segfault, though I question that I'll be able to actually fix the problem without spending a considerable amount of time learning how std.parallelism works.
I'd sincerely appreciate it if you did this. If you could tell me specifically where the problem is, there's a decent chance I could figure it out just by reading the code carefully, especially if I got a full stack trace, disassembly and registers.
|
June 03, 2011 [phobos] State of std.parallelism unit tests | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | One thought: change the code to use more threads than you have CPUs. Not as a permanent change, but to encourage more timing randomization.
On Fri, 3 Jun 2011, David Simcha wrote:
> Date: Fri, 03 Jun 2011 18:24:51 -0400
> From: David Simcha <dsimcha at gmail.com>
> Reply-To: Discuss the phobos library for D <phobos at puremagic.com>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Subject: Re: [phobos] State of std.parallelism unit tests
>
> Ok, I left the unit tests running in a loop on my home computer all day while I was at work. They ran about 13,000 times without failing once. Something about your setup is enabling the failures or at least making them drastically more likely.
>
> On 6/3/2011 3:05 AM, Jonathan M Davis wrote:
> > On 2011-06-02 23:52, Jonathan M Davis wrote:
> > > On 2011-06-02 06:30, David Simcha wrote:
> > > > Thanks. A second look with fresher eyes revealed a subtle but serious bug in the ResubmittingTasks base mixin (which affects parallel foreach, map and amap) and a relatively minor corner-case bug in reduce(). Both were introduced after the 2.053 release as a result of over-aggressive optimizations. I've checked in fixes for these. I'm not sure whether they're the root cause of the issues you're seeing, though. If it's not too much work, please try to reproduce your bugs again.
> > > Well, I have yet to see the failure again. Unfortunately, that doesn't guarantee anything given the intermittent nature of the failure, but I think that there's a good chance that it's been fixed. I'll post about it if it happens again, but I think that for the moment, we can assume that it's been fixed.
> > I take that back. It just happened again. The std.parallelism tests segfaulted.
> >
> > - Jonathan M Davis
|
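One way to try the oversubscription suggestion above without touching the global pool is a temporary `TaskPool` with more workers than cores. This is a minimal sketch under assumptions: `sumWithOversubscription` and its `factor` parameter are hypothetical names, and the workload is a placeholder rather than the real unit test.

```d
import std.parallelism : TaskPool, totalCPUs;
import std.stdio : writeln;

// Hypothetical helper: runs a small workload on a pool that has
// `factor` times as many worker threads as there are CPUs.
long sumWithOversubscription(size_t factor, size_t n)
{
    auto pool = new TaskPool(totalCPUs * factor);
    // Let the workers terminate once the queue is empty.
    scope (exit) pool.finish();

    auto data = new long[](n);

    // A work unit size of 1 creates many small work units, which
    // encourages more interleaving between the worker threads.
    foreach (i, ref x; pool.parallel(data, 1))
        x = cast(long) i;

    return pool.reduce!"a + b"(0L, data);
}

void main()
{
    // Sum of 0 .. 10_000 is 49_995_000.
    writeln(sumWithOversubscription(4, 10_000));
}
```

For code that uses the global `taskPool`, setting `defaultPoolThreads` before the pool is first used should have a similar effect; a separate pool keeps the experiment temporary, as the suggestion intended.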
June 03, 2011 [phobos] State of std.parallelism unit tests | ||||
---|---|---|---|---|
| ||||
Posted in reply to Brad Roberts | Tried it. I'll keep it running for a while but so far I'm at 300 iterations.
On 6/3/2011 7:08 PM, Brad Roberts wrote:
> One thought: change the code to use more threads than you have CPUs. Not as a permanent change, but to encourage more timing randomization.
>
> On Fri, 3 Jun 2011, David Simcha wrote:
>
>> Date: Fri, 03 Jun 2011 18:24:51 -0400
>> From: David Simcha <dsimcha at gmail.com>
>> Reply-To: Discuss the phobos library for D <phobos at puremagic.com>
>> To: Discuss the phobos library for D <phobos at puremagic.com>
>> Subject: Re: [phobos] State of std.parallelism unit tests
>>
>> Ok, I left the unit tests running in a loop on my home computer all day while I was at work. They ran about 13,000 times without failing once. Something about your setup is enabling the failures or at least making them drastically more likely.
>>
>> On 6/3/2011 3:05 AM, Jonathan M Davis wrote:
>>> On 2011-06-02 23:52, Jonathan M Davis wrote:
>>>> On 2011-06-02 06:30, David Simcha wrote:
>>>>> Thanks. A second look with fresher eyes revealed a subtle but serious bug in the ResubmittingTasks base mixin (which affects parallel foreach, map and amap) and a relatively minor corner-case bug in reduce(). Both were introduced after the 2.053 release as a result of over-aggressive optimizations. I've checked in fixes for these. I'm not sure whether they're the root cause of the issues you're seeing, though. If it's not too much work, please try to reproduce your bugs again.
>>>> Well, I have yet to see the failure again. Unfortunately, that doesn't guarantee anything given the intermittent nature of the failure, but I think that there's a good chance that it's been fixed. I'll post about it if it happens again, but I think that for the moment, we can assume that it's been fixed.
>>> I take that back. It just happened again. The std.parallelism tests segfaulted.
>>>
>>> - Jonathan M Davis
|
June 07, 2011 [phobos] State of std.parallelism unit tests | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | On Fri, Jun 3, 2011 at 4:30 PM, David Simcha <dsimcha at gmail.com> wrote:
> Tried it. I'll keep it running for a while but so far I'm at 300 iterations.
I have a test (~100 loc) that fails immediately on 64-bit Linux without resorting to fiber migration. Using the FiberFixes branch allows it to pass. If you're interested I'd be happy to provide it.
Also, I asked before but never got an answer: Is there any plan to merge FiberFixes with the trunk?
Thanks,
-steve
|
June 08, 2011 [phobos] State of std.parallelism unit tests | ||||
---|---|---|---|---|
| ||||
Posted in reply to SK | On 6/7/2011 11:04 PM, SK wrote:
> I have a test (~100 loc) that fails immediately on 64-bit Linux without resorting to fiber migration. Using the FiberFixes branch allows it to pass. If you're interested I'd be happy to provide. Also, I asked before but never got an answer: Is there any plan to merge FiberFixes with the trunk?
Yes, please provide. I know there is still an outstanding issue with std.parallelism, but I can't reproduce it so there's not much I can do about it. I'll take any help I can get.
Thank you,
Dave
|
June 08, 2011 [phobos] State of std.parallelism unit tests | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | On Wed, Jun 8, 2011 at 5:54 AM, David Simcha <dsimcha at gmail.com> wrote:
> On 6/7/2011 11:04 PM, SK wrote:
>> I have a test (~100 loc) that fails immediately on 64-bit Linux without resorting to fiber migration. Using the FiberFixes branch allows it to pass. If you're interested I'd be happy to provide.
>> Also, I asked before but never got an answer: Is there any plan to merge FiberFixes with the trunk?
> Yes, please provide. I know there is still an outstanding issue with std.parallelism, but I can't reproduce it so there's not much I can do about it. I'll take any help I can get.
This test launches 10 threads that each launch 100 fibers that each yield 1000 times. I compile with /usr/bin/dmd -w -wi -gc.
HTH,
-steve
//import std.stdio;
import core.thread;
import std.stdio;
import std.exception;
shared uint join_done = 0;
version( Windows ) { import core.sys.windows.windows; }
class fiber_worker_t
{
    this( uint yield_count )
    {
        m_done = false;
        m_yield_count = yield_count;
        m_fiber = new Fiber( &func );
    }
    bool call()
    {
        m_fiber.call();
        return is_term();
    }
    bool is_term() { return( m_fiber.state() == m_fiber.State.TERM ); }
    bool is_done() { return( m_done ); }
protected:
    uint m_yield_count;
    Fiber m_fiber;
    bool m_done;
    void func()
    {
        uint i = m_yield_count;
        while( --i )
            m_fiber.yield();
        m_done = true;
    }
}
class thread_worker_t
{
    this( uint fiber_worker_count, uint fiber_yield_count )
    {
        m_fiber_worker_count = fiber_worker_count;
        m_fiber_yield_count = fiber_yield_count;
        m_thread = new Thread( &func );
        /*
        // I moved this to the thread itself
        m_fib_array = new fiber_worker_t[fiber_worker_count];
        foreach( ref f; m_fib_array )
            f = new fiber_worker_t(fiber_yield_count);
        */
    }
    void start()
    {
        m_thread.start();
    }
protected:
    uint m_fiber_worker_count;
    uint m_fiber_yield_count;
    Thread m_thread;
    // func() executes in each thread's context
    void func()
    {
        fiber_worker_t[] m_fib_array = new fiber_worker_t[m_fiber_worker_count];
        foreach( ref f; m_fib_array )
            f = new fiber_worker_t(m_fiber_yield_count);
        // fibers are cooperative and need a driver loop
        bool done;
        do
        {
            done = true;
            foreach( f; m_fib_array )
            {
                done &= f.call();
                // writeln( &this, " ", f, " ", &f );
            }
        } while( !done );
        // verify that fibers are really done
        foreach( f; m_fib_array )
            enforce( f.is_done() );
    }
}
void thread_fiber_test( const uint thread_count, const uint fiber_count, const uint fiber_yield_count )
{
    thread_worker_t[] thread_worker_array = new thread_worker_t[thread_count];
    foreach( ref t; thread_worker_array )
        t = new thread_worker_t(fiber_count, fiber_yield_count);
    foreach( t; thread_worker_array ) t.start();
    thread_joinAll();
    join_done = 1;
}
int main()
{
    const uint thread_count = 10;
    const uint fiber_count = 100;
    const uint fiber_yield_count = 1000;
    thread_fiber_test( thread_count, fiber_count, fiber_yield_count );
    return 0;
}
|
June 08, 2011 [phobos] State of std.parallelism unit tests | ||||
---|---|---|---|---|
| ||||
Posted in reply to SK | ??? This doesn't appear to use std.parallelism anywhere.
On Wed, Jun 8, 2011 at 10:12 AM, SK <sk at metrokings.com> wrote:
> This test launches 10 threads that each launch 100 fibers that each yield 1000 times. I compile with /usr/bin/dmd -w -wi -gc.
> HTH,
> -steve
|