Re: Unsynchronized int access from threads
June 18, 2020
On Thu, Jun 18, 2020 at 03:07:54PM -0400, Steven Schveighoffer via Digitalmars-d wrote: [...]
> Let's say a value is 0xffff0000
> 
> Thread that is looking for 0s reads the lower half of the number -> 0x????0000
> 
> Context switches, thread 2 writes 0x0000ffff into that memory slot.
> 
> Back at the zero-scanning thread, it reads the second half -> 0x0000????
> 
> Combines with the first part, and now it sees a 0 where there isn't one.
[...]

Hmm, very good point; I hadn't thought of that. :-/  Oh well, I'll just think of another way of parallelizing my code.  I think I can break the work up into large enough chunks that the overhead of using traditional synchronization primitives will be insignificant.

Thanks all for the replies!


T

-- 
A mathematician learns more and more about less and less, until he knows everything about nothing; whereas a philosopher learns less and less about more and more, until he knows nothing about everything.
June 18, 2020
On 6/18/20 6:15 PM, H. S. Teoh wrote:
> On Thu, Jun 18, 2020 at 03:07:54PM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> [...]
>> Let's say a value is 0xffff0000
>>
>> Thread that is looking for 0s reads the lower half of the number ->
>> 0x????0000
>>
>> Context switches, thread 2 writes 0x0000ffff into that memory slot.
>>
>> Back at the zero-scanning thread, it reads the second half ->
>> 0x0000????
>>
>> Combines with the first part, and now it sees a 0 where there isn't
>> one.
> [...]
> 
> Hmmm very good point, hadn't thought of that. :-/  Oh well, I'll just
> think of another way of parallelizing my code.  I think I can break the
> work up into large enough chunks that the overhead of using traditional
> synchronization primitives will be insignificant.

Once again the asinine response "all inter-thread communication must be done by a handshake in which both threads participate knowingly" turns out to be correct.
June 19, 2020
On 19.06.20 00:15, H. S. Teoh wrote:
> Oh well, I'll just
> think of another way of parallelizing my code.

I think Steven is just saying that your reads and writes have to be atomic for the behavior to be defined at the language level, so using core.atomic with memory order `raw` may in fact do what you want. (Or not; I am not sure what your full use case is.) I don't think the generated assembly will differ from what you had in mind, since aligned word loads and stores are atomic on x86.
June 18, 2020
On Fri, Jun 19, 2020 at 06:00:32AM +0200, Timon Gehr via Digitalmars-d wrote:
> On 19.06.20 00:15, H. S. Teoh wrote:
> > Oh well, I'll just think of another way of parallelizing my code.
> 
> I think Steven is just saying that your reads and writes have to be atomic for the behavior to be defined at the language level, so using core.atomic with memory order `raw` may in fact do what you want. (Or not; I am not sure what your full use case is.) I don't think the generated assembly will differ from what you had in mind, since aligned word loads and stores are atomic on x86.

What I meant is that I'm thinking of optimizing at a higher level, to arrange it in such a way that I don't need to lock. E.g., if scanning for zeroes can't easily multithread with updating the array, perhaps the better way to do it is to let them run sequentially, but multithread across multiple instances of the computation instead.

I did some experiments along this line and found that I can boost performance by 2x by multithreading at a higher level in the code, but this comes with significant overhead for small problem sizes. I probably need some kind of heuristic to switch between plain ole single-threaded code (for small problems, where the overhead of parallel computation outweighs the benefits; I'm seeing a 2x slowdown for smaller problem sets) and the parallel version for larger problems, where the overhead of multithreading is dwarfed by the overall performance gain.


T

-- 
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next. -- (Stolen from the net)
June 19, 2020
On 6/19/20 1:54 AM, H. S. Teoh wrote:
> What I meant is that I'm thinking of optimizing at a higher level, to
> arrange it in such a way that I don't need to lock. E.g., if scanning
> for zeroes can't easily multithread with updating the array, perhaps the
> better way to do it is to let them run sequentially, but multithread
> across multiple instances of the computation instead.

Atomic reads/writes are not as heavy as locks. Though certainly if there are better ways to organize the code, that might be more fruitful.

-Steve