May 31, 2021

On Monday, 31 May 2021 at 21:23:17 UTC, Max Haughton wrote:

>

This is orthogonal to the example I posted: if the hardware can't perform the operation using simple atomic instructions, you might as well provide the fallback case anyway, both for easier correctness and to kill two birds with one API. Guaranteeing that the type uses the instructions when it can is up to the implementation, but the guarantee can be made nonetheless.

I am not sure I understand what you mean now. Locking operations may imply completely different algorithms.

In C++ you can either do a static compile-time check using is_always_lock_free or a dynamic runtime check with is_lock_free() (then take an alternative path if it isn't lock-free). The dynamic check allows higher performance when it can be used, but that might require a completely different algorithm?

Or with C++20 you have optional atomic_signed_lock_free and atomic_unsigned_lock_free, which I probably will use when I get them.

June 01, 2021
On Sunday, 30 May 2021 at 20:58:56 UTC, IGotD- wrote:
> Definitely, the D atomic library is cumbersome to use. C++ std::atomic supports operator overloading for example.
>
> atomicVar += 1;
>
> will create an atomic add as atomicVar is of the atomic type. D doesn't have this and I think D should add atomic types like std::atomic<T>.

That was a design choice.  It's because of this:

> I like this because then I can easily switch between atomic operations and normal operations by just changing the type and very few changes.

The trouble is that only works in a handful of simple cases (e.g., you just want a simple event counter that doesn't affect flow of control).  For anything else, you need to think carefully about exactly where the atomic operations are, so there's no point making them implicit.
June 02, 2021
On 31/05/2021 23:33, Ola Fosheim Grøstad wrote:
> On Monday, 31 May 2021 at 16:34:35 UTC, Guillaume Piolat wrote:
>> On Monday, 31 May 2021 at 09:26:36 UTC, rm wrote:
>>>
>>> I don't consider this a problem. In this case you have a load and a store. This is a non-atomic RMW. On the other hand, you do get sequential consistency synchronization from this process.
>>
>> I prefer atomicLoad and atomicStore then, because it's explicit and it's useless to hide the fact it's atomic behind nice syntax.
> 
> Yes, how often do people use this anyway? I try to avoid concurrency issues and have found that I tend to end up using compare-exchange when I have to.
> 

It's useful if you want to implement known concurrency algorithms with SC semantics, such as Lamport's lock (which requires SC).

http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-2-of-2

It's there to nudge people away from using the weaker semantics and allow easy synchronization.
June 02, 2021
On 01/06/2021 5:50, sarn wrote:
> On Sunday, 30 May 2021 at 20:58:56 UTC, IGotD- wrote:
>> Definitely, the D atomic library is cumbersome to use. C++ std::atomic supports operator overloading for example.
>>
>> atomicVar += 1;
>>
>> will create an atomic add as atomicVar is of the atomic type. D doesn't have this and I think D should add atomic types like std::atomic<T>.
> 
> That was a design choice.  It's because of this:
> 
>> I like this because then I can easily switch between atomic operations and normal operations by just changing the type and very few changes.
> 
> The trouble is that only works in a handful of simple cases (e.g., you just want a simple event counter that doesn't affect flow of control).  For anything else, you need to think carefully about exactly where the atomic operations are, so there's no point making them implicit.

I agree about that. One shouldn't simply access the same memory location atomically and non-atomically interchangeably. That is a source of many bugs, especially considering the kind of synchronization you'll have, or not have, as a result.

Still, there are cases where you *know* that your thread is the *only one* that can access this variable. In a case like this, only after you have made sure to synchronize can you also allow non-atomic access to the variable (though I'd still avoid this).
Alternatively, the other case is going from non-atomic to atomic: after initializing the location with an allocator in a non-atomic manner, you move to using it atomically to synchronize between threads.

But regarding the design choice: if your intention is to prevent casting the atomic to non-atomic, you can simply wrap it in a struct and not allow access to the raw value. That should be sufficient.

Anyway, I disagree about the simple cases, because specifically for a simple event counter that isn't required for synchronization, you should be using relaxed. There is no need for sequential consistency in this case.
June 02, 2021
On Wednesday, 2 June 2021 at 14:08:32 UTC, rm wrote:
> It's useful if you want to implement known concurrency algorithms with SC semantics. Such as Lamport's lock (which requires SC).

Have you ever used Lamport's Bakery, though?

Atomic inc/dec are obviously useful, but usually you want to know what the value was before/after the operation, so fetch_add/compare_exchange are easier to deal with IMO.




June 02, 2021
On 02/06/2021 17:59, Ola Fosheim Grøstad wrote:
> On Wednesday, 2 June 2021 at 14:08:32 UTC, rm wrote:
>> It's useful if you want to implement known concurrency algorithms with SC semantics. Such as Lamport's lock (which requires SC).
> 
> Have you ever used Lamport's Bakery, though?

Not Lamport's Bakery, but I did implement some primitives. betterC does limit the options for working with Phobos.

For the other cases, I do start with explicit syntax, since I start with strong accesses and try to relax them as I progress. But that's mostly because I want to try to use the weaker memory semantics.

> Atomic inc/dec are obviously useful, but usually you want to know what the value was before/after the operation, so fetch_add/compare_exchange are easier to deal with IMO.

What's wrong with this?
```D
Atomic!int x = 5;
int a = x++; // a = 5 (the value before the increment)
```
https://github.com/rymrg/drm/blob/9db88fb468e2b8babdf9bde488d28d733aea638f/atomic.d#L95

inc/dec are implemented in terms of fetch_add.
June 02, 2021
On Wednesday, 2 June 2021 at 15:09:54 UTC, rm wrote:
> inc/dec are implemented in terms of fetch_add.

IIRC some architectures provide more efficient inc/dec atomics without fetch? I haven't looked at that in years, so I have no idea what the contemporary situation is.




June 02, 2021
On Wednesday, 2 June 2021 at 15:19:59 UTC, Ola Fosheim Grøstad wrote:
> On Wednesday, 2 June 2021 at 15:09:54 UTC, rm wrote:
>> inc/dec are implemented in terms of fetch_add.
>
> IIRC some architectures provide more efficient inc/dec atomics without fetch? I haven't looked at that in years, so I have no idea what the contemporary situation is.

No, I think that was wrong; I think they usually return the original value (or set a flag or whatever). But it doesn't matter. We should just look at what the common contemporary processors provide and at their instruction throughput per clock cycle. I guess the last generation of ARM/Intel/AMD is sufficient?

June 02, 2021
On Wednesday, 2 June 2021 at 15:30:46 UTC, Ola Fosheim Grøstad wrote:
> On Wednesday, 2 June 2021 at 15:19:59 UTC, Ola Fosheim Grøstad wrote:
>> On Wednesday, 2 June 2021 at 15:09:54 UTC, rm wrote:
>>> inc/dec are implemented in terms of fetch_add.
>>
>> IIRC some architectures provide more efficient inc/dec atomics without fetch? I haven't looked at that in years, so I have no idea what the contemporary situation is.
>
> No, I think that was wrong, I think they usually return the original value (or set a flag or whatever). But it doesn't matter. We should just look at what the common contemporary processors provide and look at instructions per clock cycles throughput. I guess last generation ARM/Intel/AMD is sufficient?

Are they always fixed latency? No dependence on the load-store queue state (etc.), for example?
June 02, 2021
On 02/06/2021 20:33, Max Haughton wrote:
> On Wednesday, 2 June 2021 at 15:30:46 UTC, Ola Fosheim Grøstad wrote:
>> On Wednesday, 2 June 2021 at 15:19:59 UTC, Ola Fosheim Grøstad wrote:
>>> On Wednesday, 2 June 2021 at 15:09:54 UTC, rm wrote:
>>>> inc/dec are implemented in terms of fetch_add.
>>>
>>> IIRC some architectures provide more efficient inc/dec atomics without fetch? I haven't looked at that in years, so I have no idea what the contemporary situation is.
>>
>> No, I think that was wrong, I think they usually return the original value (or set a flag or whatever). But it doesn't matter. We should just look at what the common contemporary processors provide and look at instructions per clock cycles throughput. I guess last generation ARM/Intel/AMD is sufficient?
> 
> Are they always fixed latency? No dependence on the load store queue state (etc.)  for example?

At least under x86-TSO, an atomic (locked) operation forces the store buffer to be drained, so all earlier stores become globally visible before it completes.