core.stdc.stdatomic (page 4)

On 16/11/2023 5:41 AM, Imperatorn wrote: > What's wrong with using the gc? Nothing. In fact it is one of if not the best way to implement a lock-free concurrent data structure without segfaults. It is of particular interest because it turns a nearly impossible problem into a hard problem that can be solved in a reasonable time frame.

On Thursday, 16 November 2023 at 04:25:52 UTC, Richard (Rikki) Andrew Cattermole wrote: >> >> But GC or not doesn't answer the main question, what LF algorithm depends on a a sequence of instructions being done immediately after a CAS? How do you ever enforce that on x86? > > That's the fun part, you can't enforce it on any ISA. I know, that's the point, you have zero timing guarantees since your instructions can be interrupted at any point. So any algorithm relying on "timing" is doomed to fall". IE, it makes no difference if the CAS is inline or wrapped in a function call. LF algorithms rely on a sequence of operations being done in a specific order, and that order being coherent across cores/threads.

On 17/11/2023 11:50 AM, claptrap wrote: > LF algorithms rely on a sequence of operations being done in a specific order, and that order being coherent across cores/threads. Yes but there is a condition on this: 1. Each operation must be atomic, or: 2. Operate on an atomically synchronized memory But most importantly: 3. Must be predictable When you don't inline you get additional steps added that may not hold this condition. Where it can be not atomic and not operating on non-atomically synchronized memory. This is why it matters that it doesn't inline. It adds variability between the steps so that it isn't doing what you think the algorithm is doing. It introduces unpredictability.

On Friday, 17 November 2023 at 04:12:32 UTC, Richard (Rikki) Andrew Cattermole wrote: > On 17/11/2023 11:50 AM, claptrap wrote: >> LF algorithms rely on a sequence of operations being done in a specific order, and that order being coherent across cores/threads. > > Yes but there is a condition on this: > > 1. Each operation must be atomic, or: > 2. Operate on an atomically synchronized memory > > But most importantly: > 3. Must be predictable > > When you don't inline you get additional steps added that may not hold this condition. Where it can be not atomic and not operating on non-atomically synchronized memory. > > This is why it matters that it doesn't inline. It adds variability between the steps so that it isn't doing what you think the algorithm is doing. It introduces unpredictability. when you rely on more than one operation being atomic you are already on the loosing team. Just because two operations are right next to each other in the machine code, it does not mean the will be executed right after each other. Another thread or processor might invalidate the condition you established with the first instruction that the other is relying on. Therefore your algorithm is only correct if you are not relying on predictable execution order.

November 17, 2023

Re: core.stdc.stdatomic

Posted by claptrap
in reply to Richard (Rikki) Andrew Cattermole

Permalink

claptrap

Posted in reply to Richard (Rikki) Andrew Cattermole

Permalink

On Friday, 17 November 2023 at 04:12:32 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 17/11/2023 11:50 AM, claptrap wrote:
>> LF algorithms rely on a sequence of operations being done in a specific order, and that order being coherent across cores/threads.
>
> Yes but there is a condition on this:
>
> 1. Each operation must be atomic, or:
> 2. Operate on an atomically synchronized memory

That's a given, I mean the whole point is any variable sharing across threads needs to be done atomically.

> But most importantly:
> 3. Must be predictable
>
> When you don't inline you get additional steps added that may not hold this condition. Where it can be not atomic and not operating on non-atomically synchronized memory.

CAS operates on a specific memory location and that address will be passed into the function, there's no way for the function to change this to break alignment and hence break the atomicity. In fact on x86 alignment doesn't matter anyway if you have a lock prefix. On Arm it does IIRC. But arm also has less strict memory ordering.

What instructions the compiler inserts around the CAS instruction are irrelevant, none of what they do can break the CAS. The only thing that the function has that can be used to barf things up is the memory location, everything else it has is thread local. So the only way the it can break the CAS is by reading or writing to the memory location in a non atomic manner. It literally has no instructions to do so unless the programmer tells it to. I mean if the compiler emits instructions to read or write to a pointer without the programmer instructing it to do so, your compiler is broken and you'll be getting segfaults everywhere anyway.

On Friday, 17 November 2023 at 10:25:31 UTC, claptrap wrote: > > What instructions the compiler inserts around the CAS instruction are irrelevant, none of what they do can break the CAS. It is not irrelevant. You cannot break the CAS itself as it is usually implemented according to the SW ABI of the architecture. However, you must instruct the compiler so that reordering optimizations don't spill over the CAS implementations. If optimizations make instructions spill over to the wrong side of the CAS, then you possibly will end up in a non working algorithm. So you not only perhaps need to insert a memory barrier instruction depending on ISA, you also need to instruct the compiler not to trespass the atomic instructions. This is usually implicit as atomic operations are implemented as intrinsics which automatically do this for you.

On Friday, 17 November 2023 at 12:41:39 UTC, IGotD- wrote: > On Friday, 17 November 2023 at 10:25:31 UTC, claptrap wrote: >> >> What instructions the compiler inserts around the CAS instruction are irrelevant, none of what they do can break the CAS. > > It is not irrelevant. You cannot break the CAS itself as it is usually implemented according to the SW ABI of the architecture. However, you must instruct the compiler so that reordering optimizations don't spill over the CAS implementations. If optimizations make instructions spill over to the wrong side of the CAS, then you possibly will end up in a non working algorithm. You're missing the point. If your compiler is reordering instructions around a CAS it's broken. If it is doing that then its a problem whether or not you wrapped the CAS in a function call or not. And not only that but the instructions for the function are all thread local. The only shared thing the function has is the address of the atomic variable in memory. Theres nothing in that situation the compiler can reorder that would break multithreaded code that would not also break single thraded code. If your CAS intrinsic or instruction needs a fence, it needs it whether inline or wrapped in a function. Wrapping it in a function call doesnt somehow cause ordering issues.

Forums