Thread overview
TLS and x86-64 fs:
Cecil Ward (Jun 28, 2023)
Cecil Ward (Jun 28, 2023)
IGotD- (Jun 29, 2023)
Cecil Ward (Jun 29, 2023)
kinke (Jul 04, 2023)
June 28, 2023
It seems that on x86-64 Linux (which I think is what godbolt.org uses; is that correct?), LDC emits a function call to get the address of your TLS static base. GDC seems to be vastly more efficient, as it simply uses an fs: segment override and otherwise accesses memory directly. I don’t know what happens if someone takes the address of a TLS static and compares it with, or does pointer arithmetic against, say, a heap-allocated cell or an address on one of the stacks. How that works is a GDC question.

I wonder why the two compilers use different methods. Could LDC steal GDC's technique here to get much more speed?
June 28, 2023
On Wednesday, 28 June 2023 at 23:39:43 UTC, Cecil Ward wrote:
> It seems that on x86-64 Linux (which I think is what godbolt.org uses; is that correct?), LDC emits a function call to get the address of your TLS static base. GDC seems to be vastly more efficient, as it simply uses an fs: segment override and otherwise accesses memory directly. I don’t know what happens if someone takes the address of a TLS static and compares it with, or does pointer arithmetic against, say, a heap-allocated cell or an address on one of the stacks. How that works is a GDC question.
>
> I wonder why the two compilers use different methods. Could LDC steal GDC's technique here to get much more speed?

Perhaps I should try taking the address of a TLS static with GDC and looking at its value?
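A minimal sketch for looking at that value, assuming a plain D program built with either GDC or LDC. The names `tlsStatic`, `sharedStatic` and `addressesInNewThread` are hypothetical, chosen just for illustration. It prints the address of a module-level (thread-local by default) variable and of a `__gshared` one, as seen from the main thread and from a second thread. The TLS address should differ between the two threads while the `__gshared` address stays the same.

```d
import core.thread : Thread;
import std.stdio : writeln;

ubyte tlsStatic;              // module-level variables are thread-local by default in D
__gshared ubyte sharedStatic; // one instance shared by every thread

// Hypothetical helper: reports the addresses both statics have inside a new thread.
void addressesInNewThread(out ubyte* tlsAddr, out ubyte* sharedAddr)
{
    ubyte* t, s;
    auto worker = new Thread({
        t = &tlsStatic;    // the worker's own copy of the TLS variable
        s = &sharedStatic; // the same object every thread sees
    });
    worker.start();
    worker.join();         // wait so t and s are set before we read them
    tlsAddr = t;
    sharedAddr = s;
}

void main()
{
    ubyte* workerTls, workerShared;
    addressesInNewThread(workerTls, workerShared);
    writeln("main thread:   &tlsStatic = ", &tlsStatic, ", &sharedStatic = ", &sharedStatic);
    writeln("worker thread: &tlsStatic = ", workerTls, ", &sharedStatic = ", workerShared);
}
```

Because each thread has its own copy of `tlsStatic`, any difference or comparison between `&tlsStatic` and another address depends on which thread evaluates it.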
June 29, 2023
On Wednesday, 28 June 2023 at 23:39:43 UTC, Cecil Ward wrote:
> It seems that on x86-64 Linux (which I think is what godbolt.org uses; is that correct?), LDC emits a function call to get the address of your TLS static base. GDC seems to be vastly more efficient, as it simply uses an fs: segment override and otherwise accesses memory directly. I don’t know what happens if someone takes the address of a TLS static and compares it with, or does pointer arithmetic against, say, a heap-allocated cell or an address on one of the stacks. How that works is a GDC question.
>
> I wonder why the two compilers use different methods. Could LDC steal GDC's technique here to get much more speed?

LDC doesn't need to steal anything, because TLS access is standardized: on Linux it is specified by the ELF format and the runtime ABI for each CPU architecture. The difference arises because there are several different ways of accessing TLS (the TLS access models). Typically, when TLS is accessed from the main executable, the compiler can optimize the access into a direct load through the fs segment register on x86-64. I don't know why LDC chooses to use a function call, but it is likely a setting in LLVM, since LLVM should support all the access models.

Function calls to obtain a TLS address are typically used when the code is in a shared library that your program loads dynamically at run time (as opposed to a "statically" loaded library that the linker knows about at link time; I know this is a bit messy to understand). On x86-64 Linux, that call goes to __tls_get_addr.

The platform (operating system) ABI can actually specify which TLS access models are supposed to be used.

June 29, 2023
On Thursday, 29 June 2023 at 09:10:03 UTC, IGotD- wrote:
> LDC doesn't need to steal anything, because TLS access is standardized: on Linux it is specified by the ELF format and the runtime ABI for each CPU architecture. The difference arises because there are several different ways of accessing TLS (the TLS access models). Typically, when TLS is accessed from the main executable, the compiler can optimize the access into a direct load through the fs segment register on x86-64. I don't know why LDC chooses to use a function call, but it is likely a setting in LLVM, since LLVM should support all the access models.
>
> Function calls to obtain a TLS address are typically used when the code is in a shared library that your program loads dynamically at run time (as opposed to a "statically" loaded library that the linker knows about at link time; I know this is a bit messy to understand). On x86-64 Linux, that call goes to __tls_get_addr.
>
> The platform (operating system) ABI can actually specify which TLS access models are supposed to be used.

Thanks, IGotD. This is building an executable, not a shared library. I can see it on x86-64 at godbolt.org in a minimal function that simply returns the contents of a TLS static. Weird. Do you know how to control this behaviour, then? I would also like to see what is inside that routine, to gauge its current performance cost.

I’d also like to understand whether it is an error to take the address difference (or compare the addresses) of a TLS static and a __gshared static:
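   static ubyte TLS_static;          // module level: thread-local by default in D
   __gshared ubyte _g_shared_static; // a single instance shared by all threads

   void f() // hypothetical function wrapping the two statements
   {
       const diff = &TLS_static - &_g_shared_static;       // ptrdiff_t
       const comparison = &TLS_static < &_g_shared_static; // bool
   }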
July 04, 2023

On Thursday, 29 June 2023 at 17:03:35 UTC, Cecil Ward wrote:
> Do you know how to control this behaviour then?

With -fthread-model. E.g., -fthread-model=local-exec seems to be what GDC defaults to.
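A minimal repro for comparing the generated code, assuming a file named `tls.d` (hypothetical). The names `tlsStatic` and `readTls` are also just for illustration. With LDC's default options you would presumably see the function call you observed (on x86-64 Linux, typically to `__tls_get_addr`). With `-fthread-model=local-exec` one would expect a single `%fs`-relative load instead. The exact instruction sequence depends on the LDC version and target, so treat this as what to look for rather than guaranteed output.

```d
// tls.d: hypothetical file name for a minimal repro. Compare the assembly from
//   ldc2 -O -c -output-s tls.d
//   ldc2 -O -c -output-s -fthread-model=local-exec tls.d
// (on godbolt.org, add -fthread-model=local-exec to the compiler options instead)

ubyte tlsStatic; // module-level variable: thread-local by default in D

// Simply returns the contents of a TLS static, as in the godbolt experiment.
ubyte readTls()
{
    return tlsStatic;
}
```

Note that local-exec is only valid for code that ends up in the main executable, not in a shared library, which ties in with the distinction IGotD described.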