December 05, 2008 Re: Hard-to-reproduce GC bug | ||||
---|---|---|---|---|
| ||||
Posted in reply to dsimcha | == Quote from dsimcha (dsimcha@yahoo.com)'s article
>
> Thanks, though I'm way ahead of you in that I already did this. Works great,
> except it's a little bit slow.
> I'm actually working on an implementation of the SuperStack proposed by Andrei
> about a month ago, which was why I needed good TLS. It seems like with the
> current implementation (using the faster explicit key solution instead of the
> slower class-based solution), about 1/3 of my time is being spent on retrieving
> TLS. I got this number by caching the stuff from TLS on the stack of the calling
> function and passing it in as a parameter. This may become a semi-hidden feature
> for wringing out that last bit of performance from SuperStack. Is TLS inherently
> slow, or is the druntime implementation relatively quick and dirty and likely to
> improve in the future?
The druntime implementation is about as fast as user-level TLS can get, I'm afraid. If you look at the implementation:
class ThreadLocal
{
T val()
{
Wrap* wrap = cast(Wrap*) Thread.getLocal( m_key );
return wrap ? wrap.val : m_def;
}
}
class Thread
{
static void* getLocal( uint key )
{
return getThis().m_local[key];
}
static Thread getThis()
{
version( Posix )
return cast(Thread) pthread_getspecific( sm_this );
}
void*[LOCAL_MAX] m_local;
}
The OS-level TLS call is typically implemented as an array indexing operation, so to get a TLS value you're looking at indexing into two arrays, a cast, and then an additional cast and conditional jump if you use ThreadLocal. Error checking is even omitted for performance reasons. If I knew of a way to make it faster then I would :-)
Sean
|
December 05, 2008 Re: Hard-to-reproduce GC bug | ||||
---|---|---|---|---|
| ||||
Posted in reply to dsimcha | dsimcha wrote: > Thanks, though I'm way ahead of you in that I already did this. Works great, > except it's a little bit slow. TLS is always going to be slow. Beating the old drum about how freakin' useful a tool obj2asm is and why doesn't anyone use it, here's what it looks like: -------------------- __thread int foo; void main() { foo = 3; } --------------------- __Dmain comdat assume CS:__Dmain mov EAX,__tls_index mov ECX,FS:__tls_array mov EDX,[EAX*4][ECX] mov dword ptr _D5test63fooi[EDX],3 xor EAX,EAX ret __Dmain ends --------------------- So you see, it takes 4 instructions to reference a TLS global vs 1 instruction for regular static data. The lesson is to minimize directly referencing such globals. Instead, take a pointer to them, or cache the value into a local. As for whether __thread is completely implemented, yes it is completely implemented in the compiler. I obviously forgot about the gc, though, and I'm glad you found the problem so I can fix it. In the meantime, you can call the gc directly to register your __thread variable as a 'root', then the gc will recognize it properly. If you want to read about how TLS works under Windows, see: http://www.nynaeve.net/?p=180 It works an equivalent, but completely differently under the hood, way in Linux: http://people.redhat.com/drepper/tls.pdf |
December 05, 2008 Re: Hard-to-reproduce GC bug | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Fri, Dec 5, 2008 at 5:38 PM, Walter Bright <newshound1@digitalmars.com> wrote:
> TLS is always going to be slow. Beating the old drum about how freakin' useful a tool obj2asm is and why doesn't anyone use it, here's what it looks like:
Er, Walter, you realize it's not free, right? Meaning that even if the EUP is only $15 there's still going to be a lot of people who don't have it just because they don't feel bothered to buy it.
|
December 05, 2008 Re: Hard-to-reproduce GC bug | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Thanks, guys. I've found ways to speed things up a decent amount, and put an alpha of my SuperStack up on Scrapple, though I renamed it to TempAlloc because I don't like the name SuperStack. See D.announce and http://dsource.org/projects/scrapple/browser/trunk/tempAlloc |
December 06, 2008 Re: Hard-to-reproduce GC bug | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jarrett Billingsley | Jarrett Billingsley wrote:
> On Fri, Dec 5, 2008 at 5:38 PM, Walter Bright
> <newshound1@digitalmars.com> wrote:
>> TLS is always going to be slow. Beating the old drum about how freakin'
>> useful a tool obj2asm is and why doesn't anyone use it, here's what it looks
>> like:
>
> Er, Walter, you realize it's not free, right? Meaning that even if
> the EUP is only $15 there's still going to be a lot of people who
> don't have it just because they don't feel bothered to buy it.
That's like saying one works as an auto mechanic but prefers to use a rock rather than a hammer because a hammer costs $15 !! It's just far too useful to not buy at such a reasonable price.
Even so, obj2asm is free on the linux version.
|
December 06, 2008 Re: Hard-to-reproduce GC bug | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Fri, Dec 5, 2008 at 7:29 PM, Walter Bright <newshound1@digitalmars.com> wrote:
> That's like saying one works as an auto mechanic but prefers to use a rock rather than a hammer because a hammer costs $15 !! It's just far too useful to not buy at such a reasonable price.
>
> Even so, obj2asm is free on the linux version.
That doesn't really help Windows DMD users who are stuck using an outdated object format that almost nothing else seems to understand. Or on Linux, for that matter, since there are - and always have been - free disassemblers for ELF.
|
December 06, 2008 Re: Hard-to-reproduce GC bug | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jarrett Billingsley | Jarrett Billingsley wrote:
> On Fri, Dec 5, 2008 at 7:29 PM, Walter Bright
> <newshound1@digitalmars.com> wrote:
>> That's like saying one works as an auto mechanic but prefers to use a rock
>> rather than a hammer because a hammer costs $15 !! It's just far too useful
>> to not buy at such a reasonable price.
>>
>> Even so, obj2asm is free on the linux version.
>
> That doesn't really help Windows DMD users who are stuck using an
> outdated object format that almost nothing else seems to understand.
> Or on Linux, for that matter, since there are - and always have been -
> free disassemblers for ELF.
I updated Agner Fog's objconv so that the -fasm option now works with DMD .obj's on Windows. He still hasn't released it yet on his site, but I can give it to anyone who's interested.
It disassembles all instructions, even the newly defined ones that don't exist on any current processors. But it still has a few problems, and it won't give you D source code interleaved with the asm output.
|
December 07, 2008 Re: Hard-to-reproduce GC bug | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jarrett Billingsley | On Sat, 06 Dec 2008 02:57:20 +0200, Jarrett Billingsley <jarrett.billingsley@gmail.com> wrote: > On Fri, Dec 5, 2008 at 7:29 PM, Walter Bright > <newshound1@digitalmars.com> wrote: >> That's like saying one works as an auto mechanic but prefers to use a rock >> rather than a hammer because a hammer costs $15 !! It's just far too useful >> to not buy at such a reasonable price. >> >> Even so, obj2asm is free on the linux version. > > That doesn't really help Windows DMD users who are stuck using an > outdated object format that almost nothing else seems to understand. > Or on Linux, for that matter, since there are - and always have been - > free disassemblers for ELF. IDA 4.9 is now free for non-commercial purposes, and it understands DMD's .obj files. http://www.hex-rays.com/idapro/idadownfreeware.htm -- Best regards, Vladimir mailto:thecybershadow@gmail.com |
December 08, 2008 Re: Hard-to-reproduce GC bug | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Walter Bright wrote:
> Steven Schveighoffer wrote:
>> I'd say most likely that the GC doesn't see anything declared as __thread, so when you use that pointer as the only reference to GC allocated data, it doesn't see that it's still in use, and will collect.
>
> Looks like I need to do some research to see how the gc can discover the extent of tls data.
I've got this working now for Windows and Linux for the main program (not for dll's or shared libraries).
|
December 08, 2008 Re: Hard-to-reproduce GC bug | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Walter Bright, el 7 de diciembre a las 16:04 me escribiste: > Walter Bright wrote: > >Steven Schveighoffer wrote: > >>I'd say most likely that the GC doesn't see anything declared as __thread, so when you use that pointer as the only reference to GC allocated data, it doesn't see that it's still in use, and will collect. > >Looks like I need to do some research to see how the gc can discover the extent of tls data. > > I've got this working now for Windows and Linux for the main program (not for dll's or shared libraries). I saw the change[1] and I wonder why there are mentions to the DMD implementation. Shouldn't that be implementation agnostic, being in the "common" part of the runtime? I guess _tlsstart and _tlsend should be added to the runtime specification[2] too, right? BTW, the change broke the indentation style of druntime :S Thank you. [1] http://www.dsource.org/projects/druntime/changeset/57 [2] http://www.dsource.org/projects/druntime/wiki/RuntimeSpec -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- La máquina de la moneda, mirá como te queda! -- Sidharta Kiwi |
Copyright © 1999-2021 by the D Language Foundation