Thread overview | |||||
---|---|---|---|---|---|
|
January 10, 2012 Re: Cross module inlining in runtime | ||||
---|---|---|---|---|
| ||||
On 10 January 2012 19:49, Artur Skawina <art.08.09@gmail.com> wrote: >> I have porting the runtime/phobos asms to gcc asm on my to-do list, will try to get to that within two weeks. What would be the preferred way - version() guards? if yes - what version? Or would you prefer replacing the asms, if the changes are not going to be merged upstream anyway? > > So i decided to start with this today. As i have a case where turning on logging increases a programs run time from seconds to hours, while it spends most of the time in the GC, thought gcbits would be a good place to start. > > But after adding gdc asm support to GCBits.testClear() the only thing that > changed was this: > > XXXXXXXX <uint gc.gcbits.GCBits.testClear(uint)>: > push %eRX > mov %eRX,%eRX > mov XX(%eRX),%eRX > - push %eRX > - mov %eRX,%eRX > - shr $0x5,%eRX > - lea XX(,%eRX,4),%eRX > - mov XX(%eRX),%eRX > - add (%eRX),%eRX > - mov $0x1,%eRX > - shl %Rl,%eRX > - mov %eRX,%eRX > mov (%eRX),%eRX > - not %eRX > - and %eRX,%eRX > - and %eRX,%eRX > - mov %eRX,(%eRX) > - pop %eRX > + mov XX(%eRX),%eRX > + btr %eRX,XX(%eRX) > + sbb %eRX,%eRX > pop %eRX > ret > > OK, the function turned into ~ three instructions, good, but why didn't it then get inlined into any of the callers? Trying to force things with an attribute turned up this: > four instructions. :~) > ../../../libphobos/gc/gcx.d: In member function 'gc.gcx.Gcx.fullcollect': > BUILD32/gdc/dev/gcc-4.6.1/libphobos/gc/gcbits.d:119:0: sorry, unimplemented: inlining failed in call to 'testClear': function body not available > ../../../libphobos/gc/gcx.d:2647:0: sorry, unimplemented: called from here > BUILD32/gdc/dev/gcc-4.6.1/libphobos/gc/gcbits.d:119:0: sorry, unimplemented: inlining failed in call to 'testClear': function body not available > ../../../libphobos/gc/gcx.d:2729:0: sorry, unimplemented: called from here > make[3]: *** [gc/gcx.o] Error 1 > > Any way to make this work? Much of the asm gains will be lost when the code isn't inlined. > > artur > How is the function written? -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0'; |
January 10, 2012 Re: Cross module inlining in runtime | ||||
---|---|---|---|---|
| ||||
On 01/11/12 00:30, Iain Buclaw wrote: > On 10 January 2012 19:49, Artur Skawina <art.08.09@gmail.com> wrote: >>> I have porting the runtime/phobos asms to gcc asm on my to-do list, will try to get to that within two weeks. What would be the preferred way - version() guards? if yes - what version? Or would you prefer replacing the asms, if the changes are not going to be merged upstream anyway? >> >> So i decided to start with this today. As i have a case where turning on logging increases a programs run time from seconds to hours, while it spends most of the time in the GC, thought gcbits would be a good place to start. >> >> But after adding gdc asm support to GCBits.testClear() the only thing that >> changed was this: >> >> XXXXXXXX <uint gc.gcbits.GCBits.testClear(uint)>: >> push %eRX >> mov %eRX,%eRX >> mov XX(%eRX),%eRX >> - push %eRX >> - mov %eRX,%eRX >> - shr $0x5,%eRX >> - lea XX(,%eRX,4),%eRX >> - mov XX(%eRX),%eRX >> - add (%eRX),%eRX >> - mov $0x1,%eRX >> - shl %Rl,%eRX >> - mov %eRX,%eRX >> mov (%eRX),%eRX >> - not %eRX >> - and %eRX,%eRX >> - and %eRX,%eRX >> - mov %eRX,(%eRX) >> - pop %eRX >> + mov XX(%eRX),%eRX >> + btr %eRX,XX(%eRX) >> + sbb %eRX,%eRX >> pop %eRX >> ret >> >> OK, the function turned into ~ three instructions, good, but why didn't it then get inlined into any of the callers? Trying to force things with an attribute turned up this: >> > > four instructions. :~) I'm already imagining the inlined case, where the "mov" could be free. :) >> ../../../libphobos/gc/gcx.d: In member function 'gc.gcx.Gcx.fullcollect': >> BUILD32/gdc/dev/gcc-4.6.1/libphobos/gc/gcbits.d:119:0: sorry, unimplemented: inlining failed in call to 'testClear': function body not available >> ../../../libphobos/gc/gcx.d:2647:0: sorry, unimplemented: called from here >> BUILD32/gdc/dev/gcc-4.6.1/libphobos/gc/gcbits.d:119:0: sorry, unimplemented: inlining failed in call to 'testClear': function body not available >> ../../../libphobos/gc/gcx.d:2729:0: sorry, unimplemented: called from here >> make[3]: *** [gc/gcx.o] Error 1 >> >> Any way to make this work? Much of the asm gains will be lost when the code isn't inlined. > How is the function written? --- druntime/gc/gcbits.d.org 2012-01-10 19:56:11.580039157 +0100 +++ druntime/gc/gcbits.d 2012-01-10 23:19:50.046264596 +0100 @@ -29,7 +29,10 @@ version (DigitalMars) } else version (GNU) { - // use the unoptimized version + version(X86) + version = GNU_Asm_x86; + version(X86_64) + version = GNU_Asm_x86; } else version (D_InlineAsm_X86) { @@ -115,6 +118,7 @@ struct GCBits data[1 + (i >> BITS_SHIFT)] &= ~(BITS_1 << (i & BITS_MASK)); } + //pragma(attribute, always_inline) wordtype testClear(size_t i) { version (bitops) @@ -133,6 +137,19 @@ struct GCBits ret 4 ; } } + else version (GNU_Asm_x86) + { + wordtype result = void; + asm + { + "btr %2, %0; sbb %1,%1" + : "+m" *(data+1), "=r" result + : "Ir" i + : "memory" ; + + } + return result; + } else { //result = (cast(bit *)(data + 1))[i]; @@ -164,6 +181,19 @@ struct GCBits ret 4 ; } } + else version (GNU_Asm_x86) + { + wordtype result = void; + asm + { + "bts %2, %0; sbb %1,%1" + : "+m" *(data+1), "=r" result + : "Ir" i + : "memory" ; + + } + return result; + } else { //result = (cast(bit *)(data + 1))[i]; |
January 11, 2012 Re: Cross module inlining in runtime | ||||
---|---|---|---|---|
| ||||
On 10 January 2012 23:38, Artur Skawina <art.08.09@gmail.com> wrote: > On 01/11/12 00:30, Iain Buclaw wrote: >> On 10 January 2012 19:49, Artur Skawina <art.08.09@gmail.com> wrote: >>>> I have porting the runtime/phobos asms to gcc asm on my to-do list, will try to get to that within two weeks. What would be the preferred way - version() guards? if yes - what version? Or would you prefer replacing the asms, if the changes are not going to be merged upstream anyway? >>> >>> So i decided to start with this today. As i have a case where turning on logging increases a programs run time from seconds to hours, while it spends most of the time in the GC, thought gcbits would be a good place to start. >>> >>> But after adding gdc asm support to GCBits.testClear() the only thing that >>> changed was this: >>> >>> XXXXXXXX <uint gc.gcbits.GCBits.testClear(uint)>: >>> push %eRX >>> mov %eRX,%eRX >>> mov XX(%eRX),%eRX >>> - push %eRX >>> - mov %eRX,%eRX >>> - shr $0x5,%eRX >>> - lea XX(,%eRX,4),%eRX >>> - mov XX(%eRX),%eRX >>> - add (%eRX),%eRX >>> - mov $0x1,%eRX >>> - shl %Rl,%eRX >>> - mov %eRX,%eRX >>> mov (%eRX),%eRX >>> - not %eRX >>> - and %eRX,%eRX >>> - and %eRX,%eRX >>> - mov %eRX,(%eRX) >>> - pop %eRX >>> + mov XX(%eRX),%eRX >>> + btr %eRX,XX(%eRX) >>> + sbb %eRX,%eRX >>> pop %eRX >>> ret >>> >>> OK, the function turned into ~ three instructions, good, but why didn't it then get inlined into any of the callers? Trying to force things with an attribute turned up this: >>> >> >> four instructions. :~) > > I'm already imagining the inlined case, where the "mov" could be free. :) > > >>> ../../../libphobos/gc/gcx.d: In member function 'gc.gcx.Gcx.fullcollect': >>> BUILD32/gdc/dev/gcc-4.6.1/libphobos/gc/gcbits.d:119:0: sorry, unimplemented: inlining failed in call to 'testClear': function body not available >>> ../../../libphobos/gc/gcx.d:2647:0: sorry, unimplemented: called from here >>> BUILD32/gdc/dev/gcc-4.6.1/libphobos/gc/gcbits.d:119:0: sorry, unimplemented: inlining failed in call to 'testClear': function body not available >>> ../../../libphobos/gc/gcx.d:2729:0: sorry, unimplemented: called from here >>> make[3]: *** [gc/gcx.o] Error 1 >>> >>> Any way to make this work? Much of the asm gains will be lost when the code isn't inlined. > >> How is the function written? > > --- druntime/gc/gcbits.d.org 2012-01-10 19:56:11.580039157 +0100 > +++ druntime/gc/gcbits.d 2012-01-10 23:19:50.046264596 +0100 Ahh, I've just noticed that the functions are in a different module than the only you are calling from. GCC asm doesn't prevent inlining of functions, but as of yet you cannot inline functions across modules unless you compile both files at the same time, in which case, they get combined into one object file (must specify -o or this doesn't work). -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0'; |
Copyright © 1999-2021 by the D Language Foundation