September 06, 2012
On 06.09.2012 01:10, Walter Bright wrote:
> On 9/5/2012 4:03 AM, Benjamin Thaut wrote:
>> GC collection times:
>>
>> DMD GC Version: 8.9 ms
>> GDC GC Version: 4.1 ms
>
> I'd like it if you could add some instrumentation to see what accounts
> for the time difference. I presume they both use the same D source code.

The code is identical; I did not change anything in the GC code, so it uses whatever code comes with the MinGW GDC 2.058 release.

The problem with instrumentation is that I cannot recompile druntime for MinGW GDC, as this is not possible with the binary release of MinGW GDC, and I did not go through the effort to set up the whole build.
I'm open to suggestions, though, on how I could profile the GC without recompiling druntime. If someone else wants to profile this, I can also provide precompiled binaries of both versions.
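One way to get numbers without rebuilding druntime is to time collections from the application side. The sketch below is a hypothetical harness (`timeCollect` is an illustrative name, not part of druntime); it assumes a current druntime, since `core.time.MonoTime` postdates the 2.058 release discussed here, and it only measures the total cost of `GC.collect()`, so it cannot show where inside `gcx.fullcollect` the time goes.

```d
import core.memory : GC;
import core.time : MonoTime, Duration;
import std.stdio : writefln;

// Hypothetical helper: times one full collection triggered from user code.
// Only the total is measured; no breakdown inside the collector is possible.
Duration timeCollect()
{
    immutable start = MonoTime.currTime;
    GC.collect();
    return MonoTime.currTime - start;
}

void main()
{
    // Keep some GC-allocated blocks alive so the collector has data to mark.
    int[][] blocks;
    foreach (i; 0 .. 10_000)
        blocks ~= new int[](64);

    enum runs = 10;
    Duration total;
    foreach (r; 0 .. runs)
        total += timeCollect();

    writefln("%s blocks live", blocks.length);
    writefln("average GC.collect(): %s usecs", total.total!"usecs" / runs);
}
```

Building the same program with DMD and with GDC and comparing the averages gives a like-for-like figure, although a sampling profiler is still needed to see which part of the collection differs.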

-- 
Kind Regards
Benjamin Thaut
September 06, 2012
On 2012-09-06 14:12, Benjamin Thaut wrote:
> On 06.09.2012 01:10, Walter Bright wrote:
>> On 9/5/2012 4:03 AM, Benjamin Thaut wrote:
>>> GC collection times:
>>>
>>> DMD GC Version: 8.9 ms
>>> GDC GC Version: 4.1 ms
>>
>> I'd like it if you could add some instrumentation to see what accounts
>> for the time difference. I presume they both use the same D source code.
>
> The code is identical; I did not change anything in the GC code, so it
> uses whatever code comes with the MinGW GDC 2.058 release.
>
> The problem with instrumentation is that I cannot recompile druntime
> for MinGW GDC, as this is not possible with the binary release of
> MinGW GDC, and I did not go through the effort to set up the whole build.
> I'm open to suggestions, though, on how I could profile the GC without
> recompiling druntime. If someone else wants to profile this, I can also
> provide precompiled binaries of both versions.
>

I don't know what Windows has, but on Mac OS X there's this application:

https://developer.apple.com/library/mac/#documentation/developertools/conceptual/InstrumentsUserGuide/Introduction/Introduction.html

It lets you instrument any running application.

-- 
/Jacob Carlborg
September 06, 2012
> The problem with instrumentation is that I cannot recompile druntime for MinGW GDC, as this is not possible with the binary release of MinGW GDC, and I did not go through the effort to set up the whole build.
> I'm open to suggestions, though, on how I could profile the GC without recompiling druntime. If someone else wants to profile this, I can also provide precompiled binaries of both versions.

You don't necessarily need to recompile anything with a sampling
profiler like AMD Code Analyst or Very Sleepy.

September 06, 2012
On 06.09.2012 15:30, ponce wrote:
>> The problem with instrumentation is that I cannot recompile
>> druntime for MinGW GDC, as this is not possible with the binary
>> release of MinGW GDC, and I did not go through the effort to set up the
>> whole build.
>> I'm open to suggestions, though, on how I could profile the GC without
>> recompiling druntime. If someone else wants to profile this, I can
>> also provide precompiled binaries of both versions.
>
> You don't necessarily need to recompile anything with a sampling
> profiler like AMD Code Analyst or Very Sleepy
>

I just tried profiling it with Very Sleepy, but for both versions it basically only tells me that most of the time is spent in gcx.fullcollect.
The GDC version just spends less time in gcx.fullcollect than the DMD version.

As I cannot rebuild druntime with GDC, it will be quite hard to get detailed profiling results.

I'm open to suggestions.

Kind Regards
Benjamin Thaut
September 06, 2012
> I just tried profiling it with Very Sleepy, but for both versions it basically only tells me that most of the time is spent in gcx.fullcollect.
> The GDC version just spends less time in gcx.fullcollect than the DMD version.
>
> As I cannot rebuild druntime with GDC, it will be quite hard to get detailed profiling results.
>
> I'm open to suggestions.

You might try AMD Code Analyst; it will highlight the bottleneck
in the assembly listing. Then use a disassembler like IDA to get
a feel for what the bottleneck could be.


September 06, 2012
On 9/6/2012 10:50 AM, Benjamin Thaut wrote:
> I just tried profiling it with Very Sleepy, but for both versions it basically
> only tells me that most of the time is spent in gcx.fullcollect.
> The GDC version just spends less time in gcx.fullcollect than the DMD version.

Even so, that in itself is a good clue.

September 07, 2012
On Sep 6, 2012, at 10:50 AM, Benjamin Thaut <code@benjamin-thaut.de> wrote:

> On 06.09.2012 15:30, ponce wrote:
>>> The problem with instrumentation is that I cannot recompile
>>> druntime for MinGW GDC, as this is not possible with the binary
>>> release of MinGW GDC, and I did not go through the effort to set up the
>>> whole build.
>>> I'm open to suggestions, though, on how I could profile the GC without
>>> recompiling druntime. If someone else wants to profile this, I can
>>> also provide precompiled binaries of both versions.
>> 
>> You don't necessarily need to recompile anything with a sampling profiler like AMD Code Analyst or Very Sleepy
>> 
> 
> I just tried profiling it with Very Sleepy, but for both versions it basically only tells me that most of the time is spent in gcx.fullcollect.
> The GDC version just spends less time in gcx.fullcollect than the DMD version.
> 
> As I cannot rebuild druntime with GDC, it will be quite hard to get detailed profiling results.
> 
> I'm open to suggestions.

What version flags are set by GDC vs. DMD in your target apps?  The way "stop the world" is done on Linux vs. Windows is different, for example.
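A quick way to answer this is to build the same small program with both compilers and compare what each prints. This is a minimal sketch: the identifiers checked are an illustrative selection (including the ones gcbits.d switches on), not an exhaustive list, and `__VENDOR__`/`__VERSION__` are the standard D special tokens for the compiler vendor string and front-end version.

```d
import std.stdio : writeln;

void main()
{
    // Special tokens provided by every D compiler.
    writeln("vendor:  ", __VENDOR__);
    writeln("version: ", __VERSION__);

    // Predefined version identifiers active for this build.
    version (DigitalMars)     writeln("DigitalMars");
    version (GNU)             writeln("GNU");
    version (Windows)         writeln("Windows");
    version (linux)           writeln("linux");
    version (D_InlineAsm_X86) writeln("D_InlineAsm_X86");
    version (X86)             writeln("X86");
    version (X86_64)          writeln("X86_64");
}
```

Any identifier that appears for one compiler but not the other points at a `version` branch in druntime that the two builds take differently.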
September 07, 2012
On 2012-09-07 01:53, Sean Kelly wrote:

> What version flags are set by GDC vs. DMD in your target apps?  The way "stop the world" is done on Linux vs. Windows is different, for example.

As far as I understand, he's only using Windows, with MinGW GDC.

-- 
/Jacob Carlborg
September 07, 2012
On Thursday, 6 September 2012 at 20:44:29 UTC, Walter Bright wrote:
> On 9/6/2012 10:50 AM, Benjamin Thaut wrote:
>> I just tried profiling it with Very Sleepy, but for both versions it basically
>> only tells me that most of the time is spent in gcx.fullcollect.
>> The GDC version just spends less time in gcx.fullcollect than the DMD version.
>
> Even so, that in itself is a good clue.

My bet is on cross-module inlining of bitop.btr failing...

https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gcbits.d

version (DigitalMars)
{
    version = bitops;
}
else version (GNU)
{
    // use the unoptimized version
}
else version (D_InlineAsm_X86)
{
    version = Asm86;
}

wordtype testClear(size_t i)
{
  version (bitops)
  {
    return core.bitop.btr(data + 1, i);   // this is faster!
  }
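For reference, `core.bitop.btr(p, i)` tests bit `i` of the word array at `p`, clears it, and returns nonzero if the bit was set. The sketch below contrasts it with a portable mask-and-clear function; `testClearPortable` is a hypothetical name for illustration and is not the actual code of the `GNU` branch in gcbits.d, which the excerpt above omits. Both paths should leave the same bit pattern.

```d
import core.bitop : btr;

// Hypothetical portable equivalent of btr: returns the old bit (nonzero if
// it was set) and clears it, using only shifts and masks.
size_t testClearPortable(size_t* bits, size_t i)
{
    enum bitsPerWord = size_t.sizeof * 8;
    size_t* p = bits + i / bitsPerWord;
    immutable size_t mask = cast(size_t) 1 << (i % bitsPerWord);
    immutable size_t result = *p & mask;
    *p &= ~mask;
    return result;
}

void main()
{
    size_t[2] a = [0b1010, 0];
    size_t[2] b = a;

    // btr reports the bit as set, then clears it.
    assert(btr(a.ptr, 1) != 0);
    assert(btr(a.ptr, 1) == 0); // already cleared

    assert(testClearPortable(b.ptr, 1) != 0);
    assert(testClearPortable(b.ptr, 1) == 0);

    assert(a == b); // both paths leave the same bit pattern
}
```

On x86 the intrinsic can map to a single BTR instruction, while the portable version needs several operations; whether that difference matters for the timings in this thread is exactly what profiling would have to show.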

September 07, 2012
On 7 September 2012 07:28, Sven Torvinger <Sven@torvinger.se> wrote:
> On Thursday, 6 September 2012 at 20:44:29 UTC, Walter Bright wrote:
>>
>> On 9/6/2012 10:50 AM, Benjamin Thaut wrote:
>>>
>>> I just tried profiling it with Very Sleepy, but for both versions it basically
>>> only tells me that most of the time is spent in gcx.fullcollect.
>>> The GDC version just spends less time in gcx.fullcollect than the
>>> DMD version.
>>
>>
>> Even so, that in itself is a good clue.
>
>
> My bet is on cross-module inlining of bitop.btr failing...
>
> https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gcbits.d
>
> version (DigitalMars)
> {
>     version = bitops;
> }
> else version (GNU)
> {
>     // use the unoptimized version
> }
> else version (D_InlineAsm_X86)
> {
>     version = Asm86;
> }
>
> wordtype testClear(size_t i)
> {
>   version (bitops)
>   {
>     return core.bitop.btr(data + 1, i);   // this is faster!
>   }
>

You would be wrong.  btr is a compiler intrinsic, so it is *always* inlined!

I'm with Walter here: I would very much like to see hard evidence for your claims.  :-)


On a side note, though, GDC has bt, btr, bts, etc. as intrinsics in its compiler front-end, so it would be no problem switching to version = bitops for version GNU.
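Concretely, the change described above would amount to something like the following in the version block of gcbits.d. This is an untested sketch that assumes the rest of the file stays as quoted; it is not a committed druntime patch.

```d
version (DigitalMars)
{
    version = bitops;
}
else version (GNU)
{
    // Assumption: GDC treats bt/btr/bts as intrinsics, so the bitops path is safe here too.
    version = bitops;
}
else version (D_InlineAsm_X86)
{
    version = Asm86;
}
```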


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';