July 25, 2012
I had a performance problem with std.xml some months ago. It took me a
long time to figure out that there was a default linker param (in gdc & dmd
under Linux) that slowed down the whole thing.
So maybe it's not a code-related issue, I mean :)


On Wed, 25/07/2012 at 15:53 +0200, David wrote:

> On 25.07.2012 15:44, Andrea Fontana wrote:
> > Have you checked your default compiler/linker args?
> >
> > On Wed, 25/07/2012 at 15:23 +0200, David wrote:
> >> > I'll try a different compiler, too.
> >>
> >> It's the same issue with ldc
> >>
> >
> 
> They didn't change (of course I changed the args which are different for ldc), what exactly do you mean?




July 25, 2012
On 25-Jul-12 17:54, David wrote:
> Ok here we go:
>
> perf.data: http://dav1d.de/perf.data
>
> and a fancy image (showing the results of perf): http://dav1d.de/output.png
>
> I hope anyone knows where the time is spent.
>
> Most time spent:
> +  53,14%  bralad  [unknown]                   [k] 0xc01e5d2b

Would be cool to have before/after graph.

-- 
Dmitry Olshansky
July 25, 2012
On 25.07.2012 16:23, Dmitry Olshansky wrote:
> On 25-Jul-12 17:54, David wrote:
>> Ok here we go:
>>
>> perf.data: http://dav1d.de/perf.data
>>
>> and a fancy image (showing the results of perf):
>> http://dav1d.de/output.png
>>
>> I hope anyone knows where the time is spent.
>>
>> Most time spent:
>> +  53,14%  bralad  [unknown]                   [k] 0xc01e5d2b
>
> Would be cool to have before/after graph.
>

I don't know how to make comparisons with perf.data but here is the captured data of the "working" version:

http://dav1d.de/output_before.png
perf.data: http://dav1d.de/perf_before.data
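
For comparing the two recordings, perf has a built-in diff mode; a sketch, assuming both data files sit in the current working directory:

```shell
# Baseline recording first, then the new one;
# perf reports the per-symbol overhead deltas.
perf diff perf_before.data perf.data
```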
July 25, 2012
On 25-Jul-12 19:32, David wrote:
> On 25.07.2012 16:23, Dmitry Olshansky wrote:
>> On 25-Jul-12 17:54, David wrote:
>>> Ok here we go:
>>>
>>> perf.data: http://dav1d.de/perf.data
>>>
>>> and a fancy image (showing the results of perf):
>>> http://dav1d.de/output.png
>>>
>>> I hope anyone knows where the time is spent.
>>>
>>> Most time spent:
>>> +  53,14%  bralad  [unknown]                   [k] 0xc01e5d2b
>>
>> Would be cool to have before/after graph.
>>
>
> I don't know how to make comparisons with perf.data but here is the
> captured data of the "working" version:
>
> http://dav1d.de/output_before.png
> perf.data: http://dav1d.de/perf_before.data


It looks like a syscall/OpenGL issue. You somehow managed to hit a dark corner of the GL driver. It's either a (partial) fallback to software or some extra translation layer.
I once had a cool table that showed which GL calls are direct to hardware and which are not for various nvidia cards.

Now the trick is to get an idea why. The best way to debug driver-related stuff is to test on some other computer (like a different version of the OS, video card, etc.).

Can't quite decipher the output, but I find it strange that it mentions _d_invariant. You'd better compile with -release if you care about speed.
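
(_d_invariant is druntime's class-invariant check, which -release compiles out. A typical speed-oriented build, assuming a single hypothetical source file main.d, would be something like:)

```shell
# -release drops contracts, asserts and class invariants (incl. _d_invariant),
# -O enables optimizations, -inline enables function inlining.
dmd -O -release -inline main.d
```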


-- 
Dmitry Olshansky
July 25, 2012
> It looks like a syscall/OpenGL issue. You somehow managed to hit a dark
> corner of the GL driver. It's either a (partial) fallback to software or
> some extra translation layer.
> I once had a cool table that showed which GL calls are direct to
> hardware and which are not for various nvidia cards.
>
> Now the trick is to get an idea why. The best way to debug
> driver-related stuff is to test on some other computer (like a
> different version of the OS, video card, etc.).

Worst case scenario ... driver issue.


> Can't quite decipher the output, but I find it strange that it mentions
> _d_invariant. You'd better compile with -release if you care about speed.

I don't care about speed much, but 1000% less performance is just too bad.
July 25, 2012
On 26-Jul-12 00:52, David wrote:
>> It looks like a syscall/OpenGL issue. You somehow managed to hit a dark
>> corner of the GL driver. It's either a (partial) fallback to software or
>> some extra translation layer.
>> I once had a cool table that showed which GL calls are direct to
>> hardware and which are not for various nvidia cards.
>>
>> Now the trick is to get an idea why. The best way to debug
>> driver-related stuff is to test on some other computer (like a
>> different version of the OS, video card, etc.).
>
> Worst case scenario ... driver issue.

Been there once. In any case, I'd try to split the coordinates into 2 or 3 interleaved arrays (like vertex+normal together, and separately the 2 UVs). It's usually slower, but not 10x ;)

>
>> Can't quite decipher the output, but I find it strange that it mentions
>> _d_invariant. You'd better compile with -release if you care about speed.
>
> I don't care about speed much, but 1000% less performance is just too bad.


-- 
Dmitry Olshansky
July 25, 2012
On 25.07.2012 23:03, Dmitry Olshansky wrote:
> On 26-Jul-12 00:52, David wrote:
>>> It looks like a syscall/OpenGL issue. You somehow managed to hit a dark
>>> corner of the GL driver. It's either a (partial) fallback to software or
>>> some extra translation layer.
>>> I once had a cool table that showed which GL calls are direct to
>>> hardware and which are not for various nvidia cards.
>>>
>>> Now the trick is to get an idea why. The best way to debug
>>> driver-related stuff is to test on some other computer (like a
>>> different version of the OS, video card, etc.).
>>
>> Worst case scenario ... driver issue.
>
> Been there once. In any case, I'd try to split the coordinates into 2 or 3
> interleaved arrays (like vertex+normal together, and separately the 2 UVs).
> It's usually slower, but not 10x ;)

Well, the interesting question is: why is it slower? I checked it twice, the data passed to the GPU is 100% the same, no difference; the only difference is the stored format on the CPU (and that's just a matter of casting).

July 25, 2012
David:

> Well, the interesting question is: why is it slower? I checked it twice, the data passed to the GPU is 100% the same, no difference; the only difference is the stored format on the CPU (and that's just a matter of casting).

It's not easy to answer such general questions. Why don't you list the assembly of the two versions and compare them?
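
One way to do that without recompiling, assuming the two builds are named app_before and app_after (hypothetical names):

```shell
# Disassemble both builds, then diff the listings.
objdump -d app_before > before.s
objdump -d app_after  > after.s
diff -u before.s after.s | less
```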

Bye,
bearophile
July 25, 2012
On 07/24/2012 11:38 AM, David wrote:

> Well this change decreases my performance by 1000%.

Random guess: CPU cache misses?

Ali

July 25, 2012
On 26.07.2012 00:12, Ali Çehreli wrote:
> On 07/24/2012 11:38 AM, David wrote:
>
>  > Well this change decreases my performance by 1000%.
>
> Random guess: CPU cache misses?
>
> Ali
>

You're the second one mentioning this; any ideas on how to check it?
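
(perf itself can count cache events, so one way to check the guess, assuming the binary is ./bralad as in the profile above:)

```shell
# A high cache-misses / cache-references ratio would support the theory.
perf stat -e cache-references,cache-misses ./bralad
```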