July 24, 2012
On Tue, Jul 24, 2012 at 10:53:05PM +0200, David wrote:
> On 24.07.2012 21:46, David wrote:
> >>Hmm. Could this be a GC-related issue?
> >
> >Actually this could be. They are stored inside a Vertex* array which is allocated with `malloc`, maybe the GC scans all of the created vertex structs? Could this be?
> 
>     import core.memory;
>     GC.disable();
> 
> directly when entering main didn't help, so I guess it's not the GC

This is strange. You said that you profiled the program and the extra time spent is not in user code? Where is it spent then?


T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.
July 24, 2012
On Tue, 24 Jul 2012 22:53:05 +0200, David <d@dav1d.de> wrote:

> On 24.07.2012 21:46, David wrote:
>>> Hmm. Could this be a GC-related issue?
>>
>> Actually this could be. They are stored inside a Vertex* array which is
>> allocated with `malloc`, maybe the GC scans all of
>> the created vertex structs? Could this be?
>
>      import core.memory;
>      GC.disable();
>
> directly when entering main didn't help, so I guess it's not the GC

As long as you're using malloc, the GC should leave it alone.
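
A minimal sketch of that point, assuming a hypothetical three-float `Vertex` (the real struct in this thread has more fields) and a hypothetical helper `allocVertices`. It uses `GC.addrOf`, which returns null for pointers that do not point into GC-managed memory, to check that a `malloc`'d block is unknown to the collector:

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

// Hypothetical stand-in for the Vertex struct discussed in the thread.
struct Vertex
{
    float x, y, z;
}

// Allocates `count` vertices on the C heap. The GC does not own this block.
Vertex* allocVertices(size_t count)
{
    return cast(Vertex*) malloc(count * Vertex.sizeof);
}

void main()
{
    enum count = 6;
    Vertex* verts = allocVertices(count);
    assert(verts !is null);
    scope (exit) free(verts);

    // Not GC memory: the collector has no block containing this pointer.
    assert(GC.addrOf(verts) is null);

    // For comparison, an array created with `new` lives in the GC heap.
    auto gcVerts = new Vertex[](count);
    assert(GC.addrOf(gcVerts.ptr) !is null);
}
```

The block would only be scanned if it were registered explicitly, for example with `GC.addRange`.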

-- 
Simen
July 24, 2012
> This is strange. You said that you profiled the program and the extra
> time spent is not in user code? Where is it spent then?

This is a damn good question. I tried to debug it manually with writefln's, and it showed that the time was spent in glfwSwapBuffers (which, I looked it up, is just a wrapper around glXSwapBuffers). `perf` showed me nothing useful; the time was spent in some unresolved calls.

I will make new tests with perf tomorrow.
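
For manual timing like this, a stopwatch is usually less noisy than timestamps printed with writefln. The following is only a sketch: `timeCall` is a hypothetical helper, and the sleeping placeholder stands in for the real `glfwSwapBuffers()` call, which is not linked here. It assumes the current Phobos module `std.datetime.stopwatch`.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Runs `fn` once and returns the wall-clock time it took.
Duration timeCall(scope void delegate() fn)
{
    auto sw = StopWatch(AutoStart.yes);
    fn();
    sw.stop();
    return sw.peek();
}

void main()
{
    // Hypothetical placeholder for the real call, e.g. glfwSwapBuffers().
    void swapBuffersPlaceholder()
    {
        import core.thread : Thread;
        import core.time : msecs;

        Thread.sleep(2.msecs);
    }

    auto elapsed = timeCall(&swapBuffersPlaceholder);
    writefln("swap took %s us", elapsed.total!"usecs");
}
```

Keep in mind that a buffer swap can block on vsync or on GPU work that was queued earlier, so time measured inside the swap is not necessarily caused by the swap itself.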


July 24, 2012
On Wednesday, July 25, 2012 00:12:19 David wrote:
> > This is strange. You said that you profiled the program and the extra time spent is not in user code? Where is it spent then?
> 
> This is a damn good question. I tried to debug it manually with writefln's, and it showed that the time was spent in glfwSwapBuffers (which, I looked it up, is just a wrapper around glXSwapBuffers). `perf` showed me nothing useful; the time was spent in some unresolved calls.
> 
> I will make new tests with perf tomorrow.

dmd comes with a profiler built into it. Just compile with -profile, and you'll get profile information when you run your program.
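
A small sketch of how that looks in practice. The file name `app.d` and the `tessellate` workload are made up for illustration; only the `-profile` switch and the `trace.log` / `trace.def` output files come from dmd itself.

```d
// Build and run with dmd's built-in profiler:
//   dmd -profile app.d
//   ./app
// On exit the instrumented program writes trace.log (call counts and
// per-function timings) and trace.def into the current directory.
import std.stdio : writeln;

// Hypothetical workload standing in for the tessellation code.
int tessellate(int quads)
{
    int vertices = 0;
    foreach (i; 0 .. quads)
        vertices += 6; // two triangles per quad
    return vertices;
}

void main()
{
    writeln(tessellate(1000));
}
```

Only code compiled with -profile is instrumented, so time spent inside C libraries or the driver will not be broken down in trace.log.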

- Jonathan M Davis
July 24, 2012
On Tuesday, 24 July 2012 at 19:42:34 UTC, David wrote:
>> I agree. I don't know how the CPU handles misaligned floats, but from
>> what I understand, it will do two loads to fetch the two word-aligned
>> parts of the float, and then assemble it together. This may be what's
>> causing the slowdown.
>>
>>
>> T
>>
>
> Removing the `align(1)` changes nothing, not 1ms slower or faster, unfortunately.


[quote]
[code]
 Vertex[] data;
 foreach(i; 0..6) {
   data ~= Vertex(positions[i][0], positions[i][1], positions[i][2],
[/code]
[/quote]

Try using reserve? The new structure size looks like it's about 40 bytes, and aside from resizing I'm not sure why it would have issues.

[code]
 Vertex[] data;
 data.reserve(6); //following foreach...
[/code]
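
A fuller sketch of the same idea, in case it helps. The three-float `Vertex` and the helper `buildVertices` are hypothetical (the real struct has more fields and the original loop is cut off above), so treat this as an illustration of `reserve`, not as the actual tessellation code:

```d
// Hypothetical Vertex layout; the real struct in the thread has more fields.
struct Vertex
{
    float x, y, z;
}

// Builds the vertex array with a single up-front allocation.
Vertex[] buildVertices(const float[3][] positions)
{
    Vertex[] data;
    data.reserve(positions.length); // room for every append below
    foreach (p; positions)
        data ~= Vertex(p[0], p[1], p[2]);
    return data;
}

void main()
{
    float[3][] positions = [
        [0f, 0f, 0f], [1f, 0f, 0f], [1f, 1f, 0f],
        [1f, 1f, 0f], [0f, 1f, 0f], [0f, 0f, 0f],
    ];
    auto data = buildVertices(positions);
    assert(data.length == 6);
    assert(data.capacity >= 6);
}
```

With the capacity reserved first, the appends in the loop should not need to reallocate.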
July 25, 2012
On 25.07.2012 01:10, Era Scarecrow wrote:
>> Removing the `align(1)` changes nothing, not 1ms slower or faster,
>> unfortunately.
>
>
> [quote]
> [code]
>   Vertex[] data;
>   foreach(i; 0..6) {
>     data ~= Vertex(positions[i][0], positions[i][1], positions[i][2],
> [/code]
> [/quote]
>
> Try using reserve? The new structure size looks like it's about 40
> bytes, and aside from resizing I'm not sure why it would have issues.
>
> [code]
>   Vertex[] data;
>   data.reserve(6); //following foreach...
> [/code]

Also not the problem, I returned the whole array at once and it didn't help. But thanks for your idea.


The strange thing is that these tessellation functions are run just once, and then the data is passed to the GPU.
So my comment shouldn't have a direct impact on the speed (a GC issue would explain it, but unfortunately it isn't the GC).

I'll try a different compiler, too.
July 25, 2012
> I'll try a different compiler, too.

It's the same issue with ldc

July 25, 2012
Have you checked your default compiler/linker args?

On Wed, 25/07/2012 at 15:23 +0200, David wrote:

> > I'll try a different compiler, too.
> 
> It's the same issue with ldc
> 
July 25, 2012
On 25.07.2012 15:44, Andrea Fontana wrote:
> Have you checked your default compiler/linker args?
>
> On Wed, 25/07/2012 at 15:23 +0200, David wrote:
>> > I'll try a different compiler, too.
>>
>> It's the same issue with ldc
>>
>

They didn't change (of course I changed the args that are different for ldc). What exactly do you mean?
July 25, 2012
Ok here we go:

perf.data: http://dav1d.de/perf.data

and a fancy image (showing the results of perf): http://dav1d.de/output.png

I hope someone knows where the time is spent.

Most time spent:
+  53,14%  bralad  [unknown]                   [k] 0xc01e5d2b