July 25, 2012
> It's not easy to answer such general questions. Why don't you list
> the assembly of the two versions and compare?

My assembly is pretty rusty and actually, I have no idea what to look for.

July 25, 2012
On 07/25/2012 03:26 PM, David wrote:
> On 26.07.2012 00:12, Ali Çehreli wrote:
>> On 07/24/2012 11:38 AM, David wrote:
>>
>> > Well this change decreases my performance by 1000%.
>>
>> Random guess: CPU cache misses?
>>
>> Ali
>>
>
> You're the 2nd one mentioning this, any ideas how to check this?

I have no experience. Pages like this look promising:


http://stackoverflow.com/questions/2486840/linux-c-how-to-profile-time-wasted-due-to-cache-misses

Ali
July 25, 2012
On 26.07.2012 00:37, Ali Çehreli wrote:
> On 07/25/2012 03:26 PM, David wrote:
>> On 26.07.2012 00:12, Ali Çehreli wrote:
>>> On 07/24/2012 11:38 AM, David wrote:
>>>
>>> > Well this change decreases my performance by 1000%.
>>>
>>> Random guess: CPU cache misses?
>>>
>>> Ali
>>>
>>
>> You're the 2nd one mentioning this, any ideas how to check this?
>
> I have no experience. Pages like this look promising:
>
>
> http://stackoverflow.com/questions/2486840/linux-c-how-to-profile-time-wasted-due-to-cache-misses
>
>
> Ali

From what I've seen everything is OK: I used `perf top -e L1-dcache-load-misses -e L1-dcache-loads` to look for hotspots and found nothing too bad.
July 26, 2012
Ok, interesting thing.

I switched my buffer from Vertex* to void*, and I cast every Vertex I get to void[] and add it to the buffer (slice → memcpy), and everything works fine now. I can live with that (once the basic functions are implemented it's not even a pain to use), but still, I wonder where the problem is.
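
A minimal sketch of the workaround described above, assuming a fixed-capacity buffer. `RawBuffer` and `add` are hypothetical names, not the engine's actual code, and the byte view uses `ubyte` rather than `void[]` purely to keep the slice copy explicit:

```d
// Same fields as the Vertex struct quoted in this thread.
struct Vertex {
    float x, y, z, nx, ny, nz, u_terrain, v_terrain, u_biome, v_biome;
}

// Hypothetical untyped buffer; not the engine's actual type.
struct RawBuffer {
    void* data;       // untyped storage owned by the caller
    size_t used;      // bytes written so far
    size_t capacity;  // total bytes available

    // Copy the raw bytes of one vertex to the end of the buffer.
    void add(Vertex v) {
        assert(used + Vertex.sizeof <= capacity, "buffer full");
        auto src = (cast(const(ubyte)*) &v)[0 .. Vertex.sizeof];
        auto dst = (cast(ubyte*) data)[used .. used + Vertex.sizeof];
        dst[] = src[];  // slice-to-slice copy of Vertex.sizeof bytes
        used += Vertex.sizeof;
    }
}
```

The buffer only ever sees bytes, so the typed `Vertex*` indexing that the slow version relied on is out of the picture.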

July 26, 2012
On 26-Jul-12 14:14, David wrote:
> Ok, interesting thing.
>
> I switched my buffer from Vertex* to void* and I cast every Vertex I get
> to void[] and add it to the buffer (slice → memcpy) and everything
> works fine now. I can live with that (once the basic functions are
> implemented it's not even a pain to use), but still, I wonder where the
> problem is.
>

Hm. Do you ever do pointer arithmetic on Vertex*? Are the size and offsets correct (like in Vertex vs float)?

-- 
Dmitry Olshansky
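
One way to answer the size-and-offset question at compile time is a handful of `static assert`s. This is only a sketch: it copies the `Vertex` struct posted in this thread and assumes the intended layout is ten tightly packed floats, the same as the old `float[]` data.

```d
// Layout check for the Vertex struct quoted in this thread.
align(1) struct Vertex {
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

// Ten floats and no padding: the struct should be exactly as large
// as ten consecutive floats in the old float[] buffer.
static assert(Vertex.sizeof == 10 * float.sizeof);

// Each group of fields should start where the float[] version put it.
static assert(Vertex.x.offsetof == 0 * float.sizeof);
static assert(Vertex.nx.offsetof == 3 * float.sizeof);
static assert(Vertex.u_terrain.offsetof == 6 * float.sizeof);
static assert(Vertex.u_biome.offsetof == 8 * float.sizeof);
```

If any assertion fails, compilation stops, so a successful build already confirms the layout.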
July 26, 2012
> Hm. Do you ever do pointer arithmetic on Vertex*? Are the size and
> offsets correct (like in Vertex vs float)?

No, yes. I really have no idea why this happens, I saved the contents of my buffers and compared them with the buffers of the `float[]` version (thanks to `git checkout`) and they were exactly 100% the same.
It's a mystery.


July 27, 2012
On 26.07.2012 21:18, David wrote:
>> Hm. Do you ever do pointer arithmetic on Vertex*? Are the size and
>> offsets correct (like in Vertex vs float)?
>
> No, yes. I really have no idea why this happens, I saved the contents of
> my buffers and compared them with the buffers of the `float[]` version
> (thanks to `git checkout`) and they were exactly 100% the same.
> It's a mystery.
>
>

Can you create a version of your code that allows switching between the array and Vertex (version(Vertex) ... else ...)? Or provide both versions here again.

You checked the dmd and ldc output, so it can't be a backend thing (maybe the frontend or the GC), or mysterious GL bugs.
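
A rough sketch of such a switch, assuming a version identifier called `UseVertex` (a made-up name, enabled with `-version=UseVertex` on dmd or `-d-version=UseVertex` on ldc). `Element` and `makeElement` are also hypothetical helpers:

```d
struct Vertex {
    float x, y, z, nx, ny, nz, u_terrain, v_terrain, u_biome, v_biome;
}

// Pick the element type of the vertex buffer at compile time.
version (UseVertex)
    alias Element = Vertex;      // struct path
else
    alias Element = float[10];   // original flat-float path

// Build one element from ten floats, whichever path is selected.
Element makeElement(float[10] values) {
    version (UseVertex)
        return Vertex(values[0], values[1], values[2], values[3], values[4],
                      values[5], values[6], values[7], values[8], values[9]);
    else
        return values;
}
```

With both paths in one binary-compatible code base, the two builds can be timed against each other without switching commits.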
August 24, 2012
On 24.07.2012 20:38, David wrote:
> I am writing a game engine, well I was using a float[] array to store my
> vertices, this worked well, but I have to send more and more uv
> coordinates (and other information) which needn't be stored as `float`'s
> so I moved from a float-Array to a Vertex Array:
> https://github.com/Dav1dde/BraLa/blob/master/brala/dine/builder/tessellator.d#L30
>
>
> align(1) struct Vertex {
>      float x;
>      float y;
>      float z;
>      float nx;
>      float ny;
>      float nz;
>      float u_terrain;
>      float v_terrain;
>      float u_biome;
>      float v_biome;
> }
>
> Everything is still a float, so it's easier. Nothing wrong with that, right?
> Well this change decreases my performance by 1000%. My frame time goes
> from ~12ms per frame to ~120ms per frame. I tried to find the bottleneck
> with `perf` but no results (the time is not spent in the game/engine).
>
> The commit:
> https://github.com/Dav1dde/BraLa/commit/02a37a0e46f195f5a46404747d659d26490e6c32
>
>
> I hope you can see anything wrong. I have no idea!

Check the disassembly view of this line:
buffer[elements++] = Vertex(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);

If you are using an old version of dmd, it will allocate a block of memory the size of Vertex, fill the data into that block, and then memcpy it to your buffer array.

You could try working around this by doing:

buffer[elements++].__ctor(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);

Kind Regards
Benjamin Thaut
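
If `Vertex` declares no explicit constructor, as in the struct quoted in this thread, there is no `__ctor` member to call. A possible alternative with the same intent is to write the fields directly into the buffer slot. `emplaceVertex` is a hypothetical helper, and whether it avoids the temporary depends on the compiler version:

```d
struct Vertex {
    float x, y, z, nx, ny, nz, u_terrain, v_terrain, u_biome, v_biome;
}

// Fill an existing buffer slot field by field instead of assigning
// a freshly built Vertex value to it.
void emplaceVertex(ref Vertex slot,
                   float x, float y, float z,
                   float nx, float ny, float nz,
                   float u, float v,
                   float u_biome, float v_biome) {
    slot.x = x;   slot.y = y;   slot.z = z;
    slot.nx = nx; slot.ny = ny; slot.nz = nz;
    slot.u_terrain = u;
    slot.v_terrain = v;
    slot.u_biome = u_biome;
    slot.v_biome = v_biome;
}
```

A call would look like `emplaceVertex(buffer[elements++], x, y, z, nx, ny, nz, u, v, u_biome, v_biome);`.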
August 24, 2012
> Check the disassembly view of this line:
> buffer[elements++] = Vertex(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);
>
> If you are using an old version of dmd, it will allocate a block of
> memory the size of Vertex, fill the data into that block, and then
> memcpy it to your buffer array.
>
> You could try working around this by doing:
>
> buffer[elements++].__ctor(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);
>
> Kind Regards
> Benjamin Thaut

That's not the problem. The problem has nothing to do with the tessellation, since the *rendering* is also 1000% slower (when all data is already processed).
August 27, 2012
On Aug 24, 2012, at 1:16 PM, David <d@dav1d.de> wrote:
> 
> That's not the problem. The problem has nothing to do with the tessellation, since the *rendering* is also 1000% slower (when all data is already processed).

Is the alignment different between one and the other? I wouldn't think so since it's dynamic memory, but the performance difference suggests that it might be.
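
A quick way to compare the two versions is to test the buffer's start address against a boundary. This is a sketch only: `isAligned` is a hypothetical helper, and the 16-byte boundary in the usage note is an assumption, not a known requirement of the engine or the GL driver.

```d
// Returns true when the address `p` is a multiple of `boundary` bytes.
bool isAligned(const(void)* p, size_t boundary) {
    return (cast(size_t) p) % boundary == 0;
}
```

Printing `isAligned(buffer.ptr, 16)` in both the `float[]` build and the `Vertex` build would show whether the allocations actually differ.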