April 08, 2015
I have technically finished version 0.2 of my graphics engine, which runs at a reasonable speed at low internal resolutions and with only a couple of sprites, but it still gets bottlenecked a lot. First I'll throw out the "top-down determination algorithm", as it requires constant memory paging (although it made much more sense when the engine was fully OO and even slower).

Instead I'll use an overwriting ("bottom-up") method. It still needs constant updates, and I have to drop the per-sprite transparency key in favor of a per-layer key; however, it requires much less paging and still allows an unbounded number of layers and sprites of unbounded size.
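
A minimal sketch of the bottom-up idea in C, for one scanline. The `LayerLine` struct and `composite_line` function are hypothetical names made up for illustration, not part of the engine; the sketch assumes every layer covers the full scanline and marks transparent pixels with a single per-layer key value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layer: one scanline of 16-bit palette indices plus the
   index value that this layer treats as transparent. */
typedef struct {
    const uint16_t *pixels;
    uint16_t        key;
} LayerLine;

/* Composite the layers bottom-up into dst: each later layer overwrites
   what is already there wherever its pixel differs from its key. */
void composite_line(uint16_t *dst, size_t width,
                    const LayerLine *layers, size_t layer_count)
{
    for (size_t l = 0; l < layer_count; ++l) {
        const uint16_t *src = layers[l].pixels;
        const uint16_t  key = layers[l].key;
        for (size_t x = 0; x < width; ++x) {
            if (src[x] != key)
                dst[x] = src[x];
        }
    }
}
```

Each layer is read once, front to back in memory, which is where the reduced paging would come from; the cost is that pixels hidden by upper layers still get written.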

I also came up with the idea of reading slices out of the graphical elements to potentially speed up the process a bit, especially as the custom bitmaps it uses are 16-bit for palette operations, so per-pixel read operations would waste a portion of the memory bus. So should I write a method for the bitmap class that gets a line from it? (An array slice, as the class keeps its data in a single 1D array to avoid jagged arrays in a future expansion for a scaler.) And can I write an array slice at a given position of an array? (To reduce writeback calls.)
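
To make the two questions concrete, here is a rough C sketch. `Slice16`, `Bitmap16`, `bitmap_line` and `slice_write` are hypothetical names invented for this example; it assumes the bitmap stores its pixels row-major in one 1D array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slice type: a pointer and a length, no pixel copy. */
typedef struct {
    uint16_t *ptr;
    size_t    len;
} Slice16;

/* Hypothetical bitmap: 16-bit palette indices in one row-major 1D array. */
typedef struct {
    uint16_t *data;
    size_t    width, height;
} Bitmap16;

/* Return row y as a slice; an empty slice if y is out of range. */
Slice16 bitmap_line(const Bitmap16 *bmp, size_t y)
{
    Slice16 s = { NULL, 0 };
    if (y < bmp->height) {
        s.ptr = bmp->data + y * bmp->width;
        s.len = bmp->width;
    }
    return s;
}

/* Copy src into dst starting at element offset pos with a single memcpy.
   Returns the number of elements written (clipped to the end of dst). */
size_t slice_write(Slice16 dst, size_t pos, Slice16 src)
{
    if (pos >= dst.len)
        return 0;
    size_t n = src.len;
    if (n > dst.len - pos)
        n = dst.len - pos;
    if (n == 0)
        return 0;
    memcpy(dst.ptr + pos, src.ptr, n * sizeof(uint16_t));
    return n;
}
```

Getting the line costs only an address calculation, and the writeback is one call per run of pixels instead of one per pixel.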
April 09, 2015
On Wed, 08 Apr 2015 17:01:43 +0000,
"ZILtoid1991" <ziltoidtheomnicent@gmail.com> wrote:

> I have technically finished version 0.2 of my graphics engine, which runs at a reasonable speed at low internal resolutions and with only a couple of sprites, but it still gets bottlenecked a lot. First I'll throw out the "top-down determination algorithm", as it requires constant memory paging (although it made much more sense when the engine was fully OO and even slower).
> 
> Instead I'll use an overwriting ("bottom-up") method. It still needs constant updates, and I have to drop the per-sprite transparency key in favor of a per-layer key; however, it requires much less paging and still allows an unbounded number of layers and sprites of unbounded size.
> 
> I also came up with the idea of reading slices out of the graphical elements to potentially speed up the process a bit, especially as the custom bitmaps it uses are 16-bit for palette operations, so per-pixel read operations would waste a portion of the memory bus. So should I write a method for the bitmap class that gets a line from it? (An array slice, as the class keeps its data in a single 1D array to avoid jagged arrays in a future expansion for a scaler.) And can I write an array slice at a given position of an array? (To reduce writeback calls.)

I don't get the whole picture, but an array slice is just a
pointer and a length, so it doesn't access the pixels at all.
If you know you will be going over the whole scanline, by all
means get an array slice (or a C style raw pointer) and
process it in chunks that are as big as possible. For example,
copying with 128-bit SSE registers will be much faster than
copying 16-bit ushorts one at a time. (SSE4 also offers a table
lookup function.) Beyond that, since it is a gfx engine, why
not use OpenGL and drop the palette lookup altogether?
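
A rough sketch of the chunked copy in C, assuming an x86 target with
SSE2; `copy_line_u16` is a hypothetical name. It uses the unaligned
load/store intrinsics from `<emmintrin.h>`, so the scanline does not
have to be 16-byte aligned, and it falls back to a plain loop where
SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy count 16-bit pixels from src to dst, eight pixels (128 bits)
   at a time where SSE2 is available, then finish the tail one by one. */
void copy_line_u16(uint16_t *dst, const uint16_t *src, size_t count)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 8 <= count; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
#endif
    for (; i < count; ++i)
        dst[i] = src[i];
}
```

For a pure copy, `memcpy` is usually vectorized already; a hand-written
loop like this mainly pays off once per-pixel work such as a key test
is done on the register between the load and the store.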

-- 
Marco