ARSD PNG memory usage
June 17, 2016
Hi, so, do you have any idea why when I load an image with png.d it takes a ton of memory?

I have a 3360x2100 image that should take around 26MB of memory uncompressed, plus a bunch of other smaller PNG files.

Are you keeping multiple buffers of the image around? A trueimage, a memoryimage, an opengl texture thing that might be in main memory, etc? Total file space of all the images is only about 3MB compressed and 40MB uncompressed. So it's using around 10x more memory than it should! I tried a GC collect and all that.
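For reference, the "around 26MB" figure can be checked with a few lines of D. This is only a sketch, assuming the decoded image is stored as 8-bit RGBA (4 bytes per pixel); `rgbaBytes` is a made-up helper name, not part of png.d.

```d
import std.stdio : writefln;

/// Bytes needed for an uncompressed image at 4 bytes per pixel (8-bit RGBA).
size_t rgbaBytes(size_t width, size_t height)
{
    return width * height * 4;
}

void main()
{
    enum double mib = 1024.0 * 1024.0;
    immutable bytes = rgbaBytes(3360, 2100); // 3360 * 2100 * 4 = 28,224,000
    writefln("%s bytes = %.1f MiB", bytes, bytes / mib);
}
```

So one decoded copy of the big image is a bit under 27 MiB; anything far above that has to come from extra copies or uncollected temporaries.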

I don't think my program will have a chance in hell using that much memory. That's just a few images for GUI work. I'll be loading full-page PNGs later on that might have many pages (100+) that I would want to pre-cache. This would probably cause the program to use TBs of space.

I don't know where to begin diagnosing the problem. I am using openGL but I imagine that shouldn't really allocate anything new?

I have embedded the images using `import` but that shouldn't really add much size(since it is compressed) or change things.

You could try it out yourself on a test case to see? (might be a windows thing too) Create a high res image(3000x3000, say) and load it like

auto eImage = cast(ubyte[])import("mylargepng.png");

TrueColorImage image = imageFromPng(readPng(eImage)).getAsTrueColorImage;	
OpenGlTexture oGLimage = new OpenGlTexture(image); // Will crash without create2dwindow
//oGLimage.draw(0,0,3000,3000);


When I build a bare-minimum project (create2dWindow + event handler) I get 13% CPU (on an 8-core Skylake at 4GHz) and 14MB of memory.

When I add the code above I get 291MB of memory (for one image).

Here's the full D code source:


module winmain;

import arsd.simpledisplay;
import arsd.png;
import arsd.gamehelpers;

void main()
{
    auto window = create2dWindow(1680, 1050, "Test");

    auto eImage = cast(ubyte[]) import("Mock.png");

    TrueColorImage image = imageFromPng(readPng(eImage)).getAsTrueColorImage; // 178MB
    OpenGlTexture oGLimage = new OpenGlTexture(image); // 291MB
    //oGLimage.draw(0,0,3000,3000);

    window.eventLoop(50,
        delegate ()
        {
            window.redrawOpenGlSceneNow();
        }
    );
}

Note that I have modified create2dWindow to take the viewport and set it to 2x as large in my own code (I removed that here). It shouldn't matter though, as it's the png and OpenGlTexture that seem to have the issue.

Surely once the image is loaded by OpenGL we could potentially discard the other copies, and virtually no extra memory would be required? I do use getpixel though; not sure if that could be used on an OpenGlTexture? I don't mind keeping a main memory copy, but I just need it to have a realistic size ;)

So two problems: 1 is the CPU usage, which I'll try to get more info on from my side when I can profile, and 2 is the 10x memory usage. If it doesn't happen on your machine, can you try the other platform (if 'nix, go for Windows, or vice versa)? This way we can get an idea where the problem might be.

Thanks!  Also, when I try to run the app in 64-bit windows, RegisterClassW throws for some reason ;/ I haven't been able to figure that one out yet ;/












June 17, 2016
On Friday, 17 June 2016 at 01:51:41 UTC, Joerg Joergonson wrote:
> Hi, so, do you have any idea why when I load an image with png.d it takes a ton of memory?

I've bumped into this previously. It allocates a lot of temporary arrays for decoded chunks of data, and I managed to reduce those allocations a bit, here's the version I used:
http://stuff.thedeemon.com/png.d
(last changed Oct 2014, so may need some tweaks today)

But most of the allocations are really caused by using std.zlib. This thing creates tons of temporary arrays/slices and they are not collected well by the GC. To deal with that I had to use GC arenas for each PNG file I decode. This way all the junk created during PNG decoding is eliminated completely after the decoding ends. See the gcarena module here:
https://bitbucket.org/infognition/dstuff
You may see Adam's PNG reader was really the source of motivation for it. ;)
June 17, 2016
On Friday, 17 June 2016 at 02:55:43 UTC, thedeemon wrote:
> I've bumped into this previously. It allocates a lot of temporary arrays for decoded chunks of data, and I managed to reduce those allocations a bit, here's the version I used:

If you can PR any of it to me, I'll merge.

It actually has been on my todo list for a while to change the decoder to generate less garbage. I have had trouble in the past with temporary arrays being pinned by false pointers and the memory use ballooning from that, and the lifetime is really easy to manage, so just malloc/freeing it would be an easy solution. Just like you said, std.zlib basically sucks, so I have to use the underlying C functions, and I just haven't gotten around to it.
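The malloc/free idea can be sketched like this. It is an illustration only, not the code that ended up in png.d: `withScratchBuffer` is a hypothetical helper, and filling the buffer with 0x7f merely stands in for inflating a chunk.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;

/// Hypothetical helper: runs `work` with a scratch buffer that lives outside
/// the GC heap and is released as soon as `work` returns.
void withScratchBuffer(size_t size, scope void delegate(ubyte[] scratch) work)
{
    auto p = cast(ubyte*) malloc(size);
    if (p is null && size != 0)
        onOutOfMemoryError();
    scope (exit) free(p); // runs even if `work` throws
    work(p[0 .. size]);
}

void main()
{
    ubyte[] result; // only the data we keep lives on the GC heap

    withScratchBuffer(64 * 1024, (scratch) {
        scratch[] = 0x7f;              // stand-in for inflating a chunk
        result = scratch[0 .. 16].dup; // copy out what must outlive the call
    });

    assert(result.length == 16 && result[0] == 0x7f);
}
```

Because the scratch memory never enters the GC heap, a false pointer cannot pin it, and it is gone the moment decoding of that chunk is done.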


June 17, 2016
On Friday, 17 June 2016 at 01:51:41 UTC, Joerg Joergonson wrote:
> Are you keeping multiple buffers of the image around? A trueimage, a memoryimage, an opengl texture

MemoryImage and TrueColorImage are the same thing here: MemoryImage is just the interface, TrueColorImage is the implementation.

OpenGL texture is separate, but it references the same memory as a TrueColorImage, so it wouldn't be adding.


You might have pinned temporary buffers though. That shouldn't happen on 64 bit, but on 32 bit I have seen it happen a lot.

> When I do a bare loop minimum project(create2dwindow + event handler) I get 13% cpu(on 8-core skylake 4ghz) and 14MB memory.

I haven't seen that here.... but I have a theory now: you have some pinned temporary buffer on 32 bit (on 64 bit, the GC would actually clean it up) that keeps memory usage near the collection boundary.

Then, a small allocation in the loop - which shouldn't be happening, I don't see any in here... - but if there is a small allocation I'm missing, it could be triggering a GC collection cycle each time, eating CPU to scan all that wasted memory without being able to free anything.

If you can run it in the debugger and just see where it is by breaking at random, you might be able to prove it.
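Besides breaking in the debugger, the pinned-buffer theory can also be poked at from code. A rough sketch, assuming a druntime recent enough to have `core.memory.GC.stats`; the 32 MiB array is only a stand-in for a decode temporary, and `gcUsedMiB` is a made-up helper name.

```d
import core.memory : GC;
import std.stdio : writefln;

/// GC heap currently in use, in MiB.
size_t gcUsedMiB()
{
    return GC.stats().usedSize >> 20;
}

void main()
{
    // Stand-in for a large temporary created while decoding.
    auto temp = new ubyte[](32 * 1024 * 1024);
    temp[0] = 1;
    writefln("holding temp:  %s MiB used", gcUsedMiB());

    temp = null;   // drop the only intended reference
    GC.collect();  // force a full collection
    GC.minimize(); // return free pools to the OS where possible
    writefln("after collect: %s MiB used", gcUsedMiB());
}
```

If the second number does not drop, something (a stale stack slot, a register, or on 32 bit a random integer that happens to look like an address) is still keeping the block alive. Running the real program with `--DRT-gcopt=profile:1` prints the number of collections at exit, which would show whether the event loop is triggering a collection on every iteration.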

That's a possible theory.... I can reproduce the memory usage here, but not the CPU usage though. Sitting idle, it is always <1% here (0 if doing nothing, like 0.5% if I move the mouse in the window to generate some activity)

 I need to get to bed though, we'll have to check this out in more detail later.


> Thanks!  Also, when I try to run the app in 64-bit windows, RegisterClassW throws for some reason ;/ I haven't been able to figure that one out yet ;/

errrrrr this is a mystery to me too... a hello world on 64 bit seems to work fine, but your program tells me error 998 (invalid memory access) when I run it. WTF, both register class the same way.

I'm kinda lost on that.
June 17, 2016
On Friday, 17 June 2016 at 04:32:02 UTC, Adam D. Ruppe wrote:
> On Friday, 17 June 2016 at 01:51:41 UTC, Joerg Joergonson wrote:
>> Are you keeping multiple buffers of the image around? A trueimage, a memoryimage, an opengl texture
>
> MemoryImage and TrueImage are the same thing, memory is just the interface, true image is the implementation.
>
> OpenGL texture is separate, but it references the same memory as a TrueColorImage, so it wouldn't be adding.
>

ok, then it's somewhere in TrueColorImage or the loading of the png.

>
> You might have pinned temporary buffers though. That shouldn't happen on 64 bit, but on 32 bit I have seen it happen a lot.
>

OK, IIRC both the x64 and x86 LDC builds had high memory usage too, so if pinning shouldn't happen on 64-bit (and if that applies to LDC), then this is not the problem. I'll run -vgc on it and see if it shows anything interesting.

>> When I do a bare loop minimum project(create2dwindow + event handler) I get 13% cpu(on 8-core skylake 4ghz) and 14MB memory.
>
> I haven't seen that here.... but I have a theory now: you have some pinned temporary buffer on 32 bit (on 64 bit, the GC would actually clean it up) that keeps memory usage near the collection boundary.

Again, it might be true but I'm pretty sure I saw the problem with ldc x64.

> Then, a small allocation in the loop - which shouldn't be happening, I don't see any in here... - but if there is a small allocation I'm missing, it could be triggering a GC collection cycle each time, eating CPU to scan all that wasted memory without being able to free anything.
>

Ok, Maybe... -vgc might show that.

> If you can run it in the debugger and just see where it is by breaking at random, you might be able to prove it.
>

Good idea, I hadn't thought about doing that ;) Might be a crap shoot but who knows...

> That's a possible theory.... I can reproduce the memory usage here, but not the CPU usage though. Sitting idle, it is always <1% here (0 if doing nothing, like 0.5% if I move the mouse in the window to generate some activity)
>
>  I need to get to bed though, we'll have to check this out in more detail later.
>
me too ;) I'll try to test stuff out a little more when I get a chance.

>
>> Thanks!  Also, when I try to run the app in 64-bit windows, RegisterClassW throws for some reason ;/ I haven't been able to figure that one out yet ;/
>
> errrrrr this is a mystery to me too... a hello world on 64 bit seems to work fine, but your program tells me error 998 (invalid memory access) when I run it. WTF, both register class the same way.
>
> I'm kinda lost on that.

Well, it works on LDC x64! Again ;) This seems like an issue with DMD x64? I was thinking maybe it has to do with the layout of the struct or something, but I'm not sure.

---

I just ran a quick test:

LDC x64 uses about 250MB and 13% cpu.

I couldn't check on x86 because of the error

phobos2-ldc.lib(gzlib.c.obj) : fatal error LNK1112: module machine type 'x64' conflicts with target machine type 'X86'

Not sure what that means with gzlib.c.obj. Must be another bug in the LDC alpha ;/


Anyways, We'll figure it all out at some point ;) I'm really liking your lib by the way. It's let me build a gui and get a lot done and just "work". Not sure if it will work on X11 with just a recompile, but I hope ;)

June 17, 2016
On Friday, 17 June 2016 at 03:41:02 UTC, Adam D. Ruppe wrote:
> It actually has been on my todo list for a while to change the decoder to generate less garbage. I have had trouble in the past with temporary arrays being pinned by false pointers and the memory use ballooning from that, and the lifetime is really easy to manage so just malloc/freeing it would be an easy solution, just like you said, std.zlib basically sucks so I have to use the underlying C functions and I just haven't gotten around to it.

did that. decoding still sux, but now it should suck less. ;-) encoder is still using "std.zlib", though. next time, maybe.
June 17, 2016
On Friday, 17 June 2016 at 04:54:27 UTC, Joerg Joergonson wrote:
> LDC x64 uses about 250MB and 13% cpu.
>
> I couldn't check on x86 because of the error
>
> phobos2-ldc.lib(gzlib.c.obj) : fatal error LNK1112: module machine type 'x64' conflicts with target machine type 'X86'
>
> not sure what that means with gzlib.c.obj. Must be another bug in ldc alpha ;/

It looks like you're trying to link 32-bit objects to a 64-bit Phobos.
The only pre-built LDC for Windows capable of linking both 32-bit and 64-bit code is the multilib CI release, see https://github.com/ldc-developers/ldc/releases/tag/LDC-Win64-master.
June 17, 2016
On Friday, 17 June 2016 at 04:54:27 UTC, Joerg Joergonson wrote:
> ok, then it's somewhere in TrueColorImage or the loading of the png.

So, opengltexture actually does reallocate if the size isn't right for the texture... and your image was one of those sizes.

The texture pixel size needs to be a power of two, so 3000 gets rounded up to 4096, which means an internal allocation.

But it can be a temporary one! So ketmar tackled png.d's loaders' temporaries and I took care of gamehelpers.d's...

And the test program went down to about 1/3 of its memory usage. Try grabbing the new ones from github now and see if it works for you too.
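To make the rounding and the temporary concrete, here is a minimal sketch. It is not the actual gamehelpers.d code: `roundUpToPowerOfTwo` and `withPaddedCopy` are hypothetical names, and the `upload` delegate stands in for whatever hands the pixels to OpenGL.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : calloc, free;

/// Smallest power of two that is >= n (for n >= 1).
size_t roundUpToPowerOfTwo(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/// Copies a tightly packed RGBA image into the top-left corner of a zeroed,
/// power-of-two sized buffer, passes it to `upload`, then frees it again.
void withPaddedCopy(const(ubyte)[] rgba, size_t width, size_t height,
    scope void delegate(const(ubyte)[] padded, size_t paddedWidth, size_t paddedHeight) upload)
{
    assert(width > 0 && height > 0 && rgba.length == width * height * 4);
    immutable pw = roundUpToPowerOfTwo(width);
    immutable ph = roundUpToPowerOfTwo(height);
    immutable rowBytes = width * 4;

    auto p = cast(ubyte*) calloc(pw * ph, 4);
    if (p is null)
        onOutOfMemoryError();
    scope (exit) free(p); // the padded copy is only a temporary

    auto padded = p[0 .. pw * ph * 4];
    foreach (y; 0 .. height)
        padded[y * pw * 4 .. y * pw * 4 + rowBytes] = rgba[y * rowBytes .. (y + 1) * rowBytes];

    upload(padded, pw, ph);
}

void main()
{
    auto pixels = new ubyte[](3 * 2 * 4); // a 3x2 RGBA image
    pixels[] = 255;

    withPaddedCopy(pixels, 3, 2, (padded, pw, ph) {
        assert(pw == 4 && ph == 2);  // 3 rounds up to 4, 2 stays 2
        assert(padded[0] == 255);    // original pixel data
        assert(padded[3 * 4] == 0);  // padding column stays zeroed
    });
}
```

With the same rule, a 3000 pixel side becomes 4096, and the padded copy exists only for the duration of the upload.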


> Well, It works on LDC x64! again ;) This seems like an issue with DMD x64? I was thinking maybe it has to do the layout of the struct or something, but not sure.

I have a fix for this too, though I don't understand why it works....

I just .dup'd the string literal before passing it to Windows. I think dmd is putting the literal in a bad place for these functions (they do bit tests to see if it is a pointer or an atom, so maybe it is in an address where the wrong bits are set)

In any case, the .dup seems to fix it, so all should work on 32 or 64 bit now. In my tests, now that the big temporary arrays are manually freed, the memory usage is actually slightly lower on 32 bit, but it isn't bad on 64 bit either.
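For what it's worth, the pointer-or-atom test and the copy can be sketched in portable D. This is a guess at the mechanism, not a confirmed diagnosis: Windows documents that a class name whose bits above the low 16 are all zero is treated as an atom, and `wouldBeTreatedAsAtom`, `heapCopyZ` and the class name used here are all made up for illustration.

```d
/// Mirrors the documented Win32 convention: a class-name argument whose bits
/// above the low 16 are all zero is taken to be an atom, not a string pointer.
bool wouldBeTreatedAsAtom(const(void)* p)
{
    return (cast(size_t) p >> 16) == 0;
}

/// Copies `name` into a fresh, zero-terminated buffer on the GC heap.
wchar[] heapCopyZ(const(wchar)[] name)
{
    auto buf = new wchar[](name.length + 1);
    buf[0 .. name.length] = name[];
    buf[$ - 1] = 0; // explicit terminator for the C API
    return buf;
}

void main()
{
    auto className = heapCopyZ("ExampleWindowClass"w);
    assert(className[$ - 1] == 0);
    assert(!wouldBeTreatedAsAtom(className.ptr));      // a normal heap pointer
    assert(wouldBeTreatedAsAtom(cast(void*) 0xC001));  // small value: atom range
}
```

One thing to watch with a plain `.dup` of a literal: the literal's implicit trailing zero is not part of its length, so the copy is not guaranteed to be zero-terminated unless the terminator is added explicitly, as above.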


The CPU usage is consistently very low on my computer. I still don't know what could be causing it for you, but maybe it is the temporary garbage... let us know if the new patches make a difference there.

> Anyways, We'll figure it all out at some point ;) I'm really liking your lib by the way. It's let me build a gui and get a lot done and just "work". Not sure if it will work on X11 with just a recompile, but I hope ;)


It often will! If you aren't using any of the native event handler functions or any of the impl.* members, most things just work (exception being the windows hotkey functions, but those are marked Windows anyway!). The basic opengl stuff is all done for both platforms. Advanced opengl isn't implemented on Windows yet though (I don't know it; my opengl knowledge stops in like 1998 with opengl 1.1 sooooo yeah, I depend on people's contributions for that and someone did Linux for me, but not Windows yet. I think.)
June 17, 2016
On Friday, 17 June 2016 at 14:48:22 UTC, Adam D. Ruppe wrote:
> On Friday, 17 June 2016 at 04:54:27 UTC, Joerg Joergonson wrote:
>> [...]
>
> So, opengltexture actually does reallocate if the size isn't right for the texture... and your image was one of those sizes.
>
> [...]


Cool, I'll check all this out and report back. I'll look into the cpu issue too.

Thanks!
June 18, 2016
On Friday, 17 June 2016 at 14:48:22 UTC, Adam D. Ruppe wrote:
> On Friday, 17 June 2016 at 04:54:27 UTC, Joerg Joergonson wrote:
>> ok, then it's somewhere in TrueColorImage or the loading of the png.
>
> So, opengltexture actually does reallocate if the size isn't right for the texture... and your image was one of those sizes.
>
> The texture pixel size needs to be a power of two, so 3000 gets rounded up to 4096, which means an internal allocation.
>
> But it can be a temporary one! So ketmar tackled png.d's loaders' temporaries and I took care of gamehelper.d's...
>
> And the test program went down about to 1/3 of its memory usage. Try grabbing the new ones from github now and see if it works for you too.
>

Yes, same here! Great! It runs around 122MB in x86 and 107MB x64. Much better!

>
>> Well, It works on LDC x64! again ;) This seems like an issue with DMD x64? I was thinking maybe it has to do the layout of the struct or something, but not sure.
>
> I have a fix for this too, though I don't understand why it works....
>
> I just .dup'd the string literal before passing it to Windows. I think dmd is putting the literal in a bad place for these functions (they do bit tests to see if it is a pointer or an atom, so maybe it is in an address where the wrong bits are set)
>

Yeah, strange but good catch! It now works in x64! I modified it to to!wstring(title).dup simply to have the same title and classname.

> In any case, the .dup seems to fix it, so all should work on 32 or 64 bit now. In my tests, now that the big temporary arrays are manually freed, the memory usage is actually slightly lower on 32 bit, but it isn't bad on 64 bit either.

I have the opposite on memory but not a big deal.


> The CPU usage is consistently very low on my computer. I still don't know what could be causing it for you, but maybe it is the temporary garbage... let us know if the new patches make a difference there.

I will investigate this soon and report back anything. It probably is something straightforward.

>> Anyways, We'll figure it all out at some point ;) I'm really liking your lib by the way. It's let me build a gui and get a lot done and just "work". Not sure if it will work on X11 with just a recompile, but I hope ;)
>
>
> It often will! If you aren't using any of the native event handler functions or any of the impl.* members, most things just work (exception being the windows hotkey functions, but those are marked Windows anyway!). The basic opengl stuff is all done for both platforms. Advanced opengl isn't implemented on Windows yet though (I don't know it; my opengl knowledge stops in like 1998 with opengl 1.1 sooooo yeah, I depend on people's contributions for that and someone did Linux for me, but not Windows yet. I think.)

I found this on non-power of 2 textures:

https://www.opengl.org/wiki/NPOT_Texture


https://www.opengl.org/registry/specs/ARB/texture_non_power_of_two.txt

It seems like it's probably a quick and easy add-on, and since you already have the padding code, it could easily be made optional (set a flag or pass a bool or whatever).

It could definitely save some serious memory for large textures.

E.g., a 3000x3000x4 texture takes about 36MB, or 2^25.1 bytes. Since each side has to be rounded up to 4096, the texture becomes 4096x4096x4 = 2^26 bytes = 67MB, so the footprint almost doubles and the extra 31MB is wasted space.

Hence, allowing for non-power-of-two textures would probably reduce the memory footprint of my code to near 50MB (around 40MB being the minimum using uncompressed textures).
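The numbers for the 3000x3000 case work out as follows. A small sketch, assuming 4 bytes per pixel and that each side is rounded up independently; the function names are invented for the example.

```d
import std.stdio : writefln;

/// Smallest power of two that is >= n (for n >= 1).
size_t roundUpToPowerOfTwo(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/// Bytes for a tightly packed RGBA texture.
size_t textureBytes(size_t w, size_t h)
{
    return w * h * 4;
}

/// Bytes for the same texture once both sides are padded to powers of two.
size_t paddedTextureBytes(size_t w, size_t h)
{
    return textureBytes(roundUpToPowerOfTwo(w), roundUpToPowerOfTwo(h));
}

void main()
{
    immutable exact = textureBytes(3000, 3000);        // 36,000,000
    immutable padded = paddedTextureBytes(3000, 3000); // 4096 * 4096 * 4 = 67,108,864
    writefln("exact:  %s bytes", exact);
    writefln("padded: %s bytes", padded);
    writefln("wasted: %s bytes (%.2fx the exact size in total)",
        padded - exact, cast(double) padded / exact);
}
```

That is about 31MB of padding for a single texture, which is where most of the possible saving from non-power-of-two support would come from.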

I might try to get a working version of that at some point. Going to deal with the cpu thing now though.

Thanks again.

