Pragma msg goes out of memory when supplied with large data.

realhet
May 21

Hi,

I have a pragma(msg, xxx) statement where xxx is a byte array of 30 KBytes.

LDC2 produces the following symptom: its memory usage slowly climbs to the maximum (20GB), then it stops with an out-of-memory error. The amount of used memory grows at an exponentially slowing rate. (I guess it's a reallocation/append problem, but I can't see what's going on inside. For small data it's fine, but for 'big' data it's bad. Then again, pragma(msg) is not normally used for 'big' data.)

I can catch the contents on the other side of the stdout stream.

But it breaks something inside LDC2/DMD (win64 platform).

I also know it's not a normal scenario, but I have a reason to try hard and solve it.
(It's going to be a compile-time build/make system for GLSL shader code embedded into a D environment.)

If anyone has more knowledge on this, please help me. I tried to send a quoted string too, but that also failed, just like ubyte[]. All my experiments point in the direction that pragma(msg, xxx) has an upper limit for safe operation. For a few kilobytes it was perfect. With a hello-world compute shader it was perfect: the shader source received all the constants known at compile time, and the generated exe file received the compiled shader binary using the new string import feature. But with a larger production shader, my smile went away fast :S

Now I will try a new crazy idea: multipart pragma messages. I hope it will cancel out that exponential growth problem I'm seeing.

realhet
May 21

On Wednesday, 21 May 2025 at 14:00:56 UTC, realhet wrote:

Hi,

Small program to reproduce.

import std;

string doit(string data)()
{
	static foreach(i; 0..256)
	{
		// i"..." is an interpolated expression sequence; .text (std.conv)
		// renders it, ubyte[] formatting included, during CTFE
		pragma(msg, i"Here goes lots of data: $(cast(ubyte[])data)".text);
	}
	return data; // return the data so main() can verify the round trip
}

static immutable storedData = doit!("x".replicate(256));

void main(){ writeln(storedData.length, " ", storedData.all!"a=='x'"); }

run with: ldc2 testPragma.d > a.txt 2>&1

This is not so fast, but it works.

If I remove the static foreach(i; 0..256), it never completes: memory usage grows exponentially until it runs out.

I'm basically searching for the fastest version of the inverse of import(file).

I think the operations inside the pragma(msg, xxx) construct have a really high cost. I will experiment more.

realhet
May 21

On Wednesday, 21 May 2025 at 14:40:21 UTC, realhet wrote:

> On Wednesday, 21 May 2025 at 14:00:56 UTC, realhet wrote:
> Small program to reproduce.

I'm lucky:

pragma(msg, "Here comes the big data: ", data);

With this simple form I managed to push 16 MB of data through it in no time.
It was new to me that pragma(msg) doesn't care about the actual type of the data: it just copies the raw bytes to stderr as fast as it can.

But if I apply a high-level functional transformation to it, it becomes terribly slow.
For example, the machinery inside text() carries a really big penalty at compile time.
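
A minimal sketch of the contrast (my framing of the above; "big.bin" is a hypothetical file made available via the -J switch):

enum data = import("big.bin"); // fast input path: the file is embedded untouched by CTFE

pragma(msg, "big data: ", data);        // fast: the raw bytes are streamed straight to stderr
// pragma(msg, text("big data: ", data)); // slow: CTFE first builds a brand-new copy of the whole string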

Because of this, I wanted to write a simpler, non-functional transformation myself:

string transform(string a)
{
	auto b = cast(ubyte[])a.dup;             // dup to get a mutable copy
	foreach(ref ch; b) if(ch=='x') ch = 'y'; // in-place tweak: cheap at runtime, costly under CTFE
	return cast(string)b;
}

To my big surprise, even this simple thing managed to trigger the same out-of-memory 'bug'.

Now I only have one question: why are these simple-looking things so slow at compile time? I've learned that compile time offers a different, limited runtime environment, but I don't understand this. Can anyone help with some clues, please?

realhet
6 days ago

On Wednesday, 21 May 2025 at 15:20:57 UTC, realhet wrote:

> Why are these simple-looking things so slow at compile time?

Now I've learned why: I had the misconception that CTFE uses the compiler itself to generate code and then runs it on the CPU. In reality, I've found out, it's an interpreter.
On top of this, mutating data, even a single byte of a 1 MByte array, will allocate the whole array again; that's why I got the out-of-memory errors even with 64K of data.
I also know that the compiler only deallocates at the very end.
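
A minimal sketch of the failure mode (my illustration of the behavior described above, not of the compiler internals):

string makeData()
{
	auto buf = new char[64 * 1024];
	// Under CTFE, each of these writes can reallocate the whole array,
	// and the compiler frees nothing until the very end, so peak memory
	// behaves like O(N^2) instead of O(N).
	foreach (ref c; buf) c = 'x';
	return cast(string)buf;
}
enum data = makeData(); // enum forces CTFE: cheap at runtime, explosive at compile time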

I will keep these limitations in mind; CTFE is still an awesome thing.

My experiments with integrating external data processing into the LDC2 compilation process were a success:

  • export: It was possible to export data (20 MByte) from the compiler in seconds using pragma(msg, ...). Just don't touch the large data with CTFE.
  • import: It was also possible to import large data fast by using import(filename), where the file was served using ProjectedFS.

This is basically a modified version of the StringImport (-J) language construct: it is a StringImport that redirects data to an external program and imports that external program's output back into the D source code.
(It's also an even bigger security risk than the original one :D)

Dennis
6 days ago

On Friday, 23 May 2025 at 12:06:00 UTC, realhet wrote:

> export: It was possible to export data (20 MByte) from the compiler in seconds using pragma(msg, ...). Just don't touch the large data with CTFE.

pragma(msg) is meant to print informative human-readable strings for debugging purposes. It's not designed for large data or consistent results. A pull request to the compiler improving the formatting of a pragma(msg) might break your setup.

Please carefully consider whether a convoluted build system is worth it. I've done some cute GLSL+D metaprogramming in the past as well, with the goal of automatically binding uniforms and vertex attributes between the two languages seamlessly, but I'm not using it anymore because the impedance mismatch is too high.

I'm talking: different alignments (std140 and std430), booleans are 4 bytes, normalized and packed integers need special types on the D side, vec3 and float[3] both exist in OpenGL, there are 36 sampler types, OpenGL's reflection API has holes (you can't query the pixel format from layout(binding=2, rg32f)), column/row-major matrix layout, etc.
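
To pick just one of those, a hedged sketch of the std140 mismatch (my example, not Dennis's code): given a GLSL block layout(std140) uniform U { vec3 dir; float t; bool flag; }, a naive D mirror gets the layout wrong:

struct NaiveU { float[3] dir; float t; bool flag; } // wrong: flag is 1 byte here, no 16-byte vec3 alignment

struct Std140U
{
	align(16) float[3] dir; // std140 gives vec3 16-byte alignment
	float t;                // a following scalar packs into the vec3's 4th slot
	uint flag;              // a GLSL bool occupies 4 bytes
}
static assert(Std140U.flag.offsetof == 16);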

This results in a complex system that's more annoying to deal with than the original problem of just maintaining a .d file and shader file in parallel, which is what I'm doing now for the time being.

... but if that didn't scare you off, you can use __ctfeWrite 😉

string f(string s) { __ctfeWrite(s); return s; }
enum x = f("Hello world\n"); // prints "Hello world\n" to stderr during compilation

I recommend keeping it simple, cover common cases and don't try to make it perfect, because it won't be. But if you do somehow make it perfect, post your results, I'd love to see it!

Ali Çehreli
6 days ago
On 5/23/25 6:10 AM, Dennis wrote:

> This results in a complex system that's more annoying to deal with than
> the original problem of just maintaining a .d file and shader file in
> parallel, which is what I'm doing now for the time being.

Approved! Sounds like engineering to me. :)

Ali

realhet
5 days ago

On Friday, 23 May 2025 at 22:11:58 UTC, Ali Çehreli wrote:

> On 5/23/25 6:10 AM, Dennis wrote:
>> This results in a complex system that's more annoying to deal with than the original problem of just maintaining a .d file and shader file in parallel, which is what I'm doing now for the time being.
>
> Approved! Sounds like engineering to me. :)

I totally accept your worries, and I've thought about this critical dilemma a lot in the past too.
This is the very first time I'm going to use an external 'hack' to extend the capabilities of LDC2.exe to fit my needs.

Here are my reasons why:

Maintaining two sets of code is too much for my brain. I notice the following bad pattern from time to time:

  • I have an idea,
  • I try to implement it quickly,
  • then I end up with a bug (caused by my own human error),
  • fighting with it for hours or days; then my client can't understand why it took so long, and both of us get frustrated. The bad outcome is that I'd rather not touch this code anymore. Innovation stops in order to avoid frustration.

So I really need the machine's help with these tasks that require 100% focus.
That way I, as a human, can easily experiment with stuff without worrying that I'll cause nasty hidden bugs.

If a change to the compiler's internal behavior breaks my 'hack', I will notice it, because I'm sending a hashOf() as well. But so far so good. :D

Every year I spend some weeks (maybe a month) catching up with the new features of D and using them in my framework. And I'm happy to see the progress; I don't care if it sometimes breaks my stuff, I can detect that and fix it. My recent favourite feature is $(), without a doubt!

An addition to the difficulty of this redundant CPU/GPU code management is this: I'm transitioning from OpenGL to Vulkan. And Vulkan checks nothing; from now on, I have to check everything. Alone I surely can't do this, I know my limits, but I'm sure that with D's metaprogramming I will. Vulkan doesn't even care about the human-readable identifiers inside a 'shader binary': no sampler names, no 'entity framework' at all, just integer indices. (And I think Vulkan is a piece of art. It doesn't try to make the user's job easier; its only aim is to give full control over the underlying hardware.)

I can't avoid this difficult transition either: I need Vulkan because I want to unlock true multithreading. I'm going to have a lot of PCIe traffic from cameras while I want my UI to run at 60 FPS. Vulkan is ideal for this: I can access all components individually. OpenGL serializes all work; it is easy to clog it with a 10 MB texture.

One more thing: I just took a look at the std430 alignment requirements in the Vulkan documentation, and I was like: no way am I doing and guaranteeing all of this manually.

Anyways, I can't go back :D

So, as Dennis asked earlier, here is a small demo of my experiments with this 'hack':

This is the way of declaring an embedded source code inside a D module:

First I declare fields for the uniform buffer:

enum UBO_fields =
q{
	uint 	param0,
		param1;
};
static struct UBO { mixin(UBO_fields); }
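
Since the same token string defines both sides, the CPU can fill the uniform buffer with a plain struct literal (a hypothetical usage sketch of mine, not part of the demo):

UBO ubo = { param0: 3, param1: 7 };
// ubo is then copied byte-for-byte into the mapped Vulkan uniform buffer;
// the shader sees the identical field layout via $(UBO_fields).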

Then I declare some constants that are accessible both from the shader and from the CPU program; I also specify the command-line args for the external compiler:

enum groupSize    = 1024,
     bufSize      = 64 << 20,
     shaderBinary = (碼!(iq{glslc -O}.text, iq{
	#version 430

	layout(local_size_x = $(groupSize)) in;

	layout(binding = 0) uniform UBO {$(UBO_fields)};

	layout(std430, binding = 1) buffer BUF { uint values[]; };

	void main() {
		const uint id = gl_GlobalInvocationID.x;
		values[id] = values[id] * param0 + param1 + 1000;
	}
}.text));

(Note that I use Chinese identifier characters to mark the machine-related parts of my code.)

When the D compiler reaches the 碼 template, it does the following:

template 碼/+ExternalCode+/(string args, string src, string FILE=__FILE__, int LINE=__LINE__)
{
	pragma(msg, "$DIDE_TEXTBLOCK_BEGIN$");
	pragma(msg, src /+This is large data to CTFE, so do not touch it here!!!+/);
	pragma(msg, "$DIDE_TEXTBLOCK_END$");
	enum hash = src.hashOf(args.hashOf).to!string(26) /+hashOf CTFE performance: 1.2ms / 1KB+/;
	pragma(msg, FILE, "(", LINE, ",1): $DIDE_EXTERNAL_COMPILATION_REQUEST: ", args.quoted, ",", hash.quoted);
	//The TEXTBLOCK will be attached to the end of the compilation request message as a string literal inside DIDE.
	enum 碼 = (cast(immutable(ubyte)[])(import(hash)));
	static if(碼.startsWith(cast(ubyte[])("ERROR:")))
	{
		pragma(msg, (cast(string)(碼)).splitter('\n').drop(1).join('\n'));
		static assert(false, "$DIDE_EXTERNAL_COMPILATION_"~(cast(string)(碼)).splitter('\n').front);
	}
}

It sends the following things out to stderr (a mock-up of the stream follows below):

  • a begin-of-data marker,
  • the actual data: the source code as-is,
  • an end-of-data marker,
  • finally, a command that looks like a standard error message and contains the following info:
    • the source-code location of the template instantiation,
    • the command-line arguments as a quoted string (this part is small, so CTFE performance is fine here),
    • the hash of the large source code.
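
Roughly, one such exchange looks like this on the stream (a mock-up I reconstructed from the template above; the file name, line and hash value are invented placeholders):

$DIDE_TEXTBLOCK_BEGIN$
#version 430
... the shader source, byte for byte ...
$DIDE_TEXTBLOCK_END$
main.d(42,1): $DIDE_EXTERNAL_COMPILATION_REQUEST: "glslc -O","1a2b3c"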

At this point my build system, running in another process, parses the stderr messages:

  • It detects the begin/end markers and collects the large source code between them.
  • It catches the message with the source location, the command line, and the hash (in textual form).

So now the build system has all the data it needs to begin the external compilation.
In my example it calls glslc.exe and gives it a file with the source code that was received earlier from stderr.
After the compilation finishes, it puts the resulting binary (or the compilation error messages) into an associative array that acts as a cache. This compilation is also incremental, just like that of the D modules.
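
A minimal sketch of such a scanner (my own reconstruction, not the actual build system; the function name and structure are assumptions):

import std.algorithm.searching : canFind;
import std.array : appender;
import std.stdio : File;

// Collects the text between the begin/end markers and watches for the
// compilation-request line on the compiler's stderr.
void scanCompilerStderr(File pipe)
{
	bool inBlock = false;
	auto src = appender!string();
	foreach (line; pipe.byLine)
	{
		if (line == "$DIDE_TEXTBLOCK_BEGIN$") { inBlock = true; src = appender!string(); }
		else if (line == "$DIDE_TEXTBLOCK_END$") inBlock = false;
		else if (inBlock) { src.put(line); src.put('\n'); }
		else if (line.canFind("$DIDE_EXTERNAL_COMPILATION_REQUEST:"))
		{
			// Parse 'file(line,1): ...: "args","hash"' here, then hand
			// src.data to glslc and cache the result under the hash key.
		}
	}
}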

Next, the D compiler reaches the string-import statement in the D module:

enum 碼 = (cast(immutable(ubyte)[])(import(hash)));

The name of the file is the hash, and the path of the file is served by ProjectedFS. (Now that I've worked with it on Windows for a week, I can say it's a stable, reliable technology: I use it to virtually serve arbitrary files inside a directory.)

So LDC2.exe opens the import(file) while, in the build system, glslc.exe is still compiling the shader.
When glslc.exe finishes, the file contents are handed to ProjectedFS.

Now, inside LDC2.exe, the fread() call returns, and LDC2.exe has the shader binary.

Other ways to do this:

A second chance, if the 'hack' stops working because of a compiler change:

  1. Compile the modules normally, in an incremental way.
  2. Run the exe in a special 'mode', so it exports the shader sources generated in D CTFE. (Difficult: what if there are multiple modules?)
  3. Compile the shaders.
  4. Zip the shaders and append them to the end of the exe file.

A third chance:
Move the shader compilation task (glslc, the Vulkan SDK) to the client's machine. I don't want this.

So I have more options, but the best one for me was extending the functionality of LDC2.exe with a non-standard thing. (I had my own macro preprocessor for Delphi around 15-20 years ago. It was awesome, but it also had a heavy IDE-integration dependency, and I burned my hands with that, haha. But this time I'm optimistic.)

I can only choose the fastest way because I'm lone-wolfing; this tool lets me keep control over 100 KLOC, which is kind of my limit. Once a team is involved in the development, such tools easily transform from helpful goodies into a big burden that no one wants to deal with, I know.

(Thanks for __ctfeWrite! I can't use it right now because I'm depending on LDC2, but I saved it in my stash. ;) )

realhet
5 days ago

On Friday, 23 May 2025 at 13:10:47 UTC, Dennis wrote:

> I recommend keeping it simple, cover common cases and don't try to make it perfect, because it won't be. But if you do somehow make it perfect, post your results, I'd love to see it!

When you look at it through my glasses (a graphical IDE), I believe it is much simpler than the source code I pasted here as text.

youtube link, (headphone users BEWARE :D ) -> https://youtu.be/UiDfDvsZNxw
"Experimental DLang IDE - Embedded Vulkan Compute Hello World"

It's not perfect: yesterday I had lockups, but I also had out-of-memory errors, so that's OK.

But it's good enough.
Feasibility test = passed for sure!

I will have a month ahead of me for testing and using it.