August 01
On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:
> The OP's issue is looking at this the wrong way. The responsibility is on the user to validate their input.

Technically, any function calling `toStringz()` should probably inherit that warning from the doc comment of `toStringz` into its own documentation.
August 01

On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:
> If we checked for mid-string zero terminators on all calls to toStringz, we would kill performance where mostly it isn't necessary

I bet the speed test is effectively a tie if it's reasonably written

August 01

On Friday, 1 August 2025 at 00:22:20 UTC, monkyyy wrote:
> On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:
> > If we checked for mid-string zero terminators on all calls to toStringz, we would kill performance where mostly it isn't necessary
>
> I bet the speed test is effectively a tie if it's reasonably written

How do you "reasonably" write a linear search?

Sure you can make it a faster O(n), but it's still O(n). Whereas just "add on a 0 if not there" is an O(1) operation.
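
For reference, the linear check under discussion could look roughly like the sketch below, written in C since that is where the string ends up. It relies on the standard `memchr`, which is usually heavily optimized but still has to look at every byte in the worst case. `has_embedded_nul` is a hypothetical helper name, not an existing Phobos or libc function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reports whether a length-delimited slice
   (pointer + length, like a D string) contains a NUL byte anywhere.
   This is the O(n) scan; appending a terminator alone would be O(1). */
static bool has_embedded_nul(const char *s, size_t len)
{
    return memchr(s, '\0', len) != NULL;
}
```

A caller would run this once on untrusted input and skip it for data it already knows to be clean.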

-Steve

August 01

On Friday, 1 August 2025 at 00:48:36 UTC, Steven Schveighoffer wrote:
> On Friday, 1 August 2025 at 00:22:20 UTC, monkyyy wrote:
> > On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:
> > > If we checked for mid-string zero terminators on all calls to toStringz, we would kill performance where mostly it isn't necessary
> >
> > I bet the speed test is effectively a tie if it's reasonably written
>
> How do you "reasonably" write a linear search?
>
> Sure you can make it a faster O(n), but it's still O(n). Whereas just "add on a 0 if not there" is an O(1) operation.
>
> -Steve

Let's say you have a few paragraphs of text that you split by \n, and then you call toStringz on each piece to pass it to raylib (and you respect the immutability of strings and don't replace \n with a null in place).

You can do better with char arrays and different data structures, but the current API's sorta-maybe-kinda dynamic arrays that aren't special-cased for chars, immutability, and C's liking for null termination all combine to mean it's probably making a copy. During that copy you could check.
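
A minimal sketch of that idea in C, assuming the conversion has to copy anyway: the loop below copies a length-delimited slice into a NUL-terminated buffer and notes any embedded NUL in the same pass. `copy_and_check` is a hypothetical name for illustration, not what toStringz or tempCString actually do, and a hand-written loop like this may well be slower than a plain `memcpy`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copies len bytes from src to dst, appends a
   terminating NUL, and returns false if src contained a NUL byte.
   dst must have room for len + 1 bytes. */
static bool copy_and_check(char *dst, const char *src, size_t len)
{
    bool clean = true;
    for (size_t i = 0; i < len; i++) {
        char c = src[i];
        if (c == '\0')
            clean = false;   /* remember the embedded NUL, keep copying */
        dst[i] = c;
    }
    dst[len] = '\0';         /* the terminator the C API expects */
    return clean;
}
```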

August 01

On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:
> [...]
> If we checked for mid-string zero terminators on all calls to toStringz, we would kill performance where mostly it isn't necessary (this is only important if you don't trust where the data came from). This would lead to a different sort of problem ("How come D/C interop is so slow!?")

This is a strawman. I am writing about file system functions! Furthermore, in the case of std.file.rename, toStringz is not even called. What is used to convert to char * seems to be the highly interesting template tempCString in std.internal.cstring.

And of course a library should not assert, nor exit nor ignore the error, but make it handleable:

#!/usr/bin/python

def myfun (filename):
   open (filename, 'w')

try:
   myfun ("a\0c")
except (TypeError, ValueError):  # Python 2 raises TypeError here, Python 3 raises ValueError
   print ("error occurred")
#   raise
August 01

On Friday, 1 August 2025 at 11:02:37 UTC, kdevel wrote:
> On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:
> > [...]
> > If we checked for mid-string zero terminators on all calls to toStringz, we would kill performance where mostly it isn't necessary (this is only important if you don't trust where the data came from). This would lead to a different sort of problem ("How come D/C interop is so slow!?")
>
> This is a strawman. I am writing about file system functions! Furthermore, in the case of std.file.rename, toStringz is not even called. What is used to convert to char * seems to be the highly interesting template tempCString in std.internal.cstring.

I was responding to the link sent about toStringz. I don't think we can always check for internal 0 characters there.

But in tempCString, we can, since we are always copying.

However, this is delicate, because you will affect performance. A slice assign is memcpy, and memcpy is damn fast (and hard to reimplement).

If instead you check every character, you will change to a for loop, which will be slow.

I think the right answer here is to use strncpy. According to the docs, strncpy will copy up to N characters. But if a NUL character is reached before end of the string, then it zeroes the rest of the buffer. This means we can detect whether a 0 was inside the string by checking the last byte copied.
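
The strncpy trick can be sketched in C as follows. This is only an illustration under stated assumptions, not the actual tempCString code: `copy_checked` is a hypothetical name, and the destination is assumed to hold at least `len + 1` bytes. Because `strncpy` stops at the first NUL in the source and zero-fills the remainder up to `len`, the byte at `dst[len - 1]` is zero exactly when the source slice contained a NUL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies a length-delimited slice into dst as a
   NUL-terminated C string. Returns false if the slice had an embedded
   NUL. dst must have room for len + 1 bytes. */
static bool copy_checked(char *dst, const char *src, size_t len)
{
    if (len == 0) {
        dst[0] = '\0';
        return true;             /* empty slice: nothing to inspect */
    }
    strncpy(dst, src, len);      /* zero-pads after the first NUL, if any */
    dst[len] = '\0';             /* strncpy itself does not terminate here */
    return dst[len - 1] != '\0'; /* zero in the last copied byte => NUL inside */
}
```

Whether this is actually faster than a `memchr` followed by a `memcpy` would need measuring; the sketch only shows that the detection needs no separate pass.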

So this really is a limitation of tempCString, and we can fix that. However, std.file could just as easily have used toStringz.

In terms of correctness, you are passing in a parameter that has different properties than the one ultimately sent to the underlying call. It's always a fight between correctness and performance here. I always prefer (if it is available) an API that does not rely on zero termination, but when it is not available, it needs to be checked. Where it is checked is important.

I'm not sure that's the library's responsibility. In other words, if I pass in rmdir("foo"), then why should I pay the penalty of examining "foo" for malicious NUL bytes? The source is a literal, I can know whether it has them or not. In fact, there isn't even a way to pass in a char* to rmdir, which is unfortunate.

We have the capability of making an API that allows for all cases -- user input, static data, validated data, actual C strings, etc. We shouldn't make all these go through the same super-defensive gauntlet.

> And of course a library should not assert, nor exit nor ignore the error, but make it handleable:
>
> #!/usr/bin/python
>
> def myfun (filename):
>    open (filename, 'w')
>
> try:
>    myfun ("a\0c")
> except (TypeError, ValueError):  # Python 2 raises TypeError here, Python 3 raises ValueError
>    print ("error occurred")
> #   raise

Python is not D. It cannot do any kind of type introspection, and char* just isn't a thing, so it has to always go through the same path. We don't want to make D as slow as Python.

-Steve

August 01

On Friday, 1 August 2025 at 17:53:17 UTC, Steven Schveighoffer wrote:
> If instead you check every character, you will change to a for loop, which will be slow.
>
> I think the right answer here is to use strncpy. According to the docs, strncpy will copy up to N characters. But if a NUL character is reached before end of the string, then it zeroes the rest of the buffer. This means we can detect whether a 0 was inside the string by checking the last byte copied.

https://github.com/dlang/phobos/issues/10836

Would be quite an easy fix if someone wants to tackle it.

-Steve

August 03

On Friday, 1 August 2025 at 17:53:17 UTC, Steven Schveighoffer wrote:
> [...]
> It's always a fight between correctness and performance here.

Pardon?

> [...]
> I'm not sure that's the library's responsibility. In other words, if I pass in rmdir("foo"), then why should I pay the penalty of examining "foo" for malicious NUL bytes?

The NUL is not representable in char *. But all Phobos filesystem functions use unadorned string parameters for pathnames. Who else, if not the designer of the library, should be responsible for coding (not paying!) the extra CPU cycles?

If the library expects pathnames without embedded NULs I would create a subtype of string, say fstring or cstring. One then has to discuss which of these calls

    rmdir ("a\0b");                //with the technique from [1]
    rmdir (fstring ("a\0b"));
    rmdir (cast (fstring) ("a\0b"));

shall compile and what one expects as runtime behavior.
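
To make the idea concrete, here is a rough sketch in C of a "validated path" type whose only intended constructor performs the NUL check, so the cost is paid once, at the point of conversion. The names `fstring` and `fstring_from_slice` are hypothetical, chosen to mirror the proposal above. C cannot stop anyone from filling in the struct by hand, whereas a D subtype could restrict construction properly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a path slice that has been verified to contain
   no NUL byte. */
typedef struct {
    const char *ptr;
    size_t len;
} fstring;

/* Hypothetical constructor: the single place where validation happens.
   Returns false (leaving *out untouched) if the slice has a NUL. */
static bool fstring_from_slice(const char *ptr, size_t len, fstring *out)
{
    if (memchr(ptr, '\0', len) != NULL)
        return false;
    out->ptr = ptr;
    out->len = len;
    return true;
}
```

With such a type, a call like the second one in the list above would fail at conversion time instead of silently truncating the name.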

[1] Implicit type conversion of an argument when a function is called
https://forum.dlang.org/thread/agstjpezerwlgdhphclk@forum.dlang.org

August 03

On Sunday, 3 August 2025 at 00:03:47 UTC, kdevel wrote:
> On Friday, 1 August 2025 at 17:53:17 UTC, Steven Schveighoffer wrote:
> > [...]
> > It's always a fight between correctness and performance here.
>
> Pardon?

I mean in terms of where you apply the checks. You can say "I expect you to have validated this before sending it in", or you can say "I will always validate this, even if you already have, in the name of correctness."

And I suppose a better word would have been "tradeoff" and not "fight".

> > [...]
> > I'm not sure that's the library's responsibility. In other words, if I pass in rmdir("foo"), then why should I pay the penalty of examining "foo" for malicious NUL bytes?
>
> The NUL is not representable in char *. But all Phobos filesystem functions use unadorned string parameters for pathnames. Who else, if not the designer of the library, should be responsible for coding (not paying!) the extra CPU cycles?
>
> If the library expects pathnames without embedded NULs I would create a subtype of string, say fstring or cstring. One then has to discuss which of these calls

The issue is going to be fixed, not sure if you saw my issue report. It's quite an easy fix actually.

But just so you know, C also allows passing in C strings with embedded null characters:

#include <unistd.h>

int main() {
   rmdir("hello\0world");
   return 0;
}

And we do expose core.sys.posix.unistd.
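
As a small illustration of what happens to such a literal (a sketch only, with hypothetical helper names): the compiler stores every byte of "hello\0world", but anything that receives it as a char* stops at the first NUL.

```c
#include <stddef.h>
#include <string.h>

/* The literal from the example above, kept as an array so that its
   full size is still known to the compiler. */
static const char path_literal[] = "hello\0world";

/* Hypothetical helper: bytes stored in the literal, not counting the
   implicit trailing NUL. */
static size_t stored_length(void)
{
    return sizeof path_literal - 1;
}

/* Hypothetical helper: length as seen by any API taking a char*. */
static size_t visible_length(void)
{
    return strlen(path_literal);
}
```

So in the rmdir call above, the name the system call receives is just "hello".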

-Steve

6 days ago

On Thursday, 31 July 2025 at 23:27:42 UTC, H. S. Teoh wrote:
> As a contrived example, say you prohibit "/etc/passwd" as a filename. Now what happens when the user inputs "/etc/passwd\0ha_you_missed_me" as filename? The OS considers the NUL as the end of the filename, so your user gets access to "/etc/passwd" after all.

If you need path validation, you will probably do more checks than a null check; stdio won't cut it. See how path validation vulnerabilities work IRL: https://github.com/dagster-io/dagster/pull/30002
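
The bypass from the quoted example can be sketched in C like this. Both function names are hypothetical, and the deny list is deliberately naive, which is the point: a comparison on the full length-delimited string says "not denied", while the C side only ever sees the part before the NUL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical deny-list check on a length-delimited string, the way
   a length-aware caller would naively compare it. */
static bool is_denied_slice(const char *s, size_t len)
{
    static const char denied[] = "/etc/passwd";
    return len == sizeof denied - 1 && memcmp(s, denied, len) == 0;
}

/* What an API taking char* effectively compares: everything up to
   the first NUL. */
static bool is_denied_cstr(const char *s)
{
    return strcmp(s, "/etc/passwd") == 0;
}
```

For the input "/etc/passwd\0ha_you_missed_me", the first check passes the name through and the second shows which file would actually be opened. Real path validation has to cover far more than this one case.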