On Friday, 1 August 2025 at 11:02:37 UTC, kdevel wrote:
> On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:
> [...]
If we checked for mid-string zero terminators on all calls to toStringz
, we would kill performance where mostly it isn't necessary (this is only important if you don't trust where the data came from. This would lead to a different sort of problem ("How come D/C interop is so slow!?")
This is a strawman. I am writing about file system functions! Furthermode in the case of std.file.rename
toStringz
is not even called. What is used to convert to char *
seems to be the highly interesting template tempCString
in std.internal.cstring
.
I was responding to the link sent about toStringz
. I don't think we can always check for internal 0 characters there.
But in tempCString
, we can, since we are always copying.
However, this is delicate, because you will affect performance. A slice assign is memcpy
, and memcpy
is damn fast (and hard to reimplement).
If instead you check every character, you will change to a for loop, which will be slow.
I think the right answer here is to use strncpy. According to the docs, strncpy will copy up to N characters. But if a NUL character is reached before end of the string, then it zeroes the rest of the buffer. This means we can detect whether a 0 was inside the string by checking the last byte copied.
So this really is a limitation of tempCString
, and we can fix that. However, std.file could just have easily used toStringz
.
In terms of correctness, you are passing in a parameter that has different properties than the one ultimately sent to the underlying call. It's always a fight between correctness and performance here. I always prefer (if it is available) an API that does not rely on zero termination, but when it is not available, it needs to be checked. Where it is checked is important.
I'm not sure that's the library's responsibility. In other words, if I pass in rmdir("foo")
, then why should I pay the penalty of examining "foo"
for malicious NUL bytes? The source is a literal, I can know whether it has them or not. In fact, there isn't even a way to pass in a char*
to rmdir
, which is unfortunate.
We have the capability of making an API that allows for all cases -- user input, static data, validated data, actual C strings, etc. We shouldn't make all these go through the same super-defensive gauntlet.
> And of course a library should not assert, nor exit nor ignore the error, but make it handleable:
#!/usr/bin/python
def myfun (filename):
open (filename, 'w')
try:
myfun ("a\0c")
except TypeError:
print ("error occurred")
# raise
Python is not D. It cannot do any kind of type introspection, and char* just isn't a thing, so it has to always go through the same path. We don't want to make D as slow as python.
-Steve