Thread overview
How to test if a string is pointing into read-only memory?
Oct 12
jfondren
Oct 12
Elronnd
Oct 12
Elronnd
Oct 12
IGotD-
Oct 12
ag0aep6g
Oct 12
Kagamin
October 12

std.string.toStringz always allocates a new string, but it has this note:

```d
/+ Unfortunately, this isn't reliable.
  We could make this work if string literals are put
  in read-only memory and we test if s[] is pointing into
  that.

  /* Peek past end of s[], if it's 0, no conversion necessary.
  * Note that the compiler will put a 0 past the end of static
  * strings, and the storage allocator will put a 0 past the end
  * of newly allocated char[]'s.
  */
  char* p = &s[0] + s.length;
  if (*p == 0)
      return s;
  +/
```

and string literals weren't reliably in read-only memory as recently as early 2017: https://github.com/dlang/dmd/pull/6546#issuecomment-280612721

What's a reliable test that could be used in a toStringz that skips allocation when given a string in read-only memory?

As for whether it's necessarily a good idea to patch toStringz, I'd worry that

  1. someone will slice a string literal and pass the test while not having NUL where it's expected

  2. people are probably relying by now on toStringz always allocating, to e.g. safely cast immutable off the result.
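
To make the first worry concrete, here is a minimal sketch. `nulFollows` is a hypothetical helper, not something in Phobos, and it is only safe to call when the byte past the end of the slice is known to be mapped, which holds for a literal and for a slice that stops short of the literal's end.

```d
/// Hypothetical helper (not in Phobos): is the byte just past `s` a NUL?
/// @system because it reads one byte beyond the slice's bounds.
bool nulFollows(string s) @system
{
    return s.ptr[s.length] == '\0';
}

unittest
{
    string whole = "hello world"; // the compiler NUL-terminates string literals
    string head = whole[0 .. 5];  // same memory as `whole`, but followed by ' '

    assert(nulFollows(whole));
    assert(head.ptr is whole.ptr); // a read-only test alone cannot tell them apart
    assert(!nulFollows(head));     // so the NUL peek is still required
}
```

So a read-only check by itself would accept `head`; it has to be combined with the peek for the terminator.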

October 12
On Tuesday, 12 October 2021 at 08:19:01 UTC, jfondren wrote:
> What's a reliable test that could be used in a toStringz that skips allocation when given a string in read-only memory?

There is no good way.

- You could peek in /proc, but that's not portable

- You could poke the data and catch the resulting fault; but that's: 1) horrible, 2) slow, 3) problematic wrt threading, 4) sensitive to user code mapping its own memory and then remapping as rw (or unmapping)

- You could make a global hash table into which are registered the addresses of all rodata; but that is difficult to get right across translation units, especially in the face of dynamic linking.  This is probably the most feasible, but is really not worth the hassle.
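
For illustration only, the /proc option might look roughly like the sketch below. It is Linux-specific, `inReadOnlyMapping` is a made-up name, and it assumes the usual `start-end perms ...` line layout of `/proc/self/maps`. On other platforms it just returns false.

```d
import std.algorithm.searching : findSplit;
import std.array : split;
import std.conv : to;
import std.stdio : File;

/// Hypothetical helper, Linux-only: true if the `len` bytes at `p` lie inside
/// a single mapping that /proc/self/maps lists as readable and not writable.
bool inReadOnlyMapping(const(void)* p, size_t len)
{
    version (linux)
    {
        immutable lo = cast(size_t) p;
        immutable hi = lo + len;
        foreach (line; File("/proc/self/maps").byLine)
        {
            // Line layout: "start-end perms offset dev inode [path]", hex addresses.
            auto fields = line.split;
            if (fields.length < 2 || fields[1].length < 2)
                continue;
            auto range = fields[0].findSplit("-");
            if (range[1].length == 0)
                continue;
            immutable start = range[0].to!size_t(16);
            immutable end = range[2].to!size_t(16);
            if (lo >= start && hi <= end)
                return fields[1][0] == 'r' && fields[1][1] != 'w';
        }
    }
    return false;
}

unittest
{
    string heap = "hello".idup;  // GC memory is mapped read-write
    assert(!inReadOnlyMapping(heap.ptr, heap.length));
}
```

Note that this opens a file and allocates while splitting each line, so it probably costs more than the allocation it is meant to avoid, and the answer can go stale if another thread remaps the region afterwards.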
October 12
On Tuesday, 12 October 2021 at 09:20:42 UTC, Elronnd wrote:
> problematic wrt threading

Not to mention signals.  Reentrancy's a bitch.
October 12
On 12.10.21 10:19, jfondren wrote:
> ```d
> /+ Unfortunately, this isn't reliable.
>   We could make this work if string literals are put
>   in read-only memory and we test if s[] is pointing into
>   that.
> 
>   /* Peek past end of s[], if it's 0, no conversion necessary.
>   * Note that the compiler will put a 0 past the end of static
>   * strings, and the storage allocator will put a 0 past the end
>   * of newly allocated char[]'s.
>   */
>   char* p = &s[0] + s.length;
>   if (*p == 0)
>   return s;
>   +/
> ```
[...]
> As for whether it's necessarily a good idea to patch toStringz, I'd worry that
> 
> 1. someone will slice a string literal and pass the test while not having NUL where it's expected

The (commented-out) code checks if the NUL is there. Just make sure that it's also read-only.

> 2. people are probably relying by now on toStringz always allocating, to e.g. safely cast immutable off the result.

It doesn't matter if the result is freshly allocated. Casting away immutable is only allowed as long as you don't use it to actually change the data (i.e. it remains de-facto immutable).
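
A rough sketch of how the two checks would combine, assuming some read-only test exists at all. `toStringzMaybeShared` and the `isReadOnly` parameter are hypothetical; the predicate is passed in by the caller precisely because there is no portable implementation of it.

```d
import std.string : fromStringz, toStringz;

/// Hypothetical variant of toStringz. `isReadOnly` is a caller-supplied
/// predicate (an assumption of this sketch, not an existing API).
immutable(char)* toStringzMaybeShared(string s,
    scope bool delegate(const(void)* p, size_t len) isReadOnly) @system
{
    // Ask about length + 1 bytes so the peek below stays inside checked memory.
    if (s.length != 0
        && isReadOnly(s.ptr, s.length + 1)
        && s.ptr[s.length] == '\0')
        return s.ptr;      // already NUL-terminated, no copy needed
    return toStringz(s);   // otherwise fall back to the copying version
}

unittest
{
    string whole = "hello world";
    string head = whole[0 .. 5];

    // Stand-in predicate that claims everything is read-only.
    auto p1 = toStringzMaybeShared(whole, (p, len) => true);
    assert(p1 is whole.ptr);

    // The slice passes the read-only test but fails the NUL peek, so it is copied.
    auto p2 = toStringzMaybeShared(head, (p, len) => true);
    assert(p2 !is head.ptr);
    assert(fromStringz(p2) == "hello");
}
```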
October 12

On Tuesday, 12 October 2021 at 08:19:01 UTC, jfondren wrote:
> and string literals weren't reliably in read-only memory as recently as early 2017: https://github.com/dlang/dmd/pull/6546#issuecomment-280612721

Sometimes sections have symbols defined for their start and end, so you can check whether the string lies in the rdata section. On Windows you can test it generically with the IsBadWritePtr function.

October 12
On Tuesday, 12 October 2021 at 09:20:42 UTC, Elronnd wrote:
>
> There is no good way.

Can't it be done using function overloading?
October 12
On Tuesday, 12 October 2021 at 21:42:45 UTC, IGotD- wrote:
> On Tuesday, 12 October 2021 at 09:20:42 UTC, Elronnd wrote:
>>
>> There is no good way.
>
> Can't it be done using function overloading?

Function overloading lets you distinguish between arguments with different types, but strings in read-only memory and strings in read-write memory both have the same type: string.
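
A small sketch of what that means in practice (`chosen` is just an illustrative name):

```d
// Overload resolution sees only the static type, never the address.
string chosen(string s) { return "string"; }
string chosen(char[] s) { return "char[]"; }

unittest
{
    string literal = "abc";     // typically placed in read-only memory
    string heap = literal.idup; // GC-allocated copy, but the same type
    char[] buf = literal.dup;   // a different type, so a different overload

    assert(chosen(literal) == "string");
    assert(chosen(heap) == "string");
    assert(chosen(buf) == "char[]");
}
```

The literal and the heap copy pick the same overload, so overloading can separate `string` from `char[]` but not read-only storage from read-write storage.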