Thread overview
[Issue 8229] New: string literals are not zero-terminated during CTFE
Jun 11, 2012 · timon.gehr@gmx.ch
Jun 12, 2012 · Don
Jun 12, 2012 · timon.gehr@gmx.ch
Jun 13, 2012 · Don
Sep 27, 2013 · Martin Nowak
Sep 28, 2013 · Martin Nowak
June 11, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8229

           Summary: string literals are not zero-terminated during CTFE
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: CTFE
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: timon.gehr@gmx.ch


--- Comment #0 from timon.gehr@gmx.ch 2012-06-11 15:56:58 PDT ---
DMD 2.059:

static assert(!(x){return *x;}("".ptr)); // error

The static assertion should pass.
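To unpack the one-liner: `(x){return *x;}` is a function literal with an inferred parameter type, called immediately with `"".ptr`, so it reads the first byte of the empty literal's storage, i.e. the would-be terminator. The sketch below gives the lambda a hypothetical name, `firstChar`, and performs the same read at run time, assuming the compiler stores a '\0' after string literals as discussed in this thread. The compile-time form is left commented out because, per this report, DMD 2.059 rejects it.

```d
// `firstChar` is a hypothetical name for the lambda `(x){return *x;}` above.
char firstChar(immutable(char)* x)
{
    return *x; // read the byte the pointer refers to
}

void main()
{
    // At run time the empty literal's storage is followed by a '\0',
    // so the first byte read through "".ptr is zero.
    assert(firstChar("".ptr) == '\0');

    // The compile-time form from the report; per the report, DMD 2.059
    // rejects this during CTFE instead of letting the assertion pass:
    // static assert(!firstChar("".ptr));
}
```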

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 12, 2012

Don <clugdbug@yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugdbug@yahoo.com.au


--- Comment #1 from Don <clugdbug@yahoo.com.au> 2012-06-12 09:48:41 PDT ---
This behaviour is intentional. Pointer operations are strictly checked in CTFE. It's the same as doing

int n = 0;
char c = ""[n];

which generates an array bounds error at runtime.

Is the terminating null character still in the spec? A long time ago it was in there, but now I can only find two references to it in the current spec (in 'arrays' and in 'interfacing to C'), and they both relate to printf.

The most detailed is in 'interface to C', which states:
"string literals, when they are not part of an initializer to a larger data
structure, have a '\0' character helpfully stored after the end of them."

which is pretty weird. These funky semantics would be difficult to implement in CTFE, and I doubt they are desirable. Here's an example:

const(char)[] foo(char[] s) { return "abc" ~ s; }

immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE

bool baz()
{
    immutable bar2 = foo("xyz"); // local variable, so isn't a string literal.

    return true;
}
static assert(baz());

---> bar is zero-terminated, bar2 is not, even though they had the same
assignment. When does this magical trailing zero get added?

I think you could reasonably interpret the spec as meaning that a trailing zero is added to the end of string literals by the linker, not by the compiler. It's only in CTFE that you can tell the difference.

June 12, 2012

--- Comment #2 from timon.gehr@gmx.ch 2012-06-12 10:55:45 PDT ---
(In reply to comment #1)
> This behaviour is intentional. Pointer operations are strictly checked in CTFE. It's the same as doing
> 
> int n = 0;
> char c = ""[n];
> 
> which generates an array bounds error at runtime.
> 

I think that would be stretching it too far. It is more like:

auto s = ['\0'];
auto q = s[0..0];
char c = *q.ptr;

Which works fine at runtime and during CTFE.
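The analogy can be checked with a small sketch. Wrapping the three lines in a hypothetical function, `viaEmptySlice`, lets the same code run both through CTFE (the `static assert`) and at run time. Assuming CTFE behaves as described here, the read is accepted because `q.ptr` still points at an element that `s` actually owns, whereas `"".ptr` points into a block of length zero.

```d
// `viaEmptySlice` is a hypothetical wrapper around the three lines above.
char viaEmptySlice()
{
    auto s = ['\0'];    // one-element char[] playing the role of the terminator
    auto q = s[0 .. 0]; // empty slice; q.ptr still points at s[0]
    return *q.ptr;      // in bounds of the memory block owned by s
}

static assert(viaEmptySlice() == '\0'); // evaluated through CTFE

void main()
{
    assert(viaEmptySlice() == '\0');    // evaluated at run time
}
```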

> Is the terminating null character still in the spec? A long time ago it was in there, but now I can only find two references to it in the current spec (in 'arrays' and in 'interfacing to C'), and they both relate to printf.
> 
> The most detailed is in 'interface to C', which states:
> "string literals, when they are not part of an initializer to a larger data
> structure, have a '\0' character helpfully stored after the end of them."
> 
> which is pretty weird. These funky semantics would be difficult to implement in CTFE,

I guess this is from D1 times, when string literals were static arrays, and doesn't apply anymore.

> and I doubt they are desirable. Here's an example:
> 
> const(char)[] foo(char[] s) { return "abc" ~ s; }
> 
> immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE
> 

Well, this is not specified afaics.

> bool baz()
> {
>     immutable bar2 = foo("xyz"); // local variable, so isn't a string literal.
> 
>     return true;
> }
> static assert(baz());
> 
> ---> bar is zero-terminated, bar2 is not, even though they had the same
> assignment. When does this magical trailing zero get added?
> 

This is exactly the behavior that is observed at runtime. If it is undesirable, then that is a distinct issue that should be investigated.

It would certainly be desirable to have consistent behavior at compile time and at runtime, but this is not a top-priority issue.

> I think you could reasonably interpret the spec as meaning that a trailing zero is added to the end of string literals by the linker, not by the compiler. It's only in CTFE that you can tell the difference.

In this case, the spec should definitely be fixed.

June 13, 2012

--- Comment #3 from Don <clugdbug@yahoo.com.au> 2012-06-13 01:44:42 PDT ---
(In reply to comment #2)
> (In reply to comment #1)
> > This behaviour is intentional. Pointer operations are strictly checked in CTFE. It's the same as doing
> > 
> > int n = 0;
> > char c = ""[n];
> > 
> > which generates an array bounds error at runtime.
> > 
> 
> I think that would be stretching it too far. It is more like:
> 
> auto s = ['\0'];
> auto q = s[0..0];
> char c = *q.ptr;

That's an interesting interpretation. It can't be true for D1, where string literals are fixed-length arrays, but it could work for D2.

In D1 it's more like:

struct S
{
  static char[3] s = ['a', 'b', 'c'];
  static char terminator = '\0';
}
And every mention of it in the spec dates from D1.

> > Is the terminating null character still in the spec? A long time ago it was in there, but now I can only find two references to it in the current spec (in 'arrays' and in 'interfacing to C'), and they both relate to printf.
> > 
> > The most detailed is in 'interface to C', which states:
> > "string literals, when they are not part of an initializer to a larger data
> > structure, have a '\0' character helpfully stored after the end of them."
> > 
> > which is pretty weird. These funky semantics would be difficult to implement in CTFE,
> 
> I guess this is from D1 times, when string literals were static arrays, and doesn't apply anymore.

Could be. In that case, the few parts of the spec that mention it are horribly
out of date. The same question also applies to assigning to fixed-length arrays:

immutable(char)[3] s = "abc";
// Does this have a trailing zero?

> > and I doubt they are desirable. Here's an example:
> > 
> > const(char)[] foo(char[] s) { return "abc" ~ s; }
> > 
> > immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE
> > 
> 
> Well, this is not specified afaics.

Hmm, maybe it isn't. The spec says almost nothing about the whole thing. What I
do know is that there is a lot of existing code that relies on this behaviour
(especially on "abc" ~ "def" having a trailing zero).
Pretty much the only thing the spec says is that you can use string literals
with printf.
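For illustration, code of the kind described might look like the following sketch. It assumes, as stated above, that the compiler folds `"abc" ~ "def"` into the single literal `"abcdef"`, so the pointer handed to C's `printf` is followed by a trailing zero; without that folding the concatenation would be an ordinary run-time array with no such guarantee.

```d
import core.stdc.stdio : printf;

void main()
{
    // Assumes "abc" ~ "def" is folded into the single literal "abcdef",
    // whose storage ends with the '\0' that printf's %s stops at.
    printf("%s\n", ("abc" ~ "def").ptr);
}
```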

Does TDPL (The D Programming Language) mention it?

The spec definitely needs to be improved.

September 27, 2013

Martin Nowak <code@dawg.eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |code@dawg.eu
           Severity|normal                      |major


--- Comment #4 from Martin Nowak <code@dawg.eu> 2013-09-27 15:58:28 PDT ---
---
string bug(string a)
{
    char[] buf;
    buf.length = a.length;
    buf[0 .. a.length] = a[];
    return cast(string)buf[];
}

static const var = bug("foo");
---

I have a much bigger problem related to this.
String literals resulting from CTFE are missing the terminating zero in the
data segment. Whether or not the bug bites depends on the object layout and the
virtual memory mapping, which makes it pretty annoying, because it happens to
work most of the time.
The underlying issue is that var is emitted to the object file by
ArrayLiteralExp::toDt, which doesn't perform the zero termination.
I'm not sure whether, and at which stage, this should be converted to a
StringLiteralExp.
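One possible workaround, which is not proposed in the thread and is only a sketch: make the terminator an ordinary element of the CTFE result, so that it is emitted even when the data goes through ArrayLiteralExp::toDt. The name `varWithNul` is hypothetical; `bug` is copied unchanged from the comment. Slicing off the last element gives the ordinary D string, while `.ptr` can be passed to C.

```d
import core.stdc.string : strlen;

// Copied unchanged from the comment above.
string bug(string a)
{
    char[] buf;
    buf.length = a.length;
    buf[0 .. a.length] = a[];
    return cast(string)buf[];
}

// Hypothetical workaround: the terminator is an ordinary element of the
// CTFE result, so it is emitted even if the compiler adds none of its own.
static immutable string varWithNul = bug("foo") ~ "\0";

void main()
{
    string d = varWithNul[0 .. $ - 1];   // D view, without the terminator
    immutable(char)* c = varWithNul.ptr; // C view, NUL-terminated by construction
    assert(d == "foo");
    assert(strlen(c) == 3);
}
```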

September 28, 2013

--- Comment #5 from Martin Nowak <code@dawg.eu> 2013-09-28 04:20:53 PDT ---
Using ArrayLiteralExp instead of StringLiteralExp during object emission is also a huge performance issue, because the compiler creates a list of 1-byte elements, one per character. If, for example, you generate a 5 kB string in CTFE, this induces a huge overhead.
