[Issue 786] New: the \ EndOfFile EscapeSequence in double-quoted strings doesn't work
January 03, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=786

           Summary: the \ EndOfFile EscapeSequence in double-quoted strings
                    doesn't work
           Product: D
           Version: 0.178
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Keywords: rejects-valid, spec
          Severity: normal
          Priority: P3
         Component: DMD
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: thecybershadow@gmail.com


Spec non-conformance, I believe.

Spec: http://www.digitalmars.com/d/lex.html#StringLiteral

Program:

void main()
{
  char[] eof_literal = "\";  // the character after the backslash is \u001A, as per the specs
}

Compiler output:

C:\...>dmd lexical.d
lexical.d(3): unterminated string constant starting at lexical.d(3)
lexical.d(3): semicolon expected, not 'EOF'
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement
lexical.d(3): found 'EOF' instead of statement

(that's 19 repeating lines)


-- 

January 03, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=786


smjg@iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg@iname.com




------- Comment #1 from smjg@iname.com  2007-01-03 04:01 -------
"End of File

EndOfFile:
        physical end of the file
        \u0000
        \u001A
"

AIUI, locating the end of the code conceptually happens before tokenization. But indeed, the spec isn't crystal clear on this.


-- 

January 06, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=786





------- Comment #2 from thomas-dloop@kuehne.cn  2007-01-06 15:46 -------
Intermingling EOF detection with tokenisation would require quite a few changes within DMD, and it makes no sense to me, as it would allow reading past the physical end of the file.


-- 

February 03, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=786


bugzilla@digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID




------- Comment #3 from bugzilla@digitalmars.com  2007-02-02 21:34 -------
0x1A is listed in lex.html as 'end of file', which trumps any token. I think the spec is reasonably clear on this: "The source text is terminated by whichever comes first." The reason for this is that some (old) text editors put out a 0x1A to mark end of file.

Not a bug.
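
To illustrate the "terminated by whichever comes first" rule, here is a minimal sketch in D. It is not taken from DMD's lexer; the helper name effectiveLength and the sample text are assumptions made up for illustration. It only shows how cutting the source at the first \u0000 or \u001A leaves the string literal from the report unterminated.

```d
// Hypothetical helper (not DMD code): length of the source text that
// remains once the first EndOfFile marker has been located.
size_t effectiveLength(const(char)[] source)
{
    foreach (i, c; source)
    {
        // lex.html lists \u0000 and \u001A as EndOfFile
        if (c == '\0' || c == '\x1A')
            return i;
    }
    return source.length; // physical end of the file
}

void main()
{
    // The reported line, with the 0x1A written as an escape here
    auto text = "char[] s = \"\\\x1A\";";

    // Everything from 0x1A onward is dropped, so the literal never closes
    assert(text[0 .. effectiveLength(text)] == "char[] s = \"\\");
}
```

Under this reading, the tokenizer never sees the closing quote, which matches the "unterminated string constant" error in the original report.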


-- 

February 03, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=786





------- Comment #4 from thecybershadow@gmail.com  2007-02-02 21:37 -------
In that case, why is "\ EndOfFile" listed as a valid EscapeSequence token?


-- 

February 03, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=786





------- Comment #5 from bugzilla@digitalmars.com  2007-02-02 23:19 -------
If a \ is the last character in a file, the escape sequence will resolve to the \ character; that's what that is for.
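
One way to read that behaviour is sketched below in D. This is an assumption about how such a rule could be written, not DMD's actual implementation; decodeEscape is a hypothetical name, and only a few escape forms are handled.

```d
// Hypothetical sketch (not DMD code): decode the escape sequence whose
// backslash sits just before src[pos]. `pos` is advanced past whatever
// is consumed.
char decodeEscape(const(char)[] src, ref size_t pos)
{
    // "\ EndOfFile": nothing follows the backslash, so yield the backslash itself
    if (pos >= src.length || src[pos] == '\0' || src[pos] == '\x1A')
        return '\\';

    char c = src[pos++];
    char result;
    switch (c)
    {
        case 'n':  result = '\n'; break;
        case 't':  result = '\t'; break;
        case '\\': result = '\\'; break;
        case '"':  result = '"';  break;
        default:   result = c;    break; // a real lexer would report an error here
    }
    return result;
}

void main()
{
    size_t pos = 1; // index just after the backslash
    assert(decodeEscape("\\n", pos) == '\n' && pos == 2);

    pos = 1;
    assert(decodeEscape("\\", pos) == '\\' && pos == 1);      // physical end of file

    pos = 1;
    assert(decodeEscape("\\\x1A", pos) == '\\' && pos == 1);  // 0x1A counts as end of file
}
```

Note that in this sketch the enclosing string literal is still unterminated after the backslash is produced, so the caller would still have to report an error.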


-- 

February 03, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=786





------- Comment #6 from smjg@iname.com  2007-02-03 08:10 -------
But a StringLiteral can never be the last token of a syntactically valid D source file, or can it?


-- 

February 03, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=786





------- Comment #7 from bugzilla@digitalmars.com  2007-02-03 12:13 -------
Currently, no, it can't, hence the error message about semicolon expected instead of EOF. But the lexer doesn't (and shouldn't) know syntax, it just knows tokens.


-- 

February 04, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=786





------- Comment #8 from smjg@iname.com  2007-02-04 07:00 -------
Exactly.  So really,

    EscapeSequence: \ EndOfFile

has no effect except perhaps on what error message the compiler throws.

Moreover, UIMS the spec gives no meaning to this EscapeSequence form.  Which is probably why we're all asking.


--