July 07, 2008
[Issue 2201] New: Unescaped carriage return ('\r') in string is changed into different EOL.
http://d.puremagic.com/issues/show_bug.cgi?id=2201

          Summary: Unescaped carriage return ('\r') in string is changed
                   into different EOL.
          Product: D
          Version: 1.029
         Platform: PC
       OS/Version: Windows
           Status: NEW
         Severity: normal
         Priority: P2
        Component: DMD
       AssignedTo: bugzilla@digitalmars.com
       ReportedBy: business3@twistedpairgaming.com


The following code demonstrates that a carriage return ("\r") embedded in a
string (at least when it is done with a string mixin) is not retained as a
carriage return ("\r") but is turned into something else (a CRLF ("\r\n") on
Windows; I haven't tested on Unix).

BEGIN FILE testEmbeddedChars.d
module testEmbeddedChars;
import tango.io.Stdout;

char[] makeStrReturnFunc(char[] name, char[] str)
{
       return 
       "char[] "~name~"()"
       "{ return r\""~str~"\"; }";
}

mixin(makeStrReturnFunc("lineFeed",       "\n"));
mixin(makeStrReturnFunc("carriageReturn", "\r"));
mixin(makeStrReturnFunc("tab",            "\t"));
mixin(makeStrReturnFunc("vtab",           "\v"));
mixin(makeStrReturnFunc("formFeed",       "\f"));

void main()
{
       if(lineFeed() != "\n"c)
               Stdout.formatln("lineFeed() is incorrect");

       if(carriageReturn() != "\r"c)
               Stdout.formatln("carriageReturn() is incorrect");

       if(carriageReturn() != "\n"c)
               Stdout.formatln("carriageReturn() is unix EOL");

       if(carriageReturn() != "\r\n"c)
               Stdout.formatln("carriageReturn() is win EOL");

       if(tab() != "\t"c)
               Stdout.formatln("tab() is incorrect");

       if(vtab() != "\v"c)
               Stdout.formatln("vtab() is incorrect");

       if(formFeed() != "\f"c)
               Stdout.formatln("formFeed() is incorrect");
}
END FILE

On Windows, with DMD 1.029, the output is:

BEGIN OUTPUT
carriageReturn() is incorrect
carriageReturn() is win EOL
END OUTPUT

This causes a problem with a utility function I have:

BEGIN CODE
char[] multiTypeString(char[] name, char[] data, char[] access="public")
{
       return 
       access~" T[] "~name~"(T)()"~
       "{"~
       "            static if(is(T ==  char)) { return \""~data~"\"c; }"~
       "       else static if(is(T == wchar)) { return \""~data~"\"w; }"~
       "       else static if(is(T == dchar)) { return \""~data~"\"d; }"~
       "       else static assert(false, \"T must be char, wchar, or dchar\");"~
       "}";
}

//Sample uses:
mixin(multiTypeString("whitespaceChars", r" \n\r\t\v\f"));
mixin(multiTypeString("winEOL", r"\r\n"));
mixin(multiTypeString("digitEscSeqForAMadeupCustomRegex", r"\\d"));
END CODE

The problem with the above (though it works) is that it requires its user to
doubly-escape the data parameter. This could theoretically be solved with a
CTFE "char[] makeEscaped(char[])", but something like that is a bit too complex
for the current CTFE engine to handle. So I'd like to solve it by making the
generated function return a WYSIWYG string of the 'data' parameter (probably a
cleaner solution anyway, and faster to compile), but this carriage return bug
gets in the way.
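
As an illustration only: with a later D2 compiler, whose CTFE engine is more
capable than the one in DMD 1.029, a compile-time escaping helper along the
lines described above might be sketched as follows. The names escapeString and
backslashD are hypothetical, string is used in place of char[], and Tango is
not needed.

```d
// Hypothetical CTFE-friendly helper (assumes a D2 compiler): escapes a
// string so it can be pasted between double quotes in generated source.
string escapeString(string s)
{
    string result;
    foreach (char c; s)
    {
        switch (c)
        {
            case '\\': result ~= `\\`; break;
            case '"':  result ~= `\"`; break;
            case '\n': result ~= `\n`; break;
            case '\r': result ~= `\r`; break;
            case '\t': result ~= `\t`; break;
            case '\v': result ~= `\v`; break;
            case '\f': result ~= `\f`; break;
            default:   result ~= c;    break;
        }
    }
    return result;
}

// Same idea as makeStrReturnFunc, but the generated function returns an
// escaped regular string instead of a WYSIWYG one, so no raw CR ever
// reaches the lexer.
string makeStrReturnFunc(string name, string str)
{
    return "string " ~ name ~ "() { return \"" ~ escapeString(str) ~ "\"; }";
}

mixin(makeStrReturnFunc("carriageReturn", "\r"));
mixin(makeStrReturnFunc("winEOL",         "\r\n"));
mixin(makeStrReturnFunc("backslashD",     `\d`));

void main()
{
    // The caller's data comes back unchanged, with no double escaping.
    assert(carriageReturn() == "\r");
    assert(winEOL() == "\r\n");
    assert(backslashD() == `\d`);
}
```

Because the generated source only ever contains the two characters '\' and
'r', the lexer's EndOfLine handling is never involved.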


--
July 07, 2008
[Issue 2201] Unescaped carriage return ('\r') in string is changed into different EOL.
http://d.puremagic.com/issues/show_bug.cgi?id=2201





------- Comment #1 from shro8822@vandals.uidaho.edu  2008-07-07 15:59 -------

I'm not sure this is valid. IIRC a "real" line end in a string:

"hello
world"

is converted to a system specific EOL. If the mixin is converting the \r before
the function is parsed then this is working correctly. However, I think this
may be undesirable in that case but I'm not sure how to best fix it.


--
July 08, 2008
[Issue 2201] Unescaped carriage return ('\r') in string is changed into different EOL.
http://d.puremagic.com/issues/show_bug.cgi?id=2201





------- Comment #2 from business3@twistedpairgaming.com  2008-07-07 20:58 -------
(In reply to comment #1)

Ok, I just looked it up. According to the documentation
(http://www.digitalmars.com/d/1.0/lex.html ) (The ver 2.0 docs say the same
thing too):

"All characters between the r" and " are part of the string except for
EndOfLine which is regarded as a single \n character."

And EndOfLine is defined as such:

EndOfLine:
       \u000D
       \u000A
       \u000D \u000A
       EndOfFile

So, strictly speaking, even the current behavior is still wrong according to
the docs (According to the docs, "\n", "\r", "\r\n" and EOF should all turn
into "\n").

Although I'm still not convinced that "\n", "\r", and "\r\n" shouldn't just be
left as-is.
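
To make the documented rule concrete, here is a minimal sketch of what the
quoted text describes for the contents of a string literal. It is not taken
from the compiler; normalizeEol is a hypothetical name, the EndOfFile case is
left out, and a D2 compiler is assumed.

```d
// Hypothetical helper mirroring the documented rule: every EndOfLine
// ("\r", "\n" or "\r\n") inside the literal becomes a single '\n'.
string normalizeEol(string s)
{
    string result;
    for (size_t i = 0; i < s.length; ++i)
    {
        if (s[i] == '\r')
        {
            // "\r\n" counts as one EndOfLine: skip the '\n' that follows.
            if (i + 1 < s.length && s[i + 1] == '\n')
                ++i;
            result ~= '\n';
        }
        else
            result ~= s[i];
    }
    return result;
}

// Checked at compile time via CTFE.
static assert(normalizeEol("a\rb")   == "a\nb");
static assert(normalizeEol("a\r\nb") == "a\nb");
static assert(normalizeEol("a\nb")   == "a\nb");

void main() {}
```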


--
July 08, 2008
[Issue 2201] Unescaped carriage return ('\r') in string is changed into different EOL.
http://d.puremagic.com/issues/show_bug.cgi?id=2201





------- Comment #3 from business3@twistedpairgaming.com  2008-07-07 21:10 -------
(In reply to comment #2)

After thinking about this some more, it does make sense that the embedded
characters should be turned into system-specific EOLs. But that does mean my
technique is flawed and shouldn't be able to work anyway. Only solution I can
think of is that maybe the compiler somehow remembers that the string literal
was put there by a string mixin and therefore should keep line-endings as-is.
But that seems kind of messy and might still cause problems for other
cross-platform scenarios. I guess I'll just hope for CTFE's to progress to the
point where they could reliably do an "escapeString()" (Maybe there's some way
they currently can, but I've tried using some "replace()" stuff in a CTFE and
the compiler threw an out of memory exception, so it doesn't seem to be
doable).

In any case, this still illustrates a discrepancy between the documentation
and actual behavior (at least when it occurs in a string mixin), so I'll leave
this open.


--
July 08, 2008
[Issue 2201] Doc/Behavior Discrepancy: EndOfLine in string turns to "\n" or system-specific?
http://d.puremagic.com/issues/show_bug.cgi?id=2201





------- Comment #4 from shro8822@vandals.uidaho.edu  2008-07-08 11:34 -------
(In reply to comment #2)
> (In reply to comment #1)
> 
> Ok, I just looked it up. According to the documentation
> 
> "All characters between the r" and " are part of the string except for
> EndOfLine which is regarded as a single \n character."
> 

I think there's some verbiage to the effect of the current behavior (but applied
to the language text as a whole). Copy/paste that in there, or maybe even just
drop the text quoted above altogether, and I think this can be closed.

(In reply to comment #3)
> Only solution I can
> think of is that maybe the compiler somehow remembers that the string literal
> was put there by a string mixin and therefore should keep line-endings as-is.

That's as good as anything I have thought of.


Also, (untested) I think this edit will make it work. Change

mixin(makeStrReturnFunc("carriageReturn", "\r"));

to

mixin(makeStrReturnFunc("carriageReturn", r"\r"));

You might need to drop the r in the mixin maker as well.


--
July 08, 2008
[Issue 2201] Doc/Behavior Discrepancy: EndOfLine in string turns to "\n" or system-specific?
http://d.puremagic.com/issues/show_bug.cgi?id=2201





------- Comment #5 from business3@twistedpairgaming.com  2008-07-08 12:22 -------
(In reply to comment #4)
> I think there's some verbiage to the effect of the current behavior (but applied
> to the language text as a whole). Copy/paste that in there, or maybe even just
> drop the text quoted above altogether, and I think this can be closed.

I don't see anything like that. You might be thinking of the definition of
EndOfLine (quoted above).

> Also, (untested) I think this edit will make it work:
> 
> mixin(makeStrReturnFunc("carriageReturn", "\r"));
> mixin(makeStrReturnFunc("carriageReturn", r"\r"));
> 
> you might need to drop the r in the mixin maker as well.

That changes the content of the generated function from:

return r"
"; // <- Intended CR inside, not CRLF

To:

return "\r";

That does work, but:

1. It's not a demonstration of an embedded CR (so it doesn't solve the
doc/behavior discrepancy).

2. It causes the caller of makeStrReturnFunc to double-escape everything (r"\r"
and "\\r" suddenly mean CR instead of meaning ['\\', 'r'], and if ['\\', 'r']
is desired, you need r"\\r" or "\\\\r"), which is what I was trying to prevent
by making the generated function return ...r"~data~"... instead of
..."~data~".... But like I've said before, that could be solved when/if
escapeString(strToBeEscaped) is doable at compile-time (ie, make it return
..."~escapeString(data)~"... instead of ...r"~data~"...).


--
July 08, 2008
[Issue 2201] Doc/Behavior Discrepancy: EndOfLine in string turns to "\n" or system-specific?
http://d.puremagic.com/issues/show_bug.cgi?id=2201





------- Comment #6 from shro8822@vandals.uidaho.edu  2008-07-08 12:30 -------

I guess it's not in the spec. However, I did run into this about a year or two
ago, and Walter said in the NG that converting EOLs to the system default on
the way to the lexer is correct inside and outside of quotes (he /might/ have
said that converting to \n is correct, but I don't think so).


--
January 22, 2012
[Issue 2201] Doc/Behavior Discrepancy: EndOfLine in string turns to "\n" or system-specific?
http://d.puremagic.com/issues/show_bug.cgi?id=2201


Walter Bright <bugzilla@digitalmars.com> changed:

          What    |Removed                     |Added
----------------------------------------------------------------------------
            Status|NEW                         |RESOLVED
                CC|                            |bugzilla@digitalmars.com
        Resolution|                            |WORKSFORME


--- Comment #7 from Walter Bright <bugzilla@digitalmars.com> 2012-01-21 21:36:18 PST ---
All string literals have EndOfLine converted to a single '\n' character
according to the spec, and I believe the implementation matches it.

--
January 22, 2012
[Issue 2201] Doc/Behavior Discrepancy: EndOfLine in string turns to "\n" or system-specific?
http://d.puremagic.com/issues/show_bug.cgi?id=2201


Rainer Schuetze <r.sagitario@gmx.de> changed:

          What    |Removed                     |Added
----------------------------------------------------------------------------
                CC|                            |r.sagitario@gmx.de
           Version|1.029                       |D1 & D2


--- Comment #8 from Rainer Schuetze <r.sagitario@gmx.de> 2012-01-22 04:15:34 PST ---
Sorry, but this is not true:


const string s = q{a
b};
static assert(s.length == 3);

void main()
{
   assert(s.length == 3);
}

Both assertions fail (at compile time and at run time) when the file is saved
with CR+LF line endings; they pass when it is saved with LF only.
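
A variant of this check that does not depend on how the source file is saved
might look like the sketch below. The line break is spelled out with escapes
inside a string that is then mixed in, so the lexer sees a real CR+LF or LF
inside the token string. The names crlfSource, lfSource, fromCrlf and fromLf
are made up for this example, and what it prints for the CR+LF case depends on
the compiler version.

```d
import std.stdio;

// Source text of two token-string literals; the escapes become real
// line-ending characters before the mixin is lexed.
enum crlfSource = "q{a\r\nb}";
enum lfSource   = "q{a\nb}";

// Lex each snippet as a string literal expression.
enum string fromCrlf = mixin(crlfSource);
enum string fromLf   = mixin(lfSource);

void main()
{
    // If EndOfLine is converted to a single '\n', both lengths are 3.
    writeln("CR+LF source: length ", fromCrlf.length);
    writeln("LF source:    length ", fromLf.length);
}
```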

--
January 22, 2012
[Issue 2201] Doc/Behavior Discrepancy: EndOfLine in string turns to "\n" or system-specific?
http://d.puremagic.com/issues/show_bug.cgi?id=2201


Walter Bright <bugzilla@digitalmars.com> changed:

          What    |Removed                     |Added
----------------------------------------------------------------------------
          Keywords|spec                        |
            Status|RESOLVED                    |REOPENED
        Resolution|WORKSFORME                  |


--- Comment #9 from Walter Bright <bugzilla@digitalmars.com> 2012-01-22 10:22:22 PST ---
(In reply to comment #8)
> Sorry, but this is not true:

Please reopen bugs that turn out to not be fixed and need further
investigation, otherwise they may get overlooked. I'll reopen this one. Also
marking it as not a spec issue.
