Encoding of eol in multiline wysiwyg strings
February 17, 2009
Hello,

Does the D specification specify how the "end of line" is encoded when you use wysiwyg strings? Currently it seems to be '\n' on Windows
(and I guess it will be '\n' on Linux, too).
Is this the intended behaviour?
It's not a big issue, but sometimes when you use wysiwyg strings, string concatenation and import expressions to combine some text, the result is a string with mixed EOL encodings.
Thanks for clarifying,

KlausO
February 17, 2009
On Tue, Feb 17, 2009 at 4:41 AM, KlausO <oberhofer@users.sf.net> wrote:
> Hello,
>
> Does the D specification specify how the "end of line" is encoded when you
> use wysiwyg strings? Currently it seems to be '\n' on Windows
> (and I guess it will be '\n' on Linux, too).
> Is this the intended behaviour?

http://www.digitalmars.com/d/1.0/lex.html

"Wysiwyg Strings

Wysiwyg quoted strings are enclosed by r" and ". All characters between the r" and " are part of the string except for EndOfLine which is regarded as a single \n character."

> It's not a big issue, but sometimes when you use wysiwyg strings, string
> concatenation and import expressions to combine some text, the result is a
> string with mixed EOL encodings.
> Thanks for clarifying,

It's the import() expression that's messing things up.  It just loads the file verbatim and does no line-ending conversions.
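
To illustrate how the mix arises, here is a minimal sketch in current D syntax. The file name `crlf.txt` is hypothetical, and the `imported` constant only stands in for what `import("crlf.txt")` would return for a file saved with Windows line endings, so the example compiles without the `-J` switch or any external file.

```d
// A wysiwyg literal spanning two source lines. Per the lexer rule quoted
// above, the line break becomes a single '\n', even if this source file
// itself is saved with CRLF line endings.
enum literal = r"first
second";
static assert(literal == "first\nsecond");

// Hypothetical stand-in for import("crlf.txt"): the bytes of a file saved
// with Windows line endings, which import() would hand over unchanged.
enum imported = "third\r\nfourth\r\n";

// Concatenating the two yields a string containing both '\n' and "\r\n".
enum mixed = literal ~ "\n" ~ imported;
static assert(mixed == "first\nsecond\nthird\r\nfourth\r\n");

void main() {}
```

The literal is normalized by the lexer, the imported text is not, so the combined string carries both conventions.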
February 17, 2009
Jarrett Billingsley wrote:
> On Tue, Feb 17, 2009 at 4:41 AM, KlausO <oberhofer@users.sf.net> wrote:
>> Hello,
>>
>> Does the D specification specify how the "end of line" is encoded when you
>> use wysiwyg strings? Currently it seems to be '\n' on Windows
>> (and I guess it will be '\n' on Linux, too).
>> Is this the intended behaviour?
> 
> http://www.digitalmars.com/d/1.0/lex.html
> 
> "Wysiwyg Strings
> 
> Wysiwyg quoted strings are enclosed by r" and ". All characters
> between the r" and " are part of the string except for EndOfLine which
> is regarded as a single \n character."
> 
>> It's not a big issue, but sometimes when you use wysiwyg strings, string
>> concatenation and import expressions to combine some text, the result is a
>> string with mixed EOL encodings.
>> Thanks for clarifying,
> 
> It's the import() expression that's messing things up.  It just loads
> the file verbatim and does no line-ending conversions.

But many people would like to use import() to read binary data.

I guess one could extend the language specification to solve this:

//load, convert line endings, check for valid UTF-8
char[] import_text(char[] filename);

//return unchanged file contents as byte array
ubyte[] import_binary(char[] filename);

On the other hand, both could be implemented as compile-time functions using the current import().
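
A rough sketch of that second option, written in current D syntax rather than the `char[]` signatures above. The names `normalizeEol`, `importText` and `importBinary` are hypothetical, and the UTF-8 validity check is left out. The two templates are only analyzed when instantiated, and instantiating them would need the compiler's `-J` switch pointing at the directory that holds the file.

```d
// Hypothetical helper: rewrite "\r\n" and a lone '\r' as '\n'.
// It uses only a plain loop and appends, so it can run under CTFE.
string normalizeEol(string s)
{
    string result;
    for (size_t i = 0; i < s.length; ++i)
    {
        if (s[i] == '\r')
        {
            result ~= '\n';
            // Skip the '\n' of a CRLF pair so it is not emitted twice.
            if (i + 1 < s.length && s[i + 1] == '\n')
                ++i;
        }
        else
        {
            result ~= s[i];
        }
    }
    return result;
}

// Hypothetical wrapper: file contents with line endings converted.
template importText(string name)
{
    enum string importText = normalizeEol(import(name));
}

// Hypothetical wrapper: file contents unchanged, typed as bytes.
template importBinary(string name)
{
    static immutable ubyte[] importBinary =
        cast(immutable(ubyte)[]) import(name);
}

// Compile-time check on a literal, so no external file is needed.
static assert(normalizeEol("a\r\nb\rc\n") == "a\nb\nc\n");

void main() {}
```

Usage would look like `enum text = importText!"notes.txt";` with a hypothetical file name, compiled with `-J` set to that file's directory.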
February 17, 2009
On Tue, Feb 17, 2009 at 10:02 AM, grauzone <none@example.net> wrote:
>
> But many people would like to use import() to read binary data.

Oh, I'm not saying import() is in the wrong here :) just that that's
where his mixed line endings are coming from.

> I guess one could extend the language specification to solve this:
>
> //load, convert line endings, check for valid UTF-8
> char[] import_text(char[] filename);
>
> //return unchanged file contents as byte array
> ubyte[] import_binary(char[] filename);
>
> On the other hand, both could be implemented as compile-time functions using
> the current import().

I suppose, as long as CTFE were made a bit more efficient.  Can you imagine doing line-end conversions on a 20k line text file at compile time?  The compiler would probably explode.