Is implicit string literal concatenation a good thing?
February 22, 2009
Find the bug:
    static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",
        "assert", "auto", "body", "bool", "break", "byte", "case",
        "cast", "catch", "cdouble", "cent", "cfloat", "char", "class",
        "const", "continue", "creal", "dchar", "debug", "default",
        "delegate", "delete", "deprecated", "do", "double", "else",
        "enum", "export", "extern", "false", "final", "finally",
        "float", "for", "foreach", "foreach_reverse", "function",
        "goto", "idouble", "if", "ifloat", "import", "in", "inout",
        "int", "interface", "invariant", "ireal", "is", "lazy", "long",
        "mixin", "module", "new", "null", "out", "override", "package",
        "pragma", "private", "private:", "protected", "protected:",
        "public", "public:", "real", "return", "scope", "short",
        "static", "struct", "super", "switch", "synchronized",
        "template", "this", "throw", "true", "try", "typedef", "typeid",
        "typeof", "ubyte", "ucent", "uint" "ulong", "union", "unittest",
        "ushort", "version", "void", "volatile", "wchar", "while",
        "with", "~this" ];

There is a comma missing: "uint" "ulong"
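
To make the effect concrete, here is a minimal sketch of what the end of the list above turns into. It assumes a current D compiler, which as far as I know rejects adjacent string literals outright, so the merge that the compiler of the time performed silently is written out with an explicit ~. The name `tail` is only illustrative.

```d
// Tail of the keyword list as the compiler of the time effectively saw it.
// The merge of "uint" "ulong" is spelled out with ~ so that the snippet
// does not rely on implicit concatenation itself.
enum string[] tail = ["ubyte", "ucent", "uint" ~ "ulong", "union"];

static assert(tail.length == 4);        // one entry short of the intended five
static assert(tail[2] == "uintulong");  // the two keywords were fused into one

void main() {}
```

The array is accepted without complaint, is one element short, and contains neither "uint" nor "ulong" as a separate entry.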
February 22, 2009
Frank Benoit wrote:
> Find the bug:
>     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",
>         "assert", "auto", "body", "bool", "break", "byte", "case",
>         "cast", "catch", "cdouble", "cent", "cfloat", "char", "class",
>         "const", "continue", "creal", "dchar", "debug", "default",
>         "delegate", "delete", "deprecated", "do", "double", "else",
>         "enum", "export", "extern", "false", "final", "finally",
>         "float", "for", "foreach", "foreach_reverse", "function",
>         "goto", "idouble", "if", "ifloat", "import", "in", "inout",
>         "int", "interface", "invariant", "ireal", "is", "lazy", "long",
>         "mixin", "module", "new", "null", "out", "override", "package",
>         "pragma", "private", "private:", "protected", "protected:",
>         "public", "public:", "real", "return", "scope", "short",
>         "static", "struct", "super", "switch", "synchronized",
>         "template", "this", "throw", "true", "try", "typedef", "typeid",
>         "typeof", "ubyte", "ucent", "uint" "ulong", "union", "unittest",
>         "ushort", "version", "void", "volatile", "wchar", "while",
>         "with", "~this" ];
> 
> There is a comma missing : "uint" "ulong"

I have a personal style rule that says: if a list like that (be it function parameters, initializers, whatever) is more than one line, it's one element per line.  I hate having to visually parse things, or play the re-wrap game as the lists change.  I hadn't really thought about, until now, the side benefit of making it easier to spot missing trailing commas.

Back in C and C++, with its preprocessor, merging adjacent string literals is very handy.  In D, it's only marginally so, but not completely useless.  It can still be used to break a really long string literal into parts.  There are other string boundary tokens in D which might well provide viable alternatives.

Just my two cents,
Brad

February 22, 2009
Frank Benoit Wrote:

>     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",
>         "assert", "auto", "body", "bool", "break", "byte", "case",
> ...
>         "with", "~this" ];
> 
> There is a comma missing : "uint" "ulong"

In such situations I often let the language split my string for me, it reduces noise:

import std.string; // provides split()

auto keywords = "abstract alias align asm
                 assert auto body bool break byte case
                 cast catch cdouble cent cfloat char class
                 const continue creal dchar debug default
                 delegate delete deprecated do double else
                 enum export extern false final finally
                 float for foreach foreach_reverse function
                 goto idouble if ifloat import in inout
                 int interface invariant ireal is lazy long
                 mixin module new null out override package
                 pragma private private: protected protected:
                 public public: real return scope short
                 static struct super switch synchronized
                 template this throw true try typedef typeid
                 typeof ubyte ucent uint ulong union unittest
                 ushort version void volatile wchar while
                 with ~this".split();

You can also put one keyword for each line, or put them in better formatted columns.

If the strings may contain spaces too, then I put each string on a separate line, and then split according to the lines with std.string.splitlines() (or str.splitlines() in Python).
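
A minimal sketch of both variants, shortened to a few entries. It assumes a current Phobos, where the line splitter is spelled `splitLines` and `split` lives in `std.array`; the two message strings are made up for illustration.

```d
import std.algorithm.iteration : map;
import std.array : array, split;
import std.string : splitLines, strip;

void main()
{
    // split() with no separator breaks on any run of whitespace,
    // including the newline and the indentation inside the literal.
    auto keywords = "uint ulong union
                     unittest ushort".split();
    assert(keywords == ["uint", "ulong", "union", "unittest", "ushort"]);

    // When entries contain spaces, put one per line and split on line breaks.
    // Continuation lines keep their indentation, so each entry is stripped.
    auto messages = "file not found
                     access denied".splitLines().map!(line => line.strip()).array;
    assert(messages == ["file not found", "access denied"]);
}
```

Since there are no separate literals, a forgotten comma cannot silently fuse two entries.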

Implicit string literal concatenation is a bug-prone anti-feature, a relic of the C language, which doesn't have a nice string concatenation syntax. In D (and Python, etc.) it's bad.
Months ago I suggested removing it and turning adjacent string literals into a syntax error (to "solve" the backward compatibility with ported C/C++ code).

Brad Roberts:

>In D, it's only marginally so, but not completely useless.  It can still be used to break a really long string literal into parts.<

In such situations you can put a ~ at the end of each part. Explicit is better than implicit :-)

Bye,
bearophile
February 22, 2009
Brad Roberts wrote:
> Back in C and C++, with its preprocessor, merging adjacent string
> literals is very handy.  In D, it's only marginally so, but not
> completely useless.  It can still be used to break a really long string
> literal into parts.  There are other string boundary tokens in D which
> might well provide viable alternatives.

In C and C++, there is no way to catenate strings at compile time. The only way to catenate strings is with strcat. That places the additional burden on programmers that they have to include string.h. For that reason, it makes sense to catenate adjacent string literals.

In D, there's a compile time catenation operator that doesn't require libraries. So the catenation by association saves you only one character. I'd say that's useless.
February 22, 2009
Sun, 22 Feb 2009 10:21:20 +0100, Frank Benoit wrote:

> Find the bug:
>     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",
>         "assert", "auto", "body", "bool", "break", "byte", "case",
>         "cast", "catch", "cdouble", "cent", "cfloat", "char", "class",
>         "const", "continue", "creal", "dchar", "debug", "default",
>         "delegate", "delete", "deprecated", "do", "double", "else",
>         "enum", "export", "extern", "false", "final", "finally",
>         "float", "for", "foreach", "foreach_reverse", "function",
>         "goto", "idouble", "if", "ifloat", "import", "in", "inout",
>         "int", "interface", "invariant", "ireal", "is", "lazy", "long",
>         "mixin", "module", "new", "null", "out", "override", "package",
>         "pragma", "private", "private:", "protected", "protected:",
>         "public", "public:", "real", "return", "scope", "short",
>         "static", "struct", "super", "switch", "synchronized",
>         "template", "this", "throw", "true", "try", "typedef", "typeid",
>         "typeof", "ubyte", "ucent", "uint" "ulong", "union", "unittest",
>         "ushort", "version", "void", "volatile", "wchar", "while",
>         "with", "~this" ];
> 
> There is a comma missing : "uint" "ulong"

I agree this feature is dangerous and useless in D.
February 22, 2009
On Sun, 22 Feb 2009 16:50:51 +0300, Christopher Wright <dhasenan@gmail.com> wrote:

> Brad Roberts wrote:
>> Back in C and C++, with its preprocessor, merging adjacent string
>> literals is very handy.  In D, it's only marginally so, but not
>> completely useless.  It can still be used to break a really long string
>> literal into parts.  There are other string boundary tokens in D which
>> might well provide viable alternatives.
>
> In C and C++, there is no way to catenate strings at compile time. The only way to catenate strings is with strcat. That places the additional burden on programmers that they have to include string.h. For that reason, it makes sense to catenate adjacent string literals.
>
> In D, there's a compile time catenation operator that doesn't require libraries. So the catenation by association saves you only one character. I'd say that's useless.

I agree.

February 22, 2009
On Sun, Feb 22, 2009 at 11:12 PM, Denis Koroskin <2korden@gmail.com> wrote:
> On Sun, 22 Feb 2009 16:50:51 +0300, Christopher Wright <dhasenan@gmail.com> wrote:
>
>> Brad Roberts wrote:
>>>
>>> Back in C and C++, with its preprocessor, merging adjacent string literals is very handy.  In D, it's only marginally so, but not completely useless.  It can still be used to break a really long string literal into parts.  There are other string boundary tokens in D which might well provide viable alternatives.
>>
>> In C and C++, there is no way to catenate strings at compile time. The only way to catenate strings is with strcat. That places the additional burden on programmers that they have to include string.h. For that reason, it makes sense to catenate adjacent string literals.
>>
>> In D, there's a compile time catenation operator that doesn't require libraries. So the catenation by association saves you only one character. I'd say that's useless.
>
> I agree.

I use this feature pretty frequently to break up long strings. I think I didn't use ~ for that because it makes me think an allocation might happen when it doesn't need to.

But after seeing the discussion here I'd be happy to switch to using "a"~"b" as long as it's guaranteed by the language that such strings will be concatenated at compile time.   (I think that is the case now, right?)

--bb
February 22, 2009
On Sun, Feb 22, 2009 at 12:51 PM, Bill Baxter <wbaxter@gmail.com> wrote:
>
> I use this feature pretty frequently to break up long strings. I think I didn't use ~ for that because it makes me think an allocation might happen when it doesn't need to.
>
> But after seeing the discussion here I'd be happy to switch to using "a"~"b" as long as it's guaranteed by the language that such strings will be concatenated at compile time.   (I think that is the case now, right?)

Of course, it does it as a matter of constant folding, just like 3 + 4.
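
One way to check this for yourself, as a minimal sketch: an `enum` initializer and a `static assert` both have to be evaluated by the compiler, so the declarations below only compile if the expressions are folded at compile time. The names `sum` and `text` are illustrative.

```d
// An enum initializer has to be evaluated by the compiler, so these
// declarations only compile if the expressions fold at compile time.
enum sum = 3 + 4;
enum text = "a long string literal, " ~
            "broken into parts " ~
            "with an explicit ~";

static assert(sum == 7);
static assert(text == "a long string literal, broken into parts with an explicit ~");

void main() {}
```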
February 22, 2009
Bill Baxter wrote:
> On Sun, Feb 22, 2009 at 11:12 PM, Denis Koroskin <2korden@gmail.com> wrote:
>> On Sun, 22 Feb 2009 16:50:51 +0300, Christopher Wright <dhasenan@gmail.com>
>> wrote:
>>
>>> Brad Roberts wrote:
>>>> Back in C and C++, with its preprocessor, merging adjacent string
>>>> literals is very handy.  In D, it's only marginally so, but not
>>>> completely useless.  It can still be used to break a really long string
>>>> literal into parts.  There are other string boundary tokens in D which
>>>> might well provide viable alternatives.
>>> In C and C++, there is no way to catenate strings at compile time. The
>>> only way to catenate strings is with strcat. That places the additional
>>> burden on programmers that they have to include string.h. For that reason,
>>> it makes sense to catenate adjacent string literals.
>>>
>>> In D, there's a compile time catenation operator that doesn't require
>>> libraries. So the catenation by association saves you only one character.
>>> I'd say that's useless.
>> I agree.
> 
> I use this feature pretty frequently to break up long strings.
> I think I didn't use ~ for that because it makes me think an
> allocation might happen when it doesn't need to.
> 
> But after seeing the discussion here I'd be happy to switch to using
> "a"~"b" as long as it's guaranteed by the language that such strings
> will be concatenated at compile time.   (I think that is the case now,
> right?)

Yes, and because of CTFE, even complicated applications of ~ frequently don't involve any allocation. So your intuition was wrong! Implicit concatenation was probably one of the things which led to your false impression. So it may be bad in that respect, as well as bug-breeding.
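
A minimal sketch of the CTFE point. The helper `repeat` is hypothetical and exists only for this example. It builds a string with ~= in a loop, yet when it is used as an `enum` initializer the compiler runs it and only the finished constant is used.

```d
// Hypothetical helper for illustration: builds a string with ~= in a loop.
string repeat(string part, size_t count)
{
    string result;
    foreach (i; 0 .. count)
        result ~= part;
    return result;
}

// Used as an enum initializer, the call is executed by the compiler (CTFE).
enum ruler = repeat("-=", 3);
static assert(ruler == "-=-=-=");

void main()
{
    // The same function still works as an ordinary run-time call.
    assert(repeat("ab", 2) == "abab");
}
```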
February 22, 2009
On Mon, Feb 23, 2009 at 3:42 AM, Don <nospam@nospam.com> wrote:
> Bill Baxter wrote:
>>
>> On Sun, Feb 22, 2009 at 11:12 PM, Denis Koroskin <2korden@gmail.com> wrote:
>>>
>>> On Sun, 22 Feb 2009 16:50:51 +0300, Christopher Wright
>>> <dhasenan@gmail.com>
>>> wrote:
>>>
>>>> Brad Roberts wrote:
>>>>>
>>>>> Back in C and C++, with its preprocessor, merging adjacent string literals is very handy.  In D, it's only marginally so, but not completely useless.  It can still be used to break a really long string literal into parts.  There are other string boundary tokens in D which might well provide viable alternatives.
>>>>
>>>> In C and C++, there is no way to catenate strings at compile time. The
>>>> only way to catenate strings is with strcat. That places the additional
>>>> burden on programmers that they have to include string.h. For that
>>>> reason,
>>>> it makes sense to catenate adjacent string literals.
>>>>
>>>> In D, there's a compile time catenation operator that doesn't require
>>>> libraries. So the catenation by association saves you only one
>>>> character.
>>>> I'd say that's useless.
>>>
>>> I agree.
>>
>> I use this feature pretty frequently to break up long strings. I think I didn't use ~ for that because it makes me think an allocation might happen when it doesn't need to.
>>
>> But after seeing the discussion here I'd be happy to switch to using "a"~"b" as long as it's guaranteed by the language that such strings will be concatenated at compile time.   (I think that is the case now, right?)
>
> Yes, and because of CTFE, even complicated applications of ~ frequently don't involve any allocation. So your intuition was wrong! Implicit concatenation was probably one of the things which led to your false impression. So it may be bad in that respect, as well as bug-breeding.

Well, like I said, I vaguely recalled that DMD would eliminate the alloc.  But is it in the spec?  Some other compiler might not implement that optimization.  Or I might change from "foo"~"bar" to "foo"~runTimeVar at some point and not notice that I'd introduced an allocation because of that.  So the benefit of "foo" "bar" there was that I could be absolutely sure, since it's in the spec, that it concatenates the strings at compile time.

But I agree it's something that could be gotten rid of.

--bb
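
For the worry about silently changing "foo"~"bar" into "foo"~runTimeVar, one possible safeguard is to bind the result to an `enum`, which has to be computable at compile time. This is only a sketch of that idea, not something the spec discussion above settles. The name `fixed` is illustrative.

```d
enum fixed = "foo" ~ "bar";   // enum: has to be computed at compile time
static assert(fixed == "foobar");

void main()
{
    string runTimeVar = "bar";

    // enum broken = "foo" ~ runTimeVar;
    // Uncommenting the line above should be rejected by the compiler,
    // because runTimeVar cannot be read at compile time.

    string joined = "foo" ~ runTimeVar;  // compiles, but is built at run time
    assert(joined == fixed);
}
```

With the `enum` form, swapping a literal for a run-time variable becomes a compile error instead of an unnoticed allocation.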