== comparison of string literals, and their usage
April 05, 2019
Hello,

Since literal strings are interned (and immutable), can I count on the fact that they are compared (==) by pointer?

Context: The use case is a custom lexer for a custom language. I initially wanted to represent lexeme classes by a big enum 'LexClass'. However, this makes me write all constant lexemes (keywords and keysigns) three times:
1- in the enum of lexeme classes
2- in an array of constants (for the constant-scanning func)
3- in an associative array mapping constants to their classes
However, if literal strings are compared by pointer, then they are a kind of Scheme or Ruby symbol, that is, enum values representing *cases*, which is exactly what I need. I would thus use the constants' strings themselves as lexeme classes, and the parser would not be slowed down.

What do you think?
-- 
diniz {la vita e estranj}
April 06, 2019
On Friday, 5 April 2019 at 14:49:50 UTC, diniz wrote:
> Hello,
>
> Since literal strings are interned (and immutable), can I count on the fact that they are compared (==) by pointer?

No. "==" performs a full array comparison and "is" is apparently simplified at compile time. In the compiler there's no notion of string literal as a special expression. It's always a StringExp. See https://d.godbolt.org/z/K5R6u6.

However, you're right to say that literals are not duplicated.
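
A minimal sketch of the distinction above, assuming one string comes from a literal and the other is built at run time. The helper names `sameContents` and `sameSlice` are made up for illustration; they only wrap `==` and `is`.

```d
/// True when both strings hold the same characters (what `==` checks).
bool sameContents(string a, string b)
{
    return a == b;
}

/// True when both slices have the same pointer and the same length
/// (what `is` checks for arrays).
bool sameSlice(string a, string b)
{
    return a is b;
}

void main()
{
    string literal = "hello";
    // Concatenation at run time allocates new memory, so `built`
    // has the same characters but does not alias the literal.
    string built = "hel".idup ~ "lo";

    assert(sameContents(literal, built)); // equal character by character
    assert(!sameSlice(literal, built));   // but not the same memory
    assert(sameSlice(literal, literal));  // a slice is identical to itself
}
```

So `==` stays correct for strings from any source, while `is` only answers whether two slices refer to the same memory.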


April 06, 2019
On 06/04/2019 at 16:07, AltFunction1 via Digitalmars-d-learn wrote:
> On Friday, 5 April 2019 at 14:49:50 UTC, diniz wrote:
>> Hello,
>>
>> Since literal strings are interned (and immutable), can I count on the fact that they are compared (==) by pointer?
> 
> No. "==" performs a full array comparison and "is" is apparently simplified at compile time. In the compiler there's no notion of string literal as a special expression. It's always a StringExp. See https://d.godbolt.org/z/K5R6u6.
> 
> However, you're right to say that literals are not duplicated.

Thank you very much.

So I could still store, use, and compare string pointers myself [1] and get valid results, meaning that pointer equality implies (literal) string equality. Or am I wrong? The point is that the parser, operating on an array of prescanned lexemes, will constantly check whether a valid lexeme is present simply by checking the lexeme "class". I don't want that to be a real string comparison: too expensive, and for no gain.

[1] As in the second comparison of your example:
void main()
{
    auto c2 =  "one" == "two";
    auto c1 =  "one".ptr is "two".ptr;
}
-- 
diniz {la vita e estranj}
April 06, 2019
On Saturday, 6 April 2019 at 15:35:22 UTC, diniz wrote:
> So I could still store, use, and compare string pointers myself [1] and get valid results, meaning that pointer equality implies (literal) string equality. Or am I wrong? The point is that the parser, operating on an array of prescanned lexemes, will constantly check whether a valid lexeme is present simply by checking the lexeme "class". I don't want that to be a real string comparison: too expensive, and for no gain.
>
> [1] As in the second comparison of your example:
> void main()
> {
>     auto c2 =  "one" == "two";
>     auto c1 =  "one".ptr is "two".ptr;
> }

Not quite. D-strings strictly consist of pointer *and* length, so you need to compare the .length properties as well to correctly conclude that the strings are equal. You can concisely do that in one go by simply `is` comparing the array references, as in:

string a = "hello";
string b = a;
assert(a is b);
assert(a[] is b[]);

Of course, if the strings are never sliced, you can just compare the pointers and be done; just make sure to document how it operates. Depending on the circumstances, I'd throw in some asserts that do an actual string comparison to verify the program logic.
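
A rough sketch of how that advice might look in the lexer use case. The names `kwIf`, `kwElse`, and `sameClass` are hypothetical, and the sketch assumes every lexeme class is always taken from one of the named constants rather than from a fresh literal or a slice of the source text.

```d
// Hypothetical lexeme-class constants. Referring to these variables
// everywhere (instead of repeating the literals) is what makes the
// identity comparison meaningful.
immutable string kwIf = "if";
immutable string kwElse = "else";

/// Compares two lexeme classes by identity (pointer and length).
/// The assert does the real string comparison in non-release builds
/// to catch a class that did not come from the constants above.
bool sameClass(string a, string b)
{
    immutable identical = a is b;
    assert(identical == (a == b),
        "lexeme class was not taken from the constant table");
    return identical;
}

void main()
{
    string cls = kwIf; // class stored in a prescanned lexeme

    assert(sameClass(cls, kwIf));
    assert(!sameClass(cls, kwElse));
}
```

With asserts compiled out (for example with `-release`), `sameClass` reduces to the `is` comparison alone.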
April 07, 2019
On 06/04/2019 at 21:47, lithium iodate via Digitalmars-d-learn wrote:
> On Saturday, 6 April 2019 at 15:35:22 UTC, diniz wrote:
>> So I could still store, use, and compare string pointers myself [1] and get valid results, meaning that pointer equality implies (literal) string equality. Or am I wrong? The point is that the parser, operating on an array of prescanned lexemes, will constantly check whether a valid lexeme is present simply by checking the lexeme "class". I don't want that to be a real string comparison: too expensive, and for no gain.
>>
>> [1] As in the second comparison of your example:
>> void main()
>> {
>>     auto c2 =  "one" == "two";
>>     auto c1 =  "one".ptr is "two".ptr;
>> }
> 
> Not quite. D-strings strictly consist of pointer *and* length, so you need to compare the .length properties as well to correctly conclude that the strings are equal. You can concisely do that in one go by simply `is` comparing the array references, as in:
> 
> string a = "hello";
> string b = a;
> assert(a is b);
> assert(a[] is b[]);
> 
> Of course, if the strings are never sliced, you can just compare the pointers and be done; just make sure to document how it operates. Depending on the circumstances, I'd throw in some asserts that do an actual string comparison to verify the program logic.

Thank you very much! And yes, properly documenting is also important to me.
-- 
diniz {la vita e estranj}
April 07, 2019
On Saturday, 6 April 2019 at 19:47:14 UTC, lithium iodate wrote:
> On Saturday, 6 April 2019 at 15:35:22 UTC, diniz wrote:
>> So I could still store, use, and compare string pointers myself [1] and get valid results, meaning that pointer equality implies (literal) string equality. Or am I wrong? The point is that the parser, operating on an array of prescanned lexemes, will constantly check whether a valid lexeme is present simply by checking the lexeme "class". I don't want that to be a real string comparison: too expensive, and for no gain.
>>
>> [1] As in the second comparison of your example:
>> void main()
>> {
>>     auto c2 =  "one" == "two";
>>     auto c1 =  "one".ptr is "two".ptr;
>> }
>
> Not quite. D-strings strictly consist of pointer *and* length, so you need to compare the .length properties as well to correctly conclude that the strings are equal. You can concisely do that in one go by simply `is` comparing the array references, as in:
>
> string a = "hello";
> string b = a;
> assert(a is b);
> assert(a[] is b[]);
>
> Of course, if the strings are never sliced, you can just compare the pointers and be done; just make sure to document how it operates. Depending on the circumstances, I'd throw in some asserts that do an actual string comparison to verify the program logic.

To add onto this.

Here is an example of why it's important to compare the length as well:

    string a = "hello";
    string b = a[0 .. 3];

    assert(a.ptr == b.ptr);
    assert(a.length != b.length);
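
The same situation, sketched with two hypothetical helpers (`samePointer` and `sameSlice`, names made up here) to show that a pointer-only check reports a match for the prefix slice while `is` does not:

```d
/// Pointer-only check: ignores the length entirely.
bool samePointer(string a, string b)
{
    return a.ptr is b.ptr;
}

/// Slice identity: pointer and length must both match.
bool sameSlice(string a, string b)
{
    return a is b;
}

void main()
{
    string whole = "hello";
    string prefix = whole[0 .. 3]; // "hel", starting at the same address

    assert(samePointer(whole, prefix)); // false positive: same start address
    assert(!sameSlice(whole, prefix));  // lengths differ, so not identical
    assert(whole != prefix);            // and the contents differ too
}
```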
April 07, 2019
On 07/04/2019 at 14:23, bauss via Digitalmars-d-learn wrote:
> On Saturday, 6 April 2019 at 19:47:14 UTC, lithium iodate wrote:
>> On Saturday, 6 April 2019 at 15:35:22 UTC, diniz wrote:
>>> So I could still store, use, and compare string pointers myself [1] and get valid results, meaning that pointer equality implies (literal) string equality. Or am I wrong? The point is that the parser, operating on an array of prescanned lexemes, will constantly check whether a valid lexeme is present simply by checking the lexeme "class". I don't want that to be a real string comparison: too expensive, and for no gain.
>>>
>>> [1] As in the second comparison of your example:
>>> void main()
>>> {
>>>     auto c2 =  "one" == "two";
>>>     auto c1 =  "one".ptr is "two".ptr;
>>> }
>>
>> Not quite. D-strings strictly consist of pointer *and* length, so you need to compare the .length properties as well to correctly conclude that the strings are equal. You can concisely do that in one go by simply `is` comparing the array references, as in:
>>
>> string a = "hello";
>> string b = a;
>> assert(a is b);
>> assert(a[] is b[]);
>>
>> Of course, if the strings are never sliced, you can just compare the pointers and be done; just make sure to document how it operates. Depending on the circumstances, I'd throw in some asserts that do an actual string comparison to verify the program logic.
> 
> To add onto this.
> 
> Here is an example of why it's important to compare the length as well:
> 
>      string a = "hello";
>      string b = a[0 .. 3];
> 
>      assert(a.ptr == b.ptr);
>      assert(a.length != b.length);


Thank you! Very clear :-).

-- 
diniz {la vita e estranj}