March 16, 2012
On Friday, 16 March 2012 at 15:41:32 UTC, Timon Gehr wrote:
> On 03/16/2012 03:28 PM, H. S. Teoh wrote:
>> More to the point, does dmd perform this optimization currently?
>
> No.
>
> import std.stdio;
>
> immutable string a = "123";
> immutable string b = a;
>
> void main(){writeln(a.ptr is b.ptr);} // "false"

It actually does, but only for identical strings. It doesn't seem to do strings within strings.

import std.stdio;

void foo(string a){
	string b = "123";
	writeln(a is b);
}

void main(){
	string a = "123";
	string b = "456";
	string c = "123456";
	foo(a);
	foo(b);
	foo(c);
}

Prints:
true
false
false
March 16, 2012
On Friday, 16 March 2012 at 18:44:53 UTC, Xinok wrote:
> It actually does, but only for identical strings. It doesn't seem to do strings within strings.
>
> import std.stdio;
>
> void foo(string a){
> 	string b = "123";
> 	writeln(a is b);
> }
>
> void main(){
> 	string a = "123";
> 	string b = "456";
> 	string c = "123456";
> 	foo(a);
> 	foo(b);
> 	foo(c);
> }
>
> Prints:
> true
> false
> false

Captain Obvious to the rescue: 'is' is false if the strings are of different lengths >.<. But the point still stands: D doesn't dedup strings within strings.

import std.stdio;

void main(){
	string a = "123";
	string b = "123456";
	writeln(a.ptr);
	writeln(b.ptr);
	writeln(a.ptr);
	writeln(b.ptr);
}

Prints:
44F080
44F090
44F080
44F090

I printed it twice to ensure it wasn't duping the strings.
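In D, `is` on two slices is true only when both the pointer and the length match, which is why `foo(c)` would print false even if "123456" started at the same address as "123". A minimal sketch that spells out the comparison; `identical` is a hypothetical name, not a Phobos function:

```d
/// Hypothetical helper spelling out what `a is b` means for slices:
/// same starting address and same length. Equal contents are not enough.
bool identical(string a, string b)
{
    return a.ptr == b.ptr && a.length == b.length;
}
```

With `string s = "123456";`, the prefix `s[0 .. 3]` starts at `s.ptr`, yet `s[0 .. 3] is s` is false because the lengths differ.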
March 16, 2012
On Friday, 16 March 2012 at 18:44:53 UTC, Xinok wrote:
> It actually does, but only for identical strings. It doesn't seem to do strings within strings.

Don't forget that "123" is /not/ a substring of "123456"
because of the invisible 0 terminator (which is there
for easy compatibility with C functions).
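This can be checked directly. Slicing "123" out of the front of "123456" gives the right characters, but the byte after the slice is '4' rather than the '\0' a C function would need. A suffix ends where the literal ends and shares its terminator, so the implicit zero rules out prefix and infix sharing but not suffix sharing. A minimal sketch; `zeroFollows` is a hypothetical helper that deliberately reads one byte past the slice, so it is only valid for literals and slices that lie inside one:

```d
/// Hypothetical helper: true if the byte just past the end of `s` is '\0'.
/// The D spec puts a zero terminator after every string literal, so this is
/// only meaningful for literals and slices inside them; it reads one byte
/// past the slice, hence @system.
bool zeroFollows(string s) @system
{
    return s.ptr[s.length] == '\0';
}
```

For example, with `string whole = "123456";`, `zeroFollows(whole)` and `zeroFollows(whole[3 .. $])` are true, while `zeroFollows(whole[0 .. 3])` is false.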

March 16, 2012
On 03/16/2012 07:52 PM, Xinok wrote:
> Captain Obvious to the rescue: 'is' is false if the strings are of
> different lengths >.<. But the point still stands: D doesn't dedup strings
> within strings.
>
> import std.stdio;
>
> void main(){
> string a = "123";
> string b = "123456";
> writeln(a.ptr);
> writeln(b.ptr);
> writeln(a.ptr);
> writeln(b.ptr);
> }
>
> Prints:
> 44F080
> 44F090
> 44F080
> 44F090
>
> I printed it twice to ensure it wasn't duping the strings.

It can't, because there must be a terminating zero byte. It doesn't do it even where it could, though.


import std.stdio;

immutable string x = "123";
immutable string y = "123";

void foo(string a){
	string b = "123";
	writeln(a is b);
}

void main(){
	string a = "123";
	string b = "456";
	string c = "456123";
	foo(c[3..$]);    // false
	writeln(x is y); // false
	writeln(a is x); // false
	writeln(b is x); // false
	writeln(a is y); // false
	writeln(b is y); // false
	foo(a);          // true
	foo(b);          // false
}

March 16, 2012
On Friday, 16 March 2012 at 18:56:00 UTC, Timon Gehr wrote:
> It can't, because there must be a terminating zero byte. It doesn't do it even where it could, though.

So while D does pool strings, it doesn't seem to optimize globals. I couldn't find anything about it on the bug tracker.
March 17, 2012
On 16/03/12 13:24, Kevin Cox wrote:
>
> On Mar 16, 2012 7:45 AM, "Alex Rønne Petersen" <xtzgzorex@gmail.com> wrote:
>  >
>  > I don't see any reason why c couldn't point to element number 3 of b, and have its length set to 3...
>  >
>
> And the previous examples were language-agnostic. In D and other
> languages where the length of a string is stored, we can nest strings
> anywhere inside other strings.
>
> const char[] a = "foofoo";
> const char[] b = "oof";
>
> Those can't be nested in null-terminated strings, but they can where
> strings have an explicit length.
>

Unfortunately string literals in D have an implicit \0 added beyond the end, so we don't have much more freedom than C.
March 18, 2012
On Friday, 16 March 2012 at 11:41:59 UTC, Alex Rønne Petersen wrote:
> On 16-03-2012 12:32, Peter Alexander wrote:
>> On Friday, 16 March 2012 at 02:31:47 UTC, Xinok wrote:
>>> On Friday, 16 March 2012 at 02:18:27 UTC, Kevin wrote:
>>>> This is in no way D specific but say you have two constant strings.
>>>>
>>>> const char[] a = "1234567890";
>>>> // and
>>>> const char[] b = "67890";
>>>>
>>>> You could lay out the memory inside of one another. IE: if a.ptr = 1
>>>> then b.ptr = 6. I'm not sure if this has been done and I don't think
>>>> it would apply very often but it would be kinda cool.
>>>>
>>>> I thought of this because I wanted to pre-generate
>>>> hex-representations of some numbers, and I realized I could use half the
>>>> memory if I nested them. (At least I think it would be half.)
>>>>
>>>> Kevin.
>>>
>>> I'm pretty sure this is called string pooling.
>>
>> My understanding is that string pooling just shares whole strings rather
>> than combining suffixes.
>>
>> e.g.
>> const char[] a = "fubar";
>> const char[] b = "fubar"; // shared
>> const char[] c = "bar"; // not shared at all
>>
>> Combining suffixes is obviously possible, but I'm not sure that string
>> pooling implies suffix pooling.
>
> I don't see any reason why c couldn't point to element number 3 of b, and have its length set to 3...

Neither do I, but it's more work for the compiler, and even if the compiler does string pooling, it may not look for common suffixes.
March 18, 2012
On Mar 18, 2012 4:50 PM, "Peter Alexander" <peter.alexander.au@gmail.com> wrote:

> Neither do I, but it's more work for the compiler, and even if the compiler does string pooling, it may not look for common suffixes.

It would be more work, but it would have memory and cache benefits. If you created a set of strings ordered lexicographically by their reverse, it would not be that much overhead.
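The reverse-ordering idea can be sketched in D as follows, assuming all strings are known up front. Sorting by reversed contents in descending order places each string directly after a longer string that ends with it, so one pass that compares each string with the last one stored is enough to find shared tails. `tailMerge` is a hypothetical helper, not something dmd or Phobos provides:

```d
import std.algorithm.comparison : cmp;
import std.algorithm.searching : endsWith;
import std.algorithm.sorting : sort;
import std.array : array;
import std.range : iota, retro;
import std.string : representation;

/// Hypothetical helper: lays out `strs` in one buffer, storing each string
/// only once and pointing every string that is a suffix of another into the
/// tail of the longer one. Returns one slice per input, in input order.
string[] tailMerge(string[] strs)
{
    // Sort indices by reversed contents, descending. "fubar" reversed is
    // "rabuf" and "bar" reversed is "rab", so "fubar" lands right before "bar".
    auto order = iota(strs.length).array;
    order.sort!((x, y) =>
        cmp(strs[x].representation.retro, strs[y].representation.retro) > 0);

    string pool;                          // shared storage, built front to back
    auto start = new size_t[strs.length]; // offset of each string in `pool`
    string last;                          // most recently appended string
    foreach (i; order)
    {
        auto s = strs[i];
        if (last.length && last.endsWith(s))
        {
            // s is a suffix of `last`, which ends at pool.length.
            start[i] = pool.length - s.length;
        }
        else
        {
            // No shared tail: append a fresh copy.
            start[i] = pool.length;
            pool ~= s;
            last = s;
        }
    }

    auto result = new string[strs.length];
    foreach (i, s; strs)
        result[i] = pool[start[i] .. start[i] + s.length];
    return result;
}
```

The sort costs O(n log n) string comparisons and the merge pass is linear in the total length, which fits the "not that much overhead" estimate. The pooled strings here are not individually zero-terminated; appending a '\0' after each stored string would restore C compatibility, and shared suffixes would still end at their parent's terminator.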


March 19, 2012
On Thu, 15 Mar 2012 22:16:18 -0400, Kevin <kevincox.ca@gmail.com> wrote:

> This is in no way D specific but say you have two constant strings.
>
> const char[] a = "1234567890";
> // and
> const char[] b = "67890";
>
> You could lay out the memory inside of one another. IE: if a.ptr = 1 then b.ptr = 6.  I'm not sure if this has been done and I don't think it would apply very often but it would be kinda cool.
>
> I thought of this because I wanted to pre-generate hex-representations of some numbers, and I realized I could use half the memory if I nested them. (At least I think it would be half.)

I have done this manually in the past. In an application that ran on an 8-bit micro with 256 bytes of RAM and 4K of code space, I ran out of space and was able to save quite a bit by making one large string array holding all the data, then using pointer/length combinations into that array.

It seems like the compiler could do some work, but what about CTFE?  I think this would be a cool project.  But I'm not sure if CTFE can save state for later...

string poolString(string s)
{
   // look for s in existing pool, if found, return
   // otherwise, add to pool.
}
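The stub's two comments can be filled in at run time as below. This is a sketch under the assumption that pooling happens while the program runs; it does not answer whether CTFE can keep such state between evaluations. `StringPool` is a hypothetical type, and its `poolString` searches the whole pool, so a new string can reuse any run of existing characters, not only a suffix:

```d
/// Hypothetical runtime pool (not a CTFE solution): every string handed
/// out is a slice of one growing buffer.
struct StringPool
{
    private string buf; // pooled characters, stored back to back

    /// Returns a slice of the pool equal to `s`, appending `s` only if
    /// it does not already occur somewhere in the pool.
    string poolString(string s)
    {
        import std.string : indexOf;

        // look for s in existing pool, if found, return a slice of it
        immutable i = buf.indexOf(s);
        if (i >= 0)
            return buf[i .. i + s.length];

        // otherwise, add to pool; earlier slices stay valid even if the
        // buffer is reallocated, because the GC keeps the old memory alive
        buf ~= s;
        return buf[$ - s.length .. $];
    }
}
```

For example, after `poolString("1234567890")`, a call to `poolString("67890")` returns a slice starting five characters into the first string. The result depends on order: pooling "bar" before "fubar" stores both. The returned slices are also not zero-terminated, so they should not be handed to C functions as is.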

-Steve
March 19, 2012
On Fri, 16 Mar 2012 13:16:18 +1100, Kevin <kevincox.ca@gmail.com> wrote:

> This is in no way D specific but say you have two constant strings.
>
> const char[] a = "1234567890";
> // and
> const char[] b = "67890";
>
> You could lay out the memory inside of one another. IE: if a.ptr = 1 then b.ptr = 6.  I'm not sure if this has been done and I don't think it would apply very often but it would be kinda cool.
>
> I thought of this because I wanted to pre-generate hex-representations of some numbers, and I realized I could use half the memory if I nested them. (At least I think it would be half.)

Is the effort to do this really worthwhile, given today's vast amounts of RAM (virtual and real)? How much memory are you expecting to 'save'?

And is RAM address alignment also an issue here? Currently most literals are aligned on a 4- or 8-byte boundary, but with this sort of pooling some literals will no longer be so aligned. That might not be an issue, but I'm just curious.
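The alignment question can be made concrete. A suffix that starts five bytes into "1234567890" is offset by five from whatever boundary the literal starts on. Since `char.alignof` is 1 in D, such a slice is still a valid `char` array; alignment would mostly matter to code that reads several characters at once as wider words. `misalignment` is a hypothetical helper:

```d
/// Hypothetical helper: distance of `s.ptr` past the previous
/// `boundary`-byte boundary (0 means aligned).
size_t misalignment(string s, size_t boundary)
{
    return cast(size_t) s.ptr % boundary;
}

// char only requires 1-byte alignment, so an unaligned suffix slice
// is still a valid char array.
static assert(char.alignof == 1);
```

With `string a = "1234567890";` and `string b = a[5 .. $];`, `misalignment(b, 8)` equals `(misalignment(a, 8) + 5) % 8`, so `b` is 8-byte aligned only if `a` happens to start 3 bytes past an 8-byte boundary.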

-- 
Derek Parnell
Melbourne, Australia