September 20, 2014
On Friday, 19 September 2014 at 15:09:41 UTC, Andrei Alexandrescu wrote:
> It does affect management, i.e. you don't know when to free the buffer if slices are unaccounted for. So the design of slices is affected as much as that of the buffer.

I see where you are going with this. It is a bit hard to imagine how it fits the big picture when going bottom-up, but I trust you on this :)

>> I agree that the pipeline approach does not work that well for complex
>> programs in general, but strings seem to be the best match for it - either you
>> want read-only access or a pipeline; everything else feels inefficient
>> as the amount of write operations gets out of control. Every single attempt
>> to do something clever with shared CoW strings in C++ I have met was a
>> total failure.
>
> What were the issues?

Usually it went this way:

1) Get a basic implementation working and be shocked at how slow it is because of redundant reference count increments/decrements and thread-safety overhead
2) Add speed-up hacks to skip reference count updates where they are considered unnecessary
3) Get hit by a snowball of synchronization / double-free issues and abandon the idea completely after months of debugging.

Of course those weren't teams of rock-star programmers, but at the same time the more "stupid" approach of making extra copies and putting extra effort into defining a strict linear ownership chain seemed to work much better.

>> That is why I wonder - what kind of applications really need the
>> rcstring as opposed to some generic rcarray?
>
> I started with rcstring because (a) it's easier to lift off the ground - no worries about construction/destruction of elements etc. and (b) it's frequent enough to warrant some good testing. Of course there'll be an rcarray!T as well.


Thanks for the explanation :) I am curious how it will turn out, but a bit skeptical right now.
September 20, 2014
On Monday, 15 September 2014 at 02:26:19 UTC, Andrei Alexandrescu wrote:
> Andrei

I'm testing your RCString right now in my code to see how much memory it will save and how much speed it will gain. I want to use RCString in place of string as a key in my AAs. Any proposals for a suitable implementation of

    size_t toHash() @trusted pure nothrow

for RCString? I'm guessing there are two cases here: one for the SSO case and one for the other. The other should be similar to

    size_t toHash(string) @trusted pure nothrow

right?
September 20, 2014
On 9/20/14, 12:42 AM, Dicebot wrote:
> On Friday, 19 September 2014 at 15:09:41 UTC, Andrei Alexandrescu wrote:
>>> as the amount of write operations gets out of control. Every single attempt
>>> to do something clever with shared CoW strings in C++ I have met was a
>>> total failure.
>>
>> What were the issues?
>
> Usually it went this way:
>
> 1) Get a basic implementation working and be shocked at how slow it is
> because of redundant reference count increments/decrements and
> thread-safety overhead
> 2) Add speed-up hacks to skip reference count updates where they are
> considered unnecessary
> 3) Get hit by a snowball of synchronization / double-free issues and
> abandon the idea completely after months of debugging.

I understand. RC strings will work just fine. Compared to interlocked approaches we're looking at a 5x improvement in RC speed for the most part because we can dispense with most interlocking. -- Andrei
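
To make the difference between the two approaches concrete, here is a minimal sketch in D. The type names LocalCount and SharedCount are hypothetical and are not part of RCString; the sketch only shows which operation each approach pays for on every copy, and it makes no claim about the actual speedup.

```d
import core.atomic : atomicLoad, atomicOp;

// Hypothetical thread-local counter: a plain increment, no interlocking.
struct LocalCount
{
    size_t n = 1;

    void inc() { ++n; }
    size_t get() const { return n; }
}

// Hypothetical cross-thread counter: every update is an atomic
// read-modify-write, which is what the interlocked approach costs.
struct SharedCount
{
    shared size_t n = 1;

    void inc() { atomicOp!"+="(n, 1); }
    size_t get() const { return atomicLoad(n); }
}

unittest
{
    LocalCount a;
    a.inc();
    assert(a.get == 2);

    SharedCount b;
    b.inc();
    assert(b.get == 2);
}
```

The plain version is only valid as long as the counted object never leaves the thread that created it.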

September 20, 2014
On Saturday, 20 September 2014 at 15:21:18 UTC, Nordlöw wrote:
> for RCString? I'm guessing there are two cases here;

I'm guessing

    size_t toHash() const @trusted pure nothrow
    {
        import core.internal.hash : hashOf;
        if (isSmall)
        {
            return this.small.hashOf;
        }
        else
        {
            return this.large[].hashOf;
        }
    }

Will

    this.large[].hashOf

do unnecessary GC allocations? -vgc says nothing.

I'm compiling as

    dmd -vcolumns -debug -g -gs -vgc -unittest -wi -main rcstring.d -ofrcstring.out
September 20, 2014
On 9/20/14, 8:21 AM, "Nordlöw" wrote:
> On Monday, 15 September 2014 at 02:26:19 UTC, Andrei Alexandrescu wrote:
>> Andrei
>
> I'm testing your RCString right now in my code to see how much memory it
> will save and how much speed it will gain.

Thanks!

> I want to use RCString in place of
> string as a key in my AAs. Any proposals for a suitable implementation of
>
>      size_t toHash() @trusted pure nothrow
>
> for RCString? I'm guessing there are two cases here: one for the
> SSO case and one for the other. The other should be similar to
>
>      size_t toHash(string) @trusted pure nothrow
>
> right?

Yah, that's the one.


Andrei

September 20, 2014
On 9/20/14, 8:54 AM, "Nordlöw" wrote:
> On Saturday, 20 September 2014 at 15:21:18 UTC, Nordlöw wrote:
>> for RCString? I'm guessing there are two cases here;
>
> I'm guessing
>
>      size_t toHash() const @trusted pure nothrow
>      {
>          import core.internal.hash : hashOf;
>          if (isSmall)
>          {
>              return this.small.hashOf;
>          }
>          else
>          {
>              return this.large[].hashOf;
>          }
>      }

Why not just "return this.asSlice.hashOf;"?

> Will
>
>      this.large[].hashOf
>
> do unnecessary GC allocations? -vgc says nothing.

No.


Andrei

September 20, 2014
On 9/20/14, 8:54 AM, "Nordlöw" wrote:
> On Saturday, 20 September 2014 at 15:21:18 UTC, Nordlöw wrote:
>> for RCString? I'm guessing there are two cases here;
>
> I'm guessing
>
>      size_t toHash() const @trusted pure nothrow
>      {
>          import core.internal.hash : hashOf;
>          if (isSmall)
>          {
>              return this.small.hashOf;

Oh in fact this.small.hashOf is incorrect anyway because it hashes random characters after the used portion of the string. -- Andrei
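
The point about stale trailing characters can be sketched as follows. SmallBuf is a hypothetical stand-in for the small-string storage, with an assumed 15-byte inline buffer and a length field; it is not RCString's real layout, and asSlice here only mimics the accessor named above. The idea is to hash a slice of the used portion rather than the whole fixed-size buffer.

```d
import core.internal.hash : hashOf;

// Hypothetical stand-in for the small-string (SSO) storage;
// this is not RCString's actual layout.
struct SmallBuf
{
    char[15] data = '\0'; // inline storage; bytes past `length` may be stale
    ubyte length;         // number of code units actually in use

    // Slice covering only the used portion of the buffer.
    const(char)[] asSlice() const @trusted pure nothrow
    {
        return data[0 .. length];
    }

    // Hash only the used portion, so stale trailing bytes cannot
    // influence the result.
    size_t toHash() const @trusted pure nothrow
    {
        return hashOf(asSlice);
    }
}

unittest
{
    SmallBuf a, b;
    a.data[0 .. 3] = "abc"; a.length = 3;
    b.data[0 .. 3] = "abc"; b.length = 3;
    b.data[3 .. 6] = "xyz"; // stale bytes beyond the used portion

    assert(a.data != b.data);         // the raw buffers differ...
    assert(a.toHash() == b.toHash()); // ...but the hashes agree
}
```

To use such a type as an AA key, a matching opEquals that compares the same slice would be needed as well, so that equal keys always hash equally.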

September 20, 2014
On Saturday, 20 September 2014 at 15:30:55 UTC, Andrei Alexandrescu wrote:
> I understand. RC strings will work just fine. Compared to interlocked approaches we're looking at a 5x improvement in RC speed for the most part because we can dispense with most interlocking. -- Andrei

Can someone explain why?

Since fibers can travel between threads, they will also be able to leak objects to different threads.
September 20, 2014
On Saturday, 20 September 2014 at 17:06:48 UTC, Andrei
Alexandrescu wrote:
> Why not just "return this.asSlice.hashOf;"?

Good idea :)

I'll use that instead.

>> Will
>>
>>     this.large[].hashOf
>>
>> do unnecessary GC allocations? -vgc says nothing.

Ok, great! A couple of follow-up questions.

How big is the overhead of RC compared to a non-RC GC-free string
variant?

Perhaps it would be nice to add a template parameter to RCXString
that makes the RC optional?

If I want a *non*-RC GC-free variant of string/wstring/dstring
what's the best way to define them?

Would Array!char, Array!wchar, Array!dchar, be suitable
solutions? Of course these wouldn't utilize SSO. I'm asking
because Array is RandomAccess but string/wstring is not
byCodePoint.
September 20, 2014
On 9/20/14, 11:01 AM, "Nordlöw" wrote:
> How big is the overhead of RC compared to a non-RC GC-free string
> variant?

A ballpark would probably be 1.1-2.5x, but there is of course a lot of variability.

> Perhaps it would be nice to add a template parameter to RCXString
> that makes the RC optional?

Manual memory management is not part of its charter.

> If I want a *non*-RC GC-free variant of string/wstring/dstring
> what's the best way to define them?

I think you're back to malloc and free kind of stuff.

> Would Array!char, Array!wchar, Array!dchar, be suitable
> solutions? Of course these wouldn't utilize SSO. I'm asking
> because Array is RandomAccess but string/wstring is not
> byCodePoint.

Those are refcounted.


Andrei
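
As a small illustration of the last answer, the sketch below uses Array!char from std.container.array and shows that copying an Array copies a handle to a shared, reference-counted payload rather than the characters themselves. It assumes only the documented Array API (insertBack, opIndex, length).

```d
import std.container.array : Array;

unittest
{
    Array!char a;
    foreach (char c; "hello")
        a.insertBack(c);

    auto b = a; // copies the handle; both now refer to the same payload
    b[0] = 'j';

    assert(a.length == 5);
    assert(a[0] == 'j'); // the write through `b` is visible through `a`
}
```

Note that the array is filled before it is copied, so both handles refer to an already allocated payload.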