September 15, 2014
On Monday, 15 September 2014 at 02:26:19 UTC, Andrei Alexandrescu wrote:
> So, please fire away. I'd appreciate it if you used RCString in lieu of string and note the differences. The closer we get to parity in semantics, the better.

It should support appending single code units:

---
import std.stdio : writeln;

alias String = RCString; // RCString from the draft linked above

void main()
{
    String s = "abc";

    s ~= cast(char)'0';
    s ~= cast(wchar)'0';
    s ~= cast(dchar)'0';

    writeln(s); // abc000
}
---

This works when String is a built-in C[] array, but fails with RCString. The same is true for concatenation.
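For illustration, here is a minimal sketch of how a string wrapper could accept single code units on both `~=` and `~`. `MiniString` is a hypothetical stand-in that stores a plain GC array rather than RCString's reference-counted buffer, so only the overload set is the point here, not the storage.

```d
import std.traits : isSomeChar;
import std.utf : encode;

/// Hypothetical stand-in for RCString: GC-backed, not reference counted.
struct MiniString
{
    private string data;

    this(string s) { data = s; }

    /// s ~= x, where x is a UTF-8 string or a single char/wchar/dchar.
    void opOpAssign(string op : "~", T)(T rhs)
    {
        static if (is(immutable T == immutable char))
        {
            data ~= rhs;                 // a UTF-8 code unit is appended as-is
        }
        else static if (isSomeChar!T)
        {
            // wchar/dchar are transcoded to UTF-8 first; a lone UTF-16
            // surrogate cannot be transcoded and makes encode throw.
            char[4] buf;
            immutable len = encode(buf, cast(dchar) rhs);
            data ~= buf[0 .. len];
        }
        else
        {
            data ~= rhs;                 // whole UTF-8 strings
        }
    }

    /// s ~ x, implemented on top of ~= so the logic lives in one place.
    MiniString opBinary(string op : "~", T)(T rhs) const
    {
        auto result = MiniString(data);
        result ~= rhs;
        return result;
    }

    string toString() const { return data; }
}

unittest
{
    MiniString s = "abc";
    s ~= cast(char)'0';
    s ~= cast(wchar)'0';
    s ~= cast(dchar)'0';
    assert(s.toString() == "abc000");
    assert((s ~ 'x').toString() == "abc000x");
}
```

Routing `~` through `~=` means a real RCString would only need to get the transcoding right once.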
September 15, 2014
On Monday, 15 September 2014 at 02:26:19 UTC, Andrei Alexandrescu wrote:
> Walter, Brad, myself, and a couple of others have had a couple of quite exciting ideas regarding code that is configurable to use the GC or alternate resource management strategies. One thing that became obvious to us is we need to have a reference counted string in the standard library. That would be usable with applications that want to benefit from comfortable string manipulation whilst using classic reference counting for memory management. I'll go into more detail on the mechanisms that would allow the stdlib to provide functionality for both GC strings and RC strings; for now let's say that we hope and aim for swapping between these with ease. We hope that at one point people would be able to change one line of code, rebuild, and get either GC or RC automatically (for Phobos and their own code).
>
> The road there is long, but it starts with the proverbial first step. As it were, I have a rough draft of an almost-drop-in replacement of string (aka immutable(char)[]). Destroy with maximum prejudice:
>
> http://dpaste.dzfl.pl/817283c163f5
>
> For now RCString supports only immutable char as element type. That means you can't modify individual characters in an RCString object but you can take slices, append to it, etc. - just as you can with string. A compact reference counting scheme is complemented with a small buffer optimization, so performance should be fairly decent.
>
> Somewhat surprisingly, pure constructors and inout took good care of qualified semantics (you can convert a mutable to an immutable string and back safely). I'm not sure whether semantics there are a bit too lax, but at least for RCString they turned out to work beautifully and without too much fuss.
>
> The one wrinkle is that you need to wrap string literals "abc" with explicit constructor calls, e.g. RCString("abc"). This puts RCString on a lower footing than built-in strings and makes swapping configurations a tad more difficult.
>
> Currently I've customized RCString with the allocation policy, which I hurriedly reduced to just one function with the semantics of realloc. That will probably change in a future pass; the point for now is that allocation is somewhat modularized away from the string workings.
>
> So, please fire away. I'd appreciate it if you used RCString in lieu of string and note the differences. The closer we get to parity in semantics, the better.
>
>
> Thanks,
>
> Andrei

Why not open this up to all slices of immutable value type elements?
September 15, 2014
On Monday, 15 September 2014 at 02:26:19 UTC, Andrei Alexandrescu wrote:
> So, please fire away. I'd appreciate it if you used RCString in lieu of string and note the differences. The closer we get to parity in semantics, the better.
>
>
> Thanks,
>
> Andrei

***Blocker thoughts***
(unless I've misunderstood)

- Does not provide forward range iteration, as far as I can find. This makes it unusable with algorithms:
    find(myRCString, "hello"); // Nope
Also, adding "save" to make it a forward range might not be a good idea, since it would then also become an RA range (which it isn't).

- Does not provide any way to extract a raw array (even "unsafely"). This makes it difficult to interface with existing functions. A raw array would also be important for letting "RCString-aware" functions be properly optimized (e.g. using memchr for searching).

- No way to "GC-dup" the RCString. It should provide "dup"/"idup" members for when you really just need to revert to a plain GC string that is not reference counted.

Did I miss something? It seems actually *doing* something with an RCString is really difficult.
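A minimal sketch of the three missing pieces, on a hypothetical GC-backed `MiniRCString` (not reference counted, and not the draft's API): a separate `byCodePoint` forward-range view that has no `opIndex` or `length`, so Phobos does not classify it as random-access; a `@system` `raw` accessor for C functions such as `memchr`; and an `idup` member returning an ordinary GC string. All three member names are assumptions for illustration.

```d
import core.stdc.string : memchr;
import std.algorithm.searching : find;
import std.range.primitives : isForwardRange, isRandomAccessRange;
import std.utf : decode;

/// Hypothetical stand-in for RCString: GC-backed, NOT reference counted.
struct MiniRCString
{
    private string data;

    this(string s) { data = s; }

    /// Forward range that decodes UTF-8 into code points. It has no
    /// opIndex or length, so it cannot be mistaken for a random-access range.
    static struct ByCodePoint
    {
        private string rest;

        @property bool empty() const { return rest.length == 0; }

        @property dchar front()
        {
            size_t i = 0;
            return decode(rest, i);
        }

        void popFront()
        {
            size_t i = 0;
            decode(rest, i);             // advances i past one code point
            rest = rest[i .. $];
        }

        @property ByCodePoint save() { return this; }
    }

    ByCodePoint byCodePoint() { return ByCodePoint(data); }

    /// Raw slice for C-style functions. @system because, in a real
    /// reference-counted type, the slice must not outlive its owner.
    @system string raw() { return data; }

    /// Copy into an ordinary GC-allocated string detached from this object.
    string idup() const { return data.idup; }
}

static assert(isForwardRange!(MiniRCString.ByCodePoint));
static assert(!isRandomAccessRange!(MiniRCString.ByCodePoint));

unittest
{
    auto s = MiniRCString("hello world");
    auto r = find(s.byCodePoint, "world");
    assert(!r.empty && r.front == 'w');
    assert(memchr(s.raw.ptr, 'w', s.raw.length) !is null);
    assert(s.idup == "hello world");
}
```

Keeping the range a separate view, instead of adding `save` to the string type itself, is one way to sidestep the random-access concern raised in the first point.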


***Random implementation thought:***
"size_t maxSmall = 23" is (IMO) gratuitous: it can only lead to missed optimizations and binary bloat. We'd end up with incompatible RCString types, which is bad.

At the very least, I'd make it a parameter *after* the "realloc" function (as arguably, maxSmall depends on the allocation scheme, and not the other way around).

In particular, it seems RCBuffer does not depend on maxSmall, so it might be possible to move that out of RCXString.
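A sketch of the suggested parameter order, using hypothetical `RCBuffer`/`RCXString` layouts (the draft's actual definitions may differ; only the default of 23 comes from it). It shows both halves of the argument: different `maxSmall` values produce distinct, mutually incompatible types, while the buffer type depends only on the allocator and can therefore be shared.

```d
import core.stdc.stdlib : realloc;

// Hypothetical: the reference-counted buffer is parameterized on the
// allocator only.
struct RCBuffer(alias reallocFun)
{
    void[] payload;
    size_t* refCount;
}

// Hypothetical: the small-buffer size comes after the allocator and keeps
// its default, so RCXString!realloc means RCXString!(realloc, 23).
struct RCXString(alias reallocFun, size_t maxSmall = 23)
{
    union
    {
        RCBuffer!reallocFun large;
        char[maxSmall] small;
    }
}

// Different maxSmall values yield distinct types...
static assert(!is(RCXString!realloc == RCXString!(realloc, 15)));
// ...but both reuse the same RCBuffer instantiation.
static assert(is(typeof(RCXString!realloc.large) ==
                 typeof(RCXString!(realloc, 15).large)));
```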

***Extra thoughts***
There have been requests for non-auto-decoding strings. Maybe this would be a good opportunity for an "RCXUString"?
September 15, 2014
On Monday, 15 September 2014 at 02:26:19 UTC, Andrei Alexandrescu wrote:
>
> The road there is long, but it starts with the proverbial first step. As it were, I have a rough draft of an almost-drop-in replacement of string (aka immutable(char)[]). Destroy with maximum prejudice:
>
> http://dpaste.dzfl.pl/817283c163f5
>

I haven't found a single lock. Is single-threading by design, or is thread safety on your todo list?

Could you move this into Phobos and make it work with the functions in std.string? It would be a shame if they didn't work out of the box when this gets merged. I haven't seen anything that should prevent using the functions of std.string except isSomeString, but that should be no problem to fix. This is sort of personal to me, as most of my PRs are in std.string and I sort of aspire to become the LT for std.string ;-)

I would expect RCString to be faster than string; could you provide a benchmark comparing the two?
September 15, 2014
On Monday, 15 September 2014 at 09:53:28 UTC, Robert burner Schadek wrote:
> On Monday, 15 September 2014 at 02:26:19 UTC, Andrei Alexandrescu wrote:
>>
>> The road there is long, but it starts with the proverbial first step. As it were, I have a rough draft of an almost-drop-in replacement of string (aka immutable(char)[]). Destroy with maximum prejudice:
>>
>> http://dpaste.dzfl.pl/817283c163f5
>>
>
> I haven't found a single lock. Is single-threading by design, or is thread safety on your todo list?

There's no use of `shared`, so all data involved is TLS.
September 15, 2014
Andrei Alexandrescu:

> Walter, Brad, myself, and a couple of others have had a couple of quite exciting ideas regarding code that is configurable to use the GC or alternate resource management strategies.

An alternative design is to follow the Java way: leave D strings as they are and avoid making a mess of user D code. The Java GC and runtime contain numerous optimizations for managing strings, like the recently introduced run-time string deduplication:

https://blog.codecentric.de/en/2014/08/string-deduplication-new-feature-java-8-update-20-2

Bye,
bearophile
September 15, 2014
On Monday, 15 September 2014 at 10:13:28 UTC, Jakob Ovrum wrote:
> On Monday, 15 September 2014 at 09:53:28 UTC, Robert burner Schadek wrote:
>>
>> I haven't found a single lock. Is single-threading by design, or is thread safety on your todo list?
>
> There's no use of `shared`, so all data involved is TLS.

Then it must be made sure that send and receive work properly.
September 15, 2014
On Monday, 15 September 2014 at 11:53:15 UTC, Robert burner Schadek wrote:
> On Monday, 15 September 2014 at 10:13:28 UTC, Jakob Ovrum wrote:
>> On Monday, 15 September 2014 at 09:53:28 UTC, Robert burner Schadek wrote:
>>>
>>> I haven't found a single lock. Is single-threading by design, or is thread safety on your todo list?
>>
>> There's no use of `shared`, so all data involved is TLS.
>
> Then it must be made sure that send and receive work properly.

They do. They only accept shared or immutable arguments (or
arguments with no mutable indirection).
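A minimal sketch with the std.concurrency API (`spawn`, `send`, `receiveOnly`, `ownerTid`): an immutable `string` crosses threads fine, while the commented-out `char[]` send is the kind of call that is rejected at compile time, with the same static assert message quoted in the reply below. The `worker` function is illustrative only.

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn;

// Worker thread: receives an immutable string and replies with its length.
void worker()
{
    auto s = receiveOnly!string();
    ownerTid.send(s.length);
}

unittest
{
    auto tid = spawn(&worker);
    tid.send("hello");              // immutable(char)[]: accepted
    // tid.send("hello".dup);       // char[]: rejected at compile time with
    //                              // "Aliases to mutable thread-local data not allowed."
    assert(receiveOnly!size_t() == 5);
}
```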
September 15, 2014
On Monday, 15 September 2014 at 12:11:14 UTC, Jakob Ovrum wrote:
>>> There's no use of `shared`, so all data involved is TLS.
>>
>> Then it must be made sure that send and receive work properly.
>
> They do. They only accept shared or immutable arguments (or
> arguments with no mutable indirection).

compiler says no: concurrency.d(554): Error: static assert  "Aliases to mutable thread-local data not allowed."

I used the std.concurrency example.
September 15, 2014
On Monday, 15 September 2014 at 12:47:08 UTC, Robert burner Schadek wrote:
> On Monday, 15 September 2014 at 12:11:14 UTC, Jakob Ovrum wrote:
>>>> There's no use of `shared`, so all data involved is TLS.
>>>
>>> Then it must be made sure that send and receive work properly.
>>
>> They do. They only accept shared or immutable arguments (or
>> arguments with no mutable indirection).
>
> compiler says no: concurrency.d(554): Error: static assert  "Aliases to mutable thread-local data not allowed."
>
> I used the std.concurrency example

Yes, that was my point. std.concurrency handles it correctly - there's no unsafe memory sharing going on with RCString's implementation.

If you are suggesting we somehow make this work so it can be a drop-in replacement for `string`:

I don't think that should be implicitly supported.

One method would be to support shared(RCString). This isn't very practical for this use case: since atomic reference counting is super slow, you wouldn't want to be using shared(RCString) throughout your program. So you'd have to make a copy on each side (unshared -> shared, then send, then shared -> unshared), which is one copy more than necessary and would still require support for shared(RCString), which is non-trivial.

Another option would be to hardcode support for RCString in std.concurrency. This would hide the copy, which goes against good practice concerning arrays in D, and would not be very useful for @nogc code if the copy has to be a GC copy. Additionally, RCString's interface would need to be compromised to allow constructing from an existing buffer somehow.

Maybe the right solution involves integration with std.typecons.Unique. Passing an instance of Unique!T to another thread is something std.concurrency should support, and RCString could be given a method that returns Unique!RCString if the reference count is 1 and errors otherwise. Unique's current implementation would have to be overhauled to carry its payload in-situ instead of on the GC heap like it currently does, but that's something we should do regardless.
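To make the "only if the reference count is 1" rule concrete, here is a sketch with a hypothetical malloc-backed `RCBox` holding an `int`. It is not RCString, and it deliberately avoids `std.typecons.Unique`, whose layout is exactly what the post proposes to change. `takeUnique` (a hypothetical name) moves the payload out only when no other copy exists and throws otherwise; actually handing the result to std.concurrency would still need the Unique support described above.

```d
import core.stdc.stdlib : free, malloc;
import std.algorithm.mutation : move;
import std.exception : enforce;

/// Hypothetical reference-counted box, used only to illustrate the rule.
struct RCBox
{
    private static struct Payload { size_t count; int value; }
    private Payload* p;

    this(int value)
    {
        p = cast(Payload*) malloc(Payload.sizeof);
        enforce(p !is null, "allocation failed");
        *p = Payload(1, value);
    }

    this(this) { if (p) ++p.count; }        // copying adds a reference

    ~this()
    {
        if (p && --p.count == 0) free(p);    // last reference frees the payload
    }

    @property bool isUnique() const { return p !is null && p.count == 1; }

    @property int value() const { return p.value; }

    /// Moves the payload out (leaving this object empty) if this is the
    /// only reference; throws otherwise.
    RCBox takeUnique()
    {
        enforce(isUnique, "reference count is not 1");
        return move(this);
    }
}

unittest
{
    auto a = RCBox(42);
    assert(a.isUnique);
    {
        auto b = a;                          // postblit: count becomes 2
        assert(!a.isUnique);
    }                                        // b destroyed: count back to 1
    auto owned = a.takeUnique();
    assert(owned.isUnique && owned.value == 42);
    assert(!a.isUnique);                     // a was reset by move
}
```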