September 15, 2014
On Monday, 15 September 2014 at 09:50:30 UTC, monarch_dodra wrote:
> On Monday, 15 September 2014 at 02:26:19 UTC, Andrei Alexandrescu wrote:
>> So, please fire away. I'd appreciate it if you used RCString in lieu of string and note the differences. The closer we get to parity in semantics, the better.
>>
>>
>> Thanks,
>>
>> Andrei
>
> ***Blocker thoughts***
> (unless I've misunderstood)
>
> - Does not provide Forward range iteration that I can find. This makes it unusable for algorithms:
>     find (myRCString, "hello"); //Nope
> Also, adding "save" to make it forward might not be a good idea, since it would also mean it becomes an RA range (which it isn't).

No, RA is not implied by forward.
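To illustrate with a minimal sketch: the `Countdown` type below is a made-up example (it is not from RCString), and it only assumes the standard traits in std.range. It has `save`, so it is a forward range, yet it is neither bidirectional nor random access.

```d
import std.range;

// Hypothetical minimal range, for illustration only.
struct Countdown
{
    int n;
    @property bool empty() const { return n == 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
    // `save` is what turns an input range into a forward range.
    @property Countdown save() { return this; }
}

static assert(isForwardRange!Countdown);
static assert(!isBidirectionalRange!Countdown); // no back/popBack
static assert(!isRandomAccessRange!Countdown);  // no opIndex either

void main()
{
    auto r = Countdown(3);
    auto copy = r.save;
    r.popFront();
    // The saved copy is unaffected by advancing the original.
    assert(copy.front == 3);
    assert(r.front == 2);
}
```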

>
> - Does not provide any way to (even "unsafely") extract a raw array. Makes it difficult to interface with existing functions. It would also be important for "RCString aware" functions to be properly optimized (eg memchr for searching etc...)

Another perfect use case for borrowing...

> ***Extra thoughts***
> There have been requests for non auto-decoding strings. Maybe this would be a good opportunity for "RCXUString" ?

Yes. I'm surprised by this proposal, because I thought Walter was totally opposed to a dedicated string type. If it now becomes acceptable, it's a good opportunity for moving away from auto-decoding.
September 15, 2014
On Monday, 15 September 2014 at 13:13:34 UTC, Jakob Ovrum wrote:
>
> If you are suggesting we somehow make this work so it can be a drop-in replacement for `string`:

Yes, you must be able to get an RCString from one thread to the next.

>
> I don't think that should be implicitly supported.
>

Well, it should be at least supported in phobos.

How is another matter.

>
> Maybe the right solution involves integration with std.typecons.Unique. Passing an instance of Unique!T to another thread is something std.concurrency should support, and RCString could be given a method that returns Unique!RCString if the reference count is 1 and errors otherwise. Unique's current implementation would have to be overhauled to carry its payload in-situ instead of on the GC heap like it currently does, but that's something we should do regardless.

Sounds good.
September 15, 2014
On Monday, 15 September 2014 at 13:15:28 UTC, Marc Schütz wrote:
>> - Does not provide Forward range iteration that I can find. This makes it unusable for algorithms:
>>    find (myRCString, "hello"); //Nope
>> Also, adding "save" to make it forward might not be a good idea, since it would also mean it becomes an RA range (which it isn't).
>
> No, RA is not implied by forward.

Right, but RCString already has the RA primitives (and hasLength), it's only missing ForwardRange traits to *also* become RandomAccess.
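A rough sketch of what I mean, using a made-up `Chars` struct as a stand-in (this is not the RCString code from the paste, just a type that I assume has a similar set of primitives). With everything except `save` it is not even a forward range; add `save` and the std.range traits classify it as random access.

```d
import std.range;

// Hypothetical stand-in with RCString-like primitives; `withSave` toggles `save`.
struct Chars(bool withSave)
{
    immutable(char)[] data;
    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; }
    @property char back() const { return data[$ - 1]; }
    void popFront() { data = data[1 .. $]; }
    void popBack() { data = data[0 .. $ - 1]; }
    @property size_t length() const { return data.length; }
    char opIndex(size_t i) const { return data[i]; }
    static if (withSave)
        @property Chars save() { return this; }
}

// Without `save`: has length and indexing, but is only an input range.
static assert(hasLength!(Chars!false));
static assert(!isForwardRange!(Chars!false));
static assert(!isRandomAccessRange!(Chars!false));

// With `save`: the same primitives now add up to a random-access range.
static assert(isRandomAccessRange!(Chars!true));

void main()
{
    auto r = Chars!true("hello");
    assert(r[1] == 'e');
    assert(r.save.length == 5);
}
```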
September 15, 2014
On 9/15/14, 1:51 AM, John Colvin wrote:
> Why not open this up to all slices of immutable value type elements?

That will come in good time. For now I didn't want to worry about indirections, constructors, etc. -- Andrei
September 15, 2014
On 9/15/14, 2:50 AM, monarch_dodra wrote:
> - Does not provide Forward range iteration that I can find. This makes
> it unusable for algorithms:
>      find (myRCString, "hello"); //Nope
> Also, adding "save" to make it forward might not be a good idea, since
> it would also mean it becomes an RA range (which it isn't).

If we move forward with this type, traits will recognize it as isSomeString.

> - Does not provide any way to (even "unsafely") extract a raw array.
> Makes it difficult to interface with existing functions. It would also
> be important for "RCString aware" functions to be properly optimized (eg
> memchr for searching etc...)

I think a @system unsafeSlice() property would be needed indeed.
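Something along these lines, as a sketch only: `Buf`, its `payload` field and `indexOfByte` are hypothetical names, and the real type would hand out a slice of its reference-counted buffer rather than a GC string. The point is that the raw slice is `@system`, and an "RCString aware" function wraps the unsafe part in `@trusted` code.

```d
import core.stdc.string : memchr;

// Hypothetical stand-in for RCString; only the unsafeSlice idea matters here.
struct Buf
{
    private immutable(char)[] payload;

    // @system: the caller must not let the slice outlive the owner.
    @property immutable(char)[] unsafeSlice() @system { return payload; }
}

// An "aware" search that drops down to memchr on the raw bytes.
ptrdiff_t indexOfByte(ref Buf b, char c) @trusted
{
    auto s = b.unsafeSlice;
    const(char)* base = s.ptr;
    auto p = cast(const(char)*) memchr(base, c, s.length);
    return p is null ? -1 : p - base;
}

void main()
{
    auto b = Buf("hello");
    assert(indexOfByte(b, 'l') == 2);
    assert(indexOfByte(b, 'z') == -1);
}
```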

> - No way to "GC-dup" the RCString. Consider adding "dup"/"idup" members to
> RCString, for when you really just need to revert to pure un-collected GC.

Nice. But then I'm thinking, wouldn't people think .dup produces another RCString?

> Did I miss something? It seems actually *doing* something with an
> RCString is really difficult.

Yah it's too tightly wound right now, but that's the right way!

> ***Random implementation thought:***
> "size_t maxSmall = 23" is (IMO) gratuitous: It can only lead to
> non-optimization and binary bloat. We'd end up having incompatible
> RCStrings, which is bad.
>
> At the very least, I'd say make it a parameter *after* the "realloc"
> function (as arguably, maxSmall  depends on the allocation scheme, and
> not the other way around).

I think realloc will disappear.

> In particular, it seems RCBuffer does not depend on maxSmall, so it
> might be possible to move that out of RCXString.
>
> ***Extra thoughts***
> There have been requests for non auto-decoding strings. Maybe this would
> be a good opportunity for "RCXUString" ?

For now I was aiming at copying string's semantics.


Andrei
September 15, 2014
On 9/15/14, 2:53 AM, Robert burner Schadek wrote:
> On Monday, 15 September 2014 at 02:26:19 UTC, Andrei Alexandrescu wrote:
>>
>> The road there is long, but it starts with the proverbial first step.
>> As it were, I have a rough draft of an almost-drop-in replacement of
>> string (aka immutable(char)[]). Destroy with maximum prejudice:
>>
>> http://dpaste.dzfl.pl/817283c163f5
>>
>
> I haven't found a single lock, is single threading by design or is
> thread-safety on your todo?

Currently shared strings are not addressed.

> Could you transfer this into phobos and make it work with the functions
> in std.string, it would be a shame if they wouldn't work out of the box
> when this gets merged. I haven't seen anything that should prevent using
> the functions of std.string except isSomeString but that should be no
> problem to fix.

Good idea.

> This is sort of personal to me as most of my PRs are in
> std.string and I sort of aspire to become the LT for std.string ;-)

Oooh, nice!

> I would assume RCString should be faster than string, so could you
> provide a benchmark of the two.

Good idea. It likely won't be faster for the most part (unless it uses realloc and realloc is a lot faster than GC.realloc). Designs based on RCString will, however, have a tighter memory footprint.


Andrei

September 15, 2014
On 9/15/14, 3:30 AM, bearophile wrote:
> Andrei Alexandrescu:
>
>> Walter, Brad, myself, and a couple of others have had a couple of
>> quite exciting ideas regarding code that is configurable to use the GC
>> or alternate resource management strategies.
>
> An alternative design solution is to follow the Java way, leave the D
> strings as they are, and avoid making a mess of user D code. Java GC
> and runtime contain numerous optimizations for the management of
> strings, like the recently introduced string de-duplication at run-time:
>
> https://blog.codecentric.de/en/2014/08/string-deduplication-new-feature-java-8-update-20-2

Again, it's become obvious that a category of users will simply refuse to use a GC, either for the right or the wrong reasons. We must make D eminently usable for them.

Andrei


September 15, 2014
On 9/15/14, 6:13 AM, Jakob Ovrum wrote:
> One method would be to support shared(RCString). This isn't very
> practical for this use-case, since atomic reference counting is super
> slow, you wouldn't want to be using shared(RCString) throughout your
> program. So you'd have to make a copy on each side (unshared -> shared,
> then send, then shared -> unshared) which is one copy more than
> necessary and would still require support for shared(RCString) which is
> non-trivial.

I think shared(RCString) should be supported. Unique!T is, of course, also worth exploring. -- Andrei
September 15, 2014
Andrei Alexandrescu:

> Again, it's become obvious that a category of users will simply refuse to use a GC, either for the right or the wrong reasons. We must make D eminently usable for them.

Is adding reference counted strings to D going to add a significant amount of complexity for the programmers?

As usual your judgement is better than mine, but surely the increase in complexity of D language and its usage must be considered in this rcstring discussion. So far I have not seen this point discussed enough in this thread.

D is currently quite complex, so I prefer enhancements that simplify the code (like tuples), or that make it safer (this mostly means type system improvements, like provably correct tracking of memory areas and lifetimes, or stricter types for array indexes, or better means to detect errors at compile time with more compile-time introspection for function/ctor arguments), or features that have a limited scope and don't increase the general code complexity much (like the partial type inference patch created by Kenji).

Bye,
bearophile
September 15, 2014
On Monday, 15 September 2014 at 14:44:53 UTC, Andrei Alexandrescu wrote:
> For now I was aiming at copying string's semantics.

Then the range primitives should move to std.range, or wherever string's range primitives live now. By default a string iterates over its array elements, which are chars in this case.
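A small sketch of the difference, using only the built-in `string` and the free-function primitives from std.range: built-in indexing and `foreach` see code units (`char`), while `front` and `walkLength` decode to code points (`dchar`).

```d
import std.range;

void main()
{
    string s = "\u00E9"; // one code point, encoded as two UTF-8 code units

    // Built-in array semantics: elements are char.
    static assert(is(typeof(s[0]) == immutable(char)));
    size_t units;
    foreach (c; s) ++units;
    assert(units == 2);
    assert(s.length == 2);

    // Range primitives from the library: auto-decoding to dchar.
    static assert(is(typeof(s.front) == dchar));
    assert(s.walkLength == 1);
}
```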