May 28, 2016
On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu wrote:
> I've been working on RCStr (endearingly pronounced "Our Sister"), D's up-and-coming reference counted string type. The goals are:

<Slightly off-topic>

RCStr may be an easier first step, but I think generic dynamic arrays are more interesting, because they are more generally applicable, and user types like move-only resources make them a more challenging problem to solve.

BTW, what happened to scope? Generally speaking, I'm not a fan of Rust, and I know that you think that D needs to differentiate, but I like their borrowing model for several reasons:
a) while not 100% safe and quite verbose, it offers enough improvements over @safe D to make it a worthwhile upgrade, if you don't care about any other language features
b) it's not that hard to grasp / almost natural for people familiar with C++11's copy (shared_ptr) and move (unique_ptr) semantics.
c) it's general enough that it can be applied to areas like iterator invalidation, thread synchronization and other logic bugs, as some third-party Rust packages demonstrate.

I think that improving escape analysis with the scope attribute can go a long way toward narrowing the gap between Rust and D in that area.

The other elephant(s) in the room are nested contexts like delegates, nested structs and some alias template parameter arguments. These are especially bad because the user has zero control over those GC allocations, which makes some of D's key features unusable in @nogc contexts.
<End off-topic>

>
> * Reference counted, shouldn't leak if all instances destroyed; even if not, use the GC as a last-resort reclamation mechanism.
>
> * Entirely @safe.
>
> * Support UTF 100% by means of RCStr!char, RCStr!wchar etc. but also raw manipulation and custom encodings via RCStr!ubyte, RCStr!ushort etc.
>
> * Support several views of the same string, e.g. given s of type RCStr!char, it can be iterated byte-wise, code point-wise, code unit-wise etc. by using s.by!ubyte, s.by!char, s.by!dchar etc.
>
> * Support const and immutable qualifiers for the character type.
>
> * Work well with const and immutable when they qualify the entire RCStr type.
>
> * Fast: use the small string optimization and various other layout and algorithms to make it a good choice for high performance strings
>
> RFC: what primitives should RCStr have?
>
>
> Thanks,
>
> Andrei

0) (Prerequisite) Composition/interaction with language features/user types - RCStr in nested contexts (alias template parameters, delegates, nested structs/classes), array of RCStr-s, RCStr as a struct/class member, RCStr passed as (const) ref parameter, etc. should correctly increase/decrease ref count. This is also a prerequisite for safe RefCounted!T.
Action item: related compiler bugs should be prioritized. E.g. the RAII bug from
Shachar Shemesh's lightning talk - http://forum.dlang.org/post/n8algm$qra$1@digitalmars.com.
See also:
https://issues.dlang.org/buglist.cgi?quicksearch=raii&list_id=208631
https://issues.dlang.org/buglist.cgi?quicksearch=destructor&list_id=208632
(not everything in those lists is related but there are some nasty ones, like bad RVO codegen).

1) Safe slicing

2) shared overloads of member functions (e.g. for stuff like atomic incRef/decRef)

3) Concatenation (RCStr ~= RCStr ~ RCStr ~ char)

4) (Optional) Reserving (pre-allocating capacity) / shrinking. I labeled this feature request as optional, as it's not clear if RCStr is more like a container, or more like a slice/range.

5) Some sort of optimization for zero-terminated strings. Quite often one needs to interact with C APIs, which requires calling toStringz / toUTFz, which causes unnecessary allocations. It would be great if RCStr could efficiently handle this scenario.

6) !!! Not really a primitive, but we need to make sure that applying a chain of range transformations won't break ownership (e.g. leak or free prematurely).

7) Should be able to replace GC usage in transient ranges like e.g. File.byLine

8) Cheap initialization/assignment from string literals - should be roughly the same as either initializing a static character array (if the small string optimization is used) or just making it point to read-only memory in the data segment of the executable. It shouldn't try to write or free such memory. When initialized from a string literal, RCStr should also offer a null-terminating byte, provided that it points to the whole
If one wants to assign a string literal by overwriting parts of the already allocated storage, std.algorithm.mutation.copy should be used instead.

There may be other important primitives which I haven't thought of, but generally we should try to leverage std.algorithm, std.range, std.string and std.uni for them, via UFCS.
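
To illustrate item 5, here is a minimal sketch of the status quo. The helper names cLength and literalLength are made up for this example; only strlen and toStringz are existing library functions.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: passes an arbitrary D string to a C function.
// toStringz may have to allocate a zero-terminated copy on the GC heap,
// because a D slice carries no terminator of its own.
size_t cLength(string s)
{
    return strlen(s.toStringz);
}

// String literals are already zero-terminated and implicitly convert to
// const(char)*, so no copy is needed here.
size_t literalLength()
{
    return strlen("hello");
}
```

An RCStr that keeps a terminator after its payload could hand out a C pointer in the first case as cheaply as in the second; how it would do that is left open here.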

----------

On a related note, I know that you want to use AffixAllocator for reference counting, and I think it's a great idea. I have one question, which wasn't answered during that discussion:

// Use a nightly build to compile
import core.thread : Thread, thread_joinAll;
import std.range : iota;
import std.experimental.allocator : makeArray;
import std.experimental.allocator.building_blocks.region : InSituRegion;
import std.experimental.allocator.building_blocks.affix_allocator : AffixAllocator;

AffixAllocator!(InSituRegion!(4096) , uint) tlsAllocator;

static assert (tlsAllocator.sizeof >= 4096);

import std.stdio;
void main()
{
    shared(int)[] myArray;

    foreach (i; 0 .. 100)
    {
        new Thread(
        {
            if (i != 0) return;

            myArray = tlsAllocator.makeArray!(shared int)(100.iota);
            static assert(is(typeof(&tlsAllocator.prefix(myArray)) == shared(uint)*));
            writefln("At %x: %s", myArray.ptr, myArray);

        }).start();

        thread_joinAll();
    }

    writeln(myArray); // prints garbage!!!
}

So my question is: should it be possible to share thread-local data like this?
IMO, the current allocator design opens a serious hole in the type system, because it allows using data allocated from another thread's thread-local storage. After the other thread exits, accessing memory allocated from its TLS should not be possible, but https://github.com/dlang/phobos/pull/3991 clearly allows that.

One should be able to allocate shared memory only from shared allocators. And shared allocators must be backed by shared parent allocators or shared underlying storage. In this case the Region allocator should be shared, and must be backed by shared memory, Mallocator, or something in that vein.
May 28, 2016
On Saturday, 28 May 2016 at 09:43:41 UTC, ZombineDev wrote:
> [snip: full quote of the previous post]

Here's another case where the last change to AffixAllocator is really dangerous:
// Same imports and tlsAllocator declaration as in the previous example.
void main()
{
    immutable(int)[] myArray;

    foreach (i; 0 .. 100)
    {
        new Thread(
        {
            if (i != 0) return;

            myArray = tlsAllocator.makeArray!(immutable int)(100.iota);
            writeln(myArray); // prints [0, ..., 99]

        }).start();

        thread_joinAll();
    }

    writeln(myArray); // prints garbage
}

In this case it severely violates the promise of immutable.
May 28, 2016
On Friday, 27 May 2016 at 13:32:30 UTC, Andrei Alexandrescu wrote:
> On 5/27/16 7:07 AM, Marc Schütz wrote:
>> On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu wrote:
>>> RFC: what primitives should RCStr have?
>>
>> It should _safely_ convert to `const(char)[]`.
>
> That is not possible, sorry. -- Andrei

It is possible once DIP25 [1] is finally fully implemented (by that I mean including slices, pointers, etc.; Walter told me at DConf that this is going to happen) and the problem with aliasing references is solved (which needs to happen anyway for any reference counting to be safe).

[1] https://wiki.dlang.org/DIP25
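
For reference, a minimal sketch of what DIP25 covers for plain ref. The names identity and Wrapper are made up for this example, and the extension to slices and pointers that the post above relies on is not shown.

```d
@safe:

// `return ref` (DIP25): the result may refer to `x`, so the compiler ties
// the lifetime of the returned reference to that of the argument.
ref int identity(return ref int x)
{
    return x;
}

struct Wrapper
{
    private int payload;

    // `return` on a member function applies to the implicit `this`.
    ref int get() return
    {
        return payload;
    }
}
```

A safe conversion to const(char)[] would need the same kind of lifetime tie between the returned slice and the RCStr it came from.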
May 28, 2016
On Saturday, 28 May 2016 at 04:28:16 UTC, Manu wrote:
> On 27 May 2016 at 23:32, Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> On 5/27/16 7:07 AM, Marc Schütz wrote:
>>> It should _safely_ convert to `const(char)[]`.
>>
>>
>> That is not possible, sorry. -- Andrei
>
> It should safely convert to 'scope const(char)[]', then we only need a fat slice or the like at the very top of the callstack...

I didn't want to mention the s-word ;-)
May 28, 2016
On Saturday, 28 May 2016 at 04:15:45 UTC, Manu wrote:
> This is only true for the owner. If we had 'scope', or something like
> it (ie, borrowing in rust lingo), then the fat slice wouldn't need to
> be passed around

Right, I agree - if we keep the slice just the way it is now, it all still works if you borrow correctly!

(BTW, I don't think we even need this to be strictly @safe, though it would be nice if it was tested. We could say @system getSlice for now and potentially change it to @safe later.)
May 29, 2016
On 05/27/2016 01:17 AM, Seb wrote:
> Oh yes that's what I meant. Sorry for being so confusing.
> __Right__ is way more important than breakages. For that we have `dfix`.

Don't get overly excited. dfix will never be capable of automatic fixups that require such deep levels of semantic analysis; this can only be done by the compiler itself (which is currently not designed for fixup tasks).
May 30, 2016
On Sat, 28 May 2016 14:15:45 +1000,
Manu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> On 28 May 2016 at 10:16, Adam D. Ruppe via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> > On Friday, 27 May 2016 at 21:51:59 UTC, Seb wrote:
> >>
> >> not if [] would be ref-counted too ;-)
> >
> >
> > That would be kinda horrible. Right now, slicing is virtually free and compatible with all kinds of backing schemes. If it became refcounted, it'd:
> >
> > 1) have to keep a pointer to the refcount structure with the slice, adding memory cost
> 
> This is only true for the owner. If we had 'scope', or something like
> it (ie, borrowing in rust lingo), then the fat slice wouldn't need to
> be passed around, it's only a burden on the top-level owner.
> 'scope' is consistently rejected, but it solves so many long-standing
> problems we have, and this reduction of 'fat'(/rc)-slices to normal
> slices is a particularly important one.

I second that thought. But I'd be OK with an unsafe slice and making sure myself that I don't keep a reference around. A lot of functions only borrow data and can work on a naked pointer/ref/slice, while the owner(s) have the smart pointer. These can of course be converted to templates taking either char[] or RCStr, but I think borrowing is cleaner when the function in question doesn't care a bag of beans whether the chars it works on were allocated on the GC heap or reference counted.
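
A minimal sketch of such a borrowing function, with made-up names (countSpaces, fromGC, fromMalloc) and a malloc'ed buffer standing in for a reference-counted payload:

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical borrowing function: it only reads the characters and does
// not keep a reference, so it does not care who owns the memory.
size_t countSpaces(scope const(char)[] text) @nogc nothrow pure @safe
{
    size_t n;
    foreach (c; text)
        if (c == ' ')
            ++n;
    return n;
}

// Works on an ordinary D string (here a literal) ...
size_t fromGC()
{
    string s = "a b c";
    return countSpaces(s);
}

// ... and on manually managed memory, standing in for an RCStr payload.
size_t fromMalloc() @nogc nothrow
{
    auto p = cast(char*) malloc(3);
    if (p is null) return 0;
    scope (exit) free(p);
    p[0 .. 3] = "x y";
    return countSpaces(p[0 .. 3]);
}
```

No template is involved: the same compiled function serves both callers, and the owner keeps the smart pointer.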

-- 
Marco

May 31, 2016
On Friday, 27 May 2016 at 21:25:50 UTC, Andrei Alexandrescu wrote:
> On 05/27/2016 05:02 PM, Era Scarecrow wrote:
>>   With the current state of things, I'll just take your word on it.
>
> Reasoning is simple - yes we could safely convert to const(char)[] but that means effectively all refcounting is lost for that string. So we can convert but in an explicit manner, e.g. str.toGCThisWillCompletelySuckMan. -- Andrei

We could have:

const(char)[] s = rcstr.stealSlice;

This returns null* if the refcount is > 1; on success, rcstr would then be empty. In fact, if with the RC DIP we can guarantee that the memory doesn't escape, stealSlice could return string.

*Or better, return an Option.
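
A rough sketch of the idea, assuming a heavily simplified stand-in type (RCStrSketch is hypothetical, uses a non-atomic count and a GC-allocated payload, and says nothing about how the real RCStr would be laid out). Nullable from std.typecons plays the role of the Option:

```d
import std.typecons : Nullable;

// Hypothetical, heavily simplified stand-in for RCStr: the payload lives on
// the GC heap and the count is a plain (non-atomic) heap-allocated integer.
struct RCStrSketch
{
    private char[] payload;
    private size_t* count;

    this(const(char)[] s)
    {
        payload = s.dup;
        count = new size_t;
        *count = 1;
    }

    this(this) // copying adds a reference
    {
        if (count) ++*count;
    }

    ~this() // destroying an instance drops a reference
    {
        if (count) --*count;
    }

    // Hands the buffer over only if this is the sole reference;
    // otherwise returns an empty Nullable and leaves the string untouched.
    Nullable!(const(char)[]) stealSlice()
    {
        Nullable!(const(char)[]) result;
        if (count is null || *count != 1)
            return result;
        result = payload;
        payload = null;
        count = null;
        return result;
    }
}
```

After a successful steal the instance is empty, so a second call yields an empty Nullable as well.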
June 01, 2016
On 31 May 2016 at 01:00, Marco Leise via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Sat, 28 May 2016 14:15:45 +1000,
> Manu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
>> On 28 May 2016 at 10:16, Adam D. Ruppe via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> > On Friday, 27 May 2016 at 21:51:59 UTC, Seb wrote:
>> >>
>> >> not if [] would be ref-counted too ;-)
>> >
>> >
>> > That would be kinda horrible. Right now, slicing is virtually free and compatible with all kinds of backing schemes. If it became refcounted, it'd:
>> >
>> > 1) have to keep a pointer to the refcount structure with the slice, adding memory cost
>>
>> This is only true for the owner. If we had 'scope', or something like
>> it (ie, borrowing in rust lingo), then the fat slice wouldn't need to
>> be passed around, it's only a burden on the top-level owner.
>> 'scope' is consistently rejected, but it solves so many long-standing
>> problems we have, and this reduction of 'fat'(/rc)-slices to normal
>> slices is a particularly important one.
>
> I second that thought. But I'd be OK with an unsafe slice and making sure myself that I don't keep a reference around. A lot of functions only borrow data and can work on a naked pointer/ref/slice, while the owner(s) have the smart pointer. These can of course be converted to templates taking either char[] or RCStr, but I think borrowing is cleaner when the function in question doesn't care a bag of beans whether the chars it works on were allocated on the GC heap or reference counted.

D loves templates, but templates aren't a given. Closed-source projects often can't have templates in the public API (i.e., the source should not be available), and this is my world.
May 31, 2016
On Wed, 1 Jun 2016 01:06:36 +1000,
Manu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> D loves templates, but templates aren't a given. Closed-source projects often can't have templates in the public API (ie, source should not be available), and this is my world.

Same effect for GPL code. Funny. (Template instantiations are like statically linking in the open source code.)

-- 
Marco
