February 04, 2014
On Tuesday, 4 February 2014 at 21:02:37 UTC, deadalnix wrote:
> Core can share a cache line in read mode, but can't in write mode.

Why not? Level 3 is shared, but it has a latency of 30-40 cycles or so.

> That mean that updating the reference count will cause contention on the cache line (core will have to fight for the cache line ownership).

If they access it simultaneously and it stays in the local caches.
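The contention in question can be sketched in D (a toy shape, not a benchmark of any particular CPU): every atomic bump of a shared count forces the writing core to take the cache line in exclusive mode, bouncing it between cores.

```d
import core.atomic;
import core.thread;

shared int refCount;   // one counter, one cache line, fought over by all cores

void bump() {
    foreach (i; 0 .. 1_000_000)
        atomicOp!"+="(refCount, 1);   // each write needs exclusive line ownership
}

void main() {
    auto a = new Thread(&bump);
    auto b = new Thread(&bump);
    a.start(); b.start();
    a.join(); b.join();
    assert(atomicLoad(refCount) == 2_000_000);
}
```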

> That is why immutability + GC is so efficient in a highly concurrent system, and ref counting would ruin that.

If you blindly use ARC rather than sane RC. There is no reason to up the ref count if the data structure is "owned" while processing it. Which is a good reason to avoid ARC and use regular RC. If you know that the entire graph has a 1+ count, you don't need to do any ref counting while processing it.
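A sketch of that "owned while processing" idea (the `Node` type is made up for illustration): as long as the caller guarantees the graph stays alive for the duration of the call, a borrowed pointer can traverse it with zero retain/release traffic.

```d
struct Node { int value; Node* next; }

// Borrowed traversal: the caller holds the single 1+ count for the whole
// list, so no per-node ref-count update is needed inside the loop.
int sum(const(Node)* head) {
    int total;
    for (auto n = head; n !is null; n = n.next)
        total += n.value;
    return total;
}
```

An ARC compiler that can't prove this ownership would insert a retain/release pair per step; regular RC just never writes that code in the first place.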
February 04, 2014
On Tuesday, 4 February 2014 at 20:53:50 UTC, Nordlöw wrote:
>>
>> std.typecons.RefCounted!T
>>
>
> Does this work equally well with T being both a value objects (struct) and reference objects (class)?

RefCounted doesn't support classes yet but that's just because nobody has taken the time to implement it. There also needs to be a WeakRef added in order to round out the smart pointer support.

Once that's done Phobos will at least have parity with C++ (it already has Unique).
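For the struct case, a minimal usage sketch of what `RefCounted` already supports today (the `Payload` type is made up for illustration):

```d
import std.typecons : RefCounted;

struct Payload { int[] data; }

void main() {
    auto a = RefCounted!Payload([1, 2, 3]);
    auto b = a;          // copying bumps the count; both alias one payload
    b.data ~= 4;         // alias this forwards member access to the payload
    assert(a.data == [1, 2, 3, 4]);
}   // count drops to zero here; payload destroyed deterministically
```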
February 04, 2014
On Tue, 04 Feb 2014 16:37:20 -0500, Ola Fosheim Grøstad <ola.fosheim.grostad+dlang@gmail.com> wrote:

> If you blindly use ARC rather than sane RC. There is no reason to up the ref count if the data structure is "owned" while processing it. Which is a good reason to avoid ARC and use regular RC. If you know that the entire graph has a 1+ count, you don't need to do any ref counting while processing it.

This is an important point. However, the default is to be safe, and good enough for most. As long as there is a way to go into "I know what I'm doing" mode for encapsulated low-level code, the default should be to do ARC. ARC should handle simple cases, like if the reference only ever exists as a stack local.

One could also say you could mark a memory region as no-scan if you know that the data it points at will never be collected for its entire existence in a GC environment.

I think some mix of GC and ref counting for D would be extremely useful.

In my major Objective C project, I use manual reference counting, and I do avoid extra retains for redundant pointers. I've also had a few bugs because of that :)

-Steve
February 04, 2014
Am 04.02.2014 23:17, schrieb Steven Schveighoffer:
> On Tue, 04 Feb 2014 16:37:20 -0500, Ola Fosheim Grøstad
> <ola.fosheim.grostad+dlang@gmail.com> wrote:
>
>> If you blindly use ARC rather than sane RC. There is no reason to up
>> the ref count if the data structure is "owned" while processing it.
>> Which is a good reason to avoid ARC and use regular RC. If you know
>> that the entire graph has a 1+ count, you don't need to do any ref
>> counting while processing it.
>
> This is an important point. However, the default is to be safe, and good
> enough for most. As long as there is a way to go into "I know what I'm
> doing" mode for encapsulated low-level code, the default should be to do
> ARC. ARC should handle simple cases, like if the reference only ever
> exists as a stack local.
>
> One could also say you could mark a memory region as no-scan if you know
> that the data it points at will never be collected for its entire
> existence in a GC environment.
>
> I think some mix of GC and ref counting for D would be extremely useful.
>
> In my major Objective C project, I use manual reference counting, and I
> do avoid extra retains for redundant pointers. I've also had a few bugs
> because of that :)
>
> -Steve

How big is your team?

Based on my experience, this type of coding falls apart in the typical corporate team size.

--
Paulo
February 04, 2014
On 2/4/2014 4:23 AM, Michel Fortin wrote:
> For the rare cases where you actually want both versions to work,

I think you're making a vast assumption that the case is rare.

When I write utility functions, I want them to work on as wide a variety of inputs as possible. Otherwise, they are not very useful.

> you can write them twice  or use a template (except in a virtual context), and in both cases
> you keep the efficiency of not checking for null when the argument is not nullable.

That's just what I wish to avoid. Consider adding more pointer types - the combinatorics quickly explode. Heck, just have two pointer parameters, and you already have 4 cases.
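To make the combinatorics concrete, with a hypothetical `NonNull!T` wrapper (not a Phobos type): a single two-pointer function already needs four signatures if each combination is to keep the no-null-check efficiency.

```d
// All four variants of one two-pointer function (NonNull is hypothetical):
void link(T)(T* a, T* b);                // both may be null
void link(T)(NonNull!T a, T* b);         // first known non-null
void link(T)(T* a, NonNull!T b);         // second known non-null
void link(T)(NonNull!T a, NonNull!T b);  // neither needs a null check
```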

I wonder how Rust deals with this.


> In any case, I have yet to understand why @nullable as a storage class would be
> any better. How do you solve that problem with a storage class?

Good question. I don't have an answer offhand.

February 04, 2014
On 2/4/2014 4:10 AM, Shammah Chancellor wrote:
> On 2014-02-04 06:50:51 +0000, Walter Bright said:
>
>> On 2/3/2014 1:42 PM, Shammah Chancellor wrote:
>>> It's also probably
>>> possible to create a drop-in replacement for the GC to do something else.
>>
>> It certainly is possible. There's nothing magic about the current GC, it's
>> just library code.
>
> Is it possible that we add some additional functionality to the GC API so that
> it could do ARC?  I took a look at it, and it seems that right now, there'd be
> no way to implement ARC.

ARC would require compiler changes, so it cannot be drop in.

February 04, 2014
On Tuesday, 4 February 2014 at 22:30:39 UTC, Walter Bright wrote:
>> you can write them twice  or use a template (except in a virtual context), and in both cases
>> you keep the efficiency of not checking for null when the argument is not nullable.
>
> That's just what I wish to avoid. Consider adding more pointer types - the combinatorics quickly explode. Heck, just have two pointer parameters, and you already have 4 cases.
>
> I wonder how Rust deals with this.
>

I think it is pretty much the same thing as with using specialised types (i.e. SQLEscapedString): your API explodes exponentially if you try to write methods/functions that support all of them, but in practice you shouldn't. Same here. Some functions should always deal with nullable types, and most only with non-nullable ones. The tricky part is writing your code so that the separation is clear, but good language guidelines and examples can solve that.

Also non-nullable types should be implicitly cast to nullable parameters so you don't always need to support all cases distinctively.
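That implicit conversion can be sketched with `alias this` (again, `NonNull` is hypothetical, not an existing library type), so only the nullable overload ever needs to be written:

```d
struct NonNull(T) {
    private T* ptr;
    this(T* p) { assert(p !is null, "null not allowed"); ptr = p; }
    @property T* get() { return ptr; }
    alias get this;   // NonNull!T implicitly converts to plain T*
}

void takesNullable(int* p) { if (p) *p += 1; }

void main() {
    int x = 41;
    auto nn = NonNull!int(&x);
    takesNullable(nn);   // non-nullable silently widens to nullable
    assert(x == 42);
}
```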
February 04, 2014
On Tuesday, 4 February 2014 at 22:17:40 UTC, Steven Schveighoffer wrote:
> This is an important point. However, the default is to be safe, and good enough for most.

Yes, absolutely. I am all for progressive refinement, starting out with something conceptual and very close to pseudocode, with all the convenience of high level programming. Then replace the parts that do not perform.

In Objective-C you don't really care that much about performance, though; Cocoa provides very versatile libraries that do a lot of the work for you, but you have to go one step down to get speed without libraries. Inefficiencies are less noticeable in a heavy library/premade-components environment like Cocoa than in a more barebones, hit-the-iron, C-like approach.

> One could also say you could mark a memory region as no-scan if you know that the data it points at will never be collected for its entire existence in a GC environment.

Yes, I agree. You could have the same type in different pools and mark one of the pools as no-scan. You could have pool-local GC/segmented GC or even type-limited GC with whole program analysis (only collecting a specific class/subclasses).

> I think some mix of GC and ref counting for D would be extremely useful.

Yes, I don't want to throw out the GC, but I think it would be more valuable and more realistic to have C++-style reference semantics than advanced compiler-inferred ARC. Not that I mind ARC; I just think it is probably harder to get right for a more pure, C-like systems language than for Objective-C with Cocoa.

So, I think ARC is right for Cocoa, but not so sure about how well it works in other environments.
February 04, 2014
On Tue, 04 Feb 2014 17:27:46 -0500, Paulo Pinto <pjmlp@progtools.org> wrote:

> Am 04.02.2014 23:17, schrieb Steven Schveighoffer:
>> On Tue, 04 Feb 2014 16:37:20 -0500, Ola Fosheim Grøstad
>> <ola.fosheim.grostad+dlang@gmail.com> wrote:
>>
>>> If you blindly use ARC rather than sane RC. There is no reason to up
>>> the ref count if the data structure is "owned" while processing it.
>>> Which is a good reason to avoid ARC and use regular RC. If you know
>>> that the entire graph has a 1+ count, you don't need to do any ref
>>> counting while processing it.
>>
>> This is an important point. However, the default is to be safe, and good
>> enough for most. As long as there is a way to go into "I know what I'm
>> doing" mode for encapsulated low-level code, the default should be to do
>> ARC. ARC should handle simple cases, like if the reference only ever
>> exists as a stack local.
>>
>> One could also say you could mark a memory region as no-scan if you know
>> that the data it points at will never be collected for its entire
>> existence in a GC environment.
>>
>> I think some mix of GC and ref counting for D would be extremely useful.
>>
>> In my major Objective C project, I use manual reference counting, and I
>> do avoid extra retains for redundant pointers. I've also had a few bugs
>> because of that :)
>
> How big is your team?

2

> Based on my experience, this type of coding falls apart in the typical corporate team size.

It's not clear what "this type of coding" refers to, but I'll assume you targeted the last paragraph.

If it's encapsulated (and it is), it should be relatively untouched. In fact, I haven't touched the linked list part of it for a long time.

Note that the MRC was not really our choice, the project existed before ARC did.

-Steve
February 04, 2014
On Tuesday, 4 February 2014 at 22:30:39 UTC, Walter Bright wrote:
> I wonder how Rust deals with this.

The only time ownership matters is if you are going to store the pointer. It is like the difference between a container and a range.

An algorithm doesn't need to know about the specifics of a container. Let's use average for example. We might write it in D:

int average(InputRange)(InputRange r) {
    int count = 0;
    int sum;
    while(!r.empty) {
         count++;
         sum += r.front;
         r.popFront();
    }
    return sum / count;
}

Now, this being a template, D will generate new code for a variety of types... but even if we replaced InputRange with a specific thing, let's call it int[], it is still usable by a variety of containers:

int average(int[] r) { /* same impl */ }


D has two containers built in that provide this range:

int[50] staticArray;
int[] dynamicArray = new int[](50);

average(staticArray[]); // works
average(dynamicArray); // works

Pointers also offer this:

int* pointer = cast(int*) malloc(50 * int.sizeof);
average(pointer[0 .. 50]);



Moreover, user-defined types can also provide this range:

struct Numbers {
    int[] opSlice() { return [1,2,3]; }
}

Numbers numbers;
average(numbers[]); // works

In theory, we could provide either an inputRangeObject or a slice for linked lists, lazy generators, anything. One function, any kind of input.


Of course, we could slice memory from any allocator. Heck, we saw three different allocations right here (with three different types! stack, gc, and malloc) all using the same function, without templating.




I'm sure none of this is new to you... and this is basically how the rust thing works too. Our usage of int[] (or the input range) are borrowed pointers. Algorithms are written in their terms.

The ownership type only matters when you store it. And turns out, this matters in D as well:

struct ManualArray(T) {
    size_t length;
    T* data;

    this(size_t len) { data = cast(T*) malloc(T.sizeof * len); length = len; }
    ~this() { free(data); }
    T[] opSlice() { return data[0 .. length]; }
    @disable this(this); // copying this is wrong, don't allow it!
}

void main() {
    auto array = ManualArray!int(50);
    average(array[]); // works, reusing our pointer
}


But, borrowed comes into play if we store it:

int[] globalArray;
void foo(int[] array) {
    globalArray = array;
}

void bar() {
    auto array = ManualArray!int(50);
    foo(array[]); // uh oh
}

void main() {
   bar();
   globalArray[0] = 10; // crash likely, memory safety violated
}



Again, I'm sure none of this is new to you, but it illustrates owned vs borrowed: ManualArray is owned. Storing it is safe - it ensures its internal pointer is valid throughout its entire life time.

But ManualArray.opSlice returns a borrowed reference. Great for algorithms or any processing that doesn't escape the reference. Anything that would be written in terms of an input range is probably correct with this.

However, we stored the borrowed reference, which is a no-no. array went out of scope, freeing the memory, leaving the escaped borrowed reference in an invalid state.


Let's say we did want to store it. There's a few options: we could make our own copy or store the pre-made copy.

GC!(int[]) globalArray;
void foo(GC!(int[]) array) { globalArray = array; }


That's sane, the GC owns it and we specified that so storing it is cool.

We could also take a RefCounted!(int[]), if that's how we wanted to store it.


But let's say we wanted to store it with a different method. There's only two sane options:


void foo(int[] array) { globalArray = array.dup; }

Take a borrowed reference and make a copy of it. The function foo is in charge of allocating (here, we made a GC managed copy).


OR, don't implement that and force the user to decide:


void foo(GC!(int[]) array) {...}


user:

foo(ownedArray[]); // error, cannot implicitly convert int[] to GC!(int[])
int[50] stackArray;
foo(stackArray[]); // error, cannot implicitly convert int[] to GC!int[]


Now, the user makes the decision. It is going to be stored, the function signature says that up front by asking for a non-borrowed reference. They won't get a surprise crash when the globalArray later accesses stack or freed data. They have to deal with the error. They might not call the function, or they might do the .dup themselves. Either way, memory safety is preserved and inefficiencies are visible.



So, a function that stores a reference would only ever come in one or two signatures, regardless of how many allocation strategies exist:

1) the exact match for the callee's allocation strategy. The callee, knowing what the strategy is, can also be sanely responsible for freeing it. (A struct dtor, for example, knows that its members are malloced and can thus call free)

2) A generic borrowed type, e.g. an input range or slice, of which it then makes a private copy internally. Since these are arguably hidden allocations you might not even like this option. Calling .dup (or whatever) at the call site keeps the allocations visible.




So bottom line, you don't duplicate functions for the different types. You borrow references for processing (analogous to implementing algorithms with ranges) and own references for storing... which you need to know about, so only one type makes sense.