May 13, 2014
On Tuesday, 13 May 2014 at 17:37:05 UTC, Steven Schveighoffer wrote:
> On Tue, 13 May 2014 13:36:17 -0400, Daniel Murphy <yebbliesnospam@gmail.com> wrote:
>
>> "Steven Schveighoffer"  wrote in message news:op.xfs6jhp3eav7ka@stevens-macbook-pro-2.local...
>>
>>> If a thread is doing only reads, or is  only touching non-Scanned memory, it continues.
>>
>> How long do threads usually go without touching the stack?
>
> The stack would have to be COW.

When you fork a process, all of its pages are COW. What you propose is much more complicated: you would first need to fork(), then somehow remap certain pages as shared. The remapping would have to happen in the parent process. I don't know whether there are even interfaces to do this.
May 13, 2014
On Tuesday, 13 May 2014 at 17:22:19 UTC, Steven Schveighoffer wrote:
> The idea I had was to make them pause-on-write. This means, when the original process attempts to write to the page, it gets a page-fault, which pauses the thread until the collector is done with it. This causes the same halting that normally happens with stop-the-world, but only on-demand, instead of preemptively. If a thread is doing only reads, or is only touching non-Scanned memory, it continues.

That would require a separate page table and would freeze threads very quickly for a global GC. It could work for local heaps such as caches, though. I guess you could set up shared memory between two processes and use that as your heap.

> Just an idea, I make no claims to its actual benefits :)

Another idea, for small heaps with few writes, is to use transactions and roll back the collector if someone does a write... The principle: run the collector when a core is idle and yield to cores that do real work.

But I don't think current-gen CPUs can handle big transactions...


May 13, 2014

On 13.05.2014 13:09, "Marc Schütz" <schuetzm@gmx.net> wrote:
> On Tuesday, 13 May 2014 at 06:06:40 UTC, Rainer Schuetze wrote:
>>
>>
>> On 12.05.2014 13:53, "Marc Schütz" <schuetzm@gmx.net> wrote:
>>>
>>> I'm surprised that you didn't include:
>>>
>>> 3. Thread-local GC, isolated zones (restricting where references to
>>> objects of a particular heap can be placed), exempting certain threads
>>> from GC completely, ...
>>
>> This comes up from time to time, but to me it is very blurry how this
>> can work in reality.
>>
>> Considering how "shared" is supposed to be used to be useful (do some
>> locking, then cast away "shared") there is no guarantee by the
>> language that any object is actually thread local (no references from
>> other threads). Working with immutable (e.g. strings) is shared by
>> design.
>
> Yes, but only a part of the data is shared. I suspect the majority of
> the data in typical programs will be thread-local. If you use a message
> passing model, you can improve that even further (though it requires a
> way to move an object to another thread's heap). This way, you can - in
> the best case - avoid the shared heap completely.

Possible, but that works in a language without shared data (including immutable), not in D. Safely transferring a large object tree from one thread to another will be very expensive. And if you try to optimize that, the D compiler won't help you guarantee memory safety.

Actually I don't have much experience with "shared" (I usually use __gshared if I need a shared global). Maybe I understand the idea better if you show what happens with this code when run with thread local GC:

class C { C next; }

shared(C) list;

C newC()
{
	return new C; // allocated on the local heap?
}	

void prependToList(C c)
{
	synchronized(list)
	{
		C lst = cast(C) list;
		c.next = lst;  // anything special happening here?
		list = cast(shared(C)) c;  // and here?
	}
}

void callMeFromMultipleThreads()
{
	prependToList(newC());
}
May 13, 2014

On 13.05.2014 14:20, Wyatt wrote:
> On Tuesday, 13 May 2014 at 06:06:40 UTC, Rainer Schuetze wrote:
>>
>> This comes up from time to time, but to me it is very blurry how this
>> can work in reality.
>>
> The paper I linked on Friday [0] presents a collector like this. Are
> there concerns I've missed that make that not applicable?

I just read the first chapters, and according to those, existing local GCs need write barriers, so we are back to my second proposal. The implementation in the paper even adds read barriers.

>
>> Considering how "shared" is supposed to be used to be useful (do some
>> locking, then cast away "shared") there is no guarantee by the
>> language that any object is actually thread local (no references from
>> other threads). Working with immutable (e.g. strings) is shared by
>> design.
>
> I'm not seeing much in the documentation, but from what I can tell (per
> the FAQ), shared in D just guarantees it's on the global heap?

To me, shared is a type modifier that doesn't implicitly convert to anything else without casting.
If a global/static variable is shared, it is placed into the data segment of the executable image, not TLS. The heap is not involved here.


> -Wyatt
>
> [0]
> https://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/local-gc.pdf
>
May 13, 2014
On 05/13/2014 09:07 PM, Jacob Carlborg wrote:
> On 2014-05-13 15:56, Dicebot wrote:
>
>> Judging by http://static.rust-lang.org/doc/0.6/tutorial-macros.html
>> those are not full-blown AST macros like ones you have been proposing,
>> more like hygienic version of C macros.
>
> Hmm, I haven't looked at Rust macros that much.
>

Again, the following is an example of Rust macros in action. A bf program is compiled to Rust code at compile time.

https://github.com/huonw/brainfuck_macro/blob/master/lib.rs

Compile-time computations create an AST which is then spliced. Seems full-blown enough to me and not at all like C macros.
May 14, 2014
On Tuesday, 13 May 2014 at 21:16:55 UTC, Timon Gehr wrote:
> On 05/13/2014 09:07 PM, Jacob Carlborg wrote:
>> On 2014-05-13 15:56, Dicebot wrote:
>>
>>> Judging by http://static.rust-lang.org/doc/0.6/tutorial-macros.html
>>> those are not full-blown AST macros like ones you have been proposing,
>>> more like hygienic version of C macros.
>>
>> Hmm, I haven't looked at Rust macros that much.
>>
>
> Again, the following is an example of Rust macros in action. A bf program is compiled to Rust code at compile time.
>
> https://github.com/huonw/brainfuck_macro/blob/master/lib.rs
>
> Compile-time computations create an AST which is then spliced. Seems full-blown enough to me and not at all like C macros.

I really don't know, but my understanding is that there are two types of macros in Rust. There's the simple "hygienic macro" kind, which has some documentation: http://static.rust-lang.org/doc/0.10/guide-macros.html. Then there's the other, newer kind of macros (they may be called "procedural macros"... or not), and those are basically undocumented at this point. My understanding is that the newer kind can do literally anything, like order you a pizza at compile time.
May 14, 2014
On Tuesday, 13 May 2014 at 19:50:52 UTC, Rainer Schuetze wrote:
>
>
> On 13.05.2014 13:09, "Marc Schütz" <schuetzm@gmx.net> wrote:
>> On Tuesday, 13 May 2014 at 06:06:40 UTC, Rainer Schuetze wrote:
>>>
>>>
>>> On 12.05.2014 13:53, "Marc Schütz" <schuetzm@gmx.net> wrote:
>>>>
>>>> I'm surprised that you didn't include:
>>>>
>>>> 3. Thread-local GC, isolated zones (restricting where references to
>>>> objects of a particular heap can be placed), exempting certain threads
>>>> from GC completely, ...
>>>
>>> This comes up from time to time, but to me it is very blurry how this
>>> can work in reality.
>>>
>>> Considering how "shared" is supposed to be used to be useful (do some
>>> locking, then cast away "shared") there is no guarantee by the
>>> language that any object is actually thread local (no references from
>>> other threads). Working with immutable (e.g. strings) is shared by
>>> design.
>>
>> Yes, but only a part of the data is shared. I suspect the majority of
>> the data in typical programs will be thread-local. If you use a message
>> passing model, you can improve that even further (though it requires a
>> way to move an object to another thread's heap). This way, you can - in
>> the best case - avoid the shared heap completely.
>
> Possible, but this is with a language without shared data (including immutable), not D. Safely transferring a large object tree from one thread to another will be very expensive. If you try to optimize that, the D compiler won't help you to guarantee memory safety.

It certainly requires a few modifications to the language. That's why I think the current discussions about uniqueness, scope, and isolated are so important: they would fit together very well with memory management concepts like ARC, isolated heaps, and safety. For example, if the uniqueness concept that Walter recently added to DMD were not just temporary (only used for implicit conversion to immutable, AFAIU) but a real, permanent type modifier, transferring such a tree could be made safe at practically no runtime cost, apart from notifying the GC about the changed ownership. (No copying needs to happen, because what matters for thread-local heaps is really just in which threads references to objects can be stored, not necessarily that the objects themselves are placed into physically distinct heaps.)

>
> Actually I don't have much experience with "shared" (I usually use __gshared if I need a shared global). Maybe I understand the idea better if you show what happens with this code when run with thread local GC:
>
> class C { C next; }
>
> shared(C) list;
>
> C newC()
> {
> 	return new C; // allocated on the local heap?

Yes, because it doesn't say "new shared(C)".

> }	
>
> void prependToList(C c)
> {
> 	synchronized(list)
> 	{
> 		C lst = cast(C) list;
> 		c.next = lst;  // anything special happening here?
> 		list = cast(shared(C)) c;  // and here?

The casts are a problem. You're basically lying to the compiler, so this cannot work. Instead, it should be something like:

    auto newC() {
        return new isolated(C);
    }

    void prependToList(isolated(C) c) {
        synchronized(list) {
            // transfer to shared heap
            // afterwards, c is no longer usable ("consumed")
            shared(C) tmp = c.makeShared();
            tmp.next = list;
            list = tmp;
        }
    }

This uses the "isolated" concept suggested by deadalnix here:
http://forum.dlang.org/thread/yiwcgyfzfbkzcavuqdwz@forum.dlang.org?page=1

> 	}
> }
>
> void callMeFromMultipleThreads()
> {
> 	prependToList(newC());
> }

May 14, 2014
Walter Bright:

> On 5/12/2014 10:25 AM, bearophile wrote:
>> Walter Bright:
>>
>>> Unions of pointers are so rare in actual code that treating them
>>> conservatively is not a big problem.
>>
>> std.variant.Algebraic is based on std.variant.VariantN, and
>> std.variant.VariantN is based on a union, and often you use algebraic data
>> types to represent trees and similar data structures that contain many
>> references/pointers. Adding an onGC() method to std.variant.VariantN would
>> allow the GC to manage Algebraic well enough.
>
> BTW, the RTinfo can be used to discriminate unions.

Do I have to open a Bugzilla entry asking for VariantN to use such RTinfo to allow a more precise tracing of pointer-heavy Algebraics? Suggestions are welcome.

Bye,
bearophile
May 14, 2014
On Tuesday, 13 May 2014 at 21:16:55 UTC, Timon Gehr wrote:
> On 05/13/2014 09:07 PM, Jacob Carlborg wrote:
>> On 2014-05-13 15:56, Dicebot wrote:
>>
>>> Judging by http://static.rust-lang.org/doc/0.6/tutorial-macros.html
>>> those are not full-blown AST macros like ones you have been proposing,
>>> more like hygienic version of C macros.
>>
>> Hmm, I haven't looked at Rust macros that much.
>>
>
> Again, the following is an example of Rust macros in action. A bf program is compiled to Rust code at compile time.
>
> https://github.com/huonw/brainfuck_macro/blob/master/lib.rs
>
> Compile-time computations create an AST which is then spliced. Seems full-blown enough to me and not at all like C macros.

Comments look promising but unfortunately I can't understand a single line of actual code :) My rust-fu is too weak.
May 14, 2014
On Tuesday, 13 May 2014 at 19:56:20 UTC, Rainer Schuetze wrote:
>
> I just read the first chapters, and according to that, existing local gcs needs write barriers, so we are back to my second proposal. The implementation in the paper even adds read barriers.
>
At this point, I suspect write barriers are simply a requirement for any modern scalable GC.  At the very least, the algorithms they enable are important if D is really committed to the pluggable GC concept.  (Given the ongoing discussion on performance tradeoffs of various GC methods and the needs of different groups, I'd even suggest it's a really important feature.)

> To me, shared is a type modifier that doesn't implicitely convert to anything else without casting.

Interesting, maybe this should be clarified better.  Having skimmed back over chapter 13 of TDPL, my understanding of its semantics is that it only really enforces atomicity and execution order.  Also, this bit from near the beginning of 13.12 states:
"For all numeric types and function pointers, shared-qualified values are convertible implicitly to and from unqualified values."

That sounds kind of at odds with your interpretation...? :/

-Wyatt