June 22, 2013
On 6/22/2013 2:22 AM, Rainer Schuetze wrote:
>
> Don't want to split hairs, but it is the size of 8 64-bit-pointers for a page.

Yeah, you're right! I'm properly chastised.

_______________________________________________
D-runtime mailing list
D-runtime@puremagic.com
http://lists.puremagic.com/mailman/listinfo/d-runtime

June 22, 2013
On Jun 22, 2013, at 2:49 AM, Rainer Schuetze <r.sagitario@gmx.de> wrote:
> 
> 
>> 
>> This can be a problem if you have two threads looking at the block at the same time.  One can get the block info, release the GC lock, then the other could extend the block.  By the time the first thread comes back to look at the "end" of the block (which is checked while holding a different lock), the block info is no longer valid, and it would look at the wrong place.  I think for unshared blocks, there would be no problem, but block can become shared/unshared via a cast, and that would cause problems.
> 
> I was a bit surprised to find the special casing for shared/non-shared arrays in the array appending code, given the fact that the absence of "shared" does not guarantee it is not shared. I'm not sure whether we have to still guarantee memory safety in the case of undeclared sharing, but I'd be a bit more comfortable if we could (if it doesn't involve a global lock). I don't have a better solution, though.

The only way the absence of "shared" allows shared data is with __gshared fields or casting.  This is a "you better know what you are doing" storage class, and so is not handled properly in the runtime.  Note that the reason appending is palatable is due to the thread-local storage cache for the last 8 appended-to arrays.  If this wasn't present, the update to the runtime that prevented stomping would be less performant than before.
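For illustration, here is a minimal sketch of such a per-thread append cache, written in C++ for brevity. The struct and function names are made up; the real druntime code differs in detail, but the idea is the same: remember the block info of the last 8 appended-to arrays per thread so the common append path can skip the global GC lock.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch (not the actual druntime code) of a thread-local
// cache of the last 8 appended-to arrays. All names are hypothetical.
struct BlockInfo {
    void*       base = nullptr;  // start of the GC block
    std::size_t size = 0;        // block size in bytes
};

constexpr int CACHE_SIZE = 8;
thread_local BlockInfo g_appendCache[CACHE_SIZE];
thread_local int g_nextSlot = 0;

// Look up a pointer's block info in this thread's cache; nullptr on miss.
BlockInfo* lookupCache(void* interior) {
    char* p = static_cast<char*>(interior);
    for (auto& e : g_appendCache) {
        char* b = static_cast<char*>(e.base);
        if (b && p >= b && p < b + e.size)
            return &e;
    }
    return nullptr;  // miss: caller must take the GC lock and query the heap
}

// Remember a block after a cache miss, replacing entries round-robin.
void cacheBlock(const BlockInfo& bi) {
    g_appendCache[g_nextSlot] = bi;
    g_nextSlot = (g_nextSlot + 1) % CACHE_SIZE;
}
```

Because the cache is `thread_local`, two threads appending to the same block each consult their own copy, which is exactly why undeclared sharing (via __gshared or casting) breaks the scheme.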

The shared version attempts to do the best it can, and sacrifices performance for correctness (the obvious choice).

However, __gshared is not detectable, so you should never try to append to a __gshared array from more than one thread.  If you did, the threads would use their thread-local caches, and have possibly different cached typeinfo values.  Not to mention the race condition for writing the 'used' size :)

If you cast away shared, but don't ensure the data becomes actually unshared, you are in bad territory too.

On Jun 22, 2013, at 5:22 AM, Rainer Schuetze <r.sagitario@gmx.de> wrote:

> On 22.06.2013 09:11, Walter Bright wrote:
>> I think it should be stored separately. Storing it with the allocation is inefficient, as (for example) only 2 bits are needed for a 16 byte alloc. Also, storing it with the allocation precludes "extending" an allocation in place if the next chunk is free.
> 
> I did not mean the bitmap here, but the alternative TypeInfo pointer. But you are right, extending becomes more complicated. Currently only page-sized allocations or larger are extendable, so if these allocations have the TypeInfo pointer at the beginning, that should work and would blend pretty well with the current array implementation (which stores the length there as well).

I agree, if you simply store a pointer to a typeinfo, you can use the first 16 bytes, just like the array appending code does.  In fact, that makes things more uniform.

Note, you can have a mix of solutions -- for up to 16*sizeof(void*) bytes allocations, store bitmaps.   For everything else, use a typeinfo.  Both solutions could try off-page or in-block.  You can use the same philosophy as array append code -- for non-extendable GC blocks (i.e. < PAGE size) place at the end, for PAGE size and above, place at the front.
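As a rough illustration of that placement rule, here is a C++ sketch under assumed names and sizes (PAGE_SIZE, a single pointer-sized metadata word); it is not the druntime layout, just the "end for small blocks, front for page-sized and larger" idea stated above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the metadata placement rule described above:
// small, non-extendable (sub-page) blocks keep their type metadata at
// the end; page-sized and larger blocks keep it at the front, so the
// block can still be extended in place. Mirrors where the array-append
// code stores the "used" length.
constexpr std::size_t PAGE_SIZE = 4096;  // assumed page size

// Returns the byte offset of the metadata word within a GC block.
std::size_t metadataOffset(std::size_t blockSize) {
    if (blockSize < PAGE_SIZE)
        return blockSize - sizeof(void*);  // at the end of the block
    return 0;                              // at the front; extension stays possible
}
```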

I would encourage an implementation of all options (if you are so inclined), and see which one, or which mix, performs better.  It may be an easy call at that point.

-Steve

June 22, 2013
On Saturday, June 22, 2013 18:48:46 Steven Schveighoffer wrote:
> If you cast away shared, but don't ensure the data becomes actually unshared, you are in bad territory too.

Well, it's likely to be common to construct stuff as thread-local and cast it to shared, for essentially the same reason that you have to do that with immutable (though it probably isn't required quite as much), so I would definitely expect code to be doing unshared -> shared.

And it's my understanding at this point that pretty much the only way shared is usable is this: when you want to operate on it, you protect it with a synchronized block or mutex or whatnot, cast away shared, and do whatever operations you want to do on it, after which it'll be shared again. If the runtime is doing stuff with thread-local caches or pools or anything else which couldn't be used across threads once it's back to being shared again, then the whole lock-and-cast-away-shared scheme won't work, in which case shared becomes pretty much useless beyond the most basic of basic types and operations (like a shared bool used as a flag across threads).

There's also the issue of using std.concurrency to pass stuff across threads (which IIRC doesn't actually work with shared right now, but is supposed to), in which case an object might not ever really be shared, but it does jump threads. So if something in the runtime expects an object to live on the thread that it was created on just because it wasn't created as shared or isn't currently marked as shared, then it could be in for a rude surprise.

At this point, I'd be very nervous about anything in the runtime assuming that a non-shared object won't be used across threads. It should be able to assume that it's not _currently_ being used across threads (since it _is_ up to the programmer to ensure that they don't violate the type system by having multiple threads operate on a non-shared object at the same time), but in the general case, I would not expect it to be able to assume that it won't be used across threads in the future without being wrong.

- Jonathan M Davis

June 23, 2013
On 23.06.2013 04:07, Jonathan M Davis wrote:
> On Saturday, June 22, 2013 18:48:46 Steven Schveighoffer wrote:
>> If you cast away shared, but don't ensure the data becomes actually
>> unshared, you are in bad territory too.
>
[...]
> At this point, I'd be very nervous about anything in the runtime assuming that
> a non-shared object won't be used across threads. It should be able to assume
> that it's not _currently_ being used across threads (since it _is_ up to the
> programmer to ensure that they don't violate the type system by having
> multiple threads operate on a non-shared object at the same time), but in the
> general case, I would not expect it to be able to assume that it won't be used
> across threads in the future without being wrong.

I feel a little uneasy about that as well, but I haven't really dug very deep into the implementation yet. Here's an example of a string shared between two threads, but with no "shared" type around:

module test;
import std.stdio;
import core.thread;

immutable(string) hello;

shared static this()
{
	hello = "Hello";
	hello ~= ", ";
}

void thread1()
{
	string h = hello;
	h ~= " World";
	writeln("1: ", cast(void*) hello.ptr, " ", cast(void*) h.ptr, " ", h);
}

void thread2()
{
	string h = hello;
	h ~= " D";
	writeln("2: ", cast(void*) hello.ptr, " ", cast(void*) h.ptr, " ", h);
}

void main()
{
	auto t1 = new Thread(&thread1);
	auto t2 = new Thread(&thread2);
	t1.start();
	t2.start();
	t1.join();
	t2.join();
}

Output is:
1: 1E83FF0 1E83FF0 Hello,  World
2: 1E83FF0 1E83F80 Hello,  D

Which shows that the first thread appended in place, and the second reallocated. This is correct, but if both array append operations are executed concurrently, are the problems not the same as with "shared" strings?



June 23, 2013
On 23.06.2013 00:48, Steven Schveighoffer wrote:
>
> Note, you can have a mix of solutions -- for up to 16*sizeof(void*)
> bytes allocations, store bitmaps.   For everything else, use a
> typeinfo.  Both solutions could try off-page or in-block.  You can
> use the same philosophy as array append code -- for non-extendable GC
> blocks (i.e. < PAGE size) place at the end, for PAGE size and above,
> place at the front.

I was thinking in the same direction, using bitmaps for anything smaller than page size and typeinfo for larger blocks. The downside is that you have to implement both versions, and that you lose the flexibility of the bitmap for larger allocations. One way to solve this might be to switch to bitmaps once gc_emplace is used on large allocations.

>
> I would encourage an implementation of all options (if you are so
> inclined), and see which one, or which mix, performs better.  It may
> be an easy call at that point.
>

But probably not so easy to get to that point ;-) I'll see if I find the time to implement some options. Even if dynamic arrays fit well into the design, I still have to find a way to squeeze the memory layout of associative arrays into it.


June 23, 2013
On 6/21/13 11:36 AM, Rainer Schuetze wrote:
> I want to make precise garbage collection as presented at the D
> conference ready for inclusion into druntime.

I'm all for this. One area we should be careful about is not introducing performance regressions.

Andrei

June 24, 2013
On Sat, Jun 22, 2013 at 12:11:17AM -0700, Walter Bright wrote:
> >d. The alternative to using a pointer bitmap alongside the pool memory is to store the TypeInfo pointer instead. It's major advantage is that it adds minimal performance overhead to the allocation.
> 
> But it'll subtract performance from the scanner. Which is better can only be determined with testing.

I once played with this option (storing a pointer to a TypeInfo) and had questionable results. Here are a couple of blog posts related to it: http://www.llucax.com.ar/blog/blog/post/098ceae8 http://www.llucax.com.ar/blog/blog/post/1490c03e

> >f. Currently, the precise GC can be versioned in/out, but I think
> >making it a runtime option is preferable. A problem here is that
> >the user has no way to change the default configuration before the
> >GC is initialized.
> >I could imagine always starting with the precise GC, but the user
> >can opt out anytime. The GC could then release any additional
> >memory it has allocated for precise scanning. If the user opts
> >back in, the data structures are rebuilt assuming everything
> >allocated so far as void[].
> 
> I don't think a runtime option is practical. It would be so "expert only" that those experts should be able to rebuild the library as required.

I'll repeat what I said in my talk: I think the "rebuild" option is an extremely bad idea. Here at Sociomantic we have some applications that need the concurrent GC (almost all) and some that need the old one (basically because of the glibc mutex when multi-threading). It would be EXTREMELY inconvenient to have to build 2 different standard libraries and switch between them while compiling. I think even having to recompile a standard library/runtime is completely "development mode", but it doesn't work for "production mode" (as described by Don) at all! It is absurd to ask somebody to even recompile a runtime library; you should be able to use different GC options out of the box.

I used "initialization-time" configurability in CDGC and it worked pretty well. It would be much more convenient to have a way to change GC options at the source code level too, but I ran into the same problem: I couldn't find a way to do anything before the GC is initialized. I think being able to configure runtime options via environment variables is an extremely convenient option.
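To make the idea concrete, here is a hedged C++ sketch of initialization-time configuration via an environment variable. The variable name and option names are invented for illustration; CDGC used its own scheme, and this is not druntime code.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Sketch: read GC options once, before the GC does any real work.
// Option names ("precise", "concurrent") and the variable name
// ("D_GC_OPTS") are hypothetical.
struct GcOptions {
    bool precise    = true;   // start precise by default, allow opt-out
    bool concurrent = false;
};

GcOptions parseGcOptions(const char* env) {
    GcOptions opts;
    if (!env)
        return opts;  // variable unset: keep defaults
    std::string s(env);
    if (s.find("precise=0") != std::string::npos)
        opts.precise = false;
    if (s.find("concurrent=1") != std::string::npos)
        opts.concurrent = true;
    return opts;
}

// At GC initialization, something like:
//     GcOptions opts = parseGcOptions(std::getenv("D_GC_OPTS"));
```

The point is that no recompilation is needed: the same binary picks its GC behavior from the environment at startup.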

But please, please, please, don't make GC options (or any runtime options) configurable only at build time. We already have the experience of "mem_stomp" and "sentinel", 2 EXTREMELY USEFUL debug helpers that ABSOLUTELY NOBODY KNOWS OR USES.

I can port the configuration via environment variables to druntime ASAP if that's helpful.

> >Another option might be to make it a link time option, but that would mean the "standard" gc could not be part of the runtime library to be exchangeable.
> 
> Users really only want one gc.
                            ^
                      CONFIGURABLE

> >5. How do you prefer enabling/disabling precise garbage collection? Versioning/linking/runtime?
> 
> We should pick precise collection and commit to it. And then add Leandro's concurrent collector on top!

Runtime, initialization-time, and/or at the source code level (I have the feeling that there should be a way :).

Please, don't make it a compile-time option. A link-time option could work, but I think it is way too complicated.

-- 
Leandro Lucarella
Senior R&D Developer
Sociomantic Labs GmbH <http://www.sociomantic.com>

June 24, 2013
On Fri, Jun 21, 2013 at 08:36:23PM +0200, Rainer Schuetze wrote:
> f. Currently, the precise GC can be versioned in/out, but I think
> making it a runtime option is preferable. A problem here is that the
> user has no way to change the default configuration before the GC is
> initialized.
> I could imagine always starting with the precise GC, but the user
> can opt out anytime. The GC could then release any additional memory
> it has allocated for precise scanning. If the user opts back in, the
> data structures are rebuilt assuming everything allocated so far as
> void[].

This is the good thing about "initialization-time" configuration. You avoid all the complexity of changing the internal state of the GC when option changes, but still let the user start the application, without recompiling, using different options.

-- 
Leandro Lucarella
Senior R&D Developer
Sociomantic Labs GmbH <http://www.sociomantic.com>

June 24, 2013
On Jun 23, 2013, at 8:51 AM, Rainer Schuetze <r.sagitario@gmx.de> wrote:

> On 23.06.2013 04:07, Jonathan M Davis wrote:
>> On Saturday, June 22, 2013 18:48:46 Steven Schveighoffer wrote:
>>> If you cast away shared, but don't ensure the data becomes actually unshared, you are in bad territory too.
>> 
> [...]
>> At this point, I'd be very nervous about anything in the runtime assuming that a non-shared object won't be used across threads. It should be able to assume that it's not _currently_ being used across threads (since it _is_ up to the programmer to ensure that they don't violate the type system by having multiple threads operate on a non-shared object at the same time), but in the general case, I would not expect it to be able to assume that it won't be used across threads in the future without being wrong.
> 
> I feel a little uneasy about that aswell, but I haven't really digged very deep into the implementation yet. Here's an example of a string shared between two threads, but no "shared" type around:
> 
> [...]
> 
> Which shows that the first thread appended in place, and the second reallocated. This is correct, but if both array append operations are executed concurrently, are the problems not the same as with "shared" strings?

As you guessed, this is not handled correctly by the runtime, which assumes that only data typed as shared is actually shared.  It is a quirk of the fact that array blocks holding immutable data are not immutable in the portion where data hasn't yet been written.  In other words, the runtime allows modification of the block via an immutable pointer.  That works fine for thread-local immutable data, but not for shared immutable data.

I don't have a good answer for this.  Your code will eventually result in a race if you ran it enough times.  At this point, I think I have to make immutable appends be like shared appends, not using the cache, and taking a lock.  The only other option I can think of is to make immutable appends always copy.  Neither of these solutions is attractive, and would hurt performance severely on code that doesn't share immutable data (the most common case).

Even worse, const array data could be immutable, could be shared, same problem.  So only mutable data would be able to be appended to without performance penalties.

Here is a possibility for rectifying: any time you share immutable data by reference, the compiler generates a call to some 'makeImmutableArrayShared' function with the pointer, and that removes the block's APPENDABLE bit (if part of the GC).  That would make any immutable shared arrays truly immutable, and avoid this problem.
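A minimal sketch of what that hook might do, assuming a GC attribute bit named APPENDABLE (the bit value and function name here are hypothetical, written in C++ for illustration rather than as druntime code):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the proposed hook: when immutable data is handed to another
// thread, clear the block's APPENDABLE attribute so no thread ever
// appends in place again. The bit value 0x8 is made up for this example.
constexpr std::uint32_t APPENDABLE = 0x8;  // hypothetical GC attribute bit

std::uint32_t makeImmutableArrayShared(std::uint32_t blockAttrs) {
    // Once the bit is gone, every append must reallocate, which is safe:
    // the existing immutable contents are never modified in place.
    return blockAttrs & ~APPENDABLE;
}
```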

Thoughts?  Is it possible to intercept the moment immutable data becomes shared?

-Steve

June 24, 2013
On 24.06.2013 16:40, Steven Schveighoffer wrote:
>
> Here is a possibility for rectifying: any time you share immutable
> data by reference, the compiler generates a call to some
> 'makeImmutableArrayShared' function with the pointer, and that
> removes the block's APPENDABLE bit (if part of the GC).  That would
> make any immutable shared arrays truly immutable, and avoid this
> problem.
>
> Thoughts?  Is it possible to intercept the moment immutable data
> becomes shared?

I think you can make the length update lock- and race-free with a single CAS operation on the length, that pretty much does what is currently done in __setArrayAllocLength. You don't even have to care for problems like ABA, as the size always increases.

The cost of adding this to the non-shared code path as well (or, better, not making the distinction at all) is probably more than offset by removing the tests for sharedness.
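The CAS idea above can be sketched as follows, in C++ atomics for illustration (the names are invented; this is not the actual __setArrayAllocLength code). Because the stored used-size only ever grows, a single compare-and-swap decides who gets to append in place, and ABA cannot occur.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Returns true if this thread won the in-place append: the stored
// used-size still equals what the caller observed earlier, and was
// advanced to newUsed atomically. On failure, another thread appended
// first and the caller must reallocate instead.
bool tryExtendUsedSize(std::atomic<std::size_t>& usedSize,
                       std::size_t expectedUsed,
                       std::size_t newUsed) {
    return usedSize.compare_exchange_strong(expectedUsed, newUsed,
                                            std::memory_order_acq_rel,
                                            std::memory_order_relaxed);
}
```

A losing thread simply falls back to the reallocate path, which is exactly the behavior the thread-2 output in Rainer's example already shows.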

