October 09, 2013
On Tue, 08 Oct 2013 17:43:45 +0200, ponce wrote:

> Yet with D the situation is different and I feel that criticism is way
> overblown:
> - first of all, few people will have problems with GC in D at all
> - then minimizing allocations can usually solve most of the problems
> - if it's still a problem, the GC can be completely disabled, relevant
> language features avoided, and there will be no GC pause
> - this work of avoiding allocations would happen anyway in a C++ codebase
> - I happen to have a job with some hardcore optimized C++ codebase and
> couldn't care less that a GC would run provided there is a way to
> minimize GC usage (and there is)

I thought I'd weigh in with my experience with D code that is in production.

Over the last couple of years, I've had exactly one bad experience with
D's garbage collection and it was really bad.  It was also mostly my
fault.  Our web-fronted API is powered by a pool of persistent worker
processes written in D.  This worker program had an infrequently-used
function that built an associative array for every row of data that it
processed.  I knew this was a bad idea when I wrote it, but it was a
tricky problem and using an AA was a quick and fairly intuitive way to
solve it--what's more, it worked just fine for months in production.  At
some point, however, a user inadvertently found a pathological case that
caused the function to thrash terribly--whenever we attached to the
process with GDB, we would almost invariably find it performing a full
garbage collection.  The process was still running, and would eventually
deliver a response, but only after being at 100% CPU for ten or twenty
minutes (as opposed to the <30s time expected).
The function has since been completely rewritten not only to avoid using
AAs, but with a much better algorithm from a time-complexity point of
view.  As a "customer" of D, I'm a bit torn: should I be impressed by the
good performance we usually got out of such a crappy bit of code, or
disappointed by how terrible the performance became in the pathological
case?

As a result of my experience with D over the past few years, I tend to
write code in two modes:
 - High level language mode: D as a more awesome Python/Ruby/etc.  Built-
in AAs are a godsend.  Doing `arr ~= element` is great!
 - Improved C: avoid heap allocations (and thus GC).  Looks like nice C
code.
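
A tiny sketch of the two modes (illustrative only; the names and values are invented):

	void demo()
	{
		// High-level mode: GC-backed conveniences
		int[string] hits;            // built-in associative array
		hits["api/login"]++;         // inserts and increments as needed
		string[] log;
		log ~= "request handled";    // GC-managed append

		// Improved-C mode: fixed storage, no hidden allocations
		char[256] buf;               // stack buffer, never touches the heap
		size_t used = 0;
		const(char)[] msg = "request handled";
		buf[used .. used + msg.length] = msg[];  // copy into preallocated space
		used += msg.length;
	}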

Related to the latter, it would be really nice to be able to prove that a section of code makes no heap allocations/GC collections. At the moment, I resort to all-caps comments and occasionally running with breakpoints set on the GC functions.

Justin
October 09, 2013
On Wednesday, 9 October 2013 at 20:10:40 UTC, Justin Whear wrote:
> Related to the latter, it would be really nice to be able to prove that a section of code makes no heap allocations/GC collections.

As a quick temporary thing, how about gc_throw_on_next(); ?

That'd just set a thread-local flag that gc_malloc checks; if it is set, gc_malloc immediately resets it and throws an AllocAssertError. My thought is that this could be quickly and easily implemented pending a better solution, and in the meantime it can be used in unit tests to help check this stuff.
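
A rough sketch of what that might look like inside druntime (hypothetical: gc_throw_on_next and AllocAssertError are the proposed names, everything else is invented):

	class AllocAssertError : Error
	{
		this(string msg) { super(msg); }
	}

	bool gcThrowOnNext;  // module-level variables are thread-local in D

	void gc_throw_on_next() { gcThrowOnNext = true; }

	// At the top of the runtime's allocation entry point:
	//     if (gcThrowOnNext)
	//     {
	//         gcThrowOnNext = false;
	//         throw new AllocAssertError("unexpected GC allocation");
	//     }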
October 09, 2013
On Wednesday, 9 October 2013 at 20:07:38 UTC, Craig Dillabaugh wrote:
> //Everything defined here is @safe pure nothrow
>
> attributes @system

You can do that now with colons:

	@system:
	stuff....

The problem is there's no way to turn some of them off. For @safe, there's @system, but there's no "throws" for nothrow, no "impure", no "virtual", etc.
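
For example:

	@safe nothrow:
	void f() { }          // @safe nothrow, as intended

	@system void g() { }  // can opt back out of @safe...
	// ...but there is no way to opt g() out of nothrow:
	// a hypothetical "throws" (or "impure", etc.) does not exist.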

This btw is another trivially easy addition that's been talked about for a while that should just be done for the next release.
October 09, 2013
On 2013-10-09 19:24:18 +0000, Walter Bright <newshound2@digitalmars.com> said:

> On 10/9/2013 10:05 AM, Manu wrote:
>> Supporting ARC in the compiler _is_ the job. That includes a cyclic-reference
>> solution.
> 
> Wholly replacing the GC with ARC has some fundamental issues:
> 
> 1. Array slicing becomes clumsy instead of fast & cheap.

I think you're exaggerating a bit.

If you slice, you know it's the same memory block, so you know it's using the same reference count, and the compiler can elide paired retain/release calls just as it should be able to do for regular pointers, keeping slicing fast and cheap.

Example:

	int[] a = [1, 2]; // 1. retained on allocation
	// 2. scope(exit) release a
	int[] b = a[0..1]; // 3. retain on assignment
	// 4. scope(exit) release b
	return b; // 5. retain b for caller

Now, you know at line 3 that "b" is the same memory block as "a", so 2 and 3 cancel each other out, and so do 4 and 5.

Result:

	int[] a = [1, 2]; // 1. retained on allocation
	int[] b = a[0..1];
	return b;

The assumption here is that an int[] does not span two memory blocks. Perhaps that's a problem for memory-management @system code, but such code should be able to opt out anyway, if only to be able to write the retain/release implementation itself. (In Objective-C you opt out using the __unsafe_unretained pointer attribute.)
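
For illustration, a D equivalent of that opt-out might look like this (the @unretained attribute is invented, mirroring __unsafe_unretained; it is not valid D today):

	@unretained int[] raw = a;  // hypothetical: no retain/release emitted for raw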

That said, with function boundaries things are a little messier:

	int[] foo(int[] a) // a is implicitly retained by the caller
	                   // for the duration of the call
	{
		int[] b = a[0..1]; // 1. retain on assignment
		// 2. scope(exit) release b
		return b; // 3. retain b for caller
	}

Here, only 1 and 2 can be elided, resulting in one explicit call to retain:

	int[] foo(int[] a) // a is implicitly retained by the caller
	                   // for the duration of the call
	{
		int[] b = a[0..1];
		return b; // 3. retain b for caller
	}

But by inlining this trivial function, similar flow analysis in the caller should be able to elide that call to retain too.

So there is some overhead, but probably not as much as you think (and probably a little more than I think, because control flow and functions that can throw in the middle of all this will make it harder to elide redundant retain/release pairs). Remember that you have less GC overhead, and possibly better memory locality too (because memory is freed and reused sooner). I won't try to guess which is faster; it's probably going to differ depending on the benchmark anyway.


> 2. Functions can now accept new'd data, data from user allocations, static data, slices, etc., with aplomb. Wiring in ARC would likely make this unworkable.

That's no different from the GC having to ignore those pointers when it does a scan. Just check whether the pointer was allocated within the reference-counted allocator's memory pool: if so, adjust the block's reference counter; otherwise, ignore it. There's a performance cost, but it's probably small compared to the atomic increment/decrement itself.

Objective-C too has some objects (NSString objects) in the static data segment. They also have "magic" hard-coded immutable value objects hiding the object's payload within the pointer itself on 64-bit processors. Calls to retain/release just get ignored for these.
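
A sketch of how such tagged values can be short-circuited (the bit layout here is invented; real tagged pointers reserve platform-specific bits):

	// Hypothetical: a set low bit marks a value whose payload lives in
	// the pointer itself rather than on the heap.
	bool isTaggedPointer(const(void)* p)
	{
		return (cast(size_t) p & 1) != 0;
	}

	void retain(void* p)
	{
		if (isTaggedPointer(p))
			return;  // nothing to count
		// ...adjust the block's reference counter...
	}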


> 3. I'm not convinced yet that ARC can guarantee memory safety. For example, where do weak pointers fit in with memory safety? (ARC uses user-annoted weak pointers to deal with cycles.)

Failure to use weak pointers creates cycles, but cycles are not unsafe. The worst that'll happen is memory exhaustion (though we could/should still have an optional GC available to collect cycles). If weak pointers are nulled automatically (as they should be) then you'll never get a dangling pointer.

To use a weak pointer you first have to make a non-weak copy of it through a runtime call. You get either a null pointer or, if the object is still alive, a non-null strong one. Runtime machinery ensures this works atomically with respect to the reference count falling to zero. No dangling pointer.
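
The usage pattern would look roughly like this (Weak, makeWeak and lock are invented names; lock stands in for the runtime call that produces a strong reference):

	Weak!Foo w = makeWeak(foo);  // does not keep foo alive
	if (auto strong = w.lock())  // atomic: null, or a retained reference
	{
		strong.doSomething();    // safe: we hold a strong reference here
	}
	// if foo died first, lock() returned null and the branch is skipped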


-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

October 09, 2013
On 10/9/2013 2:10 PM, Michel Fortin wrote:
> That's no different from the GC having to ignore those pointers when it does a
> scan. Just check it was allocated within the reference counted allocator memory
> pool, and if so adjust the block's reference counter, else ignore. There's a
> small performance cost, but it's probably small compared to an atomic
> increment/decrement.

When passing any dynamic array to a function, or most any assignment, the compiler must insert:

   (is pointer into gc) && (update ref count)

This is costly, because:

1. the gc pools may be fragmented, i.e. interleaved with malloc'd blocks, meaning an arbitrary number of checks for "is pointer into gc". I suspect that on 64-bit machines one might be able to reserve in advance a large enough range of addresses to accommodate any realistic eventual gc size, making the check cost 3 instructions, but I don't know how portable such a scheme would be between operating systems.

2. the "update ref count" is likely a function call, which trashes the contents of many registers, leading to poor code performance even if that function is never called (because the compiler must assume it is called, and the registers trashed)

Considering that we are trying to appeal to the performance oriented community, these are serious drawbacks. Recall that array slicing performance has been a BIG WIN for several D users.
October 09, 2013
On 2013-10-09 19:40:45 +0000, Walter Bright <newshound2@digitalmars.com> said:

> On 10/9/2013 12:26 PM, Walter Bright wrote:
>> If it's got valuable information in it, please consider making it public here
>> (after getting permission from its participants).
> 
> Eh, I see that I was on that thread. Am in the process of getting permission.

It seems my emails can't reach you (I'd like to know why). You have my permission.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

October 09, 2013
On 2013-10-09 21:40:11 +0000, Walter Bright <newshound2@digitalmars.com> said:

> On 10/9/2013 2:10 PM, Michel Fortin wrote:
>> That's no different from the GC having to ignore those pointers when it does a
>> scan. Just check it was allocated within the reference counted allocator memory
>> pool, and if so adjust the block's reference counter, else ignore. There's a
>> small performance cost, but it's probably small compared to an atomic
>> increment/decrement.
> 
> When passing any dynamic array to a function, or most any assignment, the compiler must insert:
> 
>     (is pointer into gc) && (update ref count)
> 
> This is costly, because:
> 
> 1. the gc pools may be fragmented, i.e. interleaved with malloc'd blocks, meaning an arbitrary number of checks for "is pointer into gc". I suspect on 64 bit machines one might be able to reserve in advance a large enough range of addresses to accommodate any realistic eventual gc size, making the check cost 3 instructions, but I don't know how portable such a scheme may be between operating systems.

I know it is. The GC already pays that cost when it scans. We're just moving that cost elsewhere.


> 2. the "update ref count" is likely a function call, which trashes the contents of many registers, leading to poor code performance even if that function is never called (because the compiler must assume it is called, and the registers trashed)

In my opinion, the "is pointer into gc" check would be part of the retain/release functions themselves. That wouldn't change things much, because calling them is the most likely case and your registers are going to be trashed anyway (and it makes the code smaller at the call site, which is better for caching).

There's no question that assigning to a pointer will be slower. The interesting question is how much of that lost performance you get back later by not having the GC stop the world.


> Considering that we are trying to appeal to the performance oriented community, these are serious drawbacks. Recall that array slicing performance has been a BIG WIN for several D users.

Performance means different things to different people. Slicing performance is great, but GC pauses are very bad in some cases. You can't choose to have one without the other. It all depends on what you want to do.

In an ideal world, we'd be able to choose between using a GC or using ARC when building our program. A compiler flag could do the trick. But that becomes messy when libraries (static and dynamic) get involved as they all have to agree on the same codegen to work together. Adding something to mangling that would cause link errors in case of mismatch might be good enough to prevent accidents though.


-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

October 10, 2013
On Oct 9, 2013, at 9:42 AM, Jacob Carlborg <doob@me.com> wrote:
> 
>> On 2013-10-09 17:48, Sean Kelly wrote:
>> 
>> Okay so following that… it might be reasonable if the location of data keyed off the attribute set at construction.  So "new shared(X)" puts it in the shared pool.
> 
> I thought that was obvious. Is there a problem with that approach?

Only that this would have to be communicated to the user, since moving data later is problematic. Today, I think it's common to construct an object as unshared and then cast it.
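
In code, the distinction (and the problematic-but-common pattern) would look like this (assuming the per-qualifier pools under discussion):

	class X { }

	void example()
	{
		auto a = new X;               // unshared: thread-local pool
		auto b = new shared(X);       // shared pool

		// Common today: construct unshared, then cast.
		auto c = cast(shared) new X;  // data sits in the unshared pool;
		                              // moving it later is the problem
	}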
October 10, 2013
On Wednesday, 9 October 2013 at 20:37:40 UTC, Adam D. Ruppe wrote:
> On Wednesday, 9 October 2013 at 20:07:38 UTC, Craig Dillabaugh wrote:
>> //Everything defined here is @safe pure nothrow
> The problem is there's no way to turn some of them off. For @safe, there's @system, but there's no "throws" for nothrow, no "impure", no "virtual", etc.

!@safe ?

> This btw is another trivially easy addition that's been talked about for a while that should just be done for the next release.

too much design, too little experiment?
October 10, 2013
On Wednesday, 9 October 2013 at 23:37:53 UTC, Michel Fortin wrote:
> In an ideal world, we'd be able to choose between using a GC or using ARC when building our program. A compiler flag could do the trick. But that becomes messy when libraries (static and dynamic) get involved as they all have to agree on the same codegen to work together. Adding something to mangling that would cause link errors in case of mismatch might be good enough to prevent accidents though.

ObjC guys used to think that. It turns out it is a really bad idea.