January 09, 2014
On 1/9/2014 12:40 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com>" wrote:
> On Thursday, 9 January 2014 at 07:07:29 UTC, Walter Bright wrote:
>> and it works without copying in D, it just returns s1. In C, I gotta copy,
>> ALWAYS.
>
> Only if you write libraries, in an application you can set your own policies
> (invariants).

Please explain how this can work passing both string literals and allocated strings to cat().


>> (C's strings being 0 terminated also forces much extra copying, but that's
>> another topic.)
>
> Not if you have your own allocator and split chopped strings (you can just
> overwrite the boundary character).

How do you return a string that is the path part of a path/filename? (The terminating 0 is not a problem solved by creating your own allocator.)

January 09, 2014
Am 09.01.2014 08:07, schrieb Walter Bright:
>
> The point is, no matter how slow the GC is relative to malloc, not
> allocating is faster than allocating, and a GC can greatly reduce the
> amount of alloc/copy going on.
>

The points should be, if D is going to stay with a GC, and if so, when we will actually get propper GC support so a state of the art GC can be implemented. Or if we are going to replace the GC with ARC.

This is a really important topic which shouldn't wait until the language is 20 years old. I'm already using D since almost 3 years, and the more I learn about Garbage Collectors and about D, the more obvious becomes that D does not properly support garbage collection and it will require quite some effort and spec changes to do so. And in all the time I used D nothing changed about the garbage collector. The only thing that happend was the RtInfo template in object.d. But it still isn't used and only solves a small portion of the percise scanning problem. In my opinion D was designed with language features in mind that need a GC, but D was not designed to actually support a GC. And this needs to change.

If requested I can make a list with all language features / decisions so far that prevent the implementation of a state of the art GC.

-- 
Kind Regards
Benjamin Thaut
January 09, 2014
On Thursday, 9 January 2014 at 10:14:08 UTC, Benjamin Thaut wrote:
> If requested I can make a list with all language features / decisions so far that prevent the implementation of a state of the art GC.

At least I am interested in your observations.
January 09, 2014
On Thursday, 9 January 2014 at 09:58:24 UTC, Walter Bright wrote:
> Please explain how this can work passing both string literals and allocated strings to cat().

By having your own string allocator that tests for membership when you free (if you allow free and foreign strings in your cat)?

> How do you return a string that is the path part of a path/filename? (The terminating 0 is not a problem solved by creating your own allocator.)

If you discard the original you split at '/'. If you use your own stringallocator you don't have to worry about free... You either let the garbage remain until the pool is released or have a separate allocation structure that allows internal splits (no private size info before first char).




January 09, 2014
On Thursday, 9 January 2014 at 09:55:42 UTC, Walter Bright wrote:
> A GC does not prevent such techniques.

No, but programmers gravitate towards less work... If alloc is transparent and free is hidden... You gain a lot from not being explicit, but you get more allocations overall.

January 09, 2014
On Thursday, 9 January 2014 at 10:14:08 UTC, Benjamin Thaut wrote:
> If requested I can make a list with all language features / decisions so far that prevent the implementation of a state of the art GC.

I am also interested in this, so that I can avoid those constructs.

I am in general in agreement with you. I think regular ownership combined with a segmented GC that only scan pointers to a signified GC type would not be such a big deal and could be a real bonus. With whole program analysis you could then reject a lot of the branches you otherwise have to follow and you would not have to stop threads that cannot touch those GC types. Of course, you would then avoid using generic pointers. So, you might not need an advanced GC, just partition the GC scan better.

Scanning stacks could be really fast if you know the call order of stack frames (and you have that opportunity with whole program analysis): e.g.: top frame is a(), but only b() and c() can call a() and b() and c() have same stack frame size and cannot hold pointers to GC object => skip over a() and b/c() in one go.

It doesn't matter much if the GC takes even 20% of your efficiency away, as long as it doesn't lock you down for more than 1-2 milliseconds: that's <4 million cycles for a single core. If you need 25 cycles per pointer you can scan <80.000 pointers per core. So if the search space can be partitioned in a way that makes that possible by not following all pointers, then the GC would be fine. 100.000 cache lines = 3.2MB which is not too horrible either.

I'd rather have 1000% less efficiency in the GC by having frequent GC calls than 400% more latency less frequently.
January 09, 2014
On Thursday, 9 January 2014 at 13:44:10 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 9 January 2014 at 10:14:08 UTC, Benjamin Thaut wrote:
>> If requested I can make a list with all language features / decisions so far that prevent the implementation of a state of the art GC.
>
> I am also interested in this, so that I can avoid those constructs.
>
> I am in general in agreement with you. I think regular ownership combined with a segmented GC that only scan pointers to a signified GC type would not be such a big deal and could be a real bonus. With whole program analysis you could then reject a lot of the branches you otherwise have to follow and you would not have to stop threads that cannot touch those GC types. Of course, you would then avoid using generic pointers. So, you might not need an advanced GC, just partition the GC scan better.
>
> Scanning stacks could be really fast if you know the call order of stack frames (and you have that opportunity with whole program analysis): e.g.: top frame is a(), but only b() and c() can call a() and b() and c() have same stack frame size and cannot hold pointers to GC object => skip over a() and b/c() in one go.
>
> It doesn't matter much if the GC takes even 20% of your efficiency away, as long as it doesn't lock you down for more than 1-2 milliseconds: that's <4 million cycles for a single core. If you need 25 cycles per pointer you can scan <80.000 pointers per core. So if the search space can be partitioned in a way that makes that possible by not following all pointers, then the GC would be fine. 100.000 cache lines = 3.2MB which is not too horrible either.
>
> I'd rather have 1000% less efficiency in the GC by having frequent GC calls than 400% more latency less frequently.

That could possibly be achieved with a generational parallel GC.


--
Paulo
January 09, 2014
On Thursday, 9 January 2014 at 13:51:09 UTC, Paulo Pinto wrote:
> That could possibly be achieved with a generational parallel GC.

Isn't the basic assumption in a generational GC that most free'd objects has a short life span and happened since the last collection?  Was there some assumption about the majority of inter-object pointers being within the same generation, too? So that you partition the objects in "train carts" and only have few pointers going between carts? I haven't looked at the original paper in a long time...

Anyway, if that is the assumption then it is generally not true for programs that are written for real time. Temporary objects are then allocated in pools or on the stack. Objects that are free'd tend to come from timers, events or because they have a lifespan (like enemies in a computer game).

I also dislike the idea of the GC locking cores down when it doesn't have to, so I don't think parallel is particularly useful. It will just put more pressure on the memory bus. I think it is sufficient to have a simple GC that only scans disjoint subsets (for that kind of application), so yes partitioned by type, or better: by reachability, but not by generation.

If the GC behaviour is predictable then the application can be designed to not trigger bad behaviour from the get go.
January 09, 2014
And, if it isn't in D already I would very much like to have a weak pointer type that will be set to null if the object is only pointed to by weak pointers.

It is a PITA to have objects die and get them out of a bunch of event-queues etc.
January 09, 2014
On Thursday, 9 January 2014 at 14:19:41 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 9 January 2014 at 13:51:09 UTC, Paulo Pinto wrote:
>> That could possibly be achieved with a generational parallel GC.
>
> Isn't the basic assumption in a generational GC that most free'd objects has a short life span and happened since the last collection?  Was there some assumption about the majority of inter-object pointers being within the same generation, too? So that you partition the objects in "train carts" and only have few pointers going between carts? I haven't looked at the original paper in a long time...

That was just a suggestion. There are plenty of incremental GC algorithms to choose from.

>
> Anyway, if that is the assumption then it is generally not true for programs that are written for real time. Temporary objects are then allocated in pools or on the stack. Objects that are free'd tend to come from timers, events or because they have a lifespan (like enemies in a computer game).

There are real time GCs controlling missile tracking systems.

Personally I find them a bit more real time than computer games.

On a game you might miss a few rendering frames, a GC induced
delay on a missile tracking system might turn out a bit ugly.

>
> I also dislike the idea of the GC locking cores down when it doesn't have to, so I don't think parallel is particularly useful. It will just put more pressure on the memory bus. I think it is sufficient to have a simple GC that only scans disjoint subsets (for that kind of application), so yes partitioned by type, or better: by reachability, but not by generation.
>
> If the GC behaviour is predictable then the application can be designed to not trigger bad behaviour from the get go.


Sure, the GC usage should not hinder the application's performance.

However, unless you target systems without an OS, you'll have anyway the OS making whatever it wants with the existing cores.

I never saw much control besides setting affinities.

--
Paulo