May 12, 2014
On 5/12/2014 7:46 AM, Steven Schveighoffer wrote:
> It doesn't matter where the false pointers are. The largest issue with false
> pointers is not how many false pointers there are. It only matters how large the
> block is that it "points" at. The larger your blocks get, the more likely they
> are "pointed" at by the stack. On 32-bit systems, allocate 1/256th of your
> memory space (i.e. 16.5MB), and the likelihood of random data on the stack
> pointing at it is roughly 1/256. This problem is just about eliminated with
> 64-bit pointers.

Generally, it is a bad idea to allocate such large blocks on the GC heap. GC's work best when the size of the objects being allocated is very small relative to the size of the heap space.

Fortunately, it's a mathematical inevitability that large allocations relative to the GC size are rare, and so it isn't much of a pain to handle them manually.


> And in fact, even if it's forbidden, "requires" is too strong a word -- there is
> no static or runtime prevention of this.

It's still forbidden. Andrei wrote a template that will verify this at runtime, but I don't recall its name.

May 12, 2014
On 13 May 2014 03:44, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 5/12/2014 10:31 AM, Manu via Digitalmars-d wrote:
>>
>> I just searched through my code, and 7 out of 12 unions had pointers.
>
>
> Relative number of objects with unions, not declarations with unions!

Ah, well I have 3 different tree/graph structures with unions, and tree/graph nodes have a tendency to accumulate many instances.
May 12, 2014
On Monday, 12 May 2014 at 16:03:28 UTC, Manu via Digitalmars-d
wrote:
> How long is a collect liable to take in the event the GC threads need
> to collect? Am I likely to lose my service threads for 100s of
> milliseconds at a time?
>
> I'll think on it, but I don't think there's anything practically
> applicable here, and it really sounds like it creates a lot more
> trouble and complexity than it addresses.


Your concerns stem not as much from the speed concern of the GC,
but from the freeze-the-world aspect of it. Would a concurrent
collector not solve these issues? As
http://msdn.microsoft.com/en-us/library/ee787088%28v=vs.110%29.aspx#concurrent_garbage_collection
explains a little bit, the actual time your threads spend frozen
should be little (but I admit I don't know exactly how little),
and so long as you don't allocate too much during the collection
itself (which you say you don't), you should be able to keep
running your code during the collection. If it's not possible to
implement concurrent collection in D (and it's already been shown
it is possible), then I'd agree that ARC is very important. But
depending on how little the stop-the-world time from a concurrent
GC can get, perhaps this could work around some issues that
you're desiring ARC for. A generational collector could help in
theory with your high memory usage situations. I doubt you
allocate a gigabyte each frame, so the actual generation 0
content should be fairly low. Much of your memory usage should be
allocations that will not be freed for long periods of time,
while the per-frame and other short allocations should be fast to
collect as there aren't many of them.

Depending on how tunable the GC is, I feel like it should be
possible to get away with a GC even for soft real-time programs
like games. The problem is it's hard to tell until we get a
proper concurrent collector in D2, just like it's hard to tell
how significant the impact of ARC is until we get an optimized
implementation of it in the compiler. Neither of these is simple.
I do quite like the idea of ARC, it's just something that someone
would have to actually implement (well) in order to see how much
of an impact it really has in D. For the truly low frequency
situations, you could get away with a library type for ARC as
well, and as you mentioned, for high frequency you would get
around ARC regardless.
May 12, 2014
On 5/12/14, 10:25 AM, bearophile wrote:
> Walter Bright:
>
>> Unions of pointers are so rare in actual code that treating them
>> conservatively is not a big problem.
>
> std.variant.Algebraic is based on on std.variant.VariantN, and on
> std.variant.VariantN is based on an union, and often you use algebraic
> data types to represent trees and similar data structures that contain
> many references/pointers. Adding Adding an onGC() method to
> std.variant.VariantN you allow the GC to manage Algebraic well enough.

I, too, felt the need of onGC() - actually preGC() - in my allocators implementation.

Specifically, a thread-local freelist would save a pointer to the root in thread-local storage (i.e. a traditional D global variable). That would thread through a number of free nodes available for allocation.

When a GC cycle occurs, it's okay if the list stays referenced; the GC will consider it "used" and won't do anything in particular about it. However, the GC cycle is a good opportunity to clean these freelists and offer the memory for other size classes, seeing as the freelists may grow unreasonably large and then just hold memory for no good reason.

A hook that nulls all freelist heads just as the collection process starts would be helpful.


Andrei

May 12, 2014
On Monday, 12 May 2014 at 17:52:18 UTC, Walter Bright wrote:
> On 5/12/2014 7:46 AM, Steven Schveighoffer wrote:
>> pointing at it is roughly 1/256. This problem is just about eliminated with
>> 64-bit pointers.

Not generally true. This presumes that the heap is not in the lower region of the address space and that you don't use 64 bit ints on the stack.

> Generally, it is a bad idea to allocate such large blocks on the GC heap. GC's work best when the size of the objects being allocated is very small relative to the size of the heap space.

Generally not true. This is a deficiency of not having a smart allocator / precise scanning that use available meta information properly (obtained statically or by profiling).

> Fortunately, it's a mathematical inevitability that large allocations relative to the GC size are rare, and so it isn't much of a pain to handle them manually.

Programmer pain is not measured in number of instances, but in terms of model complexity.

Ola.
May 12, 2014
12-May-2014 22:08, Andrei Alexandrescu пишет:
> On 5/12/14, 10:25 AM, bearophile wrote:
> A hook that nulls all freelist heads just as the collection process
> starts would be helpful.

One word - weak pointers. Then head of freelist is weak and can be collected at whim.

>
>
> Andrei
>


-- 
Dmitry Olshansky
May 12, 2014
On 5/12/14, 11:17 AM, Dmitry Olshansky wrote:
> 12-May-2014 22:08, Andrei Alexandrescu пишет:
>> On 5/12/14, 10:25 AM, bearophile wrote:
>> A hook that nulls all freelist heads just as the collection process
>> starts would be helpful.
>
> One word - weak pointers. Then head of freelist is weak and can be
> collected at whim.

Of course. My point here is that here you need simpler support than full-blown weak pointers. -- Andrei

May 12, 2014
On 13 May 2014 04:07, Kapps via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Monday, 12 May 2014 at 16:03:28 UTC, Manu via Digitalmars-d wrote:
>>
>> How long is a collect liable to take in the event the GC threads need
>>
>> to collect? Am I likely to lose my service threads for 100s of milliseconds at a time?
>>
>> I'll think on it, but I don't think there's anything practically applicable here, and it really sounds like it creates a lot more trouble and complexity than it addresses.
>
>
>
> Your concerns stem not as much from the speed concern of the GC, but from the freeze-the-world aspect of it. Would a concurrent collector not solve these issues?

I originally thought it would... but the more I think on it, I don't
think it would make an awful lot of difference in practise.
If the stalls were 'short' (like 1-5ms on the background threads,
500µs-1ms on the realtime threads), then maybe it would be workable,
but I don't know that it would be even close to that?

Also, I think it would be very difficult to implement on a machine without virtual memory, or much of an operating system in general?

The problem remains that with no free memory, frequency of collection becomes so high, that it's extremely unlikely full collection so often would be better than ARC.

> As http://msdn.microsoft.com/en-us/library/ee787088%28v=vs.110%29.aspx#concurrent_garbage_collection
> explains a little bit, the actual time your threads spend frozen
> should be little (but I admit I don't know exactly how little),
> and so long as you don't allocate too much during the collection
> itself (which you say you don't), you should be able to keep
> running your code during the collection. If it's not possible to
> implement concurrent collection in D (and it's already been shown
> it is possible), then I'd agree that ARC is very important. But
> depending on how little the stop-the-world time from a concurrent
> GC can get, perhaps this could work around some issues that
> you're desiring ARC for. A generational collector could help in
> theory with your high memory usage situations. I doubt you
> allocate a gigabyte each frame, so the actual generation 0
> content should be fairly low. Much of your memory usage should be
> allocations that will not be freed for long periods of time,
> while the per-frame and other short allocations should be fast to
> collect as there aren't many of them.

Yeah, it would probably be better, if it's possible.
Implementation needs to be considered from the perspective of embedded
systems with no OS or MMU, and as little as 64mb of ram (the smallest
modern systems).
Mid-range systems are 512mb and no MMU. 'next-gen' systems are
basically like little PC's with crappy OS's, so more likely a decent
GC is possible on a ps4/xbone... but very few have the luxury of
developing for just one system.

It occurred to me earlier that things like strings might enjoy their
own separate heap. And maybe some special logic for strings that
outlived their scope to be actively returned to their heap rather than
waiting for collection.
If the heap were successfully broken down into a suite of sub-heaps, I
have absolutely no idea how to make estimates about the performance of
this system, and if it would approach an acceptable level. I'm
skeptical it would, and it still won't decrease collection frequency.
But I'd be happy to be surprised.

> Depending on how tunable the GC is, I feel like it should be possible to get away with a GC even for soft real-time programs like games. The problem is it's hard to tell until we get a proper concurrent collector in D2, just like it's hard to tell how significant the impact of ARC is until we get an optimized implementation of it in the compiler. Neither of these is simple. I do quite like the idea of ARC, it's just something that someone would have to actually implement (well) in order to see how much of an impact it really has in D.

I understand the problem. The first hurdle is overcoming the hostility against it though. There is a severe prejudice.

> For the truly low frequency
> situations, you could get away with a library type for ARC as
> well, and as you mentioned, for high frequency you would get
> around ARC regardless.

Yup.

May 12, 2014
On Monday, 12 May 2014 at 18:07:51 UTC, Kapps wrote:
> Depending on how tunable the GC is, I feel like it should be
> possible to get away with a GC even for soft real-time programs
> like games.

Even if you manage to make it work for game clients you also should make it work for low latency game servers, as code sharing is an important advantage.

What a game/world server requires differs a lot, but highly dynamic and flexible worlds have to keep the physics to a single node (or tight cluster) for a region. That means you want to have as many players as possible tied to that node.

In essence you want both performance, low latency, reliability, and little overhead in an evolutionary context (it has to support heavy modification over time).

My gut feeling is that a runtime satisfying one game design will not satisfy another one as long as one insists on one global GC. In essence, it will never really work well. IMO, the same goes for ARC since RC does not perform well with multi-threading even when you use near optimal patterns and strategies. If ARC is only to be used where speed does not matter then you might as well use shared_ptr.
May 12, 2014
On 2014-05-12 19:14, Dicebot wrote:

> It lacks any good static reflection though. And this stuff is damn
> addictive when you try it of D caliber.

It has macros, that basically requires great support for static reflection to be usable.

-- 
/Jacob Carlborg