May 14, 2014
On Tuesday, 13 May 2014 at 17:53:10 UTC, Marc Schütz wrote:
> Currently it isn't, because the GC sometimes lacks type information, e.g. for dynamic arrays.

Will RC be guaranteed to always have type information? If it can, why can't the GC? If it can't, what's the difference?

On Tuesday, 13 May 2014 at 18:07:42 UTC, Marc Schütz wrote:
> It's not (memory) unsafe because you cannot delete live objects accidentally, but it's "unsafe" because it leaks resources. Imagine a file object that relies on the destructor closing the file descriptor. You will quickly run out of FDs...

It's the same situation in .NET, where the GC doesn't guarantee that finalizers of arbitrary classes are called in all scenarios. They have to be special classes like SafeHandle, and resource handles are usually implemented by deriving from SafeHandle. Is it constructive to require the D GC to be better than the .NET GC?
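
To make the file descriptor example concrete, here is a minimal sketch in Python, used only as neutral notation. `FdTable`, `ManagedFile` and `open_many` are hypothetical names invented for this illustration; they are neither D nor .NET APIs. The sketch assumes a tiny descriptor limit and compares eager, scoped release with handles that are never released because something still keeps them reachable.

```python
class FdTable:
    """Toy stand-in for a process's limited file-descriptor table."""

    def __init__(self, limit):
        self.limit = limit
        self.open_fds = set()

    def acquire(self):
        if len(self.open_fds) >= self.limit:
            raise OSError("out of file descriptors")
        # Hand out the smallest descriptor number that is currently free.
        fd = next(n for n in range(self.limit) if n not in self.open_fds)
        self.open_fds.add(fd)
        return fd

    def release(self, fd):
        self.open_fds.discard(fd)


class ManagedFile:
    """Hypothetical file wrapper that supports eager, scoped release."""

    def __init__(self, table):
        self.table = table
        self.fd = table.acquire()

    def close(self):
        if self.fd is not None:
            self.table.release(self.fd)
            self.fd = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


def open_many(table, count, eager):
    """Try to open `count` files; return how many opens succeeded."""
    leaked = []  # keeps unclosed wrappers reachable, so they are never released
    opened = 0
    for _ in range(count):
        try:
            if eager:
                with ManagedFile(table):  # descriptor released at end of scope
                    opened += 1
            else:
                leaked.append(ManagedFile(table))
                opened += 1
        except OSError:
            break
    return opened


print(open_many(FdTable(4), 100, eager=True))
print(open_many(FdTable(4), 100, eager=False))
```

With eager release the loop never holds more than one descriptor at a time. Without it, the table fills up after `limit` opens, no matter how much memory is still free.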
May 14, 2014
On Wednesday, 14 May 2014 at 06:44:44 UTC, Kagamin wrote:
> On Tuesday, 13 May 2014 at 17:53:10 UTC, Marc Schütz wrote:
>> Currently it isn't, because the GC sometimes lacks type information, e.g. for dynamic arrays.
>
> Will RC be guaranteed to always have type information? If it can, why can't the GC? If it can't, what's the difference?
>

RC is done by the object itself, so by definition it knows its own type, while the GC needs to be told about the type on allocation. AFAIK there is ongoing work to make this information available for non-class types.

> On Tuesday, 13 May 2014 at 18:07:42 UTC, Marc Schütz wrote:
>> It's not (memory) unsafe because you cannot delete live objects accidentally, but it's "unsafe" because it leaks resources. Imagine a file object that relies on the destructor closing the file descriptor. You will quickly run out of FDs...
>
> It's the same situation in .NET, where the GC doesn't guarantee that finalizers of arbitrary classes are called in all scenarios. They have to be special classes like SafeHandle, and resource handles are usually implemented by deriving from SafeHandle. Is it constructive to require the D GC to be better than the .NET GC?

Well, it cannot be made 100% reliable in principle. That's just an inherent property of tracing GCs. The question is, can we define which uses of destructors are "safe" in this sense and which ones are not, and ideally find ways to detect unsafe uses at compile time... That's very much in the spirit of D: Something that looks right should be right. If it is not, it should be rejected by the compiler.
May 14, 2014
On Wednesday, 14 May 2014 at 09:39:01 UTC, Marc Schütz wrote:
> Well, it cannot be made 100% reliable in principle. That's just an inherent property of tracing GCs.

I don't think this is true. Why is this an inherent property of tracing GCs?
May 14, 2014
On Wednesday, 14 May 2014 at 10:00:29 UTC, Ola Fosheim Grøstad wrote:
> On Wednesday, 14 May 2014 at 09:39:01 UTC, Marc Schütz wrote:
>> Well, it cannot be made 100% reliable in principle. That's just an inherent property of tracing GCs.
>
> I don't think this is true. Why is this an inherent property of tracing GCs?

You're right, theoretically it's possible. I was only considering the situation with D:

- We have external code programmed in languages other than D, most prominently C and C++. These don't provide any type information, therefore the GC needs to handle their memory conservatively, which means there can be false pointers => no deterministic destruction.

- Variables on the stack and in registers. In theory, the compiler could generate that information, or existing debug information might be used, but that's complicated for the GC to handle and will probably have runtime costs. I guess it's unlikely to happen. And of course, when we call a C function, we're lost again.

- Untagged unions. The GC has no way to figure out which of the union fields is currently valid. If any of them is a pointer, it needs to treat them conservatively.

There are probably other things...
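
The false pointer problem from the first two points can be sketched with a toy model. This is a simplified simulation in Python, not D's actual collector: the heap layout, the addresses and the name `conservative_mark` are all assumptions made up for the illustration. Every word that happens to equal a heap address is treated as a pointer, because without type information an integer cannot be told apart from a reference.

```python
def conservative_mark(roots, heap):
    """Return the set of heap addresses a conservative scan considers live.

    `roots` is a list of raw machine words; `heap` maps an address to the
    list of words stored in that object.
    """
    live = set()
    worklist = [w for w in roots if w in heap]
    while worklist:
        addr = worklist.pop()
        if addr in live:
            continue
        live.add(addr)
        # Scan the object's own words in the same conservative way.
        worklist.extend(w for w in heap[addr] if w in heap)
    return live


# Hypothetical heap: address -> words stored in the object at that address.
heap = {
    0x1000: [0x2000],  # object A holds a real pointer to B
    0x2000: [42],      # object B holds plain data
    0x3000: [7],       # object C is garbage: no real pointer refers to it
}

# Words found on an untyped stack (for example C frames): one real pointer
# and one integer that merely has the same value as the address of C.
stack_words = [0x1000, 0x3000]

live = conservative_mark(stack_words, heap)
finalized = sorted(set(heap) - live)
print(finalized)
```

Object C is unreachable from the program's point of view, but the stray integer keeps it marked, so its destructor would not run in this cycle, and possibly never.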
May 14, 2014
On Wednesday, 14 May 2014 at 19:45:20 UTC, Marc Schütz wrote:
> - We have external code programmed in languages other than D, most prominently C and C++. These don't provide any type information, therefore the GC needs to handle their memory conservatively, which means there can be false pointers => no deterministic destruction.

Oh yes, I agree.

However, you could have rules for collection and FFI (calling C). Like only allowing collection if all C parameters that point to GC memory have a shorter life span than other D pointers to the same memory (kind of like borrowed pointers in Rust).

> - Variables on the stack and in registers. In theory, the compiler could generate that information, or existing debug information might be used, but that's complicated for the GC to handle and will probably have runtime costs.

The easy solution is to define safe zones where you can freeze (kind of like rendezvous semaphores, but not quite).

> - Untagged unions. The GC has no way to figure out which of the union fields is currently valid. If any of them is a pointer, it needs to treat them conservatively.

So you need a function that can help the GC if the pointer fields of the union don't match up or don't point to class instances.

Ola.
May 15, 2014
On Wednesday, 14 May 2014 at 09:39:01 UTC, Marc Schütz wrote:
> RC is done by the object itself, so by definition it knows its own type, while the GC needs to be told about the type on allocation. AFAIK there is ongoing work to make this information available for non-class types.

If you can unify RC on the binary level for any type, the GC can use that unification too: when you allocate the object, you have its type and can set up the structures needed to call the destructor.
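
A rough sketch of that idea, as a toy model in Python rather than anything resembling the D runtime: `ToyHeap`, `allocate`, `sweep` and `close_handle` are hypothetical names, and the assumption is simply that the allocator can store a per-block finalizer whenever the type is known at allocation time.

```python
class ToyHeap:
    """Toy heap that records an optional finalizer per block at allocation."""

    def __init__(self):
        self._blocks = {}        # address -> (payload, finalizer or None)
        self._next_addr = 0x1000

    def allocate(self, payload, finalizer=None):
        addr = self._next_addr
        self._next_addr += 0x10
        self._blocks[addr] = (payload, finalizer)
        return addr

    def sweep(self, live):
        """Free every block not in `live`; run its finalizer if one is known."""
        for addr in sorted(self._blocks):
            if addr in live:
                continue
            payload, finalizer = self._blocks.pop(addr)
            if finalizer is not None:
                finalizer(payload)


closed = []

def close_handle(payload):
    # Stand-in for a destructor that releases an external resource.
    closed.append(payload)

heap = ToyHeap()
typed = heap.allocate("handle-1", finalizer=close_handle)  # type known
untyped = heap.allocate("handle-2")                        # type info missing
heap.sweep(live=set())
print(closed)
```

Both blocks are freed, but only the one whose type was known at allocation gets its destructor called. The other one silently loses its cleanup.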

> Well, it cannot be made 100% reliable in principle. That's just an inherent property of tracing GCs. The question is, can we define which uses of destructors are "safe" in this sense and which ones are not, and ideally find ways to detect unsafe uses at compile time... That's very much in the spirit of D: Something that looks right should be right. If it is not, it should be rejected by the compiler.

Does this suggest that if you slip a type with destructor into your code, it will force everything to be refcounted?
May 15, 2014
On Wednesday, 14 May 2014 at 19:45:20 UTC, Marc Schütz wrote:
> - We have external code programmed in languages other than D, most prominently C and C++. These don't provide any type information, therefore the GC needs to handle their memory conservatively, which means there can be false pointers => no deterministic destruction.

It's a very rare scenario for GC memory to escape into foreign opaque data structures. Usually you don't see where your pointer goes, as a foreign API is usually completely opaque, so you have nothing to scan, even if you have a precise GC. Sometimes a C API will notify your code when it releases your data; in other cases you can store your data in managed memory and release it after you release the foreign data structure.

> - Variables on the stack and in registers. In theory, the compiler could generate that information, or existing debug information might be used, but that's complicated for the GC to handle and will probably have runtime costs. I guess it's unlikely to happen. And of course, when we call a C function, we're lost again.

A precise GC is needed to implement a moving GC; it's not needed to implement good memory management, at least on 64-bit architectures. On 32-bit architectures false pointers are possible when you have lots of data without pointers. This could be treated simply by allocating data without pointers (like strings) in blocks that are not scanned. The more valid pointers you have in your data, the smaller the probability of false pointers. The smaller the handle wrapper, the smaller the probability of a false pointer holding it. If you manage resources well, the probability of a handle leak gets even smaller (in C# it doesn't pose any notable difficulty even though you don't have any mechanism of eager resource management at all, only the GC; in D you have a non-zero opportunity for eager resource management). All these small probabilities multiply, and you get an even smaller probability of an eventual resource leak.
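
As a back-of-the-envelope estimate, and under the strong assumption that scanned non-pointer words are uniformly random (real data is not), the chance that at least one of N words lands inside a given object of B bytes is 1 - (1 - B / 2^bits)^N. The function name and the sample numbers below are made up for illustration.

```python
import math


def false_pointer_probability(target_bytes, address_bits, scanned_words):
    """Chance that at least one random word points into the target object.

    Assumes every scanned word is an independent, uniformly random value in
    the address space, which is only a rough model.
    """
    p_single = target_bytes / 2 ** address_bits
    # 1 - (1 - p)^N, written with log1p/expm1 so tiny p does not round to 0.
    return -math.expm1(scanned_words * math.log1p(-p_single))


# A 64-byte handle wrapper and one million scanned non-pointer words.
p32 = false_pointer_probability(64, 32, 10**6)
p64 = false_pointer_probability(64, 64, 10**6)
print(p32, p64)
```

In this model the 32-bit case is on the order of a percent, while the 64-bit case is many orders of magnitude smaller but still not zero.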
May 15, 2014
On Thursday, 15 May 2014 at 07:27:41 UTC, Kagamin wrote:
> On Wednesday, 14 May 2014 at 09:39:01 UTC, Marc Schütz wrote:
>> RC is done by the object itself, so by definition it knows its own type, while the GC needs to be told about the type on allocation. AFAIK there is ongoing work to make this information available for non-class types.
>
> If you can unify RC on the binary level for any type, the GC can use that unification too: when you allocate the object, you have its type and can set up the structures needed to call the destructor.

Exactly.

>
>> Well, it cannot be made 100% reliable in principle. That's just an inherent property of tracing GCs. The question is, can we define which uses of destructors are "safe" in this sense and which ones are not, and ideally find ways to detect unsafe uses at compile time... That's very much in the spirit of D: Something that looks right should be right. If it is not, it should be rejected by the compiler.
>
> Does this suggest that if you slip a type with destructor into your code, it will force everything to be refcounted?

Hmm... that's probably too strict. There are often non-critical resources that need to be released on destruction, like a hypothetical String class which owns its data and which is itself allocated on the GC heap, because we don't need eager destruction for it. We'd want the data buffer to be released as soon as the String object is destroyed. This buffer might even be allocated on the C heap, so we cannot rely on the garbage collector to clean it up later. Is that a job for a finalizer?
May 15, 2014
On Thursday, 15 May 2014 at 12:05:27 UTC, Kagamin wrote:
> On Wednesday, 14 May 2014 at 19:45:20 UTC, Marc Schütz wrote:
>> - We have external code programmed in languages other than D, most prominently C and C++. These don't provide any type information, therefore the GC needs to handle their memory conservatively, which means there can be false pointers => no deterministic destruction.
>
> It's a very rare scenario for GC memory to escape into foreign opaque data structures. Usually you don't see where your pointer goes, as a foreign API is usually completely opaque, so you have nothing to scan, even if you have a precise GC. Sometimes a C API will notify your code when it releases your data; in other cases you can store your data in managed memory and release it after you release the foreign data structure.
>

Fair point. But can this be made safer? Currently you don't get any warning if a GC pointer escapes into C land.

>> - Variables on the stack and in registers. In theory, the compiler could generate that information, or existing debug information might be used, but that's complicated for the GC to handle and will probably have runtime costs. I guess it's unlikely to happen. And of course, when we call a C function, we're lost again.
>
> A precise GC is needed to implement a moving GC; it's not needed to implement good memory management, at least on 64-bit architectures. On 32-bit architectures false pointers are possible when you have lots of data without pointers. This could be treated simply by allocating data without pointers (like strings) in blocks that are not scanned. The more valid pointers you have in your data, the smaller the probability of false pointers. The smaller the handle wrapper, the smaller the probability of a false pointer holding it. If you manage resources well, the probability of a handle leak gets even smaller (in C# it doesn't pose any notable difficulty even though you don't have any mechanism of eager resource management at all, only the GC; in D you have a non-zero opportunity for eager resource management). All these small probabilities multiply, and you get an even smaller probability of an eventual resource leak.

But as long as there can be false pointers, no matter how improbable, there can be no guaranteed destruction, which was my point. Maybe it becomes acceptable at very low probabilities, but it's still a gamble...
May 15, 2014
On Wednesday, 14 May 2014 at 20:02:08 UTC, Ola Fosheim Grøstad wrote:
> On Wednesday, 14 May 2014 at 19:45:20 UTC, Marc Schütz wrote:
>> - We have external code programmed in languages other than D, most prominently C and C++. These don't provide any type information, therefore the GC needs to handle their memory conservatively, which means there can be false pointers => no deterministic destruction.
>
> Oh yes, I agree.
>
> However, you could have rules for collection and FFI (calling C). Like only allowing collection if all C parameters that point to GC memory have a shorter life span than other D pointers to the same memory (kind of like borrowed pointers in Rust).

Some kind of lifetime annotation would be required for this. Not that this is a bad idea, but it will require some work...

>
>> - Variables on the stack and in registers. In theory, the compiler could generate that information, or existing debug information might be used, but that's complicated for the GC to handle and will probably have runtime costs.
>
> The easy solution is to define safe zones where you can freeze (kind of like rendezvous semaphores, but not quite).

This helps with getting the registers on the stack, but we still need type information for them.

>
>> - Untagged unions. The GC has no way to figure out which of the union fields is currently valid. If any of them is a pointer, it needs to treat them conservatively.
>
> So you need a function that can help the GC if the pointer fields of the union don't match up or don't point to class instances.

Which of course requires type information. And existing unions need to be updated to implement this function. I guess sometimes it might not even be possible to implement it, because the state information is not present in the union itself.
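
A small sketch of what such a helper could look like, again as a toy model in Python with invented names (`TaggedUnion`, `pointers_conservative`, `pointers_precise`). It assumes the easy case where a discriminating tag is stored right next to the union; when the state lives somewhere else, a helper with this signature has nothing to go on.

```python
from dataclasses import dataclass


@dataclass
class TaggedUnion:
    tag: str   # "ptr" or "int"; assumption: the tag is stored beside the union
    word: int  # the shared storage, either an address or a plain integer


def pointers_conservative(u, heap_addrs):
    """Without help, any word that matches a heap address counts as a pointer."""
    return [u.word] if u.word in heap_addrs else []


def pointers_precise(u, heap_addrs):
    """Hypothetical per-type hook that tells the GC which field is active."""
    return [u.word] if u.tag == "ptr" and u.word in heap_addrs else []


heap_addrs = {0x3000}
as_int = TaggedUnion(tag="int", word=0x3000)  # an integer that looks like a pointer
as_ptr = TaggedUnion(tag="ptr", word=0x3000)  # a genuine reference

print(pointers_conservative(as_int, heap_addrs))
print(pointers_precise(as_int, heap_addrs))
print(pointers_precise(as_ptr, heap_addrs))
```

The precise variant only works because it can read `tag`. That is exactly the state information that may be missing from an existing union.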