Thread overview
D may disappoint in the presence of an alien Garbage Collector?
Jul 28, 2014
Carl Sturtivant
Jul 28, 2014
Anton
Jul 30, 2014
Carl Sturtivant
Jul 28, 2014
Rene Zwanenburg
Jul 30, 2014
Carl Sturtivant
Jul 29, 2014
Kagamin
Jul 29, 2014
Kagamin
July 28, 2014
Suppose I want to use D as a system programming language to work with a library of functions written in another language, operating on dynamically typed data that has its own garbage collector, such as an algebra system or the virtual machine of a dynamically typed scripting language viewed as a library of operations on its own data type. For concreteness, suppose the library is written in C. (More generally, the data need not be restricted to the kind above, but for concreteness, make that supposition.)

Data in such a system is usually a (possibly elaborate) tagged union, that is essentially a struct consisting of (say) two words, the first indicating the type and perhaps containing some bits that indicate other attributes, and the second containing the data, which may be held directly or indirectly. Call this a Descriptor.
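
To make the two-word shape above concrete, here is a minimal sketch of such a tagged union in D. The tag values, the `pointerFlag` bit, and the member names are assumptions made up for illustration; an actual library would define its own layout.

```d
// Hypothetical tag values; a real library defines its own.
enum Tag : size_t { integer = 1, text = 2 }

// Assumed attribute bit: set when the payload points into the library's heap.
enum size_t pointerFlag = cast(size_t) 1 << (size_t.sizeof * 8 - 1);

struct DescriptorImpl
{
    size_t type;              // first word: type tag plus attribute bits
    union
    {
        ptrdiff_t integer;    // second word: payload held directly
        void*     data;       // second word: payload held indirectly
    }

    // True when the second word should be treated as a pointer.
    bool holdsPointer() const
    {
        return (type & pointerFlag) != 0;
    }
}

// The whole Descriptor occupies exactly two machine words.
static assert(DescriptorImpl.sizeof == 2 * size_t.sizeof);
```

Only descriptors for which `holdsPointer` is true would be of interest to the library's collector in this sketch; the others carry their value inline.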

Descriptors are small, so it's natural to want them held by value and not allocated on the heap (either D's or the library's) unless they are a part of a bigger structure that naturally resides there. And it's natural to want them to behave like values when passed as parameters or assigned. This usually fits in with the sort of heterogeneous copy semantics of such a library, where some of the dynamic types are implicitly reference types and others are not.

The trouble is that the library's alien GC needs to be made aware of each Descriptor when it appears and when it disappears, so that a call of a library function that allocates storage doesn't trigger a garbage collection that vacuums up library allocated storage that a D Descriptor points to, or fails to adjust a pointer inside a D descriptor when it moves the corresponding data, or worse, follows a garbage pointer from an invalid D Descriptor that's gone out of scope. This requirement applies to local variables, parameters and temporaries, as well as to other situations, like D arrays of Descriptors that are D-heap allocated. Ignore the latter kind of occasion for now.

Abstract the process of informing the GC that a Descriptor exists as a Protect operation, and the process of informing it that the Descriptor is about to go out of scope as an Unprotect operation. Protect and Unprotect naturally need the address of the storage holding the relevant Descriptor.

In a nutshell, the natural requirement when interfacing to such a library is to add Descriptor as a new value type in D along the lines described above, with a definition such that Protect and Unprotect operations are compiled to be performed automatically at the appropriate junctures so that the user of the library can forget about garbage collection to the usual extent.
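
As a rough illustration of what is being asked for, the sketch below wraps the two-word struct in a D value type whose constructor, postblit, and destructor call Protect and Unprotect with the address of the storage. Everything named here is an assumption: `protect`, `unprotect`, and the `roots` list are hypothetical stand-ins for whatever registration calls the library really exposes. The sketch covers only construction, copying, and scope exit. D is allowed to move struct values bitwise (for instance during assignment or when returning from a function) without running the postblit, so an address-keyed registration like this one can go stale in cases the sketch does not handle.

```d
// Assumed two-word layout: a type tag followed by the payload.
struct DescriptorImpl
{
    size_t type;
    void*  data;
}

// Hypothetical stand-in for the alien GC's root set. A real library would
// expose its own extern(C) registration functions instead.
DescriptorImpl*[] roots;

// Protect: tell the (mock) GC that a Descriptor lives at address p.
void protect(DescriptorImpl* p)
{
    roots ~= p;
}

// Unprotect: tell the (mock) GC that the Descriptor at p is going away.
void unprotect(DescriptorImpl* p)
{
    foreach (i, r; roots)
    {
        if (r is p)
        {
            roots[i] = roots[$ - 1];   // overwrite with the last root
            roots = roots[0 .. $ - 1]; // and shrink the list by one
            return;
        }
    }
    assert(false, "unprotect called for an unregistered address");
}

// Value type stored wherever it is declared, e.g. on the stack.
struct Descriptor
{
    DescriptorImpl impl;

    @disable this();   // a Descriptor must always be registered on creation

    this(DescriptorImpl d)
    {
        impl = d;
        protect(&impl);
    }

    this(this)         // postblit: runs on the new copy after the bitwise copy
    {
        protect(&impl);
    }

    ~this()
    {
        unprotect(&impl);
    }
}
```

With these definitions, declaring `auto a = Descriptor(DescriptorImpl(1, null));` registers one root, `auto b = a;` registers a second, and both are unregistered when their scopes end.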

How can this requirement be fulfilled?
July 28, 2014
On Monday, 28 July 2014 at 19:57:38 UTC, Carl Sturtivant wrote:
> How can this requirement be fulfilled?

Suppose I want to do system programming... would I choose the option with a GC?
Just give it up. The GC is nothing but a nuisance. People are smart enough to manage memory themselves.
July 28, 2014
On Monday, 28 July 2014 at 19:57:38 UTC, Carl Sturtivant wrote:
> How can this requirement be fulfilled?

If I understand you correctly, an easy way is to use RefCounted with a simple wrapper. Something like this:

import std.typecons : RefCounted;

// Descriptor defined by the external library
struct DescriptorImpl
{
  size_t type;
  void* data;
}

// Tiny wrapper telling the alien GC of the existence of this reference
private struct DescriptorWrapper
{
  DescriptorImpl descriptor;
  alias descriptor this;

  @disable this();

  this(DescriptorImpl desc)
  {
    // Keep the library's descriptor inside the wrapper
    descriptor = desc;
    // Make alien GC aware of this reference
  }

  ~this()
  {
    // Make alien GC aware this reference is no longer valid
  }
}

// This is the type you will be working with on the D side
alias Descriptor = RefCounted!DescriptorWrapper;
July 29, 2014
Registering a descriptor with a moving GC is not enough; you should also fix (pin) the pointer so that the data it refers to is not moved.
July 29, 2014
A better way would be to interact through a COM interface, which would abstract away the tricks of the library code. Advanced environments are usually able to generate such an interface.
July 30, 2014
On Monday, 28 July 2014 at 21:33:54 UTC, Rene Zwanenburg wrote:
>
> If I understand you correctly, an easy way is to use RefCounted with a simple wrapper. Something like this:
> alias Descriptor = RefCounted!DescriptorWrapper;

I just read the RefCounted definition here,
http://dlang.org/phobos/std_typecons.html#.RefCounted
and it heap-allocates its object, so your response above does not stack-allocate the basic type that you call DescriptorWrapper, and is not a solution to the problem as stated.

If there was no alien GC, but everything else was the same, heap allocation of something containing a DescriptorImpl would be unnecessary. Now achieve the same with the alien GC present without an extra layer of indirection and heap allocation --- this is the essence of my question.

July 30, 2014
On Monday, 28 July 2014 at 20:52:01 UTC, Anton wrote:
> On Monday, 28 July 2014 at 19:57:38 UTC, Carl Sturtivant wrote:
>> How can this requirement be fulfilled?
>
> Suppose I want to do system programming... would I choose the option with a GC?
> Just give it up. The GC is nothing but a nuisance. People are smart enough to manage memory themselves.

It's the library to interface to that has its own GC, not my code. I just need to use D's system programming capabilities to work around the library's nasty GC so my data used by my calls to that library isn't trashed, and to do that efficiently and transparently. A system programming language should be able to efficiently interface to anything, right?