Thread overview
abi specs, multiple linkages, binary symbol information
Oct 18, 2004 - Jakob Praher
Oct 19, 2004 - David Friedman
Oct 19, 2004 - Jakob Praher
Oct 20, 2004 - David Friedman
October 18, 2004
hi David,
hi all,

I like the D language. Since I also play with GCJ (the static GCC Java
compiler), which has a new ABI (in addition to the C++ linkage), I was
wondering about the default D ABI:

* how classes/modules/functions/methods are mangled
* which type codes exist
* is there a way to describe any type using a type code (which is
probably needed for method overloading )

* is there a way to specify versioning in the ABI
* since D has its own linkage (as opposed to C++ linkage) I would
appreciate a less-is-more approach and a more stable ABI like that of C++


Yes, I looked at DMD, but I would be pleased to know whether there is
a written spec (the language reference is quite quiet about that).

What would be interesting is to support many different types of
ABIs/Linkages.
This could be done by "helping" the compiler to understand the ABI that
one is using.

And: as the language is specified today, is there a way to do
load-time linking?

I would be interested to link GCJ shared objects against D in a very
native form, so that one could use for instance the many java libs
already developed.

for instance

@gcj import org.apache.xalan...TransformerImpl;
@gcj import java.lang.String;

int main( char[][] args ) {
	TransformerImpl impl = new TransformerImpl( );
	....

}
....


Plus: in order to export a D class so that it can be linked with GCJ,
for instance, one clearly needs more meta information, either exposed
in the object file or distilled from the D sources.

I'd favor the first approach, since one could then link against a D
object file without needing the corresponding D source code.

The metadata approach used by GCJ is very straightforward:

* There is a UTF8 table
* There is a Method table for each method of a class
* There are some other tables for Class Descriptors ...
* The method table also contains all the referenced methods (not only
the ones defined)
* There is a Class table for each class (which contains links to the
other tables)

	- vtable  (the class's methods)
these tables are used for the Java binary compatibility stuff:
	- otable  (offset table for objects referenced by an offset)
	- atable  (address table for objects referenced via address)
	- itable  (interface table)


Surely the simplicity of the Java type system allows for a simple
implementation of that. D would need some more meta information
(modules, functions, custom types, ...)

But it would be an interesting task, since binary compatibility in D
would then be more stable. And the interoperability between the gcj project
and D could also be interesting.

looking forward to some discussions

-- Jakob
October 19, 2004
Jakob Praher wrote:
> hi David,
> hi all,
> 
> I like the D language. Since I also play with GCJ (the static GCC Java
> compiler), which has a new ABI (in addition to the C++ linkage), I was
> wondering about the default D ABI:
> 
> * how classes/modules/functions/methods are mangled
> * which type codes exist
> * is there a way to describe any type using a type code (which is
> probably needed for method overloading )

There is no formal spec, so your best bet is to check out mangle.c and mtype.c in the front-end source. Basically, functions and variables have the form:

"_D" ~ <namespace mangle> ~ <type mangle>

<namespace mangle> is formed from the package, module, aggregates, etc. down to the declaration's identifier. The member i in each of the following would have the same namespace mangle:

  module a; class b { class c { int i; } }
  module a.b; class c { idouble i; }

<type mangle> encodes the type of the declaration and may contain more namespace mangling if it involves classes, etc.
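
A minimal sketch of that concatenation in C, for illustration only. It assumes each identifier in the namespace part is written as its decimal length followed by its text, and that the type part is a short code string handed in by the caller. Neither detail is confirmed here, so mangle.c and mtype.c remain the authority; the name `mangle_decl` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "_D" ~ <namespace mangle> ~ <type mangle> into out.
 * ASSUMPTION: each namespace part is emitted as <decimal length><name>.
 * type_code is an opaque string supplied by the caller.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int mangle_decl(char *out, size_t n,
                const char *const *parts, size_t count,
                const char *type_code)
{
    size_t used;
    int w = snprintf(out, n, "_D");
    if (w < 0 || (size_t)w >= n)
        return -1;
    used = (size_t)w;

    for (size_t i = 0; i < count; i++) {
        /* length prefix, then the identifier itself */
        w = snprintf(out + used, n - used, "%zu%s",
                     strlen(parts[i]), parts[i]);
        if (w < 0 || (size_t)w >= n - used)
            return -1;
        used += (size_t)w;
    }

    /* the type mangle follows the namespace mangle */
    w = snprintf(out + used, n - used, "%s", type_code);
    if (w < 0 || (size_t)w >= n - used)
        return -1;
    return 0;
}
```

Called with the parts a, b, c, i, both declarations above get the same namespace portion; only the trailing type code (whatever mtype.c assigns to int and idouble) would differ.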

> 
> * is there a way to specify versioning in the ABI

I don't think there is a way to do this now.

> * since D has its own linkage (as opposed to C++ linkage) I would
> appreciate a less-is-more approach and a more stable ABI like that of C++
> 
> 
> Yes, I looked at DMD, but I would be pleased to know whether there is
> a written spec (the language reference is quite quiet about that).
> 
> What would be interesting is to support many different types of
> ABIs/Linkages.
> This could be done by "helping" the compiler to understand the ABI that
> one is using.
> 
> And: as the language is specified today, is there a way to do
> load-time linking?
> 

Unix-style shared libraries are somewhat working now. There are still some initialization and linking issues. I can't really speak to the Windows platform.

> I would be interested to link GCJ shared objects against D in a very
> native form, so that one could use for instance the many java libs
> already developed.
> 
> for instance
> 
> @gcj import org.apache.xalan...TransformerImpl;
> @gcj import java.lang.String;
> 
> int main( char[][] args ) {
>     TransformerImpl impl = new TransformerImpl( );
>     ....
> 
> }
> ....
> 
>
> Plus: in order to export a D class so that it can be linked with GCJ,
> for instance, one clearly needs more meta information, either exposed
> in the object file or distilled from the D sources.
>
> I'd favor the first approach, since one could then link against a D
> object file without needing the corresponding D source code.
>

I have been thinking of doing something along these lines (with Objective-C!).  In order to directly use another object ABI, it would be necessary to introduce a new basic type into D.  The capabilities of D and Java objects are similar, but they are not binary compatible.  Consider:

  Object o = someJavaObject;
  o.toString(); // Java String (another Object) or D char[] (a two-element struct) ?

If it were important to have Java objects pose as D objects (and vice versa), it would be necessary to use wrappers and/or glue code.  I think this could be mostly transparent.
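
As a rough idea of what such glue might look like, here is a C sketch that turns a Java-style UTF-16 string into the two-element struct (length, then pointer) behind a D char[]. The Java side is a made-up stand-in (`fake_java_string`), not GCJ's real java.lang.String layout, and `java_to_d_string` is a hypothetical name. It also sidesteps the garbage-collection question by using malloc.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* D side: a char[] seen from C as a two-element struct. */
typedef struct {
    size_t length;
    char  *ptr;
} d_char_array;

/* Java side: HYPOTHETICAL stand-in for a string object (UTF-16 units). */
typedef struct {
    size_t          count;
    const uint16_t *units;
} fake_java_string;

/*
 * Glue: convert UTF-16 to UTF-8 and hand back a D-style slice.
 * Unpaired surrogates become U+FFFD. The caller frees result.ptr.
 */
d_char_array java_to_d_string(fake_java_string s)
{
    d_char_array out = { 0, NULL };
    /* at most 3 bytes per UTF-16 unit; +1 keeps the allocation non-empty */
    char *buf = malloc(s.count * 3 + 1);
    size_t n = 0;

    if (buf == NULL)
        return out;

    for (size_t i = 0; i < s.count; i++) {
        uint32_t cp = s.units[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.count &&
            s.units[i + 1] >= 0xDC00 && s.units[i + 1] <= 0xDFFF) {
            /* combine a surrogate pair into one code point */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s.units[i + 1] - 0xDC00u);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            buf[n++] = (char)cp;
        } else if (cp < 0x800) {
            buf[n++] = (char)(0xC0 | (cp >> 6));
            buf[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            buf[n++] = (char)(0xE0 | (cp >> 12));
            buf[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            buf[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            buf[n++] = (char)(0xF0 | (cp >> 18));
            buf[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            buf[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            buf[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    out.length = n;
    out.ptr = buf;
    return out;
}
```

The point is only that the two representations differ enough that something has to copy or wrap; a real bridge would also have to decide who owns the memory.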

There are still more issues, like synchronization and garbage collection, that would need to be worked out.

An alternative method would be to implement D completely with the GCJ ABI.  In this case, however, the ABI would have to be extended to cover D types like dynamic arrays and structs.

> The metadata approach used by GCJ is very straightforward:
> 
> * There is a UTF8 table
> * There is a Method table for each method of a class
> * There are some other tables for Class Descriptors ...
> * The method table also contains all the referenced methods (not only
> the ones defined)
> * There is a Class table for each class (which contains links to the
> other tables)
> 
>     - vtable  (the class's methods)
> these tables are used for the Java binary compatibility stuff:
>     - otable  (offset table for objects referenced by an offset)
>     - atable  (address table for objects referenced via address)
>     - itable  (interface table)
> 
> 
> Surely the simplicity of the Java type system allows for a simple
> implementation of that. D would need some more meta information
> (modules, functions, custom types, ...)
> 
> But it would be an interesting task, since binary compatibility in D
> would then be more stable. And the interoperability between the gcj project
> and D could also be interesting.
> 

It really would be nice to have this kind of binary compatibility as D DLLs/shared libraries become more widespread.  The nice thing about it is that using the tables could be optional if you wanted to maximize performance.
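
To make the optional-tables idea concrete, here is a small C sketch of field access through an offset table. It only illustrates the general technique: the struct names, `get_field`, and the table layout are invented for the example and are not GCJ's actual otable format.

```c
#include <stddef.h>

/* Two versions of a library type; v2 adds a field in front. */
typedef struct { int x; int y; } point_v1;
typedef struct { int tag; int x; int y; } point_v2;

/* Symbolic field indices the client code is compiled against. */
enum { FIELD_X, FIELD_Y, FIELD_COUNT };

/* Offset tables, as a loader might fill them in for each version. */
static const size_t otable_v1[FIELD_COUNT] = {
    offsetof(point_v1, x), offsetof(point_v1, y)
};
static const size_t otable_v2[FIELD_COUNT] = {
    offsetof(point_v2, x), offsetof(point_v2, y)
};

/* Indirect access: the offset comes from the table, not from the caller. */
int get_field(const void *obj, const size_t *otable, int field)
{
    return *(const int *)((const char *)obj + otable[field]);
}
```

Code that calls `get_field` keeps working when the layout changes, at the cost of one extra load per access. Code that hard-codes the offset is faster but has to be recompiled, which is the trade-off behind making the tables optional.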

David

> looking forward to some discussions
> 
> -- Jakob

October 19, 2004
David Friedman wrote:
> Jakob Praher wrote:
> 
>> hi David,
>> hi all,
>>
>> I like the D language. Since I also play with GCJ (the static GCC Java
>> compiler), which has a new ABI (in addition to the C++ linkage), I was
>> wondering about the default D ABI:
>>
>> * how classes/modules/functions/methods are mangled
>> * which type codes exist
>> * is there a way to describe any type using a type code (which is
>> probably needed for method overloading )
> 
> 
> There is no formal spec, so your best bet is to check out mangle.c and mtype.c in the front-end source. Basically, functions and variables have the form:
> 
> "_D" ~ <namespace mangle> ~ <type mangle>

~ is concatenation, right?

> 
> <namespace mangle> is formed from the package, module, aggregates, etc. down to the declaration's identifier. The member i in each of the following would have the same namespace mangle:
> 
>   module a; class b { class c { int i; } }
>   module a.b; class c { idouble i; }
> 
> <type mangle> encodes the type of the declaration and may contain more namespace mangling if it involves classes, etc.

ok. will look into that.
so you have
* packages
* modules
* classes

what is the difference between a package and a module?
	I have heard that modules can have initializers?
	Are they somewhat like static classes?

are packages ever used now?


> 
>>
>> * is there a way to specify versioning in the ABI
> 
> 
> I don't think there is a way to do this now.

hmm. this is probably not that easy. but on the other hand one could do

> 
>> * since D has its own linkage (as opposed to C++ linkage) I would
>> appreciate a less-is-more approach and a more stable ABI like that of C++
>>
>>
>> Yes, I looked at DMD, but I would be pleased to know whether there is
>> a written spec (the language reference is quite quiet about that).
>>
>> What would be interesting is to support many different types of
>> ABIs/Linkages.
>> This could be done by "helping" the compiler to understand the ABI that
>> one is using.
>>
>> And: as the language is specified today, is there a way to do
>> load-time linking?
>>
> 
> Unix-style shared libraries are somewhat working now. There are still some initialization and linking issues. I can't really speak to the Windows platform.
> 
will look at the compiler for that. but great work done so far!

> I have been thinking of doing something along these lines (with Objective-C!).  In order to directly use another object ABI, it would be necessary to introduce a new basic type into D.  The capabilities of D and Java objects are similar, but they are not binary compatible.  Consider:
> 
>   Object o = someJavaObject;
>   o.toString(); // Java String (another Object) or D char[] (a two-element struct) ?

> 
> If it were important to have Java objects pose as D objects (and vice versa), it would be necessary to use wrappers and/or glue code.  I think this could be mostly transparent.
> 
> There are still more issues, like synchronization and garbage collection, that would need to be worked out.
> 
> An alternative method would be to implement D completely with the GCJ ABI.  In this case, however, the ABI would have to be extended to cover D types like dynamic arrays and structs.
> 

What I'll probably do next, in my spare time, is define a concrete spec of the requirements for these table linkages; then perhaps we can settle on a mangling structure that is Java-compatible but also satisfies the D requirements. Objective-C also has a kind of type mangling, in which structures are mangled using {<members>}, which could be a way to define them....

V               // void
I               // int
...
L...;           // java class
[<type>;        // java array

+----------------+
{<type><type>;  // structure
*<type>;        // pointer or something like that
...

value types have to be mangled too... perhaps using the structure information; this would make {I} a value type that is just an int32, and {*I;} a struct containing a pointer to int32, etc.

we would of course need a way to introduce new built-in types, which could be done using the _; stuff, for instance

_uint16; 	would mean unsigned int 16 or something like that
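
Just to check that the notation composes, here is a tiny C sketch of two encoder helpers that produce the strings from the examples above ({I} and {*I;}). It is only one reading of the scheme: the helper names are made up, a pointer is written as `*<type>;`, and a structure is written as `{` members `}`.

```c
#include <stdio.h>
#include <string.h>

/* Encode a pointer to `inner` as "*<inner>;". Returns 0, or -1 if out is too small. */
int enc_pointer(char *out, size_t n, const char *inner)
{
    int w = snprintf(out, n, "*%s;", inner);
    return (w >= 0 && (size_t)w < n) ? 0 : -1;
}

/* Encode a structure as "{" followed by its member encodings and "}". */
int enc_struct(char *out, size_t n,
               const char *const *members, size_t count)
{
    size_t used = 0;

    if (n < 3)                      /* room for "{}" plus the terminator */
        return -1;
    out[used++] = '{';
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(members[i]);
        if (used + len + 2 > n)     /* member, closing brace, terminator */
            return -1;
        memcpy(out + used, members[i], len);
        used += len;
    }
    out[used++] = '}';
    out[used] = '\0';
    return 0;
}
```

Because each helper takes already-encoded strings, nesting falls out naturally: the output of `enc_pointer` can be passed straight in as a member of `enc_struct`.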

I have also looked into the .NET stuff and found out that they use non-symbolic type information, but whether that's better is a question of taste ...

What would also be great is a pointer-free representation of all the exported metadata of a D compilation unit.


for instance like a constant pool of items:


Item = { byte type; int size; }

DCompilationUnit = {
	byte	type;
	int	size;

	int	majorVersion;
	int	minorVersion;
	int	PackageRef;
}

PackageDesc = {
	byte	type;
	int	size;

	int	moduleCount;
	int	ModuleRefId;
	..
}

ModuleDesc = {
	byte	type;
	int	size;

	int	classCount;
	int	varCount;
}

ClassDesc = {
	byte	type;
	int	size;
}

...
...

this could be placed in a special section of the relocatable object file, for instance a .metadata section or something like that, which can be loaded read-only (like the .text section) and can be mmapped directly
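
A small C sketch of how a reader could walk such a pool straight out of a mapped section. The assumptions are mine: every item starts with a 1-byte type and a 4-byte little-endian size, the size covers the whole item including that header, and `count_items` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

#define ITEM_HEADER_SIZE 5u   /* 1-byte type + 4-byte size (assumed layout) */

/* Read a 32-bit little-endian value without assuming alignment. */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Count the items of type `wanted` in a pool of `pool_len` bytes.
 * Items are found by hopping from header to header using the size
 * field, so no pointers need to be stored or relocated.
 * Returns -1 if the pool is truncated or an item size is invalid.
 */
long count_items(const uint8_t *pool, size_t pool_len, uint8_t wanted)
{
    size_t off = 0;
    long count = 0;

    while (off < pool_len) {
        uint32_t size;

        if (pool_len - off < ITEM_HEADER_SIZE)
            return -1;              /* truncated header */
        size = read_u32le(pool + off + 1);
        if (size < ITEM_HEADER_SIZE || size > pool_len - off)
            return -1;              /* item runs past the end */
        if (pool[off] == wanted)
            count++;
        off += size;                /* jump to the next item */
    }
    return count;
}
```

Since items refer to each other only by index or offset, the same bytes are valid wherever the section ends up mapped, which is what makes the read-only mmap approach workable.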

with that information we could have an extraction tool that reads it and passes it on to the compiler ...

Although I have a bit of ELF and linking knowledge, I am by no means an expert. So if anyone has better ideas for laying out this stuff in object files, please let me know.


Jakob
October 20, 2004
Jakob Praher wrote:
> David Friedman wrote:
> 
>> Jakob Praher wrote:
>>
[snip]
>>
>> There is no formal spec, so your best bet is to check out mangle.c and mtype.c in the front-end source. Basically, functions and variables have the form:
>>
>> "_D" ~ <namespace mangle> ~ <type mangle>
> 
> 
> ~ is a concatenation right?

Have to use '~' for concat on a D forum ;)

[snip]
> 
> ok. will look into that.
> so you have
> * packages
> * modules
> * classes
> 
> what is the difference between a package and a module?
>     I have heard that modules can have initializers?
>     Are they somewhat like static classes?
> 
> are packages ever used now?
> 
> 

Packages are just the names/directories containing modules (e.g. std.c)
