Thread overview
DMD Intermediate Representation
Aug 23, 2004
Andy Friesen
Aug 23, 2004
David Friedman
Aug 24, 2004
Owen Anderson
Aug 25, 2004
David Friedman
August 23, 2004
I've been looking at trying to hook the DMD frontend up to LLVM (www.llvm.org), but I've been having some trouble.  The LLVM IR (Intermediate Representation) is very well documented, but I'm having a rough time figuring out how DMD holds its IR.  Since at least three people (David, Ben, and Walter) seem to understand it, I thought I'd ask for guidance.

What's the best way to traverse the DMD IR once I've run the three semantic phases?  As far as I can tell, it's all held in the SymbolTable as a bunch of Symbols.  Is there a good way to traverse that and reconstruct it into another IR?

-Owen


August 23, 2004
resistor AT mac DOT com wrote:
> I've been looking at trying to hook the DMD frontend up to LLVM (www.llvm.org),
> but I've been having some trouble.  The LLVM IR (Intermediate Representation) is very well
> documented, but I'm having a rough time figuring out how DMD holds its IR.  Since at least three people
> (David, Ben, and Walter) seem to understand it, I thought I'd ask for guidance.
> 
> What's the best way to traverse the DMD IR once I've run the three semantic
> phases?  As far as I can tell it's all held in the SymbolTable as a bunch of Symbols.  Is there a good way to
> traverse that and reconstruct it into another IR?

Have you checked out DLI?

It's very old and badly out of sync with the latest DMD frontend, but it does sport a working x86 backend.  You can still grab the last version from <http://opend.org>.

 -- andy
August 23, 2004
There isn't a generic visitor interface.  Instead, there are several methods which are responsible for emitting code/data and then calling the same method on child objects.  Start by implementing Module::genobjfile and loop over the 'members' array, calling each Dsymbol object's toObjFile method.  From there, you will need to implement these methods:

Dsymbol (and descendants) ::toObjFile -- Emits code and data for objects that generally have a symbol name and storage in memory.  Containers like ClassDeclaration also have a 'members' array with child Dsymbols.  Most of these are descendants of the Declaration class.

Statement (and descendants) ::toIR -- Emits instructions.  Usually, you just call toObjFile, toIR, toElem, etc. on the statement's fields and string the results together in the IR.

Expression (and descendants) ::toElem -- Returns a back end representation of the numeric constants, variable references, and operations that expression trees are composed of.  This was very simple for GCC because the back end already had the code to convert expression trees to ordered instructions.  If LLVM doesn't do this, I think you could generate the instructions here since LLVM has SSA.

Type (and descendants) ::toCtype -- Returns the back end representation of the type.  Note that a lot of classes don't override this -- you just need to do a switch on the 'ty' field in Type::toCtype.

Dsymbol (and descendants) ::toSymbol -- Returns the back end reference to the object.  For example, FuncDeclaration::toSymbol could return an llvm::Function.  These are already implemented in tocsym.c, but you will probably rewrite them to create LLVM objects.
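
The recursive pattern described above could be sketched roughly as follows.  This is a minimal illustration, not the actual DMD or GDC source: the class names mirror the front end, but every class body here is a simplified stand-in, and Emitter, fresh(), the single-expression function body, and the textual "instructions" are assumptions made up for the example (a real port would build LLVM objects instead of strings).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a back end: it only records what would be emitted.
struct Emitter {
    std::vector<std::string> log;
    int nextTemp = 0;
    void emit(const std::string& line) { log.push_back(line); }
    std::string fresh() { return "%t" + std::to_string(nextTemp++); }
};

// Simplified Expression: toElem emits any instructions it needs and returns
// the name of the value holding the result.
struct Expression {
    virtual ~Expression() = default;
    virtual std::string toElem(Emitter& out) = 0;
};

struct IntegerExp : Expression {
    long value;
    explicit IntegerExp(long v) : value(v) {}
    std::string toElem(Emitter&) override { return std::to_string(value); }
};

struct AddExp : Expression {
    std::unique_ptr<Expression> e1, e2;
    AddExp(std::unique_ptr<Expression> a, std::unique_ptr<Expression> b)
        : e1(std::move(a)), e2(std::move(b)) {}
    std::string toElem(Emitter& out) override {
        std::string lhs = e1->toElem(out);  // operands first, left to right
        std::string rhs = e2->toElem(out);
        std::string tmp = out.fresh();      // one fresh name per result (SSA style)
        out.emit(tmp + " = add " + lhs + ", " + rhs);
        return tmp;
    }
};

// Simplified Dsymbol: each symbol knows how to emit itself.
struct Dsymbol {
    std::string ident;
    explicit Dsymbol(std::string id) : ident(std::move(id)) {}
    virtual ~Dsymbol() = default;
    virtual void toObjFile(Emitter& out) = 0;
};

struct VarDeclaration : Dsymbol {
    using Dsymbol::Dsymbol;
    void toObjFile(Emitter& out) override { out.emit("data " + ident); }
};

// Hypothetical simplification: the body is a single returned expression.
struct FuncDeclaration : Dsymbol {
    std::unique_ptr<Expression> result;
    FuncDeclaration(std::string id, std::unique_ptr<Expression> e)
        : Dsymbol(std::move(id)), result(std::move(e)) {}
    void toObjFile(Emitter& out) override {
        assert(result);
        out.emit("func " + ident);
        std::string value = result->toElem(out);
        out.emit("ret " + value);
    }
};

// Containers emit themselves, then recurse into their 'members' array.
struct ClassDeclaration : Dsymbol {
    std::vector<std::unique_ptr<Dsymbol>> members;
    using Dsymbol::Dsymbol;
    void toObjFile(Emitter& out) override {
        out.emit("class " + ident);
        for (auto& m : members) m->toObjFile(out);
    }
};

// Entry point: walk the module's top-level members.
struct Module {
    std::vector<std::unique_ptr<Dsymbol>> members;
    void genobjfile(Emitter& out) {
        for (auto& m : members) m->toObjFile(out);
    }
};
```

Walking a module that holds a variable 'counter' and a class 'Widget' whose function 'area' returns (2 + 3) + 4 records, in order: "data counter", "class Widget", "func area", "%t0 = add 2, 3", "%t1 = add %t0, 4", "ret %t1".  Nested expressions come out as ordered instructions because each toElem call lowers its operands before emitting its own operation.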

David


August 24, 2004
Awesome.  Thanks for all the help.  I'm starting out by removing all backend calls and replacing them with debug printf's so I can keep track of what needs replacing.

Question:  How does your GDC code still have these includes -

#include	"cc.h"
#include	"el.h"
#include	"oper.h"
#include	"global.h"
#include	"code.h"
#include	"type.h"
#include	"dt.h"

Mine complains about them not existing, so I assumed they were backend headers.

-Owen

August 25, 2004
GDC doesn't use the original todt.c and tocsym.c, so it doesn't need those headers.  Instead of recreating the DMD back end types (elem, dt_t, etc.), I just typedef'd them to be GCC nodes (except for the Symbol struct).
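
As a rough illustration of that typedef approach (a sketch under assumptions, not GDC's actual declarations): BackendNode, its fields, the Symbol layout, and setIntConst are all hypothetical names invented here; in GDC the alias target would be GCC's own node type, and an LLVM port would presumably pick an LLVM type instead.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the new back end's node type.
struct BackendNode {
    int  kind;   // illustrative tag
    long value;  // illustrative payload
};

// The front end's declarations mention these DMD back end names.  Aliasing
// them lets those declarations compile without recreating the original types.
typedef BackendNode elem;
typedef BackendNode dt_t;

// Per the post, Symbol stays a real struct rather than an alias.
// The fields shown here are assumptions for the sketch.
struct Symbol {
    const char*  ident;
    BackendNode* node;
};

// A front-end-style signature that still says `elem *` but now operates on
// the new back end's node.
elem* setIntConst(elem* e, long v) {
    assert(e != nullptr);
    e->kind  = 0;
    e->value = v;
    return e;
}
```

With aliases like these, existing signatures that return `elem *` can stay as written while their bodies are rewritten for the new back end.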

David
