Writing Compilers in D
Thread participants: Kevin A, Deja Augustine, Stephan Wienczny (Aug 12, 2004); Sampsa Lehtonen, Ilya Minkov (Aug 16, 2004)

August 12, 2004
Hello!  I am an experienced compiler/interpreter writer and I have been considering using D instead of C/C++ as the implementation language.  I have a few questions that I am seeking answers to before I begin writing it:

- Has anyone else here written a compiler in D?
- Is D well-suited to compiler writing? And if so, what features of D are
particularly good for this?
- Is there a visual debugger available for D?  If not, is there any debugger?
How good is the debugger?
- Is there a good IDE for D?

Your help will be greatly appreciated.

Sincerely,
Kevin A


August 12, 2004
Kevin A wrote:
> Hello!  I am an experienced compiler/interpreter writer and I have been
> considering using D instead of C/C++ as the implementation language.  I have a
> few questions that I am seeking answers to before I begin writing it:
> 
> - Has anyone else here written a compiler in D?

I've written part of one.  I wrote a D preprocessor that parsed D code and did some rudimentary semantic analysis.  I was originally going to use that as the base for D.NET until I discovered that the front-end source was available.

> - Is D well-suited to compiler writing? And if so, what features of D are
> particularly good for this?

Its string handling is definitely nicer than C++'s, as are its dynamic arrays.  Most of that can be done in C++ via the STL, however.

> - Is there a visual debugger available for D?  If not, is there any debugger?
> How good is the debugger?

As I understand it, you can use a variety of "3rd party" debuggers. I've never tried, though.  Contracts make it pretty easy to code without needing a separate debugger.
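
As a rough illustration of the contract idea (a sketch in current D syntax, not code from this thread; the function `skipSpaces` is made up for the example), `in` and `out` blocks let a function check its own assumptions every time it runs, which catches many bugs at the point where they happen:

```d
import std.stdio;

/// Hypothetical helper: returns the index of the first non-space
/// character at or after `start`.
size_t skipSpaces(const(char)[] text, size_t start)
in
{
    // Precondition: the caller must hand us a position inside the buffer.
    assert(start <= text.length, "start must lie inside the buffer");
}
out (result)
{
    // Postcondition: we never move backwards or run past the end.
    assert(result >= start && result <= text.length);
}
do
{
    size_t i = start;
    while (i < text.length && text[i] == ' ')
        ++i;
    return i;
}

void main()
{
    writeln(skipSpaces("   x = 1", 0)); // index of 'x'
}
```

Contracts are checked in non-release builds, so a violated assumption stops the program with a message instead of silently corrupting state.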

> - Is there a good IDE for D?

Check out the links page on the D site.

-Deja
August 12, 2004
Kevin A wrote:
> Hello!  I am an experienced compiler/interpreter writer and I have been
> considering using D instead of C/C++ as the implementation language.  I have a
> few questions that I am seeking answers to before I begin writing it:
> 
> - Has anyone else here written a compiler in D?
> - Is D well-suited to compiler writing? And if so, what features of D are
> particularly good for this?
> - Is there a visual debugger available for D?  If not, is there any debugger?
> How good is the debugger?
> - Is there a good IDE for D?
> 
> Your help will be greatly appreciated.
> 
> Sincerely,
> Kevin A
> 
> 

I'm trying to do such a thing. I can tell you my experience (so far).
D makes implementing something a little easier than C/C++.
You have the class definition next to its implementation, so there is no redundancy when writing a new function. Then you have some advanced D features, like dynamic arrays with slicing, which make parsers/lexers awfully fast.

IMHO D is more readable than C/C++ and can be a lot faster...

It should be possible to use a visual debugger. There is none in the package, but you should find one on the net.
There have been some efforts to write a special IDE for D (dide, leds), and there are config files for others (Eclipse, MS Visual Studio).

 Stephan
August 16, 2004
On Thu, 12 Aug 2004 21:04:00 +0200, Stephan Wienczny <Stephan@Wienczny.de> wrote:

> I'm trying to do such a thing. I can tell you my experience (so far).
> D makes implementing something a little easier than C/C++.
> You have the class definition next to its implementation, so there is no redundancy when writing a new function. Then you have some advanced D features, like dynamic arrays with slicing, which make parsers/lexers awfully fast.
>
> IMHO D is more readable than C/C++ and can be a lot faster...
>
> It should be possible to use a visual debugger. There is none in the package, but you should find one on the net.
> There have been some efforts to write a special IDE for D (dide, leds), and there are config files for others (Eclipse, MS Visual Studio).

I've considered making a compiler too, perhaps for D. I've made one for MiniJava which is a subset of Java. It produced native code (MIPS). Now I would like to try my skills on something more involved. C/C++ syntax seems too complex, and Java is a bit too abstract (it isn't meant for native code though such compilers exist).

Do you guys have any suggestions on which tools to use? I've been thinking about writing the compiler in Java, as it is the easiest and fastest to code in (using refactoring tools). I don't care about compilation times at the moment; getting the compiler running and producing code is enough of a task in itself. I've used JavaCC and ANTLR too, but are there better alternatives?

For an industrial compiler I'd choose C++ as the development language and the x86 instruction set as output, but writing CISC compilers is so much harder than writing RISC compilers, so maybe I'll go with MIPS here too.

My primary goal is to get my hands on different optimization techniques and to get familiar with complex flow- and data-analyses.

Btw, if anyone has pointers to some nice documents about OBJ-file structure and such, I'd be interested.

-texmex/sampsa lehtonen

August 16, 2004
Sampsa Lehtonen schrieb:
> I've considered making a compiler too, perhaps for D. I've made one for  MiniJava which is a subset of Java. It produced native code (MIPS). Now I  would like to try my skills on something more involved. C/C++ syntax seems  too complex, and Java is a bit too abstract (it isn't meant for native  code though such compilers exist).

Hum. D would be a large undertaking, though not as large as C++. C++ is a problem more semantically than syntactically, I think.

> Do you guys have any suggestions on which tools to use? I've been thinking about writing the compiler in Java, as it is the easiest and fastest to code in (using refactoring tools). I don't care about compilation times at the moment; getting the compiler running and producing code is enough of a task in itself. I've used JavaCC and ANTLR too, but are there better alternatives?

Refactoring tools? What do you mean?

I'm under the impression that ANTLR is the best, but my favorite is COCO/R. There are complete ports of COCO/R for C++, Java, C# and Delphi, and there is a good chance I can get a D version up. Warning: the Java and C# versions generate non-reentrant parsers.

I would not recommend writing a compiler in Java. So far, I have had a lot of fun writing my first real lexer, and I have come to like D's array semantics and slicing, which are probably unique. That is, they can be easily emulated in C++, but they are not native in any other language. The difference from, say, Python is that a slice still refers to the original array where the data is stored. For example, one would load a text into a large buffer or memory-mapped file and have a lexeme contain a slice into it, instead of a position and length: string semantics without the overhead. Plus, one would insert asserts here and there to make sure that such a slice keeps pointing into the loaded text.
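
A minimal sketch of that idea, written in current D rather than the 2004 dialect. The names `Lexeme` and `isSliceOf` are invented for the illustration and are not from any real lexer:

```d
import std.stdio;

/// Hypothetical lexeme type: the text is a slice of the source buffer,
/// not a copy and not an (offset, length) pair.
struct Lexeme
{
    const(char)[] text;
}

/// True if `part` points entirely inside `whole`.
bool isSliceOf(const(char)[] part, const(char)[] whole)
{
    return part.ptr >= whole.ptr
        && part.ptr + part.length <= whole.ptr + whole.length;
}

void main()
{
    // Stands in for a file loaded into one large buffer.
    const(char)[] source = "int answer = 42;";

    // Slicing copies no characters; the lexeme shares the buffer's memory.
    auto lex = Lexeme(source[4 .. 10]);
    assert(lex.text == "answer");
    assert(lex.text.ptr == source.ptr + 4);
    assert(isSliceOf(lex.text, source));

    // An explicit copy lives elsewhere, so the check fails for it.
    auto copy = lex.text.dup;
    assert(!isSliceOf(copy, source));

    writeln(lex.text);
}
```

The `isSliceOf` check is the kind of assert mentioned above: it verifies that a lexeme still points into the loaded text.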

> For an industrial compiler I'd choose C++ as the development language and the x86 instruction set as output, but writing CISC compilers is so much harder than writing RISC compilers, so maybe I'll go with MIPS here too.

Non-processor targets might also be interesting, e.g. ANDF, the architecture-neutral distribution format. There are converters from ANDF to target code for different architectures.

> My primary goal is to get my hands on different optimization techniques  and to get familiar with complex flow- and data-analyses.

I had just chatted with MadMan/TAP (aka MadenMann) yesterday. Perhaps he would also be interested... He wanted to write a custom compiler for Sega Mega-Drive.

> Btw, if anyone has pointers to some nice documents about OBJ-file  structure and such, I'd be interested.

These differ between toolchains. For Digital Mars, Watcom and Borland, look for OMF. Microsoft and some others use COFF. Other operating systems use some variant of COFF, some variant of ELF, or even something completely different. The object file format need not correspond to the OS executable format, although I guess it makes the linker's life easier.

-eye/PaC
August 16, 2004
On Mon, 16 Aug 2004 18:34:14 +0200, Ilya Minkov <minkov@cs.tum.edu> wrote:

> Hum. D would be a large undertaking, though not as large as C++. C++ is a problem more semantically than syntactically, I think.

Well I plan to implement just a subset of D first. Leave the exceptions, templates, mixins for later, just to get the primitive things running. Probably the most gratifying thing in compiler construction is the moment when you actually get something compiled and it runs!

>> Do you guys have any suggestions which tools to use? I've been
> Refactoring tools? What do you mean?

By tools I meant parser generators and such; refactoring tools are a whole different story, though they are related to code parsing. Refactoring means transformations on code that do not change the meaning of the code. Pretty fun stuff, really. There are plenty of different refactoring types, from simple variable renaming to superclass extraction etc. Basically they are tools that aid programming by automating the tedious primitive tasks programmers do daily. Check out more at http://www.refactoring.com/

BTW, a nice thing about D is that D programs can be parsed easily, as there is no preprocessor. This makes refactoring possible, unlike in C++, where the preprocessor can really f*ck things up. Currently I'm waiting for a D IDE that would include refactoring tools ;)

> I'm under the impression that ANTLR is the best, but my favorite is COCO/R. There are complete ports of COCO/R for C++, Java, C# and Delphi, and there is a good chance I can get a D version up. Warning: the Java and C# versions generate non-reentrant parsers.

COCO/R, hmm, haven't heard of it. I'll check it out. But probably I'll go with JavaCC (or Antlr, as it generates parsers in C++ too).

> I would not recommend writing a compiler in Java. So far, I have had a lot of fun writing my first real lexer, and I have come to like D's array semantics and slicing, which are probably unique. That is, they can be easily emulated in C++, but they are not native in any other language. The difference from, say, Python is that a slice still refers to the original array where the data is stored. For example, one would load a text into a large buffer or memory-mapped file and have a lexeme contain a slice into it, instead of a position and length: string semantics without the overhead. Plus, one would insert asserts here and there to make sure that such a slice keeps pointing into the loaded text.

Umm, I don't quite follow you. After the lexer has recognized a token and the parser has accepted it, the actual text becomes unnecessary (unless it is an identifier). So why would I want to load the whole file into memory and have tokens pointing into that big piece of text?...
With a lexer for an IDE, where parsing needs to be done constantly and in varying places, it is a different story, I guess...

>> For an industrial compiler I'd choose C++ as the development language and the x86 instruction set as output, but writing CISC compilers is so much harder than writing RISC compilers, so maybe I'll go with MIPS here too.
> Non-processor targets might also be interesting, e.g. ANDF, the architecture-neutral distribution format. There are converters from ANDF to target code for different architectures.

Yeah, but that is a bit too much rocket science for me :) Getting the compiler to produce a proper executable is hard enough; I don't want to hinder development with unnecessarily complex target platforms :)

>> My primary goal is to get my hands on different optimization techniques  and to get familiar with complex flow- and data-analyses.
> I had just chatted with MadMan/TAP (aka MadenMann) yesterday. Perhaps he would also be interested... He wanted to write a custom compiler for Sega Mega-Drive.

I was thinking of making a compiler for ARM's RISC processor for GBA, but that project never really took off. It would have been an interesting project though, because the device has its restrictions and the instruction set is so simple.

>> Btw, if anyone has pointers to some nice documents about OBJ-file  structure and such, I'd be interested.
> These differ between toolchains. For Digital Mars, Watcom and Borland, look for OMF. Microsoft and some others use COFF. Other operating systems use some variant of COFF, some variant of ELF, or even something completely different. The object file format need not correspond to the OS executable format, although I guess it makes the linker's life easier.

Hmm, so different compilers need different kinds of OBJ files? So I can't use Watcom objs/libs with VC++... oh well.

Thanks for the info!

-texmex/sampsa lehtonen
August 16, 2004
Sampsa Lehtonen schrieb:

> Well I plan to implement just a subset of D first. Leave the exceptions,  templates, mixins for later, just to get the primitive things running.  Probably the most gratifying thing in compiler construction is the moment  when you actually get something compiled and it runs!

Okeydokey.

> By tools I meant parser generators and such; refactoring tools are a whole different story, though they are related to code parsing. Refactoring means transformations on code that do not change the meaning of the code. Pretty fun stuff, really. There are plenty of different refactoring types, from simple variable renaming to superclass extraction etc. Basically they are tools that aid programming by automating the tedious primitive tasks programmers do daily. Check out more at http://www.refactoring.com/

Gotta look at them.

> BTW, a nice thing about D is that D programs can be parsed easily, as there is no preprocessor. This makes refactoring possible, unlike in C++, where the preprocessor can really f*ck things up. Currently I'm waiting for a D IDE that would include refactoring tools ;)

Yes. But I guess someone would have to write refactoring tools before someone else could integrate them into an editor.

> COCO/R, hmm, haven't heard of it. I'll check it out. But probably I'll go  with JavaCC (or Antlr, as it generates parsers in C++ too).

Yup. ANTLR is really worth it, but I just find COCO/R nice. The whole program is tiny, and the generated compilers are small, complete, fast and readable. It has a few special features such as comment and pragma processing, extendable lookup in both the parser and the lexer, and support for context dependency. It should even be able to parse C++, I think. Not many people have been writing grammars for it, though.

> Umm, I don't quite follow you. After the lexer has recognized a token and the parser has accepted it, the actual text becomes unnecessary (unless it is an identifier). So why would I want to load the whole file into memory and have tokens pointing into that big piece of text?...
> With a lexer for an IDE, where parsing needs to be done constantly and in varying places, it is a different story, I guess...

I have found that a lexeme needs to carry almost nothing but the text and a pointer to the lexer. The type of the lexeme is taken from the class hierarchy. Concrete subtypes may contain further information or methods. So far I have the following types of lexeme defined:

* Identifier;
* Numerical (matching both integer and floating-point);
* Crude.

I only need to read the first symbol to guess the type of the lexeme: a letter or underscore makes it an identifier, a digit makes it numeric, and everything else is "crude" and is matched using a large switch of switches, which covers operators and everything else unwieldy. I treat language keywords as identifiers in the lexer and only check them later in the parser. The lexeme is parsed in the constructor of the corresponding type, so there is no stepping back: a mismatch is a fatal error.
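
The first-character dispatch can be sketched roughly as follows. This is a simplification in current D: it uses a hypothetical `Kind` enum and `classify` function where the description above uses a class hierarchy, and it leaves out the actual matching of each lexeme:

```d
import std.ascii : isAlpha, isDigit;
import std.stdio;

/// Hypothetical stand-in for the three lexeme classes.
enum Kind { identifier, numerical, crude }

/// Guess the lexeme kind from its first character only.
Kind classify(char first)
{
    if (isAlpha(first) || first == '_')
        return Kind.identifier;   // keywords land here too; the parser sorts them out
    if (isDigit(first))
        return Kind.numerical;    // integer or floating-point, decided later
    return Kind.crude;            // operators, punctuation, everything else
}

void main()
{
    assert(classify('w') == Kind.identifier);
    assert(classify('_') == Kind.identifier);
    assert(classify('7') == Kind.numerical);
    assert(classify('+') == Kind.crude);

    writeln(classify('+'));
}
```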

So far I discriminate Crudes and keywords by text in the parser, which is very fast. No copies of data are made, and in fact there are usually only a few comparisons each time, because the first characters carry the most information.

Note also that the lexemes don't have to store their position in the file for error reporting and such. In the function that gets the file and line, I assert that the lexeme string is within the lexer's storage, and then I slice the lexer's storage from the beginning to the start of the lexeme. Then I only need to count the line ends in there to figure out the line. :) Or I can even keep a table with the offsets of all line ends and simply scan through it.
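
A small sketch of the line lookup just described, in current D. The function name `lineOf` and the sample text are made up for the illustration, and it shows the counting variant rather than the table of line-end offsets:

```d
import std.stdio;

/// 1-based line number of `lexeme`, which must be a slice of `source`.
size_t lineOf(const(char)[] source, const(char)[] lexeme)
{
    // The lexeme has to point into the lexer's storage.
    assert(lexeme.ptr >= source.ptr
        && lexeme.ptr + lexeme.length <= source.ptr + source.length,
        "lexeme must be a slice of source");

    // Distance from the start of the buffer to the start of the lexeme.
    const size_t offset = cast(size_t)(lexeme.ptr - source.ptr);

    // Count the line ends in everything that precedes the lexeme.
    size_t line = 1;
    foreach (c; source[0 .. offset])
        if (c == '\n')
            ++line;
    return line;
}

void main()
{
    const(char)[] source = "a = 1;\nb = 2;\nc = a + b;\n";

    assert(lineOf(source, source[0 .. 1]) == 1);    // "a"
    assert(lineOf(source, source[7 .. 8]) == 2);    // "b"
    assert(lineOf(source, source[14 .. 15]) == 3);  // "c"

    writeln(lineOf(source, source[14 .. 15]));
}
```

Counting is linear in the offset, which is fine for occasional error messages; the table of line-end offsets would pay off if positions are looked up often.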

Like, it all is nothing that couldn't be done some other way, but it just works so nicely!

> Yeah, but that is a bit too much rocket science for me :) Getting the compiler to produce a proper executable is hard enough; I don't want to hinder development with unnecessarily complex target platforms :)

I thought it might be a bit simpler. But a real simple CPU is perhaps better suited.

> I was thinking of making a compiler for ARM's RISC processor for GBA, but  that project never really took off. It would have been an interesting  project though, because the device has its restrictions and the  instruction set is so simple.

Ever heard of the "Gamepark-32", a Korean game handheld? It is very popular with crazy developers. There are almost no games for it other than in Korean, and the handheld itself has to be imported, but it's cheap (around 120 EUR IIRC), has GCC devtools, is accessed over USB, and runs programs from SmartMedia cards. :) It is based on an ARM9 (as opposed to the ARM7 in the GBA) clocked at 66, 100, 133 or even (unwarranted) 166 MHz - the frequency can be changed programmatically. It is a pure framebuffer device: it doesn't have any sprite acceleration like the GBA, but it has a much better display (320x240 hicolor), and one can have fun figuring out some cool software tricks to make it reach notable performance.

The specs and some links can be found here, for example:

http://darkfader.net/gp32/

-eye