Writing a (dis-)assembler for 8-bit code in D - blog posts
April 19, 2021

You remember Brian Callahan, the one who finished OpenBSD support for the D language? He has more posts that I think people here might find interesting. He has written a disassembler and an assembler for a Z80 processor in D.

The main point of his articles seems to be demonstrating how programs behave; the choice of language is an implementation detail. His D is still rough, as it is for anyone new to the language, but he knows a lot about low-level programming in general. If you're thinking about low-level programming or compiler technology, these are worth a look.

https://briancallahan.net/blog/archive.html

April 19, 2021

Hello Dukc --

On Monday, 19 April 2021 at 15:01:07 UTC, Dukc wrote:

> You remember Brian Callahan, the one who finished OpenBSD support for the D language? He has more posts that I think people here might find interesting. He has written a disassembler and an assembler for a Z80 processor in D.
>
> The main point of his articles seems to be demonstrating how programs behave; the choice of language is an implementation detail. His D is still rough, as it is for anyone new to the language, but he knows a lot about low-level programming in general. If you're thinking about low-level programming or compiler technology, these are worth a look.
>
> https://briancallahan.net/blog/archive.html

I do exist in these parts and on this mailing list. :)
Turns out I have at least one more post in the series--I decided to make the parser match the CP/M assembler for strings after all. But I most certainly fixed that in a "C" way. Guess which language I'm mentally mapping my D on top of.

And you are correct, choice of language is an implementation detail. It wasn't happenstance, though. D has some nice facilities that, while perhaps not exclusive to D, are nonetheless quite useful for achieving the goals of the dis/assembler. Also, perhaps people want to collect D tutorials for new D coders/new coders in general, and this dis/assembler can eventually be a part of that.

But... that doesn't mean I can't or won't take constructive critiques. But perhaps some context on the blog series:
The whole point of the disassembler and assembler was to answer (for myself, really) if I could successfully teach someone with effectively no formal CS education how to write such tools. Imagine someone who never learned data structures wanting to write their own tools. You could hand them this dis/assembler, they could learn everything about them with just the code, the blog posts, and the skills they have now. And once they've done that, then you say "great, now go learn DS and algo and ..." So ways to make the code more readable to that target audience are appreciated and will almost certainly become the topic of their own post on the blog in that series.

I'll also take tips for better idiomatic D in general, for my own sake.

As an unrelated aside: I'm giving a talk about all the different languages I have helped port to OpenBSD (about 40 or so that I can remember as of now). It won't all be about D, but D will be an exclusive, highlighted, part of it. Humorously, everyone is going around calling it "the D on OpenBSD talk" because that one blog post really gained traction in the *BSD community too. Anyhow, it's May 5 at 18:45 NY time. Free and virtual (Zoom): https://www.nycbug.org/index?action=view&id=10683

~Brian

April 20, 2021
On Monday, 19 April 2021 at 20:05:57 UTC, Brian wrote:
> The whole point of the disassembler and assembler was to answer (for myself, really) if I could successfully teach someone with effectively no formal CS education how to write such tools. Imagine someone who never learned data structures wanting to write their own tools. You could hand them this dis/assembler, they could learn everything about them with just the code, the blog posts, and the skills they have now. And once they've done that, then you say "great, now go learn DS and algo and ..." So ways to make the code more readable to that target audience are appreciated and will almost certainly become the topic of their own post on the blog in that series.

I'm in your target audience (never taken a CS class), and while I've only read the initial entry in the series, I think your prose is good, and your explanations are just as good. I look forward to reading the following posts, so thanks for writing!
With regard to using D, I think it's a good choice if only for the fact that it appears similar to C and one may translate code to-and-fro with relatively little hassle, but doesn't necessitate the degree of circumspection that the latter demands of a beginner. So, perhaps, progressing from "un-idiomatic" D (i.e. C-like?) to more modern practices could be a sub-plot of sorts.

April 20, 2021

On Monday, 19 April 2021 at 20:05:57 UTC, Brian wrote:

> I'll also take tips for better idiomatic D in general, for my own sake.

Here go some tips.

Don't bother with static before the functions. It does nothing in D, unless the definition is local scope (inside a struct, class, union or another function). If you want to limit symbol visibility, see https://dlang.org/spec/attribute.html#visibility_attributes. TlDr: private means the symbol can only be used in the same file. public is the default and means the symbol can be imported from another file. export is used when making .so files and tells the symbol must be dynamically linkable.
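To make the difference concrete, here is a minimal sketch of what the paragraph above describes. The names `twice`, `helper`, `api` and `Counter` are made up for illustration and are not taken from the assembler.

```d
// At module level `static` is accepted but changes nothing.
static int twice(int x) { return 2 * x; }

// `private` limits the symbol to this module (this file).
private int helper(int x) { return x + 1; }

// No attribute means `public`: other modules can import and call this.
int api(int x) { return twice(helper(x)); }

struct Counter
{
    // Inside a struct `static` does matter: one variable shared by
    // all instances instead of one per instance.
    static int created;

    static Counter make()
    {
        ++created;
        return Counter();
    }
}

void main()
{
    Counter.make();
    Counter.make();
    assert(Counter.created == 2);
    assert(api(1) == 4);
}
```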

You probably want to learn about foreach loop. For starters, you can replace almost all for loops this way: foreach(i; 0 .. array1.length) array2[i] = array1[i];. There are even better ways to do the same, http://ddili.org/ders/d.en/foreach.html is a good introduction.

A word of warning about foreach though: it should not be used if the length of the array is going to change while iterating. In my example, array1.length is only calculated once, at the start of the loop, so shortening array1 in the body would lead to an out-of-bounds access.
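A small sketch of the foreach forms mentioned above, assuming two plain int arrays. The function names `copyByIndex`, `sumElements` and `dropLast` are hypothetical and only serve the example.

```d
// Copy with an index range. The upper bound is evaluated once,
// before the first iteration.
int[] copyByIndex(const int[] array1)
{
    auto array2 = new int[](array1.length);
    foreach (i; 0 .. array1.length)
        array2[i] = array1[i];
    return array2;
}

// Iterate over the elements directly when the index is not needed.
int sumElements(const int[] array1)
{
    int total;
    foreach (value; array1)
        total += value;
    return total;
}

// If the length has to change while looping, use a loop that rechecks
// the condition on every pass instead of foreach.
int[] dropLast(int[] array1, size_t keep)
{
    while (array1.length > keep)
        array1 = array1[0 .. $ - 1];
    return array1;
}

void main()
{
    assert(copyByIndex([10, 20, 30]) == [10, 20, 30]);
    assert(sumElements([10, 20, 30]) == 60);
    assert(dropLast([10, 20, 30], 1) == [10]);
}
```

For a plain copy, the slice assignment `array2[] = array1[];` does the same job without a loop, provided both arrays have the same length.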

static foreach has a lot of potential to shorten your assembler code. For example, instead of

if (op == "nop")
    nop();
else if (op == "lxi")
    lxi();
else if (op == "stax")
    stax();
else if (op == "inx")
    inx();
else if (op == "inr")
    inr();
else if (op == "dcr")
    dcr();
else if (op == "mvi")
    mvi();
<...>
else
    err("unknown opcode: " ~ op);

it's better to write

sw: switch (op)
{   //iterates over the array at compile time
    static foreach(opStr; ["nop", "lxi", "stax", "inx", "inr", "dcr", "mvi", <...>])
    {	case opStr:
        mixin(opStr)(); //inserts a call to function with name opStr.
        break sw; //breaks out of the switch statement
    }

    default: err("unknown opcode: " ~ op);
}

Even better is if you make a global array of opCodes and pass that to static foreach. The opcode array needs to be available in compile time, so mark it enum instead of immutable, like: enum opcodes = ["nop", "lxi", "stax", <...>];
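Put together, the suggestion might look like the sketch below. It is cut down to three opcodes, and the handler bodies, the `lastCall` variable and the `dispatch` function are placeholders invented for the example; the real assembler's handlers do actual work and report errors through `err`.

```d
// Compile-time list of mnemonics. `enum` makes the array usable
// by `static foreach`.
enum opcodes = ["nop", "lxi", "stax"];

// Placeholder handlers that only record which one ran.
string lastCall;
void nop()  { lastCall = "nop"; }
void lxi()  { lastCall = "lxi"; }
void stax() { lastCall = "stax"; }

// Returns false for an unknown opcode instead of calling err().
bool dispatch(string op)
{
    sw: switch (op)
    {
        // Unrolled at compile time: one `case` per entry in opcodes.
        static foreach (opStr; opcodes)
        {
            case opStr:
                mixin(opStr)(); // calls the function named by opStr
                break sw;       // break inside static foreach needs a label
        }

        default:
            return false;
    }
    return true;
}

void main()
{
    assert(dispatch("lxi") && lastCall == "lxi");
    assert(!dispatch("bogus"));
}
```

Adding an instruction then means writing its handler and appending its name to `opcodes`; the switch itself does not change.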

> As an unrelated aside: I'm giving a talk about all the different languages I have helped port to OpenBSD (about 40 or so that I can remember as of now).

Wow, that's a lot! Congratulations!

April 20, 2021

On Tuesday, 20 April 2021 at 10:02:05 UTC, Dukc wrote:

> TlDr: private means the symbol can only be used in the same file.

That's clearly what I was looking for, thanks.

> You probably want to learn about foreach loop.

Oh, yes. I certainly know what a foreach loop is. D is hardly the only language to provide it. I actively chose against using it; long boring story short, it's the result of some self-bias I have from many years on a particular kind of coding research team I used to be on as a grad student. Looking back at the finished code, it was a decision that didn't bear out as I was hoping. As there are only 12 for loops (and no other types of loops, again, on purpose) in the whole assembler, that's probably a design decision that I would not have selected if I were going to start over. In fact, the new parser abandons the idea already.

So I agree: converting all the for loops to foreach loops would be a nice additional blog post. Thanks.

> static foreach has a lot of potential to shorten your assembler code.
>
> sw: switch (op)
> {   //iterates over the array at compile time
>     static foreach(opStr; ["nop", "lxi", "stax", "inx", "inr", "dcr", "mvi", <...>])
>     {	case opStr:
>         mixin(opStr)(); //inserts a call to function with name opStr.
>         break sw; //breaks out of the switch statement
>     }
>
>     default: err("unknown opcode: " ~ op);
> }
>
> Even better is if you make a global array of opCodes and pass that to static foreach. The opcode array needs to be available in compile time, so mark it enum instead of immutable, like: enum opcodes = ["nop", "lxi", "stax", <...>];

I should spend some time looking at mixins. The rest you have there makes intuitive sense looking at it. (Crazy question: is there a way to dump internal state after semantic analysis?)

Looking at the mixins page here: https://dlang.org/articles/mixin.html, I am already disappointed to learn about the monkey business it won't let me do :) I was really hoping to radically alter the syntax of D to create my own language and then implement an entirely different language in that (I kid, but only slightly. That's exactly what Arthur Whitney does in his language development: K being a good example of this.)

~Brian

April 20, 2021

On Tuesday, 20 April 2021 at 13:38:37 UTC, Brian wrote:

> [..]
>
> Looking at the mixins page here: https://dlang.org/articles/mixin.html, I am already disappointed to learn about the monkey business it won't let me do :) I was really hoping to radically alter the syntax of D to create my own language and then implement an entirely different language in that (I kid, but only slightly. That's exactly what Arthur Whitney does in his language development: K being a good example of this.)
>
> ~Brian

You can almost do what you want, which may be sufficient for your needs :P

Have a look at these two projects as general examples (Pegged and vibe.d):

Basically, you can embed a DSL with arbitrary syntax and semantics in a D program, just as long as the DSL code is encapsulated in a string mixin. So, you can either have every D file be just a big string mixin - mixin(myDSL("<lots of code here>")) (see Pegged), or you can put the DSL code as separate files and then string-import it at compile-time from the rest of the D code (see vibe.d).
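As a rough illustration of the idea (much smaller than what Pegged or vibe.d do), the sketch below translates a made-up "name = value" mini-language into D declarations at compile time. The `myDSL` function here and the DSL syntax are invented for the example and are not taken from either project.

```d
import std.string : indexOf, splitLines, strip;

// Hypothetical translator: turns each "name = value" line into a
// D declaration. It is an ordinary function that the compiler runs
// at compile time (CTFE) because its result is passed to mixin().
string myDSL(string source)
{
    string code;
    foreach (line; source.splitLines)
    {
        const eq = line.indexOf('=');
        if (eq < 0)
            continue; // skip blank lines and lines without '='
        code ~= "enum int " ~ line[0 .. eq].strip
              ~ " = " ~ line[eq + 1 .. $].strip ~ ";\n";
    }
    return code;
}

// The DSL text sits in a string literal and is mixed in as D code.
mixin(myDSL(`
    width = 80
    height = 24
`));

// The generated declarations are normal D symbols.
static assert(width * height == 1920);

void main()
{
    assert(myDSL("a = 1") == "enum int a = 1;\n");
}
```

To keep the DSL in its own file, the string literal could be replaced by an import expression such as `import("settings.dsl")` (a hypothetical file name), which needs the directory to be passed to the compiler with `-J`.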

April 20, 2021

On Tuesday, 20 April 2021 at 13:38:37 UTC, Brian wrote:

> I should spend some time looking at mixins. The rest you have there makes intuitive sense looking at it. (Crazy question: is there a way to dump internal state after semantic analysis?)

-vcg-ast will dump the AST after semantic analysis.

April 20, 2021

On Tuesday, 20 April 2021 at 13:38:37 UTC, Brian wrote:

> > static foreach has a lot of potential to shorten your assembler code.
> >
> > sw: switch (op)
> > {   //iterates over the array at compile time
> >     static foreach(opStr; ["nop", "lxi", "stax", "inx", "inr", "dcr", "mvi", <...>])
> >     {	case opStr:
> >         mixin(opStr)(); //inserts a call to function with name opStr.
> >         break sw; //breaks out of the switch statement
> >     }
> >
> >     default: err("unknown opcode: " ~ op);
> > }
> >
> > Even better is if you make a global array of opCodes and pass that to static foreach. The opcode array needs to be available in compile time, so mark it enum instead of immutable, like: enum opcodes = ["nop", "lxi", "stax", <...>];
>
> I should spend some time looking at mixins. The rest you have there makes intuitive sense looking at it. (Crazy question: is there a way to dump internal state after semantic analysis?)

Not sure if that's quite what you want, but you can use pragma(msg, typeOrValueKnownAtCompileTime) to print stuff at CT.

The following program:

void main()
{
    string op;
    sw: switch (op)
    {
        static foreach (opStr; ["nop", "lxi", "stax", "inx", "inr", "dcr", "mvi"])
        {
            case opStr:
                pragma(msg, "Inside case: '", opStr, "'");
                break sw;
        }
        default: return;
    }
}

Prints during compilation:

Inside case: 'nop'
Inside case: 'lxi'
Inside case: 'stax'
Inside case: 'inx'
Inside case: 'inr'
Inside case: 'dcr'
Inside case: 'mvi'

(Try online: https://run.dlang.io/is/czbSXe)

If you want to read an in-depth explanation regarding the fine distinction between static compile-time code and "dynamic" compile-time code, you can check this wiki page:
https://wiki.dlang.org/User:Quickfur/Compile-time_vs._compile-time

P.S. As Paul mentioned, you can also use the -vcg-ast compiler switch (not sure if other compilers than dmd support it): https://run.dlang.io/is/AHHwAs

April 20, 2021

On Tuesday, 20 April 2021 at 14:52:08 UTC, Petar Kirov [ZombineDev] wrote:

> You can almost do what you want, which may be sufficient for your needs :P
>
> Have a look at these two projects as general examples:
>
> Basically, you can embed a DSL with arbitrary syntax and semantics in a D program, just as long as the DSL code is encapsulated in a string mixin. So, you can either have every D file be just a big string mixin - mixin(myDSL("<lots of code here>")) (see Pegged), or you can put the DSL code as separate files and then string-import it at compile-time from the rest of the D code (see vibe.d).

Oh, that was definitely tongue-in-cheek to the point of being obnoxious on my part :)
I was amused that the mixins page specifically called out that specific potential as monkey business. But I appreciate the links.

~Brian

April 20, 2021

On Tuesday, 20 April 2021 at 15:26:28 UTC, Petar Kirov [ZombineDev] wrote:

> Not sure if that's quite what you want, but you can use pragma(msg, typeOrValueKnownAtCompileTime) to print stuff at CT.

I suppose what I want to do is traverse the compiler's transformation of the mixin. The mixin page suggests it performs that work at semantic evaluation time.

~Brian
