Migrating dmd to D? (page 17) - D Programming Language Discussion Forum

March 03, 2013

Re: Migrating dmd to D?

Posted by Zach the Mystic
in reply to Daniel Murphy

On Sunday, 3 March 2013 at 07:27:51 UTC, Daniel Murphy wrote:
>> Now you're as up-to-date as I am on what I'm thinking.
>
> I did something like that before (token-level pattern matching) and found
> the number of special cases to be much much too high.  You need so much
> context information you're better off just building an ast and operating on
> that.

What were the biggest and most common reasons you needed context information?

March 03, 2013

Re: Migrating dmd to D?

Posted by Dicebot
in reply to Walter Bright

On Sunday, 3 March 2013 at 05:48:30 UTC, Walter Bright wrote:
> ...
> Yes, and those will fail to link.

OK, I checked this out. While it is cool that you can strip out all the fat and get a hello world binary the same size as a plain C one, the resulting language is actually less usable than C (array literals, for instance) and lacks my main reason for using D (templates and friends). Maybe it could be paired with a custom runtime rewritten from scratch to create something usable. I'd argue that templates, at least, should not require runtime support at all.

March 04, 2013

Re: Migrating dmd to D?

Posted by Daniel Murphy
in reply to Zach the Mystic

"Zach the Mystic" <reachBUTMINUSTHISzach@gOOGLYmail.com> wrote in message news:kidboshnjpowpyqrtwjl@forum.dlang.org...
> On Sunday, 3 March 2013 at 07:27:51 UTC, Daniel Murphy wrote:
>>> Now you're as up-to-date as I am on what I'm thinking.
>>
>> I did something like that before (token-level pattern matching) and found
>> the number of special cases to be much much too high.  You need so much
>> context information you're better off just building an ast and operating
>> on
>> that.
>
> What were the biggest and most common reasons you needed context information?

Turning implicit into explicit conversions.  A big one is 0 -> Loc(0). dinteger_t -> size_t.  void* -> char*.  string literal to char*.  string literal to unsigned char*.  unsigned -> unsigned char.  int -> bool.

March 04, 2013

Re: Migrating dmd to D?

Posted by Iain Buclaw
in reply to Daniel Murphy

On Mar 4, 2013 2:41 AM, "Daniel Murphy" <yebblies@nospamgmail.com> wrote:
>
> "Zach the Mystic" <reachBUTMINUSTHISzach@gOOGLYmail.com> wrote in message news:kidboshnjpowpyqrtwjl@forum.dlang.org...
> > On Sunday, 3 March 2013 at 07:27:51 UTC, Daniel Murphy wrote:
> >>> Now you're as up-to-date as I am on what I'm thinking.
> >>
> >> I did something like that before (token-level pattern matching) and found
> >> the number of special cases to be much much too high.  You need so much
> >> context information you're better off just building an ast and operating
> >> on that.
> >
> > What were the biggest and most common reasons you needed context information?
>
> Turning implicit into explicit conversions.  A big one is 0 -> Loc(0). dinteger_t -> size_t.  void* -> char*.  string literal to char*.  string literal to unsigned char*.  unsigned -> unsigned char.  int -> bool.
>
>

All look fine except for dinteger_t, which should be -> long (it should always be the widest integer type supported by the host, e.g. longlong).

Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

March 04, 2013

Re: Migrating dmd to D?

Posted by David Nadlinger
in reply to SomeDude

On Saturday, 2 March 2013 at 14:55:19 UTC, SomeDude wrote:
> On Saturday, 2 March 2013 at 14:47:55 UTC, David Nadlinger wrote:
>> You simply use another host system (e.g. Windows/Linux x86) until the new backend/runtime is stable enough for the compiler to self-host.
>>
>> David
>
> And what if you *don't* have a cross compiler? You compile the D subset (bootstrapper) in C and off you go (provided you have a reasonable C compiler on that platform).

I think you are misunderstanding something here.

You need a backend for the new platform anyway for a D compiler on it to be of any use. Or do you see building x86 D binaries on <fancy_new_architecture> as an important use case?

David

March 04, 2013

Re: Migrating dmd to D?

Posted by Daniel Murphy
in reply to Iain Buclaw

"Iain Buclaw" <ibuclaw@ubuntu.com> wrote in message news:mailman.215.1362390328.14496.digitalmars-d@puremagic.com...
>
> All look fine except for dinteger_t, which should be -> long (it should always be the widest integer type supported by the host eg: longlong.
>
> Regards
> -- 
> Iain Buclaw
>
> *(p < e ? p++ : p) = (c & 0x0f) + '0';
>

I know, it's nasty, but dmd does this _everywhere_.  Expression::toInteger returns dinteger_t, and the result is then used to index arrays, set offsets, etc.

I'm now using a modified compiler that accepts all these conversions, doesn't error on variable shadowing, and lets you compare pointers with null without 'is'.

I've managed to process, compile and link the frontend.  Next comes root, then glue.

March 04, 2013

Re: Migrating dmd to D?

Posted by David Nadlinger
in reply to Andrei Alexandrescu

On Thursday, 28 February 2013 at 15:24:07 UTC, Andrei Alexandrescu wrote:
> On 2/28/13 5:03 AM, deadalnix wrote:
>> That will impair GDC and LDC quite a lot.
>
> Let's see what the respective project leaders say.

Well, let me first emphasize that I agree that having the D reference implementation written in D is a desirable goal for a number of reasons, such as those outlined by Andrei in his initial post. I am not sure whether using DMD as a basis is the ideal approach as far as the ultimate outcome is concerned, but it certainly has its merits considering the limited time budget.

That being said, moving parts of the front-end source to D will in any case cause quite a bit of minor work all over the place for LDC (porting LDC-specific changes, adapting the build system, ...), and I would be glad if somebody new could take this as an opportunity to join LDC development, as the time that Kai and I (the current main contributors) can spend on LDC right now is unfortunately rather limited anyway.

Apart from such minor effects, I only really see two possible issues to be aware of:

First, requiring a D compiler to build LDC will make life harder for people preparing distribution packages, at least for packages in the actual upstream repositories where the packages usually have to be buildable from source (with dependencies also being met out of the distro's repositories). This is not at all an unsolvable issue, but the migration should be coordinated with the packaging crowd to ensure a smooth transition.

In this regard, we should also make sure that the front-end (and thus GDC and LDC) can be bootstrapped from a Free/OSS D compiler; otherwise integration of GDC/LDC into Debian, Fedora, ... might become a problem. Not that this should be a huge issue with GDC and LDC being around, but I thought I would mention it.

Second, rewriting all of *LDC's* code in D would be a huge task, as the use of C++ templates is pervasive through the LLVM C++ API (even if they are used pretty judiciously), and the LLVM C API is a lot less powerful in some aspects. Thus, care should be taken that the D frontend can actually be used with some of the virtual method implementations still in C++ (e.g. toElem/toElemDtor and similar LDC-specific ones).

Your (Andrei's) initial post sounded like this would be the case. But if I interpreted some of the posts correctly, Daniel Murphy has an automatic translator in the works for porting over the whole compiler (except for the backend) at once, which might be a problem for LDC.

David

March 04, 2013

Re: Migrating dmd to D?

Posted by Zach the Mystic
in reply to Daniel Murphy

On Monday, 4 March 2013 at 02:36:23 UTC, Daniel Murphy wrote:
>> What were the biggest and most common reasons you needed context information?
>
> Turning implicit into explicit conversions.  A big one is 0 -> Loc(0).
> dinteger_t -> size_t.  void* -> char*.  string literal to char*.  string
> literal to unsigned char*.  unsigned -> unsigned char.  int -> bool.

I would like to play devil's advocate myself, at least on 0 -> Loc(0).

I found that in the source, the vast, vast majority of Loc instances were named, of course, 'loc'. Of the few other ones, only 'endloc' was ever assigned to 0. The token matcher could substitute:

'loc = 0' -> 'loc = Loc(0)'
'endloc = 0' -> 'endloc = Loc(0)'

As long as it had a list of dmd's AST classes, a pretty conservative attempt to knock out a huge number of additional cases is:
'new DmdClassName(0' -> 'new DmdClassName(Loc(0)'

The core principle of the naive approach is to take advantage of project-specific conventions, such as always passing the Loc first. The more uniformly the project has been implemented, the more likely this approach is to work.
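A minimal sketch of the two substitutions above, using `std::regex`. It assumes exactly the conventions described (Loc variables named `loc` or `endloc`, Loc always passed first to AST constructors); the function `rewriteLocZero` and the `astClasses` list are hypothetical illustrations, not part of any existing tool.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper applying the naive 0 -> Loc(0) rules to C++ source text.
// Assumes Loc variables are named 'loc' or 'endloc' and that every class in
// 'astClasses' takes a Loc as its first constructor argument.
std::string rewriteLocZero(const std::string& src,
                           const std::vector<std::string>& astClasses) {
    // 'loc = 0' -> 'loc = Loc(0)', 'endloc = 0' -> 'endloc = Loc(0)'.
    // The trailing \b keeps literals such as 0x10 from matching,
    // and 'loc == 0' does not match because '=' must be followed by 0.
    static const std::regex locAssign(R"(\b(loc|endloc)\s*=\s*0\b)");
    std::string out = std::regex_replace(src, locAssign, "$1 = Loc(0)");

    // 'new DmdClassName(0' -> 'new DmdClassName(Loc(0)' for listed classes only.
    for (const std::string& cls : astClasses) {
        const std::regex ctorCall("\\bnew\\s+" + cls + "\\s*\\(\\s*0\\b");
        out = std::regex_replace(out, ctorCall, "new " + cls + "(Loc(0)");
    }
    return out;
}
```

For example, with `astClasses = {"IntegerExp"}` the line `e = new IntegerExp(0, 1, Type::tint32);` becomes `e = new IntegerExp(Loc(0), 1, Type::tint32);`, while `loc == 0` comparisons are left alone. Anything outside the two naming conventions falls through untouched, which is where the special cases start to accumulate.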

I do agree that a lot of those other implicit conversions seem daunting. The naive approach would require two features. The first is a basic way of tracking a variable's type. For example, it could have a list of known 'killer' types which cause problems. When it sees one, it records the next identifier it finds and associates it with that type for the rest of the function. It may then be slightly better able to recognize known patterns where conversion is desirable. The second feature would be a brute-force way of saying, "You meet pattern ZZZ: if in function XXX::YYY, replace it with WWW, else replace with UUU." This is clearly the point of diminishing returns for the naive approach, at which point I could only hope that a good abstraction could make up a lot of ground when necessary.
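The first feature, tracking variables of known 'killer' types, could be sketched roughly like this. The tokenizer is deliberately crude and the names (`trackKillerVars`, `killerTypes`) are hypothetical; it only illustrates the "bind the next identifier to that type for the rest of the function" idea, and assumes the caller hands it one function body at a time.

```cpp
#include <cctype>
#include <map>
#include <regex>
#include <set>
#include <string>

// Hypothetical sketch of the 'killer type' tracker: scan one function body and,
// whenever a token names a listed problem type, bind the next identifier to it.
// Returns identifier -> type for the rest of the function.
std::map<std::string, std::string> trackKillerVars(
        const std::string& functionBody,
        const std::set<std::string>& killerTypes) {
    std::map<std::string, std::string> varTypes;
    // Crude tokenizer: identifiers, or any single non-space character.
    static const std::regex token(R"([A-Za-z_]\w*|\S)");
    std::string pendingType;  // a killer type still waiting for its identifier
    for (std::sregex_iterator it(functionBody.begin(), functionBody.end(), token), end;
         it != end; ++it) {
        const std::string tok = it->str();
        const bool isIdent =
            std::isalpha(static_cast<unsigned char>(tok[0])) || tok[0] == '_';
        if (!pendingType.empty()) {
            if (isIdent) {
                varTypes[tok] = pendingType;  // e.g. 'dinteger_t n' binds n
                pendingType.clear();
            } else if (tok != "*") {
                pendingType.clear();  // e.g. a cast '(dinteger_t)': no variable
            }
            continue;
        }
        if (killerTypes.count(tok)) pendingType = tok;
    }
    return varTypes;
}
```

For instance, scanning `dinteger_t n = e->toInteger(); void *p = mem.malloc(n);` with killer types `dinteger_t` and `void` yields `n -> dinteger_t` and `p -> void`, while a cast such as `(dinteger_t)n` binds nothing. A later pass could consult the map when deciding where an explicit conversion is needed.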

The point of diminishing returns for the whole naive approach is reached when for every abstraction you add, you end up breaking as much code as you fix. Then you're stuck with the grunt work of adding special case after special case, and you might as well try something else at that point...

My current situation is that my coding skills lag behind my ability to have ideas, so I don't have anything regarding my approach up and running for comparison, but I want the conversation to be productive, so I'll give you the ideas I've had since yesterday.

I would start by creating a program which converts the source by class, one class at a time, with one file for each. It has a list of classes to convert, and a list of data, methods, and overrides for each class. It will only include what's on the list, so you can add classes and functions one step at a time. For each method or override, it records a file to find it in, and maybe a hint as to roughly where the function begins in that file.
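One way the per-class manifest could be represented, shown as a hedged sketch: `MethodSpec`, `ClassSpec` and `emitSkeleton` are hypothetical names, fields are assumed to be already written in D syntax, and only the skeleton-emitting part of the idea is shown (actually extracting method bodies from the listed files is left out).

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical manifest types for a class-at-a-time converter. Only what is
// listed here gets emitted, so classes and methods can be added incrementally.
struct MethodSpec {
    std::string name;  // method or override to port
    std::string file;  // C++ source file to look for it in
    int hintLine;      // rough starting line, or 0 if unknown
};

struct ClassSpec {
    std::string name;
    std::string base;                 // empty if the class has no base
    std::vector<std::string> fields;  // data members, assumed already in D syntax
    std::vector<MethodSpec> methods;
};

// Emits a D class skeleton for one manifest entry; method bodies are left as
// TODO markers pointing at where the C++ definition should be found.
std::string emitSkeleton(const ClassSpec& c) {
    std::ostringstream out;
    out << "class " << c.name;
    if (!c.base.empty()) out << " : " << c.base;
    out << "\n{\n";
    for (const std::string& f : c.fields) out << "    " << f << ";\n";
    for (const MethodSpec& m : c.methods) {
        out << "    // TODO: port " << m.name << " from " << m.file;
        if (m.hintLine > 0) out << " near line " << m.hintLine;
        out << "\n";
    }
    out << "}\n";
    return out.str();
}
```

A manifest entry for `OrOrExp` with base `BinExp` and one method `semantic` in `expression.c` would produce a `class OrOrExp : BinExp` skeleton containing a single TODO marker for that method.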

You may have already thought of these, but just to say them out loud, some more token replacements I was thinking of:

'SameName::SameName(...ABC...) : DifferentName(...XYZ...) {'
->
'this(...ABC...)
{
    super(...XYZ...);'

Standard reference semantics:
'DTreeClass *' -> 'DTreeClass'

Combined, they look like this:
'OrOrExp::OrOrExp(Loc loc, Expression *e1, Expression *e2)
        : BinExp(loc, TOKoror, sizeof(OrOrExp), e1, e2)
{'
->
'this(Loc loc, Expression e1, Expression e2)
{
    super(loc, TOKoror, sizeof(OrOrExp), e1, e2);'
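The constructor and pointer rewrites above can be expressed as two regex passes. This is a sketch under stated assumptions: a constructor has exactly one base-class initializer, its parameter list contains no parentheses, and `refClasses` (a hypothetical list of class names) tells the second pass which `Class *` spellings to turn into plain references. `rewriteCtor` is likewise a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical sketch of the two token rewrites, applied in order:
// 1. 'X::X(params) : Base(args) {' -> 'this(params)\n{\n    super(args);'
// 2. 'Class *' -> 'Class ' for each class name in 'refClasses'
std::string rewriteCtor(const std::string& src,
                        const std::vector<std::string>& refClasses) {
    // \1 requires qualifier and function name to match (i.e. a constructor).
    // Group 3 is greedy up to the last ')' before '{', so nested parentheses
    // such as sizeof(OrOrExp) survive. Assumes a single base initializer.
    static const std::regex ctor(
        R"((\w+)::\1\s*\(([^)]*)\)\s*:\s*\w+\s*\(([^{]*)\)\s*\{)");
    std::string out =
        std::regex_replace(src, ctor, "this($2)\n{\n    super($3);");

    // D classes are reference types, so 'Expression *e1' becomes 'Expression e1'.
    for (const std::string& cls : refClasses) {
        const std::regex ptr("\\b" + cls + "\\s*\\*\\s*");
        out = std::regex_replace(out, ptr, cls + " ");
    }
    return out;
}
```

Applied to the `OrOrExp` constructor above with `refClasses = {"Expression"}`, the two passes produce exactly the D text shown. Note that `sizeof(OrOrExp)` is carried over verbatim and would still need separate handling, since D has no `sizeof(...)` operator.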

March 04, 2013

Re: Migrating dmd to D?

Posted by SomeDude
in reply to David Nadlinger

On Monday, 4 March 2013 at 13:40:27 UTC, David Nadlinger wrote:
> I think you are misunderstanding something here.
>
> You need a backend for the new platform anyway for a D compiler on it to be of any use. Or do you envision building x86 D binaries on <fancy_new_architecture> to be an important use case?
>
> David

Oh, OK. I guess I was assuming the gcc backend, which has been ported to several platforms.

March 05, 2013

Re: Migrating dmd to D?

Posted by Daniel Murphy
in reply to Zach the Mystic

"Zach the Mystic" <reachBUTMINUSTHISzach@gOOGLYmail.com> wrote in message news:oxcqgprnwnsuzngfijyg@forum.dlang.org...
>
> I would like to play devil's advocate myself, at least on 0 -> Loc(0).
>
> I found that in the source, the vast, vast majority of Loc instances were named, of course, 'loc'. Of the few other ones, only 'endloc' was ever assigned to 0. The token matcher could substitute:
>
> 'loc = 0' -> 'loc = Loc(0)'
> 'endloc = 0' -> 'endloc = Loc(0)'
>

This is fairly rare.

> As long as it had a list of the D's AST classes, a pretty conservative
> attempt to knock out a huge number of additional cases is:
> 'new DmdClassName(0' -> 'new DmdClassName(Loc(0)'
>

Yes, this mostly works, and is exactly what I did in a previous attempt.

> The core principle with the naive approach is to take advantage of specific per-project conventions such as always giving the Loc first. The more uniformity with which the project has been implemented, the more likely this approach will work.
>
> A lot of those other implicit conversions I do agree seem daunting. The naive approach would require two features, one, a basic way of tracking a variable's type. For example, it could have a list of known 'killer' types which cause problems. When it sees one it records the next identifier it finds and associates it to that type for the rest of the function. It may then be slightly better able to known patterns where conversion is desirable. The second feature would be a brute force way of saying, "You meet pattern ZZZ: if in function XXX::YYY, replace it with WWW, else replace with UUU." This is clearly the point of diminishing returns for the naive approach, at which point I could only hope that a good abstraction could make up a lot of  ground when found necessary.
>

My experience was that you don't need to explicitly track which function you are in, just keeping track of the file and matching a longer pattern is enough.

Here is one of the files of patterns I made: http://dpaste.dzfl.pl/3c9be703 Obviously this could be shorter with a DSL, and towards the end I started using a less verbose SM + DumpOut approach.
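A minimal sketch of rules scoped by file rather than by function, assuming plain literal matching: `Rule`, `applyFileRules`, and the sample rule described below are hypothetical, and the linked pattern file may well be organised differently.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of file-scoped literal rules: instead of tracking which
// function the matcher is in, each rule is keyed by source file and made long
// enough to be unambiguous within that file.
struct Rule {
    std::string from;  // exact C++ text to find
    std::string to;    // replacement text
};

std::string applyFileRules(const std::string& file, std::string src,
                           const std::map<std::string, std::vector<Rule>>& rules) {
    const auto it = rules.find(file);
    if (it == rules.end()) return src;  // no rules for this file
    for (const Rule& r : it->second) {
        if (r.from.empty()) continue;
        // Resume searching after the inserted text so a replacement that
        // contains its own pattern cannot loop forever.
        for (std::string::size_type pos = src.find(r.from); pos != std::string::npos;
             pos = src.find(r.from, pos + r.to.size())) {
            src.replace(pos, r.from.size(), r.to);
        }
    }
    return src;
}
```

With a single rule registered for `expression.c` that rewrites `dim = e->toInteger();` to `dim = cast(size_t)e->toInteger();`, the same text in any other file is left unchanged, so the file name plus a long enough pattern does the disambiguation that function tracking would otherwise do.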

> The point of diminishing returns for the whole naive approach is reached when for every abstraction you add, you end up breaking as much code as you fix. Then you're stuck with the grunt work of adding special case after special case, and you might as well try something else at that point...
>

Yeah...

> My current situation is that my coding skills will lag behind my ability to have ideas, so I don't have anything rearding my approach up and running for comparison, but I want the conversation to be productive, so I'll give you the ideas I've had since yesterday.
>
> I would start by creating a program which converts the source by class, one class at a time, and one file for each. It has a list of classes to convert, and a list of data, methods, and overrides for each class - it will only include what's on the list, so you can add classes and functions one step at a time. For each method or override, a file to find it in, and maybe a hint as to about where the function begins in said file.
>

That is waaaay too much information to gather manually.  There are a LOT of classes and functions in dmd.

> You may have already thought of these, but just to say them out loud, some more token replacements I was thinking of:
>
> 'SameName::SameName(...ABC...) : DifferentName(...XYZ...) {'
> ->
> 'this(...ABC...)
> {
>     super(...XYZ...);'
>
> Standard reference semantics:
> 'DTreeClass *' -> 'DTreeClass'
>
> Combined, they look like this:
> 'OrOrExp::OrOrExp(Loc loc, Expression *e1, Expression *e2)
>         : BinExp(loc, TOKoror, sizeof(OrOrExp), e1, e2)
> {'
> ->
> 'this(Loc loc, Expression e1, Expression e2)
> {
>     super(loc, TOKoror, sizeof(OrOrExp), e1, e2);'
>

Like I said, I went down this path before and made some progress.  It resulted in a huge list of cases.
My second attempt was to 'parse' C++, recognising preprocessor constructs as regular ones.  The frequent use of #ifdef blocks that cut through the middle of expressions makes this very, very difficult.
So my current approach is to filter out the preprocessor conditionals first, before parsing.  #defines and #pragmas survive to parsing.

In short, doing this at the token level works, but because you're transforming syntax, not text, it's better to work on a syntax tree.
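A toy version of filtering preprocessor conditionals before parsing, to make the idea concrete. It is not the filter described in this post: it assumes the set of defined macros is known up front and that every condition is a single macro name or integer literal (anything else evaluates to false). As described, `#define` and `#pragma` lines pass through to the parser. `filterConditionals` is a hypothetical name.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: resolve #if/#ifdef/#ifndef/#elif/#else/#endif against a
// fixed set of defined macros, keeping only lines from active branches.
// #define, #pragma and all other lines are passed through unchanged.
// Assumption: a condition is a single macro name or an integer literal;
// anything more complex (e.g. 'defined(X) && Y') evaluates to false.
std::string filterConditionals(const std::string& src,
                               const std::set<std::string>& defined) {
    struct Frame {
        bool parentActive;  // was the enclosing region active?
        bool taken;         // has some branch of this #if chain been selected?
    };
    std::vector<Frame> stack;
    bool active = true;

    auto evalCond = [&defined](const std::string& cond) {
        if (!cond.empty() && std::isdigit(static_cast<unsigned char>(cond[0])))
            return std::stol(cond, nullptr, 0) != 0;  // base 0 also accepts 0x..
        return defined.count(cond) > 0;
    };

    std::istringstream in(src);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string directive, arg;
        words >> directive >> arg;
        if (directive == "#if" || directive == "#ifdef" || directive == "#ifndef") {
            bool cond = evalCond(arg);
            if (directive == "#ifndef") cond = !cond;
            stack.push_back({active, cond});
            active = active && cond;
        } else if (directive == "#elif" && !stack.empty()) {
            Frame& f = stack.back();
            const bool cond = !f.taken && evalCond(arg);
            active = f.parentActive && cond;
            f.taken = f.taken || cond;
        } else if (directive == "#else" && !stack.empty()) {
            active = stack.back().parentActive && !stack.back().taken;
        } else if (directive == "#endif" && !stack.empty()) {
            active = stack.back().parentActive;
            stack.pop_back();
        } else if (active) {
            out << line << '\n';
        }
    }
    return out.str();
}
```

Given `#ifdef X` / `#else` / `#endif` around two lines, it keeps the first line when `X` is in the set and the second otherwise; conditionals nested inside an inactive branch stay inactive because each frame remembers whether its parent region was active.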


Copyright © 1999-2021 by the D Language Foundation