Potential of a compiler that creates the executable at once (page 3)

February 11, 2022

Re: Potential of a compiler that creates the executable at once

Posted by rempas
in reply to Walter Bright

On Thursday, 10 February 2022 at 20:39:33 UTC, Walter Bright wrote:
> This is actually the reason behind why dmd will create a single object file when given multiple source files on the command line. It's also why dmd can create a library directly.
>
> I've toyed with the idea of generating an executable directly many times.

That's nice to hear! However, does DMD generate object files directly, or "asm" files that are passed to a C compiler? If I remember correctly, people told me that LDC2 needs to pass its output to a C compiler, so what is the case for DMD?

I tried compiling a C library with both GCC and DMD (for DMD, the code was converted to D rather than using "ImportC"). It turns out that DMD is about 70-80% faster than GCC, which is good, but I would have expected it to do even better given the design of D as a language, especially if DMD outputs object files directly.

Do you think there are any particularly bad parts in DMD's backend? Has anyone on the team thought about rewriting the backend (or parts of it) from scratch?

February 11, 2022

Re: Potential of a compiler that creates the executable at once

Posted by sfp
in reply to Walter Bright

On Friday, 11 February 2022 at 06:33:20 UTC, Walter Bright wrote:
> On 2/10/2022 7:45 PM, max haughton wrote:
>> If by hook you mean a callback of sorts that can be overridden, then the problem solved is not strictly the same as a weakly defined function. If you have multiple libraries in the same playpen then it simply doesn't work to have them all trying to override the same symbols. If they can neatly hook and unhook things that goes away.
>
> That's not how multiple libraries work.
>
> Suppose you have 3 libraries, A, B, and C. You have an object file X. The linker command is:
>
>     link X.obj A.lib B.lib C.lib
>
> X refers to "foo". All 4 define "foo". Which one gets picked?
>
>    X.foo
>
> That's it. There are no unresolved symbols to look for.
>
> Now, suppose only B and C define "foo". Which one gets picked?
>
>    B.foo
>
> because it is not in X. Then, A is looked at, and it is not in A. Then, B is looked at, and it is in B. C is not looked at because it is now resolved.
>
> It has nothing to do with weak definitions. It's simple: "foo" is referenced, so a definition has to be found. Look in the libraries in the order they are supplied to the linker.
>
> That's it.
>
> Want to not use the library definition? Define it yourself in X. No need for hooking. No need for anything clever at all. Just define it in your .obj file.
>
> ----
>
> Now suppose X.obj and Y.obj both define foo. Link with:
>
>     link X.obj Y.obj A.lib B.lib C.lib
>
> You get a message:
>
>     Multiple definition of "foo", found in X.obj and Y.obj
>
> because order does not matter for .obj files as far as symbols go. All the symbols in .obj files get added.

You have now successfully explained this to at least one programmer! :-) Very good explanation, and very simple mechanism indeed. Had no idea it worked this way.

Inspired by this, I did a little searching and found this blog post:

http://www.samanbarghi.com/blog/2014/09/05/how-to-wrap-a-system-call-libc-function-in-linux/

One of these days I should get around to learning all the things the toolchain can actually do for me!
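
For anyone who wants to play with the rules Walter describes, here is a minimal sketch of them as a toy model in Python. It is not how any real linker is implemented: `link`, `LinkError`, and the tuple layout are made up for illustration, libraries are treated as flat sets of symbols, and archive members that pull in further references are ignored.

```python
class LinkError(Exception):
    """Raised for duplicate or missing definitions (hypothetical name)."""


def link(objects, libraries):
    """Map each symbol to the file that supplies its definition.

    objects   -- list of (name, defined_symbols, referenced_symbols)
    libraries -- list of (name, defined_symbols), in command-line order
    """
    resolved = {}
    referenced = set()

    # All symbols from all object files are added, regardless of order,
    # so a symbol defined in two object files is an error.
    for name, defines, refers in objects:
        for sym in sorted(defines):
            if sym in resolved:
                raise LinkError(
                    f'Multiple definition of "{sym}", '
                    f"found in {resolved[sym]} and {name}")
            resolved[sym] = name
        referenced |= set(refers)

    # Libraries are consulted only for symbols that are still unresolved,
    # in the order they appear on the command line; the first match wins.
    for sym in sorted(referenced - set(resolved)):
        for name, defines in libraries:
            if sym in defines:
                resolved[sym] = name
                break
        else:
            raise LinkError(f'Undefined symbol "{sym}"')
    return resolved


# X refers to "foo"; only B and C define it, so B (listed first) supplies it.
libs = [("A.lib", set()), ("B.lib", {"foo"}), ("C.lib", {"foo"})]
print(link([("X.obj", set(), {"foo"})], libs)["foo"])

# X defines "foo" itself, so the libraries are never consulted for it.
print(link([("X.obj", {"foo"}, {"foo"})], libs)["foo"])
```

Passing two object files that both define `foo` raises `LinkError` with a message in the same style as the one quoted above, because object files are never subject to the "first one wins" rule.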

February 11, 2022

Re: Potential of a compiler that creates the executable at once

Posted by rempas
in reply to Dave P.

On Thursday, 10 February 2022 at 22:06:30 UTC, Dave P. wrote:

> I think it would be interesting to combine a compiler and a linker into a single executable. Not necessarily for speed reasons, but for better diagnostics and the possibility of type checking external symbols. Linker errors can sometimes be hard to understand in the presence of inlining and optimizations. The linker will report references to symbols not present in your code or present in completely different places.
>
> For example:
>
>     extern(D) int some_func(int x);
>
>     pragma(inline, true)
>     private int foo(int x){
>         return some_func(x);
>     }
>
>     pragma(inline, true)
>     private int bar(int x){
>         return foo(x);
>     }
>
>     pragma(inline, true)
>     private int baz(int x){
>         return bar(x);
>     }
>
>     pragma(inline, true)
>     private int qux(int x){
>         return baz(x);
>     }
>
>     int main(){
>         return qux(2);
>     }
>
> When you go to compile it:
>
>     Undefined symbols for architecture arm64:
>       "__D7example9some_funcFiZi", referenced from:
>           __D7example3fooFiZi in example.o
>           __D7example3barFiZi in example.o
>           __D7example3bazFiZi in example.o
>           __D7example3quxFiZi in example.o
>           __Dmain in example.o
>     ld: symbol(s) not found for architecture arm64
>     clang: error: linker command failed with exit code 1 (use -v to see invocation)
>     Error: /usr/bin/cc failed with status: 1
>
> The linker sees references to the extern function in places where I never wrote that in my source code. In a nontrivial project this can be quite confusing if you’re not used to this quirk of the linking process.
>
> If the compiler is invoking the linker for you anyway, why can’t it read the object files and libraries and tell you exactly what is missing and where in your code you reference it?

Yeah, error messages could ALWAYS be better in any compiler (even rustc) at any time. This design would make that even easier to do, as you explained. Thank you!
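
To make the idea from the quoted post a bit more concrete, here is a toy sketch in Python of the kind of check a combined compiler and linker could run. Everything in it is an assumption for illustration: the `Reference` record, the `report_missing` function, the message wording, and the premise that the compiler keeps the source location of each external reference from before inlining. No existing D compiler is claimed to work this way.

```python
from collections import namedtuple

# Hypothetical record: where the programmer actually wrote the reference.
Reference = namedtuple("Reference", "symbol file line function")


def report_missing(references, defined_symbols):
    """Return one message per undefined symbol, pointing at source code.

    references      -- Reference records collected before inlining
    defined_symbols -- symbols provided by the object files and libraries
    """
    messages = []
    seen = set()
    for ref in references:
        if ref.symbol in defined_symbols or ref.symbol in seen:
            continue
        seen.add(ref.symbol)  # report each missing symbol only once
        messages.append(
            f"{ref.file}({ref.line}): Error: `{ref.symbol}` is called in "
            f"`{ref.function}` but no object file or library defines it")
    return messages


# In the quoted example, only foo() calls some_func() in the source text.
refs = [Reference("example.some_func", "example.d", 5, "foo")]
for msg in report_missing(refs, defined_symbols={"example.foo", "_Dmain"}):
    print(msg)
```

The point of the sketch is only that the report names `foo`, the one place the call was written, instead of every function the call was inlined into.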

February 11, 2022

Re: Potential of a compiler that creates the executable at once

Posted by rempas
in reply to Era Scarecrow

On Friday, 11 February 2022 at 04:18:42 UTC, Era Scarecrow wrote:

> I believe most of the compiler's code base involves optimization for various architectures and CPU versions, along with cross-compiling.

Yeah, but when I don't cross-compile, I only compile for one OS and one instruction set. Code for the other cases will not get executed, so I cannot see how this can play a role. TCC also supports a lot of architectures and operating systems (even Windows natively, if I'm not wrong). Unless I don't understand what you mean...

> GNU/GCC has tons of legacy code in the back that it still uses, I believe.

Yeah, that's the problem we will never be able to solve. New and better practices will always be invented, so to get the best possible performance we must always rewrite stuff (or parts of it). In the case of big compilers this will be a pain in the ass, and I understand that...

> To note, back in 1996 or thereabouts I wrote an assembler that took x86 and could compile itself. But it wasn't compatible with any other code and couldn't use object files or anything (as it was all made from scratch when I was 12-14). However, it did compile directly to a COM file. I'll just say from experience, there are advantages but they don't outweigh the disadvantages. That's my flat opinion going from here.

I wonder what we can do to keep the advantages and take away the disadvantages. The second idea I had is probably the answer, but I would like someone to comment on it directly. Thank you for your time!

February 11, 2022

Re: Potential of a compiler that creates the executable at once

Posted by max haughton
in reply to rempas

On Friday, 11 February 2022 at 12:34:21 UTC, rempas wrote:
> On Thursday, 10 February 2022 at 20:39:33 UTC, Walter Bright wrote:
>> [...]
>
> That's nice to hear! However, does DMD generate object files directly, or "asm" files that are passed to a C compiler? If I remember correctly, people told me that LDC2 needs to pass its output to a C compiler, so what is the case for DMD?
>
> [...]

The object emission code in the backend is quite inefficient; it needs to be rewritten (it's horrible old code anyway).

February 11, 2022

Re: Potential of a compiler that creates the executable at once

Posted by rempas
in reply to max haughton

On Friday, 11 February 2022 at 14:52:09 UTC, max haughton wrote:
>
> The object emission code in the backend is quite inefficient, it needs to be rewritten (it's horrible old code anyway)

I would love it if they did, but I can't complain that they don't. Openhub reports that DMD consists of 961K LoC!! I know that D is a huge language, so the frontend will be a good part of that, and that the code for some other stuff (including a lot of the backend) will probably not change. But this is still A LOT to do!

Maybe they can do that for D 3.0, along with removing the need for the GC in order to use Phobos (and giving the ability to turn it off in the compiler). Then I can see D becoming as big as it was intended to be! But dreams are free...

February 11, 2022

Re: Potential of a compiler that creates the executable at once

Posted by Dennis
in reply to rempas

On Friday, 11 February 2022 at 12:34:21 UTC, rempas wrote:

> That's nice to hear! However, does DMD generate object files directly, or "asm" files that are passed to a C compiler? If I remember correctly, people told me that LDC2 needs to pass its output to a C compiler, so what is the case for DMD?

DMD goes from its own backend block tree to an object file, without writing assembly. In fact, only recently was the ability to output asm added for debugging purposes:
https://dlang.org/blog/2022/01/24/the-binary-language-of-moisture-vaporators/

On Linux, dmd invokes gcc by default to create an executable, but only to link the resulting object files, not to compile C or assembly code.

LDC goes from LLVM IR to machine code, but it can output assembly with the -output-s flag.

GDC does generate assembly text in the tmp folder and then invokes gas, the GNU assembler; it can't write machine code directly.

February 11, 2022

Re: Potential of a compiler that creates the executable at once

Posted by user1234
in reply to rempas

On Friday, 11 February 2022 at 15:17:16 UTC, rempas wrote:
> On Friday, 11 February 2022 at 14:52:09 UTC, max haughton wrote:
>>
>> The object emission code in the backend is quite inefficient, it needs to be rewritten (it's horrible old code anyway)
>
> I would love if they would do it but I can't complain that they don't. Openhub reports that DMD consists of 961K LoC!!

Openhub and their metrics are old trash. It's more like 170K according to D-Scanner.

February 11, 2022

Re: Potential of a compiler that creates the executable at once

Posted by user1234
in reply to user1234

On Friday, 11 February 2022 at 16:41:33 UTC, user1234 wrote:
> On Friday, 11 February 2022 at 15:17:16 UTC, rempas wrote:
>> On Friday, 11 February 2022 at 14:52:09 UTC, max haughton wrote:
>>>
>>> The object emission code in the backend is quite inefficient, it needs to be rewritten (it's horrible old code anyway)
>>
>> I would love if they would do it but I can't complain that they don't. Openhub reports that DMD consists of 961K LoC!!
>
> Openhub and their metrics are old trash. It's more like 170K according to D-Scanner.

Wait... it's 175K. I had not pulled for 8 months or so. A lot of new code has been committed since then, notably ImportC.

February 11, 2022

Re: Potential of a compiler that creates the executable at once

Posted by max haughton
in reply to Walter Bright

On Friday, 11 February 2022 at 06:33:20 UTC, Walter Bright wrote:
> On 2/10/2022 7:45 PM, max haughton wrote:
>> If by hook you mean a callback of sorts that can be overridden, then the problem solved is not strictly the same as a weakly defined function. If you have multiple libraries in the same playpen then it simply doesn't work to have them all trying to override the same symbols. If they can neatly hook and unhook things that goes away.
>
> That's not how multiple libraries work.
>
> Suppose you have 3 libraries, A, B, and C. You have an object file X. The linker command is:
>
>     link X.obj A.lib B.lib C.lib
>
> X refers to "foo". All 4 define "foo". Which one gets picked?
>
>    X.foo
>
> That's it. There are no unresolved symbols to look for.
>
> Now, suppose only B and C define "foo". Which one gets picked?
>
>    B.foo
>
> because it is not in X. Then, A is looked at, and it is not in A. Then, B is looked at, and it is in B. C is not looked at because it is now resolved.
>
> It has nothing to do with weak definitions. It's simple: "foo" is referenced, so a definition has to be found. Look in the libraries in the order they are supplied to the linker.
>
> That's it.
>
> Want to not use the library definition? Define it yourself in X. No need for hooking. No need for anything clever at all. Just define it in your .obj file.
>
> ----
>
> Now suppose X.obj and Y.obj both define foo. Link with:
>
>     link X.obj Y.obj A.lib B.lib C.lib
>
> You get a message:
>
>     Multiple definition of "foo", found in X.obj and Y.obj
>
> because order does not matter for .obj files as far as symbols go. All the symbols in .obj files get added.

If all the libraries rely on hooking something, you will silently break all but one, whereas the process of overriding a runtime hook can be made into an atomic operation that can fail in a reasonable manner if wielded incorrectly.

Doing things based on the order at link-time is simply not good practice in the general case. It's OK if you control all the things in the stack and want to (say) override malloc, but controlling what happens on an assertion is exactly the kind of thing that resolution at link-time can make into a real nightmare to do cleanly (and mutably, you might want to catch assertions differently when acting as a web server than when loading data).
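
As a rough illustration of what "atomic and able to fail reasonably" could look like, here is a minimal sketch in Python. It is not druntime's API or anything Symmetry uses; `AssertHook`, `HookError`, and the handler names are hypothetical. The only point is that a second library trying to take over the hook gets an explicit error instead of silently winning or losing based on link order.

```python
import threading


class HookError(Exception):
    """Raised when a hook is installed or removed incorrectly."""


class AssertHook:
    """A single replaceable handler guarded by a lock (illustrative only)."""

    def __init__(self, default):
        self._default = default
        self._current = default
        self._lock = threading.Lock()

    def install(self, handler):
        """Install handler, failing loudly if another one is already active."""
        with self._lock:
            if self._current != self._default:
                raise HookError(
                    f"assert hook already owned by {self._current.__name__}")
            self._current = handler

    def uninstall(self, handler):
        """Restore the default handler, but only for the current owner."""
        with self._lock:
            if self._current != handler:
                raise HookError(f"{handler.__name__} does not own the hook")
            self._current = self._default

    def fire(self, message):
        """Call whichever handler is installed right now."""
        with self._lock:
            handler = self._current
        return handler(message)


def abort_process(message):
    return f"abort: {message}"


def log_for_web_server(message):
    return f"HTTP 500: {message}"


def library_b_handler(message):
    return f"B: {message}"


hook = AssertHook(default=abort_process)
hook.install(log_for_web_server)        # web-server mode
try:
    hook.install(library_b_handler)     # a second library tries to take over
except HookError as err:
    print(err)
hook.uninstall(log_for_web_server)      # back to the default for data loading
```

Because the handler is swapped at run time, the same program can use one handler while serving requests and another while loading data, which link-time resolution cannot express.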

Also, linking (especially around shared libraries) doesn't work in exactly the same way on all platforms, so minimizing the entropy of a given link (minimize possible outcomes, so minimal magic) can be a real win when it comes to making a program that builds and runs reliably on different platforms. At Symmetry we have had real issues with shared libraries (granted, for reasons more complicated than those mentioned here), so we actually cannot ship anything with dmd even if we wanted to.
