February 11, 2022
On 11/02/2022 11:52 AM, Walter Bright wrote:
> I have never been able to explain these to people. I wonder if it is because it is so simple, people think "that can't be right". With the hook thing, they'll ask me to re-explain it several times, then they'll say "are you sure?" and they still don't believe it.

It does depend on a few factors.

Compiler, linker, build/package manager all playing along.

Not to mention shared library support that is actually good enough, with clear common use cases all described.

For me personally there are enough unknowns in the general case that I would avoid using it in production.
February 10, 2022
On 2/10/2022 4:03 PM, rikki cattermole wrote:
> On 11/02/2022 11:52 AM, Walter Bright wrote:
>> I have never been able to explain these to people. I wonder if it is because it is so simple, people think "that can't be right". With the hook thing, they'll ask me to re-explain it several times, then they'll say "are you sure?" and they still don't believe it.
> 
> It does depend on a few factors.
> 
> Compiler, linker, build/package manager all playing along.
> 
> Not to mention shared library support that is actually good enough, with clear common use cases all described.
> 
> For me personally there are enough unknowns in the general case that I would avoid using it in production.

All linkers work this way.
February 11, 2022
On Thursday, 10 February 2022 at 22:52:45 UTC, Walter Bright wrote:
> On 2/10/2022 2:06 PM, Dave P. wrote:
>> Undefined symbols for architecture arm64:
>>    "__D7example9some_funcFiZi", referenced from:
>>        __D7example3fooFiZi in example.o
>>        __D7example3barFiZi in example.o
>>        __D7example3bazFiZi in example.o
>>        __D7example3quxFiZi in example.o
>>        __Dmain in example.o
>> ld: symbol(s) not found for architecture arm64
>
> Things I have never been able to explain, even to long time professional programmers:
>
> 1. what "undefined symbol" means
>
> 2. what "multiply defined symbol" means
>
> 3. how linkers resolve symbols
>
> Our own runtime library illustrates this bafflement. In druntime, there are these "hooks" where one can replace the default function that deals with assertion errors.
>
> Such hooks are entirely unnecessary.
>
> To override a symbol in a library, just write your own function with the same name and link it in before the library.
>
> I have never been able to explain these to people. I wonder if it is because it is so simple, people think "that can't be right". With the hook thing, they'll ask me to re-explain it several times, then they'll say "are you sure?" and they still don't believe it.

If by hook you mean a callback of sorts that can be overridden, then the problem being solved is not strictly the same as a weakly defined function. If you have multiple libraries in the same playpen, it simply doesn't work to have them all trying to override the same symbols. If they can neatly hook and unhook things, that goes away.
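The distinction being drawn here can be sketched in a toy Python model. This is not D code or druntime's actual API; the names (`push_hook`, `call_override`, etc.) are invented for illustration. It contrasts a single override slot, where the last definition wins for everyone, with a hook stack, where each library can install a handler and later restore the previous one.

```python
# Default behaviour, analogous to a library's default assert handler.
def default_handler(msg):
    return f"default: {msg}"

# --- link-time-override model: one global slot, last writer wins ---
override = None

def call_override(msg):
    return (override or default_handler)(msg)

# --- hook model: handlers can be pushed and popped independently ---
hook_stack = [default_handler]

def push_hook(fn):
    hook_stack.append(fn)

def pop_hook():
    hook_stack.pop()

def call_hooked(msg):
    return hook_stack[-1](msg)

# Two libraries both want custom behaviour. With a single override
# slot they clobber each other; with hooks, library B can restore
# library A's handler when it is done.
override = lambda msg: f"libA: {msg}"
override = lambda msg: f"libB: {msg}"   # libA's handler is silently lost
print(call_override("assert failed"))    # libB: assert failed

push_hook(lambda msg: f"libA: {msg}")
push_hook(lambda msg: f"libB: {msg}")
pop_hook()                               # libB unhooks itself
print(call_hooked("assert failed"))      # libA: assert failed
```

The point of the sketch: with plain symbol override there is exactly one winner chosen at link time, while hooks let multiple parties cooperate at run time.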
February 11, 2022

On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:

> A couple of months ago, I found out about a language called Vox, which uses a design I haven't seen in any other compiler: instead of creating object files and then linking them together, it always creates an executable at once.

TCC (Tiny C Compiler) did this roughly 20 years ago. TCC originally grew out of an entry in the International Obfuscated C Code Contest, and then got updated to be more complete.

https://www.bellard.org/tcc/

I believe most of a compiler's codebase involves optimization for various architectures and CPU generations, along with cross-compiling. GCC still carries tons of legacy code that it uses, I believe.

To note, back around 1996 I wrote an assembler that took x86 and could compile itself. But it wasn't compatible with any other code and couldn't use object files or anything (as it was all made from scratch when I was 12-14). However, it did compile directly to a COM file. I'll just say from experience: there are advantages, but they don't outweigh the disadvantages. That's my flat opinion going from here.

February 10, 2022
On 2/10/2022 7:45 PM, max haughton wrote:
> If by hook you mean a callback of sorts that can be overridden, then the problem being solved is not strictly the same as a weakly defined function. If you have multiple libraries in the same playpen, it simply doesn't work to have them all trying to override the same symbols. If they can neatly hook and unhook things, that goes away.

That's not how multiple libraries work.

Suppose you have 3 libraries, A, B, and C. You have an object file X. The linker command is:

    link X.obj A.lib B.lib C.lib

X refers to "foo". All 4 define "foo". Which one gets picked?

   X.foo

That's it. There are no unresolved symbols to look for.

Now, suppose only B and C define "foo". Which one gets picked?

   B.foo

because it is not in X. Then, A is looked at, and it is not in A. Then, B is looked at, and it is in B. C is not looked at because it is now resolved.

It has nothing to do with weak definitions. It's a simple "foo" is referenced. Got to find a definition. Look in the libraries in the order they are supplied to the linker.

That's it.

Want to not use the library definition? Define it yourself in X. No need for hooking. No need for anything clever at all. Just define it in your .obj file.
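The resolution order described above can be sketched as a toy simulation. This is an illustrative model, not a real linker: the `resolve` helper and its data structures are invented, and the names (X, foo, A/B/C) follow the example in this post. Object-file symbols are all present up front; libraries are only consulted, left to right, for references that are still unresolved.

```python
def resolve(reference, obj_defs, libs):
    """Return the name of the module that supplies `reference`.

    obj_defs: set of symbols defined by the .obj files (always win).
    libs: list of (library_name, set_of_defined_symbols), in the
          order they appear on the linker command line.
    """
    if reference in obj_defs:          # symbols in .obj files always win
        return "X"
    for name, defs in libs:            # then libraries, in command-line order
        if reference in defs:
            return name                # first library that defines it wins
    raise RuntimeError(f'undefined symbol: "{reference}"')

# link X.obj A.lib B.lib C.lib, where A is empty and B, C define foo
libs = [("A", set()), ("B", {"foo"}), ("C", {"foo"})]

# Case 1: X also defines foo -> X wins; the libraries are never consulted.
print(resolve("foo", {"foo"}, libs))   # X

# Case 2: only B and C define foo -> B wins; C is never looked at.
print(resolve("foo", set(), libs))     # B
```

An "undefined symbol" error is just the `RuntimeError` case: no .obj file and no library on the command line defines the referenced name.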

----

Now suppose X.obj and Y.obj both define foo. Link with:

    link X.obj Y.obj A.lib B.lib C.lib

You get a message:

    Multiple definition of "foo", found in X.obj and Y.obj

because order does not matter for .obj files as far as symbols go. All the symbols in .obj files get added.
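The second rule can be sketched the same way. Again this is a toy model with invented names, not a real linker: every symbol from every .obj file is added unconditionally, so two .obj files defining the same name is an error regardless of command-line order.

```python
def link_objects(objs):
    """objs: list of (object_file_name, set_of_defined_symbols).

    All symbols from all .obj files are added; a duplicate is an
    immediate error, mirroring the linker message quoted above.
    """
    seen = {}                          # symbol -> object file that defined it
    for obj_name, defs in objs:
        for sym in defs:
            if sym in seen:
                raise RuntimeError(
                    f'Multiple definition of "{sym}", '
                    f"found in {seen[sym]} and {obj_name}")
            seen[sym] = obj_name
    return seen

# link X.obj Y.obj ... where both define foo:
try:
    link_objects([("X.obj", {"foo"}), ("Y.obj", {"foo"})])
except RuntimeError as e:
    print(e)   # Multiple definition of "foo", found in X.obj and Y.obj
```

Contrast this with the library case above: libraries are searched lazily, so a duplicate across libraries is harmless, while .obj files are ingested eagerly, so a duplicate there is fatal.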
February 10, 2022
On 2/10/2022 8:18 PM, Era Scarecrow wrote:
>   To note, back around 1996 I wrote an assembler that took x86 and could compile itself. But it wasn't compatible with any other code and couldn't use object files or anything (*as it was all made from scratch when I was 12-14*). However, it did compile directly to a COM file. I'll just say from experience: there are advantages, but they don't outweigh the disadvantages. That's my flat opinion going from here.

Back in the olden days, creating a DOS executable was trivial. Things have gotten much more complicated.
February 11, 2022

On Friday, 11 February 2022 at 04:18:42 UTC, Era Scarecrow wrote:

> On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:
>> A couple of months ago, I found out about a language called Vox, which uses a design I haven't seen in any other compiler: instead of creating object files and then linking them together, it always creates an executable at once.
>
> TCC (Tiny C Compiler) did this roughly 20 years ago. TCC originally grew out of an entry in the International Obfuscated C Code Contest, and then got updated to be more complete.
>
> https://www.bellard.org/tcc/
>
> I believe most of a compiler's codebase involves optimization for various architectures and CPU generations, along with cross-compiling. GCC still carries tons of legacy code that it uses, I believe.
>
> To note, back around 1996 I wrote an assembler that took x86 and could compile itself. But it wasn't compatible with any other code and couldn't use object files or anything (as it was all made from scratch when I was 12-14). However, it did compile directly to a COM file. I'll just say from experience: there are advantages, but they don't outweigh the disadvantages. That's my flat opinion going from here.

Optimizations are slow, and optimizations that aren't a total mess when implemented require abstraction. Making those abstractions cheap is difficult, so you end up with LLVM and GCC being slower even on debug builds, because they have more layers of abstraction (or rather take fewer shortcuts). It's probably very possible to equalise this performance with a more niche compiler, but it would also probably require a really immense effort, and probably starting from scratch around a new concept (a la LLVM).

As for legacy code, there probably are branches being tested for old processors in places, and for the most part GCC's algorithms may look a bit crude because of their C heritage (i.e. some of GCC's development practices are very 1980s compared to LLVM's, and will probably scare off new money and minds and kill the project in the long run), but they are still the benchmark to beat. The Itanium scheduler won't be running on an x86 target, to be clear.

I'm also not convinced that the compiler assembling code itself is all that useful. It probably is marginally faster, but on a modern system I couldn't measure it as significant on basically any workload. It's basically performance theatre; the performance of the semantic analysis, or of moving bytes around prior to object code however it's emitted, is much more important.

The dmd backend gets a 6/10 for me when it comes to performance. The algorithms are very simple, it should really be faster than it is. The parts that actually emit the object code are particularly slow.

February 11, 2022

On Friday, 11 February 2022 at 06:33:20 UTC, Walter Bright wrote:

> Now suppose X.obj and Y.obj both define foo. Link with:
>
>     link X.obj Y.obj A.lib B.lib C.lib
>
> You get a message:
>
>     Multiple definition of "foo", found in X.obj and Y.obj

Unless your compiler places all functions in COMDATs, of course.

https://github.com/dlang/dmd/blob/a176f0359a07fa5a252518b512f3b085a43a77d8/src/dmd/backend/backconfig.d#L303
https://issues.dlang.org/show_bug.cgi?id=15342

February 11, 2022

On Friday, 11 February 2022 at 06:33:20 UTC, Walter Bright wrote:

> Now suppose X.obj and Y.obj both define foo. Link with:
>
>     link X.obj Y.obj A.lib B.lib C.lib
>
> You get a message:
>
>     Multiple definition of "foo", found in X.obj and Y.obj

Don't rely on this when using DMD though, since it likes to place all functions in COMDATs, meaning the linker will just pick one foo instead of raising an error.

https://github.com/dlang/dmd/blob/a176f0359a07fa5a252518b512f3b085a43a77d8/src/dmd/backend/backconfig.d#L303
https://issues.dlang.org/show_bug.cgi?id=15342
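The COMDAT behaviour described here can be added to the same kind of toy model. This is a sketch under the assumption that the symbols use "pick any" COMDAT semantics (as with MSVC-style `IMAGE_COMDAT_SELECT_ANY`); the `link_objects_comdat` helper and which copy gets kept (the first seen) are illustrative choices, not a specification of any particular linker.

```python
def link_objects_comdat(objs, comdat):
    """objs: list of (object_file_name, set_of_defined_symbols).
    comdat: set of symbols with pick-any COMDAT semantics.

    A duplicate of a COMDAT symbol is silently folded (first copy
    kept); a duplicate of an ordinary symbol is still an error.
    """
    seen = {}                          # symbol -> object file that defined it
    for obj_name, defs in objs:
        for sym in defs:
            if sym in seen:
                if sym in comdat:
                    continue           # pick-any: keep the first copy, no error
                raise RuntimeError(
                    f'Multiple definition of "{sym}", '
                    f"found in {seen[sym]} and {obj_name}")
            seen[sym] = obj_name
    return seen

# With foo in a COMDAT, X.obj's copy is kept and Y.obj's is ignored,
# instead of the multiple-definition error from the non-COMDAT case.
print(link_objects_comdat([("X.obj", {"foo"}), ("Y.obj", {"foo"})],
                          comdat={"foo"}))
# {'foo': 'X.obj'}
```

This is why the "define it yourself in your .obj" technique can silently pick the wrong copy when the compiler emits everything as COMDATs: the linker folds the duplicates instead of reporting them.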

February 11, 2022

On Thursday, 10 February 2022 at 11:54:59 UTC, bauss wrote:

> You see, there's a large misconception here.
>
> Typically slow compile times aren't due to the LoC a project has, but rather what happens during the compilation.
>
> Ex. template instantiation, functions executed at CTFE, preprocessing, optimization etc.
>
> I've seen projects with only a couple thousand lines of code compile slower than projects with hundreds of thousands of lines of code.

Yeah, of course! There is no misconception here. Templates play a role. When talking about LoC/s, I'm talking about clear lines, and this is why I made it clear that in my example with TCC I didn't use any preprocessors, hence the 4M LoC were exactly 4M.

> Generally most compilers can read large source files and parse their tokens etc. really fast; it's usually what happens afterwards that is the bottleneck.
>
> Say you have a project that is compiling very slowly. Usually you won't start out by cutting the amount of lines you have, because that's often not as easy or even possible; rather, you profile where the compiler is spending most of its time and then you attempt to resolve it, ex. perhaps you're running nested loops at compile-time that are unnecessary, and so on.

Of course, the backend is what matters. TCC goes from source file to object file directly. GCC/D/Rust etc. go from source file to IR (maybe DMD doesn't, but LDC and GDC do), then to assembly, and then to an object file, so this takes many times longer than doing it directly. But even then, TCC/Vox are many more times faster, so there is still something more. Idk...