February 11, 2022

On Friday, 11 February 2022 at 04:18:42 UTC, Era Scarecrow wrote:
> On Thursday, 10 February 2022 at 09:41:12 UTC, rempas wrote:
>> A couple of months ago, I found out about a language called Vox, which uses a design that I haven't seen in any other compiler: it does not create object files and then link them together but instead always creates an executable directly.
>
> TCC (Tiny C Compiler) was already doing this something like 10 years ago. TCC was originally made as part of the obfuscation programming challenge, and then got updated to be more complete.
>
> https://www.bellard.org/tcc/

If one wants to get really historic, it is also what Turbo Pascal did up to version 3.0. With Turbo Pascal 4.0 they went back to the more classic object file/linker model, and there is a good reason for that: separate compilation and linking of modules and libraries are a thing. If you build the compiler for direct executable production, you still have to support normal object file/library handling, i.e. you put the functionality of the linker into your compiler.

February 11, 2022
On Fri, Feb 11, 2022 at 04:47:46PM +0000, user1234 via Digitalmars-d wrote:
> On Friday, 11 February 2022 at 16:41:33 UTC, user1234 wrote:
> > On Friday, 11 February 2022 at 15:17:16 UTC, rempas wrote:
> > > On Friday, 11 February 2022 at 14:52:09 UTC, max haughton wrote:
> > > > 
> > > > The object emission code in the backend is quite inefficient, it needs to be rewritten (it's horrible old code anyway)
> > > 
> > > I would love if they would do it but I can't complain that they don't. Openhub reports that [DMD] consists of 961K LoC!!
> > 
> > Openhub and their metrics are old trash. It's more like 170K according to D-Scanner.
> 
> wait... it's 175K. I had not pulled in 8 months or so. There's much new code that has been committed since, notably with importC.

I pulled just this week, and running `wc` on *.d *.c *.h says there are 365K lines.  I'm not sure what the *.h files are for, since DMD is now bootstrapping. Excluding *.h yields 347K lines.  But a lot of those are actually blank lines and comments; excluding // comments, /**/ and /++/ block comments, and blank lines yields 175K.
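As a rough illustration of the stripped count described above, here is a minimal Python sketch (not the command actually used to get the 175K figure). The name `count_code_lines` is made up for this example, and the sketch assumes comment markers never appear inside string literals, which real D code does not guarantee.

```python
def count_code_lines(source: str) -> int:
    """Count lines that still hold text once D-style comments are removed.

    Handles // line comments, /* */ block comments and nesting /+ +/
    comments. Assumption: comment markers never occur inside string literals.
    """
    kept = []            # characters that survive comment stripping
    i, n = 0, len(source)
    in_c_block = False   # inside a /* */ comment (does not nest)
    plus_depth = 0       # nesting depth of /+ +/ comments

    while i < n:
        two = source[i:i + 2]
        ch = source[i]
        if in_c_block:
            if two == "*/":
                in_c_block = False
                i += 2
            else:
                if ch == "\n":
                    kept.append(ch)  # keep line structure intact
                i += 1
        elif plus_depth > 0:
            if two == "/+":
                plus_depth += 1
                i += 2
            elif two == "+/":
                plus_depth -= 1
                i += 2
            else:
                if ch == "\n":
                    kept.append(ch)
                i += 1
        elif two == "//":
            # Skip to the end of the line but leave the newline in place.
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif two == "/*":
            in_c_block = True
            i += 2
        elif two == "/+":
            plus_depth = 1
            i += 2
        else:
            kept.append(ch)
            i += 1

    return sum(1 for line in "".join(kept).splitlines() if line.strip())


sample = """\
// line comment
module demo;

/* block
   comment */
int x = 1; // trailing
/+ nested /+ inner +/ still comment +/
int y = 2;
"""
print(count_code_lines(sample))  # 3
```

Running this over every *.d file and summing the results would give a figure comparable in spirit to the stripped count, within the limits of the string-literal assumption.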

The 961K probably comes from the myriad test cases in the testsuite, where more lines is actually a *good* thing.

But really, LoC is an unreliable measure of code complexity. Token count would be more reflective of the actual complexity of the code, though even that is questionable. Writing `enum x = 1 + 1;` would be 7 tokens vs. `enum x = 2;` which is 5 tokens, for example, but the former may actually make code easier to read in certain cases (e.g., if the longer expression makes intent clearer than the shorter one).
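The token counts in that example can be checked with a deliberately crude tokenizer. This is only a sketch: the regular expression is an assumption made for illustration, not D's real lexer, and it treats every punctuation character as its own token (so `==` would count as two).

```python
import re

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def count_tokens(code: str) -> int:
    """Return the number of tokens found by the crude pattern above."""
    return len(TOKEN.findall(code))


print(count_tokens("enum x = 1 + 1;"))  # 7
print(count_tokens("enum x = 2;"))      # 5
```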

Compressed size may be an even better approximation, because in the limit a high degree of compression approaches the Kolmogorov complexity, which is a measure of the information content of the data. Stripping comments and compressing (with the best compression algorithm you can find), for example, would give a good approximation to the actual complexity in the code.  Though of course, even that fails to measure the inherent level of complexity in language constructs. So you couldn't meaningfully compare compressed sizes across different languages, for example.
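A minimal sketch of the compressed-size idea, using Python's standard `lzma` module as a stand-in for "the best compression algorithm you can find". The helper name `compressed_size` and the two sample inputs are invented for the illustration, and the input is assumed to have had its comments stripped already.

```python
import lzma


def compressed_size(source: str) -> int:
    """Size in bytes of the source after LZMA compression at the highest preset."""
    return len(lzma.compress(source.encode("utf-8"), preset=9))


# Same number of lines, very different information content.
repetitive = "int x = 1;\n" * 1000
varied = "".join(f"int v{i} = {i * i % 97};\n" for i in range(1000))

print(len(repetitive.splitlines()), compressed_size(repetitive))
print(len(varied.splitlines()), compressed_size(varied))
```

Both inputs have 1000 lines, so a LoC count cannot tell them apart, while the repetitive one compresses to a much smaller size.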


T

-- 
Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
February 11, 2022
On Friday, 11 February 2022 at 17:36:37 UTC, H. S. Teoh wrote:

> I pulled just this week, and running `wc` on *.d *.c *.h says...

https://github.com/AlDanial/cloc would yield a more practical metric, at least as far as "practical metric" in terms of LoC goes.
February 11, 2022
On Friday, 11 February 2022 at 17:44:45 UTC, Stanislav Blinov wrote:
> On Friday, 11 February 2022 at 17:36:37 UTC, H. S. Teoh wrote:
>
>> I pulled just this week, and running `wc` on *.d *.c *.h says...
>
> https://github.com/AlDanial/cloc would yield a more practical metric, at least as far as "practical metric" in terms of LoC goes.

```
---------------------------------------------------------------------------------------
Language                             files          blank        comment           code
---------------------------------------------------------------------------------------
D                                     3867          75824          88426         431299
HTML                                   114          11405            967          61083
C/C++ Header                            57           2729            992          23332
C                                       93            830            797           3346
C++                                     19            532            139           2249
```
this includes the test suite and other stuff that isn't technically the compiler-proper.
February 11, 2022
On Fri, Feb 11, 2022 at 05:44:45PM +0000, Stanislav Blinov via Digitalmars-d wrote:
> On Friday, 11 February 2022 at 17:36:37 UTC, H. S. Teoh wrote:
> 
> > I pulled just this week, and running `wc` on *.d *.c *.h says...
> 
> https://github.com/AlDanial/cloc would yield a more practical metric, at least as far as "practical metric" in terms of LoC goes.

I'm skeptical of any LoC metric.


T

-- 
What do you mean the Internet isn't filled with subliminal messages? What about all those buttons marked "submit"??
February 11, 2022
On Friday, 11 February 2022 at 17:36:37 UTC, H. S. Teoh wrote:
> On Fri, Feb 11, 2022 at 04:47:46PM +0000, user1234 via Digitalmars-d wrote:
>> On Friday, 11 February 2022 at 16:41:33 UTC, user1234 wrote:
>> > On Friday, 11 February 2022 at 15:17:16 UTC, rempas wrote:
>> > > [...]
>> > 
>> > Openhub and their metrics are old trash. It's more like 170K according to D-Scanner.
>> 
>> wait... it's 175K. I had not pulled in 8 months or so. There's much new code that has been committed since, notably with importC.
>
> I pulled just this week, and running `wc` on *.d *.c *.h says there are 365K lines.  I'm not sure what the *.h files are for,

Ah yes, the h files... D-Scanner does not take them into account.
They are still used by GDC I believe.



February 11, 2022

On Friday, 11 February 2022 at 16:40:42 UTC, Dennis wrote:
> DMD goes from its own backend block tree to an object file, without writing assembly. In fact, only recently was the ability to output asm added for debugging purposes:
> https://dlang.org/blog/2022/01/24/the-binary-language-of-moisture-vaporators/
>
> On Linux dmd invokes gcc by default to create an executable, but only to link the resulting object files, not to compile C/assembly code.
>
> LDC goes from LLVM IR to machine code, but it can output assembly with the -output-s flag.
>
> GDC does generate assembly text to the tmp folder and then invokes gas the GNU assembler, it can't directly write machine code.

Thank you! This sums it up perfectly! Can you choose to pass it directly to the linker with DMD on Linux? Something like setting "ld" (or another linker of course) as the "C" compiler, idk...

February 11, 2022
On Friday, 11 February 2022 at 16:47:46 UTC, user1234 wrote:
>> Openhub and their metrics are old trash. It's more like 170K according to D-Scanner.
>
> wait... it's 175K. I had not pulled in 8 months or so. There's much new code that has been committed since, notably with importC.

Thank you for the information! It seems pretty impressive to me that DMD has only 175K LoC in its code base, given how huge D is! Even without the recent commits (how much could they add?), this seems too little to me. In that case, we can talk about rewriting it, but again, that's up to the developers to decide.
February 11, 2022

On Friday, 11 February 2022 at 17:36:03 UTC, Patrick Schluter wrote:
> If one wants to get really historic, it is also what Turbo Pascal did up to version 3.0. With Turbo Pascal 4.0 they went back to the more classic object file/linker model, and there is a good reason for that: separate compilation and linking of modules and libraries are a thing. If you build the compiler for direct executable production, you still have to support normal object file/library handling, i.e. you put the functionality of the linker into your compiler.

Yep and that's what I love about it! You can have 2 ways to do the same thing and choose based on what's best for the case.

For example, if your project has 10M LoC, even if you can compile 1M LoC/s (which is a very big number), your project will need 10 seconds to build, which will make it very annoying. In that case, we use the classic method of creating object files for the files that were changed and then linking them together.

However, if your project is 1M LoC or less, it takes 1 second or less to build, which is barely noticeable. The same applies when the end user compiles the software from source and doesn't care about the object files (and won't even keep them) because he/she is not a developer. In that case it makes sense to not waste time creating object files and go straight to creating the executable/library.
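The arithmetic behind those numbers, as a small sketch. It assumes a constant compile speed and ignores link time entirely; the 50K "changed lines" figure is a hypothetical value added only to show what an incremental rebuild would cost at the same speed.

```python
def build_seconds(lines_of_code: int, loc_per_second: float) -> float:
    """Build time at a constant compile speed (link time ignored)."""
    return lines_of_code / loc_per_second


SPEED = 1_000_000  # compile speed in LoC per second, the figure used above

print(build_seconds(10_000_000, SPEED))  # 10.0  full build of a 10M LoC project
print(build_seconds(1_000_000, SPEED))   # 1.0   full build of a 1M LoC project
print(build_seconds(50_000, SPEED))      # 0.05  hypothetical: only 50K changed lines recompiled
```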

If we are to make a new compiler (which I plan to), we should create a whole toolchain that will consist of all the tools. Sounds complex, I know but what's the point if we don't advance? Make another compiler that outputs assembly so it will always have dependencies and it will be slow to compile (slow compared to if we outputted machine language directly)?

February 11, 2022
On Friday, 11 February 2022 at 18:02:21 UTC, max haughton wrote:
> On Friday, 11 February 2022 at 17:44:45 UTC, Stanislav Blinov wrote:
>> On Friday, 11 February 2022 at 17:36:37 UTC, H. S. Teoh wrote:
>>
>>> I pulled just this week, and running `wc` on *.d *.c *.h says...
>>
>> https://github.com/AlDanial/cloc would yield a more practical metric, at least as far as "practical metric" in terms of LoC goes.
>
> ```
> ---------------------------------------------------------------------------------------
> Language                             files          blank        comment           code
> ---------------------------------------------------------------------------------------
> D                                     3867          75824          88426         431299
> HTML                                   114          11405            967          61083
> C/C++ Header                            57           2729            992          23332
> C                                       93            830            797           3346
> C++                                     19            532            139           2249
> ```
> this includes the test suite and other stuff that isn't technically the compiler-proper.

Interesting! We could remove the test suite directory and tell it to count only D files, which would give us "cleaner" results. "cloc" is actually what I use, and for DragonFlyBSD it gave me the same number that "OpenHub" gave, so I really wonder why the results differ for other code bases or languages...