November 23, 2018
On Friday, 23 November 2018 at 13:23:22 UTC, welkam wrote:
> If we run these steps in different threads on the same core with SMT, we could make better use of the core's resources: reading the file with the kernel, decoding UTF-8 with vector instructions, and lexing/parsing with scalar operations, while all communication is done through the L1 and L2 caches.

You might save some pages from the data cache, but by doing more work at once, the code might stop fitting in the execution-related caches (code pages, microcode, branch prediction) instead.

November 23, 2018
On Thursday, 22 November 2018 at 04:48:09 UTC, Vladimir Panteleev wrote:
>
> Sorry about that. I'll have to think of two titles next time, one for the D community and one for everyone else.
>
> If it's of any consolation, the top comments in both discussion threads point out that the title is inaccurate on purpose.

Your post on Reddit received more comments than the D front end's inclusion into GCC. If you had titled your post differently, it probably wouldn't have had such success, so from my perspective it's a net positive. Sure, a few people took the wrong message, but many more people saw your post.
November 23, 2018
On Friday, 23 November 2018 at 14:32:39 UTC, Vladimir Panteleev wrote:
> On Friday, 23 November 2018 at 13:23:22 UTC, welkam wrote:
>> If we run these steps in different threads on the same core with SMT, we could make better use of the core's resources: reading the file with the kernel, decoding UTF-8 with vector instructions, and lexing/parsing with scalar operations, while all communication is done through the L1 and L2 caches.
>
> You might save some pages from the data cache, but by doing more work at once, the code might stop fitting in the execution-related caches (code pages, microcode, branch prediction) instead.

It's not about saving TLB pages or fitting better in cache. Compilers are considered streaming applications - they don't utilize CPU caches effectively. You can't read one character and emit machine code, then read the next character; you have to go over all the data multiple times while you modify it. I can find white papers, if you're interested, where people test GCC with different cache architectures and it doesn't make much of a difference. GCC is a popular application for testing caches.
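
To illustrate the streaming-access pattern (a toy sketch, nothing like DMD's actual pipeline): each compiler pass touches every node once, so the working set is the whole program rather than a small hot loop, and by the time a later pass revisits the first node it has usually been evicted from cache.

```python
# Toy multi-pass "compiler": every pass streams over all its input
# exactly once, with little temporal reuse between passes.

source = "x = 1 + 2"

# Pass 1: lexing streams over every character once.
tokens = source.split()

# Pass 2: parsing streams over every token once.
ast = [("num", t) if t.isdigit() else ("sym", t) for t in tokens]

# Pass 3: "codegen" streams over every AST node once.
output = [f"LOAD {v}" for kind, v in ast if kind == "num"]

print(tokens)   # ['x', '=', '1', '+', '2']
print(output)   # ['LOAD 1', 'LOAD 2']
```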

Here is the profiling data from DMD:
 Performance counter stats for 'dmd -c main.d':

            600.77 msec task-clock:u              #    0.803 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            33,209      page-faults:u             # 55348.333 M/sec
     1,072,289,307      cycles:u                  # 1787148.845 GHz
       870,175,210      stalled-cycles-frontend:u #   81.15% frontend cycles idle
       721,897,927      stalled-cycles-backend:u  #   67.32% backend cycles idle
       881,895,208      instructions:u            #    0.82  insn per cycle
                                                  #    0.99  stalled cycles per insn
       171,211,752      branches:u                # 285352920.000 M/sec
        11,287,327      branch-misses:u           #    6.59% of all branches

       0.747720395 seconds time elapsed

       0.497698000 seconds user
       0.104165000 seconds sys
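
As a sanity check (a small illustrative script, not part of DMD or perf), the derived figures perf prints can be recomputed from the raw counters above:

```python
# Recompute perf's derived metrics from the raw counters in the output above.
cycles = 1_072_289_307
instructions = 881_895_208
stalled_frontend = 870_175_210

ipc = instructions / cycles                      # insn per cycle
stalls_per_insn = stalled_frontend / instructions

print(f"{ipc:.2f} insn per cycle")               # 0.82, matching perf
print(f"{stalls_per_insn:.2f} stalled cycles per insn")  # 0.99, matching perf
```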

The most important figure in this conversation is 0.82 insn per cycle. My CPU can do ~2 IPC, so there are plenty of CPU resources left unused. New Intel desktop processors are designed to do 4 insn/cycle. What limits DMD's performance is slow RAM and data fetching, not the things you listed.

Code pages - do you mean the TLB here?

Microcode cache: not all processors have one, and those that do only benefit trivial loops. DMD has complex loops.

Branch prediction: more entries in the branch predictor won't help here, because branches are mispredicted because the data is unpredictable, not because there are too many branches. Also, the branch misprediction penalty is around 30 cycles, while reading from RAM can be over 200 cycles.

L1 code cache: you didn't mention this one, but running those tasks in SMT mode might thrash the L1 instruction cache, so execution might not be optimal.

Instead of reading imports in parallel, DMD needs more data-oriented data structures in place of its old OOP-inspired ones. I'll give you an example of why that's the case.

Consider
struct {
    bool isAlive;
    <other data at least 7 bytes of size>
}

If you want to read that bool, the CPU has to fetch the entire 64-byte cache line it sits in. That means that for one bit of information the CPU fetches 512 bits of data, giving 1/512 ≈ 0.2% signal-to-noise. This is terrible!

AFAIK DMD doesn't make this exact mistake, but it is full of large structs and classes that are not efficient to read. To fix this, we need to split those large data structures into smaller ones that contain only what a particular algorithm needs. I predict a 2x speed improvement if we transform all the data structures in DMD. That's an improvement without improving any algorithms, only changing data structures. This is getting too long, so I will stop here.
November 23, 2018
On 11/23/2018 2:12 AM, Jacob Carlborg wrote:
> Would it be possible to have one string table per thread and merge them to one single shared string table before continuing with the next phase?

It'd probably be even slower because one would have to rewrite all the pointers into the string table.

November 23, 2018
On 11/23/2018 5:23 AM, welkam wrote:
> Currently D reads all the files that are passed on the command line before starting lexing/parsing, but in principle we could start lexing/parsing after the first file is read. In fact, we could start after the first file's first line is read.

DMD used to do that. But it was removed because:

1. nobody understood the logic

2. it didn't seem to make a difference

You can still see the vestiges by the:

    static if (ASYNCREAD)

blocks in the code.
November 23, 2018
On 11/23/2018 6:37 AM, welkam wrote:
> Your post on Reddit received more comments than the D front end's inclusion into GCC. If you had titled your post differently, it probably wouldn't have had such success, so from my perspective it's a net positive. Sure, a few people took the wrong message, but many more people saw your post.

It definitely shows the value of a provocative title!
November 23, 2018
On Friday, 23 November 2018 at 19:21:03 UTC, Walter Bright wrote:
> On 11/23/2018 5:23 AM, welkam wrote:
>> Currently D reads all the files that are passed on the command line before starting lexing/parsing, but in principle we could start lexing/parsing after the first file is read. In fact, we could start after the first file's first line is read.
>
> DMD used to do that. But it was removed because:
>
> 1. nobody understood the logic
>
> 2. it didn't seem to make a difference
>
> You can still see the vestiges by the:
>
>     static if (ASYNCREAD)
>
> blocks in the code.

I didn't expect huge wins. This would be useful when you start your computer and the files have to be read from old spinning rust and the project has many files. Otherwise the files will be cached and memcpy is fast. I was surprised at how fast modern computers copy data from one place to another.

Speaking of memcpy, here is a video you might like. It has memcpy, assembler, and a bit of compiler work in it. It's a very easy watch for when you want to relax.
Level1 Diagnostic: Fixing our Memcpy Troubles (for Looking Glass)
https://www.youtube.com/watch?v=idauoNVwWYE
November 26, 2018
On Thursday, 22 November 2018 at 04:48:09 UTC, Vladimir Panteleev wrote:
> On Wednesday, 21 November 2018 at 20:51:17 UTC, Walter Bright wrote:
>> Unfortunately, you're right. The title will leave the impression "D is slow at compiling". You have to carefully read the article to see otherwise, and few will do that.
>
> Sorry about that. I'll have to think of two titles next time, one for the D community and one for everyone else.
>
> If it's of any consolation, the top comments in both discussion threads point out that the title is inaccurate on purpose.

Please don't get me wrong: it's an excellent article, a provocative title, and fantastic work going on. I didn't mean to offend!

In my opinion, language adoption is a seduction/sales process very much like business-to-consumer sales; the way I see it, it's strikingly similar to marketing B2C apps, and without that there will be no "impulse buy".

Actually, no less than 3 programmer friends came to me (I'm the weirdo-using-D, and people are _always_ in disbelief and invent all sorts of reasons not to try it) saying they saw an article on D on HN saying "D compilation is slow", and on further questioning it turned out they hadn't read it, or at best only the first paragraph. But they did remember the title. They may rationally think their opinion of D hasn't changed: aren't we highly capable people?

I'm not making that up! So why is it a problem?

HN may be the only time they hear about D. The words of the title may be their only contact with it. The first 3 words of the title may be the only thing associated with the "D language" chunk in their brain.

The associative mind doesn't know _negation_, so even a title like "D compilation wasn't fast so I forked the compiler" is better from a marketing point of view, since it contains the word "fast"! That's why marketing people have the annoying habit of using positive words; you may think this stuff is unimportant, but it is actually the important meat.

Reasonable people may think marketing and biases don't apply to them, but they do; it works without your consent.

November 26, 2018
On Monday, 26 November 2018 at 16:00:36 UTC, Guillaume Piolat wrote:
> On Thursday, 22 November 2018 at 04:48:09 UTC, Vladimir Panteleev wrote:
>> On Wednesday, 21 November 2018 at 20:51:17 UTC, Walter Bright wrote:
>>> Unfortunately, you're right. The title will leave the impression "D is slow at compiling". You have to carefully read the article to see otherwise, and few will do that.
>>
>> Sorry about that. I'll have to think of two titles next time, one for the D community and one for everyone else.
>>
>> If it's of any consolation, the top comments in both discussion threads point out that the title is inaccurate on purpose.
>
> Please don't get me wrong: it's an excellent article, a provocative title, and fantastic work going on. I didn't mean to offend!
>
> In my opinion, language adoption is a seduction/sales process very much like business-to-consumer sales; the way I see it, it's strikingly similar to marketing B2C apps, and without that there will be no "impulse buy".

I find that hard to believe: we are talking about a technical tool here.

Also, regardless of how languages are chosen as they get into the majority, D is very much still in the innovators/early-adopters stage:

https://en.m.wikipedia.org/wiki/Technology_adoption_life_cycle

That is a very different type of sales process, much more geared towards what the new tech can actually do.

> Actually, no less than 3 programmer friends came to me (I'm the weirdo-using-D, and people are _always_ in disbelief and invent all sorts of reasons not to try it) saying they saw an article on D on HN saying "D compilation is slow", and on further questioning it turned out they hadn't read it, or at best only the first paragraph. But they did remember the title. They may rationally think their opinion of D hasn't changed: aren't we highly capable people?

With people like that, it's almost impossible to get them in during the early-adopter stage. They will only jump on the bandwagon once it's full, i.e. as part of the late majority.

> I'm not making that up! So why is it a problem?
>
> HN may be the only time they hear about D. The words of the title may be their only contact with it. The first 3 words of the title may be the only thing associated with the "D language" chunk in their brain.
>
> The associative mind doesn't know _negation_, so even a title like "D compilation wasn't fast so I forked the compiler" is better from a marketing point of view, since it contains the word "fast"! That's why marketing people have the annoying habit of using positive words; you may think this stuff is unimportant, but it is actually the important meat.
>
> Reasonable people may think marketing and biases don't apply to them, but they do; it works without your consent.

I agree that it was a risky title, as many who don't know D will simply see it and go, "Yet another slow compiler, eh, I'll pass" and not click on the link. Whereas others who have heard something of D will be intrigued, as they know it's already supposed to compile fast. And yet more others will click on it purely for the controversy, just to gawk at some technical bickering.

Given how well it did on HN/reddit/lobste.rs, I think Vlad's gamble probably paid off. We can't run the counterfactual of choosing a safer title to see if it would have done even better, so let's just say it did well enough. ;)
November 26, 2018
On Monday, 26 November 2018 at 16:21:39 UTC, Joakim wrote:

> I agree that it was a risky title, as many who don't know D will simply see it and go, "Yet another slow compiler, eh, I'll pass" and not click on the link. Whereas others who have heard something of D will be intrigued, as they know it's already supposed to compile fast. And yet more others will click on it purely for the controversy, just to gawk at some technical bickering.

I don't actually think it was risky. What are the odds that someone was going to start using D for a major project but then changed her mind upon seeing a title on HN or Reddit? The odds that even one person did that are probably very small.

On the other hand, it says a lot of other things:

- There's an active community that cares about the language.
- It's not a dying language.
- Fast compilation is a realistic possibility.
- There are users with the technical ability to make the compiler faster.

And then there is always the fact that there was a story on HN/Reddit about D. It's hard for publicity for a language like D to be bad when so few people use it.