February 26, 2016
On 02/26/2016 10:38 AM, David Nadlinger wrote:
> On Thursday, 25 February 2016 at 23:06:43 UTC, H. S. Teoh wrote:
>> Are there any low-hanging fruit left that could make dmd faster?
>
> A big one would be overhauling the template mangling scheme so it does
> not generate mangled names a few hundred kilo (!) bytes in size anymore
> for code that uses templates and voldemort types.

My understanding is the main problem is the _same_ templates are repeatedly instantiated with the same exact parameters - the epitome of redundant work. -- Andrei

February 26, 2016
On Fri, Feb 26, 2016 at 01:53:21PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
> On 02/26/2016 10:38 AM, David Nadlinger wrote:
> >On Thursday, 25 February 2016 at 23:06:43 UTC, H. S. Teoh wrote:
> >>Are there any low-hanging fruit left that could make dmd faster?
> >
> >A big one would be overhauling the template mangling scheme so it does not generate mangled names a few hundred kilo (!) bytes in size anymore for code that uses templates and voldemort types.
> 
> My understanding is the main problem is the _same_ templates are repeatedly instantiated with the same exact parameters - the epitome of redundant work.
[...]

I must be missing something, but why can't we use the obvious solution: some kind of hash table to track previous instantiations?
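Something roughly along these lines, I mean (a toy sketch with invented names, not dmd's actual data structures):

import std.format : format;

// Hypothetical cache of template instantiations, keyed by the template's
// name plus its mangled argument list.
struct TemplateInstance { string name; string[] args; }

TemplateInstance*[string] instanceCache;

TemplateInstance* getInstance(string tmpl, string[] mangledArgs)
{
    auto key = format("%s!(%-(%s,%))", tmpl, mangledArgs);
    if (auto existing = key in instanceCache)
        return *existing;                    // already analyzed: reuse it
    auto inst = new TemplateInstance(tmpl, mangledArgs);
    // ... run semantic analysis on inst here ...
    instanceCache[key] = inst;
    return inst;
}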


T

-- 
There are two ways to write error-free programs; only the third one works.
February 26, 2016
On Friday, 26 February 2016 at 18:53:21 UTC, Andrei Alexandrescu wrote:
> My understanding is the main problem is the _same_ templates are repeatedly instantiated with the same exact parameters - the epitome of redundant work. -- Andrei

Within one compiler execution, there might be some optimization potential in the way semantically equivalent template instantiations are merged, yes – it's been a while since I have looked at the related code (see e.g. TemplateDeclaration.findExistingInstance).

Another area matching your description would be that of the same template being instantiated from multiple compilation units, where it can be omitted from some of the compilation units (i.e. object files). Our current logic for that is broken anyway, see e.g. https://issues.dlang.org/show_bug.cgi?id=15318.

I was referring to something different in my post, though, as the question concerned "low-hanging fruit". The problem there is really just that template names sometimes grow unreasonably long pretty quickly. As an example, without wanting to divulge any internals, some of the mangled symbols (!) in the Weka codebase are several hundred kilobytes in size. core.demangle gives up on them anyway, and they appear to be extremely repetitive. Note that just like in Steven's post which I linked earlier, the code in question does not involve any crazy recursive meta-templates, but IIRC makes use of Voldemort types. Tracking down and fixing this – one would almost be tempted to just use standard data compression – would lead to a noticeable decrease in compile and link times for affected code.
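For anyone who hasn't run into it, here is a toy illustration of the growth pattern (nothing to do with Weka's actual code): each call returns a Voldemort type whose mangled name embeds the mangled name of the type it wraps, so the symbol length blows up rapidly with nesting depth.

auto wrap(R)(R r)
{
    // Local struct returned from a template function: a Voldemort type.
    // Its mangled name contains the mangling of wrap!R itself, which in
    // turn contains R's mangling, and so on for every nesting level.
    struct Voldemort
    {
        R inner;
    }
    return Voldemort(r);
}

void main()
{
    auto x = wrap(wrap(wrap(wrap(1))));
    pragma(msg, typeof(x).mangleof.length);  // watch this explode as you nest more
}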

 — David
February 26, 2016
On 26.02.2016 19:34, Iain Buclaw via Digitalmars-d wrote:
> On 26 Feb 2016 9:45 am, "Walter Bright via Digitalmars-d" <digitalmars-d@puremagic.com> wrote:
>  >
>  > On 2/26/2016 12:20 AM, Iain Buclaw via Digitalmars-d wrote:
>  >>
>  >> I thought that multithreaded I/O did not change anything, or slowed compilation down in some cases?
>  >>
>  >> Or I recall seeing a slight slowdown when I first tested it in gdc all those years ago.  So left it disabled - probably for the best too.
>  >
>  >
>  >
>  > Running one test won't really give much useful information. I also wrote:
>  >
>  > "On a machine with local disk and running nothing else, no speedup.
> With a slow filesystem, like an external, network, or cloud (!) drive,
> yes. I would also expect it to speed up when the machine is running a
> lot of other stuff."
>
> Ah ha. Yes I can sort of remember that comment.
>
> One interesting line of development (though would be difficult to
> implement) would be to do all three semantic passes asynchronously using
> fibers.
>
> If I understand correctly, sdc already does this with many cases that
> need ironing out.
>

Different passes are not really required once semantic analysis becomes asynchronous. Just keep track of semantic analysis dependencies, with strong and weak dependencies and different means to resolve cycles of weak dependencies. Then write the semantic analysis of each component in a linear fashion and pause it whenever it depends on information that has not yet been obtained, until that information is computed.
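Something like this shape, to make it concrete (a toy sketch on top of core.thread.Fiber; the task type, the naive round-robin scheduler and the dependency handling are all invented for illustration):

import core.thread : Fiber;

// One suspendable semantic-analysis job per declaration/component.
class AnalysisTask : Fiber
{
    this(void delegate() dg) { super(dg); }
}

AnalysisTask[] pending;

// Called from inside a task whenever it needs a fact that is not known yet.
void waitFor(lazy bool available)
{
    while (!available)
        Fiber.yield();   // suspend this task; the scheduler resumes it later
}

void runAll()
{
    // Naive round-robin; a real scheduler would only wake a task once one of
    // its (strong or weak) dependencies has actually been resolved, and would
    // detect cycles instead of spinning forever.
    while (pending.length)
    {
        AnalysisTask[] still;
        foreach (t; pending)
        {
            t.call();
            if (t.state != Fiber.State.TERM)
                still ~= t;
        }
        pending = still;
    }
}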
February 26, 2016
On 26 Feb 2016 10:16 pm, "Timon Gehr via Digitalmars-d" <digitalmars-d@puremagic.com> wrote:
>
> On 26.02.2016 19:34, Iain Buclaw via Digitalmars-d wrote:
>>
>> On 26 Feb 2016 9:45 am, "Walter Bright via Digitalmars-d" <digitalmars-d@puremagic.com> wrote:
>>  >
>>  > On 2/26/2016 12:20 AM, Iain Buclaw via Digitalmars-d wrote:
>>  >>
>>  >> I thought that multithreaded I/O did not change anything, or slowed compilation down in some cases?
>>  >>
>>  >> Or I recall seeing a slight slowdown when I first tested it in gdc all those years ago.  So left it disabled - probably for the best too.
>>  >
>>  >
>>  >
>>  > Running one test won't really give much useful information. I also wrote:
>>  >
>>  > "On a machine with local disk and running nothing else, no speedup.
>> With a slow filesystem, like an external, network, or cloud (!) drive,
>> yes. I would also expect it to speed up when the machine is running a
>> lot of other stuff."
>>
>> Ah ha. Yes I can sort of remember that comment.
>>
>> One interesting line of development (though would be difficult to implement) would be to do all three semantic passes asynchronously using fibers.
>>
>> If I understand correctly, sdc already does this with many cases that need ironing out.
>>
>
> Different passes are not really required once semantic analysis becomes asynchronous. Just keep track of semantic analysis dependencies, with strong and weak dependencies and different means to resolve cycles of weak dependencies. Then write the semantic analysis of each component in a linear fashion and pause it whenever it depends on information that has not yet been obtained, until that information is computed.

Yes.  In our case, it may be best to go for small steps.  First remove the 'deferred' semantic pass, then merge semantic passes 1 and 2, and finally do what you describe above.

Easier said than done I guess though.


February 26, 2016
On 2/26/2016 5:15 AM, Steven Schveighoffer wrote:
> I think it's much stronger when the email/logs are maintained by a disinterested
> third party.
>
> For example, I'd say emails that were maintained on a private server by one of
> the parties in the case would be less reliable than logs stored on yahoo's
> servers that neither party has access to.
>
> There would also be no shortage of witnesses: "Yes, I remember the day Walter
> added feature x, and github's logs are correct".
>
> I think Walter is on solid ground there.
>
> -Steve

Not only that, everyone who has cloned the github repository has their own copy of it. It's a distributed repository, not a single-sourced one.

I also keep the email logs I get from it.

It's a thousand times better than producing a date stamp from a file on my backup hard disk.

February 26, 2016
On 2/26/2016 3:45 AM, Dibyendu Majumdar wrote:
> On Friday, 26 February 2016 at 11:35:04 UTC, Dibyendu Majumdar wrote:
>> On Friday, 26 February 2016 at 06:19:27 UTC, Walter Bright wrote:
>>> [...]
>>
>> I recall there was a thread in the LLVM mailing list last year about moving to
>> a different license. So maybe that is on the cards, and the D community could
>> chip in on that conversation.
>>
>
> I am referring to this thread:
>
> http://lists.llvm.org/pipermail/llvm-dev/2015-October/091536.html

Thanks for the pointer. If anyone wants to chip in on that thread, feel free!
February 26, 2016
On 2/26/2016 11:17 AM, David Nadlinger wrote:
> I was referring to something different in my post, though, as the question
> concerned "low-hanging fruit". The problem there is really just that template
> names sometimes grow unreasonably long pretty quickly. As an example, without
> wanting to divulge any internals, some of the mangled symbols (!) in the Weka
> codebase are several hundred kilobytes in size. core.demangle gives up on them
> anyway, and they appear to be extremely repetitive. Note that just like in
> Steven's post which I linked earlier, the code in question does not involve any
> crazy recursive meta-templates, but IIRC makes use of Voldemort types. Tracking
> down and fixing this – one would almost be tempted to just use standard data
> compression – would lead to a noticeable decrease in compile and link times for
> affected code.

A simple solution is to just use LZ77 compression on the strings. This is already done for Win32 symbols and works well. (I had a PR to put that in Phobos, but it was rejected.)

https://www.digitalmars.com/sargon/lz77.html

As a snide aside, the mangling schemes used by Microsoft and g++ have built-in compression, but they are overly complex and produce lousy results. LZ77 is simpler and far more effective :-)

An alternative is to generate an SHA hash of the name, which will be unique for all practical purposes, but the downside is that it is not reversible and so cannot be demangled.
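Back-of-the-envelope versions of both ideas (std.zlib standing in for the lz77 coder, base64 just to keep the output printable, and the "_Dz"/"_Dh" markers invented for illustration - a real scheme would have to restrict itself to identifier-safe characters):

import std.base64 : Base64;
import std.conv : text;
import std.digest.sha : sha256Of;
import std.zlib : compress;

// Reversible: compress the symbol body; a demangler would decompress first.
string compressMangle(string mangled)
{
    return text("_Dz", Base64.encode(cast(const(ubyte)[]) compress(mangled)));
}

// Irreversible: fixed-length digest, so it can never be demangled back.
string hashMangle(string mangled)
{
    auto digest = sha256Of(mangled);   // ubyte[32]
    return text("_Dh", Base64.encode(digest[]));
}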
February 26, 2016
On 2/26/2016 10:34 AM, Iain Buclaw via Digitalmars-d wrote:
> One interesting line of development (though would be difficult to implement)
> would be to do all three semantic passes asynchronously using fibers.

I'd be terrified of all the synchronizing that would be necessary there. The lexing and code generation would be far easier to parallelize.
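The lexing side at least is embarrassingly parallel, something like this (Module/Token/lex are stand-ins, not dmd's actual types; std.parallelism does the thread-pool work):

import std.parallelism : parallel;

struct Token { /* kind, value, location, ... */ }
struct Module { string source; Token[] tokens; }

Token[] lex(string source)
{
    Token[] result;
    // ... tokenize source ...
    return result;
}

void lexAll(Module[] modules)
{
    // Each module is lexed from its own source text with no shared state,
    // so no synchronization is needed beyond the final join.
    foreach (ref m; parallel(modules))
        m.tokens = lex(m.source);
}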

> If I understand correctly, sdc already does this with many cases that need
> ironing out.

The "many cases that need ironing out" is always the problem :-)

February 26, 2016
On 2/26/2016 1:10 PM, Timon Gehr wrote:
> Different passes are not really required once semantic analysis becomes
> asynchronous. Just keep track of semantic analysis dependencies, with strong and
> weak dependencies and different means to resolve cycles of weak dependencies.
> Then write the semantic analysis of each component in a linear fashion and pause
> it whenever it depends on information that has not yet been obtained, until that
> information is computed.

I'll put you in charge of debugging that :-)