Thread overview
Compiler scalability. Question inspired by OOM errors seen by Crystal language
Aug 30, 2017
Pradeep Gowda
Aug 30, 2017
Nicholas Wilson
Aug 30, 2017
H. S. Teoh
Aug 31, 2017
Stefan Koch
Aug 31, 2017
H. S. Teoh
Aug 31, 2017
Stefan Koch
Aug 30, 2017
H. S. Teoh
August 30, 2017
I'm referring to this thread about Crystal --  https://lobste.rs/s/dyitr0/its_fun_program_crystal_is_it_scalable

> Crystal is strongly typed, but overwhelmingly uses type inference, rather than explicit types. Because Crystal aims to be spiritually—and frequently literally—compatible with Ruby, that’s a problem: to accomplish that, Crystal relies on sometimes-nullable types with implicit structure and implicit unions, such that, frequently, the only way to even begin type inference is to load the entire program’s AST into RAM all at once and then start your massive type inference pass. What you’re seeing in this thread is how a “simple” fix to a YAML parser error reporting hit that problem, causing Crystal to use a critical amount too much RAM and OOM.

How does D compare in this regard, especially in cases where `auto` storage class specifiers are used liberally throughout the code base?
August 30, 2017
On Wednesday, 30 August 2017 at 01:30:30 UTC, Pradeep Gowda wrote:
> I'm referring to this thread about Crystal --  https://lobste.rs/s/dyitr0/its_fun_program_crystal_is_it_scalable
>
>> Crystal is strongly typed, but overwhelmingly uses type inference, rather than explicit types. Because Crystal aims to be spiritually—and frequently literally—compatible with Ruby, that’s a problem: to accomplish that, Crystal relies on sometimes-nullable types with implicit structure and implicit unions, such that, frequently, the only way to even begin type inference is to load the entire program’s AST into RAM all at once and then start your massive type inference pass. What you’re seeing in this thread is how a “simple” fix to a YAML parser error reporting hit that problem, causing Crystal to use a critical amount too much RAM and OOM.
>
> How does D compare in this regard, especially in cases where `auto` storage class specifiers are used liberally throughout the code base?

Auto is not the problem: you can easily figure out the return type of a function that returns a primitive type, and aggregates have to be specified explicitly.

The problem with D is the memory-hogging nature of CTFE and the sheer number of templates that get instantiated when compiling big codebases. Symbol length is also a problem, but that eats your disk space, not your RAM.
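For illustration, a minimal sketch of how cheap that kind of inference is (function names are hypothetical):

```d
// Hypothetical sketch: the return type of `add` is inferred locally
// as `int` from its single return expression; no whole-program
// analysis is needed.
auto add(int a, int b)
{
    return a + b; // typeof(a + b) is int, so `add` returns int
}

void main()
{
    static assert(is(typeof(add(1, 2)) == int)); // resolved at compile time
    assert(add(1, 2) == 3);
}
```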
August 30, 2017
On Wednesday, 30 August 2017 at 02:53:42 UTC, Nicholas Wilson wrote:
> On Wednesday, 30 August 2017 at 01:30:30 UTC, Pradeep Gowda wrote:
>> I'm referring to this thread about Crystal --  https://lobste.rs/s/dyitr0/its_fun_program_crystal_is_it_scalable
>>
>>> Crystal is strongly typed, but overwhelmingly uses type inference, rather than explicit types. Because Crystal aims to be spiritually—and frequently literally—compatible with Ruby, that’s a problem: to accomplish that, Crystal relies on sometimes-nullable types with implicit structure and implicit unions, such that, frequently, the only way to even begin type inference is to load the entire program’s AST into RAM all at once and then start your massive type inference pass. What you’re seeing in this thread is how a “simple” fix to a YAML parser error reporting hit that problem, causing Crystal to use a critical amount too much RAM and OOM.
>>
>> How does D compare in this regard, especially in cases where `auto` storage class specifiers are used liberally throughout the code base?
>
> Auto is not the problem: you can easily figure out the return type of a function that returns a primitive type, and aggregates have to be specified explicitly.
>
> The problem with D is the memory-hogging nature of CTFE and the sheer number of templates that get instantiated when compiling big codebases. Symbol length is also a problem, but that eats your disk space, not your RAM.

Symbols do eat your RAM, because the compiler has to generate them in RAM before writing them to a binary; also, don't forget `pragma(msg, symbol.mangleof)` - the mangled name of each symbol is available for use in CTFE. However, the mangling problems should finally be put to rest, thanks to Rainer Schuetze, now that
https://github.com/dlang/dmd/pull/5855 / https://github.com/dlang/dmd/pull/6998 have been merged.
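A minimal sketch of that (assuming a plain function `square`; D exposes the mangled name as a compile-time string via the `.mangleof` property):

```d
// The mangled name of a symbol is a compile-time string, so it
// occupies compiler memory and can be consumed by CTFE code.
int square(int x) { return x * x; }

enum mangled = square.mangleof; // compile-time, platform-specific string
pragma(msg, mangled);           // emitted during compilation
static assert(mangled.length > 0);
```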
August 30, 2017
On Wednesday, 30 August 2017 at 01:30:30 UTC, Pradeep Gowda wrote:
> I'm referring to this thread about Crystal --  https://lobste.rs/s/dyitr0/its_fun_program_crystal_is_it_scalable
>
>> Crystal is strongly typed, but overwhelmingly uses type inference, rather than explicit types. Because Crystal aims to be spiritually—and frequently literally—compatible with Ruby, that’s a problem: to accomplish that, Crystal relies on sometimes-nullable types with implicit structure and implicit unions, such that, frequently, the only way to even begin type inference is to load the entire program’s AST into RAM all at once and then start your massive type inference pass. What you’re seeing in this thread is how a “simple” fix to a YAML parser error reporting hit that problem, causing Crystal to use a critical amount too much RAM and OOM.
>
> How does D compare in this regard, especially in cases where `auto` storage class specifiers are used liberally throughout the code base?

D supports separate compilation by design, i.e., producing the final executable doesn't require the compiler to load into memory all the source files corresponding to all the object files being linked.
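In practice that might look like the following (a hedged sketch; package names and paths are hypothetical):

```shell
# Compile each package to an object file separately; the compiler only
# needs that package's sources (plus imported declarations) at a time.
dmd -c -ofpkga.o src/pkga/*.d
dmd -c -ofpkgb.o src/pkgb/*.d

# Link the separately compiled objects into the final executable.
dmd -ofapp src/app.d pkga.o pkgb.o
```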
August 30, 2017
On Wed, Aug 30, 2017 at 02:53:42AM +0000, Nicholas Wilson via Digitalmars-d wrote: [...]
> The problem with D is the memory-hogging nature of CTFE and the sheer number of templates that get instantiated when compiling big codebases. Symbol length is also a problem, but that eats your disk space, not your RAM.

The symbol length problem has been greatly reduced by Rainer's recent mangling PR, which is now in git master and will be in the next release. There is still room for further reductions, but they will be comparatively minor. Rainer's fix has eliminated the bulk of the symbol length problem.

The CTFE problem will be fixed soon, once Stefan Koch finishes his newCTFE engine.

Templates are still an area where a lot of improvement can be made. But Stefan has indicated interest in visiting this area in the compiler after the newCTFE project is done.  So eventually we'll get there.


T

-- 
Famous last words: I *think* this will work...
August 30, 2017
On Wed, Aug 30, 2017 at 08:01:17AM +0000, via Digitalmars-d wrote: [...]
> D supports separate compilation by design, i.e., producing the final executable doesn't require the compiler to load into memory all the source files corresponding to all the object files being linked.

Yes, I think the general recommendation for very large codebases is to compile one package at a time, i.e., one subdir at a time.

Nevertheless, compiler memory usage remains an issue on low-memory systems.  My old work PC simply cannot handle running dmd without grinding to a halt because there's just not enough RAM to go around (only 500MB total).  It's generally not a problem for modern PCs with at least a few GB of memory.  Walter chose compilation speed over memory efficiency, so that's just the way it is.  In theory, one could turn on GC in the compiler so that it will work on low-memory systems, but I'm not sure if such a change will be accepted into dmd.


T

-- 
Questions are the beginning of intelligence, but the fear of God is the beginning of wisdom.
August 30, 2017
On Wednesday, 30 August 2017 at 16:39:32 UTC, H. S. Teoh wrote:
> On Wed, Aug 30, 2017 at 08:01:17AM +0000, via Digitalmars-d wrote: [...]
>> D supports separate compilation by design, i.e., producing the final executable doesn't require the compiler to load into memory all the source files corresponding to all the object files being linked.
>
> Yes, I think the general recommendation for very large codebases is to compile one package at a time, i.e., one subdir at a time.

Yep, that avoids parsing the same file over and over again (as would be the case in C++, with commonly used header files re-parsed for each translation unit), yet provides enough granularity for parallelism and also somewhat limits the memory problem.

> Nevertheless, compiler memory usage remains an issue on low-memory systems.  My old work PC simply cannot handle running dmd without grinding to a halt because there's just not enough RAM to go around (only 500MB total).  It's generally not a problem for modern PCs with at least a few GB of memory.  Walter chose compilation speed over memory efficiency, so that's just the way it is.  In theory, one could turn on GC in the compiler so that it will work on low-memory systems, but

(Nods)

> I'm not sure if such a change will be accepted into dmd.
>
>
> T

We could always make that a compiler switch. I'm not sure if a substantial benefit could be gained during parsing and semantic analysis (e.g. consider staticIota - each instantiation needs to be kept in memory, since you never know when it will be needed again), but certainly once you convert the AST to the backend IR, you could immediately drop all memory allocated by the frontend.
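To illustrate the staticIota point, here is a simplified recursive sketch (not the actual Phobos implementation): every distinct `(begin, end)` pair creates an instantiation the compiler must keep around, since it cannot know whether that instantiation will be requested again later.

```d
import std.meta : AliasSeq;

// Simplified sketch: each (begin, end) pair below is a separate
// template instantiation retained in compiler memory.
template staticIota(int begin, int end)
{
    static if (begin >= end)
        alias staticIota = AliasSeq!();
    else
        alias staticIota = AliasSeq!(begin, staticIota!(begin + 1, end));
}

// staticIota!(0, 4) transitively instantiates (1,4), (2,4), (3,4), (4,4).
static assert([staticIota!(0, 4)] == [0, 1, 2, 3]);
```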
August 31, 2017
On Wednesday, 30 August 2017 at 16:34:13 UTC, H. S. Teoh wrote:
>
> The CTFE problem will be fixed soon, once Stefan Koch finishes his newCTFE engine.
>
> Templates are still an area where a lot of improvement can be made. But Stefan has indicated interest in visiting this area in the compiler after the newCTFE project is done.  So eventually we'll get there.
>
>
> T

Yes, that is true.
However, since I've just taken a full-time job, I cannot say when it will be finished.

As for templates, I have chosen to work around the issue entirely and integrate type-manipulation and introspection abilities with CTFE (which is inherently more scalable than templates).
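A rough sketch of the difference in mechanism (illustrative only; this is not the newCTFE design itself): the CTFE version is one ordinary function evaluated like regular code, while the template version allocates one instantiation per recursion step.

```d
// CTFE: a single ordinary function; compile-time evaluation creates
// no per-step template instantiations.
int sumTo(int n)
{
    int total = 0;
    foreach (i; 0 .. n + 1)
        total += i;
    return total;
}

enum viaCtfe = sumTo(100); // enum initializer forces CTFE

// Template recursion: SumTo!100, SumTo!99, ... are each separate
// instantiations the compiler allocates and keeps.
template SumTo(int n)
{
    static if (n == 0)
        enum SumTo = 0;
    else
        enum SumTo = n + SumTo!(n - 1);
}

static assert(viaCtfe == 5050 && SumTo!100 == 5050);
```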
August 31, 2017
On Thu, Aug 31, 2017 at 04:36:41PM +0000, Stefan Koch via Digitalmars-d wrote:
> On Wednesday, 30 August 2017 at 16:34:13 UTC, H. S. Teoh wrote:
> > 
> > The CTFE problem will be fixed soon, once Stefan Koch finishes his newCTFE engine.
> > 
> > Templates are still an area where a lot of improvement can be made. But Stefan has indicated interest in visiting this area in the compiler after the newCTFE project is done.  So eventually we'll get there.
> > 
> > 
> > T
> 
> Yes that is true.
> However since I've just taken a full-time job I cannot say when it
> will be finished.
> 
> As for templates, I have chosen to work around the issue entirely and integrate type-manipulation and introspection abilities with CTFE (which is inherently more scalable than templates).

I think D's conception of templates, which IMO goes beyond C++'s (conceptually, anyway; the current implementation is fundamentally not too different), can be implemented in a much better way that makes it more scalable.

C++ sees templates as code stencils to be instantiated with the given template arguments, i.e., copy-n-paste the stencil and substitute template parameters with the given arguments.

D's templates for the most part still retain that, and certainly the current implementation essentially just follows the C++ model. But IMO there's an important conceptual difference: D sees template parameters not so much as parameters for code stencils to be copy-n-pasted, but as *compile-time* parameters to a function / set of declarations.  I.e., they are parameters that are known at compile time rather than at runtime.

The distinction is subtle but IMO important, because it allows us to break out of the C++ mold of copy-n-paste-this-stencil that is one of the causes of template scalability problems (pass too many different template arguments, and you get massive code bloat, one copy per instantiation).  By treating template arguments as arguments known at compile-time instead, the compiler can get smarter with when to copy-n-paste a declaration, and when to just evaluate the template with the argument without actually instantiating anything (no copying of AST nodes, etc.).

A common example is a templated linked-list container, where much of the linked-list pointer manipulation code is actually independent of the type of the contained data.  The compiler really only needs to generate distinct copies of the code when the data itself is being manipulated; all the other code can be shared between template instantiations.
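A hypothetical sketch of that separation (type and method names are made up): the traversal code below never touches the payload type, so in principle one compiled copy could serve every instantiation.

```d
struct List(T)
{
    static struct Node
    {
        Node* next;
        T value;
    }

    Node* head;

    // Depends on T: constructs a node holding the payload.
    void prepend(T value)
    {
        head = new Node(head, value);
    }

    // Type-independent in spirit: only walks next pointers, so a
    // smarter compiler could share this code across instantiations.
    size_t length() const
    {
        size_t n = 0;
        for (auto p = head; p !is null; p = p.next)
            ++n;
        return n;
    }
}

void main()
{
    List!int xs;
    xs.prepend(1);
    xs.prepend(2);
    assert(xs.length == 2);
}
```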

This is just one example off the top of my head; I'm sure there are plenty of others that we can come up with once we break free of the C++ mold of thinking of templates as "copy-n-paste except substitute X with Y".  Another example that just occurred to me is the various recursive templates in std.meta for manipulating AliasSeq's.  There is actually no need for the compiler to instantiate anything at all, except perhaps for the final result. By treating the AliasSeq as an in-compiler data type with compile-time operations performed on it, the compiler ought to be able to evaluate the template without needing to instantiate anything past the first level of recursion.
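For example, a simplified staticMap-style recursion (not the actual std.meta code) instantiates itself once per element of the AliasSeq, even though the computation is conceptually a single pass over a compile-time list:

```d
import std.meta : AliasSeq;

// Simplified staticMap sketch: one instantiation per list element,
// even though the work is conceptually a single loop.
template Map(alias F, Args...)
{
    static if (Args.length == 0)
        alias Map = AliasSeq!();
    else
        alias Map = AliasSeq!(F!(Args[0]), Map!(F, Args[1 .. $]));
}

alias AddPointer(T) = T*; // hypothetical element-wise operation

static assert(is(Map!(AddPointer, int, char)[0] == int*));
static assert(is(Map!(AddPointer, int, char)[1] == char*));
```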


T

-- 
"How are you doing?" "Doing what?"
August 31, 2017
On Thursday, 31 August 2017 at 17:39:35 UTC, H. S. Teoh wrote:

>
> This is just one example off the top of my head; I'm sure there are plenty of others that we can come up with once we break free of the C++ mold of thinking of templates as "copy-n-paste except substitute X with Y".  Another example that just occurred to me is the various recursive templates in std.meta for manipulating AliasSeq's.  There is actually no need for the compiler to instantiate anything at all, except perhaps for the final result. By treating the AliasSeq as an in-compiler data type with compile-time operations performed on it, the compiler ought to be able to evaluate the template without needing to instantiate anything past the first level of recursion.
>
>
> T

That is a good example, and it is one where we could maybe deduce what is going on.
However, classifying a template (or rather, deducing which instantiations are unused) actually requires us to instantiate the template,
because we cannot predict what it might evaluate to given specific parameters.
For this to work we would have to produce all possible instantiation behaviors of a template and deduplicate that set.
That set happens to be infinite, in most cases.
Even predicting whether the set of instantiations is finite is essentially an instance of the halting problem.

I've worked on this for some time until I realized that I was fighting without gaining ground.

It would end up as a myriad of special cases and would be impossible to maintain.

Considering that the current template system, which does not try anything smart, is already fiendishly complicated, I'd rather not introduce an even more complicated template-instance-shape predictor that would probably work only on the simplest of cases and is not guaranteed to bring wins that would justify the time it would take to implement.

TL;DR

The change to optimize certain groups of templates will require MASSIVE amounts of work and is unlikely to pay for itself.