Speeding up importing Phobos files

June 08, 2019

Re: Speeding up importing Phobos files

Posted by Amex
in reply to Walter Bright

On Saturday, 19 January 2019 at 08:45:27 UTC, Walter Bright wrote:
> Andrei and I were talking on the phone today, trading ideas about speeding up importation of Phobos files. Any particular D file tends to import much of Phobos, and much of Phobos imports the rest of it. We've both noticed that file size doesn't seem to matter much for importation speed, but file lookups remain slow.
>
> So looking up fewer files would make it faster.
>
> Here's the idea: Place all Phobos source files into a single zip file, call it phobos.zip (duh). Then,
>
>     dmd myfile.d phobos.zip
>
> and the compiler will look in phobos.zip to resolve, say, std/stdio.d. If phobos.zip is opened as a memory mapped file, whenever std/stdio.d is read, the file will be "faulted" into memory rather than doing a file lookup / read. We're speculating that this should be significantly faster, besides being very convenient for the user to treat Phobos as a single file rather than a blizzard. (phobos.lib could also be in the same file!)
>
> It doesn't have to be just phobos, this can be a general facility. People can distribute their D libraries as a zip file that never needs unzipping.
>
> We already have https://dlang.org/phobos/std_zip.html to do the dirty work. We can experiment to see if compressed zips are faster than uncompressed ones.
>
> This can be a fun challenge! Anyone up for it?
>
> P.S. dmd's ability to directly manipulate object library files, rather than going through lib or ar, has been a nice success.
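A minimal sketch of the archive lookup described above, using the existing std.zip module. It assumes sources are stored under paths like std/stdio.d (with a package.d fallback). The helper names findModuleSource and makeDemoArchive are hypothetical and are not dmd APIs. To stay self-contained, the example builds a tiny uncompressed archive in memory. For a real phobos.zip, the buffer could instead come from std.mmfile.MmFile, which gives the memory-mapped behaviour the post mentions.

```d
import std.array : replace;
import std.stdio : writeln;
import std.zip;

/// Hypothetical helper (not part of dmd): map a module name such as
/// "std.stdio" to "std/stdio.d" (or ".../package.d") and read it from the archive.
const(char)[] findModuleSource(ZipArchive archive, string moduleName)
{
    immutable base = moduleName.replace(".", "/");
    foreach (candidate; [base ~ ".d", base ~ "/package.d"])
    {
        // An associative-array lookup in the zip directory replaces a filesystem lookup.
        if (auto member = candidate in archive.directory)
            return cast(const(char)[]) archive.expand(*member);
    }
    return null; // not in this archive; a compiler would fall back to its -I paths
}

/// Build a small uncompressed archive in memory so the example runs anywhere.
ZipArchive makeDemoArchive()
{
    auto zip = new ZipArchive();
    foreach (name, text; ["std/stdio.d": "module std.stdio;",
                          "std/algorithm/package.d": "module std.algorithm;"])
    {
        auto member = new ArchiveMember();
        member.name = name;
        member.expandedData = cast(ubyte[]) text.dup;
        member.compressionMethod = CompressionMethod.none; // stored, not deflated
        zip.addMember(member);
    }
    // Round-trip through the zip byte format, as if phobos.zip had been read
    // from disk (or memory-mapped with std.mmfile.MmFile and passed as mm[]).
    return new ZipArchive(zip.build());
}

void main()
{
    auto archive = makeDemoArchive();
    writeln(findModuleSource(archive, "std.stdio"));           // module std.stdio;
    writeln(findModuleSource(archive, "std.algorithm"));       // module std.algorithm;
    writeln(findModuleSource(archive, "std.missing") is null); // true
}
```

Switching the members to CompressionMethod.deflate would be one way to run the compressed-versus-uncompressed comparison the post suggests. The sketch itself only shows the lookup path.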

Why not compile Phobos to an object file? Basically, store the AST directly in a file and just load it. Phobos never changes, so why recompile it over and over and over and over?

This should be done with all files... sort of like rdmd, so to speak.

It might take some work to figure out how to get it all to work, but maybe the time has come to stop using ancient design patterns and move on to a higher level?

After all, the issue is mainly templates, since they cannot be compiled into a library... but if they could, they wouldn't be an issue: .til -> .lib -> .exe

A .til is a higher-level library that includes objectified templates from D code. It is basically an extension of a lib file, and eventually the lib is compiled into the exe.
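To illustrate why templates resist precompilation into an ordinary library: each instantiation is a separate function with its own mangled symbol, and which instantiations exist depends on the types the user's code supplies. The template twice and the struct Meters below are made-up examples, not Phobos code.

```d
import std.stdio : writeln;

/// A made-up generic function: no machine code exists for it until it is instantiated.
T twice(T)(T x)
{
    return x + x;
}

/// A user-defined type that a precompiled library could not know about in advance.
struct Meters
{
    double value;

    Meters opBinary(string op : "+")(Meters rhs) const
    {
        return Meters(value + rhs.value);
    }
}

void main()
{
    // Each distinct T yields a distinct function with a distinct symbol name.
    writeln(twice!(int).mangleof);
    writeln(twice!(double).mangleof);
    writeln(twice!(Meters).mangleof);
    writeln(twice(21), " ", twice(Meters(1.5)).value); // 42 3
}
```

The int and double instances could in principle be shipped prebuilt, but twice!(Meters) can only be generated once the user's type exists. So the template body still has to be available, as source or as an AST, at the point of use.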

On 1/21/19 2:46 PM, Neia Neutuladh wrote:
> On Sat, 19 Jan 2019 00:45:27 -0800, Walter Bright wrote:
>> Andrei and I were talking on the phone today, trading ideas about
>> speeding up importation of Phobos files. Any particular D file tends to
>> import much of Phobos, and much of Phobos imports the rest of it. We've
>> both noticed that file size doesn't seem to matter much for importation
>> speed, but file lookups remain slow.
>
> I should have started out by testing this.
>
> I replaced the file lookup with, essentially:
>
>     if (name.startsWith("std"))
>         filename = "/phobos/" ~ name.replace(".", "/") ~ ".d";
>     else
>         filename = "/druntime/" ~ name.replace(".", "/") ~ ".d";
>
> Plus a hard-coded set of package.d references.
>
> Before, compiling my test file took about 0.67 to 0.70 seconds. After, it
> took about 0.67 to 0.70 seconds.
>
> There is no point in optimizing filesystem access for importing phobos at
> this time.

Word. (Unless the libs are installed over a networked mount. Not sure how much we need to worry about that.)

On 1/21/19 2:35 PM, Neia Neutuladh wrote:
> On Mon, 21 Jan 2019 19:10:11 +0000, Vladimir Panteleev wrote:
>> On Monday, 21 January 2019 at 19:01:57 UTC, Steven Schveighoffer wrote:
>>> I still find it difficult to believe that calling exists x4 is a huge
>>> culprit. But certainly, caching a directory structure is going to be
>>> more efficient than reading it every time.
>>
>> For large directories, opendir+readdir, especially with stat, is much
>> slower than open/access.
>
> We can avoid stat() except with symbolic links.
>
> Opendir + readdir for my example would be about 500 system calls, so it
> breaks even with `import std.stdio;` assuming the cost per call is
> identical and we're reading eagerly. Testing shows that this is the case.

Another simple test:

    import std.experimental.all;
    void main(){}

Use "time -c test.d". On my SSD laptop that takes 0.55 seconds. Without the import, it takes 0.02 seconds. In an ideal world there should be no difference. Those 0.53 seconds are the upper bound of the gains to be made by first-order improvements to import mechanics. (IMHO: low impact yet not negligible.)
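Steven's suggestion of caching the directory structure can be sketched as follows. Walk an import root once with std.file.dirEntries, remember every .d file in a set, and answer later module lookups from memory instead of calling exists() once per candidate path. The ImportCache type and its resolve method are hypothetical, not dmd internals. As Neia's numbers above suggest, the up-front walk costs about as many system calls as the lookups for a single `import std.stdio;`, so this only pays off when many modules are resolved against the same root.

```d
import std.algorithm.iteration : splitter;
import std.conv : to;
import std.file : SpanMode, dirEntries, mkdirRecurse, rmdirRecurse, tempDir, write;
import std.path : absolutePath, buildNormalizedPath, buildPath, relativePath;
import std.process : thisProcessID;
import std.stdio : writeln;

/// Hypothetical cache: one directory walk up front, then in-memory lookups.
struct ImportCache
{
    string root;        // absolute, normalized import root
    bool[string] files; // paths relative to root, e.g. "std/stdio.d"

    this(string importRoot)
    {
        root = buildNormalizedPath(absolutePath(importRoot));
        foreach (entry; dirEntries(root, "*.d", SpanMode.depth))
            if (entry.isFile)
                files[relativePath(entry.name, root)] = true;
    }

    /// Resolve "std.stdio" to a full path, trying module.d first, then package.d.
    string resolve(string moduleName)
    {
        immutable base = buildPath(moduleName.splitter('.'));
        foreach (candidate; [base ~ ".d", buildPath(base, "package.d")])
            if (candidate in files) // no system call here
                return buildPath(root, candidate);
        return null;
    }
}

void main()
{
    // Build a throwaway import tree so the example is self-contained.
    auto root = buildPath(tempDir, "importcache-demo-" ~ to!string(thisProcessID));
    mkdirRecurse(buildPath(root, "std", "algorithm"));
    scope (exit) rmdirRecurse(root);
    write(buildPath(root, "std", "stdio.d"), "module std.stdio;");
    write(buildPath(root, "std", "algorithm", "package.d"), "module std.algorithm;");

    auto cache = ImportCache(root);
    writeln(cache.resolve("std.stdio"));
    writeln(cache.resolve("std.algorithm"));
    writeln(cache.resolve("std.missing") is null); // true
}
```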

On Sat, Jun 08, 2019 at 09:02:46AM +0200, Andrei Alexandrescu via Digitalmars-d wrote:
> On 1/21/19 4:52 PM, H. S. Teoh wrote:
> > split up std.algorithm (at Andrei's protest)
>
> Shouldn't have been splitted.

It should have been. The old std.algorithm was a 10,000-LOC monster that, when building unittests, caused the compiler to exhaust my RAM and thrash on swap before dying horribly. It was an embarrassment. The old std.datetime had the same problem, and I'm very glad Jonathan eventually split it up into more sensible chunks as well.

T

-- 
Take care of your clothes while they're new, and of your health while you're young.

On Saturday, 8 June 2019 at 06:29:16 UTC, Amex wrote:
> Why not compile phobos to an object file? Basically store the AST directly in to a file and just load it. Phobos never changes so why recompile it over and over and over and over?

+1 for AST. And even more: make the AST builder available as a runtime module.

Bonuses:
- Storing packages as AST assemblies means no parsing or AST building for packages, only for user code, which increases compile speed.
- ASTs can be stored more compactly than source, because words (variable names, keywords) are repeated many times throughout the source. Many source files can also be stored in one "assembly"/AST package with a single string-literal table and a single typeinfo table.
- DSL metaprogramming moves to a higher level: the steps "parse DSL, generate D code, parse D code, generate AST" become "parse DSL, generate AST". That reduces compile time and would help many DSLs appear for JSON/XML/DB schemas; they could also be used for UI descriptions (like QML) and much more.
- LDC (not sure about DMD/GCC) can already generate code dynamically at runtime (it probably stores LLVM IR for that now), so it could also generate code from an AST at runtime, not only at compile time for metaprogramming: the same benefits and possibilities as Expression Trees and scripting in .NET. Yes, D can use third-party script engines, but they are not native: interop is not free, there are many wrappers/dispatchers in both directions, two different GCs, and problems with threads and TLS, so for now they are no more than toys. Native scripting would need no interop at all, run at the speed of compiled C++ (LLVM generates fast code), and share one GC for all objects and one thread pool with the same TLS.

I see only one minus: don't use AST for DRT and as module itself for programmers.

In any case, the AST builder already exists in the compiler; just bring it out into public space and allow packages to be stored as ASTs too.
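The compactness point above (identifiers and keywords repeat many times, so a package-wide string table stores each one once) can be illustrated with a toy interning table. This is a sketch of that single idea under the assumption of a naive tokenizer that splits on anything other than letters, digits, and underscores. It is not a proposed AST format, and the names StringTable, intern, and encode are hypothetical.

```d
import std.algorithm.iteration : splitter;
import std.ascii : isAlphaNum;
import std.stdio : writeln;

/// Hypothetical package-wide table: each distinct word is stored once
/// and referred to everywhere else by a small integer index.
struct StringTable
{
    string[] strings;     // index -> text
    uint[string] indexOf; // text -> index

    uint intern(string s)
    {
        if (auto p = s in indexOf)
            return *p; // already stored: reuse its index
        immutable id = cast(uint) strings.length;
        strings ~= s;
        indexOf[s] = id;
        return id;
    }
}

/// Replace every identifier-like token in all sources with its table index.
uint[] encode(ref StringTable table, string[] sources)
{
    uint[] ids;
    foreach (src; sources)
        foreach (word; src.splitter!(c => !isAlphaNum(c) && c != '_'))
            if (word.length) // skip empty pieces between adjacent separators
                ids ~= table.intern(word);
    return ids;
}

void main()
{
    StringTable table;
    auto ids = encode(table, [
        "auto x = foo(x); return x;",
        "auto y = foo(y); return y;",
    ]);
    // prints: 12 identifier tokens, 5 distinct strings
    writeln(ids.length, " identifier tokens, ", table.strings.length, " distinct strings");
    writeln(table.strings); // ["auto", "x", "foo", "return", "y"]
}
```

Here 12 identifier tokens map onto 5 stored strings. The saving grows with the amount of source sharing one table, which is why the post suggests packing many files into one package.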

> - can be stored more compacted than source coz words (var names, keywords) are repeated many times through source.

And there's no need to store any unittests or doc comments in AST packages. Docs will be generated to HTML. Unittests of packages are needed only by the builders of those packages; if you need them, you take the packages' sources, build them, and run their unittests.

On Saturday, 8 June 2019 at 10:35:58 UTC, KnightMare wrote:
> +1 for AST.

This doesn't make a significant difference. Parsing D source into an AST is quick and easy. The problem is applying that AST to user types.

On Saturday, 8 June 2019 at 11:58:36 UTC, Adam D. Ruppe wrote:
>> +1 for AST.
>
> This doesn't make a significant difference. Parsing D source into an AST is quick and easy. The problem is applying that ast to user types.

One more idea: allow storing user (or any) metadata in AST packages. LDC could store optimized LLVM IR for the AST there, so compilation would be: compile the user code only, link, and write the result to the filesystem.

On Saturday, 8 June 2019 at 12:26:56 UTC, KnightMare wrote:
> LDC can store there optimized LLVM-IR code for AST-tree, so compilation time will be: compile user code only, link and write result to FS

Template code would still be taken from the AST. (We just can't compile templates for every type in the universe into IR code. Maybe IR can also store some generics; I don't know.)
