January 21, 2019
On Mon, Jan 21, 2019 at 07:10:11PM +0000, Vladimir Panteleev via Digitalmars-d wrote:
> On Monday, 21 January 2019 at 19:01:57 UTC, Steven Schveighoffer wrote:
> > I still find it difficult to believe that calling exists x4 is a huge culprit. But certainly, caching a directory structure is going to be more efficient than reading it every time.
> 
> For large directories, opendir+readdir, especially with stat, is much slower than open/access. Most filesystems already use a hash table or equivalent, so looking up a known file name is faster because it's a hash table lookup.
> 
> This whole endeavor generally seems like poorly reimplementing what the OS should already be doing.

I can't help wondering why we're making so much noise about a few milliseconds on opening/reading import files, when there's the elephant in the room of the 3-5 *seconds* of compile-time added by the mere act of using a single instance of std.regex.Regex.
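
For concreteness, a minimal program of the kind meant here (illustrative only; the file name and timing command are just examples) -- its compile-only time can be measured with something like `time dmd -o- regex_cost.d`:

    // Illustrative only: the compile time of this trivial program is dominated
    // by instantiating the std.regex templates, not by reading source files.
    import std.regex;

    void main()
    {
        auto re = regex(`[0-9]+`); // a single Regex!char instance
    }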

Shouldn't we be doing something about that first??


T

-- 
Verbing weirds language. -- Calvin (& Hobbes)
January 21, 2019
On Sat, 19 Jan 2019 00:45:27 -0800, Walter Bright wrote:
> Andrei and I were talking on the phone today, trading ideas about speeding up importation of Phobos files. Any particular D file tends to import much of Phobos, and much of Phobos imports the rest of it. We've both noticed that file size doesn't seem to matter much for importation speed, but file lookups remain slow.

I should have started out by testing this.

I replaced the file lookup with, essentially:

    if (name.startsWith("std"))
        filename = "/phobos/" ~ name.replace(".", "/") ~ ".d";
    else
        filename = "/druntime/" ~ name.replace(".", "/") ~ ".d";

Plus a hard-coded set of package.d references.
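
For illustration, a self-contained sketch of roughly what that lookup could look like (a hypothetical reconstruction; the root paths and the list of package.d modules below are made up, not the actual patch):

    // Hypothetical reconstruction of the hard-coded lookup described above;
    // the root paths and the package-module list are illustrative only.
    import std.algorithm.searching : canFind, startsWith;
    import std.array : replace;

    // Modules known to be packages resolve to package.d rather than a plain .d file.
    static immutable string[] packageModules = ["std.algorithm", "std.range", "std.datetime"];

    string lookup(string name)
    {
        string root = name.startsWith("std") ? "/phobos/" : "/druntime/";
        string path = root ~ name.replace(".", "/");
        return packageModules.canFind(name) ? path ~ "/package.d" : path ~ ".d";
    }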

Before, compiling my test file took about 0.67 to 0.70 seconds. After, it took about 0.67 to 0.70 seconds.

There is no point in optimizing filesystem access for importing Phobos at this time.
January 21, 2019
On Monday, 21 January 2019 at 19:42:57 UTC, H. S. Teoh wrote:
>
> I can't help wondering why we're making so much noise about a few milliseconds on opening/reading import files, when there's the elephant in the room of the 3-5 *seconds* of compile-time added by the mere act of using a single instance of std.regex.Regex.
>
> Shouldn't we be doing something about that first??
>
>
> T

I am on it :P

I cannot do it any faster than I am currently doing it though.

I am also still pursuing first-class functions as a replacement for recursive templates.

January 21, 2019
On Saturday, 19 January 2019 at 08:45:27 UTC, Walter Bright wrote:
> This can be a fun challenge! Anyone up for it?

I wrote about this idea in my blog today:

http://dpldocs.info/this-week-in-d/Blog.Posted_2019_01_21.html#my-thoughts-on-forum-discussions


In short, it may be a fun challenge, and may be useful to some library distributors, but I don't think it is actually worth it.
January 21, 2019
On 1/21/19 4:13 PM, Adam D. Ruppe wrote:
> On Saturday, 19 January 2019 at 08:45:27 UTC, Walter Bright wrote:
>> This can be a fun challenge! Anyone up for it?
> 
> I wrote about this idea in my blog today:
> 
> http://dpldocs.info/this-week-in-d/Blog.Posted_2019_01_21.html#my-thoughts-on-forum-discussions 
> 
> 
> 
> In short, it may be a fun challenge, and may be useful to some library distributors, but I don't think it is actually worth it.

Lots of good thoughts there, most of which I agree with. Thanks for sharing.

One note -- I don't think modules like std.datetime were split up for the sake of compiler parsing speed; I thought they were split up to a) avoid the insane ddoc generation that came from it, and b) reduce dependencies on symbols that you didn't care about. Not to mention that GitHub would refuse to load std.datetime for any PRs :)

But it does help to consider the cost of finding the file and the cost of using the file separately, and see how they compare.

-Steve
January 21, 2019
On 1/21/2019 1:02 AM, Vladimir Panteleev wrote:
> Even if you have obvious answers to these questions, they still need to be implemented, so the speed gain from such a change would need to be significant in order to justify the disruption.

The only way to get definitive answers is to try it. Fortunately, it isn't that difficult.

January 21, 2019
On Mon, Jan 21, 2019 at 04:38:21PM -0500, Steven Schveighoffer via Digitalmars-d wrote: [...]
> One note -- I don't think modules like std.datetime were split up for the sake of compiler parsing speed; I thought they were split up to a) avoid the insane ddoc generation that came from it, and b) reduce dependencies on symbols that you didn't care about. Not to mention that GitHub would refuse to load std.datetime for any PRs :)

And also, I originally split up std.algorithm (over Andrei's protests) because it was so ridiculously huge that I couldn't get unittests to run on my PC without dying with out-of-memory errors.


> But it does help to consider the cost of finding the file and the cost of using the file separately, and see how they compare.
[...]

I still think a lot of this effort is misdirected -- we're trying to hunt small fish while there's a shark in the pond.  Instead of trying to optimize file open / read times, what we *should* be doing is reducing the number of recursive templates that heavyweight Phobos modules like std.regex use, or improving the template expansion strategies (e.g., the various PRs that have been checked in to replace O(n) recursive template expansions with O(log n), or O(n^2) with O(n), etc.).  Or, for that matter, optimizing how the compiler processes templates so that it performs better.
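
To illustrate the kind of rewrite meant here (a generic sketch, not any of the actual Phobos PRs): a linearly recursive staticMap-style template next to a divide-and-conquer version whose recursion depth is O(log n).

    import std.meta : AliasSeq;

    // Linear recursion: mapping over n arguments nests n template instantiations.
    template MapLinear(alias F, Args...)
    {
        static if (Args.length == 0)
            alias MapLinear = AliasSeq!();
        else
            alias MapLinear = AliasSeq!(F!(Args[0]), MapLinear!(F, Args[1 .. $]));
    }

    // Divide and conquer: the argument list is split in half, so the recursion
    // depth is O(log n), which is typically much cheaper for the compiler.
    // Usage (example): alias Plain = MapHalved!(Unqual, const int, immutable char);
    template MapHalved(alias F, Args...)
    {
        static if (Args.length == 0)
            alias MapHalved = AliasSeq!();
        else static if (Args.length == 1)
            alias MapHalved = AliasSeq!(F!(Args[0]));
        else
            alias MapHalved = AliasSeq!(MapHalved!(F, Args[0 .. $ / 2]),
                                        MapHalved!(F, Args[$ / 2 .. $]));
    }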

Optimizing file open / file read in the face of these much heavier components in the compiler sounds to me like straining out the gnat while swallowing the camel.


T

-- 
Once upon a time there lived a king, and with him there lived a flea.
January 22, 2019
On Monday, 21 January 2019 at 21:52:01 UTC, H. S. Teoh wrote:
> On Mon, Jan 21, 2019 at 04:38:21PM -0500, Steven Schveighoffer via Digitalmars-d wrote: [...]
>> [...]
>
> And also, I originally split up std.algorithm (over Andrei's protests) because it was so ridiculously huge that I couldn't get unittests to run on my PC without dying with out-of-memory errors.
>
> [...]

Does dmd ever use dynamic programming when it expands recursive templates?
-Alex
January 22, 2019
On Saturday, 19 January 2019 at 08:45:27 UTC, Walter Bright wrote:
> Andrei and I were talking on the phone today, trading ideas about speeding up importation of Phobos files. Any particular D file tends to import much of Phobos, and much of Phobos imports the rest of it. We've both noticed that file size doesn't seem to matter much for importation speed, but file lookups remain slow.
>
> So looking up fewer files would make it faster.
>
> If phobos.zip is opened as a memory mapped file, whenever std/stdio.d is read, the file will be "faulted" into memory rather than doing a file lookup / read. We're speculating that this should be significantly faster,

Speaking for Linux: the kernel already caches the file (after the first read) unless `echo 3 > /proc/sys/vm/drop_caches` is triggered. I've tested with the entirety of Phobos cached, and the compilation is still slow. I/O is not the bottleneck here. It's the compilation itself that needs to be sped up.

If you still think the file reads are the culprit, why does recompilation take the same amount of time as the first compilation (despite the kernel file cache)?
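
A small sketch of how that cache effect can be observed (the Phobos source path below is just an assumed install location):

    // Read every Phobos .d file twice; the second pass is normally served from
    // the kernel page cache and is much faster, unless the caches are dropped
    // in between with `echo 3 > /proc/sys/vm/drop_caches`.
    import std.datetime.stopwatch : AutoStart, StopWatch;
    import std.file : SpanMode, dirEntries, read;
    import std.stdio : writeln;

    void main()
    {
        enum root = "/usr/include/dmd/phobos/std"; // assumed path, adjust as needed
        foreach (pass; 0 .. 2)
        {
            auto sw = StopWatch(AutoStart.yes);
            size_t bytes;
            foreach (entry; dirEntries(root, "*.d", SpanMode.depth))
                bytes += read(entry.name).length;
            writeln("pass ", pass, ": ", bytes, " bytes in ", sw.peek);
        }
    }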

> being very convenient for the user to treat Phobos as a single file rather than a blizzard. (phobos.lib could also be in the same file!)

We already have std.experimental.all for convenience.

January 21, 2019
On Monday, January 21, 2019 5:46:32 PM MST Arun Chandrasekaran via Digitalmars-d wrote:
> > being very convenient for the user to treat Phobos as a single file rather than a blizzard. (phobos.lib could also be in the same file!)
>
> We already have std.experimental.all for convenience.

If I understand correctly, that's an orthogonal issue. What Walter is proposing wouldn't change how any code imported anything. Rather, it would just change how the compiler reads the files. So, anyone wanting to import all of Phobos at once would still need something like std.experimental.all, but regardless of how much you were importing from Phobos, dmd would read in all of Phobos at once, because it would be a single zip file. It would then only actually compile what it needed to for the imports in your program, but it would have read the entire zip file into memory so that it would only have to open one file instead of searching for and opening each file individually.
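
A rough sketch of that mechanism (not dmd's actual code; std.zip here just shows the shape of the idea, and the zip-internal paths are assumptions):

    // Read phobos.zip once, then resolve each import from the in-memory archive.
    import std.array : replace;
    import std.file : read;
    import std.zip : ZipArchive;

    string moduleSource(ZipArchive archive, string moduleName)
    {
        // e.g. "std.stdio" -> "std/stdio.d"; package.d handling omitted for brevity.
        auto path = moduleName.replace(".", "/") ~ ".d";
        auto member = archive.directory[path];       // in-memory lookup, no filesystem access
        return cast(string) archive.expand(member);  // decompress just this member
    }

    void main()
    {
        auto archive = new ZipArchive(read("phobos.zip")); // single open/read for all of Phobos
        auto source = moduleSource(archive, "std.stdio");
    }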

- Jonathan M Davis