Thread overview
A few measurements of stat()'s speed
Mar 26, 2019
H. S. Teoh
Mar 26, 2019
Jonathan Marler
Mar 26, 2019
Vladimir Panteleev
Mar 28, 2019
Kagamin
Mar 27, 2019
Bastiaan Veelo
Mar 27, 2019
Bastiaan Veelo
Mar 27, 2019
Bastiaan Veelo
March 26, 2019
The current process of searching for imports spans the following directories:

* the current directory
* each of the paths specified in the cmdline with -I, in that order
* each of the paths specified in DFLAGS, in that order

For each of these paths, the ".di" extension is tried first, then the ".d" extension. The function used is stat(). In the majority of cases the ".di" file doesn't exist, so at least 50% of stat() calls fail. The number of failed stat() calls grows with the number of -I flags, i.e. with the size of the project. (For std imports, that means each will be looked up twice in each of the project directories.)

One alternative would be to use opendir()/readdir()/closedir() once for each directory searched, and cache the directory's contents. Then, subsequent attempts can simply look up the local cache and avoid stat() calls in directories that have been previously visited. This approach would accelerate imports if stat() is slow "enough".

On Linux, for a moderately loaded local directory (146 files) mounted from an SSD drive, one failed stat() takes only about 0.5 microseconds. That means, e.g., if a module imports std.all (which fails 142 times), the overhead attributable to failed stat() calls is about 70 microseconds, i.e. negligible.

The results change drastically when network mounts are tested. For sftp and sshfs mounts on a high-speed local connection, one failed stat() takes 6-7 milliseconds, so an import like std.all (and many other imports liable to transitively pull others) would incur significant overhead.

So the question is whether many projects are likely to import files over network mounts, which would motivate the optimization. Please share your thoughts, thanks.


Andrei
March 26, 2019
On Tue, Mar 26, 2019 at 02:06:08PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: [...]
> On a Linux moderately-loaded local directory (146 files) mounted from
> an SSD drive, one failed stat() takes only about 0.5 microseconds.
> That means e.g.  if a module imports std.all (which fails 142 times),
> the overhead accountable to failed stat() calls is about 70
> microseconds, i.e. negligible.
> 
> The results change drastically when network mounts are tested. For sftp and sshfs mounts on a high speed local connection, one failed stat() takes 6-7 milliseconds, so an import like std.all (and many other imports liable to transitively pull others) would cause significant overheads.
> 
> So the question is whether many projects are likely to import files over network mounts, which would motivate the optimization. Please share your thoughts, thanks.
[...]

Does caching the contents of import directories cause significant overhead?  If not, why not just cache it anyway, regardless of whether the import happens across network mounts?  Excessive OS round-trips (calling stat() hundreds of times) are something we should reduce anyway.

//

On a slightly different note, why are we paying so much attention to import speeds anyway?  We can optimize import speeds to hell and back again until they cost practically zero time, yet the act of actually *using* one of those imports -- ostensibly the reason you'd want to import anything in the first place -- immediately adds a huge amount of overhead that by far overshadows those niggly microseconds we pinched. Ergo:

	import std.regex;
	void main() {
		version(withRegex)
			auto re = regex("a");
	}

This takes about 0.5 seconds to compile without -version=withRegex on my machine. With -version=withRegex, it takes about *4.5 seconds* to compile.  We have a 4 second bottleneck here and yet we're trying to shave off microseconds elsewhere.  Why does instantiating a single-character regex add FOUR SECONDS to compilation time?  I think *that* is the question we should be answering.


T

-- 
People say I'm arrogant, and I'm proud of it.
March 26, 2019
On 3/26/19 2:36 PM, H. S. Teoh wrote:
> Does caching the contents of import directories cause significant
> overhead?  If not, why not just cache it anyway, regardless of whether
> the import happens across network mounts.

Because testing takes 10 minutes and implementation takes one day or more. We want to make sure there's impact.

> On a slightly different note, why are we paying so much attention to
> import speeds anyway?

You destroy your own opening point: work should be put where there's potential for impact, not "regardless".

> We can optimize import speeds to hell and back
> again until they cost practically zero time, yet the act of actually
> *using*  one of those imports -- ostensibly the reason you'd want to
> import anything in the first place -- immediately adds a huge amount of
> overhead that by far overshadows those niggly microseconds we pinched.
> Ergo:
> 
> 	import std.regex;
> 	void main() {
> 		version(withRegex)
> 			auto re = regex("a");
> 	}
> 
> This takes about 0.5 seconds to compile without -version=withRegex on my
> machine. With -version=withRegex, it takes about *4.5 seconds* to
> compile.  We have a 4 second bottleneck here and yet we're trying to
> shave off microseconds elsewhere.  Why does instantiating a
> single-character regex add FOUR SECONDS to compilation time?  I think
> *that*  is the question we should be answering.

There's a matter of difficulty. I don't have a good attack on dramatically improving regexen. If you do, it would be of course a high impact project. There's also a matter of paying for what you don't use. Unused imports are, well, unused. Used imports should be paid for in proportion. Agreed, 4.5 seconds is not quite proportionate.
March 26, 2019
On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
> On a Linux moderately-loaded local directory (146 files) mounted from an SSD drive, one failed stat() takes only about 0.5 microseconds. That means e.g. if a module imports std.all (which fails 142 times), the overhead accountable to failed stat() calls is about 70 microseconds, i.e. negligible.

I have some related experience with this:

- The eternal battle of keeping The Server's load levels down involves a good deal of I/O profiling. The pertinent observation was that opening a file by name can be much faster than enumerating the files in a directory. The reason is that many filesystems implement directories using some variant of a hash table: accessing a file by name is a single hash-table lookup, while enumerating all files means reading the entire thing.

- stat() is slow. It fetches a lot of information, and many filesystems do not have all of that information as readily accessible as the file name. This is observable through a simple test: on Ubuntu, drop caches, then, in a big directory, compare the execution time of `ls|cat` vs. `ls`. Explanation: when ls's output is a terminal, it fetches extra information to colorize entries depending on their properties. That information is fetched using stat(), which is not done when the output is piped into a file or another program. I had to take this into account when implementing a fast directory iterator [1] (calling stat() only when necessary). dirEntries from std.file does some of this too, but not to the full extent.

My suggestion is: if we are going to read the file anyway when it exists, don't even stat(), just open() it. It might result in faster total performance.

I would not recommend tricks like readdir() and caching. This ought to be done at the filesystem layer, and smells of problems like TOCTOU / cache invalidation. In any case, I would not suggest spending time on it unless someone encounters a specific, real-life situation where the additional complexity would make it worthwhile to research workarounds.

> So the question is whether many projects are likely to import files over network mounts, which would motivate the optimization. Please share your thoughts, thanks.

Honestly, this sounds like you have a solution in search of a problem.

[1]: https://github.com/CyberShadow/ae/blob/25850209e03ee97640a9b0715efe7e25b1fcc62d/sys/file.d#L740
March 26, 2019
On Tuesday, 26 March 2019 at 20:09:52 UTC, Andrei Alexandrescu wrote:
> On 3/26/19 2:36 PM, H. S. Teoh wrote:
>> Does caching the contents of import directories cause significant
>> overhead?  If not, why not just cache it anyway, regardless of whether
>> the import happens across network mounts.
>
> Because testing takes 10 minutes and implementation takes one day or more. We want to make sure there's impact.
>
>> On a slightly different note, why are we paying so much attention to
>> import speeds anyway?
>
> You destroy your own opening point: work should be put where there's potential for impact, not "regardless".
>
>> We can optimize import speeds to hell and back
>> again until they cost practically zero time, yet the act of actually
>> *using*  one of those imports -- ostensibly the reason you'd want to
>> import anything in the first place -- immediately adds a huge amount of
>> overhead that by far overshadows those niggly microseconds we pinched.
>> Ergo:
>> 
>> 	import std.regex;
>> 	void main() {
>> 		version(withRegex)
>> 			auto re = regex("a");
>> 	}
>> 
>> This takes about 0.5 seconds to compile without -version=withRegex on my
>> machine. With -version=withRegex, it takes about *4.5 seconds* to
>> compile.  We have a 4 second bottleneck here and yet we're trying to
>> shave off microseconds elsewhere.  Why does instantiating a
>> single-character regex add FOUR SECONDS to compilation time?  I think
>> *that*  is the question we should be answering.
>
> There's a matter of difficulty. I don't have a good attack on dramatically improving regexen. If you do, it would be of course a high impact project. There's also a matter of paying for what you don't use. Unused imports are, well, unused. Used imports should be paid for in proportion. Agreed, 4.5 seconds is not quite proportionate.

I've included a script below to generate and run a performance test.  Save it to your box as "gen", then run "./gen" to generate the test, then "./build" to run it.

I tried changing the "stat" calls to use "access" instead, but with around 70,000 system calls (found using strace), it didn't make any noticeable difference.  With "stat" it was around 2.2 seconds, and it was about the same with "access". So the issue is not how much memory stat returns; it's the overhead of performing any system call at all.


#!/usr/bin/env python3
# Run "./gen" to generate file for performance test
# Run "./build" to run the test
import os
import stat

mod_count = 1000
path_count = 20

def mkdir(dir):
    if not os.path.exists(dir):
        os.mkdir(dir)

mkdir("out")
for i in range(0, path_count):
    mkdir("out/lib{}".format(i))

mkdir("out/mods")
for i in range(0, mod_count):
    with open("out/mods/mod{}.d".format(i), "w") as file:
        for j in range(0, mod_count):
            file.write("import mod{};\n".format(j))

with open("out/main.d", "w") as file:
    for i in range(0, mod_count):
        file.write("import mod{};\n".format(i))
    file.write('void main() { import std.stdio;writeln("working"); }')

with open("build", "w") as file:
    file.write("#!/usr/bin/env bash\n")
    file.write('[ "$DMD" != "" ] || DMD=dmd\n')
    file.write("set -x\n")
    file.write("time $DMD \\\n")
    for i in range(0, path_count):
        file.write("  -I=out/lib{} \\\n".format(i))
    file.write("  -I=out/mods out/main.d $@")
os.chmod("build", stat.S_IRWXU | stat.S_IRWXG | stat.S_IROTH | stat.S_IXOTH)
March 26, 2019
On 3/26/19 6:04 PM, Vladimir Panteleev wrote:
> On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
>> On a Linux moderately-loaded local directory (146 files) mounted from an SSD drive, one failed stat() takes only about 0.5 microseconds. That means e.g. if a module imports std.all (which fails 142 times), the overhead accountable to failed stat() calls is about 70 microseconds, i.e. negligible.
> 
> I have some related experience with this:
> 
> - The eternal battle of keeping The Server's load levels down involves some deal of I/O profiling. The pertinent observation was that opening a file by name can be much faster than enumerating files in a directory. The reason for that is many filesystems implementing directories using some variant of hash table, with accessing a file by name being one hash table lookup, while enumerating all files meaning reading the entire thing.
> 
> - stat() is slow. It fetches a lot of information. Many filesystems do not have all of that information as readily accessible as a file name. This is observable through a simple test: on Ubuntu, drop caches, then, in a big directory, compare the execution time of `ls|cat` vs. `ls`. Explanation: when ls's output is a terminal, it will fetch extra information to colorize objects depending on their properties. These are fetched using stat(), but that's not done when it's piped into a file / another program. I had to take this into account when implementing a fast directory iterator [1] (stat only until necessary). dirEntries from std.file does some of this too, but not to the full extent.
> 
> My suggestion is: if we are going to read the file if it exists, don't even stat(), just open it. It might result in faster total performance as a result.
> 
> I would not recommend tricks like readdir() and caching. This ought to be done at the filesystem layer, and smells of problems like TOCTOU / cache invalidation. In any case, I would not suggest spending time on it unless someone encounters a specific, real-life situation where the additional complexity would make it worthwhile to research workarounds.

That's solid, thanks very much!

What seems to be the case, according to https://github.com/dlang/dmd/blob/master/src/dmd/dmodule.d, is that a bunch of "exists" checks are invoked (presumably those call stat()). Then a filename is returned, which is used to create a File object, see https://github.com/dlang/dmd/blob/master/src/dmd/root/file.d. In turn, that calls open() and then fstat() again on the opened handle. Quite wasteful on the face of it, but hey, if the measurable benefit is low it's not worth optimizing.

>> So the question is whether many projects are likely to import files over network mounts, which would motivate the optimization. Please share your thoughts, thanks.
> 
> Honestly, this sounds like you have a solution in search of a problem.
> 
> [1]: https://github.com/CyberShadow/ae/blob/25850209e03ee97640a9b0715efe7e25b1fcc62d/sys/file.d#L740 

Agreed. Just looking for low-hanging fruit to pluck.


Andrei

March 27, 2019
On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
> On a Linux moderately-loaded local directory (146 files) mounted from an SSD drive, one failed stat() takes only about 0.5 microseconds. That means e.g. if a module imports std.all (which fails 142 times), the overhead accountable to failed stat() calls is about 70 microseconds, i.e. negligible.

It could be interesting to know whether timings on Windows are more significant. If only I knew how to measure this within 10 minutes...

Bastiaan.

March 27, 2019
On 3/27/19 5:23 AM, Bastiaan Veelo wrote:
> On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
>> On a Linux moderately-loaded local directory (146 files) mounted from an SSD drive, one failed stat() takes only about 0.5 microseconds. That means e.g. if a module imports std.all (which fails 142 times), the overhead accountable to failed stat() calls is about 70 microseconds, i.e. negligible.
> 
> It could be interesting to know whether timings on Windows are more significant. If only I knew how to measure this within 10 minutes...
> 
> Bastiaan.

Really simple. Here's the C code Eduard and I used. Run it a few times with a variety of paths (changed, of course, to use Windows naming) and divide the total run time by n.

#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>

int main(int argc, char** argv)
{
    size_t i;
    size_t n = 1000000;
    const char* s = "/home/user/gd/Google Photos/xyz";
    //s = "/home/user/dir/xyz";
    //s = "/run/user/1000/gvfs/mount/xyz";

    struct stat sfile;
    for (i = 0; i < n; ++i)
    {
      stat(s, &sfile);
    }
    return 0;
}


March 27, 2019
On Wednesday, 27 March 2019 at 12:06:11 UTC, Andrei Alexandrescu wrote:
> On 3/27/19 5:23 AM, Bastiaan Veelo wrote:
>> On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
>>> On a Linux moderately-loaded local directory (146 files) mounted from an SSD drive, one failed stat() takes only about 0.5 microseconds. That means e.g. if a module imports std.all (which fails 142 times), the overhead accountable to failed stat() calls is about 70 microseconds, i.e. negligible.
>> 
>> It could be interesting to know whether timings on Windows are more significant. If only I knew how to measure this within 10 minutes...
>> 
>> Bastiaan.
>
> Really simple. Here's the C code Eduard and I used. Run it a few times with a variety of paths (change of course to use Windows naming) and divide total run time by n.
>
> #include<stdio.h>
> #include<string.h>
> #include<sys/stat.h>
> #include <sys/types.h>
> #include <fcntl.h>
>
> int main(int argc, char** argv)
> {
>     size_t i;
>     size_t n = 1000000;
>     const char* s = "/home/user/gd/Google Photos/xyz";
>     //s = "/home/user/dir/xyz";
>     //s = "/run/user/1000/gvfs/mount/xyz";
>
>     struct stat sfile;
>     for (i = 0; i < n; ++i)
>     {
>       stat(s, &sfile);
>     }
>     return 0;
> }

On Windows 10, i7-7700HQ, M.2 SSD, provided I did things right, I get ca. 40x worse timings. Compiled with MSVC 2017, no options (cl teststat.c). Timed in PowerShell using `Measure-Command {.\teststat.exe}`.

For "/home/user/gd/Google Photos/xyz", a directory that does not exist, total running time is 17 seconds (+/- 0.2).

For "/Users/bastiaan/Documents/D/tests/stat/teststat.c", an existing file in a directory with two other files, total running time is a whopping 44 seconds (+/- 1.0).

For "/Coin/Coin_source/src/nodes/xyz", a nonexistent file in an existing directory with 114 items, total running time is 19.5 seconds (+/- 1.0). So for me, 142 failed stat() calls cost close to 2.8 milliseconds.

Bastiaan.
March 27, 2019
On Wednesday, 27 March 2019 at 22:52:09 UTC, Bastiaan Veelo wrote:
> On Windows 10, i7-7700HQ, M.2 SSD, provided I did things right, I get ca. 40x worse timings.

File system is NTFS.