Thread overview
A few measurements of stat()'s speed
Mar 26, 2019
H. S. Teoh
Mar 26, 2019
Jonathan Marler
Mar 26, 2019
Vladimir Panteleev
Mar 28, 2019
Kagamin
Mar 27, 2019
Bastiaan Veelo
Mar 27, 2019
Bastiaan Veelo
Mar 27, 2019
Bastiaan Veelo
March 26, 2019
The current process of searching for imports spans the following directories:

* the current directory
* each of the paths specified in the cmdline with -I, in that order
* each of the paths specified in DFLAGS, in that order

For each of these paths, the ".di" extension is tried first, then the ".d" extension. The function used is stat(). In the majority of cases the ".di" file doesn't exist, so at least 50% of stat() calls fail. The number of failed stat() calls grows with the number of -I flags, i.e. with the size of the project. (For std imports, that means each will be looked up twice in each of the project directories.)

One alternative would be to use opendir()/readdir()/closedir() once for each directory searched, and cache the directory's contents. Then, subsequent attempts can simply look up the local cache and avoid stat() calls in directories that have been previously visited. This approach would accelerate imports if stat() is slow "enough".

On Linux, for a moderately loaded local directory (146 files) mounted from an SSD drive, one failed stat() takes only about 0.5 microseconds. That means, e.g., if a module imports std.all (which fails 142 times), the overhead attributable to failed stat() calls is about 70 microseconds, i.e. negligible.

The results change drastically when network mounts are tested. For sftp and sshfs mounts on a high-speed local connection, one failed stat() takes 6-7 milliseconds, so an import like std.all (and many other imports liable to transitively pull others) would incur significant overhead.

So the question is whether many projects are likely to import files over network mounts, which would motivate the optimization. Please share your thoughts, thanks.


Andrei
March 26, 2019
On Tue, Mar 26, 2019 at 02:06:08PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: [...]
> On a Linux moderately-loaded local directory (146 files) mounted from
> an SSD drive, one failed stat() takes only about 0.5 microseconds.
> That means e.g.  if a module imports std.all (which fails 142 times),
> the overhead accountable to failed stat() calls is about 70
> microseconds, i.e. negligible.
> 
> The results change drastically when network mounts are tested. For sftp and sshfs mounts on a high speed local connection, one failed stat() takes 6-7 milliseconds, so an import like std.all (and many other imports liable to transitively pull others) would cause significant overheads.
> 
> So the question is whether many projects are likely to import files over network mounts, which would motivate the optimization. Please share your thoughts, thanks.
[...]

Does caching the contents of import directories cause significant overhead?  If not, why not just cache it anyway, regardless of whether the import happens across network mounts?  Excessive OS round-trips (calling stat() hundreds of times) are something we should reduce anyway.

//

On a slightly different note, why are we paying so much attention to import speeds anyway?  We can optimize import speeds to hell and back again until they cost practically zero time, yet the act of actually *using* one of those imports -- ostensibly the reason you'd want to import anything in the first place -- immediately adds a huge amount of overhead that by far overshadows those niggly microseconds we pinched. Ergo:

	import std.regex;
	void main() {
		version(withRegex)
			auto re = regex("a");
	}

This takes about 0.5 seconds to compile without -version=withRegex on my machine. With -version=withRegex, it takes about *4.5 seconds* to compile.  We have a 4 second bottleneck here and yet we're trying to shave off microseconds elsewhere.  Why does instantiating a single-character regex add FOUR SECONDS to compilation time?  I think *that* is the question we should be answering.


T

-- 
People say I'm arrogant, and I'm proud of it.
March 26, 2019
On 3/26/19 2:36 PM, H. S. Teoh wrote:
> Does caching the contents of import directories cause significant
> overhead?  If not, why not just cache it anyway, regardless of whether
> the import happens across network mounts.

Because testing takes 10 minutes and implementation takes one day or more. We want to make sure there's impact.

> On a slightly different note, why are we paying so much attention to
> import speeds anyway?

You destroy your own opening point: work should be put where there's potential for impact, not "regardless".

> We can optimize import speeds to hell and back
> again until they cost practically zero time, yet the act of actually
> *using*  one of those imports -- ostensibly the reason you'd want to
> import anything in the first place -- immediately adds a huge amount of
> overhead that by far overshadows those niggly microseconds we pinched.
> Ergo:
> 
> 	import std.regex;
> 	void main() {
> 		version(withRegex)
> 			auto re = regex("a");
> 	}
> 
> This takes about 0.5 seconds to compile without -version=withRegex on my
> machine. With -version=withRegex, it takes about *4.5 seconds* to
> compile.  We have a 4 second bottleneck here and yet we're trying to
> shave off microseconds elsewhere.  Why does instantiating a
> single-character regex add FOUR SECONDS to compilation time?  I think
> *that*  is the question we should be answering.

There's a matter of difficulty. I don't have a good attack on dramatically improving regexen. If you do, it would be of course a high impact project. There's also a matter of paying for what you don't use. Unused imports are, well, unused. Used imports should be paid for in proportion. Agreed, 4.5 seconds is not quite proportionate.
March 26, 2019
On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
> On a Linux moderately-loaded local directory (146 files) mounted from an SSD drive, one failed stat() takes only about 0.5 microseconds. That means e.g. if a module imports std.all (which fails 142 times), the overhead accountable to failed stat() calls is about 70 microseconds, i.e. negligible.

I have some related experience with this:

- The eternal battle of keeping The Server's load levels down involves a good deal of I/O profiling. The pertinent observation was that opening a file by name can be much faster than enumerating the files in a directory. The reason is that many filesystems implement directories using some variant of a hash table: accessing a file by name is a single hash-table lookup, while enumerating all files means reading the entire thing.

- stat() is slow. It fetches a lot of information, and many filesystems do not have all of that information as readily accessible as the file name. This is observable through a simple test: on Ubuntu, drop caches, then, in a big directory, compare the execution time of `ls|cat` vs. `ls`. Explanation: when ls's output is a terminal, it fetches extra information to colorize entries depending on their properties. That information is fetched using stat(), which is not done when the output is piped into a file or another program. I had to take this into account when implementing a fast directory iterator [1] (calling stat() only when necessary). dirEntries from std.file does some of this too, but not to the full extent.

My suggestion is: if we are going to read the file anyway when it exists, don't even stat(), just open() it. It might result in faster total performance.

I would not recommend tricks like readdir() and caching. This ought to be done at the filesystem layer, and smells of problems like TOCTOU / cache invalidation. In any case, I would not suggest spending time on it unless someone encounters a specific, real-life situation where the additional complexity would make it worthwhile to research workarounds.

> So the question is whether many projects are likely to import files over network mounts, which would motivate the optimization. Please share your thoughts, thanks.

Honestly, this sounds like you have a solution in search of a problem.

[1]: https://github.com/CyberShadow/ae/blob/25850209e03ee97640a9b0715efe7e25b1fcc62d/sys/file.d#L740
March 26, 2019
On Tuesday, 26 March 2019 at 20:09:52 UTC, Andrei Alexandrescu wrote:
> On 3/26/19 2:36 PM, H. S. Teoh wrote:
>> Does caching the contents of import directories cause significant
>> overhead?  If not, why not just cache it anyway, regardless of whether
>> the import happens across network mounts.
>
> Because testing takes 10 minutes and implementation takes one day or more. We want to make sure there's impact.
>
>> On a slightly different note, why are we paying so much attention to
>> import speeds anyway?
>
> You destroy your own opening point: work should be put where there's potential for impact, not "regardless".
>
>> We can optimize import speeds to hell and back
>> again until they cost practically zero time, yet the act of actually
>> *using*  one of those imports -- ostensibly the reason you'd want to
>> import anything in the first place -- immediately adds a huge amount of
>> overhead that by far overshadows those niggly microseconds we pinched.
>> Ergo:
>> 
>> 	import std.regex;
>> 	void main() {
>> 		version(withRegex)
>> 			auto re = regex("a");
>> 	}
>> 
>> This takes about 0.5 seconds to compile without -version=withRegex on my
>> machine. With -version=withRegex, it takes about *4.5 seconds* to
>> compile.  We have a 4 second bottleneck here and yet we're trying to
>> shave off microseconds elsewhere.  Why does instantiating a
>> single-character regex add FOUR SECONDS to compilation time?  I think
>> *that*  is the question we should be answering.
>
> There's a matter of difficulty. I don't have a good attack on dramatically improving regexen. If you do, it would be of course a high impact project. There's also a matter of paying for what you don't use. Unused imports are, well, unused. Used imports should be paid for in proportion. Agreed, 4.5 seconds is not quite proportionate.

I've included a script below to generate and run a performance test.  Save it to your box as "gen", then run "./gen" to generate the test, then "./build" to run it.

I tried changing the "stat" calls to use "access" instead, but with around 70,000 system calls (found using strace), it didn't make any noticeable difference.  With "stat" it was around 2.2 seconds, and it was about the same with "access". So the issue is not how much memory stat returns; it's the overhead of performing any system call at all.


#!/usr/bin/env python3
# Run "./gen" to generate file for performance test
# Run "./build" to run the test
import os
import stat

mod_count = 1000
path_count = 20

def mkdir(dir):
    if not os.path.exists(dir):
        os.mkdir(dir)

mkdir("out")
for i in range(0, path_count):
    mkdir("out/lib{}".format(i))

mkdir("out/mods")
for i in range(0, mod_count):
    with open("out/mods/mod{}.d".format(i), "w") as file:
        for j in range(0, mod_count):
            file.write("import mod{};\n".format(j))

with open("out/main.d", "w") as file:
    for i in range(0, mod_count):
        file.write("import mod{};\n".format(i))
    file.write('void main() { import std.stdio;writeln("working"); }')

with open("build", "w") as file:
    file.write("#!/usr/bin/env bash\n")
    file.write('[ "$DMD" != "" ] || DMD=dmd\n')
    file.write("set -x\n")
    file.write("time $DMD \\\n")
    for i in range(0, path_count):
        file.write("  -I=out/lib{} \\\n".format(i))
    file.write("  -I=out/mods out/main.d $@")
os.chmod("build", stat.S_IRWXU | stat.S_IRWXG | stat.S_IROTH | stat.S_IXOTH)
March 26, 2019
On 3/26/19 6:04 PM, Vladimir Panteleev wrote:
> On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
>> On a Linux moderately-loaded local directory (146 files) mounted from an SSD drive, one failed stat() takes only about 0.5 microseconds. That means e.g. if a module imports std.all (which fails 142 times), the overhead accountable to failed stat() calls is about 70 microseconds, i.e. negligible.
> 
> I have some related experience with this:
> 
> - The eternal battle of keeping The Server's load levels down involves some deal of I/O profiling. The pertinent observation was that opening a file by name can be much faster than enumerating files in a directory. The reason for that is many filesystems implementing directories using some variant of hash table, with accessing a file by name being one hash table lookup, while enumerating all files meaning reading the entire thing.
> 
> - stat() is slow. It fetches a lot of information. Many filesystems do not have all of that information as readily accessible as a file name. This is observable through a simple test: on Ubuntu, drop caches, then, in a big directory, compare the execution time of `ls|cat` vs. `ls`. Explanation: when ls's output is a terminal, it will fetch extra information to colorize objects depending on their properties. These are fetched using stat(), but that's not done when it's piped into a file / another program. I had to take this into account when implementing a fast directory iterator [1] (stat only until necessary). dirEntries from std.file does some of this too, but not to the full extent.
> 
> My suggestion is: if we are going to read the file if it exists, don't even stat(), just open it. It might result in faster total performance as a result.
> 
> I would not recommend tricks like readdir() and caching. This ought to be done at the filesystem layer, and smells of problems like TOCTOU / cache invalidation. In any case, I would not suggest spending time on it unless someone encounters a specific, real-life situation where the additional complexity would make it worthwhile to research workarounds.

That's solid, thanks very much!

What seems to be the case, according to https://github.com/dlang/dmd/blob/master/src/dmd/dmodule.d, is that a bunch of "exists" checks are invoked (presumably those call stat()). Then a filename is returned, which is used to create a File object, see https://github.com/dlang/dmd/blob/master/src/dmd/root/file.d. In turn, that calls open() and then fstat() again on the opened handle. Quite wasteful on the face of it, but hey, if the measurable benefit is low it's not worth optimizing.

>> So the question is whether many projects are likely to import files over network mounts, which would motivate the optimization. Please share your thoughts, thanks.
> 
> Honestly, this sounds like you have a solution in search of a problem.
> 
> [1]: https://github.com/CyberShadow/ae/blob/25850209e03ee97640a9b0715efe7e25b1fcc62d/sys/file.d#L740 

Agreed. Just looking for low-hanging fruit to pluck.


Andrei

March 27, 2019
On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
> On a Linux moderately-loaded local directory (146 files) mounted from an SSD drive, one failed stat() takes only about 0.5 microseconds. That means e.g. if a module imports std.all (which fails 142 times), the overhead accountable to failed stat() calls is about 70 microseconds, i.e. negligible.

It could be interesting to know whether timings on Windows are more significant. If only I knew how to measure this within 10 minutes...

Bastiaan.

March 27, 2019
On 3/27/19 5:23 AM, Bastiaan Veelo wrote:
> On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
>> On a Linux moderately-loaded local directory (146 files) mounted from an SSD drive, one failed stat() takes only about 0.5 microseconds. That means e.g. if a module imports std.all (which fails 142 times), the overhead accountable to failed stat() calls is about 70 microseconds, i.e. negligible.
> 
> It could be interesting to know whether timings on Windows are more significant. If only I knew how to measure this within 10 minutes...
> 
> Bastiaan.

Really simple. Here's the C code Eduard and I used. Run it a few times with a variety of paths (changed, of course, to use Windows naming) and divide the total run time by n.

#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>

int main(int argc, char** argv)
{
    size_t i;
    size_t n = 1000000;
    const char* s = "/home/user/gd/Google Photos/xyz";
    //s = "/home/user/dir/xyz";
    //s = "/run/user/1000/gvfs/mount/xyz";

    struct stat sfile;
    for (i = 0; i < n; ++i)
    {
      stat(s, &sfile);
    }
    return 0;
}


March 27, 2019
On Wednesday, 27 March 2019 at 12:06:11 UTC, Andrei Alexandrescu wrote:
> On 3/27/19 5:23 AM, Bastiaan Veelo wrote:
>> On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
>>> On a Linux moderately-loaded local directory (146 files) mounted from an SSD drive, one failed stat() takes only about 0.5 microseconds. That means e.g. if a module imports std.all (which fails 142 times), the overhead accountable to failed stat() calls is about 70 microseconds, i.e. negligible.
>> 
>> It could be interesting to know whether timings on Windows are more significant. If only I knew how to measure this within 10 minutes...
>> 
>> Bastiaan.
>
> Really simple. Here's the C code Eduard and I used. Run it a few times with a variety of paths (change of course to use Windows naming) and divide total run time by n.
>
> #include<stdio.h>
> #include<string.h>
> #include<sys/stat.h>
> #include <sys/types.h>
> #include <fcntl.h>
>
> int main(int argc, char** argv)
> {
>     size_t i;
>     size_t n = 1000000;
>     const char* s = "/home/user/gd/Google Photos/xyz";
>     //s = "/home/user/dir/xyz";
>     //s = "/run/user/1000/gvfs/mount/xyz";
>
>     struct stat sfile;
>     for (i = 0; i < n; ++i)
>     {
>       stat(s, &sfile);
>     }
>     return 0;
> }

On Windows 10, i7-7700HQ, M.2 SSD, provided I did things right, I get ca. 40x worse timings. Compiled with MSVC 2017, no options (cl teststat.c). Timed in PowerShell using `Measure-Command {.\teststat.exe}`.

For "/home/user/gd/Google Photos/xyz", a directory that does not exist, total running time is 17 seconds (+/- 0.2).

For "/Users/bastiaan/Documents/D/tests/stat/teststat.c", an existing file in a directory with two other files, total running time is a whopping 44 seconds (+/- 1.0).

For "/Coin/Coin_source/src/nodes/xyz", a nonexistent file in an existing directory with 114 items, total running time is 19.5 seconds (+/- 1.0). So for me, 142 failed stat() calls cost close to 2.8 milliseconds.

Bastiaan.
March 27, 2019
On Wednesday, 27 March 2019 at 22:52:09 UTC, Bastiaan Veelo wrote:
> On Windows 10, i7-7700HQ, M.2 SSD, provided I did things right, I get ca. 40x worse timings.

File system is NTFS.