January 21, 2019
On Saturday, 19 January 2019 at 08:45:27 UTC, Walter Bright wrote:
> This can be a fun challenge! Anyone up for it?

This might get you some speed for the first compilation (fewer directory entry lookups), but I would expect follow-up compilations to have negligible overhead compared to the CPU time needed to actually process the source code.

January 21, 2019
On Saturday, 19 January 2019 at 20:32:07 UTC, Neia Neutuladh wrote:
> https://github.com/dhasenan/dmd/tree/fasterimport

Any benchmarks?
January 21, 2019
On 1/20/2019 10:21 PM, Vladimir Panteleev wrote:
> This might get you some speed for the first compilation (fewer directory entry lookups), but I would expect follow-up compilations to have negligible overhead compared to the CPU time needed to actually process the source code.


In my benchmarks with Warp, the slowdown persisted with multiple sequential runs.
January 21, 2019
On Monday, 21 January 2019 at 08:11:22 UTC, Walter Bright wrote:
> On 1/20/2019 10:21 PM, Vladimir Panteleev wrote:
>> This might get you some speed for the first compilation (fewer directory entry lookups), but I would expect follow-up compilations to have negligible overhead compared to the CPU time needed to actually process the source code.
>
> In my benchmarks with Warp, the slowdown persisted with multiple sequential runs.

Would be nice if the benchmarks were reproducible.

Some things to consider:

- Would the same result apply to Phobos files?

  Some things that could be different between the two that would affect the viability of this approach:

  - Total number of files
  - The size of the files
  - The relative processing time needed to actually parse the files

    (Presumably, a C preprocessor would be faster than a full D compiler.)

- Could the same result be achieved through other means, especially ones that don't introduce disruptive changes to tooling? For example, re-enabling the parallel / preemptive loading of source files in DMD.

Using JAR-like files would be disruptive to many tools. Consider:

- What should the file/path location look like when printing error messages that point into Phobos?
- How should tools such as DCD, which need to scan source code, adapt to these changes?
- How should tools and editors with a "go to definition" function work when the definition is inside an archive, especially in editors which do not support visiting files inside archives?

Even if you have obvious answers to these questions, they still need to be implemented, so the speed gain from such a change would need to be significant in order to justify the disruption.
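To make the first question concrete, here is a minimal Python sketch of how a tool might resolve a module against a JAR-like source archive and build an archive-qualified location for error messages. The `archive!member` convention and all the names here are hypothetical, not anything DMD or existing tools actually use:

```python
import os
import tempfile
import zipfile

def locate_in_archive(archive_path, module):
    # Map a module name like "std.stdio" to a member path inside the archive.
    member = module.replace(".", "/") + ".d"
    with zipfile.ZipFile(archive_path) as zf:
        if member in zf.namelist():
            # Hypothetical "archive!member" convention for error locations.
            return f"{archive_path}!{member}"
    return None

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "phobos.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("std/stdio.d", "module std.stdio;")
    print(locate_in_archive(archive, "std.stdio"))   # ends with phobos.zip!std/stdio.d
    print(locate_in_archive(archive, "std.missing")) # None
```

Even this tiny sketch shows the tooling cost: every consumer of file locations (editors, DCD, "go to definition") would have to learn whatever convention replaces a plain path.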

January 21, 2019
On 1/20/19 9:31 PM, Walter Bright wrote:
> On 1/20/2019 9:53 AM, Doc Andrew wrote:
>>     Are you envisioning something like the JAR format Java uses?
> 
> jar/zip/arc/tar/lib/ar/cab/lzh/whatever
> 
> They're all the same under the hood - a bunch of files concatenated together with a table of contents.

I notice a trend here. You eventually end up at the Java/JAR or .NET/Assembly model, where everything needed to compile against a library is included in a single file. I have often wondered how hard it would be to teach D to automatically include the contents of a .di file in a library file and read the contents of the library at compile time.

Having studied dependencies/packaging in D quite a bit, one readily apparent observation is that D is much more difficult to work with than Java/.NET/JavaScript when it comes to packaging. A unified interface/implementation format would go a LONG way toward simplifying the problem.

I even built a ZIP-based packaging model for D.

-- 
Adam Wilson
IRC: EllipticBit
import quiet.dlang.dev;
January 21, 2019
On 2019-01-21 11:21, Adam Wilson wrote:

> I notice a trend here. You eventually end up at the Java/JAR or .NET/Assembly model, where everything needed to compile against a library is included in a single file. I have often wondered how hard it would be to teach D to automatically include the contents of a .di file in a library file and read the contents of the library at compile time.
> 
> Having studied dependencies/packaging in D quite a bit, one readily apparent observation is that D is much more difficult to work with than Java/.NET/JavaScript when it comes to packaging. A unified interface/implementation format would go a LONG way toward simplifying the problem.
> 
> I even built a ZIP-based packaging model for D.

For distributing libraries you use Dub. For distributing applications to end users you distribute the executable.

-- 
/Jacob Carlborg
January 21, 2019
On 1/21/19 3:50 AM, Jacob Carlborg wrote:
> On 2019-01-21 11:21, Adam Wilson wrote:
> 
>> I notice a trend here. You eventually end up at the Java/JAR or .NET/Assembly model, where everything needed to compile against a library is included in a single file. I have often wondered how hard it would be to teach D to automatically include the contents of a .di file in a library file and read the contents of the library at compile time.
>>
>> Having studied dependencies/packaging in D quite a bit, one readily apparent observation is that D is much more difficult to work with than Java/.NET/JavaScript when it comes to packaging. A unified interface/implementation format would go a LONG way toward simplifying the problem.
>>
>> I even built a ZIP-based packaging model for D.
> 
> For distributing libraries you use Dub. For distributing applications to end users you distribute the executable.
> 

DUB does nothing to solve the file lookup problem, so I am curious how it applies to this conversation. I was talking about how the files are packaged for distribution, not the actual distribution of the package itself.

And don't even get me started on DUB...

-- 
Adam Wilson
IRC: EllipticBit
import quiet.dlang.dev;
January 21, 2019
On 1/19/19 3:32 PM, Neia Neutuladh wrote:
> On Sat, 19 Jan 2019 11:56:29 -0800, H. S. Teoh wrote:
>> Excellent finding! I *knew* something was off when looking up a file is
>> more expensive than reading it.  I'm thinking a quick fix could be to
>> just cache the intermediate results of each lookup, and reuse those
>> instead of issuing another call to opendir() each time.  I surmise that
>> after this change, this issue may no longer even be a problem anymore.
> 
> It doesn't even call opendir(). It assembles each potential path and calls
> exists(), which might be better when there are only a small number of
> imports, but that's not the common case.
> 
> I've a partial fix for Posix, and I'll see about getting dev tools running
> in WINE to get a Windows version. (Which isn't exactly the same, but if I
> find a difference in FindFirstFile / FindNextFile between Windows and
> WINE, I'll be surprised.)
> 
> I'm not sure what it should do when the same module is found in multiple
> locations, though -- the current code seems to take the first match. I'm
> also not sure whether it should be lazy or not.
> 
> Also symlinks and case-insensitive filesystems are annoying.
> 
> https://github.com/dhasenan/dmd/tree/fasterimport
> 

I wonder if packages could be used to eliminate possibilities.

For example, std is generally ONLY going to be under a Phobos import path. That can eliminate the other import directories from even being tried (if a path doesn't have a std directory, there's no point in looking for std/algorithm in there).

Maybe you already implemented this, I'm not sure.

I still find it difficult to believe that calling exists() four times per import is a huge culprit. But certainly, caching a directory structure is going to be more efficient than reading it every time.
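The package-based pruning described above could look something like the following. This is an illustrative Python sketch with `os.path` checks standing in for the compiler's own lookup; the function name and the `.d`/`.di` special cases are assumptions, not DMD's actual code:

```python
import os
import tempfile

def prune_import_paths(import_paths, module):
    """Keep only import paths that could plausibly contain `module`.

    For "std.algorithm", a path is only worth searching if it has a
    top-level "std" directory (or a "std.d"/"std.di" file for the
    degenerate single-module case).
    """
    top = module.split(".")[0]
    return [p for p in import_paths
            if os.path.isdir(os.path.join(p, top))
            or os.path.isfile(os.path.join(p, top + ".d"))
            or os.path.isfile(os.path.join(p, top + ".di"))]

with tempfile.TemporaryDirectory() as tmp:
    phobos = os.path.join(tmp, "phobos")
    other = os.path.join(tmp, "other")
    os.makedirs(os.path.join(phobos, "std"))
    os.makedirs(other)
    # Only the path containing a top-level "std" directory survives.
    print(prune_import_paths([other, phobos], "std.algorithm") == [phobos])  # True
```

One top-level check per import path, done once and cached, would then rule out most paths for every subsequent `std.*` import.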

-Steve
January 21, 2019
On Monday, 21 January 2019 at 19:01:57 UTC, Steven Schveighoffer wrote:
> I still find it difficult to believe that calling exists() four times per import is a huge culprit. But certainly, caching a directory structure is going to be more efficient than reading it every time.

For large directories, opendir+readdir, especially with stat, is much slower than open/access. Most filesystems already index directory entries with a hash table or equivalent, so looking up a known file name is effectively a single hash lookup.

This whole endeavor generally seems like poorly reimplementing what the OS should already be doing.
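The tradeoff under discussion can be made concrete: probing known names leans on the filesystem's own (often hashed) name index, one system call per candidate, while a cache pays for a full directory read up front and answers everything after that in memory. A rough Python sketch of the cached variant, purely for illustration (not DMD's actual code):

```python
import os
import tempfile

class DirCache:
    """Read each directory once, then answer membership queries in memory.

    Trades one opendir+readdir per directory for in-process lookups,
    instead of one exists()/access() system call per candidate path.
    """
    def __init__(self):
        self._entries = {}

    def contains(self, directory, name):
        if directory not in self._entries:
            try:
                self._entries[directory] = set(os.listdir(directory))
            except OSError:
                self._entries[directory] = set()  # missing or unreadable dir
        return name in self._entries[directory]

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "stdio.d"), "w").close()
    cache = DirCache()
    print(cache.contains(tmp, "stdio.d"))    # True
    print(cache.contains(tmp, "missing.d"))  # False, no further syscalls
```

Whether this wins depends on exactly the variables raised earlier in the thread: directory size, number of candidate probes, and how warm the OS's own caches are.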

January 21, 2019
On Mon, 21 Jan 2019 19:10:11 +0000, Vladimir Panteleev wrote:
> On Monday, 21 January 2019 at 19:01:57 UTC, Steven Schveighoffer wrote:
>> I still find it difficult to believe that calling exists() four times per import is a huge culprit. But certainly, caching a directory structure is going to be more efficient than reading it every time.
> 
> For large directories, opendir+readdir, especially with stat, is much slower than open/access.

We can avoid stat() except with symbolic links.

Opendir + readdir for my example would be about 500 system calls, so it breaks even with `import std.stdio;`, assuming the cost per call is identical and we're reading eagerly. Testing shows that this is the case.

With a C preprocessor, though, you're dealing with /usr/include and its thousands of header files.

> This whole endeavor generally seems like poorly reimplementing what the OS should already be doing.

The OS doesn't have a "find a file with one of this handful of names among these directories" call.
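The missing primitive is easy to state in user space, which is why every compiler reimplements it. A minimal sketch of the lookup the thread is arguing about, with the candidate ordering (`.di` before `.d`, first import path wins) assumed for illustration:

```python
import os
import tempfile

def find_first(directories, candidates):
    """Return the first existing directory/candidate combination, or None.

    Mirrors the compiler-side lookup: each import path is tried in order,
    and within it each candidate file name
    (e.g. "std/stdio.di", then "std/stdio.d").
    """
    for directory in directories:
        for name in candidates:
            path = os.path.join(directory, name)
            if os.path.exists(path):
                return path
    return None

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "std"))
    target = os.path.join(tmp, "std", "stdio.d")
    open(target, "w").close()
    found = find_first([tmp], ["std/stdio.di", "std/stdio.d"])
    print(found == target)  # True
```

Each `os.path.exists` here is one access-style system call, which is exactly the per-candidate cost the caching approaches above try to amortize.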