Proposed improvements to the separate compilation model (page 3) - D Programming Language Discussion Forum

July 23, 2011

Re: Proposed improvements to the separate compilation model

Posted by Don
in reply to Andrei Alexandrescu

Andrei Alexandrescu wrote:
> On 7/23/11 1:53 PM, Andrej Mitrovic wrote:
>> Isn't the biggest issue of large D projects the problems with
>> incremental compilation (e.g.
>> https://bitbucket.org/h3r3tic/xfbuild/issue/7/make-incremental-building-reliable), 
>>
>> optlink, and the toolchain?
> 
> The proposed improvement would mark a step forward in the toolchain and generally in the development of large programs. In particular, it would provide a simple means to decouple compilation of modules used together. It's not easy for me to figure how people don't get it's a net step forward from the current situation.
> 
> Andrei

Personally I fear that it may be too much cost for too little benefit.

The role of .di files for information hiding is clear. But it's not _at all_ obvious to me that .di files will provide a significant improvement in compilation speed. Do we actually have profiling data that shows that parsing is the bottleneck?

It's also not clear to me that it's a "simple means to decouple compilation of modules" -- it seems complicated to me.

As far as performance goes, it would seem much simpler to just cache the symbol tables (similar to precompiled headers in C++, but the idea works better in D because D has an actual module system). That would give faster compilation times than .di files, because you'd also be caching CTFE results.

July 23, 2011

Re: Proposed improvements to the separate compilation model

Posted by Andrei Alexandrescu
in reply to Vladimir Panteleev

On 7/23/11 4:01 PM, Vladimir Panteleev wrote:
> On Sat, 23 Jul 2011 23:16:20 +0300, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> On 7/23/11 1:53 PM, Andrej Mitrovic wrote:
>>> Isn't the biggest issue of large D projects the problems with
>>> incremental compilation (e.g.
>>> https://bitbucket.org/h3r3tic/xfbuild/issue/7/make-incremental-building-reliable),
>>>
>>> optlink, and the toolchain?
>>
>> The proposed improvement would mark a step forward in the toolchain
>> and generally in the development of large programs. In particular, it
>> would provide a simple means to decouple compilation of modules used
>> together. It's not easy for me to figure how people don't get it's a
>> net step forward from the current situation.
>
> Then you don't understand what I'm ranting about.

That's a bit assuming. I thought about it for a little while and concluded that I'd do well to explain the current state of affairs a bit.

Consider:

// file a.di
class A {
    int a;
    double b;
    string c;
    void fun();
}

Say the team working on A wants to "freeze" a.di without precluding work on A.fun(). In a large project, changing a.di would trigger a lot of recompilations, re-uploads, the need for retests etc. so they'd want to have control over that. So they freeze a.di and define a.d as follows:

// file a.d
class A {
    int a = 42;
    double b = 43;
    string c = "44";
    void fun() { assert(a == 42 && b == 43 && c == "44"); }
}

Now the team has achieved their goal: developers can work on A.fun without inadvertently messing up a.di. Everybody is happy.

The client code would work like this:

// file main.d
import std.stdio;
import a;

void main() {
    auto a = new A;
    a.fun();
    writeln(a.tupleof);
}

To build and run:

dmd -c a.d
dmd main.d a.o
./main

The program prints "424344" as expected.

The problem with this setup is that it's extremely fragile, in ways that are undetectable during compilation or runtime. For example, just swapping a and b in the implementation file makes the program print "08.96566e-31344". Similar issues occur if fields or methods are added or removed from one file but not the other.

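To see why swapping two fields goes unnoticed, it helps to look at field offsets. The sketch below uses two hypothetical structs, Declared and Defined, as stand-ins for what a.di promises and what a.d actually defines. Structs are used instead of classes only to keep the layout simple, and the names are illustrative assumptions, not part of the example above. Client code compiled against the first layout reads each field at an offset where the implementation stored something else.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the frozen interface (a.di).
struct Declared { int a; double b; string c; }

// Hypothetical stand-in for the implementation (a.d) after a and b were swapped.
struct Defined { double b; int a; string c; }

void main()
{
    // Offsets the client code was compiled to use.
    writeln("declared: a@", Declared.a.offsetof, " b@", Declared.b.offsetof);
    // Offsets the implementation actually uses.
    writeln("defined:  a@", Defined.a.offsetof, " b@", Defined.b.offsetof);
    // The two sides disagree about where `a` lives.
    writeln("a moved: ", Declared.a.offsetof != Defined.a.offsetof);
}
```

Both layouts occupy the same memory and expose the same symbol names, so nothing at link time reveals the disagreement.
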
In an attempt to fix this, the developers may add an "import a" to a.d, thinking that the compiler would import a.di and would verify the bodies of the two classes for correspondence. That doesn't work - the compiler simply ignores the import. Things can be tenuously arranged such that the .d file and the .di file have different names, but in that case the compiler complains about duplicate definitions.

So the programmers conclude they need to define an interface for A (and generally each and every hierarchy or isolated class in the project). But the same problem occurs for struct, and there's no way to define interfaces for structs.

Ultimately the programmers figure there's no way to keep files separate without establishing a build mechanism that e.g. generates a.di from a.d, compares it against the existing a.di, and complains if the two aren't identical. Upon such a build failure, a senior engineer would figure out what action to take.

But wait, there's less. The programmers don't have the option of grouping method implementations in a hierarchy by functionality (which is common in visitation patterns - even dmd does so). They must define one class with everything in one place, and there's no way out of that.

My understanding is that the scenarios above are of no value to you, and if the language would accept them you'd consider that a degradation of the status quo. Given that the status quo includes a fair amount of impossible to detect failures and tenuous mechanisms, I disagree. Let me also play a card I wish I hadn't - I've worked on numerous large projects and I can tell from direct experience that the inherent problems are... well, odd. Engineers embarked on such projects need all the help they could get and would be willing to explore options that seem ridiculous for projects one fraction the size. Improved .di generation would be of great help. Enabling other options would be even better.

> It is certainly an
> improvement, but:
>
> 1) We don't have an infinity of programmer-hours. I'm saying that the
> time would likely be better spent at improving .di generation, which
> should have a much greater overall benefit per required work unit - and
> for all I can tell, you don't even want to seriously consider this option.

Generation of .di files does not compete with the proposed feature.

> 2) Once manually-maintained .di files are usable, they will be used as
> an excuse to shoo away people working on large projects (people
> complaining about compilation speed will be told to just manually write
> .di files for their 100KLoC projects).

Your ability to predict the future is much better than mine.

Andrei

July 23, 2011

Re: Proposed improvements to the separate compilation model

Posted by Vladimir Panteleev
in reply to Andrei Alexandrescu

On Sun, 24 Jul 2011 00:54:57 +0300, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 7/23/11 4:01 PM, Vladimir Panteleev wrote:
>> On Sat, 23 Jul 2011 23:16:20 +0300, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> On 7/23/11 1:53 PM, Andrej Mitrovic wrote:
>>>> Isn't the biggest issue of large D projects the problems with
>>>> incremental compilation (e.g.
>>>> https://bitbucket.org/h3r3tic/xfbuild/issue/7/make-incremental-building-reliable),
>>>>
>>>> optlink, and the toolchain?
>>>
>>> The proposed improvement would mark a step forward in the toolchain
>>> and generally in the development of large programs. In particular, it
>>> would provide a simple means to decouple compilation of modules used
>>> together. It's not easy for me to figure how people don't get it's a
>>> net step forward from the current situation.
>>
>> Then you don't understand what I'm ranting about.
>
> That's a bit assuming.

OK, that implies that you weren't talking about me above.

> I thought about it for a little while and concluded that I'd do well to explain the current state of affairs a bit.
>
> [snip]

There was no need to go into such great detail. I think the basics are well understood; I was asking for arguments and counter-arguments.

> Ultimately the programmers figure there's no way to keep files separate without establishing a build mechanism that e.g. generates a.di from a.d, compares it against the existing a.di, and complains if the two aren't identical. Upon such a build failure, a senior engineer would figure out what action to take.

I was going to suggest something like this, but without creating the dependency on a 3rd-party build tool. I mentioned in another thread how DMD shouldn't touch the .di file's mtime. A better idea is to not attempt to overwrite the file at all if the generated .di is identical to the old one. With this behavior, you can simply make .di files read-only - the compiler will bail out when it tries to save a new version of the .di file.

Yes, this is a hack, but it's not the only solution. Aside from writing a build tool, as you mentioned, I believe many organizations include automatic tests coupled with source control, which could easily detect check-ins that change .di files.

Yes, neither of the above are "proper" solutions. But, unless I've lost track of something, you're trying to justify a solid amount of work on the compiler to implement the "proper" solution, when the above alternatives are much simpler in practice. (If you have more counter-arguments, I'd like to hear them.)

> But wait, there's less. The programmers don't have the option of grouping method implementations in a hierarchy by functionality (which is common in visitation patterns - even dmd does so). They must define one class with everything in one place, and there's no way out of that.

Sorry, I don't understand this part. Could you elaborate?

> My understanding is that the scenarios above are of no value to you, and if the language would accept them you'd consider that a degradation of the status quo.

I'm not trying to argue for my personal opinion and the way I use D. I was trying to point out that your suggestion seems less efficient in terms of benefit per work-unit for all users of D.

> Given that the status quo includes a fair amount of impossible to detect failures and tenuous mechanisms, I disagree. Let me also play a card I wish I hadn't - I've worked on numerous large projects and I can tell from direct experience that the inherent problems are... well, odd. Engineers embarked on such projects need all the help they could get and would be willing to explore options that seem ridiculous for projects one fraction the size. Improved .di generation would be of great help. Enabling other options would be even better.

[snip]

>> 1) We don't have an infinity of programmer-hours. I'm saying that the
>> time would likely be better spent at improving .di generation, which
>> should have a much greater overall benefit per required work unit - and
>> for all I can tell, you don't even want to seriously consider this option.
>
> Generation of .di files does not compete with the proposed feature.

Again, I'm not saying that this is a bad direction, just not the best one.

[off-topic]

>> 2) Once manually-maintained .di files are usable, they will be used as
>> an excuse to shoo away people working on large projects (people
>> complaining about compilation speed will be told to just manually write
>> .di files for their 100KLoC projects).
>
> Your ability to predict the future is much better than mine.

I didn't say who'll say that... It might not be you or Walter, but can you account for all users of D on IRC, Reddit, StackOverflow, etc.?

Good enough is the enemy of better, etc.

[/off-topic]

P.S. I appreciate you taking the time for this discussion.

-- 
Best regards,
 Vladimir                            mailto:vladimir@thecybershadow.net

July 23, 2011

Re: Proposed improvements to the separate compilation model

Posted by bearophile
in reply to Andrei Alexandrescu

Andrei:

I am not an expert on this, but it doesn't look like esoteric stuff.

> Consider:

Thank you for explaining better, with more examples. This usually helps the discussion.


> Say the team working on A wants to "freeze" a.di without precluding work on A.fun().

Currently in D there is no explicit & enforced way to state this desire to the compiler?


> The problem with this setup is that it's extremely fragile, in ways that are undetectable during compilation or runtime.

Is it possible to invent ways to make this less fragile?


> For example, just
> swapping a and b in the implementation file makes the program print
> "08.96566e-31344". Similar issues occur if fields or methods are added
> or removed from one file but not the other.

I have suggested some fine-grained hashing. Compute a hash from a class definition, and later quickly compare this value with a value stored elsewhere (like automatically written in the .di file).
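
A minimal sketch of what such a hash could look like, assuming the signature is built only from field types and names. The helper names layoutSignature and fnv1a, and the stand-in classes Declared and Defined, are hypothetical, and FNV-1a is chosen here only because it is short and can run at compile time. A real scheme would also have to cover methods, attributes, and base classes.

```d
import std.traits : FieldNameTuple, Fields;

/// Textual signature of an aggregate's fields, e.g. "int a;double b;".
string layoutSignature(T)()
{
    alias Types = Fields!T;
    string sig;
    foreach (i, name; FieldNameTuple!T)
        sig ~= Types[i].stringof ~ " " ~ name ~ ";";
    return sig;
}

/// 64-bit FNV-1a hash of a string.
ulong fnv1a(string s)
{
    ulong h = 0xcbf29ce484222325UL;   // FNV offset basis
    foreach (char ch; s)
    {
        h ^= ch;
        h *= 0x100000001b3UL;         // FNV prime
    }
    return h;
}

// Hypothetical stand-ins for the interface file and the implementation file.
class Declared { int a; double b; string c; }
class Defined  { double b; int a; string c; }

// Both hashes are computed at compile time.
enum declaredHash = fnv1a(layoutSignature!Declared());
enum definedHash  = fnv1a(layoutSignature!Defined());

void main()
{
    import std.stdio : writefln;
    writefln("%016x vs %016x", declaredHash, definedHash);
    // Swapping two fields changes the signature, and so the hash.
    static assert(declaredHash != definedHash);
}
```

In this sketch the stored value would be declaredHash, and a build step would compare it with the hash recomputed from the implementation.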


> Ultimately the programmers figure there's no way to keep files separate without establishing a build mechanism that e.g. generates a.di from a.d, compares it against the existing a.di, and complains if the two aren't identical.

Comparing .di files looks tricky. DMD generates them deterministically, so in theory it works, but it doesn't sound like a good thing to do.


> They must define one class with everything in one place, and there's no way out of that.

I think C# uses partial classes to solve this. It doesn't use header files. http://msdn.microsoft.com/en-us/library/wa80x488%28v=vs.80%29.aspx

Bye,
bearophile

July 23, 2011

Re: Proposed improvements to the separate compilation model

Posted by so
in reply to Andrei Alexandrescu

On Sun, 24 Jul 2011 00:54:57 +0300, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> The program prints "424344" as expected.
>
> The problem with this setup is that it's extremely fragile, in ways that are undetectable during compilation or runtime. For example, just swapping a and b in the implementation file makes the program print "08.96566e-31344". Similar issues occur if fields or methods are added or removed from one file but not the other.

I am not a fan of .di files, but there is no reason I can think of for the compiler to allow such a thing. If there is a .di file for the module, compare it to the implementation. If the implementation is different, it is an error. If the difference is intended, then the .di file must be regenerated explicitly.

> So the programmers conclude they need to define an interface for A (and generally each and every hierarchy or isolated class in the project). But the same problem occurs for struct, and there's no way to define interfaces for structs.

I don't understand why people keep bringing this up as a solution; this is D.

> Ultimately the programmers figure there's no way to keep files separate without establishing a build mechanism that e.g. generates a.di from a.d, compares it against the existing a.di, and complains if the two aren't identical. Upon such a build failure, a senior engineer would figure out what action to take.
>
> But wait, there's less. The programmers don't have the option of grouping method implementations in a hierarchy by functionality (which is common in visitation patterns - even dmd does so). They must define one class with everything in one place, and there's no way out of that.
>
> My understanding is that the scenarios above are of no value to you, and if the language would accept them you'd consider that a degradation of the status quo. Given that the status quo includes a fair amount of impossible to detect failures and tenuous mechanisms, I disagree. Let me also play a card I wish I hadn't - I've worked on numerous large projects and I can tell from direct experience that the inherent problems are... well, odd. Engineers embarked on such projects need all the help they could get and would be willing to explore options that seem ridiculous for projects one fraction the size. Improved .di generation would be of great help. Enabling other options would be even better.

Interface design, separate compilation, and library development are among the many things that C++ not only failed to do better than C but took to a new low.
And I don't get why no one else here considers these big issues and people just suggest the C++ ways. The last two times I brought up something related to this, there was only one response. I want to believe it is because of me, failing to express myself :)

July 23, 2011

Re: Proposed improvements to the separate compilation model

Posted by Andrei Alexandrescu
in reply to Vladimir Panteleev

On 7/23/11 5:34 PM, Vladimir Panteleev wrote:
> I was going to suggest something like this, but without creating the
> dependency on a 3rd-party build tool. I mentioned in another thread how
> DMD shouldn't touch the .di file's mtime. A better idea is to not
> attempt to overwrite the file at all if the generated .di is identical
> to the old one.

That's a must at any rate, and should be filed as an enhancement request. The compiler should generate filename.di.tmp, compare it against filename.di (if any), and then either remove the .tmp if identical or rename it forcefully to filename.di if different. That's a classic in code generation tools.
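
The compare-then-rename step can be sketched as follows. This is only an illustration of the technique using std.file, not how dmd behaves or would implement it; the function name installIfChanged and the file names are assumptions made for the example.

```d
import std.file : exists, read, remove, rename, tempDir, write;
import std.path : buildPath;
import std.stdio : writeln;

/// Installs tmpPath as finalPath only if the contents differ.
/// Returns true when finalPath was created or replaced.
bool installIfChanged(string tmpPath, string finalPath)
{
    if (exists(finalPath) &&
        cast(const(ubyte)[]) read(tmpPath) == cast(const(ubyte)[]) read(finalPath))
    {
        remove(tmpPath); // identical: keep the old file and its mtime
        return false;
    }
    rename(tmpPath, finalPath); // new or changed: replace the old file
    return true;
}

void main()
{
    // Hypothetical file names in the temp directory, for illustration only.
    auto target = buildPath(tempDir(), "a.di");
    auto tmp = target ~ ".tmp";

    write(target, "class A { void fun(); }\n");
    write(tmp, "class A { void fun(); }\n");
    writeln(installIfChanged(tmp, target)); // same contents, nothing replaced

    write(tmp, "class A { void fun(); void gun(); }\n");
    writeln(installIfChanged(tmp, target)); // contents differ, file replaced

    remove(target); // clean up
}
```

Because an unchanged interface file is never rewritten, its timestamp stays put and build tools that rely on mtime do not rebuild its dependents.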

> Yes, this is a hack, but it's not the only solution. Aside from writing a
> build tool, as you mentioned, I believe many organizations include
> automatic tests coupled with source control, which could easily detect
> check-ins that change .di files.
>
> Yes, neither of the above are "proper" solutions. But, unless I've lost
> track of something, you're trying to justify a solid amount of work on
> the compiler to implement the "proper" solution, when the above
> alternatives are much simpler in practice. (If you have more
> counter-arguments, I'd like to hear them.)

I don't at all think these are improper. As I said, people are willing to do crazy things to keep large projects sane. The larger question here is how to solve a failure scenario, i.e. what can we offer that senior engineer who fixes the build when the .di files are not in sync anymore.

>> But wait, there's less. The programmers don't have the option of
>> grouping method implementations in a hierarchy by functionality (which
>> is common in visitation patterns - even dmd does so). They must define
>> one class with everything in one place, and there's no way out of that.
>
> Sorry, I don't understand this part. Could you elaborate?

A class hierarchy defines foo() and bar(). We want to put all foo() implementations together and all bar() implementations together.
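
For comparison, here is a sketch of the visitor workaround that is available today, which keeps every foo() implementation for a hierarchy in one place at the cost of extra boilerplate. It is not the proposed feature, and the names Node, Leaf, Branch, Visitor and Foo are hypothetical.

```d
import std.stdio : writeln;

// One visit overload per concrete class in the hypothetical hierarchy.
interface Visitor
{
    void visit(Leaf n);
    void visit(Branch n);
}

abstract class Node
{
    abstract void accept(Visitor v);
}

class Leaf : Node
{
    override void accept(Visitor v) { v.visit(this); }
}

class Branch : Node
{
    Node[] children;
    override void accept(Visitor v) { v.visit(this); }
}

// Every foo() implementation for the hierarchy lives in this one class,
// which could sit in its own module.
class Foo : Visitor
{
    string[] log;

    void visit(Leaf n) { log ~= "foo(Leaf)"; }

    void visit(Branch n)
    {
        log ~= "foo(Branch)";
        foreach (c; n.children)
            c.accept(this);
    }
}

void main()
{
    auto tree = new Branch;
    tree.children ~= new Leaf;
    tree.children ~= new Leaf;

    auto foo = new Foo;
    tree.accept(foo);
    writeln(foo.log);
}
```

A bar() operation would get its own Visitor implementation in another module, so the grouping is by functionality rather than by class.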

Andrei

July 23, 2011

Re: Proposed improvements to the separate compilation model

Posted by Andrei Alexandrescu
in reply to bearophile

On 7/23/11 5:39 PM, bearophile wrote:
> Andrei:
>
> I am not an expert on this, but it doesn't look like esoteric stuff.
>
>> Consider:
>
> Thank you for explaining better, with more examples. This usually
> helps the discussion.
>
>
>> Say the team working on A wants to "freeze" a.di without precluding
>> work on A.fun().
>
> Currently in D there is no explicit & enforced way to state this
> desire to the compiler?

It is as I described in the example. It works today. Fragility is the problem there.

>> The problem with this setup is that it's extremely fragile, in ways
>> that are undetectable during compilation or runtime.
>
> Is it possible to invent ways to make this less fragile?
>
>
>> For example, just swapping a and b in the implementation file makes
>> the program print "08.96566e-31344". Similar issues occur if fields
>> or methods are added or removed from one file but not the other.
>
> I have suggested some fine-grained hashing. Compute a hash from a
> class definition, and later quickly compare this value with a value
> stored elsewhere (like automatically written in the .di file).

I discussed four options with Walter, and this was one of them. It has issues. The proposal as in this thread is the simplest and most effective I could find.

>> Ultimately the programmers figure there's no way to keep files
>> separate without establishing a build mechanism that e.g. generates
>> a.di from a.d, compares it against the existing a.di, and complains
>> if the two aren't identical.
>
> Comparing .di files looks tricky. DMD generates them
> deterministically, so in theory it works, but it doesn't sound like a
> good thing to do.

It's commonplace with code generation tools.

>> They must define one class with everything in one place, and
>> there's no way out of that.
>
> I think C# uses partial classes to solve this. It doesn't use header
> files.
> http://msdn.microsoft.com/en-us/library/wa80x488%28v=vs.80%29.aspx

C#'s partial classes serve a different purpose, but yah, they can be used like that.


Andrei

July 23, 2011

Re: Proposed improvements to the separate compilation model

Posted by so
in reply to Andrei Alexandrescu

On Sun, 24 Jul 2011 01:47:32 +0300, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> and then either remove the .tmp if identical or rename it forcefully to filename.di if different. That's a classic in code generation tools.

Not sure I got it right, but how would renaming it forcefully solve this?
These must be two separate steps for the compiler: issue an error, and only if the change was intended generate the .di file.

July 23, 2011

Re: Proposed improvements to the separate compilation model

Posted by Vladimir Panteleev
in reply to bearophile

On Sun, 24 Jul 2011 01:39:01 +0300, bearophile <bearophileHUGS@lycos.com> wrote:

> Andrei:
>
> I am not an expert on this, but it doesn't look like esoteric stuff.
>
>> Consider:
>
> Thank you for explaining better, with more examples. This usually helps the discussion.

I retract my comment about this :)

>> The problem with this setup is that it's extremely fragile, in ways that
>> are undetectable during compilation or runtime.
>
> Is it possible to invent ways to make this less fragile?

This is what Andrei's proposal attempts to solve.

-- 
Best regards,
 Vladimir                            mailto:vladimir@thecybershadow.net

July 23, 2011

Re: Proposed improvements to the separate compilation model

Posted by Andrei Alexandrescu
in reply to so

On 7/23/11 6:02 PM, so wrote:
> On Sun, 24 Jul 2011 01:47:32 +0300, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> and then either remove the .tmp if identical or rename it forcefully
>> to filename.di if different. That's a classic in code generation tools.
>
> Not sure I got it right, but how would renaming it forcefully solve this?
> These must be two separate steps for the compiler: issue an error, and
> only if the change was intended generate the .di file.

There'd be an error if the file were read-only.

Andrei
