December 20, 2016
On 12/20/2016 05:08 PM, Andrei Alexandrescu wrote:
> On 12/20/2016 03:46 AM, Joakim wrote:
>> I didn't just say "eh:" I gave evidence for why I think the problem is
>> minimal and asked why it's so important to scope those last 3-4 imported
>> modules too, which you didn't answer.
>
> You have asked for a smoking gun, and one has been found.
>
> I have just uploaded a major update that carefully analyzes the
> improvements brought about by switching to local imports in the D
> Standard Library. Please refer to the section "Workaround: Are Local
> Imports Good Enough?" and Appendix A:
>
> https://github.com/dlang/DIPs/pull/51/files
>
> https://github.com/dlang/DIPs/blob/91baecedcfe7cb75ac22e66478722ec797ebb901/DIPs/DIP1005.md

Eh, looks like http://dillinger.io/ and github don't agree on table rendering... Fixed URL for nice viewing:

https://github.com/dlang/DIPs/blob/249b28ddd784220e44e343e78e5ea7a65c4c7bde/DIPs/DIP1005.md


Andrei
December 21, 2016
On Tuesday, 20 December 2016 at 20:51:54 UTC, Andrei Alexandrescu wrote:
> I've asked Joakim about this via email just now, likely other folks also know the answer:
>
> 1. I found these PRs related to local imports:
>
> https://github.com/dlang/phobos/pull/4361
> https://github.com/dlang/phobos/pull/4365
> https://github.com/dlang/phobos/pull/4370
> https://github.com/dlang/phobos/pull/4373
> https://github.com/dlang/phobos/pull/4379
> https://github.com/dlang/phobos/pull/4392
> https://github.com/dlang/phobos/pull/4467
>
> Are there more? Is there a central point for them (such as a bugzilla issue)?

Ilya lists a lot more above, he did most of the work.  No central point that I know of.

> 2. I see you've done a bunch of work in the area. Where, in your
> estimate, are we on the spectrum of making imports local without a
> major reorganization? Any low-hanging fruit to look at, or have Joakim,
> Ilya, Jack and others made a pass through most/all modules already?

There is more to be done, but my guess would be 80-90% done for Phobos. Ilya scoped a lot, but usually left druntime imports alone.  Top-level module imports mostly don't use selective imports yet, because Martin wanted to hold off till he was sure the symbol leak bug was fixed.

On Tuesday, 20 December 2016 at 22:08:38 UTC, Andrei Alexandrescu wrote:
> On 12/20/2016 03:46 AM, Joakim wrote:
>> I didn't just say "eh:" I gave evidence for why I think the problem is
>> minimal and asked why it's so important to scope those last 3-4 imported
>> modules too, which you didn't answer.
>
> You have asked for a smoking gun, and one has been found.
>
> I have just uploaded a major update that carefully analyzes the improvements brought about by switching to local imports in the D Standard Library. Please refer to the section "Workaround: Are Local Imports Good Enough?" and Appendix A:
>
> https://github.com/dlang/DIPs/pull/51/files
>
> https://github.com/dlang/DIPs/blob/91baecedcfe7cb75ac22e66478722ec797ebb901/DIPs/DIP1005.md

Thanks for this analysis of the remaining dependency graph, it is worth looking at.  Allow me to poke some holes in it.

To begin with, the amount of scoping that has been done is overstated if you simply count scoped imports and compare that to module-level imports.  Each module-level import has to be replicated multiple times, once for each local scope, especially in unittest blocks.  A better number is more like 20-30% of imports remaining at module level; as I pointed out, 4 out of 13 modules remain at top-level in std.array. Using that metric, a 3-4x reduction in top-level imports has led to at least a 2.2x improvement in imported files, so the effort has been more meaningful than you conclude.
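
A minimal sketch of the replication point, assuming a small module with two functions and a unittest block. The function names are hypothetical; only the std.algorithm symbols are real Phobos. One module-level import, once pushed down, reappears in every scope that needs it:

```d
// Before scoping, one module-level line served every declaration below:
//     import std.algorithm;

/// Sums the squares of the inputs (hypothetical helper, for illustration).
int sumOfSquares(const(int)[] xs)
{
    import std.algorithm : map, sum; // copy 1 of the former top-level import
    return xs.map!(x => x * x).sum;
}

/// Returns the largest element, or 0 for an empty slice (hypothetical helper).
int largest(const(int)[] xs)
{
    import std.algorithm : maxElement; // copy 2
    return xs.length ? xs.maxElement : 0;
}

unittest
{
    import std.algorithm : equal; // copy 3, needed only by the test
    assert(sumOfSquares([1, 2, 3]) == 14);
    assert(largest([3, 9, 4]) == 9);
    assert(equal([1, 2], [1, 2]));
}
```

Counting the three scoped imports against the one module-level import they replaced would suggest three times as much scoping as actually happened.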

Second, as I noted above, most top-level imports have not been made selective yet, because of the symbol leak bug that was recently fixed by Martin.  You will see in my PRs that I only list those symbols as a comment, because I could not turn those into selective imports yet.  If the compiler is doing its job right, selective imports should greatly reduce the cost of importing a module, even if your metric would still show the module being imported.
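
As a rough illustration of what turning a top-level import into a selective one looks like (a sketch, not taken from the PRs; `wordCount` is a hypothetical function): the module is still imported, but only the named symbols enter scope, so the import line itself documents the dependency.

```d
// Non-selective form, with the needed symbols listed only as a comment:
//     import std.string; // strip

// Selective form: only the listed symbols are brought into scope.
import std.array : split;
import std.string : strip;

/// Counts whitespace-separated words (hypothetical helper, for illustration).
size_t wordCount(string text)
{
    return text.strip.split.length;
}

unittest
{
    assert(wordCount("  local imports help  ") == 3);
    assert(wordCount("") == 0);
}
```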

Third, checking some of the output from the commands you ran in your script shows that up to half of the imported modules are from druntime.  I noted earlier that Ilya usually didn't bother scoping top-level druntime imports, because he perceived their cost to be low (I scoped them too in the handful I cleaned up, just for completeness).  As far as I know, nobody has bothered to spend any time scoping druntime, so it would be better if you filtered out druntime imports from your analysis.

Finally, while it's nice to know the extent of the dependency graph, what really matters is the _cost_ of each link of the graph, which is what I keep hammering on.  If the cost of links is small, it doesn't matter how entangled it is.  If minimizing the dependency graph through scoping alone, ie without implementing this DIP, removes most of the cost, that's all I care about.

I have noted one example above, where _a single DCD in phobos_, ie a scoped, selective import, had gigantic costs in terms of executable size, where entire modules were included because of it.  If that's the case more generally, then _no_ amount of dependency disentangling will matter, because the cost of single DCDs is still huge.  Perhaps that's just an isolated issue however, it needs to be investigated.

My point is that the dependency graph matters, but now that we're getting down to the last entanglements, we need to know the cost of those last links.  Your dependency analysis gives us some quantitative idea of the size of the remaining graph, but tells us nothing about the cost of those links. That's what I'm looking for.

I will spend some time now investigating those costs with sample code.  My request all along has been that you give us some idea of those costs, if you know the answer already.
December 21, 2016
On 12/20/2016 11:32 PM, Joakim wrote:
> On Tuesday, 20 December 2016 at 20:51:54 UTC, Andrei Alexandrescu wrote:
> Thanks for this analysis of the remaining dependency graph, it is worth
> looking at.  Allow me to poke some holes in it.
>
> To begin with, the amount of scoping that has been done is overstated
> if you simply count scoped imports and compare that to module-level
> imports.  Each module-level import has to be replicated multiple times,
> once for each local scope, especially in unittest blocks.  A better
> number is more like 20-30% of imports remaining at module level; as I
> pointed out, 4 out of 13 modules remain at top-level in std.array.
> Using that metric, a 3-4x reduction in top-level imports has led to at
> least a 2.2x improvement in imported files, so the effort has been more
> meaningful than you conclude.

Fixed. I also made a note about the need to duplicate imports as they are pushed down.

> Second, as I noted above, most top-level imports have not been made
> selective yet, because of the symbol leak bug that was recently fixed by
> Martin.  You will see in my PRs that I only list those symbols as a
> comment, because I could not turn those into selective imports yet.  If
> the compiler is doing its job right, selective imports should greatly
> reduce the cost of importing a module, even if your metric would still
> show the module being imported.

That is not relevant to this section, which discusses the effectiveness of using local imports with the current compilation technology. Per the section's opening sentence:

> A legitimate question to ask is whether consistent use of local
> imports wherever possible would be an appropriate approximation of
> the Dependency-Carrying Declarations goal with no change in the
> language at all.

The section "Alternative: Lazy Imports" discusses how static or local imports could be used in conjunction with new compilation technologies. If there are improvements to be made there, please advise.

> Third, checking some of the output from the commands you ran in your
> script shows that up to half of the imported modules are from druntime.
> I noted earlier that Ilya usually didn't bother scoping top-level
> druntime imports, because he perceived their cost to be low (I scoped
> them too in the handful I cleaned up, just for completeness).  As far as
> I know, nobody has bothered to spend any time scoping druntime, so it
> would be better if you filtered out druntime imports from your analysis.

Fixed to only count imports from std.

> Finally, while it's nice to know the extent of the dependency graph,
> what really matters is the _cost_ of each link of the graph, which is
> what I keep hammering on.  If the cost of links is small, it doesn't
> matter how entangled it is.  If minimizing the dependency graph through
> scoping alone, ie without implementing this DIP, removes most of the
> cost, that's all I care about.

To a first approximation, whether a file gets opened or not makes a difference: filesystem operations (possibly including networking), and the necessity to rebuild if dependent code is changed. The analysis shows there is significant overhead remaining, on average 10.5 additional files per unused import.

If the current document could be clearer in explaining costs, please let me know.

> I have noted one example above, where _a single DCD in phobos_, ie a
> scoped, selective import, had gigantic costs in terms of executable
> size, where entire modules were included because of it.  If that's the
> case more generally, then _no_ amount of dependency disentangling will
> matter, because the cost of single DCDs is still huge.  Perhaps that's
> just an isolated issue however, it needs to be investigated.

That seems an unrelated matter. Yes, you could pull a large dependency in one shot with old or new technology.

> My point is that the dependency graph matters, but now that we're
> getting down to the last entanglements, we need to know the cost of
> those last links.  Your dependency analysis gives us some quantitative
> idea of the size of the remaining graph, but tells us nothing about the
> cost of those links. That's what I'm looking for.
>
> I will spend some time now investigating those costs with sample code.
> My request all along has been that you give us some idea of those costs,
> if you know the answer already.

I don't know how to make matters much clearer than the current document. Any suggestions are welcome. The section "Workaround: Are Local Imports Good Enough?" discusses the material cost in terms of extra files that need to be opened and parsed (some unnecessarily) in order to complete a compilation. The "Rationale" part of the document discusses the costs in terms of maintainability, clarity, and documentation.


Thanks,

Andrei

December 21, 2016
On 12/20/2016 09:31 AM, Dmitry Olshansky wrote:
> On 12/13/16 11:33 PM, Andrei Alexandrescu wrote:
>> Destroy.
>>
>> https://github.com/dlang/DIPs/pull/51/files
>>
>>
>> Andrei
>
> Just a thought but with all of proliferation of imports down to each
> declaration comes the pain that e.g. renaming a module cascades to
> countless instances of import statements. This is true of local imports
> as well but the problem gets bigger.

https://github.com/dlang/DIPs/pull/51/commits/d4ef6826dacedc38f822e48bec2186d93040fb42

Andrei

December 21, 2016
On Mon, Dec 19, 2016 at 9:33 PM, Timothee Cour <thelastmammoth@gmail.com> wrote:

> what about using `lazy` instead of `with`:
>
> `with(import foo)`
> =>
> `lazy(import foo)`
>
> advantages:
> * avoids confusion regarding usual scoping rules of `with` ;
> * conveys that the import is indeed lazy
>
> Furthermore (regardless of which keyword is used), what about allowing `:`
> ```
> // case 1
> lazy(import foo)
> void fun(){}
>
> // case 2
> lazy(import foo) {
>   void fun(){}
> }
>
> // case 3 : this is new
> lazy(import foo):
> void fun1(){}
> void fun2(){}
> ```
>
> advantages:
>
> * same behavior as other constructs which don't introduce a scope:
> ```
> // cases 1, 2, 3 are allowed:
> version(A):
> static if(true):
> private:
> void fun(){}
> ```
>
> * avoids nesting when case 3 is used (compared to when using `{}`)
>
> * I would argue that grouping lazy imports is actually a common case; without case 3, the indentation will increase.
>


Andrei: ping on this? (especially regarding allowing `:`)


December 21, 2016
On 12/21/16 6:40 PM, Timothee Cour via Digitalmars-d wrote:
> Andrei: ping on this? (especially regarding allowing `:`)

I think "lazy" is a bit too cute. "with" is so close to what's actually needed, it would be a waste to not use it.

Generally I'm wary of the use of ":" (never liked it - it makes code dependent on long-distance context) so I'd rather snatch the opportunity to avoid it.


Andrei

December 22, 2016
On Wednesday, 21 December 2016 at 15:16:34 UTC, Andrei Alexandrescu wrote:
> On 12/20/2016 11:32 PM, Joakim wrote:
>> Second, as I noted above, most top-level imports have not been made
>> selective yet, because of the symbol leak bug that was recently fixed by
>> Martin.  You will see in my PRs that I only list those symbols as a
>> comment, because I could not turn those into selective imports yet.  If
>> the compiler is doing its job right, selective imports should greatly
>> reduce the cost of importing a module, even if your metric would still
>> show the module being imported.
>
> That is not relevant to this section, which discusses the effectiveness of using local imports with the current compilation technology. Per the section's opening sentence:
>
>> A legitimate question to ask is whether consistent use of local
>> imports wherever possible would be an appropriate approximation of
>> the Dependency-Carrying Declarations goal with no change in the
>> language at all.

It is relevant because it could further reduce the cost from module-scope imports.  Whether your section only chooses to focus on locally scoped imports and ignore the impact of selective imports is irrelevant to me.

>> Finally, while it's nice to know the extent of the dependency graph,
>> what really matters is the _cost_ of each link of the graph, which is
>> what I keep hammering on.  If the cost of links is small, it doesn't
>> matter how entangled it is.  If minimizing the dependency graph through
>> scoping alone, ie without implementing this DIP, removes most of the
>> cost, that's all I care about.
>
> To a first approximation, whether a file gets opened or not makes a difference: filesystem operations (possibly including networking), and the necessity to rebuild if dependent code is changed. The analysis shows there is significant overhead remaining, on average 10.5 additional files per unused import.

Opening a file or 10 is extremely cheap compared to all the other costs of the compiler.  Purely from a technical cost perspective, I'm not sure even scoped imports were worth it, as my simple investigation below suggests.

> If the current document could be clearer in explaining costs, please let me know.

Yes, please explain what significant "overhead" was reduced by scoping imports and would be further reduced by this DIP.  Opening files doesn't cut it.

>> My point is that the dependency graph matters, but now that we're
>> getting down to the last entanglements, we need to know the cost of
>> those last links.  Your dependency analysis gives us some quantitative
>> idea of the size of the remaining graph, but tells us nothing about the
>> cost of those links. That's what I'm looking for.
>>
>> I will spend some time now investigating those costs with sample code.
>> My request all along has been that you give us some idea of those costs,
>> if you know the answer already.
>
> I don't know how to make matters much clearer than the current document. Any suggestions are welcome. The section "Workaround: Are Local Imports Good Enough?" discusses the material cost in terms of extra files that need to be opened and parsed (some unnecessarily) in order to complete a compilation. The "Rationale" part of the document discusses the costs in terms of maintainability, clarity, and documentation.

I don't know how to make it clearer that that's not good enough.  You seem to understand that I want more justification than hand-waving about "scalable" and "overhead," which is why you now give the cost of opening files as justification, but you don't seem to have anything more substantive than that flimsy claim.

I just tried compiling phobos and its unittests for dmd 2.066.1 and 2.067.1, the dmd releases from right before and right after Ilya's PRs linked above (compiling phobos from the older release with the newer dmd didn't work and I wasn't interested in porting it).  I found they took about the same amount of time to compile their respective phobos and the later version consistently took 15 seconds longer to compile the unittests on a single core.

This suggests that there has been essentially no technical benefit to scoped imports, that it is largely superficial.  That's fine, I think it's worth it just from the standpoint of understanding where most dependencies are coming from.  I don't think an additional syntax change is necessary for the remaining few module-scope dependencies for template constraints.

Note on investigation: Of course, some other relevant factors could have changed in dmd and phobos between those two releases, so the result is certainly not conclusive.  But the fact that there was no decrease in compile times while phobos took around the same amount of time is highly suggestive.

I'm uninterested in investigating further given the consistent hand-waving justifications for this DIP.  If somebody else had submitted this DIP, it would have been quickly shot down, and rightly so.
December 22, 2016
> I just tried compiling phobos and its unittests for dmd 2.066.1 and 2.067.1

I think it's worth considering compile time for partial recompilation as opposed to full compilation. The benefit of this DIP should be more pronounced there, since there will be more opportunities to skip parsing modules in that case. Partial recompilation is what matters most during the edit-compile-debug cycle anyway.



December 22, 2016
On Wednesday, 21 December 2016 at 15:16:34 UTC, Andrei Alexandrescu wrote:
> On 12/20/2016 11:32 PM, Joakim wrote:
>> (...)
> I don't know how to make matters much clearer than the current document. Any suggestions are welcome. The section "Workaround: Are Local Imports Good Enough?" discusses the material cost in terms of extra files that need to be opened and parsed (some unnecessarily) in order to complete a compilation. The "Rationale" part of the document discusses the costs in terms of maintainability, clarity, and documentation.
>
>
> Thanks,
>
> Andrei

Stipulation: I think the difference of opinion may be caused by working on different sizes of projects in one's career (tens of thousands vs millions of LoC).

Suggestion 1: Maybe the DIP should point out that the cost of redundant imports (however small) tends to grow quadratically with code size (size of import tree times the number of compilations).
If this is not the case then maybe the DIP is really in the wrong direction.

Suggestion 2:
Implement the DIP, autogenerate millions of lines of D code (2 versions: with and without DCDs) and see which version of DMD compiles them faster. This may also expose other ways to improve scalability without implementing this DIP.
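
The quadratic claim in Suggestion 1 can be made concrete with a toy model. This is a back-of-the-envelope sketch under stated assumptions, not a measurement: assume a project of `n` modules compiled separately, where each compilation parses itself plus a fixed fraction of the other modules through transitive imports. The function name `totalParses` and the fraction 0.5 are hypothetical.

```d
/// Toy model: total modules parsed across one full separate-compilation build.
/// Assumes each of the `n` compilations parses itself plus `fraction` of the
/// other modules through transitive imports (hypothetical, for illustration).
ulong totalParses(ulong n, double fraction)
{
    if (n == 0)
        return 0;
    // size of the import tree per compilation ...
    const ulong perCompile = 1 + cast(ulong)(fraction * (n - 1));
    // ... times the number of compilations
    return n * perCompile;
}

unittest
{
    // Growing the code base 10x multiplies the parsing work by 100x.
    assert(totalParses(100, 0.5) == 5_000);
    assert(totalParses(1_000, 0.5) == 500_000);
}
```

If the fraction of the tree reached per compilation shrinks as the project grows, the model no longer holds, which is the caveat the suggestion itself raises.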
December 22, 2016
On 12/22/2016 05:17 AM, Joakim wrote:
> On Wednesday, 21 December 2016 at 15:16:34 UTC, Andrei Alexandrescu wrote:
>> To a first approximation, whether a file gets opened or not makes a
>> difference: filesystem operations (possibly including networking), and
>> the necessity to rebuild if dependent code is changed. The analysis
>> shows there is significant overhead remaining, on average 10.5
>> additional files per unused import.
>
> Opening a file or 10 is extremely cheap compared to all the other costs
> of the compiler.  Purely from a technical cost perspective, I'm not sure
> even scoped imports were worth it, as my simple investigation below
> suggests.

This is a misunderstanding. (I'll make the DIP clearer.) It's not about the cost of opening the file per se; I'm using the number of files opened as a proxy for all work involved in processing a file. Meaning, if you import 10 files you're likely to do roughly 10x the work of importing one file.

> I just tried compiling phobos and its unittests for dmd 2.066.1 and
> 2.067.1, the dmd releases from right before and right after Ilya's PRs
> linked above (compiling phobos from the older release with the newer dmd
> didn't work and I wasn't interested in porting it).  I found they took
> about the same amount of time to compile their respective phobos and the
> later version consistently took 15 seconds longer to compile the
> unittests on a single core.

Unittesting Phobos (whether separately or together) will instantiate everything, so that is exactly the case where local imports do _not_ make any difference. That's why DIP1005 uses that case as an estimate of everything in an imported module being used.

Compiling Phobos in its entirety (one command) and measuring that time is also of tenuous relevance; I need to think about it, but at first sight the cost of unnecessary imports is collapsed, and I suspect (and will measure) that either way most modules are in fact imported.

One measurement of interest is: write a module that imports exactly one stdlib module, compile it, and measure time. Looking at the generated object and executable size would also be of interest. I'll do that.
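
A minimal sketch of how such a measurement could be scripted, assuming `dmd` is on the PATH and that `dmd -v` reports each imported module on a line starting with "import ". The helper names `countImports` and `measure` and the probe file name are hypothetical, and this is not the script used for the DIP.

```d
import std.algorithm : count, startsWith;
import std.string : lineSplitter;

/// Counts the modules reported as imported in `dmd -v` output.
/// Assumes each such line starts with "import " (hypothetical helper).
size_t countImports(string verboseOutput)
{
    return verboseOutput.lineSplitter.count!(l => l.startsWith("import "));
}

/// Compiles `source` without writing an object file and returns the number
/// of imported modules, the wall-clock time, and the compiler exit status.
auto measure(string source)
{
    import std.datetime.stopwatch : AutoStart, StopWatch;
    import std.file : remove, tempDir, write;
    import std.path : buildPath;
    import std.process : execute;
    import std.typecons : tuple;

    auto path = buildPath(tempDir, "one_import_probe.d"); // hypothetical name
    write(path, source);
    scope (exit) remove(path);

    auto sw = StopWatch(AutoStart.yes);
    auto result = execute(["dmd", "-c", "-o-", "-v", path]);
    return tuple!("imports", "elapsed", "status")(
        countImports(result.output), sw.peek, result.status);
}
```

Calling `measure("import std.array;")` and comparing against `measure("")` would give the extra modules and time attributable to that one import. Object and executable size would need a separate run without `-o-`.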

> This suggests that there has been essentially no technical benefit to
> scoped imports, that it is largely superficial.  That's fine, I think
> it's worth it just from the standpoint of understanding where most
> dependencies are coming from.  I don't think an additional syntax change
> is necessary for the remaining few module-scope dependencies for
> template constraints.
>
> Note on investigation: Of course, some other relevant factors could have
> changed in dmd and phobos between those two releases, so the result is
> certainly not conclusive.  But the fact that there was no decrease in
> compile times while phobos took around the same amount of time is highly
> suggestive.
>
> I'm uninterested in investigating further given the consistent
> hand-waving justifications for this DIP.  If somebody else had submitted
> this DIP, it would have been quickly shot down, and rightly so.

Walter and I have the role of scrutinizing every addition to the language (and rejecting most). It is natural that our own work is met with increased scrutiny. Picking to death anything and everything we say or do is a staple in this community, and a rite of passage on github. It is of course impossible to know what would have happened if the proposal were made by someone else. All I can say is Walter knew nothing about it and said it is good (except for the initial syntax; he's on board with "with").

Anyhow, not to worry. The burden of proof is on the DIP. I'll take a look at making some more measurements.


Andrei