Thread overview
Re: what are guidelines for when to split a module into a package?
Feb 22, 2018
Jonathan M Davis
Feb 22, 2018
Timothee Cour
Feb 22, 2018
Timothee Cour
Feb 22, 2018
Jonathan M Davis
Feb 22, 2018
Jonathan M Davis
Feb 22, 2018
Timothee Cour
Feb 22, 2018
Seb
Feb 22, 2018
Timothee Cour
February 22, 2018
On Wednesday, February 21, 2018 23:13:33 Timothee Cour via Digitalmars-d wrote:
> from my perspective it makes sense to split a module M into submodules
> A, B when:
> * M is large
> * there's little interaction between A and B (eg only few symbols from
> A are needed in B and vice versa)
> * A and B are logically grouped (that is domain specific)
> * it doesn't turn into an extreme (1 function per module)
>
> Advantages of splitting:
> * easier to review
> * easier to edit (no need to scroll much to see entirety of module
> we're editing)
> * less pollution from top-level imports as they only affect submodule
> (likewise with top-level attributes etc)
> * more modular
> * doesn't affect existing code since `import M` will continue to work
> after M is split into a package
> * less memory when using separate compilation
> * allows fine-grained compiler options (eg we can compile B with `-O` if
> needed) * allows to run unittests just for A instead of M
> * allows selective import in client to avoid pulling in too many
> dependencies (see arguments that were made for std.range.primitives)
>
> Disadvantages of splitting:
> * more files; but not sure why that's a problem so long we don't go
> into extremes (eg 1 function per module or other things of bad taste)
>
> ---
> while working on https://github.com/dlang/phobos/pull/6178 I had
> initially split M:std.array into submodules:
> A:std.array.util (the old std.array) and B:std.array.static_array
> (everything added in the PR)
> IMO this made sense according to my above criteria (in this case there
> was 0 interaction between A and B), but the reviewers disagreed with
> the split.
>
> So, what are the guidelines?

It's decided on a case-by-case basis but is generally only done if the module is quite large. std.array is not particularly large. It's less than 4000 lines, including unit tests and documentation, and it only has 18 top-level symbols.

Also, remember that within Phobos, imports are supposed to be as localized as possible - both in terms of where the import is placed and in terms of selective imports - e.g. it would be

import std.algorithm.searching : find;

not

import std.algorithm : find;

which means that splitting the module then requires that all of those imports be even more specific. User code can choose to do that or not, but it does make having modules split up further that much more tedious. Related to that is the fact that anyone searching for these symbols now has more modules to search through. So, finding symbols will be harder. Take std.algorithm for instance. It was split, because it was getting large enough that compiling it on machines without large amounts of memory resulted in the compiler running out of memory. So, there was a very good argument for splitting it. However, now, even if you know that a symbol is in std.algorithm, do you know where in std.algorithm it is? Some are obvious - e.g. sort is in std.algorithm.sorting. However, others are not so obviously - e.g. where does startsWith live? Arguably, it could go in either std.algorithm.comparison or std.algorithm.searching. It turns out that it's in std.algorithm.searching, but I generally have to look it up. And where to functions like map or filter live? std.algorithm.mutation? std.algorithm.iteration? It's not necessarily obvious at all.

>From the perspective of users trying to find stuff, splitting modules up
comes at a real cost, and I honestly don't understand why some folks are in a hurry to make module really small. That means more import statements when using those modules, and it means that it's harder to find symbols.

Personally, I think that we should be very slow to consider splitting modules and only do so when it's clear that there's a real need, and std.array is nowhere near that level.

- Jonathan M Davis


February 21, 2018
>  it's harder to find symbols

i don't understand this argument.

```
dscanner --declaration startsWith
./std/algorithm/searching.d(4105:6)
./std/algorithm/searching.d(4195:6)
./std/algorithm/searching.d(4265:6)
./std/algorithm/searching.d(4301:6)
```


On Wed, Feb 21, 2018 at 11:31 PM, Jonathan M Davis via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Wednesday, February 21, 2018 23:13:33 Timothee Cour via Digitalmars-d wrote:
>> from my perspective it makes sense to split a module M into submodules
>> A, B when:
>> * M is large
>> * there's little interaction between A and B (eg only few symbols from
>> A are needed in B and vice versa)
>> * A and B are logically grouped (that is domain specific)
>> * it doesn't turn into an extreme (1 function per module)
>>
>> Advantages of splitting:
>> * easier to review
>> * easier to edit (no need to scroll much to see entirety of module
>> we're editing)
>> * less pollution from top-level imports as they only affect submodule
>> (likewise with top-level attributes etc)
>> * more modular
>> * doesn't affect existing code since `import M` will continue to work
>> after M is split into a package
>> * less memory when using separate compilation
>> * allows fine-grained compiler options (eg we can compile B with `-O` if
>> needed) * allows to run unittests just for A instead of M
>> * allows selective import in client to avoid pulling in too many
>> dependencies (see arguments that were made for std.range.primitives)
>>
>> Disadvantages of splitting:
>> * more files; but not sure why that's a problem so long we don't go
>> into extremes (eg 1 function per module or other things of bad taste)
>>
>> ---
>> while working on https://github.com/dlang/phobos/pull/6178 I had
>> initially split M:std.array into submodules:
>> A:std.array.util (the old std.array) and B:std.array.static_array
>> (everything added in the PR)
>> IMO this made sense according to my above criteria (in this case there
>> was 0 interaction between A and B), but the reviewers disagreed with
>> the split.
>>
>> So, what are the guidelines?
>
> It's decided on a case-by-case basis but is generally only done if the module is quite large. std.array is not particularly large. It's less than 4000 lines, including unit tests and documentation, and it only has 18 top-level symbols.
>
> Also, remember that within Phobos, imports are supposed to be as localized as possible - both in terms of where the import is placed and in terms of selective imports - e.g. it would be
>
> import std.algorithm.searching : find;
>
> not
>
> import std.algorithm : find;
>
> which means that splitting the module then requires that all of those imports be even more specific. User code can choose to do that or not, but it does make having modules split up further that much more tedious. Related to that is the fact that anyone searching for these symbols now has more modules to search through. So, finding symbols will be harder. Take std.algorithm for instance. It was split, because it was getting large enough that compiling it on machines without large amounts of memory resulted in the compiler running out of memory. So, there was a very good argument for splitting it. However, now, even if you know that a symbol is in std.algorithm, do you know where in std.algorithm it is? Some are obvious - e.g. sort is in std.algorithm.sorting. However, others are not so obviously - e.g. where does startsWith live? Arguably, it could go in either std.algorithm.comparison or std.algorithm.searching. It turns out that it's in std.algorithm.searching, but I generally have to look it up. And where to functions like map or filter live? std.algorithm.mutation? std.algorithm.iteration? It's not necessarily obvious at all.
>
> From the perspective of users trying to find stuff, splitting modules up comes at a real cost, and I honestly don't understand why some folks are in a hurry to make module really small. That means more import statements when using those modules, and it means that it's harder to find symbols.
>
> Personally, I think that we should be very slow to consider splitting modules and only do so when it's clear that there's a real need, and std.array is nowhere near that level.
>
> - Jonathan M Davis
>
>
February 21, 2018
```
import std.algorithm.searching : find;

not

import std.algorithm : find;
```

that's just a missed opportunity to benefit from the split; we're in no way worse after the split than before the split in that regard. We can just leave it as `import std.algorithm : find;` with no adverse effect.


On Wed, Feb 21, 2018 at 11:44 PM, Timothee Cour <thelastmammoth@gmail.com> wrote:
>>  it's harder to find symbols
>
> i don't understand this argument.
>
> ```
> dscanner --declaration startsWith
> ./std/algorithm/searching.d(4105:6)
> ./std/algorithm/searching.d(4195:6)
> ./std/algorithm/searching.d(4265:6)
> ./std/algorithm/searching.d(4301:6)
> ```
>
>
> On Wed, Feb 21, 2018 at 11:31 PM, Jonathan M Davis via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> On Wednesday, February 21, 2018 23:13:33 Timothee Cour via Digitalmars-d wrote:
>>> from my perspective it makes sense to split a module M into submodules
>>> A, B when:
>>> * M is large
>>> * there's little interaction between A and B (eg only few symbols from
>>> A are needed in B and vice versa)
>>> * A and B are logically grouped (that is domain specific)
>>> * it doesn't turn into an extreme (1 function per module)
>>>
>>> Advantages of splitting:
>>> * easier to review
>>> * easier to edit (no need to scroll much to see entirety of module
>>> we're editing)
>>> * less pollution from top-level imports as they only affect submodule
>>> (likewise with top-level attributes etc)
>>> * more modular
>>> * doesn't affect existing code since `import M` will continue to work
>>> after M is split into a package
>>> * less memory when using separate compilation
>>> * allows fine-grained compiler options (eg we can compile B with `-O` if
>>> needed) * allows to run unittests just for A instead of M
>>> * allows selective import in client to avoid pulling in too many
>>> dependencies (see arguments that were made for std.range.primitives)
>>>
>>> Disadvantages of splitting:
>>> * more files; but not sure why that's a problem so long we don't go
>>> into extremes (eg 1 function per module or other things of bad taste)
>>>
>>> ---
>>> while working on https://github.com/dlang/phobos/pull/6178 I had
>>> initially split M:std.array into submodules:
>>> A:std.array.util (the old std.array) and B:std.array.static_array
>>> (everything added in the PR)
>>> IMO this made sense according to my above criteria (in this case there
>>> was 0 interaction between A and B), but the reviewers disagreed with
>>> the split.
>>>
>>> So, what are the guidelines?
>>
>> It's decided on a case-by-case basis but is generally only done if the module is quite large. std.array is not particularly large. It's less than 4000 lines, including unit tests and documentation, and it only has 18 top-level symbols.
>>
>> Also, remember that within Phobos, imports are supposed to be as localized as possible - both in terms of where the import is placed and in terms of selective imports - e.g. it would be
>>
>> import std.algorithm.searching : find;
>>
>> not
>>
>> import std.algorithm : find;
>>
>> which means that splitting the module then requires that all of those imports be even more specific. User code can choose to do that or not, but it does make having modules split up further that much more tedious. Related to that is the fact that anyone searching for these symbols now has more modules to search through. So, finding symbols will be harder. Take std.algorithm for instance. It was split, because it was getting large enough that compiling it on machines without large amounts of memory resulted in the compiler running out of memory. So, there was a very good argument for splitting it. However, now, even if you know that a symbol is in std.algorithm, do you know where in std.algorithm it is? Some are obvious - e.g. sort is in std.algorithm.sorting. However, others are not so obviously - e.g. where does startsWith live? Arguably, it could go in either std.algorithm.comparison or std.algorithm.searching. It turns out that it's in std.algorithm.searching, but I generally have to look it up. And where to functions like map or filter live? std.algorithm.mutation? std.algorithm.iteration? It's not necessarily obvious at all.
>>
>> From the perspective of users trying to find stuff, splitting modules up comes at a real cost, and I honestly don't understand why some folks are in a hurry to make module really small. That means more import statements when using those modules, and it means that it's harder to find symbols.
>>
>> Personally, I think that we should be very slow to consider splitting modules and only do so when it's clear that there's a real need, and std.array is nowhere near that level.
>>
>> - Jonathan M Davis
>>
>>
February 22, 2018
On Wednesday, February 21, 2018 23:48:32 Timothee Cour via Digitalmars-d wrote:
> ```
> import std.algorithm.searching : find;
>
> not
>
> import std.algorithm : find;
> ```
>
> that's just a missed opportunity to benefit from the split; we're in no way worse after the split than before the split in that regard. We can just leave it as `import std.algorithm : find;` with no adverse effect.

Maybe, but the CI stuff for Phobos doesn't like that, and it actually does reduce compilation times if the imports go directly to the module in question rather than to a module that publicly imports the symbols.

- Jonathan M Davis

February 22, 2018
On Wednesday, February 21, 2018 23:44:49 Timothee Cour via Digitalmars-d wrote:
> >  it's harder to find symbols
>
> i don't understand this argument.
>
> ```
> dscanner --declaration startsWith
> ./std/algorithm/searching.d(4105:6)
> ./std/algorithm/searching.d(4195:6)
> ./std/algorithm/searching.d(4265:6)
> ./std/algorithm/searching.d(4301:6)
> ```

I have no intention whatsoever of using dscanner. I can do something similar with grep, but either way, that doesn't help anyone who's actually reading the documentation and trying to find stuff that way, which is often what people do.

Regardless, the more split up stuff is, the more places that you have to look for it. We already have plenty of complaints about folks not knowing whether something is in std.algorithm, std.array, std.range, and std.string without getting into issues of sub-modules.

And I really don't understand why splitting up a module like std.array would help anything except maybe import times, but that only matters if the module is large (which std.array definitely isn't) and if you're importing the modules as specifically as possible and not using public imports through something like std.array.package.

- Jonathan M Davis

February 22, 2018
> it actually does reduce compilation times if the imports go directly to the module in
question rather than to a module that publicly imports the symbols

time1=compilation time of `import std.algorithm : find;` before split time21=compilation time of `import std.algorithm : find;` after split time22=compilation time of `import std.algorithm.searching : find;` after split

my understand is that we have:
time22 < time1 but time21 ~= time1
so we're in no way worse than before the split
unless time21 > time1 (noticably) in which case you have a strong argument


On Wed, Feb 21, 2018 at 11:57 PM, Jonathan M Davis via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Wednesday, February 21, 2018 23:48:32 Timothee Cour via Digitalmars-d wrote:
>> ```
>> import std.algorithm.searching : find;
>>
>> not
>>
>> import std.algorithm : find;
>> ```
>>
>> that's just a missed opportunity to benefit from the split; we're in no way worse after the split than before the split in that regard. We can just leave it as `import std.algorithm : find;` with no adverse effect.
>
> Maybe, but the CI stuff for Phobos doesn't like that, and it actually does reduce compilation times if the imports go directly to the module in question rather than to a module that publicly imports the symbols.
>
> - Jonathan M Davis
>
February 22, 2018
> that doesn't help anyone who's actually reading the documentation and trying to find stuff that way

how about the following fix for that:
having a DDOC token on a package.d to indicate merging the submodules
in the documentation, eg:

```
/// MERGE_SUBMODULES
std/aglorithm/package.d
```

when user browses to https://dlang.org/phobos/std_algorithm.html, he would see the DDOC contents from all direct submodules of std/aglorithm/ right there in that ddoc page.




On Thu, Feb 22, 2018 at 12:04 AM, Timothee Cour <thelastmammoth@gmail.com> wrote:
>> it actually does reduce compilation times if the imports go directly to the module in
> question rather than to a module that publicly imports the symbols
>
> time1=compilation time of `import std.algorithm : find;` before split time21=compilation time of `import std.algorithm : find;` after split time22=compilation time of `import std.algorithm.searching : find;` after split
>
> my understand is that we have:
> time22 < time1 but time21 ~= time1
> so we're in no way worse than before the split
> unless time21 > time1 (noticably) in which case you have a strong argument
>
>
> On Wed, Feb 21, 2018 at 11:57 PM, Jonathan M Davis via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> On Wednesday, February 21, 2018 23:48:32 Timothee Cour via Digitalmars-d wrote:
>>> ```
>>> import std.algorithm.searching : find;
>>>
>>> not
>>>
>>> import std.algorithm : find;
>>> ```
>>>
>>> that's just a missed opportunity to benefit from the split; we're in no way worse after the split than before the split in that regard. We can just leave it as `import std.algorithm : find;` with no adverse effect.
>>
>> Maybe, but the CI stuff for Phobos doesn't like that, and it actually does reduce compilation times if the imports go directly to the module in question rather than to a module that publicly imports the symbols.
>>
>> - Jonathan M Davis
>>
February 22, 2018
On Thursday, 22 February 2018 at 08:04:19 UTC, Timothee Cour wrote:
>> it actually does reduce compilation times if the imports go directly to the module in
> question rather than to a module that publicly imports the symbols
>
> time1=compilation time of `import std.algorithm : find;` before split time21=compilation time of `import std.algorithm : find;` after split time22=compilation time of `import std.algorithm.searching : find;` after split
>
> my understand is that we have:
> time22 < time1 but time21 ~= time1
> so we're in no way worse than before the split
> unless time21 > time1 (noticably) in which case you have a strong argument
>
>
> On Wed, Feb 21, 2018 at 11:57 PM, Jonathan M Davis via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> On Wednesday, February 21, 2018 23:48:32 Timothee Cour via Digitalmars-d wrote:
>>> ```
>>> import std.algorithm.searching : find;
>>>
>>> not
>>>
>>> import std.algorithm : find;
>>> ```
>>>
>>> that's just a missed opportunity to benefit from the split; we're in no way worse after the split than before the split in that regard. We can just leave it as `import std.algorithm : find;` with no adverse effect.
>>
>> Maybe, but the CI stuff for Phobos doesn't like that, and it actually does reduce compilation times if the imports go directly to the module in question rather than to a module that publicly imports the symbols.
>>
>> - Jonathan M Davis

Regarding import time:

Apart from the fact that you usually end up importing almost entire Phobos anyways, the import cost for importing entire Phobos is about 0.2s on my system with the upcoming 2.079 and we are working on getting it even further towards 0 which shouldn't be too hard (there's still a lot of unnecessary work being done by default).

I'm really looking forward to the point where we can just tell people to use `import std;` because there's almost no overhead.
(I am aware that there are downsides to importing everything but that's not what this discussion is about)