Thread overview | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
February 22, 2018 Re: what are guidelines for when to split a module into a package? | ||||
---|---|---|---|---|
| ||||
On Wednesday, February 21, 2018 23:13:33 Timothee Cour via Digitalmars-d wrote: > from my perspective it makes sense to split a module M into submodules > A, B when: > * M is large > * there's little interaction between A and B (eg only few symbols from > A are needed in B and vice versa) > * A and B are logically grouped (that is domain specific) > * it doesn't turn into an extreme (1 function per module) > > Advantages of splitting: > * easier to review > * easier to edit (no need to scroll much to see entirety of module > we're editing) > * less pollution from top-level imports as they only affect submodule > (likewise with top-level attributes etc) > * more modular > * doesn't affect existing code since `import M` will continue to work > after M is split into a package > * less memory when using separate compilation > * allows fine-grained compiler options (eg we can compile B with `-O` if > needed) * allows to run unittests just for A instead of M > * allows selective import in client to avoid pulling in too many > dependencies (see arguments that were made for std.range.primitives) > > Disadvantages of splitting: > * more files; but not sure why that's a problem so long we don't go > into extremes (eg 1 function per module or other things of bad taste) > > --- > while working on https://github.com/dlang/phobos/pull/6178 I had > initially split M:std.array into submodules: > A:std.array.util (the old std.array) and B:std.array.static_array > (everything added in the PR) > IMO this made sense according to my above criteria (in this case there > was 0 interaction between A and B), but the reviewers disagreed with > the split. > > So, what are the guidelines? It's decided on a case-by-case basis but is generally only done if the module is quite large. std.array is not particularly large. It's less than 4000 lines, including unit tests and documentation, and it only has 18 top-level symbols. Also, remember that within Phobos, imports are supposed to be as localized as possible - both in terms of where the import is placed and in terms of selective imports - e.g. it would be import std.algorithm.searching : find; not import std.algorithm : find; which means that splitting the module then requires that all of those imports be even more specific. User code can choose to do that or not, but it does make having modules split up further that much more tedious. Related to that is the fact that anyone searching for these symbols now has more modules to search through. So, finding symbols will be harder. Take std.algorithm for instance. It was split, because it was getting large enough that compiling it on machines without large amounts of memory resulted in the compiler running out of memory. So, there was a very good argument for splitting it. However, now, even if you know that a symbol is in std.algorithm, do you know where in std.algorithm it is? Some are obvious - e.g. sort is in std.algorithm.sorting. However, others are not so obviously - e.g. where does startsWith live? Arguably, it could go in either std.algorithm.comparison or std.algorithm.searching. It turns out that it's in std.algorithm.searching, but I generally have to look it up. And where to functions like map or filter live? std.algorithm.mutation? std.algorithm.iteration? It's not necessarily obvious at all. >From the perspective of users trying to find stuff, splitting modules up comes at a real cost, and I honestly don't understand why some folks are in a hurry to make module really small. That means more import statements when using those modules, and it means that it's harder to find symbols. Personally, I think that we should be very slow to consider splitting modules and only do so when it's clear that there's a real need, and std.array is nowhere near that level. - Jonathan M Davis |
February 21, 2018 Re: what are guidelines for when to split a module into a package? | ||||
---|---|---|---|---|
| ||||
> it's harder to find symbols i don't understand this argument. ``` dscanner --declaration startsWith ./std/algorithm/searching.d(4105:6) ./std/algorithm/searching.d(4195:6) ./std/algorithm/searching.d(4265:6) ./std/algorithm/searching.d(4301:6) ``` On Wed, Feb 21, 2018 at 11:31 PM, Jonathan M Davis via Digitalmars-d <digitalmars-d@puremagic.com> wrote: > On Wednesday, February 21, 2018 23:13:33 Timothee Cour via Digitalmars-d wrote: >> from my perspective it makes sense to split a module M into submodules >> A, B when: >> * M is large >> * there's little interaction between A and B (eg only few symbols from >> A are needed in B and vice versa) >> * A and B are logically grouped (that is domain specific) >> * it doesn't turn into an extreme (1 function per module) >> >> Advantages of splitting: >> * easier to review >> * easier to edit (no need to scroll much to see entirety of module >> we're editing) >> * less pollution from top-level imports as they only affect submodule >> (likewise with top-level attributes etc) >> * more modular >> * doesn't affect existing code since `import M` will continue to work >> after M is split into a package >> * less memory when using separate compilation >> * allows fine-grained compiler options (eg we can compile B with `-O` if >> needed) * allows to run unittests just for A instead of M >> * allows selective import in client to avoid pulling in too many >> dependencies (see arguments that were made for std.range.primitives) >> >> Disadvantages of splitting: >> * more files; but not sure why that's a problem so long we don't go >> into extremes (eg 1 function per module or other things of bad taste) >> >> --- >> while working on https://github.com/dlang/phobos/pull/6178 I had >> initially split M:std.array into submodules: >> A:std.array.util (the old std.array) and B:std.array.static_array >> (everything added in the PR) >> IMO this made sense according to my above criteria (in this case there >> was 0 interaction between A and B), but the reviewers disagreed with >> the split. >> >> So, what are the guidelines? > > It's decided on a case-by-case basis but is generally only done if the module is quite large. std.array is not particularly large. It's less than 4000 lines, including unit tests and documentation, and it only has 18 top-level symbols. > > Also, remember that within Phobos, imports are supposed to be as localized as possible - both in terms of where the import is placed and in terms of selective imports - e.g. it would be > > import std.algorithm.searching : find; > > not > > import std.algorithm : find; > > which means that splitting the module then requires that all of those imports be even more specific. User code can choose to do that or not, but it does make having modules split up further that much more tedious. Related to that is the fact that anyone searching for these symbols now has more modules to search through. So, finding symbols will be harder. Take std.algorithm for instance. It was split, because it was getting large enough that compiling it on machines without large amounts of memory resulted in the compiler running out of memory. So, there was a very good argument for splitting it. However, now, even if you know that a symbol is in std.algorithm, do you know where in std.algorithm it is? Some are obvious - e.g. sort is in std.algorithm.sorting. However, others are not so obviously - e.g. where does startsWith live? Arguably, it could go in either std.algorithm.comparison or std.algorithm.searching. It turns out that it's in std.algorithm.searching, but I generally have to look it up. And where to functions like map or filter live? std.algorithm.mutation? std.algorithm.iteration? It's not necessarily obvious at all. > > From the perspective of users trying to find stuff, splitting modules up comes at a real cost, and I honestly don't understand why some folks are in a hurry to make module really small. That means more import statements when using those modules, and it means that it's harder to find symbols. > > Personally, I think that we should be very slow to consider splitting modules and only do so when it's clear that there's a real need, and std.array is nowhere near that level. > > - Jonathan M Davis > > |
February 21, 2018 Re: what are guidelines for when to split a module into a package? | ||||
---|---|---|---|---|
| ||||
```
import std.algorithm.searching : find;
not
import std.algorithm : find;
```
that's just a missed opportunity to benefit from the split; we're in no way worse after the split than before the split in that regard. We can just leave it as `import std.algorithm : find;` with no adverse effect.
On Wed, Feb 21, 2018 at 11:44 PM, Timothee Cour <thelastmammoth@gmail.com> wrote:
>> it's harder to find symbols
>
> i don't understand this argument.
>
> ```
> dscanner --declaration startsWith
> ./std/algorithm/searching.d(4105:6)
> ./std/algorithm/searching.d(4195:6)
> ./std/algorithm/searching.d(4265:6)
> ./std/algorithm/searching.d(4301:6)
> ```
>
>
> On Wed, Feb 21, 2018 at 11:31 PM, Jonathan M Davis via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> On Wednesday, February 21, 2018 23:13:33 Timothee Cour via Digitalmars-d wrote:
>>> from my perspective it makes sense to split a module M into submodules
>>> A, B when:
>>> * M is large
>>> * there's little interaction between A and B (eg only few symbols from
>>> A are needed in B and vice versa)
>>> * A and B are logically grouped (that is domain specific)
>>> * it doesn't turn into an extreme (1 function per module)
>>>
>>> Advantages of splitting:
>>> * easier to review
>>> * easier to edit (no need to scroll much to see entirety of module
>>> we're editing)
>>> * less pollution from top-level imports as they only affect submodule
>>> (likewise with top-level attributes etc)
>>> * more modular
>>> * doesn't affect existing code since `import M` will continue to work
>>> after M is split into a package
>>> * less memory when using separate compilation
>>> * allows fine-grained compiler options (eg we can compile B with `-O` if
>>> needed) * allows to run unittests just for A instead of M
>>> * allows selective import in client to avoid pulling in too many
>>> dependencies (see arguments that were made for std.range.primitives)
>>>
>>> Disadvantages of splitting:
>>> * more files; but not sure why that's a problem so long we don't go
>>> into extremes (eg 1 function per module or other things of bad taste)
>>>
>>> ---
>>> while working on https://github.com/dlang/phobos/pull/6178 I had
>>> initially split M:std.array into submodules:
>>> A:std.array.util (the old std.array) and B:std.array.static_array
>>> (everything added in the PR)
>>> IMO this made sense according to my above criteria (in this case there
>>> was 0 interaction between A and B), but the reviewers disagreed with
>>> the split.
>>>
>>> So, what are the guidelines?
>>
>> It's decided on a case-by-case basis but is generally only done if the module is quite large. std.array is not particularly large. It's less than 4000 lines, including unit tests and documentation, and it only has 18 top-level symbols.
>>
>> Also, remember that within Phobos, imports are supposed to be as localized as possible - both in terms of where the import is placed and in terms of selective imports - e.g. it would be
>>
>> import std.algorithm.searching : find;
>>
>> not
>>
>> import std.algorithm : find;
>>
>> which means that splitting the module then requires that all of those imports be even more specific. User code can choose to do that or not, but it does make having modules split up further that much more tedious. Related to that is the fact that anyone searching for these symbols now has more modules to search through. So, finding symbols will be harder. Take std.algorithm for instance. It was split, because it was getting large enough that compiling it on machines without large amounts of memory resulted in the compiler running out of memory. So, there was a very good argument for splitting it. However, now, even if you know that a symbol is in std.algorithm, do you know where in std.algorithm it is? Some are obvious - e.g. sort is in std.algorithm.sorting. However, others are not so obviously - e.g. where does startsWith live? Arguably, it could go in either std.algorithm.comparison or std.algorithm.searching. It turns out that it's in std.algorithm.searching, but I generally have to look it up. And where to functions like map or filter live? std.algorithm.mutation? std.algorithm.iteration? It's not necessarily obvious at all.
>>
>> From the perspective of users trying to find stuff, splitting modules up comes at a real cost, and I honestly don't understand why some folks are in a hurry to make module really small. That means more import statements when using those modules, and it means that it's harder to find symbols.
>>
>> Personally, I think that we should be very slow to consider splitting modules and only do so when it's clear that there's a real need, and std.array is nowhere near that level.
>>
>> - Jonathan M Davis
>>
>>
|
February 22, 2018 Re: what are guidelines for when to split a module into a package? | ||||
---|---|---|---|---|
| ||||
On Wednesday, February 21, 2018 23:48:32 Timothee Cour via Digitalmars-d wrote:
> ```
> import std.algorithm.searching : find;
>
> not
>
> import std.algorithm : find;
> ```
>
> that's just a missed opportunity to benefit from the split; we're in no way worse after the split than before the split in that regard. We can just leave it as `import std.algorithm : find;` with no adverse effect.
Maybe, but the CI stuff for Phobos doesn't like that, and it actually does reduce compilation times if the imports go directly to the module in question rather than to a module that publicly imports the symbols.
- Jonathan M Davis
|
February 22, 2018 Re: what are guidelines for when to split a module into a package? | ||||
---|---|---|---|---|
| ||||
On Wednesday, February 21, 2018 23:44:49 Timothee Cour via Digitalmars-d wrote:
> > it's harder to find symbols
>
> i don't understand this argument.
>
> ```
> dscanner --declaration startsWith
> ./std/algorithm/searching.d(4105:6)
> ./std/algorithm/searching.d(4195:6)
> ./std/algorithm/searching.d(4265:6)
> ./std/algorithm/searching.d(4301:6)
> ```
I have no intention whatsoever of using dscanner. I can do something similar with grep, but either way, that doesn't help anyone who's actually reading the documentation and trying to find stuff that way, which is often what people do.
Regardless, the more split up stuff is, the more places that you have to look for it. We already have plenty of complaints about folks not knowing whether something is in std.algorithm, std.array, std.range, and std.string without getting into issues of sub-modules.
And I really don't understand why splitting up a module like std.array would help anything except maybe import times, but that only matters if the module is large (which std.array definitely isn't) and if you're importing the modules as specifically as possible and not using public imports through something like std.array.package.
- Jonathan M Davis
|
February 22, 2018 Re: what are guidelines for when to split a module into a package? | ||||
---|---|---|---|---|
| ||||
> it actually does reduce compilation times if the imports go directly to the module in question rather than to a module that publicly imports the symbols time1=compilation time of `import std.algorithm : find;` before split time21=compilation time of `import std.algorithm : find;` after split time22=compilation time of `import std.algorithm.searching : find;` after split my understand is that we have: time22 < time1 but time21 ~= time1 so we're in no way worse than before the split unless time21 > time1 (noticably) in which case you have a strong argument On Wed, Feb 21, 2018 at 11:57 PM, Jonathan M Davis via Digitalmars-d <digitalmars-d@puremagic.com> wrote: > On Wednesday, February 21, 2018 23:48:32 Timothee Cour via Digitalmars-d wrote: >> ``` >> import std.algorithm.searching : find; >> >> not >> >> import std.algorithm : find; >> ``` >> >> that's just a missed opportunity to benefit from the split; we're in no way worse after the split than before the split in that regard. We can just leave it as `import std.algorithm : find;` with no adverse effect. > > Maybe, but the CI stuff for Phobos doesn't like that, and it actually does reduce compilation times if the imports go directly to the module in question rather than to a module that publicly imports the symbols. > > - Jonathan M Davis > |
February 22, 2018 Re: what are guidelines for when to split a module into a package? | ||||
---|---|---|---|---|
| ||||
> that doesn't help anyone who's actually reading the documentation and trying to find stuff that way how about the following fix for that: having a DDOC token on a package.d to indicate merging the submodules in the documentation, eg: ``` /// MERGE_SUBMODULES std/aglorithm/package.d ``` when user browses to https://dlang.org/phobos/std_algorithm.html, he would see the DDOC contents from all direct submodules of std/aglorithm/ right there in that ddoc page. On Thu, Feb 22, 2018 at 12:04 AM, Timothee Cour <thelastmammoth@gmail.com> wrote: >> it actually does reduce compilation times if the imports go directly to the module in > question rather than to a module that publicly imports the symbols > > time1=compilation time of `import std.algorithm : find;` before split time21=compilation time of `import std.algorithm : find;` after split time22=compilation time of `import std.algorithm.searching : find;` after split > > my understand is that we have: > time22 < time1 but time21 ~= time1 > so we're in no way worse than before the split > unless time21 > time1 (noticably) in which case you have a strong argument > > > On Wed, Feb 21, 2018 at 11:57 PM, Jonathan M Davis via Digitalmars-d <digitalmars-d@puremagic.com> wrote: >> On Wednesday, February 21, 2018 23:48:32 Timothee Cour via Digitalmars-d wrote: >>> ``` >>> import std.algorithm.searching : find; >>> >>> not >>> >>> import std.algorithm : find; >>> ``` >>> >>> that's just a missed opportunity to benefit from the split; we're in no way worse after the split than before the split in that regard. We can just leave it as `import std.algorithm : find;` with no adverse effect. >> >> Maybe, but the CI stuff for Phobos doesn't like that, and it actually does reduce compilation times if the imports go directly to the module in question rather than to a module that publicly imports the symbols. >> >> - Jonathan M Davis >> |
February 22, 2018 Re: what are guidelines for when to split a module into a package? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Timothee Cour | On Thursday, 22 February 2018 at 08:04:19 UTC, Timothee Cour wrote:
>> it actually does reduce compilation times if the imports go directly to the module in
> question rather than to a module that publicly imports the symbols
>
> time1=compilation time of `import std.algorithm : find;` before split time21=compilation time of `import std.algorithm : find;` after split time22=compilation time of `import std.algorithm.searching : find;` after split
>
> my understand is that we have:
> time22 < time1 but time21 ~= time1
> so we're in no way worse than before the split
> unless time21 > time1 (noticably) in which case you have a strong argument
>
>
> On Wed, Feb 21, 2018 at 11:57 PM, Jonathan M Davis via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> On Wednesday, February 21, 2018 23:48:32 Timothee Cour via Digitalmars-d wrote:
>>> ```
>>> import std.algorithm.searching : find;
>>>
>>> not
>>>
>>> import std.algorithm : find;
>>> ```
>>>
>>> that's just a missed opportunity to benefit from the split; we're in no way worse after the split than before the split in that regard. We can just leave it as `import std.algorithm : find;` with no adverse effect.
>>
>> Maybe, but the CI stuff for Phobos doesn't like that, and it actually does reduce compilation times if the imports go directly to the module in question rather than to a module that publicly imports the symbols.
>>
>> - Jonathan M Davis
Regarding import time:
Apart from the fact that you usually end up importing almost entire Phobos anyways, the import cost for importing entire Phobos is about 0.2s on my system with the upcoming 2.079 and we are working on getting it even further towards 0 which shouldn't be too hard (there's still a lot of unnecessary work being done by default).
I'm really looking forward to the point where we can just tell people to use `import std;` because there's almost no overhead.
(I am aware that there are downsides to importing everything but that's not what this discussion is about)
|
Copyright © 1999-2021 by the D Language Foundation