phobos src level stats
Sep 22, 2020
Bruce Carneal
September 22, 2020
Below you'll find category line percentages and total line counts for the 20 biggest files in phobos.

The "line" counts following the file names are of the dscanner/libdparse variety rather than the 'wc' variety.

On a 2.4 GHz Zen 1, libdparse managed all of phobos in a little under 1.5 seconds.  For comparison, note that all files were read in a little under 10 milliseconds (from file cache).

I really enjoyed my first interaction with libdparse, but I'm guessing that the maintainers there strongly favor clarity/correctness over speed.  One other note: compiling with dub's --combined option cut the execution time of the ldc2/release exe by about 2X.


total bytes 10918340
empty 18%   comments  9%   docs 17%   utst 32%   src 24% range/package.d, 9610
empty 18%   comments  3%   docs 12%   utst 54%   src 13% datetime/systime.d, 9351
empty 18%   comments  3%   docs 16%   utst 52%   src 11% datetime/date.d, 8496
empty 12%   comments 10%   docs 19%   utst 18%   src 41% uni/package.d, 8335
empty 21%   comments  3%   docs 34%   utst 37%   src  6% datetime/interval.d, 8215
empty 13%   comments 13%   docs 17%   utst 24%   src 35% math.d, 7679
empty 16%   comments  8%   docs 10%   utst 36%   src 30% format.d, 6920
empty 16%   comments  7%   docs 16%   utst 42%   src 19% traits.d, 6868
empty 17%   comments 11%   docs 18%   utst 35%   src 20% typecons.d, 6784
empty 16%   comments  4%   docs 19%   utst 32%   src 29% string.d, 5442
empty 19%   comments  9%   docs 14%   utst 35%   src 23% algorithm/iteration.d, 5235
empty 16%   comments 11%   docs 11%   utst 39%   src 23% conv.d, 4939
empty 16%   comments  7%   docs 25%   utst 22%   src 30% stdio.d, 4396
empty 14%   comments  6%   docs 43%   utst  6%   src 31% net/curl.d, 4167
empty 16%   comments  8%   docs 21%   utst 29%   src 26% algorithm/searching.d, 4074
empty 14%   comments  9%   docs 18%   utst 22%   src 36% algorithm/sorting.d, 4073
empty 20%   comments  4%   docs 23%   utst 25%   src 28% file.d, 4071
empty 17%   comments  8%   docs 16%   utst 38%   src 22% array.d, 3601
empty 20%   comments  9%   docs 29%   utst 12%   src 30% parallelism.d, 3594
empty 20%   comments  4%   docs 14%   utst 43%   src 19% bitmanip.d, 3457

September 22, 2020
On Tuesday, 22 September 2020 at 20:53:17 UTC, Bruce Carneal wrote:
> Below you'll find category line percentages and total line counts for the 20 biggest files in phobos.
>
> The "line" counts following the file names are of the dscanner/libdparse variety rather than the 'wc' variety.
>
> On a 2.4 GHz Zen 1, libdparse managed all of phobos in a little under 1.5 seconds.  For comparison, note that all files were read in a little under 10 milliseconds (from file cache).
>
> I really enjoyed my first interaction with libdparse, but I'm guessing that the maintainers there strongly favor clarity/correctness over speed.  One other note: compiling with dub's --combined option cut the execution time of the ldc2/release exe by about 2X.
>

The empty line numbers seem a little high to me.  I may have a bug in the code for that:

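import std.string : lineSplitter;

// Counts lines that are empty or contain only spaces and tabs.
ulong countEmptyLines(string rawText) @nogc nothrow pure @safe
{
    ulong empties;
    lineLoop: foreach (line; lineSplitter(rawText))
    {
        // Any character other than a space or tab makes the line non-empty.
        foreach_reverse (ch; line)
            if (ch != ' ' && ch != '\t')
                continue lineLoop;
        ++empties;
    }
    return empties;
}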

September 22, 2020
On Tuesday, 22 September 2020 at 21:01:17 UTC, Bruce Carneal wrote:
> The empty line numbers seem a little high to me.  I may have a bug in the code for that:
>
> ulong countEmptyLines(string rawText) @nogc nothrow pure @safe
> {
>     ulong empties;
>     lineLoop: foreach (line; lineSplitter(rawText))
>     {
>         foreach_reverse (ch; line)
>             if (ch != ' ' && ch != '\t')
>                 continue lineLoop;
>         ++empties;
>     }
>     return empties;
> }

You can count empty lines using a sliding window of two tokens over the token range.
The difference between the two tokens' positions gives the empty lines. String literals and comments require special processing, but otherwise this is quite straightforward to implement.
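
As a rough illustration of the sliding-window idea, here is a minimal sketch. It assumes each token records the line it starts on and the line it ends on; the `Tok` struct and the function name are hypothetical stand-ins, not libdparse's actual token type. It also assumes comments are present in the token stream, and it only counts gaps between tokens, not blank lines before the first token or after the last one.

```d
// Hypothetical stand-in for a lexer token; NOT libdparse's real Token type.
struct Tok
{
    size_t firstLine; // 1-based line on which the token starts
    size_t lastLine;  // 1-based line on which the token ends
                      // (differs from firstLine for multi-line strings/comments)
}

// Slides a window of two adjacent tokens over the array and sums the
// line gaps between the end of one token and the start of the next.
size_t emptyLinesBetweenTokens(const Tok[] toks) @nogc nothrow pure @safe
{
    size_t empties;
    foreach (i; 1 .. toks.length)
    {
        immutable prevEnd = toks[i - 1].lastLine;
        immutable nextStart = toks[i].firstLine;
        // Tokens on the same or adjacent lines leave no empty line between them.
        if (nextStart > prevEnd + 1)
            empties += nextStart - prevEnd - 1;
    }
    return empties;
}
```

Tracking `lastLine` separately is one way to handle the "special processing" for multi-line string literals and comments: the gap is measured from where the previous token ends, not from where it starts.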
September 23, 2020
On Tuesday, 22 September 2020 at 23:22:42 UTC, DlangUser38 wrote:
> On Tuesday, 22 September 2020 at 21:01:17 UTC, Bruce Carneal wrote:
>> The empty line numbers seem a little high to me.  I may have a bug in the code for that:
>>
>> ulong countEmptyLines(string rawText) @nogc nothrow pure @safe
>> {
>>     ulong empties;
>>     lineLoop: foreach (line; lineSplitter(rawText))
>>     {
>>         foreach_reverse (ch; line)
>>             if (ch != ' ' && ch != '\t')
>>                 continue lineLoop;
>>         ++empties;
>>     }
>>     return empties;
>> }
>
> You can count empty lines using a sliding window of two tokens over the token range.
> The difference between the two tokens' positions gives the empty lines. String literals and comments require special processing, but otherwise this is quite straightforward to implement.

So, a way to stay in "token space" then.  Don't see a problem with the above but will note for future apps that I need not drop back to raw text.

September 23, 2020
On Wednesday, 23 September 2020 at 00:48:16 UTC, Bruce Carneal wrote:
> On Tuesday, 22 September 2020 at 23:22:42 UTC, DlangUser38 wrote:
>> On Tuesday, 22 September 2020 at 21:01:17 UTC, Bruce Carneal wrote:
>>> [...]
>>
>> You can count empty lines using a sliding window of two tokens over the token range.
>> The difference between the two tokens' positions gives the empty lines. String literals and comments require special processing, but otherwise this is quite straightforward to implement.
>
> So, a way to stay in "token space" then.  Don't see a problem with the above but will note for future apps that I need not drop back to raw text.

Ah, the problem would be empty lines within docs or comments that are counted when they've already been accounted for in the 'docs' and 'comments' sections.  I'll revise the code.
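
One possible shape for such a revision, as a sketch only: keep the raw-text scan but skip blank lines that fall inside a comment or doc block. The `LineSpan` struct, the `commentSpans` parameter, and the function name are assumptions made for illustration; how the spans are obtained from the lexer is left open here.

```d
import std.string : lineSplitter;

// Hypothetical: 1-based, inclusive line range covered by one comment or doc block.
struct LineSpan
{
    size_t first;
    size_t last;
}

// Counts blank lines (empty, or only spaces/tabs) that are NOT inside
// any of the given comment/doc spans.
ulong countEmptyLinesOutside(string rawText, const LineSpan[] commentSpans)
    @nogc nothrow pure @safe
{
    ulong empties;
    size_t lineNo;
    lineLoop: foreach (line; lineSplitter(rawText))
    {
        ++lineNo;
        // Any character other than a space or tab makes the line non-blank.
        foreach (char ch; line)
            if (ch != ' ' && ch != '\t')
                continue lineLoop;
        // Blank lines inside a comment/doc block are already counted there.
        foreach (span; commentSpans)
            if (span.first <= lineNo && lineNo <= span.last)
                continue lineLoop;
        ++empties;
    }
    return empties;
}
```

The linear scan over `commentSpans` keeps the sketch short; with sorted spans, a single advancing index would avoid rescanning for every blank line.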