Thread overview
language feature usage statistics
Oct 18, 2019
aliak
Oct 18, 2019
Les De Ridder
Oct 18, 2019
Adam D. Ruppe
Oct 18, 2019
Paolo Invernizzi
Oct 18, 2019
Dennis
Oct 20, 2019
aliak
Oct 20, 2019
Dennis
Oct 20, 2019
matheus
Oct 21, 2019
aliak
October 18, 2019
So this is something I've been wondering about for a while. And I don't believe I've seen any compiler with this, but have people ever thought about putting telemetry in compilers? How do people feel about having feature usage stats from dmd?

I can understand that networking from a compiler just sounds bad, but there are other ways around it, e.g. write a file instead and ask the dev to email it, or ask for permission before turning it on and sending it, or only do it in debug mode. I dunno, just spitballing here.
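To make the "write a file instead, and only with permission" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: dmd has no such environment variable, and the names `DMD_USAGE_STATS`, `statsEnabled`, `renderReport`, `saveReport` and the output file name are all made up.

```d
import std.file : write;
import std.json : JSONValue;
import std.process : environment;

// Hypothetical opt-in variable; dmd does not read anything like this today.
enum optInVariable = "DMD_USAGE_STATS";

// Stats are collected only if the user explicitly set the variable to "1".
bool statsEnabled()
{
    return environment.get(optInVariable, "0") == "1";
}

// Turn per-feature counters (feature name -> use count) into JSON text.
string renderReport(size_t[string] counts)
{
    return JSONValue(counts).toString();
}

// Write the report to a local file; nothing is sent over the network.
void saveReport(size_t[string] counts, string path = "dmd-usage-stats.json")
{
    if (statsEnabled())
        write(path, renderReport(counts));
}
```

The point of the split is that the report is just a local file the user can read before deciding whether to share it.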

But, having actual usage statistics will take away so many assumptions people have about how features are used, how often they're used, which features are not used, etc (of course if there're no statistics on a feature it doesn't mean it's never used). Data like this is very actionable - and is how any (probably non-enterprise) product is built these days (even vscode for example has an option to send usage stats).

Crash reporting is another thing. When the compiler crashes, that can be sent somewhere (again, with user permission).

Things that can be answered:
* which feature is not used and can be cut
* which feature is used the most and should be enhanced, fixed, polished
* which combinations of features are used together => can they be unified?

The next time someone says they don't think lazy is useful, we can point to actual data.

And then for example, from the features that are hardly used, we can start asking why they are not used. If we know why then future features that may contain the same base assumptions that led to the creation of the unused features can be avoided.

Figuring out why the features are unused, or hardly used, can also better enable us to make the feature usable.

These kinds of stats can also be collected on any symbols that are loaded from std, for example, and then we can also get a feel for which functions and modules are used from Phobos.

Anyway, I'm not sure about others, but if it'd make D a better language than the competition, I'd gladly trust dmd to send stats to a place the D Language Foundation controls.

Cheers
- ali



October 18, 2019
On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
> So this is something I've been wondering about for a while. And I don't believe I've seen any compiler with this, but have people ever thought about putting telemetry in compilers? How do people feel about having feature usage stats from dmd?
>
> [...]

I vaguely remember there being a tool that generated such statistics
from the source code of packages registered on code.dlang.org, but I
might be mistaken.

October 18, 2019
I've actually considered doing this with dpldocs.info before. It would only hit public code but... I have copies of basically the whole dub repo and code that is already custom parsing it so I could possibly pull info like this when it does its updates.

Though my parser doesn't always keep up with new features (I often skip function bodies since it isn't super important for documentation purposes) it still mostly works.
October 18, 2019
On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
> So this is something I've been wondering about for a while. And I don't believe I've seen any compiler with this, but have people ever thought about putting telemetry in compilers? How do people feel about having feature usage stats from dmd?
>
> [...]

Hear, hear!

+1

October 18, 2019
On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
> But, having actual usage statistics will take away so many assumptions people have about how features are used, how often they're used, which features are not used, etc

Totally. I've started toying with this by cloning all packages on Dub and running libdparse over them. Turns out a shallow clone takes only ~4 GB total, and a deep clone ~7 GB I believe. I've already used it a few times to support my cases.

I made this: https://gist.github.com/dkorpel/10cc13d0740c50a8aab30588f392950f
For this: https://github.com/dlang/DIPs/blob/9ca12cc89dadc10f2abfb8a98bf4d52ed8679c2a/DIPs/DIP1NNN-DK.md

I made this: https://gist.github.com/dkorpel/df2c2f567588bb8ee59e293146e52723
For this: https://github.com/dlang/dmd/pull/10236

These were bodged together, but I plan to make something more general and polished once I allocate some time for it. Building telemetry options in DMD is something I don't plan to do, but if someone else champions that I'd be in favor!
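
For a rough idea of what a usage count over cloned sources could look like, here is a minimal sketch. It is not the script from the gists above and it does not use libdparse: it is a much cruder regex-based word count, so it also counts hits inside comments and string literals. The function names `countKeyword` and `countKeywordInTree` are made up for this example.

```d
import std.file : SpanMode, dirEntries, readText;
import std.regex : matchAll, regex;
import std.range : walkLength;

// Count whole-word occurrences of `keyword` in D source text.
// Naive: also counts hits inside comments and string literals.
size_t countKeyword(string source, string keyword)
{
    auto wholeWord = regex(`\b` ~ keyword ~ `\b`);
    return source.matchAll(wholeWord).walkLength;
}

// Sum the count over every .d file below `root`.
size_t countKeywordInTree(string root, string keyword)
{
    size_t total;
    foreach (entry; dirEntries(root, "*.d", SpanMode.depth))
        total += countKeyword(readText(entry.name), keyword);
    return total;
}
```

For example, `countKeyword("void f(lazy int x) { lazyInit(); }", "lazy")` gives 1, because `lazyInit` is not a whole-word match. A real parser would be needed to tell a `lazy` parameter apart from the word appearing in a comment.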

October 20, 2019
On Friday, 18 October 2019 at 20:53:50 UTC, Dennis wrote:
> On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
>> [...]
>
> Totally. I've started toying with this by cloning all packages on Dub and running libdparse over them. Turns out a shallow clone takes only ~4 GB total, and a deep clone ~7 GB I believe. I've already used it a few times to support my cases.
>
> I made this: https://gist.github.com/dkorpel/10cc13d0740c50a8aab30588f392950f
> For this: https://github.com/dlang/DIPs/blob/9ca12cc89dadc10f2abfb8a98bf4d52ed8679c2a/DIPs/DIP1NNN-DK.md
>
> I made this: https://gist.github.com/dkorpel/df2c2f567588bb8ee59e293146e52723
> For this: https://github.com/dlang/dmd/pull/10236
>
> These were bodged together, but I plan to make something more general and polished once I allocate some time for it. Building telemetry options in DMD is something I don't plan to do, but if someone else champions that I'd be in favor!

That is great! Which APIs did you use to get all the D project links? Does dub provide something? And I'm curious, were you rate limited by GitHub? (I'm assuming this was the work of a for loop.)
October 20, 2019
On Sunday, 20 October 2019 at 21:29:24 UTC, aliak wrote:
> Which APIs did you use to get all d project links? Does dub provide something?

There might be an API, but I simply parsed the HTML pages.
First I get the identifiers of all packages:
```
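import std.net.curl : get;
import std.regex : matchAll, regex;

// Collect the identifiers of all packages listed on the registry front page.
string[] getPackageNames() {
    string page = get("http://code.dlang.org/?sort=added&category=&skip=0&limit=2000").idup;
    string[] result;
    foreach (m; page.matchAll(regex(`packages/([a-zA-Z0-9_-]+)`)))
        result ~= m[1];
    return result;
}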
```

Then I parse the package pages for the repository link with htmld:

```
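import std.conv : text;
import std.net.curl : get;
import std.regex : matchFirst;
import html; // http://code.dlang.org/packages/htmld

// Returns the repository URL from a package page, or null if none is found.
string getRepo(string packageName) {
    string page = get("http://code.dlang.org/packages/"~packageName).idup;
    auto doc = createDocument(page);
    if (auto p = doc.querySelector("#repository")) {
        if (auto m = p.html.matchFirst(`href="([^"]+)`)) {
            return m[1].text;
        }
    }
    return null; // no repository link found
}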
```

> And curious, were you rate limited by github (i'm assuming this was the work of a for loop?).

I wouldn't have been surprised if I got a timeout for cloning 1600 repositories in succession, but I didn't. (I suppose the same happens when installing your average NPM package, lol)
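
For completeness, the cloning step could look roughly like the sketch below. This is an assumption about how such a loop might be written, not the script that was actually used; `shallowCloneArgs` and `cloneAll` are made-up names, and it simply shells out to `git clone --depth 1` for each URL in turn.

```d
import std.path : baseName, buildPath;
import std.process : execute;
import std.stdio : writeln;

// Build the argument list for a shallow clone (latest commit only).
string[] shallowCloneArgs(string repoUrl, string destDir)
{
    return ["git", "clone", "--depth", "1", repoUrl, destDir];
}

// Clone each repository into its own folder under `root`, one after another.
void cloneAll(string[] repoUrls, string root)
{
    foreach (url; repoUrls)
    {
        // Two repositories with the same last path segment would collide here.
        auto dest = buildPath(root, baseName(url));
        auto result = execute(shallowCloneArgs(url, dest));
        if (result.status != 0)
            writeln("clone failed for ", url, ": ", result.output);
    }
}
```

Dropping the `--depth 1` arguments would give the deep clone instead.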
October 20, 2019
On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
> ...
> I don't believe I've seen any compiler with this, but have people ever thought about putting telemetry in compilers?
> ...

I'm pretty sure Visual Studio Code does this: https://code.visualstudio.com/docs/getstarted/telemetry

Matheus.


October 21, 2019
On Sunday, 20 October 2019 at 22:08:46 UTC, matheus wrote:
> On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
>> ...
>> I don't believe I've seen any compiler with this, but have people ever thought about putting telemetry in compilers?
>> ...
>
> I'm pretty sure Visual Studio Code does this: https://code.visualstudio.com/docs/getstarted/telemetry
>
> Matheus.

Aye, VSCode does this (I actually mentioned it in my original post):

On Friday, 18 October 2019 at 19:53:46 UTC, aliak wrote:
>
> (probably non-enterprise) product is built these days (even vscode for example has an option to send usage stats).
>