Thread overview
DustMite: the General-Purpose Data Reduction Tool (from the D Blog)
Apr 13, 2020
Mike Parker
Apr 13, 2020
Mike Parker
Apr 13, 2020
Andrej Mitrovic
Apr 13, 2020
Vladimir Panteleev
Apr 14, 2020
WebFreak001
Apr 15, 2020
Vladimir Panteleev
Apr 15, 2020
Walter Bright
April 13, 2020
Vladimir has contributed to the blog an article on the evolution of DustMite, looking at some of the challenges he had to overcome along the way.

The blog:
https://dlang.org/blog/2020/04/13/dustmite-the-general-purpose-data-reduction-tool/

Reddit:
https://www.reddit.com/r/programming/comments/g0ihse/dustmite_the_generalpurpose_data_reduction_tool/


April 13, 2020
On Monday, 13 April 2020 at 13:06:30 UTC, Mike Parker wrote:
> Vladimir has contributed to the blog an article on the evolution of DustMite, looking at some of the challenges he had to overcome along the way.
>
> The blog:
> https://dlang.org/blog/2020/04/13/dustmite-the-general-purpose-data-reduction-tool/
>
> Reddit:
> https://www.reddit.com/r/programming/comments/g0ihse/dustmite_the_generalpurpose_data_reduction_tool/

HN:
https://news.ycombinator.com/item?id=22855633
April 13, 2020
On Monday, 13 April 2020 at 13:06:30 UTC, Mike Parker wrote:
> Vladimir has contributed to the blog an article on the evolution of DustMite, looking at some of the challenges he had to overcome along the way.
>
> The blog:
> https://dlang.org/blog/2020/04/13/dustmite-the-general-purpose-data-reduction-tool/
>
> Reddit:
> https://www.reddit.com/r/programming/comments/g0ihse/dustmite_the_generalpurpose_data_reduction_tool/

I wish I had even half (or a quarter?) of Vladimir's ingenuity. Really great write-up. DustMite has helped me reduce many compiler bugs in the past and saved me a lot of time.

Thanks so much for writing DustMite, and for implementing the feature I recently requested!
April 13, 2020
On 4/13/20 9:06 AM, Mike Parker wrote:
> Vladimir has contributed to the blog an article on the evolution of DustMite, looking at some of the challenges he had to overcome along the way.
>
> The blog:
> https://dlang.org/blog/2020/04/13/dustmite-the-general-purpose-data-reduction-tool/
>
> Reddit:
> https://www.reddit.com/r/programming/comments/g0ihse/dustmite_the_generalpurpose_data_reduction_tool/

Very nice article! I think everyone can be inspired by the tools that Vladimir writes.

Interesting from the animation that it decided that importing std.stdio can be "reduced" to importing std!

I see that you can prevent reductions via regex. How do you say "Don't reduce `std\..*` to `std`" or is that possible? In other words, I'm fine with reducing imports, but not that specific reduction.

-Steve
April 13, 2020
On Monday, 13 April 2020 at 18:53:39 UTC, Steven Schveighoffer wrote:
> Very nice article!

Thank you!

> Interesting from the animation that it decided that importing std.stdio can be "reduced" to importing std!

Yes, it's a new minor annoyance for all DustMite users :)

> I see that you can prevent reductions via regex.

Regex and similar rules are applied at input parsing time, not on the emitted output.

> How do you say "Don't reduce `std\..*` to `std`" or is that possible? In other words, I'm fine with reducing imports, but not that specific reduction.

The canonical way, right now, is to add something like `if grep -q 'import .*std[;,]' ; then exit 1 ; fi` to the test script. To make this test reusable, it can be saved to e.g. "dustmite-no-std" and DustMite invoked with `dustmite src "dustmite-no-std && ../actual-test-script.sh"`.
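
For illustration, here is a minimal sketch of what such a "dustmite-no-std" helper might look like. The function name `no_bare_std_import` is made up for this example, and it assumes the test is run with the tree being reduced as the current directory and that your grep supports `-r` (GNU and BSD grep do). The pattern is the one from the command above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if no file under the given directory
# (default: the current one) contains a bare `import std;` / `import std,`.
no_bare_std_import() {
    dir="${1:-.}"
    # -r: search the directory recursively, -q: no output, status only.
    # `import std.stdio;` does not match, because `std` must be followed
    # directly by `;` or `,`.
    if grep -rq 'import .*std[;,]' "$dir"; then
        return 1   # reject this reduction
    fi
    return 0       # nothing objectionable found
}
```

Saved as `dustmite-no-std` with a final line calling `no_bare_std_import .`, it can be chained in front of the real test exactly as in the `dustmite src "dustmite-no-std && ../actual-test-script.sh"` invocation above.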

I don't know if it's worth it, but to make this common annoyance easier to handle without baking in more highly-D-specific stuff into a tool which aims to be general-purpose, I'm thinking of the following additions:

1. Allow more than one test command. A reduction is considered successful only if all test commands pass. (It would be the equivalent of chaining them with && in a shell command.)
2. Add built-in tests which can be used in place of a test command, such as ":d-no-std".
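
Until something like point 1 exists, the "all test commands must pass" behaviour can be approximated with a small wrapper. This is only a sketch of the idea, not a DustMite feature; `run_all_tests` is a hypothetical name, and each argument is assumed to be a complete shell command.

```shell
#!/bin/sh
# Hypothetical wrapper: run every command given as an argument and succeed
# only if all of them succeed (same effect as chaining them with &&).
run_all_tests() {
    for cmd in "$@"; do
        # Stop at the first failing command, like && would.
        sh -c "$cmd" || return 1
    done
    return 0
}
```

For example, `run_all_tests "dustmite-no-std" "../actual-test-script.sh"` would behave like `dustmite-no-std && ../actual-test-script.sh`.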

April 14, 2020
On Monday, 13 April 2020 at 13:06:30 UTC, Mike Parker wrote:
> Vladimir has contributed to the blog an article on the evolution of DustMite, looking at some of the challenges he had to overcome along the way.
>
> The blog:
> https://dlang.org/blog/2020/04/13/dustmite-the-general-purpose-data-reduction-tool/
>
> Reddit:
> https://www.reddit.com/r/programming/comments/g0ihse/dustmite_the_generalpurpose_data_reduction_tool/

Very nice article! Before, I was never sure what I could even use DustMite for and rarely used it, but seeing all these use cases gives me a lot of ideas for how I could use it.

I really like all the diagrams and animations in the blog post too; they make it a lot more intuitive to grasp what was being done. Though I think some diagrams could have used a few more labels explaining what the colors, shapes and numbers mean.

Also for the performance changes: what do the numbers mean in the diagram there? Is higher better? What exactly is the unit of these numbers? Should I even read it from top to bottom or from bottom to top like usual git logs? Why did it jump from 487 to 200 and is that good or bad?
April 15, 2020
On Tuesday, 14 April 2020 at 07:03:42 UTC, WebFreak001 wrote:
> very nice article!

Thank you!

> Also for the performance changes: what do the numbers mean in the diagram there? Is higher better? What exactly is the unit of these numbers? Should I even read it from top to bottom or from bottom to top like usual git logs? Why did it jump from 487 to 200 and is that good or bad?

Yes, sorry, I was going to add some explanations but forgot. The entries are in chronological order, and going from 200 to 487 was bad (the new tree data structure, which was not yet taken advantage of, added a lot of overhead). The numbers are the seconds taken to perform 200 reduction steps on my test program.

The source code for all programs/scripts used to create the graphics can be found here:

https://gitlab.com/CyberShadow/dustmite-article
April 15, 2020
Please do an AMA on the Reddit article!