Thread overview
[improve-it] Parsing NG archive and sorting by post-count
Mar 15, 2011  Andrej Mitrovic
Mar 15, 2011  bearophile
Mar 15, 2011  Andrej Mitrovic
Mar 15, 2011  Andrej Mitrovic
Mar 15, 2011  bearophile
Mar 16, 2011  Andrej Mitrovic
March 15, 2011
I thought about making a kind of code-golf contest (Stack Overflow usually has these), except that the focus would be on improving each other's code.

So here's my idea of the day: parse the newsgroup archive files from http://www.digitalmars.com/NewsGroup.html and, for each .html file, output another .html file that lists the topics sorted by post count. Sure, there is NG software which does this automatically, but this is about doing it in D.

Here's my implementation: https://gist.github.com/871631

Download a few .html files and save them in their own folder. Then copy my script into a .d file in the same folder and run it with RDMD. It writes the files to an `output` subfolder. I've only tested it on Windows.

There are a few things I've noticed. Using a simple hash with the post count as the key type wouldn't work: many topics have the same post count, and AAs can't hold duplicate keys. So I worked around this by making a wrapper which hides all the details of storing duplicates and traversal. I've called it `CommonAA`.

I've also implemented an `allSatisfy` function which works on runtime arguments. There's a similar function in std.typetuple, but it's only useful for compile-time arguments. There's probably a similar method someplace in std.algorithm, but I was too lazy to check. I thought it would be nice to have.
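
For illustration, a runtime `allSatisfy` along these lines might look like the sketch below. The name `allSatisfyRT` and its signature are assumptions, not the code from the gist. Newer Phobos releases also have `all` in std.algorithm, which covers the case where the values are already in a range.

```d
import std.algorithm : all;
import std.functional : unaryFun;

// Hypothetical runtime counterpart to std.typetuple's allSatisfy:
// returns true when pred holds for every argument passed at run time.
bool allSatisfyRT(alias pred, T...)(T args)
{
    // unaryFun accepts both string predicates ("a > 0") and lambdas.
    alias check = unaryFun!pred;
    foreach (arg; args)
        if (!check(arg))
            return false;
    return true;
}

void main()
{
    assert(allSatisfyRT!"a > 0"(1, 2, 3));
    assert(!allSatisfyRT!(x => x > 0)(1, -2, 3));

    // For values stored in an array, std.algorithm's all does the same job.
    assert([1, 2, 3].all!(x => x > 0));
}
```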

I can see some ways to improve this. For one, I could have used Regex instead of indexOf. I could also have tried to avoid the wrapper, but I haven't figured out how to do that while allowing duplicate keys and sorting them with each key still linked to its values.
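
One possible way to drop the wrapper, sketched here with a hypothetical `Topic` struct and `byPostCount` helper (neither is in the gist): keep each title and its post count together in one array element and sort the array with std.algorithm's `sort`. Duplicate post counts are then no problem, because nothing is used as a unique key.

```d
import std.algorithm : sort;
import std.stdio : writeln;

// Hypothetical record type; the gist keeps this data inside CommonAA instead.
struct Topic
{
    string title;
    int postCount;
}

// Sorts in place, highest post count first, and returns the same array.
// The relative order of topics with equal counts is not guaranteed.
Topic[] byPostCount(Topic[] topics)
{
    sort!((a, b) => a.postCount > b.postCount)(topics);
    return topics;
}

void main()
{
    auto topics = [Topic("Topic A", 3), Topic("Topic B", 12), Topic("Topic C", 3)];
    foreach (t; byPostCount(topics))
        writeln(t.postCount, " ", t.title);
}
```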

Anywho, let's see you improve my code! It's just for fun and maybe we'll learn some tricks from one another. Have fun!
March 15, 2011
Andrej Mitrovic:

> I've also implemented an `allSatisfy` function which works on runtime arguments. There's a similar function in std.typetuple, but it's only useful for compile-time arguments. There's probably a similar method someplace in std.algorithm, but I was too lazy to check. I thought it would be nice to have.

http://d.puremagic.com/issues/show_bug.cgi?id=4405


> Anywho, let's see you improve my code! It's just for fun and maybe we'll learn some tricks from one another. Have fun!

I suggest you add unit tests and contracts to your CommonAA() and allSatisfy() :-)

Have you tried replacing this:

        if (key in payload)
        {
            payload[key] ~= val;
        }
        else
        {
            payload[key] = [val];
        }

With just:

        payload[key] ~= val;
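
A small check of that one-liner, assuming `payload` is an AA from post count to an array of titles (the gist's actual types may differ, and `add` is a hypothetical helper): appending through a key that is not present yet creates the array first, so the explicit `in` test is not needed.

```d
// Hypothetical helper mirroring the one-line form above.
// The AA is taken by ref so that inserts into an empty AA reach the caller.
void add(ref string[][int] payload, int key, string val)
{
    payload[key] ~= val;
}

void main()
{
    string[][int] payload;
    add(payload, 12, "Topic A"); // key 12 is absent: an empty array is created, then appended to
    add(payload, 12, "Topic C"); // key 12 exists: plain append
    add(payload, 3, "Topic B");

    assert(payload[12] == ["Topic A", "Topic C"]);
    assert(payload[3] == ["Topic B"]);
}
```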


I suggest replacing this:
sortedKeys.sort;

With:
sortedKeys.sort();

Bye,
bearophile
March 15, 2011
On 3/15/11, bearophile <bearophileHUGS@lycos.com> wrote:
> Andrej Mitrovic:
>
>> I've also implemented an `allSatisfy` function which works on runtime arguments. There's a similar function in std.typetuple, but it's only useful for compile-time arguments. There's probably a similar method someplace in std.algorithm, but I was too lazy to check. I thought it would be nice to have.
>
> http://d.puremagic.com/issues/show_bug.cgi?id=4405

Cool, I was afraid I was reinventing the wheel.

> I suggest you add unit tests and contracts to your CommonAA() and
> allSatisfy() :-)

allSatisfy definitely doesn't work for a bunch of cases, like passing a delegate instead of a literal. And CommonAA doesn't take into account things like removing elements, etc. It's definitely a half-ass implementation. :p

>
> Have you tried replacing this:
>
>         if (key in payload)
>         {
>             payload[key] ~= val;
>         }
>         else
>         {
>             payload[key] = [val];
>         }
>
> With just:
>
>         payload[key] ~= val;
>

Good catch. Since the value type is an array, I could simply append to it. But no array existed yet for a new key, so I figured I had to assign one to the empty spot in the AA first. Oh well.

>
> I suggest replacing this:
> sortedKeys.sort;
>
> With:
> sortedKeys.sort();
>

Yes, I prefer it that way too. Since DMD doesn't complain about it (is sort even a property?), I missed it.

Thanks for the input.
March 15, 2011
On 3/15/11, Andrej Mitrovic <andrej.mitrovich@gmail.com> wrote:
>
>>
>> I suggest replacing this:
>> sortedKeys.sort;
>>
>> With:
>> sortedKeys.sort();
>>
>
> Yes, I prefer it that way too.

Correction: DMD complains about the parentheses. In fact, it's an error:
ngparser.d(28): Error: undefined identifier module ngparser.sort

So I've had to remove them. And again, that's the kind of uninformative error message I don't like.
March 15, 2011
Andrej Mitrovic:

> Correction: DMD complains about the parentheses. In fact, it's an error:
> ngparser.d(28): Error: undefined identifier module ngparser.sort
>
> So I've had to remove them. And again, that's the kind of uninformative error message I don't like.

Sorry, this time the uninformative text was mine :-) When I suggested adding the () after the sort, I meant that you should use the std.algorithm sort instead of the deprecated built-in one, because the built-in one is slow and has bad bugs, like this one I've found:
http://d.puremagic.com/issues/show_bug.cgi?id=2819
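
To make the suggestion concrete, here is a minimal sketch of the std.algorithm version. The array contents are made up and `sortDescending` is a hypothetical helper; the point is only that `sort` is imported from the library and called as a function, rather than used as the built-in array property.

```d
import std.algorithm : sort;

// Hypothetical helper: orders the keys from highest to lowest, in place.
int[] sortDescending(int[] keys)
{
    sort!((a, b) => a > b)(keys);
    return keys;
}

void main()
{
    int[] sortedKeys = [3, 12, 7, 12];

    sort(sortedKeys); // std.algorithm.sort: ascending, in place
    assert(sortedKeys == [3, 7, 12, 12]);

    assert(sortDescending(sortedKeys) == [12, 12, 7, 3]);
}
```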

Bye,
bearophile
March 16, 2011
On 3/16/11, bearophile <bearophileHUGS@lycos.com> wrote:
> I meant that you should use the
> std.algorithm sort instead of the deprecated built-in one, because the
> built-in one is slow and has bad bugs, like this one I've found:
> http://d.puremagic.com/issues/show_bug.cgi?id=2819

Thanks, I didn't know about the bugs.