View mode: basic / threaded / horizontal-split · Log in · Help
May 27, 2013
More on "Component Programming"
This simple task on Rosettacode site is useful to show some uses 
of Phobos and the "component programming" recently discussed by 
Walter (other languages use a different name to denote the same 
idea).

Given a dictionary file of different words, it asks to find any 
of the longest anagram pairs, that also share no equal chars in 
the same position (so they are named deranged anagrams):

http://rosettacode.org/wiki/Anagrams/Deranged_anagrams#D

There are many ways to do this in D+Phobos. The following 
solution is long, but it's quite fast (the "warmed up" run-time 
is only about 0.03 seconds with a dictionary of about 200 KB, on 
an old CPU core), I have chosen it over simple solutions because 
it gives me a chance to discuss certain things:



import std.stdio, std.file, std.algorithm, std.string,
       std.typecons, std.range, std.functional;

auto findDeranged(in string[] words) pure /*nothrow*/ {
    //return words.pairwise.filter!(ww=> ww[].zip.all!q{a[0] != 
a[1]});
    Tuple!(string, string)[] result;
    foreach (immutable i, const w1; words)
        foreach (const w2; words[i + 1 .. $])
            if (zip(w1, w2).all!q{ a[0] != a[1] })
                result ~= tuple(w1, w2);
    return result;
}

void main() {
    Appender!(string[])[30] wClasses;
    foreach (word; 
std.algorithm.splitter("unixdict.txt".readText))
        wClasses[$ - word.length] ~= word;

    "Longest deranged anagrams:".writeln;
    foreach (words; wClasses[].map!q{ a.data 
}.filter!(not!empty)) {
        string[][const ubyte[]] anags; // Assume ASCII input.
        foreach (w; words)
            anags[w.dup.representation.sort().release.idup] ~= w;
        auto pairs = anags.byValue.map!findDeranged.join;
        if (!pairs.empty)
            return writefln("  %s, %s", pairs.front[]);
    }
}


- - - - - - - - - - - -

That program contains five foreach loops. Foreach loops are not 
evil and I like them, but for a certain kind of programming 
(discussed recently by Walter, and also common in F# and other 
languages) every time you use a for/foreach it's one small 
"failure" for the standard library :-)

The following weird (untested and maybe buggy) program replaces 
all the foreach loops with higher order functions and other 
library functions. It can't be compiled because it uses some 
things not yet present in Phobos (on the Rosettacode page there 
is also a slower and simpler D solution of this problem that uses 
only one foreach):


void main() {
    import std.stdio, std.file, std.algorithm, std.string,
           std.typecons, std.range, std.functional;

    "unixdict.txt"
    .readText
    .splitter
    .classify!q{ a.length }
    .map!q{ a.values } // .byValue is almost OK.
    .array
    .schwartzSort!q{ -a[0].length }
    .release
    .map!(words => words
                   .classify!q{ a
                                .dup
                                .representation
                                .sort()
                                .release
                                .idup }
                   .byValue
                   .map!(words => words
                                  .pairwise
                                  .filter!(ww => ww[]
                                                 .zip
                                                 .all!q{ a[0] != 
a[1] }))
                   .join)
    .filter(not!empty)
    .front[]
    .binaryReverseArgs!writefln("  %s, %s");
}


A copy of the same code if the newsgroup has messed up the 
formatting and indents, turning that code into a soup:
http://codepad.org/L4TyDkcQ


I am not suggesting you to write whole D script-like programs in 
this strange style. But I think Phobos should offer all the tools 
to write a program like this, because even if you don't want to 
write a whole little program in this style, you sometimes want to 
use some parts of it or some other parts of it, so I think all 
the most common and distinct micro-patterns should be contained 
in Phobos.

- - - - - - - - - - - -

"binaryReverseArgs" is in the std.functional module. Here it 
allows the use of writefln in UFCS style, inverting the 
formatting string position. I think I'd like a shorter and more 
handy name for it. In Haskell it's named "flip", and its usage is 
not uncommon.

- - - - - - - - - - - -

"classify" is a simple function, that given a forward range of T 
and an optional function T->K, returns an associative array 
T[][K]. (Arrays are used by default as values. But maybe you can 
optionally specify a different type of values, like Appenders, 
Arrays, sets, etc). (Currently in Phobos the only function to 
build an associative array is std.array.assocArray, but here we 
need something different). 
(http://d.puremagic.com/issues/show_bug.cgi?id=5502 ).

[1, 7, 6, 3, 2].classify!(x => x % 2 ? "odd": "even").writeln;

==>
["odd": [1, 7, 3], "even": [6, 2]]

- - - - - - - - - - - -

"pairwise" is a very useful lazy range similar to 
cartesianProduct, but it yields only the ordered pairs, so they 
cover only about half (a triangle) of the square matrix of the 
possibilities. 
(http://d.puremagic.com/issues/show_bug.cgi?id=6788 ).


This simple example shows the difference:

import std.stdio, std.algorithm;
void main() {
    auto data = [1, 2, 3, 4];
    foreach (xy; cartesianProduct(data, data))
        writeln(xy);
}


Generates the tuples:
(1, 1)
(2, 1)
(3, 1)
(4, 1)
(1, 2)
(2, 2)
(3, 2)
(4, 2)
(1, 3)
(2, 3)
(3, 3)
(4, 3)
(1, 4)
(2, 4)
(3, 4)
(4, 4)


While:

import std.stdio, std.range;
void main() {
    auto data = [1, 2, 3, 4];
    foreach (tup; pairwise(data))
        writeln(tup);
}


Should generate:
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)


In the Python standard library there is a lazy generator that's 
more general than pairwise:

>>> from itertools import combinations
>>> list(combinations([1, 2, 3, 4], 2))
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]


So if you prefer that more general solution the D code becomes:

...
                   .map!(words => words
                                  .combinations(2)
                                  .filter!(ww => ww[]
...

Bye,
bearophile
May 27, 2013
Re: More on "Component Programming"
On Monday, 27 May 2013 at 21:36:12 UTC, bearophile wrote:
> snip

Every time I see that kind of code, my heart makes a delightful 
jump. That code is what I enjoy most about D compared to C++. 
Plus, the compiler is still able to optimize most of the 
delegate/range fluff away (as opposed to e.g. C#).

I'm all for more algorithm primitives in std.algorithm. I missed 
classify often enough and coming from a C# backgroung I was 
confused that std.algorithm.group did not what I thought it did.

Is there any reason why you keep using quoted strings instead of 
string literals for lambdas besides taste?
May 28, 2013
Re: More on "Component Programming"
Sebastian Graf:

> Plus, the compiler is still able to optimize most of the 
> delegate/range fluff away (as opposed to e.g. C#).

There are several optimizations that D/DMD is not performing on 
those ranges and higher order functions. The Haskell compiler GHC 
optimized that stuff using purity, library defined "rewrite 
rules", stream fusion/deforestation and more. DMD does nothing of 
this, or very little. I think so far Walter has shown no interest 
in this.


> I'm all for more algorithm primitives in std.algorithm. I 
> missed classify often enough and coming from a C# backgroung I 
> was confused that std.algorithm.group did not what I thought it 
> did.

> and coming from a C# backgroung I was confused that 
> std.algorithm.group did not what I thought it did.

The "group" of Phobos is useful and it has purposes quite 
different from the Perl6 "classify", both are needed.
I have also suggested "group" to act more like the Python 
"itertools.grouby" and yield not just the (head,count) tuples, 
but (head,lazy_range_of_the_equality_ class) that is quite 
useful, example:


>>> from itertools import groupby
>>> s = "abABACAaaaaBCAB"
>>> [(h, list(g)) for h,g in groupby(s, key=str.isupper)]
[(False, ['a', 'b']), (True, ['A', 'B', 'A', 'C', 'A']), (False, 
['a', 'a', 'a', 'a']), (True, ['B', 'C', 'A', 'B'])]

I think Andrei was looking at this change with interest. Changing 
Phobos group() API now is maybe not easy to do, so maybe it's 
better to introduce a differently named function for that, like 
one named "groupAll" or something similar.


> Is there any reason why you keep using quoted strings instead 
> of string literals for lambdas besides taste?

My editor uses a single uniform color for the contents of normal 
strings, unlike quoted strings.

Bye,
bearophile
May 28, 2013
Re: More on "Component Programming"
>     .map!(words => words
>                    .classify!q{ a
>                                 .dup
>                                 .representation
>                                 .sort()
>                                 .release

Also, let's kill the built-in sort already :-)

Bye,
bearophile
May 28, 2013
Re: More on "Component Programming"
On Tue, May 28, 2013 at 02:16:22AM +0200, bearophile wrote:
> Sebastian Graf:
> 
> >Plus, the compiler is still able to optimize most of the
> >delegate/range fluff away (as opposed to e.g. C#).
> 
> There are several optimizations that D/DMD is not performing on
> those ranges and higher order functions. The Haskell compiler GHC
> optimized that stuff using purity, library defined "rewrite rules",
> stream fusion/deforestation and more.

Library-defined rewrite rules would be ├╝ber-cool if it were supported in
D.

*(waits for somebody to mention "AST macros", and everyone else to shout
him down. :-P)


> DMD does nothing of this, or very little. I think so far Walter has
> shown no interest in this.
[...]

The DMD optimizer certainly leaves a lot of room for improvement. I
think Walter would accept pull requests to that effect. ;-)


T

-- 
Try to keep an open mind, but not so open your brain falls out. -- theboz
May 28, 2013
Re: More on "Component Programming"
On 28/05/13 10:37, bearophile wrote:
>>     .map!(words => words
>>                    .classify!q{ a
>>                                 .dup
>>                                 .representation
>>                                 .sort()
>>                                 .release
>
> Also, let's kill the built-in sort already :-)
>

But I just found it and started using it. :-)

I was contemplating writing my own sort function as the ones in 
std.algorithm didn't meet my needs (or using them was too messy) until I 
discovered this feature.

Are the () necessary on sort?  I found:

auto sorted_array = an_array.dup.sort;

worked.

Peter
PS Now I've found this I can go back and simplify all the code where I 
iterated over associative arrays in key order by getting the keys and 
the sorting them separately.
PPS Every time I discover one of these features I like D even more.
May 28, 2013
Re: More on "Component Programming"
On Tuesday, May 28, 2013 11:30:55 Peter Williams wrote:
> Are the () necessary on sort?  I found:
> 
> auto sorted_array = an_array.dup.sort;

Any function which takes no arguments can be called without parens, and thanks 
to UFCS (Universal Function Call Syntax), you're calling these functions as if 
they were member functions and so they no longer have any arguments between 
the parens and so can be called without parens.

- Jonathan M Davis
May 28, 2013
Re: More on "Component Programming"
On 5/27/13 8:16 PM, bearophile wrote:
> The "group" of Phobos is useful and it has purposes quite different from
> the Perl6 "classify", both are needed.
> I have also suggested "group" to act more like the Python
> "itertools.grouby" and yield not just the (head,count) tuples, but
> (head,lazy_range_of_the_equality_ class) that is quite useful, example:
>
>
>>>> from itertools import groupby
>>>> s = "abABACAaaaaBCAB"
>>>> [(h, list(g)) for h,g in groupby(s, key=str.isupper)]
> [(False, ['a', 'b']), (True, ['A', 'B', 'A', 'C', 'A']), (False, ['a',
> 'a', 'a', 'a']), (True, ['B', 'C', 'A', 'B'])]
>
> I think Andrei was looking at this change with interest. Changing Phobos
> group() API now is maybe not easy to do, so maybe it's better to
> introduce a differently named function for that, like one named
> "groupAll" or something similar.

I wrote that a while ago. Someone please review and pull 
https://github.com/D-Programming-Language/phobos/pull/1186 already!


Andrei
May 28, 2013
Re: More on "Component Programming"
Peter Williams:

> Are the () necessary on sort?

If you don't use () I think you call the slower, not flexible and 
buggy built-in sort. I think it's already deprecated. Maybe I am 
wrong...


> PS Now I've found this I can go back and simplify all the code 
> where I iterated over associative arrays in key order by 
> getting the keys and the sorting them separately.

I don't fully understand.

Bye,
bearophile
May 28, 2013
Re: More on "Component Programming"
On 5/27/13 5:36 PM, bearophile wrote:
> This simple example shows the difference:
>
> import std.stdio, std.algorithm;
> void main() {
> auto data = [1, 2, 3, 4];
> foreach (xy; cartesianProduct(data, data))
> writeln(xy);
> }
>
>
> Generates the tuples:
> (1, 1)
> (2, 1)
> (3, 1)
> (4, 1)
> (1, 2)
> (2, 2)
> (3, 2)
> (4, 2)
> (1, 3)
> (2, 3)
> (3, 3)
> (4, 3)
> (1, 4)
> (2, 4)
> (3, 4)
> (4, 4)

I'm disappointed cartesianProduct works that way; I should have caught 
that during the code review. A better iteration order would have spanned 
the lower position in both ranges first, i.e. create squares of 
increasing side in the 2D space.

Andrei
« First   ‹ Prev
1 2 3 4
Top | Discussion index | About this forum | D home