Java streams Vs LINQ Vs D

March 27, 2013

Posted by bearophile


Linked on Reddit I've seen a nice comparison of Java streams Vs dotnet LINQ:

http://blog.informatech.cr/2013/03/24/java-streams-preview-vs-net-linq/

Although the list isn't complete, those little challenges are well chosen: they are commonly performed operations. So I have translated them to D with Phobos. For most of them I found a nice D translation, but a few of them uncover holes in Phobos that I already knew about. (Maybe some of them are not really Phobos holes, just my lack of knowledge of Phobos and D, so your better solutions are welcome.)

If you want to read the whole list of my translations:
http://codepad.org/0KtXu7nh

Below I list just the five troubled challenges, with the LINQ solution followed by one or more D solutions.

For all the solutions I import several modules:

import std.stdio, std.algorithm, std.range, std.typecons, std.traits,
       std.array, std.string;

- - - - - - - - - - - -

Challenge 2: Indexed Filtering

Find all the names in the array "names" where the length of the name is less than or equal to the index of the element + 1.

string[] names = { "Sam", "Pamela", "Dave", "Pascal", "Erik" };
var nameList = names.Where((c, index) => c.Length <= index + 1).ToList();


In D:

    auto names2 = ["Sam","Pamela", "Dave", "Pascal", "Erik"];
    auto nameRange = iota(size_t.max)
                     .zip(names2)
                     .filter!q{ a[1].length <= a[0] + 1 }
                     .map!q{ a[1] };
    nameRange.writeln;


On Bugzilla I have proposed to add an enumerate():
http://d.puremagic.com/issues/show_bug.cgi?id=5550

With it the D code improves:


    auto nameRange2 = names2
                      .enumerate
                      .filter!q{ a[1].length <= a[0] + 1 }
                      .map!q{ a[1] };
    nameRange2.writeln;
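
For reference, here is a minimal sketch of this challenge written against an enumerate(), assuming a Phobos release that ships std.range.enumerate (it was not available when this was posted). The function name indexedFilter is only for illustration; the tuple fields are assumed to be named index and value, and the condition follows the LINQ version (length <= index + 1):

```d
import std.algorithm : filter, map;
import std.array : array;
import std.range : enumerate;

// Illustrative helper (name is an assumption, not a Phobos function):
// keeps the names whose length is <= their zero-based index + 1.
string[] indexedFilter(string[] names)
{
    return names
        .enumerate                                   // pairs each item with its index
        .filter!(t => t.value.length <= t.index + 1) // same test as the LINQ lambda
        .map!(t => t.value)                          // drop the index again
        .array;
}
```

Calling indexedFilter(names2) on the array above should keep only "Erik", the one name short enough for its position.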


If D gains a syntax to unpack tuples in function signatures the code becomes (untested):


    auto nameRange2 = names2
                      .enumerate
                      .filter!((i, n) => n.length <= i + 1)
                      .map!q{ a[1] };
    nameRange2.writeln;


Besides adding enumerate(), which is useful in many other situations, another (not alternative!) idea is to add iFilter/iMap (meaning indexed filter and indexed map), where the filtering or mapping function is supplied with an index+item 2-tuple:

    auto nameRange2 = names2.iFilter!((i, a) => a.length <= i + 1);

Or equivalently:

    auto nameRange2 = names2.iFilter!q{ a.length <= i + 1 };


Those iFilter/iMap functions are present in the standard library of the F# language.
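
To make the iFilter idea concrete, here is a rough sketch of how such a helper could be layered on top of an enumerate(). Everything here is hypothetical: iFilter is not a Phobos function, the signature is just one possible design, and it assumes std.range.enumerate is available. The wrapper namesFittingIndex exists only to show a call site:

```d
import std.algorithm : filter, map;
import std.array : array;
import std.range : enumerate;

// Hypothetical indexed filter: pred is called as pred(index, item).
auto iFilter(alias pred, R)(R range)
{
    return range
        .enumerate
        .filter!(t => pred(t.index, t.value))
        .map!(t => t.value);   // hand back the items only
}

// Example call site for the challenge (illustrative name).
string[] namesFittingIndex(string[] names)
{
    return names.iFilter!((i, a) => a.length <= i + 1).array;
}
```

An iMap could be sketched the same way, replacing the filter/map pair with a single map over the (index, item) tuple.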

- - - - - - - - - - - -

Challenge 3: Selecting/Mapping

Say we have a list of names and we would like to print “Hello” in front of all the names:

List<string> nameList1 = new List<string>(){ "Anders", "David", "James",
                                     "Jeff", "Joe", "Erik" };
nameList1.Select(c => "Hello! " + c).ToList()
         .ForEach(c => Console.WriteLine(c));


In Phobos there is no forEach(), so you have to use foreach:

    auto nameList1 = ["Anders", "David", "James", "Jeff", "Joe", "Erik"];
    foreach (name; nameList1)
        writeln("Hello! ", name);


The only advantage I see of a forEach() over foreach is that it's usable at the end of a UFCS chain.
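
For comparison, later Phobos releases do offer a chain-ending function of this kind, std.algorithm.each. A minimal sketch, assuming a Phobos version that ships it (the function names greetAll and greetings are only illustrative):

```d
import std.algorithm : each, map;
import std.stdio : writeln;

// Prints one greeting per name; each!writeln closes the UFCS chain.
void greetAll(string[] names)
{
    names.map!(n => "Hello! " ~ n).each!writeln;
}

// Same chain, but collecting the greetings so they can be inspected.
string[] greetings(string[] names)
{
    string[] result;
    names.map!(n => "Hello! " ~ n).each!((g) { result ~= g; });
    return result;
}
```

The plain foreach loop remains just as readable here; the chain form mostly pays off when several range operations precede the final action.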

- - - - - - - - - - - -

Challenge 12: Grouping by a Criterion

Group the elements of a collection of strings by their length.

string[] names = {"Sam", "Samuel", "Samu", "Ravi", "Ratna",  "Barsha"};
var groups = names.GroupBy(c => c.Length);


In Phobos there is a group() but it can't be used here because it returns just one of the equivalent grouped items. And I can't use std.array.assocArray for similar reasons.


    auto names3 = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"];
    string[][size_t] groups;
    foreach (name; names3)
        groups[name.length] ~= name;
    groups.byValue.writeln;


Andrei has recently written a groupBy, not yet merged:
https://github.com/D-Programming-Language/phobos/pull/1186


Using that future groupBy the D code improves a little (untested; in DMD 2.063 schwartzSort accepts a string literal too):

    auto names3 = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"];
    auto groups = names3
                  .schwartzSort!q{ a.length }
                  .groupBy!q{ a.length == b.length };
    groups.writeln;


By the way, I like Python for having a free len() function that's usable for higher order functions like map and filter. In Phobos there is walkLength():

    auto names3 = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"];
    auto groups = names3
                  .schwartzSort!walkLength
                  .groupBy!q{ a.walkLength == b.walkLength };
    groups.writeln;


Unlike schwartzSort, the Phobos group/groupBy use a comparison function like "a.length == b.length" instead of a less flexible but handier single-argument function like "c => c.Length". So I'd like something like a keyGroup/keyGroupBy that accepts a single-argument function, as schwartzSort does. (And I'd like schwartzSort to be renamed "keySort").

    auto names3 = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"];
    auto groups = names3
                  .schwartzSort!walkLength
                  .keyGroupBy!walkLength;
    groups.writeln;
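
As a point of comparison, newer Phobos releases ship std.algorithm.chunkBy, which accepts a single-argument key function much like the keyGroupBy wished for here. A minimal sketch under that assumption (groupByLength is an illustrative name; with a unary key, each chunk is assumed to be a (key, group) tuple):

```d
import std.algorithm : chunkBy, map, sort;
import std.array : array;

// Groups names by length: sort by the key, then chunk adjacent equal keys.
string[][] groupByLength(string[] names)
{
    auto sorted = names.dup;                      // leave the caller's array intact
    sorted.sort!((a, b) => a.length < b.length);  // equal keys must end up adjacent
    return sorted
        .chunkBy!(a => a.length)                  // yields (key, group) tuples
        .map!(g => g[1].array)                    // keep just the grouped items
        .array;
}
```

The sort is not stable, so the order of names inside one group is not guaranteed; only the grouping itself is.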



Another problem with group/groupBy is that they only group adjacent items, so the input has to be sorted first. But a hash-based O(n) group/groupBy is also conceivable, potentially faster, and leading to simpler code, because you don't need to sort the items first:

    auto names3 = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"];
    auto groups = names3.hashKeyGroupBy!walkLength;
    groups.writeln;


Uhm. The name "hashKeyGroupBy" is becoming a bit too complex :-) So maybe it's better not to go there.
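
Whatever the name, the hash-based idea is small enough to sketch by generalizing the foreach loop shown earlier. This is a hypothetical helper, not a Phobos function; the name hashGroupBy and the choice to return a built-in associative array are assumptions made for illustration:

```d
// Hypothetical hash-based grouping: one pass, no sorting required.
// Returns an associative array from key(item) to the items with that key.
auto hashGroupBy(alias key, T)(T[] items)
{
    alias K = typeof(key(T.init));   // key type deduced from the key function
    T[][K] groups;
    foreach (item; items)
        groups[key(item)] ~= item;   // a missing entry starts as an empty array
    return groups;
}
```

Within each group the items keep their original relative order, but the iteration order of the groups themselves is unspecified, as with any built-in associative array.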

- - - - - - - - - - - -

Challenge 13: Filter Distinct Elements

Obtain all the distinct elements from a collection.

string[] songIds = {"Song#1", "Song#2", "Song#2", "Song#2", "Song#3", "Song#1"};
var uniqueSongIds = songIds.Distinct();


This is not too bad in D: there is uniq(), but first you need to .sort or .dup.sort or .array.sort the original array/range:


    auto songIds = ["Song#1", "Song#2", "Song#2", "Song#2", "Song#3", "Song#1"];
    auto uniqueSongIds = songIds.sort().uniq;
    uniqueSongIds.writeln;


A hash-based uniq that doesn't need a previous sorting is conceivable. But see also below.
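
A minimal sketch of what such a hash-based uniq could look like, using a built-in associative array as the "seen" set. The name distinct is hypothetical (borrowed from the LINQ method), and restricting it to arrays is a simplification:

```d
// Hypothetical hash-based uniq: keeps the first occurrence of each item,
// preserves the original order, and needs no prior sorting.
T[] distinct(T)(T[] items)
{
    bool[T] seen;            // associative array used as a set
    T[] result;
    foreach (item; items)
    {
        if (item !in seen)
        {
            seen[item] = true;
            result ~= item;
        }
    }
    return result;
}
```

Unlike the sort + uniq version, this leaves the input array untouched and returns the items in first-seen order.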

- - - - - - - - - - - -

Challenge 14: Union of Two Sets

Join together two sets of items.

LINQ:

List<string> friends1 = new List<string>() {"Anders", "David","James",
                                            "Jeff", "Joe", "Erik"};
List<string> friends2 = new List<string>() { "Erik", "David", "Derik" };
var allMyFriends = friends1.Union(friends2);


This seems a bit too complex to do in D+Phobos:


    auto friends1 = ["Anders", "David","James", "Jeff", "Joe", "Erik"];
    auto friends2 = ["Erik", "David", "Derik"];
    auto allMyFriends = friends1.sort().setUnion(friends2.sort()).uniq;
    allMyFriends.writeln;


Note that you have to call uniq at the end because that's not a set union; the function is badly named. A better name for it would be "bagUnion", because it doesn't remove duplicates, and a set operation should.

For Challenges 13 and 14 I suggest not adding more functions to std.algorithm, and instead relying on a set data structure, as in Python:


>>> song_ids = ["Song#1", "Song#2", "Song#2", "Song#2", "Song#3", "Song#1"]
>>> set(song_ids)
set(['Song#1', 'Song#2', 'Song#3'])



>>> friends1 = ["Anders", "David","James", "Jeff", "Joe", "Erik"]
>>> friends2 = ["Erik", "David", "Derik"]
>>> set(friends1).union(friends2)
set(['Erik', 'Joe', 'Jeff', 'Derik', 'James', 'Anders', 'David'])


In my D1 dlibs I had a Set!T data structure (with a set() helper function) that offered a similar syntax (here I use D2 UFCS):

    auto songIds = ["Song#1", "Song#2", "Song#2", "Song#2", "Song#3", "Song#1"];
    auto uniqueSongIds = songIds.set;

    auto friends1 = ["Anders", "David","James", "Jeff", "Joe", "Erik"];
    auto friends2 = ["Erik", "David", "Derik"];
    auto allMyFriends = friends1.set.united(friends2);
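
To show what that syntax would rest on, here is a very small sketch of such a Set!T built on a built-in associative array. It is not the dlibs implementation; the member names insert, contains and length are assumptions, and only set() and united() mirror the calls used above:

```d
// Minimal illustrative set type backed by an associative array.
struct Set(T)
{
    private bool[T] items;

    void insert(T item) { items[item] = true; }

    bool contains(T item) const { return (item in items) !is null; }

    @property size_t length() const { return items.length; }

    // Returns a new set holding the items of this set plus those of `other`.
    Set united(R)(R other)
    {
        Set result;
        foreach (item; items.byKey) result.insert(item);
        foreach (item; other) result.insert(item);
        return result;
    }
}

// Helper that builds a Set from an array, enabling `array.set` via UFCS.
Set!T set(T)(T[] items)
{
    Set!T s;
    foreach (item; items) s.insert(item);
    return s;
}
```

With this in place, friends1.set.united(friends2) should give a set of seven names, with "Erik" and "David" stored once each.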

- - - - - - - - - - - -

Bye,
bearophile

On Wednesday, 27 March 2013 at 22:19:01 UTC, bearophile wrote:
> Challenge 3: Selecting/Mapping
>
> Say we have a list of names and we would like to print “Hello” in front of all the names:
>
> List<string> nameList1 = new List<string>(){ "Anders", "David", "James",
>                                      "Jeff", "Joe", "Erik" };
> nameList1.Select(c => "Hello! " + c).ToList()
>          .ForEach(c => Console.WriteLine(c));
>
>
> In Phobos there is no forEach(), so you have to use foreach:
>
>     auto nameList1 = ["Anders", "David", "James", "Jeff", "Joe", "Erik"];
>     foreach (name; nameList1)
>         writeln("Hello! ", name);
>
>
> The only advantage I see of a forEach() over foreach is that it's usable at the end of a UFCS chain.

Hmm, I would have thought this should work:

    auto nameList1 = ["Anders", "David", "James", "Jeff", "Joe", "Erik"];
    nameList1.copy( a => writeln("Hello! ", a) );

According to std.range: "r(e); R is e.g. a delegate accepting an E."
