May 28, 2013
On Mon, May 27, 2013 at 9:53 PM, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Mon, May 27, 2013 at 09:24:28PM -0700, Timothee Cour wrote:
> > > Done, turns out the fix was trivial, just swapping two static ifs: https://github.com/D-Programming-Language/phobos/pull/1314
> >
> > This isn't what Andrei had in mind in his post above:
> >
> > > I'm disappointed cartesianProduct works that way; I should have caught that during the code review. A better iteration order would have spanned the lower position in both ranges first, i.e. create squares of increasing side in the 2D space.
> >
> > I would suggest an additional template parameter to specify the order of iteration:
> [...]
>
> The problem with allowing the user to specify order is that when one or more of the input ranges are infinite, the order of traversal is much less flexible, and it may not be possible to satisfy the requested order.
>

Explicit is better than implicit. The user should request the order, and the
compiler can statically disallow lexicographic_depth/antilexicographic_depth
when at least one of the ranges is infinite.


>
> The order that Andrei suggested is what's currently used for two infinite ranges, though when finite ranges are involved I chose a slightly simpler implementation. The order bearophile proposed in issue 9878 is different from what Andrei describes, at any rate.
>
> What's the typical output order of cartesian products in other languages / libraries? It seems nobody can agree on what the order should be. :-/
>
>
Python uses itertools.product, which is lexicographic_depth.
Like you say, no one agrees on what the order should be, so let's leave it
up to the user through a template parameter. Sounds like a no-brainer to me.
There are use cases for each order I mentioned.
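For reference, the order Python uses can be checked directly; a small sketch with the standard library:

```python
import itertools

# itertools.product yields tuples in lexicographic (row-major) order:
# the last factor varies fastest, exactly like nested for loops.
pairs = list(itertools.product([0, 1], "ab"))
print(pairs)  # [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b')]
```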


>
> T
>
> --
> People demand freedom of speech to make up for the freedom of thought which they avoid. -- Soren Aabye Kierkegaard (1813-1855)
>


May 28, 2013
On Monday, 27 May 2013 at 21:36:12 UTC, bearophile wrote:
> "pairwise" is a very useful lazy range similar to cartesianProduct, but it yields only the ordered pairs, so they cover only about half (a triangle) of the square matrix of the possibilities.

This will be part of my combinatorics library, but it is more general: pairWise is just the specific case of k-subsets with k = 2. By default, it currently provides the subsets in colexicographic order instead of lexicographic, but I intend to make the ordering a policy before I'm done.
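For illustration, the two orderings can be compared with Python's itertools.combinations, which yields k-subsets lexicographically; the colex reordering below is only a sketch, not the library code being described:

```python
import itertools

# pairWise as the k = 2 case of k-subsets, in lexicographic order
# (what itertools.combinations produces):
lex = list(itertools.combinations(range(4), 2))
# (0,1) (0,2) (0,3) (1,2) (1,3) (2,3)

# Colexicographic order compares the *last* element first; for k-subsets
# it can be obtained by sorting the lexicographic output on reversed tuples.
colex = sorted(lex, key=lambda t: t[::-1])
# (0,1) (0,2) (1,2) (0,3) (1,3) (2,3)
```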
May 28, 2013
On Tuesday, 28 May 2013 at 00:16:23 UTC, bearophile wrote:
> Sebastian Graf:
>
>> Plus, the compiler is still able to optimize most of the delegate/range fluff away (as opposed to e.g. C#).
>
> There are several optimizations that D/DMD is not performing on those ranges and higher-order functions. The Haskell compiler GHC optimizes that stuff using purity, library-defined "rewrite rules", stream fusion/deforestation, and more. DMD does nothing of this, or very little. I think so far Walter has shown no interest in this.
>

Point taken, but what I actually meant was that the compiler is allowed to optimize it all away in _theory_. You cannot have this in C#, where filtering a big array is faster with a foreach-if combination than with an array.Where(predicate) LINQ statement (I blame the cache and delegate indirection). Since there is no strong compile-time templating, this will not change in the future.
Of course this is a very specific case, but it is something you have to keep in mind. The difference for some MB-sized arrays is noticeable.
TIL that GHC already has this. I should really continue learning Haskell.
May 28, 2013
On Monday, 27 May 2013 at 23:23:52 UTC, Sebastian Graf wrote:
> On Monday, 27 May 2013 at 21:36:12 UTC, bearophile wrote:
>> snip
>
> Every time I see that kind of code, my heart makes a delightful jump. That code is what I enjoy most about D compared to C++. Plus, the compiler is still able to optimize most of the delegate/range fluff away (as opposed to e.g. C#).
>
> I'm all for more algorithm primitives in std.algorithm. I missed classify often enough, and coming from a C# background I was confused that std.algorithm.group did not do what I thought it did.
>
> Is there any reason why you keep using quoted strings instead of string literals for lambdas besides taste?

Microsoft could actually spend a bit more time on their optimizer, but I guess business has other priorities.
May 28, 2013
On Tuesday, 28 May 2013 at 07:26:06 UTC, Sebastian Graf wrote:
> On Tuesday, 28 May 2013 at 00:16:23 UTC, bearophile wrote:
>> Sebastian Graf:
>>
>>> Plus, the compiler is still able to optimize most of the delegate/range fluff away (as opposed to e.g. C#).
>>
>> There are several optimizations that D/DMD is not performing on those ranges and higher-order functions. The Haskell compiler GHC optimizes that stuff using purity, library-defined "rewrite rules", stream fusion/deforestation, and more. DMD does nothing of this, or very little. I think so far Walter has shown no interest in this.
>>
>
> Point taken, but what I actually meant was that the compiler is allowed to optimize it all away in _theory_. You cannot have this in C#, where filtering a big array is faster with a foreach-if combination than with an array.Where(predicate) LINQ statement (I blame the cache and delegate indirection). Since there is no strong compile-time templating, this will not change in the future.
> Of course this is a very specific case, but it is something you have to keep in mind. The difference for some MB-sized arrays is noticeable.
> TIL that GHC already has this. I should really continue learning Haskell.
> TIL that GHC already has this. I should really continue learning Haskell.

Even with NGEN or Mono -aot?
May 28, 2013
On Tuesday, 28 May 2013 at 02:01:30 UTC, Andrei Alexandrescu wrote:
> I'm disappointed cartesianProduct works that way; I should have caught that during the code review. A better iteration order would have spanned the lower position in both ranges first, i.e. create squares of increasing side in the 2D space.

Why is that better? It would be both unexpected and almost certainly slower.

It has the advantage that it works sensibly with infinite ranges, but I think the correct approach would be to provide a policy option for the iteration strategy (lexicographic, colexicographic, Gray, etc.) so that the user can control this.
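The "squares of increasing side" order is easy to sketch. Here is a hedged Python analogue for two infinite iterables (product_by_squares is a hypothetical name for illustration, not Phobos code): every pair whose larger index is n is yielded before any pair whose larger index is n + 1.

```python
import itertools

def product_by_squares(a, b):
    # Sketch only: assumes both iterables are infinite. Pairs are grouped
    # by the growing square's side n = max(i, j).
    xs, ys = [], []
    it_a, it_b = iter(a), iter(b)
    for n in itertools.count():
        xs.append(next(it_a))
        ys.append(next(it_b))
        for j in range(n):          # new column: (x_n, y_j) for j < n
            yield xs[n], ys[j]
        for i in range(n + 1):      # new row: (x_i, y_n) for i <= n
            yield xs[i], ys[n]

# First few pairs of naturals x even-naturals:
first = list(itertools.islice(
    product_by_squares(itertools.count(), itertools.count(0, 2)), 9))
# (0,0) (1,0) (0,2) (1,2) (2,0) (2,2) (0,4) (1,4) (2,4)
```

Handling finite ranges (catching exhaustion and clamping the square to a rectangle) is what makes the real implementation less trivial than this sketch.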
May 28, 2013
Peter Alexander:

> This will be part of my combinatorics library, but is more general. pairWise is just a specific case of k-subsets (with k=2).

Yes, that's what's written a bit further down in my post, regarding Python's combinations().

Bye,
bearophile
May 28, 2013
Timothee Cour:

> Python uses itertools.product, which is lexicographic_depth.
> Like you say, no one agrees on what the order should be, so let's leave it
> up to the user through a template parameter. Sounds like a no-brainer to me.
> There are use cases for each order I mentioned.

Different cases enjoy different orderings, so giving it a compile-time enum is OK. But I prefer the order of issue 9878 to be the default one, because it's the order I need most often, and it's what I expect when I port Python code to D.

Bye,
bearophile
May 28, 2013
Walter Bright:

>> There are several optimizations that D/DMD is not performing on those ranges and
>> higher order functions. The Haskell compiler GHC optimized that stuff using
>> purity, library defined "rewrite rules", stream fusion/deforestation and more.
>> DMD does nothing of this, or very little. I think so far Walter has shown no
>> interest in this.
>
> I don't really know what you're talking about.

Then maybe you have missed the last two times I have discussed such topics in D.learn. They are optimizations done by compilers of functional languages, in particular the GHC compiler of the lazy functional language Haskell.

The optimizations performed thanks to purity and immutability are needed by GHC because in Haskell you use Higher Order Functions (HOFs), and generally you use functions as first-class immutable values, so if you don't inline a lot, your Haskell program will be as slow as a snail.

"Rewrite rules" are a feature of the GHC compiler. They allow library writers to define rewrites that in most (or many) cases are advantageous and lead to better optimization. As an example, one such rule swaps map and filter, putting the filter before the map, because this is more efficient. Haskell allows such swapping because (nearly) everything it does is pure and immutable.
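A hedged Python analogue of that kind of rule (GHC's actual rules are written in Haskell pragma syntax): because square and is_even are pure, "map then filter" can be rewritten as "filter on the composed predicate, then map", without changing the result.

```python
def square(x):          # pure "map" function
    return x * x

def is_even(y):         # pure "filter" predicate
    return y % 2 == 0

xs = range(10)

# Original pipeline: map first, then filter the mapped values.
mapped_then_filtered = [y for y in map(square, xs) if is_even(y)]

# Rewritten pipeline: filter with the composed predicate, then map.
# Purity guarantees both orderings produce the same result.
filtered_then_mapped = [square(x) for x in xs if is_even(square(x))]

assert mapped_then_filtered == filtered_then_mapped
```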

Some examples and explanations on the GHC rewrite rules:
http://www.haskell.org/ghc/docs/7.0.1/html/users_guide/rewrite-rules.html
http://www.haskell.org/haskellwiki/GHC/Using_rules
http://www.haskell.org/haskellwiki/Playing_by_the_rules

Stream fusion, or more generally deforestation, is another optimization done by GHC. It's needed because Haskell is a functional language that uses lists all the time, and such optimizations are helped by Haskell's default laziness (deforestation is possible in an eager language too, but it's a bit less easy to do, they say). If you perform a map, then a filter, then another HOF, you are traversing lists all the time, and this is slow. Haskell's default strings are currently lists of chars, so you have to optimize string operations well. To generate fast binaries GHC "fuses" similar lazy streams and avoids creating all the intermediate lists.
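A rough Python analogue of what fusion buys you: chained list comprehensions materialize every intermediate list, while generator expressions stream through the whole pipeline in a single pass. GHC performs the equivalent rewrite automatically on Haskell list code.

```python
data = range(1000)

# Unfused: each stage builds a full intermediate list in memory.
doubled = [x * 2 for x in data]
incremented = [x + 1 for x in doubled]
unfused = sum(incremented)

# "Fused" by hand: one pass, element by element, no intermediate lists.
fused = sum(x * 2 + 1 for x in data)

assert unfused == fused
```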

With this optimization GHC has recently been shown to generate good vectorized SSE-aware binaries that are faster than naive C code optimized by the vectorizing stages of the GCC optimizer, and beaten only by very vector-intrinsics-heavy handwritten C code:
http://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-beats-C.pdf

But that's not the paper to start reading about these topics, because it's a late paper in the area.

More on the topic:
http://www.randomhacks.net/articles/2007/02/10/map-fusion-and-haskell-performance

On the topic there are both easy to understand and harder to understand papers. So you have to choose.

Bye,
bearophile
May 28, 2013
On 2013-05-28 02:16, bearophile wrote:

> My editor uses a single uniform color for the contents of normal
> strings, unlike quoted strings.

Why not use proper lambdas instead of strings?

-- 
/Jacob Carlborg