May 30, 2010
On Sun, May 30, 2010 at 18:00, Andrei Alexandrescu < SeeWebsiteForEmail@erdani.org> wrote:

> On 05/30/2010 10:27 AM, Simen kjaeraas wrote:
>
>> { 2·x | x ∈ N, x² > 3 }
>> =>
>> ( list!"2*a" | iota(1,int.max) * where!"x^^2 > 3" )
>> return comp ! ("tuple(a,b,c)", "a*a+b*b==c*c && a<b")
>>
>



> This reminded me that Phobos lacks a combinatorial range, taking two or
>> more (forward) ranges and giving all combinations thereof
>>
>
Simen, could you please have a look at my dranges project?

http://www.dsource.org/projects/dranges/

Look at the algorithm and range sections, you'll find plenty of code that
try to tackle this.
In particular combinations() in algorithm2, and also tensorialProduct() in
rangeofranges

combinations is a higher-order range that takes any number of forward ranges and output their combinations as tuples.

tensorialProduct does the same, but creates 'topology' : the combination of three ranges is a range, whereas the tensorial product of three (linear) ranges is a range of ranges of ranges:

tensorialProduct([0,1],[2,3])
->
[[(0,2),(0,3)],
 [(1,2),(1,3)]]

(where (x,y) indicates a tuple)


Yah, that would be useful. If Philippe agrees to adapt his work, maybe that
> would be the fastest route. And don't forget - the gates of Phobos are open.
>
> Andrei
>
>
Concerning helping Phobos, I'd be delighted to if the Phobos team deems it possible. It's just that my job and formation are far from coding (I'm more a physicist), so I'm not sure my code would be up to it without being reviewed.

I'm OK with pretty much everything concerning licensing and adaptation. I'll begin by putting a Boost licence in my code, when dsource is up again. If someone is interested by some of it, I'm quite ready to adapt it to conform to D/Phobos evolutions. That's what I'm doing anyway.


    Philippe


May 30, 2010
Philippe Sigaud <philippe.sigaud@gmail.com> wrote:
> Simen, could you please have a look at my dranges project?
>
> http://www.dsource.org/projects/dranges/
>
> Look at the algorithm and range sections, you'll find plenty of code that
> try to tackle this.
> In particular combinations() in algorithm2, and also tensorialProduct() in
> rangeofranges
>
> combinations is a higher-order range that takes any number of forward ranges
> and output their combinations as tuples.
>
> tensorialProduct does the same, but creates 'topology' : the combination of
> three ranges is a range, whereas the tensorial product of three (linear)
> ranges is a range of ranges of ranges:
>
> tensorialProduct([0,1],[2,3])
> ->
> [[(0,2),(0,3)],
>  [(1,2),(1,3)]]
>
> (where (x,y) indicates a tuple)

Indeed. I just had a quick look at the traits2 module last time you
linked it. This does seem to cover all bases. I have to say I'm impressed!


-- 
Simen
May 30, 2010
On Sun, May 30, 2010 at 19:01, Simen kjaeraas <simen.kjaras@gmail.com>wrote:

> Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>
>> This reminded me that Phobos lacks a combinatorial range, taking two or
>>> more (forward) ranges and giving all combinations thereof:
>>>
>>> combine([0,1],[2,3])
>>> =>
>>> (0,2), (0,3), (1,2), (1,3)
>>>
>>> Work has commenced on implementing just that.
>>>
>>
>> Yah, that would be useful. If Philippe agrees to adapt his work, maybe that would be the fastest route. And don't forget - the gates of Phobos are open.
>>
>
> Too late for that, as I've already written this. :p
>

Hey, cool, lots of static foreach. Man, it's like I'm reading my own code. I'm happy to see convergent evolution: this must be the D way to do this.



> Current problems: back and popBack not implemented. I'm not sure they even should be. Doing so would be a tremendous pain the region below the spine.
>
>
Ow.
I *guess* it's doable, but I for sure don't want to do it.


> May very well be there are other problems, I don't know. If anyone finds any, please let me know.


Well, I like the way you dealt with popFront.  You length is more costly than mine, but I cheated: I count the numbers of popFronts and substract them from the original length.

Your empty use .length in the foreach loop. You should use .empty instead, I think. And with an infinite range in the mix, the resulting product is infinte also, so you can define your combine to be infinite.

Then I can give you something to munch over :-)

When one range is infinite, the resulting combination is infinite. What's
sad is that you'll never 'traverse' the infinite range:
auto c = combine([0,1,2], cycle([2,3,4]));
->
(0,2),(0,3),(0,4),(0,2),(0,3), ...

You never reach the (1,x) (2,x) parts, as it should be seeing how we both
defined combinations: iterating on the ranges as if they were digits in a
number.

But there is another way to iterate: diagonally, by cutting successive diagonal slices:

c is:
(0,2),(0,3),(0,4),(0,2),...
(1,2),(1,3),(1,4),(1,2),...
(2,2),(2,3),(2,4),(2,2),...
->

(0,2),(0,3)(1,2),(0,4),(1,3),(2,2) ...

It's even better if all ranges are infinite.
I never coded this, but it seems doable for two ranges. Lot thougher for any
number of ranges... Pretty obscure, maybe?

btw, I think we should divert this discussion in another thread if you wish to continue: this is bearophile's Huffman thread, after all.

  Philippe


May 30, 2010
On Sun, May 30, 2010 at 19:29, Simen kjaeraas <simen.kjaras@gmail.com>wrote:

>
> Indeed. I just had a quick look at the traits2 module last time you linked it. This does seem to cover all bases. I have to say I'm impressed!
>
>
If you ever have time to look at it, I'm always eager for
comments/criticisms/new ideas.
Even though my todo list for this project has more than 200 entries left :-)


Philippe


May 30, 2010
On 05/30/2010 04:43 PM, Philippe Sigaud wrote:
> On Sun, May 30, 2010 at 19:01, Simen kjaeraas <simen.kjaras@gmail.com
> <mailto:simen.kjaras@gmail.com>> wrote:
>
>     Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org
>     <mailto:SeeWebsiteForEmail@erdani.org>> wrote:
>
>             This reminded me that Phobos lacks a combinatorial range,
>             taking two or
>             more (forward) ranges and giving all combinations thereof:
>
>             combine([0,1],[2,3])
>             =>
>             (0,2), (0,3), (1,2), (1,3)
>
>             Work has commenced on implementing just that.
>
>
>         Yah, that would be useful. If Philippe agrees to adapt his work,
>         maybe that would be the fastest route. And don't forget - the
>         gates of Phobos are open.
>
>
>     Too late for that, as I've already written this. :p
>
>
> Hey, cool, lots of static foreach. Man, it's like I'm reading my own
> code. I'm happy to see convergent evolution: this must be the D way to
> do this.
>
>     Current problems: back and popBack not implemented. I'm not sure they
>     even should be. Doing so would be a tremendous pain the region below the
>     spine.
>
>
> Ow.
> I *guess* it's doable, but I for sure don't want to do it.
>
>     May very well be there are other problems, I don't know. If anyone finds
>     any, please let me know.
>
>
> Well, I like the way you dealt with popFront.  You length is more costly
> than mine, but I cheated: I count the numbers of popFronts and substract
> them from the original length.
>
> Your empty use .length in the foreach loop. You should use .empty
> instead, I think. And with an infinite range in the mix, the resulting
> product is infinte also, so you can define your combine to be infinite.
>
> Then I can give you something to munch over :-)
>
> When one range is infinite, the resulting combination is infinite.
> What's sad is that you'll never 'traverse' the infinite range:
> auto c = combine([0,1,2], cycle([2,3,4]));
> ->
> (0,2),(0,3),(0,4),(0,2),(0,3), ...
>
> You never reach the (1,x) (2,x) parts, as it should be seeing how we
> both defined combinations: iterating on the ranges as if they were
> digits in a number.
>
> But there is another way to iterate: diagonally, by cutting successive
> diagonal slices:
>
> c is:
> (0,2),(0,3),(0,4),(0,2),...
> (1,2),(1,3),(1,4),(1,2),...
> (2,2),(2,3),(2,4),(2,2),...
> ->
>
> (0,2),(0,3)(1,2),(0,4),(1,3),(2,2) ...
>
> It's even better if all ranges are infinite.
> I never coded this, but it seems doable for two ranges. Lot thougher for
> any number of ranges... Pretty obscure, maybe?
>
> btw, I think we should divert this discussion in another thread if you
> wish to continue: this is bearophile's Huffman thread, after all.

Yah, there's no argument that infinite ranges must be allowed by a n-way cross-product. It reminds one of Cantor's diagonalization, just in several dimensions. Shouldn't be very difficult, but it only works if all ranges except one are forward ranges (one can be an input range).

Andrei
May 31, 2010
On 05/28/2010 05:01 PM, bearophile wrote:
[snip]
> Some notes (later I can send them to Andrei):

Thanks for your notes. I removed those for which I have no notable answer.

> The "std.typecons" module, that contains Tuple and tuple can be named
> something simpler to remember as "std.tuples".

Phobos' general approach to naming modules is of coarse granularity. I wanted to massage in std.typecons everything that constructs new, generally useful types.

> I will keep missing array comprehensions in D. In the meantime other
> languages have got some forms of them (but Python ones use the best
> syntax still).

Looks like we're having two proposals.

> In translating the program I have not found significant problems. The
> only bug I have found was caused by the different ordering of the
> Python/Phobos2 heaps, so I have had to use "b<a" in D2. This is not a
> bug, but I have to keep it in mind in future usages of heaps across
> the two languages. I don't know why the two languages have chosen
> different orderings here, if someone of you knows I'd like to know
> it.

a < b is sort of the default comparison operator in D, so it would have been surprising if I'd created a max heap.

> Another small problem I have found was caused by BinaryHeap.pop().
> I'd like it to pop and return a single item, instead of an array of
> given length of items, because this is done quite more often than
> popping many items. If the need to pop many items is important, then
> a different method can be defined, as npop().

There exists a pop() function that only pops one element.

> Maybe BinaryHeap can be moved to the collections module.

Agreed.

> array(map(...)) is so common that an amap(...) can be considered.

I don't know.

> A helper function like this one can be useful: auto binaryHeap(alias
> less="a<  b", Range)(Range items) { return BinaryHeap!(Range,
> less)(items); } Note that the 'less' is before the Range, so in that
> program you can write: auto heap = binaryHeap!("b<  a")(s2w); Instead
> of: auto heap = BinaryHeap!(Block[], "b<  a")(s2w); And in many cases
> you just write: auto heap = binaryHeap(items);

Sounds good.

> I have seen that this doesn't work, but something like this can be
> useful (but I am not sure, I don't love strings used this way):
> schwartzSort!("a.code.length")(r);
>
>
> In Python there is both list.sort() and sorted(), the second creates
> a new list. In my dlibs1 I have similar functions. They allow you to
> write: return schwartzSorted!((x){ return x.code.length;
> })(...items); Instead of: auto r = ...items; schwartzSort!((x){
> return x.code.length; })(items); return items; So I'd like sorted and
> schwartzSorted in Phobos2.

I'm not crazy about functions that return large arrays by value. I'd have sorted() return a range (a heap!) that lazily spans the input in sorted order.

> While debugging this program I have found that array(iota(0, 10)) is
> an uint[] instead of being an int[]. In most cases I prefer that
> syntax to produce an array of true ints. In the uncommon situations
> when I want an array of uints I can ask it with a syntax like
> array(iota(0U, 10U)). If this specification is not possible then I
> prefer iota to always yield signed values, because they are much
> *safer* in D.
>
>
> I'd like iota(100) to be the same as iota(0, 100), as in the Python
> range().

Sounds good.

> To improve a *lot* my debugging in code like this I'd like this line
> of code: writeln(tuple(['x': 1.0, 'y': 2.0], "hello", [[1, 2], [3,
> 4]])); To print (better): tuple(['x': 1.0, 'y': 2.0], "hello", [[1,
> 2], [3, 4]]) Or even: Tuple!(double[char], string, int[][])(['x':
> 1.0, 'y': 2.0], "hello", [[1, 2], [3, 4]]) instead of:
> Tuple!(double[char],string,int[][])([x:1, y:2], hello, 1 2 3 4)

I've never had a clear view on what the target audience for writeln() is. You seem to want it to output debug strings; I'm not sure that's the best way to purpose it.


Andrei
May 31, 2010
Andrei:

>Thanks for your notes.<

My pleasure, if you like them I will try to do this thing some more times.
There are probably tens or hundreds of small details that can be improved in Phobos. Some of such changes can improve the usage patterns. In past I have put some of them in Bugzilla.
One good way to find such possible improvements is to use Phobos, to write small programs, and keep eyes open.


>Looks like we're having two proposals.<

I am sceptic that this can be done with no compiler/language support to offer the good enough syntax sugar: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=110613

In my dlibs1 (for D1) I have implemented and then later sometimes used an expanded and improved an idea by Henning Hasemann shown in 2007, but this (you are free to use it if you want, code shown in this newsgroup page is free to use, I think):
- is not efficient
- you have to define the iteration variable types before this array comp
- the code that uses this array comp is not so easy to read
- this can't be done (in D1) for lazy comphrensions.


TA[] select(TA, TI, TC)(lazy TA mapper,
                        ref TI iter1, TC items1) {
    static if(is( TC == void[0] )) {
        return null;
    } else {
        auto aux1 = iter1; // save original iteration variable

        static if (HasLength!(TC)) {
            auto result = new TA[items1.length];

            uint i;
            static if (IsAA!(TC)) {
                foreach (k, v; items1) {
                    iter1 = k;
                    result[i] = mapper();
                    i++;
                }
            } else {
                // Then items1 is an iterable with attribute length
                // (an array, xrange, etc)
                // don't use foreach (i,el;items1), it's less general
                foreach (el; items1) {
                    iter1 = el;
                    result[i] = mapper();
                    i++;
                }
            }

            iter1 = aux1; // restore original iteration variable
            return result;
        } else {
            // Then items1 is an iterable object
            // when TA isn't an AA, geometric grow can be used to speed up
            ArrayBuilder!(TA) result;
            foreach (el; items1) {
                iter1 = el;
                result ~= mapper();
            }
            iter1 = aux1; // restore original iteration variable
            return result.toarray;
        }
    }
}


TA[] select(TA, TI, TC, TP)(lazy TA mapper,
                            ref TI iter1, TC items1,
                            lazy TP where) {
...

TA[] select(TA, TI1, TC1, TI2, TC2)(lazy TA mapper,
                                    ref TI1 iter1, TC1 items1,
                                    ref TI2 iter2, lazy TC2 items2) {
...

TA[] select(TA, TI1, TC1, TI2, TC2, TP)(lazy TA mapper,
                                        ref TI1 iter1, TC1 items1,
                                        ref TI2 iter2, lazy TC2 items2,
                                        lazy TP where) {
...



>There exists a pop() function that only pops one element.<

This is how it is implemented:

/**
Pops the largest element (according to the predicate $(D less)).
 */
    void pop()
    {
        enforce(_length);
        if (_length > 1)
        swap(_store.front, _store[_length - 1]);
        --_length;
        percolateDown(_store[0 .. _length]);
    }

I'd like it to also return the popped item, a ElementType!Range, is this possible? Popping one item out is one of the most common operations I have to perform on an heap.


>>array(map(...)) is so common that an amap(...) can be considered.<<

>I don't know.<

A too much long list of function (that is a too much large API) is bad, but I have found that for the most common higher order functions (map and filter, they are common because array comps aren't present) I like a short cut for the eager version, amap/afilter. But they are not essential, we can survive without them :-)


>I'm not crazy about functions that return large arrays by value. I'd have sorted() return a range (a heap!) that lazily spans the input in sorted order.<

When I need only the few items in sorted order I can use just pop(n), or many pop().
Functional languages return data copies, but they are sometimes lazy (Haskell) or thy try to avoid using arrays and use more functional-friendly data structures that reduce the need for copying lot of data.

sorted() and schwartzSorted() can be handy because they can be used as expressions in a functional style. You can write:

foreach (x; sorted(items)) {...

Instead of:

auto sortedItems = items.dup;
sortedItems.sorted();
foreach (x; sortedItems) {...

If items is long then sorted() is not the top efficiency in both memory and time, but in many situations you don't have many items. Most arrays are short. If you have a 5 items long array and the items are small (like numbers) then using sorted() is not so bad unless it's in the middle of a critical piece of code. And in this case using a standard sort is probably wrong anyway.

So sorted/schwartzSorted are not for every situation, they are more for situations where you prefer handy and short code, and you don't need max performance. You don't have to abuse them, as most other things.


>I've never had a clear view on what the target audience for writeln() is. You seem to want it to output debug strings; I'm not sure that's the best way to purpose it.<

Usages of the printing functions:
- To debug code. For this purpose the text shown has to be human-readable, the writeln has to be as automatic as possible (to reduce time needed to add the printing statements), and the text shown has to be "nice" to show the data types but not too much noisy, otherwise the text can become useless. There are more modern ways to show data structures, even GUI-based, but having a fall-back strategy with a good writeln is good.
- To show output in small script-like programs or medium command line programs. I think this is the same case as the debug code one.
- To print a lot of numbers or simple symbols, for later processing with other programs. In this case printf() is better because it's faster than writeln.
- To print many strings. In this case in D printf()/puts() can be suboptimal or unfit. Some very simple, very fast and not templated function similar to puts() but designed for D strings.
- For (textual) serialization? In this case it's better to use functions more specialized for this purpose, and to avoid the writeln.


So I don't see why it's better for this command:
  writeln(tuple(['x': 1.0, 'y': 2.0], "hello", [[1, 2], [3, 4]]));

To print:
  Tuple!(double[char],string,int[][])([x:1, y:2], hello, 1 2 3 4)

Instead of something more fitter for humans, that can show the things well, as:
  tuple(['x': 1.0, 'y': 2.0], "hello", [[1, 2], [3, 4]])
Or:
  Tuple!(double[char], string, int[][])(['x': 1.0, 'y': 2.0], "hello", [[1, 2], [3, 4]])

Bye,
bearophile
May 31, 2010
> I'd like it to also return the popped item, a ElementType!Range, is this possible? Popping one item out is one of the most common operations I have to perform on an heap.

I have read some pages, trying to understand why your pop doesn't return the item.

In a page I have read:
>Pop returns void for the sake of speed: SGI FAQ, and to prevent exceptions that may be thrown by the objects copy constructor.<


A quotation from this page: http://www.sgi.com/tech/stl/FAQ.html
>Why do the pop member functions return void? All of the STL's pop member functions (pop_back in vector, list, and deque; pop_front in list, slist, and deque; pop in stack, queue, and priority_queue) return void, rather than returning the element that was removed. This is for the sake of efficiency. If the pop member functions were to return the element that was removed then they would have to return it by value rather than by reference. (The element is being removed, so there wouldn't be anything for a reference to point to.) Return by value, however, would be inefficient; it would involve at least one unnecessary copy constructor invocation. The pop member functions return nothing because it is impossible for them to return a value in a way that is both correct and efficient.<


Time ago I have suggested about a possible enum boolean value defined inside nonvoid functions, that is true/false if the function result is used or not:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=97424
So this value can be used with a static if to compute the result. So such functions are like templates, they get copied in two (generic function pointers have to refer to the nornal version of the function that computes and returns the result).

A simpler solution is to use two different names for two functions. The most common operation is to pop an item from a heap and then to use it. So the pop() can return the popped item. And then the heap can define another method that just pops the top item and doesn't return it, it can be named for example "drop", something similar to this (I don't know if this is right):

/**
Pops the largest element (according to the predicate $(D less))
and returns it.
 */
    ElementType!Range pop()
    {
        enforce(_length);
        auto tmp = _store.front;
        enforce(_length);
        if (_length > 1)
            swap(_store.front, _store[_length - 1]);
        --_length;
        percolateDown(_store[0 .. _length]);
        return tmp;
    }

/**
Drops the largest element (according to the predicate $(D less)).
 */
    void drop()
    {
        enforce(_length);
        if (_length > 1)
            swap(_store.front, _store[_length - 1]);
        --_length;
        percolateDown(_store[0 .. _length]);
    }


Bye,
bearophile
June 01, 2010
Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Yah, there's no argument that infinite ranges must be allowed by a n-way cross-product. It reminds one of Cantor's diagonalization, just in several dimensions. Shouldn't be very difficult, but it only works if all ranges except one are forward ranges (one can be an input range).

Might I coerce you into indulging some more detail on this idea? I'm
afraid my knowledge of the diagonal method is sadly lacking, and some
reading on the subject did not give me satisfactory understanding of
its application in the discussed problem.

Way I thought of doing it is save the highest position this far of each
range, then in popFront see if we're past it. If we are, reset this
range, and pop from the next range up, recursively.

-- 
Simen
June 01, 2010
On 05/31/2010 08:07 PM, Simen kjaeraas wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>
>> Yah, there's no argument that infinite ranges must be allowed by a
>> n-way cross-product. It reminds one of Cantor's diagonalization, just
>> in several dimensions. Shouldn't be very difficult, but it only works
>> if all ranges except one are forward ranges (one can be an input range).
>
> Might I coerce you into indulging some more detail on this idea? I'm
> afraid my knowledge of the diagonal method is sadly lacking, and some
> reading on the subject did not give me satisfactory understanding of
> its application in the discussed problem.
>
> Way I thought of doing it is save the highest position this far of each
> range, then in popFront see if we're past it. If we are, reset this
> range, and pop from the next range up, recursively.

I thought about this some more, and it's more difficult and more interesting than I thought.

Cantor enumerated rational numbers the following way: first come all fractions that have numerator + denominator = 1. That's only one rational number, 1/1. Then come all fractions that have num + denom = 2. That gives 1/2 and 2/1. Then come all fractions that have num + denom = 3, and so on.

Using this enumeration method he proved that rational numbers are countable so in some way they are not more "numerous" than natural numbers, which was an important result.

With ranges, the trick should be similar: although individual ranges may be infinite, they are composed in a simple, countable manner.

Generalizing Cantor's enumeration technique to n ranges, note that the enumeration goes through elements such that the sum of their offsets from the beginning of the ranges is constant.

So for two ranges, we first select pairs that have their offsets sum to 0. That is (0, 0). Then we select pairs of offsets summing to 1: (0, 1) and (1, 0). Then same for 2: (0, 2), (1, 1), (2, 0). And so on.

The problem is that in order to make sure offsets are constant, forward ranges are not enough. If all ranges involved had random access, the problem would be trivially solvable. The trick is pushing that envelope; for example, it's possible to make things work with one forward range and one random-access range, and so on.

This is bound to be an interesting problem. Please discuss any potential solutions here.


Andrei