View mode: basic / threaded / horizontal-split · Log in · Help
June 02, 2013
Re: A Small Contribution to Phobos
On 6/2/13 9:20 AM, bearophile wrote:
> Andrei Alexandrescu:
>
>> [1, 2, 3, 4].map!(n => n.writeln).reduce;
>
> I have to shot this down for many reasons:
>
> I think it's better to give that final function a different name (like
> "consume" or something like that) because it's used for very different
> purposes and it returns nothing. Re-using the name "reduce" doesn't
> reduce the amount of Phobos lines of code, it doesn't make the user code
> simpler to understand, it's more obscure because it's more semantically
> overloaded, and it's not more easy to find in the documentation by the
> future D users.

Au contraire, there are many advantages. Using "reduce" leverages a 
well-understood notion instead of introducing a new one. There is less 
need for documentation, motivation, and explanations. "Reduce with no 
function simply spans the entire range." Builds on an already-eager 
construct par excellence instead of adding a new one that must be 
remembered and distinguished from the lazy constructs.

Actually my first thought when I saw consume() was to look up reduce, 
thinking, "how do I reduce a range to nothing"? Because that's the goal. 
Reduce is the obvious choice here.

> Function names are not language keywords, packing different purposes
> in the same name as "static" doesn't give any advantage, and only
> disadvantages.

Strawman argument.


Andrei
June 02, 2013
Re: A Small Contribution to Phobos
On Sunday, 2 June 2013 at 13:07:18 UTC, Andrei Alexandrescu wrote:
> On 6/2/13 1:58 AM, Meta wrote:
>>> For reference type ranges and input ranges which are not 
>>> forward
>>> ranges, this
>>> will consume the range and return nothing.
>>
>> I originally wrote it to accept forward ranges and use save, 
>> but I
>> wanted to make it as inclusive as possible. I guess I 
>> overlooked the
>> case of ref ranges.
> [snip]
>
> Thanks for sharing your ideas.
>
> I think consuming all of a range evaluating front and doing 
> nothing should be the role of reduce with only one parameter 
> (the range). That overload would take the range to be 
> "exhausted" and return void. Thus your example becomes:
>
> [1, 2, 3, 4].map!(n => n.writeln).reduce;
>
>
> Andrei

map being lazy, this can really do all kind of different stuff.
June 02, 2013
Re: A Small Contribution to Phobos
On 6/2/13 11:41 AM, monarch_dodra wrote:
> On Sunday, 2 June 2013 at 13:07:18 UTC, Andrei Alexandrescu wrote:
>> [1, 2, 3, 4].map!(n => n.writeln).reduce;
>>
>>
>> Andrei
>
> One of the problems with using "map" for something such as this, is that
> the resulting object is not a range, since "front" now returns void, and
> a range *must* return a value. So that code will never compile (since
> reduce will ask for at least input range). Heck, I think we should make
> it so that map refuses to compile with an operator that returns void. It
> doesn't make much sense as-is.

Hm, interesting. I'm destroyed.

> Usage has to be something like:
>
> map!((n) {n.writeln; return n;})
>
> which is quite clunky. The idea of a "tee" range, that takes n, runs an
> operation on it, and then returns said n as is becomes really very
> useful (and more idiomatic). [1, 2, 3, 4].tee!(n => n.writeln). There!
> perfect :)
>
> I've dabbled in implementing such a function, but there are conceptual
> problems: If the user calls "front" twice in a row, then should "fun" be
> called twice? If user popsFront without calling front, should "fun" be
> called at all?
>
> Should it keep track of calls, to guarantee 1, and only 1, call on each
> element?
>
> I'm not sure there is a correct answer to that, which is one of the
> reasons I haven't actually submitted anything.

I think there is one answer that arguably narrows the design space 
appropriately: just like the Unix utility, tee should provide a hook 
that creates an exact replica of the (portion of the) range being 
iterated. So calling front several times is nicely out of the picture. 
The remaining tactical options are:

1. evaluate .front for the parent range once in its constructor and then 
every time right after forwarding popFront() to the parent range. This 
is a bit "eager" because the constructor evaluates .front even if the 
client never does.

2. evaluate .front for the parent range just before forwarding 
popFront() to parent. This will call front even though the client 
doesn't (which I think is fine).

3. keep a bool that is set by constructor and popFront() and reset by 
front(). The bool makes sure front() is called if and only if the client 
calls it.

I started writing the options mechanically without thinking of the 
implications. Now that I'm done, I think 2 is by far the best.

> --------
>
> I don't think "argument-less reduce" should do what you describe, as it
> would be a bit confusing what the function does. 1-names; 1-operation,
> IMO. Users might accidentally think they are getting an additive
> reduction :(

Good point.

> I think a function called "walk", in line with "walkLength", would be
> much more appropriate, and make more sense to boot!
>
> But we run into the same problem... Should "walk" call front between
> each element? Both answers are correct, IMO.

That's why I'm thinking: the moment .front gets evaluated, we get into 
the realm of reduce.


Andrei
June 02, 2013
Re: A Small Contribution to Phobos
On Sunday, 2 June 2013 at 16:55:23 UTC, Andrei Alexandrescu wrote:
> On 6/2/13 11:41 AM, monarch_dodra wrote:
>> On Sunday, 2 June 2013 at 13:07:18 UTC, Andrei Alexandrescu 
>> wrote:
>>> [1, 2, 3, 4].map!(n => n.writeln).reduce;
>>>
>>>
>>> Andrei
>>
>> One of the problems with using "map" for something such as 
>> this, is that
>> the resulting object is not a range, since "front" now returns 
>> void, and
>> a range *must* return a value. So that code will never compile 
>> (since
>> reduce will ask for at least input range). Heck, I think we 
>> should make
>> it so that map refuses to compile with an operator that 
>> returns void. It
>> doesn't make much sense as-is.
>
> Hm, interesting. I'm destroyed.
>
>> Usage has to be something like:
>>
>> map!((n) {n.writeln; return n;})
>>
>> which is quite clunky. The idea of a "tee" range, that takes 
>> n, runs an
>> operation on it, and then returns said n as is becomes really 
>> very
>> useful (and more idiomatic). [1, 2, 3, 4].tee!(n => 
>> n.writeln). There!
>> perfect :)
>>
>> I've dabbled in implementing such a function, but there are 
>> conceptual
>> problems: If the user calls "front" twice in a row, then 
>> should "fun" be
>> called twice? If user popsFront without calling front, should 
>> "fun" be
>> called at all?
>>
>> Should it keep track of calls, to guarantee 1, and only 1, 
>> call on each
>> element?
>>
>> I'm not sure there is a correct answer to that, which is one 
>> of the
>> reasons I haven't actually submitted anything.
>
> I think there is one answer that arguably narrows the design 
> space appropriately: just like the Unix utility, tee should 
> provide a hook that creates an exact replica of the (portion of 
> the) range being iterated. So calling front several times is 
> nicely out of the picture. The remaining tactical options are:
>
> 1. evaluate .front for the parent range once in its constructor 
> and then every time right after forwarding popFront() to the 
> parent range. This is a bit "eager" because the constructor 
> evaluates .front even if the client never does.
>
> 2. evaluate .front for the parent range just before forwarding 
> popFront() to parent. This will call front even though the 
> client doesn't (which I think is fine).
>
> 3. keep a bool that is set by constructor and popFront() and 
> reset by front(). The bool makes sure front() is called if and 
> only if the client calls it.
>
> I started writing the options mechanically without thinking of 
> the implications. Now that I'm done, I think 2 is by far the 
> best.

I think I just had a good idea. First, we introduce "cached": 
cached will take the result of front, but only evaluate it once. 
This is a good idea in and out of itself, and should take the 
place of ".array()" in UFCS chains. It can store the result of an 
operation, but keeps the lazy iteration semantic. That's a win 
for functional programming right there.

It would be most convenient right after an expansive call, such 
as after a map or whatnot.

The semantic of "cached" would be:
"eagerly calls front once, always once, and exactly once, and 
stores the result. Calling front on cached returns said result. 
calling popFront repeats operation".

From there, "tee", is nothing more than "calls funs on the front 
element every time front is called, then returns front".

From there, users can user either of:

MyRange.tee!foo(): This calls foo on every front element, and 
several times is front gets called several times.
MyRange.tee!foo().cached(): This calls foo on every front 
element, but only once, and guaranteed at least once, if it gets 
iterated.

>> --------
>>
>> I don't think "argument-less reduce" should do what you 
>> describe, as it
>> would be a bit confusing what the function does. 1-names; 
>> 1-operation,
>> IMO. Users might accidentally think they are getting an 
>> additive
>> reduction :(
>
> Good point.
>
>> I think a function called "walk", in line with "walkLength", 
>> would be
>> much more appropriate, and make more sense to boot!
>>
>> But we run into the same problem... Should "walk" call front 
>> between
>> each element? Both answers are correct, IMO.
>
> That's why I'm thinking: the moment .front gets evaluated, we 
> get into the realm of reduce.

Combined with my "cached" proposal, the problem is solved I 
think: "walk" does not call front, it merely pops. But, if 
combined with cached, then cache *will*, call front. Once and 
exactly once.

This will call foo on all elements of my range (once exactly 
once):

MyRange.tee!foo().cached().walk();

--------

Unless I'm missing something, it looks like a sweet spot between 
functionality, modularity, and even efficiency...?
June 02, 2013
Re: A Small Contribution to Phobos
On Sunday, 2 June 2013 at 18:43:44 UTC, monarch_dodra wrote:
> On Sunday, 2 June 2013 at 16:55:23 UTC, Andrei Alexandrescu 
> wrote:
>> On 6/2/13 11:41 AM, monarch_dodra wrote:
>>> On Sunday, 2 June 2013 at 13:07:18 UTC, Andrei Alexandrescu 
>>> wrote:
>>>> [1, 2, 3, 4].map!(n => n.writeln).reduce;
>>>>
>>>>
>>>> Andrei
>>>
>>> One of the problems with using "map" for something such as 
>>> this, is that
>>> the resulting object is not a range, since "front" now 
>>> returns void, and
>>> a range *must* return a value. So that code will never 
>>> compile (since
>>> reduce will ask for at least input range). Heck, I think we 
>>> should make
>>> it so that map refuses to compile with an operator that 
>>> returns void. It
>>> doesn't make much sense as-is.
>>
>> Hm, interesting. I'm destroyed.
>>
>>> Usage has to be something like:
>>>
>>> map!((n) {n.writeln; return n;})
>>>
>>> which is quite clunky. The idea of a "tee" range, that takes 
>>> n, runs an
>>> operation on it, and then returns said n as is becomes really 
>>> very
>>> useful (and more idiomatic). [1, 2, 3, 4].tee!(n => 
>>> n.writeln). There!
>>> perfect :)
>>>
>>> I've dabbled in implementing such a function, but there are 
>>> conceptual
>>> problems: If the user calls "front" twice in a row, then 
>>> should "fun" be
>>> called twice? If user popsFront without calling front, should 
>>> "fun" be
>>> called at all?
>>>
>>> Should it keep track of calls, to guarantee 1, and only 1, 
>>> call on each
>>> element?
>>>
>>> I'm not sure there is a correct answer to that, which is one 
>>> of the
>>> reasons I haven't actually submitted anything.
>>
>> I think there is one answer that arguably narrows the design 
>> space appropriately: just like the Unix utility, tee should 
>> provide a hook that creates an exact replica of the (portion 
>> of the) range being iterated. So calling front several times 
>> is nicely out of the picture. The remaining tactical options 
>> are:
>>
>> 1. evaluate .front for the parent range once in its 
>> constructor and then every time right after forwarding 
>> popFront() to the parent range. This is a bit "eager" because 
>> the constructor evaluates .front even if the client never does.
>>
>> 2. evaluate .front for the parent range just before forwarding 
>> popFront() to parent. This will call front even though the 
>> client doesn't (which I think is fine).
>>
>> 3. keep a bool that is set by constructor and popFront() and 
>> reset by front(). The bool makes sure front() is called if and 
>> only if the client calls it.
>>
>> I started writing the options mechanically without thinking of 
>> the implications. Now that I'm done, I think 2 is by far the 
>> best.
>
> I think I just had a good idea. First, we introduce "cached": 
> cached will take the result of front, but only evaluate it 
> once. This is a good idea in and out of itself, and should take 
> the place of ".array()" in UFCS chains. It can store the result 
> of an operation, but keeps the lazy iteration semantic. That's 
> a win for functional programming right there.
>
> It would be most convenient right after an expansive call, such 
> as after a map or whatnot.
>
> The semantic of "cached" would be:
> "eagerly calls front once, always once, and exactly once, and 
> stores the result. Calling front on cached returns said result. 
> calling popFront repeats operation".
>
> From there, "tee", is nothing more than "calls funs on the 
> front element every time front is called, then returns front".
>
> From there, users can user either of:
>
> MyRange.tee!foo(): This calls foo on every front element, and 
> several times is front gets called several times.
> MyRange.tee!foo().cached(): This calls foo on every front 
> element, but only once, and guaranteed at least once, if it 
> gets iterated.
>
>>> --------
>>>
>>> I don't think "argument-less reduce" should do what you 
>>> describe, as it
>>> would be a bit confusing what the function does. 1-names; 
>>> 1-operation,
>>> IMO. Users might accidentally think they are getting an 
>>> additive
>>> reduction :(
>>
>> Good point.
>>
>>> I think a function called "walk", in line with "walkLength", 
>>> would be
>>> much more appropriate, and make more sense to boot!
>>>
>>> But we run into the same problem... Should "walk" call front 
>>> between
>>> each element? Both answers are correct, IMO.
>>
>> That's why I'm thinking: the moment .front gets evaluated, we 
>> get into the realm of reduce.
>
> Combined with my "cached" proposal, the problem is solved I 
> think: "walk" does not call front, it merely pops. But, if 
> combined with cached, then cache *will*, call front. Once and 
> exactly once.
>
> This will call foo on all elements of my range (once exactly 
> once):
>
> MyRange.tee!foo().cached().walk();
>
> --------
>
> Unless I'm missing something, it looks like a sweet spot 
> between functionality, modularity, and even efficiency...?

I like the idea of "cached" and it's certainly useful if you need 
to iterate a range multiple times or something like that, but I 
also think that 90% of the time the user is just going to want to 
do something simple such as printing every element, and I think 
the syntax "tee!(x => writeln(x)).cached.walk();" is both 
unnecessarily long and less efficient than simply:
consume!(x => writeln(x)); // Template parameter is optional

"consume" would always call front once per element even if no 
function is specified.
June 03, 2013
Re: A Small Contribution to Phobos
On 6/2/13 2:43 PM, monarch_dodra wrote:
> I think I just had a good idea. First, we introduce "cached": cached
> will take the result of front, but only evaluate it once. This is a good
> idea in and out of itself, and should take the place of ".array()" in
> UFCS chains.

Yah, cached() (better cache()?) should be nice. It may also offer 
lookahead, e.g. cache(5) would offer a non-standard lookahead(size_t n) 
up to 5 elements ahead.

>  From there, "tee", is nothing more than "calls funs on the front
> element every time front is called, then returns front".
>
>  From there, users can user either of:
>
> MyRange.tee!foo(): This calls foo on every front element, and several
> times is front gets called several times.
> MyRange.tee!foo().cached(): This calls foo on every front element, but
> only once, and guaranteed at least once, if it gets iterated.

I kinda dislike that tee() is hardly useful without cache.


Andrei
June 03, 2013
Re: A Small Contribution to Phobos
On Monday, 3 June 2013 at 02:31:00 UTC, Andrei Alexandrescu wrote:
> On 6/2/13 2:43 PM, monarch_dodra wrote:
>> I think I just had a good idea. First, we introduce "cached": 
>> cached
>> will take the result of front, but only evaluate it once. This 
>> is a good
>> idea in and out of itself, and should take the place of 
>> ".array()" in
>> UFCS chains.
>
> Yah, cached() (better cache()?) should be nice. It may also 
> offer lookahead, e.g. cache(5) would offer a non-standard 
> lookahead(size_t n) up to 5 elements ahead.

Hum... That'd be a whole different ballpark in terms of power, as 
opposed to the simple minded cached I had in mind.

But I think both can coexist anyway, so I see no problem with 
adding extra functionality.

>> From there, "tee", is nothing more than "calls funs on the 
>> front
>> element every time front is called, then returns front".
>>
>> From there, users can user either of:
>>
>> MyRange.tee!foo(): This calls foo on every front element, and 
>> several
>> times is front gets called several times.
>> MyRange.tee!foo().cached(): This calls foo on every front 
>> element, but
>> only once, and guaranteed at least once, if it gets iterated.
>
> I kinda dislike that tee() is hardly useful without cache.
>
>
> Andrei

I disagree. One thing a user could expect out of tee is to print 
on every access, just to see "which elements get pushed down the 
pipe, and in which order", as opposed to "just print my range". 
In particular, I don't see why tee would not mix with random 
access.

For example, with this program:

    auto r = [4, 3, 2, 1].tee!writeln();
    writeln("first sort (not sorted)");
    r.sort();
    writeln("second sort (already sorted)");
    r.sort();

I can see the output as:

first sort (not sorted)
2
1
1
2
1
3
1
1
3
2
2
1
2
4
1
1
4
2
2
1
3
3
2
3
2
1
3
2
4
3
second sort (already sorted)
3
4
3
2
3
2
1
2
1
2
1
3
2
4
3

which gives me a good idea of how costly the sort algorithm is.

It's a good way to find out if cache(d) or array should be 
inserted in my chain.
June 16, 2013
Re: A Small Contribution to Phobos
On Monday, 3 June 2013 at 06:58:00 UTC, monarch_dodra wrote:
> I disagree. One thing a user could expect out of tee is to

Hello.

I have created a pull request for tee: 
https://github.com/D-Programming-Language/phobos/pull/1348

(I did this based on discussion in Issue 9882, and was not aware 
of this present forum thread until monarch_dodra asked me to 
present my approach here).

My approach is to create an InputRange wrapper called TeeRange, 
which will call the user provided function with each element of 
the wrapped range during iteration.

One concept with this is that the user can pass a flag to specify 
whether the function should be called on popFront (default) or on 
front.  The following unittest from my change illustrates that 
distinction:

---
unittest
{
    // Manually stride to test different pipe behavior.
    void testRange(Range)(Range r)
    {
        const int strideLen = 3;
        int i = 0;
        typeof(Range.front) elem;
        while (!r.empty)
        {
            if (i % strideLen == 0)
            {
                elem = r.front();
            }
            r.popFront();
            i++;
        }
    }

    string txt = "abcdefghijklmnopqrstuvwxyz";

    int popCount = 0;
    auto pipeOnPop = tee!(a => popCount++)(txt);
    testRange(pipeOnPop);
    assert(popCount == 26);

    int frontCount = 0;
    auto pipeOnFront = tee!(a => frontCount++, false)(txt);
    testRange(pipeOnFront);
    assert(frontCount == 9);
}
---

Thanks,
irritate
June 16, 2013
Re: A Small Contribution to Phobos
On Sunday, 16 June 2013 at 13:39:35 UTC, irritate wrote:
>
> [SNIP]
>
> One concept with this is that the user can pass a flag to 
> specify whether the function should be called on popFront 
> (default) or on front.
>
> [SNIP]
>
> Thanks,
> irritate

What made you change the parameter of :
* "pipeOnPop = false" (eg call on front by default)
to
* "pipeOnFront = false" (eg call on pop by default)
?

I think pipe on front makes more sense, since you'll actually 
*see* the last value that was passed if the stream is terminated, 
eg:
[1, 2, 3, 4].tee!`writeln("processing: ", a).until!"a > 2"();

Which will output:
processing: 1
processing: 2
<end>
what about 3?

Or

processing something A...
processing something B...
core dump...
(stream was actually processing C, but we are fooled into 
investigating B...)

The *advantage* of pipeOnPop is that each element is piped at 
least once, and at most once, so that's good. However, it comes 
with pitfalls which (IMO) I think should be an explicit opt-in.
June 17, 2013
Re: A Small Contribution to Phobos
On Sunday, 16 June 2013 at 17:37:32 UTC, monarch_dodra wrote:
>
> What made you change the parameter of :
> * "pipeOnPop = false" (eg call on front by default)
> to
> * "pipeOnFront = false" (eg call on pop by default)
> ?

Actually the original version was pipeOnPop = true by default.  
So I didn't change the logic, I just renamed the variable to make 
the flag more clear that you would pass in pipeOnFront.yes to 
opt-in. (Also to coincide with your comment on the pull request).

> I think pipe on front makes more sense, since you'll actually 
> *see* the last value that was passed if the stream is 
> terminated, eg:
> [1, 2, 3, 4].tee!`writeln("processing: ", a).until!"a > 2"();
>
> Which will output:
> processing: 1
> processing: 2
> <end>
> what about 3?
. . .
> The *advantage* of pipeOnPop is that each element is piped at 
> least once, and at most once, so that's good.

I think they both have their advantages, which is why it's 
probably important to be able to control the behavior regardless 
of which one is default.  I choose pipeOnPop as the default 
because:

1)  It more closely tied in to the idea of tapping into the data 
as the wrapped range is iterated over (i.e. calling front 
multiple times won't call the function multiple times, as you 
said).
2)  I felt like I would personally use pipeOnPop more often, and 
figured the most commonly used case should not require the flag.

But I'm not especially tied to it, and could see making 
pipeOnFront default if that is preferred.

And actually, as I think about what I just wrote and also my 
previous unittest example, it almost feels like pipeOnPop gives 
you insight into the wrapped range itself, and pipeOnFront gives 
you more insight into how the range is used.  The incoming vs. 
the outgoing, as it were.  And I imagine I'd like to know more 
about what is coming in more of the time, but that's just my 
opinion.

irritate
1 2 3
Top | Discussion index | About this forum | D home