Overhauling the notion of output range (page 2) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Overhauling the notion of output range (page 2)

July 12, 2010

Re: Overhauling the notion of output range

Posted by Steven Schveighoffer
in reply to Andrei Alexandrescu

Steven Schveighoffer

Posted in reply to Andrei Alexandrescu

On Mon, 12 Jul 2010 11:05:33 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 07/12/2010 09:59 AM, Steven Schveighoffer wrote:
>> If I always have to do something like this in order to append a single
>> element:
>>
>> put(r, (&elem)[0..1]);
>
> No, the library does that. Look here:
>
> http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d#L306

So you're saying it's not ok for an output range to support appending a single element, but it's ok for put to support appending a single element?

Well, all that will end up happening is cases where appending a single element is the only possibility will produce overloaded add functions, one that takes a single element, and one that takes an array.  The one that takes an array will look like this:

foreach(e; arr)
   add(e);

I can tell you this for sure, because it's exactly what's in many dcollections classes.

So what happens when you call put(r, e) for one of these output classes?  Instead of just calling add(e), it calls (add((&e)[0..1])) which in turn goes through some needless loop, which then ends up calling add(e).  I don't see why this is preferable.

-Steve

July 12, 2010

Re: Overhauling the notion of output range

Posted by torhu
in reply to Steven Schveighoffer

torhu

Posted in reply to Steven Schveighoffer

On 12.07.2010 18:48, Steven Schveighoffer wrote:
[...]
> So what happens when you call put(r, e) for one of these output
> classes? Instead of just calling add(e), it calls (add((&e)[0..1]))
> which in turn goes through some needless loop, which then ends up
> calling add(e).  I don't see why this is preferable.

put(r, e) prefers to call r.put(e) for single element adds.  Doesn't that take care of it?

July 12, 2010

Re: Overhauling the notion of output range

Posted by Andrei Alexandrescu
in reply to Steven Schveighoffer

Andrei Alexandrescu

Posted in reply to Steven Schveighoffer

On 07/12/2010 11:48 AM, Steven Schveighoffer wrote:
> On Mon, 12 Jul 2010 11:05:33 -0400, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> On 07/12/2010 09:59 AM, Steven Schveighoffer wrote:
>>> If I always have to do something like this in order to append a single
>>> element:
>>>
>>> put(r, (&elem)[0..1]);
>>
>> No, the library does that. Look here:
>>
>> http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d#L306
>>
>
> So you're saying it's not ok for an output range to support appending a
> single element, but it's ok for put to support appending a single element?

Yes. The point is that with a delegate you must choose between accepting E and E[]. Given the constraint, it's better for everyone to accept E[] and let put() take care of the occasional E by doing the wraparoo (&elem)[0..1].

> Well, all that will end up happening is cases where appending a single
> element is the only possibility will produce overloaded add functions,
> one that takes a single element, and one that takes an array. The one
> that takes an array will look like this:
>
> foreach(e; arr)
> add(e);

That's fine. the point is that if you put this loop on the wrong side of the delegate, it's much less efficient.

> I can tell you this for sure, because it's exactly what's in many
> dcollections classes.
>
> So what happens when you call put(r, e) for one of these output classes?
> Instead of just calling add(e), it calls (add((&e)[0..1])) which in turn
> goes through some needless loop, which then ends up calling add(e). I
> don't see why this is preferable.

Ah, I see. There is a confusion. The array restriction is only for delegates. For straight ranges, you should accept individual Es. Let me copy the current definition of isOutputRange:

template isOutputRange(R, E)
{
    enum bool isOutputRange = is(typeof(
    {
        R r;
        E e;
        r.put(e);          // can write element to range
    }()))
        ||
    isInputRange!R && is(typeof(
    {
        R r;
        E e;
        r.front = e;       // can assign to the front of range
    }()))
        ||
        is(typeof(
    {
        R r;
        E[] es;
        r(es);
    }()));
}

Notice how the E[] requirement is for delegates only.


Andrei

July 12, 2010

Re: Overhauling the notion of output range

Posted by Steven Schveighoffer
in reply to torhu

Steven Schveighoffer

Posted in reply to torhu

On Mon, 12 Jul 2010 13:43:18 -0400, torhu <no@spam.invalid> wrote:

> On 12.07.2010 18:48, Steven Schveighoffer wrote:
> [...]
>> So what happens when you call put(r, e) for one of these output
>> classes? Instead of just calling add(e), it calls (add((&e)[0..1]))
>> which in turn goes through some needless loop, which then ends up
>> calling add(e).  I don't see why this is preferable.
>
> put(r, e) prefers to call r.put(e) for single element adds.  Doesn't that take care of it?

No, dcollections doesn't define put.  But you could take a delegate to add.  The question is, why the prejudice against the single element version when using a delegate as an output range?

The fact that an output range can define a put function that takes a single element, but you can't use a delegate just makes no sense to me.

-Steve

July 12, 2010

Re: Overhauling the notion of output range

Posted by Andrei Alexandrescu
in reply to Steven Schveighoffer

Andrei Alexandrescu

Posted in reply to Steven Schveighoffer

On 07/12/2010 12:55 PM, Steven Schveighoffer wrote:
> On Mon, 12 Jul 2010 13:43:18 -0400, torhu <no@spam.invalid> wrote:
>
>> On 12.07.2010 18:48, Steven Schveighoffer wrote:
>> [...]
>>> So what happens when you call put(r, e) for one of these output
>>> classes? Instead of just calling add(e), it calls (add((&e)[0..1]))
>>> which in turn goes through some needless loop, which then ends up
>>> calling add(e). I don't see why this is preferable.
>>
>> put(r, e) prefers to call r.put(e) for single element adds. Doesn't
>> that take care of it?
>
> No, dcollections doesn't define put. But you could take a delegate to
> add. The question is, why the prejudice against the single element
> version when using a delegate as an output range?
>
> The fact that an output range can define a put function that takes a
> single element, but you can't use a delegate just makes no sense to me.
>
> -Steve

Efficiency?

Andrei

July 12, 2010

Re: Overhauling the notion of output range

Posted by dsimcha
in reply to Andrei Alexandrescu

dsimcha

Posted in reply to Andrei Alexandrescu

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> On 07/12/2010 12:55 PM, Steven Schveighoffer wrote:
> > On Mon, 12 Jul 2010 13:43:18 -0400, torhu <no@spam.invalid> wrote:
> >
> >> On 12.07.2010 18:48, Steven Schveighoffer wrote:
> >> [...]
> >>> So what happens when you call put(r, e) for one of these output
> >>> classes? Instead of just calling add(e), it calls (add((&e)[0..1]))
> >>> which in turn goes through some needless loop, which then ends up
> >>> calling add(e). I don't see why this is preferable.
> >>
> >> put(r, e) prefers to call r.put(e) for single element adds. Doesn't
> >> that take care of it?
> >
> > No, dcollections doesn't define put. But you could take a delegate to add. The question is, why the prejudice against the single element version when using a delegate as an output range?
> >
> > The fact that an output range can define a put function that takes a single element, but you can't use a delegate just makes no sense to me.
> >
> > -Steve
> Efficiency?
> Andrei

But efficiency only matters some of the time, assuming the kind of inefficiency in question is small constant term inefficiency (as is the case here) and not O(N!) inefficiency or 1000-fold constant term inefficiency.

Calling a delegate once per element is inefficient, but not **that** inefficient, and it's convenient at times, such as when the delegate will be used in contexts other than as an output range.  I think inefficiency is a poor excuse for not supporting this, and the users of Phobos should decide on a case-by-case basis whether calling a delegate for every element is efficient enough.

The following benchmark takes a whopping 830 milliseconds on my computer, for a **hundred million** iterations:

import std.stdio, std.perf;

void main() {
    void foo() {}

    auto pc = new PerformanceCounter;
    pc.start;

    auto del = &foo;
    foreach(i; 0..100_000_000) {
        del();
    }

    pc.stop;
    writeln(pc.milliseconds);
}

July 12, 2010

Re: Overhauling the notion of output range

Posted by Steven Schveighoffer
in reply to Andrei Alexandrescu

Steven Schveighoffer

Posted in reply to Andrei Alexandrescu

On Mon, 12 Jul 2010 13:49:50 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 07/12/2010 11:48 AM, Steven Schveighoffer wrote:
>> On Mon, 12 Jul 2010 11:05:33 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> On 07/12/2010 09:59 AM, Steven Schveighoffer wrote:
>>>> If I always have to do something like this in order to append a single
>>>> element:
>>>>
>>>> put(r, (&elem)[0..1]);
>>>
>>> No, the library does that. Look here:
>>>
>>> http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d#L306
>>>
>>
>> So you're saying it's not ok for an output range to support appending a
>> single element, but it's ok for put to support appending a single element?
>
> Yes. The point is that with a delegate you must choose between accepting E and E[]. Given the constraint, it's better for everyone to accept E[] and let put() take care of the occasional E by doing the wraparoo (&elem)[0..1].

But given a delegate that takes a single element, there's no way to wrap it so it can be an output range.  Yet such a delegate can easily be something that outputs something.

Indeed, a delegate that takes a string takes a single element, but because a string happens to be defined as a range of chars, it passes the test for output ranges.

I could loop on an array of strings of one character, and output that to a valid output range no problem.  The only thing that solves this problem correctly is buffering.

What if I have my own container types that are large chunks of data, but don't happen to define the input range primitives?  Why should I be artificially prevented from using those as input to output ranges?

Really to me, you are saying, "I want your delegate to be efficient", but you defined something that is related to that in a small set of circumstances (when the arrays being passed in are large).

>> Well, all that will end up happening is cases where appending a single
>> element is the only possibility will produce overloaded add functions,
>> one that takes a single element, and one that takes an array. The one
>> that takes an array will look like this:
>>
>> foreach(e; arr)
>> add(e);
>
> That's fine. the point is that if you put this loop on the wrong side of the delegate, it's much less efficient.

Here's a proposal for put/isOutputRange which would solve my problem and not have any for loops in it:


template isOutputRange(R, E)
{
    enum bool isOutputRange = is(typeof(
    {
        R r;
        E e;
        r.put(e);          // can write element to range
    }()))
        ||
    isInputRange!R && is(typeof(
    {
        R r;
        E e;
        r.front = e;       // can assign to the front of range
    }()))
        ||
        is(typeof(
    {
        R r;
        E[] es;
        r(es);
    }()))
        ||
        is(typeof(
    {
        R r;
        E es;
        r(es);
    }()));
}

void put(R, E)(ref R r, E e) if (isOutputRange!(R, E))
{
    static if (!isArray!R && is(typeof(r.put(e))))
    {
        r.put(e);
    }
    else static if (isInputRange!R && is(typeof(r.front = e)))
    {
        r.front = e;
        r.popFront();
    }
    else static if (is(typeof(r(e))))
    {
        r(e);
    }
    else
    {
        static assert(false);
    }
}

>
>> I can tell you this for sure, because it's exactly what's in many
>> dcollections classes.
>>
>> So what happens when you call put(r, e) for one of these output classes?
>> Instead of just calling add(e), it calls (add((&e)[0..1])) which in turn
>> goes through some needless loop, which then ends up calling add(e). I
>> don't see why this is preferable.
>
> Ah, I see. There is a confusion. The array restriction is only for delegates. For straight ranges, you should accept individual Es.

Why the discrepency?  A naive coder can define inefficient ranges just as well as he can define inefficient delegates.

-Steve

July 12, 2010

Re: Overhauling the notion of output range

Posted by Christian Kamm
in reply to Andrei Alexandrescu

Christian Kamm

Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu wrote:
> Yes. The point is that with a delegate you must choose between accepting E and E[]. Given the constraint, it's better for everyone to accept E[] and let put() take care of the occasional E by doing the wraparoo (&elem)[0..1].

I don't think the current implementation of put allows passing E and E[] correctly:

void put(R, E)(ref R r, E e) if (isOutputRange!(R, E))
{
  if (isArray!E &&  is(typeof(r(e))))
  {
    r(e);
  }
  else static if (is(typeof(r(new E[]))))
  {
    r((&e)[0 .. 1]);
  }
  else
  {
     static assert(false);
  }
}

Example: typeof(fn) == void function(int[][]).
put(fn, int[]) -> ok
put(fn, int[][]) -> no match, isOutputRange!(fn, int[][]) fails.

You'll need a
void put(R, E)(ref R r, E[] a) if (isOutputRange!(R, E))
overload get the desired behavior.

Implementing that one generically also makes explicit what Steve was saying: front-assignable ranges and classic output ranges are stuck using put with a single element in a loop. Associating different performance tradeoffs with abstractions that should be interchangeable sounds odd.

In the end it comes down to the question why R.put(E) and void R(E[]) should both be output ranges for E. They should either take E or E[].

I'd go with void foo(E) being an output range for E and define the overloads
void put(ref R r, T t) if (isOutputRange!(R, T[])) // makes 1-element slice
void put(ref R r, E e) if (isOutputRange!(R, E))
void put(ref R r, A[] a) if (isOutputRange!(R, A)) // uses foreach

Christian

July 12, 2010

Re: Overhauling the notion of output range

Posted by Andrei Alexandrescu
in reply to dsimcha

Andrei Alexandrescu

Posted in reply to dsimcha

On 07/12/2010 01:31 PM, dsimcha wrote:
> But efficiency only matters some of the time, assuming the kind of inefficiency in
> question is small constant term inefficiency (as is the case here) and not O(N!)
> inefficiency or 1000-fold constant term inefficiency.

Historically the constant was large.

> Calling a delegate once per element is inefficient, but not **that** inefficient,
> and it's convenient at times, such as when the delegate will be used in contexts
> other than as an output range.  I think inefficiency is a poor excuse for not
> supporting this, and the users of Phobos should decide on a case-by-case basis
> whether calling a delegate for every element is efficient enough.

I guess that's fine, though I clearly recall that doFormat was, at the time I decided to rewrite it (2008 or so), virtually unusable for serious workloads. I don't know to what extent improvement in the compiler have eroded that margin.


Andrei

July 12, 2010

Re: Overhauling the notion of output range

Posted by dsimcha
in reply to Andrei Alexandrescu

dsimcha

Posted in reply to Andrei Alexandrescu

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> On 07/12/2010 01:31 PM, dsimcha wrote:
> > But efficiency only matters some of the time, assuming the kind of inefficiency in question is small constant term inefficiency (as is the case here) and not O(N!) inefficiency or 1000-fold constant term inefficiency.
> Historically the constant was large.

I'm not saying don't encourage the use of delegates that take ranges where it makes sense, such as for formatting text.  I'm just saying support both and don't force their use even where it doesn't matter.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation