output ranges: by ref or by value? (page 2) - D Programming Language Discussion Forum

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Andrei Alexandrescu
in reply to Jason House

Jason House wrote:
> Andrei Alexandrescu Wrote:
> 
>> Jason House wrote:
>>> Andrei Alexandrescu wrote:
>>> 
>>>> Philippe Sigaud wrote:
>>>>> On Thu, Dec 31, 2009 at 16:47, Michel Fortin
>>>>> <michel.fortin@michelf.com> wrote:
>>>>> 
>>>>> On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
>>>>> 
>>>>> The question of this post is the following: should output
>>>>> ranges be passed by value or by reference? ArrayAppender uses
>>>>> an extra indirection to work properly when passed by value.
>>>>> But if we want to model built-in arrays' operator ~=, we'd
>>>>> need to request that all output ranges be passed by
>>>>> reference.
>>>>> 
>>>>> 
>>>>> I think modeling built-in arrays is the way to go as it makes
>>>>> for fewer things to learn. In fact, it makes it easier to learn
>>>>> ranges because you can begin by learning arrays, then
>>>>> transpose this knowledge to ranges which are more abstract
>>>>> and harder to grasp.
>>>>> 
>>>>> 
>>>>> I agree. And arrays may well be the most used range anyway.
>>>> Upon more thinking, I'm leaning the other way. ~= is a quirk of
>>>> arrays motivated by practical necessity. I don't want to
>>>> propagate that quirk into ranges. The best output range is one
>>>> that works properly when passed by value.
>>> I worry about a growing level of convention with ranges.  Another
>>> recent range thread discussed the need to call consume after a
>>> successful call to startsWith.  If I violated convention and had
>>> a range class, things would fail miserably.  There would be no
>>> need to consume after a successful call to startsWith and the
>>> range would have a random number of elements removed on an
>>> unsuccessful call to startsWith. I'm pretty sure that early
>>> discussions of ranges claimed that they could be either structs
>>> or classes, but in practice that isn't the case.
>> I am implementing right now a change in the range interface
>> mentioned in http://www.informit.com/articles/printerfriendly.aspx?p=1407357,
>> namely: add a function save() that saves the iteration state of a
>> range.
>> 
>> With save() in tow, class ranges and struct ranges can be used the
>> same way. True, if someone forgets to say
>> 
>> auto copy = r.save();
>> 
>> and instead says:
>> 
>> auto copy = r;
>> 
>> the behavior will indeed be different for class ranges and struct
>> ranges.
> 
> Or if they completely forgot that bit of convention and omit creating
> a variable called save... Also, doesn't use of save degrade
> performance for structs? Or does the inliner/optimizer remove the
> copy variable altogether?

It may be best to discuss this on an example:

/**
If $(D startsWith(r1, r2)), consume the corresponding elements off $(D
r1) and return $(D true). Otherwise, leave $(D r1) unchanged and
return $(D false).
 */
bool consume(R1, R2)(ref R1 r1, R2 r2)
        if (isForwardRange!R1 && isInputRange!R2)
{
    auto r = r1.save();
    while (!r2.empty && !r.empty && r.front == r2.front) {
        r.popFront();
        r2.popFront();
    }
    if (r2.empty) {
        r1 = r;
        return true;
    }
    return false;
}

For most structs, save() is very simple:

auto save() { return this; }

For classes, save() entails creating a new object:

auto save() { return new typeof(this)(this); }

If the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that.
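
A small sketch of that failure mode, assuming two hypothetical ranges (CountStruct and CountClass are illustrative names, not from this thread). save is written as a property here, which is how current Phobos checks for it; treat this as an illustration rather than the eventual library code:

```d
import std.range.primitives : isForwardRange;

// Hypothetical struct range: a plain copy already duplicates the state.
struct CountStruct
{
    int cur, end;
    @property bool empty() const { return cur >= end; }
    @property int front() const { return cur; }
    void popFront() { ++cur; }
    @property CountStruct save() { return this; }
}

// Hypothetical class range: assignment copies only the reference.
class CountClass
{
    int cur, end;
    this(int cur, int end) { this.cur = cur; this.end = end; }
    @property bool empty() const { return cur >= end; }
    @property int front() const { return cur; }
    void popFront() { ++cur; }
    @property CountClass save() { return new CountClass(cur, end); }
}

static assert(isForwardRange!CountStruct && isForwardRange!CountClass);

// Advances a plain copy of r, then reports what r itself sees.
int frontAfterAdvancingCopy(R)(R r)
{
    auto copy = r;      // save() forgotten
    copy.popFront();
    return r.front;
}

// Same, but the copy is taken through save.
int frontAfterAdvancingSave(R)(R r)
{
    auto copy = r.save;
    copy.popFront();
    return r.front;
}

unittest
{
    assert(frontAfterAdvancingCopy(CountStruct(0, 3)) == 0);    // copy was independent
    assert(frontAfterAdvancingCopy(new CountClass(0, 3)) == 1); // original advanced too
    assert(frontAfterAdvancingSave(CountStruct(0, 3)) == 0);
    assert(frontAfterAdvancingSave(new CountClass(0, 3)) == 0);
}
```

Only the class range without save behaves differently, which is the "fails to perform to spec" case.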

Anyway, it's not entirely a convention. I'll change isForwardRange to require the existence of save().


Andrei

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Michel Fortin
in reply to Andrei Alexandrescu

On 2010-01-01 15:53:42 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> save() is not only for classes. It also distinguishes input ranges from forward ranges. It's the primitive that STL didn't define but should have.

Right. I still maintain that it's a bad approach. I've written a lot of algorithms in C++ that worked with iterators, always assuming assignment would copy the state. Fortunately I didn't have to use input iterators with them, most of the time. But I did once or twice, and the thing was working slightly off.

What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior.
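
A rough sketch of what that could look like, assuming a disabled postblit is used to forbid copies. LineSource and countLines are hypothetical names made up for this illustration, not an existing API:

```d
import std.algorithm.mutation : move;

// Hypothetical one-pass range whose copying is disabled.
struct LineSource
{
    private string[] lines;
    @disable this(this);   // any attempt to copy is a compile-time error

    @property bool empty() const { return lines.length == 0; }
    @property string front() { return lines[0]; }
    void popFront() { lines = lines[1 .. $]; }
}

// Algorithms then have to take the range by ref (or receive it via move).
size_t countLines(ref LineSource src)
{
    size_t n;
    for (; !src.empty; src.popFront()) ++n;
    return n;
}

unittest
{
    auto src = LineSource(["a", "b", "c"]);
    static assert(!__traits(compiles, { auto copy = src; })); // copy is rejected
    auto moved = move(src);          // transferring ownership is still allowed
    assert(countLines(moved) == 3);
    assert(moved.empty);             // the by-ref call consumed the caller's range
}
```

The cost is visible in the signatures: every algorithm has to accept ref or a moved value, and the range cannot be stored or returned casually.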

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Andrei Alexandrescu
in reply to Michel Fortin

Michel Fortin wrote:
> On 2010-01-01 15:53:42 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
> 
>> save() is not only for classes. It also distinguishes input ranges from forward ranges. It's the primitive that STL didn't define but should have.
> 
> Right. I still maintain that it's a bad approach. I've written a lot of algorithms in C++ that worked with iterators, always assuming assignment would copy the state. Fortunately I didn't have to use input iterators with them, most of the time. But I did once or twice, and the thing was working slightly off.
> 
> What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior.

I tried that, it makes input ranges next to unusable. save() is an imperfect but workable solution.

Andrei

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Rainer Deyke
in reply to Andrei Alexandrescu

Andrei Alexandrescu wrote:
> If the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that.

Require that all ranges are structs.  If you want to implement a range as a class, use a wrapper struct that creates a new object in its postblit function.  The wrapper struct can be made generic and placed in the standard library.

Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user.
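
A minimal sketch of such a wrapper, assuming the wrapped class exposes a clone() method. Counter, ByValue and clone are hypothetical names chosen for this illustration; nothing like this is claimed to exist in the standard library:

```d
import std.range.primitives : isForwardRange;

// Hypothetical class-based range with an explicit clone operation.
class Counter
{
    int cur, end;
    this(int cur, int end) { this.cur = cur; this.end = end; }
    @property bool empty() const { return cur >= end; }
    @property int front() const { return cur; }
    void popFront() { ++cur; }
    Counter clone() const { return new Counter(cur, end); }
}

// Generic wrapper: every copy of the struct gets its own object.
struct ByValue(C) if (is(C == class))
{
    private C impl;
    this(C impl) { this.impl = impl; }
    this(this) { if (impl !is null) impl = impl.clone(); } // postblit

    @property bool empty() { return impl.empty; }
    @property auto front() { return impl.front; }
    void popFront() { impl.popFront(); }
    @property ByValue save() { return this; } // the copy triggers the postblit
}

static assert(isForwardRange!(ByValue!Counter));

unittest
{
    auto a = ByValue!Counter(new Counter(0, 3));
    auto b = a;          // postblit clones the underlying Counter
    b.popFront();
    assert(a.front == 0 && b.front == 1);
}
```

With this, a plain `auto b = a;` and `a.save` behave the same way, at the price of an allocation on every copy.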


-- 
Rainer Deyke - rainerd@eldwood.com

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Andrei Alexandrescu
in reply to Rainer Deyke

Rainer Deyke wrote:
> Andrei Alexandrescu wrote:
>> If the implementor of consume() forgets to call save(), the situation is
>> unpleasant albeit not catastrophic: for most struct ranges things will
>> continue to work, but for class ranges the function will fail to perform
>> to spec. I don't know how to improve on that.
> 
> Require that all ranges are structs.  If you want to implement a range
> as a class, use a wrapper struct that creates a new object in its
> postblit function.  The wrapper struct can be made generic and placed in
> the standard library.

That's a good idea, but it doesn't work with covariant return types. Those are needed for the container hierarchy that I'm working on.


Andrei

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Andrei Alexandrescu
in reply to Rainer Deyke

Rainer Deyke wrote:
> Andrei Alexandrescu wrote:
>> If the implementor of consume() forgets to call save(), the situation is
>> unpleasant albeit not catastrophic: for most struct ranges things will
>> continue to work, but for class ranges the function will fail to perform
>> to spec. I don't know how to improve on that.
> 
> Require that all ranges are structs.  If you want to implement a range
> as a class, use a wrapper struct that creates a new object in its
> postblit function.  The wrapper struct can be made generic and placed in
> the standard library.
> 
> Same performance as the current approach, slightly more effort on the
> part of the range implementor, much easier and less error-prone on the
> side of the range user.

Oh, besides it doesn't work for struct ranges that iterate one-pass streams.
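
One way to picture that case, as a sketch only: Stream below is a hypothetical stand-in for a socket or pipe, and StreamRange is a made-up struct range over it. Copying the struct cannot duplicate the stream, so no postblit can give the copy independent state:

```d
import std.range.primitives : isInputRange, isForwardRange;

// Hypothetical stand-in for a one-pass source such as a socket or pipe.
class Stream
{
    private int[] pending;
    this(int[] data) { pending = data; }
    bool eof() const { return pending.length == 0; }
    int peek() const { return pending[0]; }
    void skip() { pending = pending[1 .. $]; }
}

// Struct range over the stream; it has nothing sensible to offer as save().
struct StreamRange
{
    Stream source;
    @property bool empty() { return source.eof; }
    @property int front() { return source.peek; }
    void popFront() { source.skip(); }
}

static assert(isInputRange!StreamRange);
static assert(!isForwardRange!StreamRange);

unittest
{
    auto a = StreamRange(new Stream([1, 2, 3]));
    auto b = a;      // copies the struct, not the stream
    b.popFront();
    assert(a.front == 2); // a observes the element consumed through b
}
```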

Andrei

January 02, 2010

Re: output ranges: by ref or by value?

Posted by Steven Schveighoffer
in reply to Andrei Alexandrescu

On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Rainer Deyke wrote:
>> Andrei Alexandrescu wrote:
>>> If the implementor of consume() forgets to call save(), the situation is
>>> unpleasant albeit not catastrophic: for most struct ranges things will
>>> continue to work, but for class ranges the function will fail to perform
>>> to spec. I don't know how to improve on that.
>>  Require that all ranges are structs.  If you want to implement a range
>> as a class, use a wrapper struct that creates a new object in its
>> postblit function.  The wrapper struct can be made generic and placed in
>> the standard library.
>>  Same performance as the current approach, slightly more effort on the
>> part of the range implementor, much easier and less error-prone on the
>> side of the range user.
>
> Oh, besides it doesn't work for struct ranges that iterate one-pass streams.

What does save do in those cases?

-Steve

January 02, 2010

Re: output ranges: by ref or by value?

Posted by Michel Fortin
in reply to Andrei Alexandrescu

On 2010-01-01 17:54:12 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> Michel Fortin wrote:
>> What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior.
> 
> I tried that, it makes input ranges next to unusable.

I think I can see why. You can't have ref member and local variables like in C++, so it's pretty hard to use references.

> save() is an imperfect but workable solution.

save() is a workable but error-prone solution.

Perhaps we could mitigate this by making people more aware of the difference instead. Couldn't we rename "input range" to "input stream"?

Currently you have ranges that behave one way and ranges that behave the other way, which is confusing. Having a different name for each would emphasize there is a difference. With different names, you're guaranteed to get the "what's the difference?" question from newbies.

And it's simple to explain: "You can often use ranges and streams interchangeably, but for that to work you must use save() when you need a copy of the current state. Also, not all streams support save(). It's good practice to always use save() so that algorithms work for both ranges and streams."

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

January 02, 2010

Re: output ranges: by ref or by value?

Posted by Andrei Alexandrescu
in reply to Michel Fortin

Michel Fortin wrote:
> On 2010-01-01 17:54:12 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
> 
>> Michel Fortin wrote:
>>> What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior.
>>
>> I tried that, it makes input ranges next to unusable.
> 
> I think I can see why. You can't have ref member and local variables like in C++, so it's pretty hard to use references.
> 
> 
>> save() is an imperfect but workable solution.
> 
> save() is a workable but error-prone solution.
> 
> Perhaps we could mitigate this by making people more aware of the difference instead. Couldn't we rename "input range" to "input stream"?
> 
> Currently you have ranges that behave one way and ranges that behave the other way, which is confusing. Having a different name for each would emphasize there is a difference. With different names, you're guaranteed to get the "what's the difference?" question from newbies.
> 
> And it's simple to explain: "You can often use ranges and streams interchangeably, but for that to work you must use save() when you need a copy of the current state. Also, not all streams support save(). It's good practice to always use save() so that algorithms work for both ranges and streams."

That's an idea, and names are powerful, but I think it's reasonable to not expect miracles from that name change. It has disadvantages too - "input range" vs. "forward range"  clarifies there's a conceptual relationship between the two, whereas "streams" are different from anything else.

Andrei

January 02, 2010

Re: output ranges: by ref or by value?

Posted by Michel Fortin
in reply to Andrei Alexandrescu

On 2010-01-02 09:59:51 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> That's an idea, and names are powerful, but I think it's reasonable to not expect miracles from that name change. It has disadvantages too - "input range" vs. "forward range"  clarifies there's a conceptual relationship between the two, whereas "streams" are different from anything else.

I'm not expecting a miracle from it either, it'd just be much less confusing.

You could say that assignment of an input stream might or might not save its state (depending on the stream type), so you must call save() to save the state when working with streams, but ranges are guaranteed to save their state on assignment, thus behaving more predictably and just like arrays. So if you're working only with ranges, not streams, you never need to worry about save().

A similar option would be to have both input ranges and input streams:

* input range:  by value semantics, no need for save()
* input stream: by reference semantics

A pointer to an input range would thus automatically qualify as an input stream, so it's easy to give an input range to a function taking an input stream. Well, except for stack-allocated ranges in SafeD, for which you can't create a pointer. This pretty much breaks the idea, I think.


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

Copyright © 1999-2021 by the D Language Foundation