January 01, 2010
Jason House wrote:
> Andrei Alexandrescu Wrote:
> 
>> Jason House wrote:
>>> Andrei Alexandrescu wrote:
>>> 
>>>> Philippe Sigaud wrote:
>>>>> On Thu, Dec 31, 2009 at 16:47, Michel Fortin
>>>>> <michel.fortin@michelf.com> wrote:
>>>>> 
>>>>> On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
>>>>> 
>>>>> The question of this post is the following: should output
>>>>> ranges be passed by value or by reference? ArrayAppender uses
>>>>> an extra indirection to work properly when passed by value.
>>>>> But if we want to model built-in arrays' operator ~=, we'd
>>>>> need to request that all output ranges be passed by
>>>>> reference.
>>>>> 
>>>>> 
>>>>> I think modeling built-in arrays is the way to go as it leaves
>>>>> fewer things to learn. In fact, it makes it easier to learn
>>>>> ranges because you can begin by learning arrays, then
>>>>> transpose this knowledge to ranges which are more abstract
>>>>> and harder to grasp.
>>>>> 
>>>>> 
>>>>> I agree. And arrays may well be the most used range anyway.
>>>> Upon more thinking, I'm leaning the other way. ~= is a quirk of
>>>> arrays motivated by practical necessity. I don't want to
>>>> propagate that quirk into ranges. The best output range is one
>>>> that works properly when passed by value.
>>> I worry about a growing level of convention with ranges.  Another
>>> recent range thread discussed the need to call consume after a
>>> successful call to startsWith.  If I violated convention and had
>>> a range class, things would fail miserably.  There would be no
>>> need to consume after a successful call to startsWith and the
>>> range would have a random number of elements removed on an
>>> unsuccessful call to startsWith. I'm pretty sure that early
>>> discussions of ranges claimed that they could be either structs
>>> or classes, but in practice that isn't the case.
>> I am implementing right now a change in the range interface
>> mentioned in http://www.informit.com/articles/printerfriendly.aspx?p=1407357,
>> namely: add a function save() that saves the iteration state of a
>> range.
>> 
>> With save() in tow, class ranges and struct ranges can be used the
>> same way. True, if someone forgets to say
>> 
>> auto copy = r.save();
>> 
>> and instead says:
>> 
>> auto copy = r;
>> 
>> the behavior will indeed be different for class ranges and struct
>> ranges.
> 
> Or if they completely forget that bit of convention and omit creating
> a copy via save()... Also, doesn't use of save() degrade
> performance for structs? Or does the inliner/optimizer remove the
> copy variable altogether?

It may be best to discuss this on an example:

/**
If $(D startsWith(r1, r2)), consume the corresponding elements off $(D
r1) and return $(D true). Otherwise, leave $(D r1) unchanged and
return $(D false).
 */
bool consume(R1, R2)(ref R1 r1, R2 r2)
        if (isForwardRange!R1 && isInputRange!R2)
{
    auto r = r1.save();
    while (!r2.empty && !r.empty && r.front == r2.front) {
        r.popFront();
        r2.popFront();
    }
    if (r2.empty) {
        r1 = r;
        return true;
    }
    return false;
}

For most structs, save() is very simple:

auto save() { return this; }

For classes, save() entails creating a new object:

auto save() { return new typeof(this)(this); }

If the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that.

Anyway, it's not entirely a convention. I'll change isForwardRange to require the existence of save().
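
To make the struct/class difference concrete, here is a minimal sketch. CountStruct and CountClass are made-up ranges used only for illustration (they are not library types); the checks show what plain assignment does versus save() in each case.

```d
// Hypothetical struct range: copying the struct copies the iteration state.
struct CountStruct
{
    int cur, end;
    @property bool empty() const { return cur >= end; }
    @property int front() const { return cur; }
    void popFront() { ++cur; }
    auto save() { return this; }
}

// Hypothetical class range: assignment copies only the reference.
class CountClass
{
    int cur, end;
    this(int cur, int end) { this.cur = cur; this.end = end; }
    this(CountClass other) { this(other.cur, other.end); }
    @property bool empty() const { return cur >= end; }
    @property int front() const { return cur; }
    void popFront() { ++cur; }
    CountClass save() { return new CountClass(this); }
}

unittest
{
    // Struct range: forgetting save() still yields an independent copy.
    auto s = CountStruct(0, 5);
    auto sCopy = s;
    sCopy.popFront();
    assert(s.front == 0 && sCopy.front == 1);

    // Class range: forgetting save() aliases the same object,
    // so advancing the "copy" also advances the original.
    auto c = new CountClass(0, 5);
    auto cAlias = c;
    cAlias.popFront();
    assert(c.front == 1);

    // With save(), the class range behaves like the struct range.
    auto cSaved = c.save();
    cSaved.popFront();
    assert(c.front == 1 && cSaved.front == 2);
}
```

Something like `dmd -unittest -main -run sketch.d` (file name assumed) should run the checks.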


Andrei
January 01, 2010
On 2010-01-01 15:53:42 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> save() is not only for classes. It also distinguishes input ranges from forward ranges. It's the primitive that STL didn't define but should have.

Right. I still maintain that it's a bad approach. I've written a lot of algorithms in C++ that worked with iterators, always assuming assignment would copy the state. Fortunately, I didn't have to use input iterators with them most of the time. But I did once or twice, and the thing was working slightly off.

What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them, which is infinitely better than a subtle change in behavior.
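
A rough sketch of what that could look like, assuming the language lets a struct disable its postblit with `@disable this(this)`. OnePass and sumFirst are hypothetical names invented for this illustration, and the int[] merely stands in for a real one-pass source.

```d
// Hypothetical one-pass input range that refuses to be copied.
struct OnePass
{
    int[] data;              // stand-in for a real one-pass source
    @disable this(this);     // copying is a compile-time error

    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

// Copy-initializing one OnePass from another must not compile.
static assert(!__traits(compiles, { OnePass a; OnePass b = a; }));

// Algorithms take the range by ref, so the caller sees what was consumed.
int sumFirst(R)(ref R r, size_t n)
{
    int total = 0;
    for (; n > 0 && !r.empty; --n)
    {
        total += r.front;
        r.popFront();
    }
    return total;
}

unittest
{
    auto r = OnePass([1, 2, 3, 4]);
    assert(sumFirst(r, 2) == 3);   // consumes 1 and 2
    assert(r.front == 3);          // the caller's range has advanced
}
```

The cost is that every algorithm has to take such a range by ref (or have it moved in), which is where the usability concern comes from.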

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

January 01, 2010
Michel Fortin wrote:
> On 2010-01-01 15:53:42 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
> 
>> save() is not only for classes. It also distinguishes input ranges from forward ranges. It's the primitive that STL didn't define but should have.
> 
> Right. I still maintain that it's a bad approach. I've written a lot of algorithms in C++ that worked with iterators, always assuming assignment would copy the state. Fortunately, I didn't have to use input iterators with them most of the time. But I did once or twice, and the thing was working slightly off.
> 
> What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior.

I tried that, it makes input ranges next to unusable. save() is an imperfect but workable solution.

Andrei
January 01, 2010
Andrei Alexandrescu wrote:
> If the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that.

Require that all ranges are structs.  If you want to implement a range as a class, use a wrapper struct that creates a new object in its postblit function.  The wrapper struct can be made generic and placed in the standard library.

Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user.
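
A minimal sketch of such a wrapper, under stated assumptions: ValueRange, Counter, and the clone() method are all hypothetical names chosen for this example, and the wrapped class is assumed to provide clone() returning an independent copy of its iteration state.

```d
// Hypothetical class-based range that knows how to clone itself.
class Counter
{
    int cur, end;
    this(int cur, int end) { this.cur = cur; this.end = end; }
    @property bool empty() const { return cur >= end; }
    @property int front() const { return cur; }
    void popFront() { ++cur; }
    Counter clone() const { return new Counter(cur, end); }
}

// Generic wrapper struct: gives a class range value semantics by
// cloning the underlying object in the postblit.
struct ValueRange(C) if (is(C == class))
{
    private C impl;

    this(C impl) { this.impl = impl; }

    // Postblit: runs after the bitwise copy, replaces the shared reference.
    this(this) { if (impl !is null) impl = impl.clone(); }

    @property bool empty() { return impl.empty; }
    @property auto front() { return impl.front; }
    void popFront() { impl.popFront(); }
}

unittest
{
    auto a = ValueRange!Counter(new Counter(0, 3));
    auto b = a;            // postblit clones the Counter
    b.popFront();
    assert(a.front == 0);  // original is unaffected
    assert(b.front == 1);
}
```

With this, a plain `auto copy = r;` behaves the same whether the range is implemented as a struct or as a class behind the wrapper.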


-- 
Rainer Deyke - rainerd@eldwood.com
January 01, 2010
Rainer Deyke wrote:
> Andrei Alexandrescu wrote:
>> If the implementor of consume() forgets to call save(), the situation is
>> unpleasant albeit not catastrophic: for most struct ranges things will
>> continue to work, but for class ranges the function will fail to perform
>> to spec. I don't know how to improve on that.
> 
> Require that all ranges are structs.  If you want to implement a range
> as a class, use a wrapper struct that creates a new object in its
> postblit function.  The wrapper struct can be made generic and placed in
> the standard library.

That's a good idea, but it doesn't work with covariant return types. Those are needed for the container hierarchy that I'm working on.


Andrei
January 01, 2010
Rainer Deyke wrote:
> Andrei Alexandrescu wrote:
>> If the implementor of consume() forgets to call save(), the situation is
>> unpleasant albeit not catastrophic: for most struct ranges things will
>> continue to work, but for class ranges the function will fail to perform
>> to spec. I don't know how to improve on that.
> 
> Require that all ranges are structs.  If you want to implement a range
> as a class, use a wrapper struct that creates a new object in its
> postblit function.  The wrapper struct can be made generic and placed in
> the standard library.
> 
> Same performance as the current approach, slightly more effort on the
> part of the range implementor, much easier and less error-prone on the
> side of the range user.

Oh, besides it doesn't work for struct ranges that iterate one-pass streams.

Andrei
January 02, 2010
On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Rainer Deyke wrote:
>> Andrei Alexandrescu wrote:
>>> If the implementor of consume() forgets to call save(), the situation is
>>> unpleasant albeit not catastrophic: for most struct ranges things will
>>> continue to work, but for class ranges the function will fail to perform
>>> to spec. I don't know how to improve on that.
>>
>> Require that all ranges are structs.  If you want to implement a range
>> as a class, use a wrapper struct that creates a new object in its
>> postblit function.  The wrapper struct can be made generic and placed in
>> the standard library.
>>
>> Same performance as the current approach, slightly more effort on the
>> part of the range implementor, much easier and less error-prone on the
>> side of the range user.
>
> Oh, besides it doesn't work for struct ranges that iterate one-pass streams.

What does save do in those cases?

-Steve
January 02, 2010
On 2010-01-01 17:54:12 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> Michel Fortin wrote:
>> What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior.
> 
> I tried that, it makes input ranges next to unusable.

I think I can see why. You can't have ref members or ref local variables as in C++, so it's pretty hard to use references.


> save() is an imperfect but workable solution.

save() is a workable but error-prone solution.

Perhaps we could mitigate this by making people more aware of the difference instead. Couldn't we rename "input range" to "input stream"?

Currently you have ranges that behave one way and ranges that behave the other way, which is confusing. Having a different name for each would emphasize that there is a difference. With different names, you're guaranteed to get the "what's the difference?" question from newbies.

And it's simple to explain: "You can often use ranges and streams interchangeably, but for that to work you must use save() when you need a copy of the current state. Also, not all streams support save(). It's good practice to always use save() so that algorithms work for both ranges and streams."


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

January 02, 2010
Michel Fortin wrote:
> On 2010-01-01 17:54:12 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
> 
>> Michel Fortin wrote:
>>> What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior.
>>
>> I tried that, it makes input ranges next to unusable.
> 
> I think I can see why. You can't have ref members or ref local variables as in C++, so it's pretty hard to use references.
> 
> 
>> save() is an imperfect but workable solution.
> 
> save() is a workable but error-prone solution.
> 
> Perhaps we could mitigate this by making people more aware of the difference instead. Couldn't we rename "input range" to "input stream"?
> 
> Currently you have ranges that behave one way and ranges that behave the other way, which is confusing. Having a different name for each would emphasize that there is a difference. With different names, you're guaranteed to get the "what's the difference?" question from newbies.
> 
> And it's simple to explain: "You can often use ranges and streams interchangeably, but for that to work you must use save() when you need a copy of the current state. Also, not all streams support save(). It's good practice to always use save() so that algorithms work for both ranges and streams."

That's an idea, and names are powerful, but I think it's reasonable to not expect miracles from that name change. It has disadvantages too - "input range" vs. "forward range"  clarifies there's a conceptual relationship between the two, whereas "streams" are different from anything else.

Andrei
January 02, 2010
On 2010-01-02 09:59:51 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> That's an idea, and names are powerful, but I think it's reasonable to not expect miracles from that name change. It has disadvantages too - "input range" vs. "forward range"  clarifies there's a conceptual relationship between the two, whereas "streams" are different from anything else.

I'm not expecting a miracle from it either, it'd just be much less confusing.

You could say that assignment of an input stream might or might not save its state (depending on the stream type), so you must call save() to save the state when working with streams, but ranges are guaranteed to save their state on assignment, thus behaving more predictably and just like arrays. So if you're working only with ranges, not streams, you never need to worry about save().

A similar option would be to have both input ranges and input streams:

* input range:  by value semantics, no need for save()
* input stream: by reference semantics

A pointer to an input range would thus automatically qualify as an input stream, so it's easy to give an input range to a function taking an input stream. Well, except for stack-allocated ranges in SafeD, for which you can't create a pointer. This pretty much breaks the idea, I think.
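
To illustrate the pointer idea, here is a small sketch. Slice and takeOne are hypothetical names made up for this example, and it relies on D's automatic dereferencing of struct pointers for member access. Note that it takes the address of a stack-allocated range, which is exactly the operation SafeD would reject, so this only works as unsafe (system) code.

```d
// Hypothetical value-semantics input range over an array.
struct Slice
{
    int[] data;
    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

// Generic consumer taking its argument by value. With S = Slice it works
// on a private copy; with S = Slice* it advances the caller's range,
// because member access through the pointer is dereferenced automatically.
int takeOne(S)(S s)
{
    auto v = s.front;
    s.popFront();
    return v;
}

unittest
{
    auto r = Slice([10, 20, 30]);

    assert(takeOne(r) == 10);    // by value: r itself is not advanced
    assert(r.front == 10);

    assert(takeOne(&r) == 10);   // by pointer: reference ("stream") semantics
    assert(r.front == 20);
}
```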


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/