output ranges: by ref or by value? - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » output ranges: by ref or by value?

Thread overview

output ranges: by ref or by value?
Dec 31, 2009 Andrei Alexandrescu
Dec 31, 2009 Michel Fortin
Jan 01, 2010 Philippe Sigaud
Jan 01, 2010 Andrei Alexandrescu
Jan 01, 2010 Michel Fortin
Jan 01, 2010 Jason House
Jan 01, 2010 Andrei Alexandrescu
Jan 01, 2010 Jason House
Jan 01, 2010 Michel Fortin
Jan 01, 2010 Andrei Alexandrescu
Jan 01, 2010 Michel Fortin
Jan 01, 2010 Andrei Alexandrescu
Jan 02, 2010 Michel Fortin
Jan 02, 2010 Andrei Alexandrescu
Jan 02, 2010 Michel Fortin
Jan 01, 2010 Andrei Alexandrescu
Jan 01, 2010 Rainer Deyke
Jan 01, 2010 Andrei Alexandrescu
Jan 01, 2010 Andrei Alexandrescu
Jan 02, 2010 Steven Schveighoffer
Jan 02, 2010 Andrei Alexandrescu
Jan 02, 2010 Steven Schveighoffer
Jan 02, 2010 Andrei Alexandrescu
Jan 02, 2010 Steven Schveighoffer
Jan 03, 2010 Andrei Alexandrescu
Jan 03, 2010 Steven Schveighoffer
Jan 03, 2010 Michel Fortin
Jan 03, 2010 Andrei Alexandrescu
Jan 03, 2010 Michel Fortin
Jan 03, 2010 Andrei Alexandrescu
Jan 03, 2010 Steven Schveighoffer
Jan 03, 2010 Andrei Alexandrescu
Jan 03, 2010 Steven Schveighoffer
Jan 03, 2010 Andrei Alexandrescu
Jan 04, 2010 Steven Schveighoffer

December 31, 2009

output ranges: by ref or by value?

Posted by Andrei Alexandrescu

Andrei Alexandrescu

Consider:

R2 copy(R1, R2)(R1 src, R2 tgt) {
   foreach (ref e; src) tgt.put(e);
   return tgt;
}

Currently output ranges are supposed to define only the put() method. However, you also want to copy using regular ranges as a target, hence the shim:

// pseudo-method
void put(R, E)(ref R tgt, E e) {
   tgt.front = e;
   tgt.popFront();
}

Now copying ranges is possible even when they don't define put().

An example of "pure" output range is ArrayAppender. It doesn't define anything interesting except put.

The question of this post is the following: should output ranges be passed by value or by reference? ArrayAppender uses an extra indirection to work properly when passed by value. But if we want to model built-in arrays' operator ~=, we'd need to request that all output ranges be passed by reference.

Any ideas - please share.


Andrei

December 31, 2009

Re: output ranges: by ref or by value?

Posted by Michel Fortin
in reply to Andrei Alexandrescu

Michel Fortin

Posted in reply to Andrei Alexandrescu

On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> The question of this post is the following: should output ranges be passed by value or by reference? ArrayAppender uses an extra indirection to work properly when passed by value. But if we want to model built-in arrays' operator ~=, we'd need to request that all output ranges be passed by reference.

I think modeling built-in arrays is the way to go as it makes less things to learn. In fact, it makes it easier to learn ranges because you can begin by learning arrays, then transpose this knowledge to ranges which are more abstract and harder to grasp.

Beside, an extra indirection is wasteful when you don't need it. It's easier to add a new layer of indirection when you need one than the reverse, so the primitive shouldn't require any indirection.


> // pseudo-method
> void put(R, E)(ref R tgt, E e) {
>     tgt.front = e;
>     tgt.popFront();
> }

I like that because it works especially well with arrays. Here's what I'm thinking of:

	char[10] buffer;
	char[] remainingSpace = buffer[];
	while (!remainingSpace.empty)
		remainingSpace.put(getc());

	// now buffer is full
	writeln(buffer);

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Philippe Sigaud
in reply to Michel Fortin

Philippe Sigaud

Posted in reply to Michel Fortin

Attachments:

text/html part

On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin@michelf.com>wrote:

> On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu < SeeWebsiteForEmail@erdani.org> said:
>
>  The question of this post is the following: should output ranges be passed
>> by value or by reference? ArrayAppender uses an extra indirection to work properly when passed by value. But if we want to model built-in arrays' operator ~=, we'd need to request that all output ranges be passed by reference.
>>
>
> I think modeling built-in arrays is the way to go as it makes less things to learn. In fact, it makes it easier to learn ranges because you can begin by learning arrays, then transpose this knowledge to ranges which are more abstract and harder to grasp.
>

I agree. And arrays may well be the most used range anyway.

Beside, an extra indirection is wasteful when you don't need it. It's easier
> to add a new layer of indirection when you need one than the reverse, so the primitive shouldn't require any indirection.


So (squint through sleep-deprived eyes:) that makes it by ref, right?



>  // pseudo-method
>> void put(R, E)(ref R tgt, E e) {
>>    tgt.front = e;
>>    tgt.popFront();
>> }
>>
>
A few random comments, sorry if they are not entirely coherent:

- this new put needs hasAssignableElements!R, right? What's in this case the difference between isOutputRange!R and hasAssignableElements!R?

- should all higher-order ranges expose a put method if possible? (stride comes to mind, but also take or filter).

- does that play nice with the new auto ref / ref template parameter from 2.038? It seems to me that this new feature will go hand in hand with this, but I may be mistaken.

- your shim method will be used like this:

put(r,e);

whereas for an output range you use:

r.put(e);

and you cannot freely go from one form to the other, except for arrays, which are output ranges anyway [*]. Does that mean that you must disseminate static if ByRef/Assignable/Output/Whatever checks everywhere, to use either put(r,e) or r.put(e)?

- what if R is a range of ranges (ie: if E is itself a range). Should put by invoked recursively? What if its a chunked range?

- something I wanted to ask for a long time: does put really write to the range as written in the docs or to the underlying container for which the output range is but a 'writable' view? The container/range separation does not exist for arrays, but is important for other cases.


  Philippe

[*] except if this transformation rule is implemented now?

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Andrei Alexandrescu
in reply to Philippe Sigaud

Andrei Alexandrescu

Posted in reply to Philippe Sigaud

Philippe Sigaud wrote:
> On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin@michelf.com <mailto:michel.fortin@michelf.com>> wrote:
> 
>     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
>     <SeeWebsiteForEmail@erdani.org
>     <mailto:SeeWebsiteForEmail@erdani.org>> said:
> 
>         The question of this post is the following: should output ranges
>         be passed by value or by reference? ArrayAppender uses an extra
>         indirection to work properly when passed by value. But if we
>         want to model built-in arrays' operator ~=, we'd need to request
>         that all output ranges be passed by reference.
> 
> 
>     I think modeling built-in arrays is the way to go as it makes less
>     things to learn. In fact, it makes it easier to learn ranges because
>     you can begin by learning arrays, then transpose this knowledge to
>     ranges which are more abstract and harder to grasp.
> 
> 
> I agree. And arrays may well be the most used range anyway.

Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.

>     Beside, an extra indirection is wasteful when you don't need it.
>     It's easier to add a new layer of indirection when you need one than
>     the reverse, so the primitive shouldn't require any indirection.
> 
> 
> So (squint through sleep-deprived eyes:) that makes it by ref, right?
>  
> 
> 
>         // pseudo-method
>         void put(R, E)(ref R tgt, E e) {
>            tgt.front = e;
>            tgt.popFront();
>         }

It doesn't. The ref in there is to pass tgt to the pseudo-method put, not to the function that invokes it.

> A few random comments, sorry if they are not entirely coherent:
> 
> - this new put needs hasAssignableElements!R, right? What's in this case the difference between isOutputRange!R and hasAssignableElements!R?

It's a good question. There are two possible designs:

1. In the current design, the difference is that hasAssignableElements!R does not imply the range may grow. Consider this:

auto a = new int[10], b = new int[10];
copy(a, b);

This should work. But this shouldn't:

auto a = new int[10], b = new int[5];
copy(a, b);

because copy does not grow the target. If you want to append to b, you write:

copy(a, appender(&b));

2. In the design sketched in http://www.informit.com/articles/printerfriendly.aspx?p=1407357, iteration is separated from access. In that approach, you'd have a one-pass range for both input and output.

I'm not sure which design is better. (Suggestions are welcome.) For a pure output range, it's awkward to define empty() (what should it return?) and it's also awkward to put elements by calling two functions front/popFront instead of one.

> - should all higher-order ranges expose a put method if possible? (stride comes to mind, but also take or filter).

I don't think so. The generic put will take care of that.

> - does that play nice with the new auto ref / ref template parameter from 2.038? It seems to me that this new feature will go hand in hand with this, but I may be mistaken.

There's no obvious connection. The nice thing about auto ref is this:

struct SomeAdapterRange(R) if (isWhateverRange!R) {
   private R src;
   @property auto ref front() {
      return src.front;
   }
}

You don't want to see how that looks today. Actually:

http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d

Search the page for "mixin" :o}.

> - your shim method will be used like this:
> 
> put(r,e);
> 
> whereas for an output range you use:
> 
> r.put(e);
> 
> and you cannot freely go from one form to the other, except for arrays, which are output ranges anyway [*]. Does that mean that you must disseminate static if ByRef/Assignable/Output/Whatever checks everywhere, to use either put(r,e) or r.put(e)?

The D language automatically rewrites the latter into the former.

> - what if R is a range of ranges (ie: if E is itself a range). Should put by invoked recursively? What if its a chunked range?

I don't know.

> - something I wanted to ask for a long time: does put really write to the range as written in the docs or to the underlying container for which the output range is but a 'writable' view? The container/range separation does not exist for arrays, but is important for other cases.

Depends on how the range is defined. Appender holds a pointer to an array and appends to it. But appender is a special-purpose range. A usual range cannot change the topology of the container it's under.

Andrei

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Michel Fortin
in reply to Andrei Alexandrescu

Michel Fortin

Posted in reply to Andrei Alexandrescu

On 2010-01-01 09:47:58 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.

I agree and disagree. I wasn't proposing that ranges support ~=, and I don't thing it'd be a good idea, so I agree with you here.

But I still believe output ranges should behave like arrays. I was proposing that you model output ranges after buffers. A stream then becomes a buffer of infinite length.

Look at this example:

	char[10] buffer;
	char[] remainingSpace = buffer[];
	while (!remainingSpace.empty)
		remainingSpace.put(getc());

	// now buffer is full
	writeln(buffer);

Now rename "remainingSpace" for "outputStream" and it works fine, except for two things: "empty" sounds strange, and an output stream is never empty of remaining space: it's infinite length.

But conceptually, a buffer and a stream are almost the same, one having finite capacity while the other is infinite.


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Jason House
in reply to Andrei Alexandrescu

Jason House

Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu wrote:

> Philippe Sigaud wrote:
>> On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin@michelf.com <mailto:michel.fortin@michelf.com>> wrote:
>> 
>>     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
>>     <SeeWebsiteForEmail@erdani.org
>>     <mailto:SeeWebsiteForEmail@erdani.org>> said:
>> 
>>         The question of this post is the following: should output ranges
>>         be passed by value or by reference? ArrayAppender uses an extra
>>         indirection to work properly when passed by value. But if we
>>         want to model built-in arrays' operator ~=, we'd need to request
>>         that all output ranges be passed by reference.
>> 
>> 
>>     I think modeling built-in arrays is the way to go as it makes less
>>     things to learn. In fact, it makes it easier to learn ranges because
>>     you can begin by learning arrays, then transpose this knowledge to
>>     ranges which are more abstract and harder to grasp.
>> 
>> 
>> I agree. And arrays may well be the most used range anyway.
> 
> Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.

I worry about a growing level of convention with ranges.  Another recent range thread discussed the need to call consume after a successful call to startsWith.  If I violated convention and had a range class, things would fail miserably.  There would be no need to consume after a successful call to startsWith and the range would have a random number of elements removed on an unsuccessful call to startsWith. I'm pretty sure that early discussions of ranges claimed that they could be either structs and classes, but in practice that isn't the case.

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Andrei Alexandrescu
in reply to Jason House

Andrei Alexandrescu

Posted in reply to Jason House

Jason House wrote:
> Andrei Alexandrescu wrote:
> 
>> Philippe Sigaud wrote:
>>> On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin@michelf.com
>>> <mailto:michel.fortin@michelf.com>> wrote:
>>>
>>>     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
>>>     <SeeWebsiteForEmail@erdani.org
>>>     <mailto:SeeWebsiteForEmail@erdani.org>> said:
>>>
>>>         The question of this post is the following: should output ranges
>>>         be passed by value or by reference? ArrayAppender uses an extra
>>>         indirection to work properly when passed by value. But if we
>>>         want to model built-in arrays' operator ~=, we'd need to request
>>>         that all output ranges be passed by reference.
>>>
>>>
>>>     I think modeling built-in arrays is the way to go as it makes less
>>>     things to learn. In fact, it makes it easier to learn ranges because
>>>     you can begin by learning arrays, then transpose this knowledge to
>>>     ranges which are more abstract and harder to grasp.
>>>
>>>
>>> I agree. And arrays may well be the most used range anyway.
>> Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays
>> motivated by practical necessity. I don't want to propagate that quirk
>> into ranges. The best output range is one that works properly when
>> passed by value.
> 
> I worry about a growing level of convention with ranges.  Another recent range thread discussed the need to call consume after a successful call to startsWith.  If I violated convention and had a range class, things would fail miserably.  There would be no need to consume after a successful call to startsWith and the range would have a random number of elements removed on an unsuccessful call to startsWith. I'm pretty sure that early discussions of ranges claimed that they could be either structs and classes, but in practice that isn't the case.

I am implementing right now a change in the range interface mentioned in http://www.informit.com/articles/printerfriendly.aspx?p=1407357, namely: add a function save() that saves the iteration state of a range.

With save() in tow, class ranges and struct ranges can be used the same way. True, if someone forgets to say

auto copy = r.save();

and instead says:

auto copy = r;

the behavior will indeed be different for class ranges and struct ranges.


Andrei

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Jason House
in reply to Andrei Alexandrescu

Jason House

Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu Wrote:

> Jason House wrote:
> > Andrei Alexandrescu wrote:
> > 
> >> Philippe Sigaud wrote:
> >>> On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin@michelf.com <mailto:michel.fortin@michelf.com>> wrote:
> >>>
> >>>     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
> >>>     <SeeWebsiteForEmail@erdani.org
> >>>     <mailto:SeeWebsiteForEmail@erdani.org>> said:
> >>>
> >>>         The question of this post is the following: should output ranges
> >>>         be passed by value or by reference? ArrayAppender uses an extra
> >>>         indirection to work properly when passed by value. But if we
> >>>         want to model built-in arrays' operator ~=, we'd need to request
> >>>         that all output ranges be passed by reference.
> >>>
> >>>
> >>>     I think modeling built-in arrays is the way to go as it makes less
> >>>     things to learn. In fact, it makes it easier to learn ranges because
> >>>     you can begin by learning arrays, then transpose this knowledge to
> >>>     ranges which are more abstract and harder to grasp.
> >>>
> >>>
> >>> I agree. And arrays may well be the most used range anyway.
> >> Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.
> > 
> > I worry about a growing level of convention with ranges.  Another recent range thread discussed the need to call consume after a successful call to startsWith.  If I violated convention and had a range class, things would fail miserably.  There would be no need to consume after a successful call to startsWith and the range would have a random number of elements removed on an unsuccessful call to startsWith. I'm pretty sure that early discussions of ranges claimed that they could be either structs and classes, but in practice that isn't the case.
> 
> I am implementing right now a change in the range interface mentioned in http://www.informit.com/articles/printerfriendly.aspx?p=1407357, namely: add a function save() that saves the iteration state of a range.
> 
> With save() in tow, class ranges and struct ranges can be used the same way. True, if someone forgets to say
> 
> auto copy = r.save();
> 
> and instead says:
> 
> auto copy = r;
> 
> the behavior will indeed be different for class ranges and struct ranges.

Or if they completely forgot that bit of convention and omit creating a variable called save... Also, doesn't use of save degrade performance for structs? Or does the inliner/optimizer remove the copy variable altogether?

> 
> Andrei

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Michel Fortin
in reply to Jason House

Michel Fortin

Posted in reply to Jason House

On 2010-01-01 14:10:40 -0500, Jason House <jason.james.house@gmail.com> said:

>> With save() in tow, class ranges and struct ranges can be used the same
>> way. True, if someone forgets to say
>> 
>> auto copy = r.save();
>> 
>> and instead says:
>> 
>> auto copy = r;
>> 
>> the behavior will indeed be different for class ranges and struct ranges.
> 
> Or if they completely forgot that bit of convention and omit creating a variable called save... Also, doesn't use of save degrade performance for structs? Or does the inliner/optimizer remove the copy variable altogether?

I don't think this will be a problem for the optimizer.

But I say dump save(): it's more trouble than it's worth.

I'm sure it will be a problem for most programmers, as most will learn and test algorithms with arrays and struct ranges and not classes. "save()" is a detail too easy to miss. The benefit of supporting classes by asking everyone to remember adding save() seems tiny considering you can easily create a struct wrapper if you really need your range to be a class.

I know that the struct wrapper would probably force more copies of a range than necessary. But there are many ways this could be alleviated. For instance, the struct wrapper could use copy-on-write with a reference counter to detect when a copy is necessary. So while it'd be a little harder to design a range from a class, it'd be easier for everyone to use ranges correctly.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

January 01, 2010

Re: output ranges: by ref or by value?

Posted by Andrei Alexandrescu
in reply to Michel Fortin

Andrei Alexandrescu

Posted in reply to Michel Fortin

Michel Fortin wrote:
> On 2010-01-01 14:10:40 -0500, Jason House <jason.james.house@gmail.com> said:
> 
>>> With save() in tow, class ranges and struct ranges can be used the same
>>> way. True, if someone forgets to say
>>>
>>> auto copy = r.save();
>>>
>>> and instead says:
>>>
>>> auto copy = r;
>>>
>>> the behavior will indeed be different for class ranges and struct ranges.
>>
>> Or if they completely forgot that bit of convention and omit creating a variable called save... Also, doesn't use of save degrade performance for structs? Or does the inliner/optimizer remove the copy variable altogether?
> 
> I don't think this will be a problem for the optimizer.
> 
> But I say dump save(): it's more trouble than it's worth.
> 
> I'm sure it will be a problem for most programmers, as most will learn and test algorithms with arrays and struct ranges and not classes. "save()" is a detail too easy to miss. The benefit of supporting classes by asking everyone to remember adding save() seems tiny considering you can easily create a struct wrapper if you really need your range to be a class.
> 
> I know that the struct wrapper would probably force more copies of a range than necessary. But there are many ways this could be alleviated. For instance, the struct wrapper could use copy-on-write with a reference counter to detect when a copy is necessary. So while it'd be a little harder to design a range from a class, it'd be easier for everyone to use ranges correctly.
> 

save() is not only for classes. It also distinguishes input ranges from forward ranges. It's the primitive that STL didn't define but should have.

Andrei

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation