View mode: basic / threaded / horizontal-split · Log in · Help
December 31, 2009
output ranges: by ref or by value?
Consider:

R2 copy(R1, R2)(R1 src, R2 tgt) {
   foreach (ref e; src) tgt.put(e);
   return tgt;
}

Currently output ranges are supposed to define only the put() method. 
However, you also want to copy using regular ranges as a target, hence 
the shim:

// pseudo-method
void put(R, E)(ref R tgt, E e) {
   tgt.front = e;
   tgt.popFront();
}

Now copying ranges is possible even when they don't define put().

An example of "pure" output range is ArrayAppender. It doesn't define 
anything interesting except put.

The question of this post is the following: should output ranges be 
passed by value or by reference? ArrayAppender uses an extra indirection 
to work properly when passed by value. But if we want to model built-in 
arrays' operator ~=, we'd need to request that all output ranges be 
passed by reference.

Any ideas - please share.


Andrei
December 31, 2009
Re: output ranges: by ref or by value?
On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail@erdani.org> said:

> The question of this post is the following: should output ranges be 
> passed by value or by reference? ArrayAppender uses an extra 
> indirection to work properly when passed by value. But if we want to 
> model built-in arrays' operator ~=, we'd need to request that all 
> output ranges be passed by reference.

I think modeling built-in arrays is the way to go as it makes less 
things to learn. In fact, it makes it easier to learn ranges because 
you can begin by learning arrays, then transpose this knowledge to 
ranges which are more abstract and harder to grasp.

Beside, an extra indirection is wasteful when you don't need it. It's 
easier to add a new layer of indirection when you need one than the 
reverse, so the primitive shouldn't require any indirection.


> // pseudo-method
> void put(R, E)(ref R tgt, E e) {
>     tgt.front = e;
>     tgt.popFront();
> }

I like that because it works especially well with arrays. Here's what 
I'm thinking of:

	char[10] buffer;
	char[] remainingSpace = buffer[];
	while (!remainingSpace.empty)
		remainingSpace.put(getc());

	// now buffer is full
	writeln(buffer);

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
January 01, 2010
Re: output ranges: by ref or by value?
On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin@michelf.com>wrote:

> On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu <
> SeeWebsiteForEmail@erdani.org> said:
>
>  The question of this post is the following: should output ranges be passed
>> by value or by reference? ArrayAppender uses an extra indirection to work
>> properly when passed by value. But if we want to model built-in arrays'
>> operator ~=, we'd need to request that all output ranges be passed by
>> reference.
>>
>
> I think modeling built-in arrays is the way to go as it makes less things
> to learn. In fact, it makes it easier to learn ranges because you can begin
> by learning arrays, then transpose this knowledge to ranges which are more
> abstract and harder to grasp.
>

I agree. And arrays may well be the most used range anyway.

Beside, an extra indirection is wasteful when you don't need it. It's easier
> to add a new layer of indirection when you need one than the reverse, so the
> primitive shouldn't require any indirection.


So (squint through sleep-deprived eyes:) that makes it by ref, right?



>  // pseudo-method
>> void put(R, E)(ref R tgt, E e) {
>>    tgt.front = e;
>>    tgt.popFront();
>> }
>>
>
A few random comments, sorry if they are not entirely coherent:

- this new put needs hasAssignableElements!R, right? What's in this case the
difference between isOutputRange!R and hasAssignableElements!R?

- should all higher-order ranges expose a put method if possible? (stride
comes to mind, but also take or filter).

- does that play nice with the new auto ref / ref template parameter from
2.038? It seems to me that this new feature will go hand in hand with this,
but I may be mistaken.

- your shim method will be used like this:

put(r,e);

whereas for an output range you use:

r.put(e);

and you cannot freely go from one form to the other, except for arrays,
which are output ranges anyway [*]. Does that mean that you must disseminate
static if ByRef/Assignable/Output/Whatever checks everywhere, to use either
put(r,e) or r.put(e)?

- what if R is a range of ranges (ie: if E is itself a range). Should put by
invoked recursively? What if its a chunked range?

- something I wanted to ask for a long time: does put really write to the
range as written in the docs or to the underlying container for which the
output range is but a 'writable' view? The container/range separation does
not exist for arrays, but is important for other cases.


 Philippe

[*] except if this transformation rule is implemented now?
January 01, 2010
Re: output ranges: by ref or by value?
Philippe Sigaud wrote:
> On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin@michelf.com 
> <mailto:michel.fortin@michelf.com>> wrote:
> 
>     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
>     <SeeWebsiteForEmail@erdani.org
>     <mailto:SeeWebsiteForEmail@erdani.org>> said:
> 
>         The question of this post is the following: should output ranges
>         be passed by value or by reference? ArrayAppender uses an extra
>         indirection to work properly when passed by value. But if we
>         want to model built-in arrays' operator ~=, we'd need to request
>         that all output ranges be passed by reference.
> 
> 
>     I think modeling built-in arrays is the way to go as it makes less
>     things to learn. In fact, it makes it easier to learn ranges because
>     you can begin by learning arrays, then transpose this knowledge to
>     ranges which are more abstract and harder to grasp.
> 
> 
> I agree. And arrays may well be the most used range anyway.

Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays 
motivated by practical necessity. I don't want to propagate that quirk 
into ranges. The best output range is one that works properly when 
passed by value.

>     Beside, an extra indirection is wasteful when you don't need it.
>     It's easier to add a new layer of indirection when you need one than
>     the reverse, so the primitive shouldn't require any indirection.
> 
> 
> So (squint through sleep-deprived eyes:) that makes it by ref, right?
>  
> 
> 
>         // pseudo-method
>         void put(R, E)(ref R tgt, E e) {
>            tgt.front = e;
>            tgt.popFront();
>         }

It doesn't. The ref in there is to pass tgt to the pseudo-method put, 
not to the function that invokes it.

> A few random comments, sorry if they are not entirely coherent:
> 
> - this new put needs hasAssignableElements!R, right? What's in this case 
> the difference between isOutputRange!R and hasAssignableElements!R?

It's a good question. There are two possible designs:

1. In the current design, the difference is that hasAssignableElements!R 
does not imply the range may grow. Consider this:

auto a = new int[10], b = new int[10];
copy(a, b);

This should work. But this shouldn't:

auto a = new int[10], b = new int[5];
copy(a, b);

because copy does not grow the target. If you want to append to b, you 
write:

copy(a, appender(&b));

2. In the design sketched in 
http://www.informit.com/articles/printerfriendly.aspx?p=1407357, 
iteration is separated from access. In that approach, you'd have a 
one-pass range for both input and output.

I'm not sure which design is better. (Suggestions are welcome.) For a 
pure output range, it's awkward to define empty() (what should it 
return?) and it's also awkward to put elements by calling two functions 
front/popFront instead of one.

> - should all higher-order ranges expose a put method if possible? 
> (stride comes to mind, but also take or filter).

I don't think so. The generic put will take care of that.

> - does that play nice with the new auto ref / ref template parameter 
> from 2.038? It seems to me that this new feature will go hand in hand 
> with this, but I may be mistaken.

There's no obvious connection. The nice thing about auto ref is this:

struct SomeAdapterRange(R) if (isWhateverRange!R) {
   private R src;
   @property auto ref front() {
      return src.front;
   }
}

You don't want to see how that looks today. Actually:

http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d

Search the page for "mixin" :o}.

> - your shim method will be used like this:
> 
> put(r,e);
> 
> whereas for an output range you use:
> 
> r.put(e);
> 
> and you cannot freely go from one form to the other, except for arrays, 
> which are output ranges anyway [*]. Does that mean that you must 
> disseminate static if ByRef/Assignable/Output/Whatever checks 
> everywhere, to use either put(r,e) or r.put(e)?

The D language automatically rewrites the latter into the former.

> - what if R is a range of ranges (ie: if E is itself a range). Should 
> put by invoked recursively? What if its a chunked range?

I don't know.

> - something I wanted to ask for a long time: does put really write to 
> the range as written in the docs or to the underlying container for 
> which the output range is but a 'writable' view? The container/range 
> separation does not exist for arrays, but is important for other cases.

Depends on how the range is defined. Appender holds a pointer to an 
array and appends to it. But appender is a special-purpose range. A 
usual range cannot change the topology of the container it's under.


Andrei
January 01, 2010
Re: output ranges: by ref or by value?
On 2010-01-01 09:47:58 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail@erdani.org> said:

> Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays 
> motivated by practical necessity. I don't want to propagate that quirk 
> into ranges. The best output range is one that works properly when 
> passed by value.

I agree and disagree. I wasn't proposing that ranges support ~=, and I 
don't thing it'd be a good idea, so I agree with you here.

But I still believe output ranges should behave like arrays. I was 
proposing that you model output ranges after buffers. A stream then 
becomes a buffer of infinite length.

Look at this example:

	char[10] buffer;
	char[] remainingSpace = buffer[];
	while (!remainingSpace.empty)
		remainingSpace.put(getc());

	// now buffer is full
	writeln(buffer);

Now rename "remainingSpace" for "outputStream" and it works fine, 
except for two things: "empty" sounds strange, and an output stream is 
never empty of remaining space: it's infinite length.

But conceptually, a buffer and a stream are almost the same, one having 
finite capacity while the other is infinite.


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
January 01, 2010
Re: output ranges: by ref or by value?
Andrei Alexandrescu wrote:

> Philippe Sigaud wrote:
>> On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin@michelf.com
>> <mailto:michel.fortin@michelf.com>> wrote:
>> 
>>     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
>>     <SeeWebsiteForEmail@erdani.org
>>     <mailto:SeeWebsiteForEmail@erdani.org>> said:
>> 
>>         The question of this post is the following: should output ranges
>>         be passed by value or by reference? ArrayAppender uses an extra
>>         indirection to work properly when passed by value. But if we
>>         want to model built-in arrays' operator ~=, we'd need to request
>>         that all output ranges be passed by reference.
>> 
>> 
>>     I think modeling built-in arrays is the way to go as it makes less
>>     things to learn. In fact, it makes it easier to learn ranges because
>>     you can begin by learning arrays, then transpose this knowledge to
>>     ranges which are more abstract and harder to grasp.
>> 
>> 
>> I agree. And arrays may well be the most used range anyway.
> 
> Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays
> motivated by practical necessity. I don't want to propagate that quirk
> into ranges. The best output range is one that works properly when
> passed by value.

I worry about a growing level of convention with ranges.  Another recent 
range thread discussed the need to call consume after a successful call to 
startsWith.  If I violated convention and had a range class, things would 
fail miserably.  There would be no need to consume after a successful call 
to startsWith and the range would have a random number of elements removed 
on an unsuccessful call to startsWith. I'm pretty sure that early 
discussions of ranges claimed that they could be either structs and classes, 
but in practice that isn't the case.
January 01, 2010
Re: output ranges: by ref or by value?
Jason House wrote:
> Andrei Alexandrescu wrote:
> 
>> Philippe Sigaud wrote:
>>> On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin@michelf.com
>>> <mailto:michel.fortin@michelf.com>> wrote:
>>>
>>>     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
>>>     <SeeWebsiteForEmail@erdani.org
>>>     <mailto:SeeWebsiteForEmail@erdani.org>> said:
>>>
>>>         The question of this post is the following: should output ranges
>>>         be passed by value or by reference? ArrayAppender uses an extra
>>>         indirection to work properly when passed by value. But if we
>>>         want to model built-in arrays' operator ~=, we'd need to request
>>>         that all output ranges be passed by reference.
>>>
>>>
>>>     I think modeling built-in arrays is the way to go as it makes less
>>>     things to learn. In fact, it makes it easier to learn ranges because
>>>     you can begin by learning arrays, then transpose this knowledge to
>>>     ranges which are more abstract and harder to grasp.
>>>
>>>
>>> I agree. And arrays may well be the most used range anyway.
>> Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays
>> motivated by practical necessity. I don't want to propagate that quirk
>> into ranges. The best output range is one that works properly when
>> passed by value.
> 
> I worry about a growing level of convention with ranges.  Another recent 
> range thread discussed the need to call consume after a successful call to 
> startsWith.  If I violated convention and had a range class, things would 
> fail miserably.  There would be no need to consume after a successful call 
> to startsWith and the range would have a random number of elements removed 
> on an unsuccessful call to startsWith. I'm pretty sure that early 
> discussions of ranges claimed that they could be either structs and classes, 
> but in practice that isn't the case.

I am implementing right now a change in the range interface mentioned in 
http://www.informit.com/articles/printerfriendly.aspx?p=1407357, namely: 
add a function save() that saves the iteration state of a range.

With save() in tow, class ranges and struct ranges can be used the same 
way. True, if someone forgets to say

auto copy = r.save();

and instead says:

auto copy = r;

the behavior will indeed be different for class ranges and struct ranges.


Andrei
January 01, 2010
Re: output ranges: by ref or by value?
Andrei Alexandrescu Wrote:

> Jason House wrote:
> > Andrei Alexandrescu wrote:
> > 
> >> Philippe Sigaud wrote:
> >>> On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin@michelf.com
> >>> <mailto:michel.fortin@michelf.com>> wrote:
> >>>
> >>>     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
> >>>     <SeeWebsiteForEmail@erdani.org
> >>>     <mailto:SeeWebsiteForEmail@erdani.org>> said:
> >>>
> >>>         The question of this post is the following: should output ranges
> >>>         be passed by value or by reference? ArrayAppender uses an extra
> >>>         indirection to work properly when passed by value. But if we
> >>>         want to model built-in arrays' operator ~=, we'd need to request
> >>>         that all output ranges be passed by reference.
> >>>
> >>>
> >>>     I think modeling built-in arrays is the way to go as it makes less
> >>>     things to learn. In fact, it makes it easier to learn ranges because
> >>>     you can begin by learning arrays, then transpose this knowledge to
> >>>     ranges which are more abstract and harder to grasp.
> >>>
> >>>
> >>> I agree. And arrays may well be the most used range anyway.
> >> Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays
> >> motivated by practical necessity. I don't want to propagate that quirk
> >> into ranges. The best output range is one that works properly when
> >> passed by value.
> > 
> > I worry about a growing level of convention with ranges.  Another recent 
> > range thread discussed the need to call consume after a successful call to 
> > startsWith.  If I violated convention and had a range class, things would 
> > fail miserably.  There would be no need to consume after a successful call 
> > to startsWith and the range would have a random number of elements removed 
> > on an unsuccessful call to startsWith. I'm pretty sure that early 
> > discussions of ranges claimed that they could be either structs and classes, 
> > but in practice that isn't the case.
> 
> I am implementing right now a change in the range interface mentioned in 
> http://www.informit.com/articles/printerfriendly.aspx?p=1407357, namely: 
> add a function save() that saves the iteration state of a range.
> 
> With save() in tow, class ranges and struct ranges can be used the same 
> way. True, if someone forgets to say
> 
> auto copy = r.save();
> 
> and instead says:
> 
> auto copy = r;
> 
> the behavior will indeed be different for class ranges and struct ranges.

Or if they completely forgot that bit of convention and omit creating a variable called save... Also, doesn't use of save degrade performance for structs? Or does the inliner/optimizer remove the copy variable altogether?

> 
> Andrei
January 01, 2010
Re: output ranges: by ref or by value?
On 2010-01-01 14:10:40 -0500, Jason House <jason.james.house@gmail.com> said:

>> With save() in tow, class ranges and struct ranges can be used the same
>> way. True, if someone forgets to say
>> 
>> auto copy = r.save();
>> 
>> and instead says:
>> 
>> auto copy = r;
>> 
>> the behavior will indeed be different for class ranges and struct ranges.
> 
> Or if they completely forgot that bit of convention and omit creating a 
> variable called save... Also, doesn't use of save degrade performance 
> for structs? Or does the inliner/optimizer remove the copy variable 
> altogether?

I don't think this will be a problem for the optimizer.

But I say dump save(): it's more trouble than it's worth.

I'm sure it will be a problem for most programmers, as most will learn 
and test algorithms with arrays and struct ranges and not classes. 
"save()" is a detail too easy to miss. The benefit of supporting 
classes by asking everyone to remember adding save() seems tiny 
considering you can easily create a struct wrapper if you really need 
your range to be a class.

I know that the struct wrapper would probably force more copies of a 
range than necessary. But there are many ways this could be alleviated. 
For instance, the struct wrapper could use copy-on-write with a 
reference counter to detect when a copy is necessary. So while it'd be 
a little harder to design a range from a class, it'd be easier for 
everyone to use ranges correctly.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
January 01, 2010
Re: output ranges: by ref or by value?
Michel Fortin wrote:
> On 2010-01-01 14:10:40 -0500, Jason House <jason.james.house@gmail.com> 
> said:
> 
>>> With save() in tow, class ranges and struct ranges can be used the same
>>> way. True, if someone forgets to say
>>>
>>> auto copy = r.save();
>>>
>>> and instead says:
>>>
>>> auto copy = r;
>>>
>>> the behavior will indeed be different for class ranges and struct 
>>> ranges.
>>
>> Or if they completely forgot that bit of convention and omit creating 
>> a variable called save... Also, doesn't use of save degrade 
>> performance for structs? Or does the inliner/optimizer remove the copy 
>> variable altogether?
> 
> I don't think this will be a problem for the optimizer.
> 
> But I say dump save(): it's more trouble than it's worth.
> 
> I'm sure it will be a problem for most programmers, as most will learn 
> and test algorithms with arrays and struct ranges and not classes. 
> "save()" is a detail too easy to miss. The benefit of supporting classes 
> by asking everyone to remember adding save() seems tiny considering you 
> can easily create a struct wrapper if you really need your range to be a 
> class.
> 
> I know that the struct wrapper would probably force more copies of a 
> range than necessary. But there are many ways this could be alleviated. 
> For instance, the struct wrapper could use copy-on-write with a 
> reference counter to detect when a copy is necessary. So while it'd be a 
> little harder to design a range from a class, it'd be easier for 
> everyone to use ranges correctly.
> 

save() is not only for classes. It also distinguishes input ranges from 
forward ranges. It's the primitive that STL didn't define but should have.

Andrei
« First   ‹ Prev
1 2 3 4
Top | Discussion index | About this forum | D home