phobos by ref or by value
December 16, 2012
Is there a general philosophy in D's Phobos on when to pass by value or
by reference? For instance, finding a slice using lowerBound makes many
copies of the target item, as well as copies of items in the collection
(see code example below). This seems unnecessary. Why not have
functions like:

    auto lowerBound(...)(V value)

be:

    auto lowerBound(...)(ref V value)

or:

    auto lowerBound(...)(auto ref V value)

Is this one source of the desire for no postblits, that is, shallow
semantics on copy/assignment with additional logic for copy-on-write
semantics? If libraries in general are coded to make many copies of
parameters, it might be a big improvement not to have postblits. A
general purpose library accessing ranges will not know the semantics of
V (deep or shallow), so why incur the cost of copies? Surely finding a
lowerBound on a range of V can be done with 0 copies of elements?

Is there an established philosophy?

Thanks
Dan
-------
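  import std.datetime : DateTime;
  import std.stdio : writeln;

  struct S {
    DateTime date;
    double val;
    // Postblit: runs on every copy of S, so each "copying S" line is one copy.
    this(this) { writeln("copying S ", &this, ' ', date, ',', val); }
  }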
---------------
  auto before(ref const(ValueType) vt) const {
    auto ass = assumeSorted!orderingPred(_history[]);
    writeln("Before lb");
    auto lb = ass.lowerBound(vt);
    writeln("After lb");
    return History!(V, orderingPred)(_history[0 .. lb.length]);
  }

---------------
Before lb
copying S 7FFF622CE110 2001-Nov-01 00:00:00,0
copying S 7FFF622CE090 2001-Nov-01 00:00:00,0
copying S 7FFF622CE020 2001-Jan-01 00:00:00,100
copying S 7FFF622CE030 2001-Nov-01 00:00:00,0
copying S 7FFF622CDF90 2001-Jan-01 00:00:00,100
copying S 7FFF622CDFA0 2001-Nov-01 00:00:00,0
copying S 7FFF622CE020 2002-Jan-01 00:00:00,200
copying S 7FFF622CE030 2001-Nov-01 00:00:00,0
copying S 7FFF622CDF90 2002-Jan-01 00:00:00,200
copying S 7FFF622CDFA0 2001-Nov-01 00:00:00,0
copying S 7FFF622CE020 2001-Jan-01 00:00:00,200
copying S 7FFF622CE030 2001-Nov-01 00:00:00,0
copying S 7FFF622CDF90 2001-Jan-01 00:00:00,200
copying S 7FFF622CDFA0 2001-Nov-01 00:00:00,0
After lb
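
The measurement above can be approximated with a self-contained sketch. Item, its key field and the copies counter are hypothetical stand-ins for S, and the string predicate is an assumption made to keep the example short. The number of postblit calls depends on the Phobos version, so the sketch only prints it.

```d
import std.range : assumeSorted;
import std.stdio : writeln;

// Hypothetical stand-in for S: counts postblit calls instead of logging them.
struct Item
{
    int key;
    static int copies;
    this(this) { ++copies; }
}

void main()
{
    auto items = [Item(1), Item(3), Item(5)];
    auto target = Item(4);
    Item.copies = 0;    // ignore any copies made during setup

    // lowerBound returns the part of the sorted range that compares below target.
    auto below = assumeSorted!"a.key < b.key"(items).lowerBound(target);

    writeln("elements below target: ", below.length);
    writeln("postblit calls during lowerBound: ", Item.copies);
}
```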
December 16, 2012
On Sunday, December 16, 2012 16:09:45 Dan wrote:
> Is there a general philosophy in D phobos on when to pass by value or
> reference?  For instance, to find a slice using lowerBound many copies
> of the target item, as well as copies of items in the collection are
> made (see code example below). This seems unnecessary - why not have
> functions like:
> 
>      auto lowerBound(...)(V value)
> 
> be:
> 
>      auto lowerBound(...)(ref V value)
> 
> or:
> 
>      auto lowerBound(...)(auto ref V value)
> 
> Is this a source for desire for no postblits, shallow semantics on
> copy/assignment with additional logic for copy on write semantics. If
> libraries in general are coded to make many copies of parameters it
> might be a big improvement to not have postblits. A general purpose
> library accessing ranges will not know the semantics of V (deep or
> shallow), so why incur the cost of copies? Certainly finding a
> lowerBound on a range of V can be done with 0 copies of elements?
> 
> Is there an established philosophy?

You _don't_ take ranges by ref unless you want to alter the original, which is almost never the case. Functions like popFrontN are the exception. And since you _are_ going to mutate the parameter (since ranges iterate via mutation), something like const ref would never make sense, even if it had C++'s semantics. I'm not sure whether auto ref complains if you try to mutate the original, but if it doesn't, then you get problems when passing it lvalue ranges, because they would be passed by ref and mutated, which you don't want. So, auto ref makes no sense either. You pretty much always pass ranges by value.

A range which does a deep copy when it's copied is a fundamentally broken range anyway. It has the wrong semantics and won't function correctly with many range-based functions. Ranges are supposed to be a view into a range of values (possibly in a container), and copying the view shouldn't copy the actual elements. Otherwise, you'd be doing the equivalent of passing around a container by value, which is almost always a horrible idea.
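
As a minimal sketch of the view semantics described here, using a built-in slice as the range (the variable names are illustrative only): copying the slice shares the elements, iterating a copy leaves the original alone, and popFrontN advances the caller's range because it takes it by ref.

```d
import std.range : popFront, popFrontN;

void main()
{
    int[] data = [1, 2, 3, 4, 5];

    // Copying a slice copies only the pointer and length, not the elements.
    auto view = data;
    view[0] = 42;
    assert(data[0] == 42);      // both slices see the same elements

    // Iterating a by-value copy mutates the copy, not the original view.
    auto walker = view;
    walker.popFront();
    assert(walker.length == 4);
    assert(view.length == 5);

    // popFrontN takes its range by ref, so the caller's range is advanced.
    popFrontN(view, 2);
    assert(view.length == 3);
    assert(data.length == 5);   // the underlying array is untouched
}
```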

As for types which aren't ranges, they're almost a non-issue in Phobos. Most functions in Phobos take either a range or a primitive type. There aren't very many user-defined types in Phobos which aren't ranges (e.g. the types in std.datetime), but those that aren't ranges are generally either small enough that trying to pass by const ref or auto ref doesn't buy you much (if anything), or they're classes, in which case, it's a non-issue. And almost every generic function in Phobos takes a range. So, functions in Phobos almost always take their arguments by value. They'll use ref when it's required for the semantics of what they're doing, but auto ref on function parameters is rare.

- Jonathan M Davis
December 17, 2012
On Sunday, 16 December 2012 at 23:02:30 UTC, Jonathan M Davis wrote:
>
> You _don't_ take ranges by ref unless you want to alter the original, which is
> almost never the case. Functions like popFrontN are the exception. And since
> you _are_ going to mutate the parameter (since ranges iterate via mutation),
> something like const ref would never make sense, even if it had C++'s
> semantics. I'm not sure if auto ref screams at you if you try and mutate the
> original, but if it doesn't, then you get problems when passing it lvalue
> ranges, because they'd be being passed by ref and mutated, which you don't
> want. So, auto ref makes no sense either. You pretty much always pass ranges
> by value. And a range which does a deep copy when it's copied is a
> fundamentally broken range anyway. It has the wrong semantics and won't
> function correctly with many range-based functions. Ranges are supposed to be
> a view into a range of values (possibly in a container), and copying the view
> shouldn't copy the actual elements. Otherwise, you'd be doing the equivalent
> of passing around a container by value, which is almost always a horrible
> idea.
>
> As for types which aren't ranges, they're almost a non-issue in Phobos. Most
> functions in Phobos take either a range or a primitive type. There aren't very
> many user-defined types in Phobos which aren't ranges (e.g. the types in
> std.datetime), but those that aren't ranges are generally either small enough
> that trying to pass by const ref or auto ref doesn't buy you much (if
> anything), or they're classes, in which case, it's a non-issue. And almost
> every generic function in Phobos takes a range. So, functions in Phobos almost
> always take their arguments by value.

I assume you are talking about functions other than lowerBound, upperBound, trisect.

> They'll use ref when it's required for
> the semantics of what they're doing, but auto ref on function parameters is
> rare.

When would ref be required for semantics? I am asking this to learn the D way, so any guidelines are helpful. We have the language spec and TDPL. Maybe we need another book or three in the vein of Meyers' "50 Effective Ways".


Sorry, but I don't understand the focus on ranges. I know ranges are involved because lowerBound is a method on SortedRange. But I am asking why a member function of a range (i.e. lowerBound) takes its argument by value. I don't mind copies of ranges being made when needed, as I think they are "light copies" of pointers. But taking a value of type V by value in lowerBound performs an unnecessary copy of an element of unknown size/complexity. The library cannot know the cost of that *and* it can be avoided (I think).

I thought ranges were a refinement or improvement on a pair of iterators. So I have a range of items already existing in memory and I want to find all elements in the range less than some value of type V. I don't understand the choice of V as opposed to 'ref const(V)'. What this does is cause postblits to fire again and again on a non-Phobos user-defined struct, and I think they are needless. *find* or *lower_bound* in C++, for example, take the element to be found as 'const &' so copies are not made. Why is that not done here?

If it is not an oversight, I have more to learn about how things work in D and therefore want a broader set of guidelines. I would expect a guideline like: "In generic code, always take generic types that are not known to be primitives or very small collections of pointers (like dynamic array, associative array) by reference, since you cannot know the cost of copying".
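
To make the difference concrete, here is a rough sketch. Tracked, lowerBoundByValue and lowerBoundByRef are hypothetical names and a hand-written binary search, not the Phobos implementation. It counts postblit calls for a by-value parameter versus an auto ref const parameter.

```d
struct Tracked
{
    int key;
    static int copies;              // number of postblit calls so far
    this(this) { ++copies; }
}

// Hypothetical by-value search: an lvalue argument is copied at the call site.
size_t lowerBoundByValue(T)(T[] sorted, T value)
{
    return lowerBoundByRef(sorted, value);
}

// Hypothetical search taking the target by auto ref const: lvalues bind by reference.
size_t lowerBoundByRef(T)(T[] sorted, auto ref const T value)
{
    size_t lo = 0, hi = sorted.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (sorted[mid].key < value.key)   // compared in place, nothing is copied
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

void main()
{
    auto items = [Tracked(1), Tracked(3), Tracked(5)];
    auto target = Tracked(4);
    Tracked.copies = 0;                 // ignore any copies made during setup

    assert(lowerBoundByRef(items, target) == 2);
    assert(Tracked.copies == 0);        // neither target nor elements were copied

    assert(lowerBoundByValue(items, target) == 2);
    assert(Tracked.copies == 1);        // one postblit for the by-value parameter
}
```

The search itself never needs a copy; the only postblit comes from the by-value parameter, which is the cost being questioned here.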

Usually the best place to learn the way of a language is studying its standard libraries, so that is what I am after - the why's of it.


Thanks
Dan
December 17, 2012
Dan:

> Usually the best place to learn the way of a language is studying its standard libraries,

Then I suggest you not study std.random, because it currently contains known flaws regarding what you are saying.

Bye,
bearophile
December 17, 2012
On Monday, 17 December 2012 at 03:23:13 UTC, bearophile wrote:
> Then I suggest you not study std.random, because it currently contains known flaws regarding what you are saying.

Fine, thanks. But which would be recommended to study?

Dan
December 17, 2012
On Monday, December 17, 2012 04:06:52 Dan wrote:
> > They'll use ref when it's required for the semantics of what they're
> > doing, but auto ref on function parameters is rare.
> 
> When would ref be required for semantics? I am asking this to learn the D way - so any guidelines are helpful. We have language spec and TDPL. Maybe we need another book or three in the vein of Meyers "50 Effective Ways".

ref is required when you want the argument you're passing in to be altered rather than the copy being altered. That's the same as in C++.
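
A small illustration of that rule (the function names are made up for the example): the by-value function changes only its own copy, while the ref function changes the caller's variable.

```d
// Takes a copy: the caller's variable is unaffected.
void incrementCopy(int n) { ++n; }

// Takes a reference: the caller's variable is modified.
void incrementOriginal(ref int n) { ++n; }

void main()
{
    int x = 1;

    incrementCopy(x);
    assert(x == 1);     // only the copy was incremented

    incrementOriginal(x);
    assert(x == 2);     // the original was incremented
}
```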

Ranges in general don't do deep copies when they're passed around for basically the same reasons that pointers don't. If you want to know more about ranges, this is probably the best resource at this point:

http://ddili.org/ders/d.en/ranges.html

There are probably plenty of cases in D where the equivalent of C++'s const& would be desirable, but D doesn't really have that at this point. The closest is auto ref, which only works with templated functions, and it doesn't prevent the argument from being mutated (though auto ref const would). Also, const in D is far more restrictive than it is in C++, so forcing const on function parameters can be highly restrictive and annoying. How to solve that is an ongoing debate, as emulating C++'s const& and having const ref take rvalues has been rejected.

So, in most cases, the issue is completely ignored at this point in Phobos. And since most functions in Phobos take either ranges or built-in types (where passing by value is not a problem), in most cases it's not an issue at all. Long term, it's something that should probably be addressed, but until the const ref situation is sorted out, it probably won't be.
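
As a hedged sketch of how auto ref behaves on a templated function (boundByRef is a hypothetical helper), __traits(isRef, ...) reports whether a given instantiation received its argument by reference. Lvalues bind by ref, rvalues are taken by value, and the const qualifier keeps the function from mutating either.

```d
// Hypothetical helper: reports how the argument was bound in this instantiation.
bool boundByRef(T)(auto ref const T value)
{
    // value is const here, so the function cannot modify the caller's argument.
    return __traits(isRef, value);
}

void main()
{
    int x = 1;

    assert(boundByRef(x));      // lvalue: passed by reference, no copy
    assert(!boundByRef(2));     // rvalue: passed by value
}
```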

Functions which take the element of a range rather than a range probably should do something to avoid unnecessary copies, and auto ref may be the solution to that at the moment, but it's not clear how that's going to be sorted out in the long run.

- Jonathan M Davis