October 03, 2020
On 10/3/20 11:30 AM, Ola Fosheim Grøstad wrote:
> On Saturday, 3 October 2020 at 14:49:03 UTC, Steven Schveighoffer wrote:
>> `in ref` is a reference, and it's OK if we make this not a reference in practice, because it's const. And code that takes something via `in ref` can already expect possible changes via other references, but should also be OK if it doesn't change.
> 
> You either support aliasing or not.
> 
> If you support aliasing then you should be able to write code where aliasing has the expected outcome.
> 
> Let me refer to Ada. According to the Ada manual you can specify that an integer is aliased, which means that it is guaranteed to exist in memory (and not in a register). Then you use 'access' to reference it.
> 
> If a language construct says "ref" I would expect 100% support for aliasing. It is not like aliasing is always undesired.
> 

Given that it's a parameter, and the parameter is const, it can only change through another reference. And this means the function has to deal with the possibility that it can change, but ALSO cannot depend on or enforce being able to change it on purpose. On that, I think I agree with the concept of being able to switch to a value.
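
For a concrete illustration, here is a minimal sketch using `const ref` (which has the same aliasing behavior; `g` and `f` are made up for the example):

int g = 1;

void f(const ref int x)
{
    assert(x == 1); // holds on entry...
    g = 2;          // f never writes through x...
    assert(x == 2); // ...yet x now reads 2, because the caller passed g
}

void main() { f(g); }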

What I don't agree with is the idea that one can write code expecting something is passed by value, and then have the compiler later switch it to a reference. `in` means by value in all code today. The fact that we tried -preview=in on a bunch of projects and they "didn't break" is not reassuring.

-Steve
October 03, 2020
On 10/3/20 11:41 AM, Joseph Rushton Wakeling wrote:
> On Saturday, 3 October 2020 at 14:49:03 UTC, Steven Schveighoffer wrote:
>> `in ref` is a reference, and it's OK if we make this not a reference in practice, because it's const. And code that takes something via `in ref` can already expect possible changes via other references, but should also be OK if it doesn't change.
> 
> Is that still OK in a concurrent or multithreaded context?
> 
>      void foo (in ref int bar)
>      {
>          // does something which may yield, and
>          // another context can change the value
>          // underlying `bar`
>          // ...
> 
>          // result of this writeln will now depend
>          // on how the compiler treated the `in ref`
>          writeln(bar);
>      }

This is no different from calling a function when a reference to the data exists elsewhere. In other words, it's not necessarily the function itself that changes the data; it could be changed outside the function. You don't need concurrency to do it.

But it's not impossible to define this:

"when accepting a parameter by `in ref`, one cannot depend on the value remaining constant, as other references to the data may change it. The compiler can also decide to pass an `in ref` parameter by value for optimization reasons, so one cannot depend on the parameter changing through a different alias."

That's essentially what `in` means in this preview switch. But the problem really is that `in` means something else today, and there is already a lot of code that expects that meaning.
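
To make that definition concrete, here is a minimal sketch (the `Widget` type and its size are made up; which ABI the compiler picks under -preview=in is exactly the implementation-defined part):

import std.stdio;

struct Widget { int[4] payload; }

Widget g;

void observe(in Widget w)
{
    g.payload[0] = 42;     // mutate data that w may alias
    writeln(w.payload[0]); // 0 if w was copied, 42 if it was passed by reference
}

void main() { observe(g); }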

-Steve
October 03, 2020
On 10/3/20 11:58 AM, Steven Schveighoffer wrote:
> What I don't agree with is the idea that one can write code expecting something is passed by value, and then have the compiler later switch it to a reference. `in` means by value in all code today. The fact that we tried -preview=in on a bunch of projects and they "didn't break" is not reassuring.

Agreed. Sadly I found (actually remembered) a smoking gun.

Over the years I've worked on a few pieces of STL-related code (such as flex_string and fbvector). I once had a really difficult bug related to one or both of these functions:

http://www.cplusplus.com/reference/string/string/replace/
https://en.cppreference.com/w/cpp/algorithm/replace

The code looked correct and everything; I looked at it for hours. STL implementation subtleties are not really something to google about, but I asked a colleague and he started chuckling. He pointed out that you must always assume that your parameters may alias part of your container. (Sometimes wrapped in a different kind of iterator.) This sort of thing is well known and feared in STL implementer circles, so the rest of us sleep soundly at night.

Consider for example:

template< class ForwardIt, class T >
constexpr void replace( ForwardIt first, ForwardIt last,
                        const T& old_value, const T& new_value );

That may as well be called like this:

vector<Widget> v;
...
size_t i = ..., j = ...;
replace(v.begin(), v.end(), v[i], v[j]);

Only one of them needs to be a reference into the vector; two is a worst case of sorts. You don't know what town you're in after debugging this.

At least in the STL this is reproducible with some ease, because the STL always passes references. Now consider that this happens only with certain definitions of Widget (possibly maintenance increases its size and... boom!) and on certain platforms.
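
To make the failure mode concrete, here is a hypothetical naive implementation, sketched in D rather than C++ (this is not any actual STL code):

import std.stdio;

void replace(T)(T[] range, const ref T oldValue, T newValue)
{
    foreach (ref e; range)
        if (e == oldValue) // compares against whatever oldValue refers to *right now*
            e = newValue;
}

void main()
{
    auto v = [1, 2, 1, 1];
    replace(v, v[0], 9); // oldValue aliases v[0]
    // The first write sets v[0] = 9, so oldValue reads 9 from then on,
    // and the remaining 1s are never replaced:
    writeln(v); // [9, 2, 1, 1], not the intended [9, 2, 9, 9]
}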

So I ask again: is this the kind of feature we want for the D language?
October 03, 2020
On Saturday, 3 October 2020 at 16:08:46 UTC, Steven Schveighoffer wrote:
> This is not any different than calling a function which has a reference to the data elsewhere. In other words, it's not necessarily the function itself that changes the data, it could be changed outside the function. You don't need concurrency to do it.

Sure.  Concurrency was just one example of how the implementation-dependent behaviour could arise.

> But it's not impossible to define this:
>
> "when accepting a parameter by `in ref`, one cannot depend on the value remaining constant, as other references to the data may change it. The compiler can also decide to pass an `in ref` parameter by value for optimization reasons, so one cannot depend on the parameter changing through a different alias."

OK, but that feels rather like it's imposing a cognitive burden on the developer as a way to work around the fact that the feature itself isn't working in an intuitive way.

It feels as unintuitive for a parameter marked `ref` to fail (in an implementation-dependent way) to display reference semantics as it does for a non-reference-type parameter _not_ marked `ref` to display them.
October 03, 2020
On Saturday, 3 October 2020 at 15:58:53 UTC, Steven Schveighoffer wrote:
> Given that it's a parameter, and the parameter is const, it can only change through another reference. And this means the function has to deal with the possibility that it can change, but ALSO cannot depend on or enforce being able to change it on purpose. On that, I think I agree with the concept of being able to switch to a value.

But you can expect it not to change in parallel, as it is not shared!? It can change if you call another function, or in the context of a coroutine (assuming that coroutines cannot move to other threads).

My key point was this: I've never seen "ref" mean anything other than a live view of an object. If D is going to be an easy-to-learn language, anything named "ref" has to retain that expectation.

In the context of parallel programming, I believe Chapel has various parameter-transfer types that might be worth looking at (I don't remember the details).

> What I don't agree with is the idea that one can write code expecting something is passed by value, and then have the compiler later switch it to a reference. `in` means by value in all code today. The fact that we tried -preview=in on a bunch of projects and they "didn't break" is not reassuring.

Well, it is common for compilers (e.g. Ada/SPARK) to optimize by-value as a reference, but it should not be observable.

You could get around this by making the by-value parameter transfer "no-alias", with associated undefined behaviour ("__restrict__" in C++, as Kinke pointed out). This would be great actually, except...

"in" looks very innocent to a newbie so it should have simple semantics... Advanced features ought to look advanced (at least more advanced than "in" or "out").

"in", "out", "in out" should be as simple to use as in SPARK, but that is difficult to achieve without constrained semantics (which would involve a lot more than a simple DIP). SPARK's approach to this looks really great though, but I've never used SPARK so I can't speak from experience. But, it is the kind of semantics that makes me more eager to give it a spin, for sure.

It might be helpful to play a bit with languages like Chapel and SPARK to get ideas.


October 03, 2020
On Saturday, 3 October 2020 at 16:49:28 UTC, Ola Fosheim Grøstad wrote:
> (which would involve a lot more than a simple DIP). SPARK's approach to this looks really great though, but I've never used SPARK so I can't speak from experience. But, it is the kind of semantics that makes me more eager to give it a spin, for sure.

As far as I understand, you can choose to write parts of an Ada program in the constrained SPARK subset, which basically means that D might be able to do something similar.

That could be very powerful. Write complicated functions in a restricted verified language subset, but most of the bread-and-butter code remains in the more flexible full language.


October 03, 2020
On Saturday, 3 October 2020 at 13:05:43 UTC, Andrei Alexandrescu wrote:
>
> [...]
>
> * This has been discussed in C++ circles a number of times, and aliasing has always been a concern. If /C++/ deemed that too dangerous... <insert broadside>. A much more explicit solution has been implemented in https://www.boost.org/doc/libs/1_66_0/libs/utility/call_traits.htm.

I don't deny that aliasing can create issues that could be very hard to debug.
But the problem of aliasing is not limited to `in`: code that uses `const ref` (or a `const T` where `T` has indirection) can already misbehave if it doesn't take into account the possibility of parameter aliasing.
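
For instance, a minimal sketch (`S`, `f`, and `g` are made up for the example):

struct S { int* p; }

int g = 1;

// A const, by-value parameter whose indirection aliases g:
void f(const S s)
{
    assert(*s.p == 1);
    g = 2;
    assert(*s.p == 2); // the const parameter's view changed underneath us
}

void main() { f(S(&g)); }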

To put it differently: Why is `auto ref` acceptable but `in` is not?
October 03, 2020
On 10/3/20 12:49 PM, Ola Fosheim Grøstad wrote:
> On Saturday, 3 October 2020 at 15:58:53 UTC, Steven Schveighoffer wrote:
>> Given that it's a parameter, and the parameter is const, it can only change through another reference. And this means the function has to deal with the possibility that it can change, but ALSO cannot depend on or enforce being able to change it on purpose. On that, I think I agree with the concept of being able to switch to a value.
> 
> But you can expect it to not change in parallell as it is not shared!? It can change if you call another function or in the context of a coroutine (assuming that coroutines cannot move to other threads).

You can expect it to change, but due to the way it enters your function, you can't rely on that expectation, even today.

For example:

void bar(); // some function that may reach x through another reference

void foo(const ref int x, ref int y)
{
   auto z = x;
   bar(); // might change x, but doesn't necessarily
   y = 5; // might change x (if y aliases x), but doesn't necessarily
}

So given that it *might* change x, but isn't *guaranteed* to change x, you can reason that the function needs to deal with both of these possibilities. There isn't a way to say "parameter which is an alias of this other parameter".

In that sense, altering the function to actually accept x by value doesn't change what the function needs to deal with.

On the other hand, if the compiler normally passes x by value, and you rely on that current definition, and the definition changes later to mean pass by reference, then you now have code that may have had a correct assumption before, but doesn't now.

> 
> My key point was this, I've never seen "ref" mean anything else than a live view of an object. If D is going to be an easy to learn language anything named "ref" has to retain that expectation.

And in a sense, you can rely on that. At a function level, you can't tell whether mutating other data is going to affect `in ref` or `const ref` data. You have to assume in some cases it can, and in some cases it cannot.

>> What I don't agree with is the idea that one can write code expecting something is passed by value, and then have the compiler later switch it to a reference. `in` means by value in all code today. The fact that we tried -preview=in on a bunch of projects and they "didn't break" is not reassuring.
> 
> Well, it is common for compilers (e.g. Ada/SPARK) to optimize by-value as a reference, but it should not be observable.

If we could have this, it would be useful as well, but it doesn't need a language change. You might be able to do this in pure functions.

> You could get around this by making the by-value parameter transfer "no-alias" with associated undefined behaviour (or "__restricted__" in C++ as Kinke pointed out). This would be great actually, except...
> 
> "in" looks very innocent to a newbie so it should have simple semantics... Advanced features ought to look advanced (at least more advanced than "in" or "out").

Yeah, that's why I think `in ref` can be used (or even `const ref`).

That being said, if `in` didn't have the definition it already has, this would not be as controversial.

-Steve
October 03, 2020
On Saturday, 3 October 2020 at 16:56:06 UTC, Mathias LANG wrote:
> To put it differently: Why is `auto ref` acceptable but `in` is not?

Aren't `auto ref` parameters more intuitive in their behaviour, though? The parameter will be passed by ref if it's an lvalue, and by value if it's an rvalue (in which case the only way it can mutate under your feet is if it wraps some other app state by reference).

Unless I'm missing something, that's much more predictable and shouldn't suffer from implementation-dependent differences in the way the preview `in` design does.
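
For instance, a minimal sketch (`inspect` is made up; `__traits(isRef, ...)` reports which instantiation was chosen):

import std.stdio;

void inspect()(auto ref int x)
{
    writeln(__traits(isRef, x) ? "by ref (lvalue)" : "by value (rvalue)");
}

void main()
{
    int n = 1;
    inspect(n); // lvalue: passed by ref
    inspect(1); // rvalue: passed by value
}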
October 03, 2020
On Saturday, 3 October 2020 at 16:56:06 UTC, Mathias LANG wrote:
>
> To put it differently: Why is `auto ref` acceptable but `in` is not?

The issue with `in`, compared to `auto ref`, is that, because its behavior is implementation-defined, it invites programmers to write code that "works on their machine," but is not portable to other environments (including future versions of the same compiler). It's the same issue that C has with features like variable-sized integer types and implementation-defined signedness of `char`.

Yes, *technically* it's your fault if you write C code that relies on an `int` being 32 bits, or a `char` being unsigned, just like it would *technically* be your fault if you wrote D code that relied on an `in` parameter being passed by reference. But making these things implementation-defined in the first place is setting the programmer up for failure.