`in` parameters made useful
Jul 31, 2020
Mathias LANG
July 31, 2020
Hi everyone,
For a long time I've been pretty annoyed by the state of `in` parameters.
In case it needs any clarification, I'm talking about what's between the asterisks (*) here: `void foo (*in* char[] arg)`.

While they always seemed like a good idea, they never really added anything: `in` was supposed to be `const scope`, then, when the time came to make `scope` actually do something (read: DIP1000), `scope` was removed from `in`!

This was re-added last release (DMD 2.092) where the `-preview=in` switch was added (https://dlang.org/changelog/2.092.0.html#preview-in). So now, if you want `in` to mean what it's documented to be, you need to throw in both `-preview=dip1000` and `-preview=in`.

But then... That still feels incomplete. I deal with a lot of C++ interop code, and we can't use `in` without using `ref`, because otherwise we trigger copy constructors / destructors of aggregates we have no control over. We also have some value types which can get pretty big, so we don't want to pass those by value, either. So the easy solution is to add `ref`? But then, we cannot pass rvalues. A real-world example of this is `doSomething(myData.getHash())` where `getHash` returns a `ubyte[64]`.
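To make the rvalue problem concrete, here is a minimal sketch under default compiler flags. `Data` and `countNonZero` are hypothetical names made up for illustration; only `getHash` returning a `ubyte[64]` comes from the example above.

```d
// Hypothetical aggregate holding a 64-byte hash.
struct Data
{
    ubyte[64] hash;

    // Returns the hash by value, so every call yields an rvalue.
    ubyte[64] getHash() { return hash; }
}

// Hypothetical consumer: takes the hash by `const ref` to avoid a 64-byte copy.
size_t countNonZero(const ref ubyte[64] hash)
{
    size_t n;
    foreach (b; hash)
        if (b != 0)
            ++n;
    return n;
}

unittest
{
    Data myData;
    myData.hash[0] = 1;

    // With default flags, an rvalue cannot bind to a `ref` parameter.
    static assert(!__traits(compiles, countNonZero(myData.getHash())));

    // Workaround: store the rvalue in a named temporary first.
    auto tmp = myData.getHash();
    assert(countNonZero(tmp) == 1);
}
```

The named temporary is exactly the kind of noise that an rvalue-accepting `in` would remove.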

Luckily we have a `-preview=rvaluerefparam` switch, which should do what I want, right? Well, as I said multiple times on this forum, it's so utterly broken it's not even funny:
- https://issues.dlang.org/show_bug.cgi?id=20704
- https://issues.dlang.org/show_bug.cgi?id=20705 (I'm sorry, WHAT?)
- https://issues.dlang.org/show_bug.cgi?id=20706

Because of 20705, that switch is completely unusable for any real world application.
There are alternatives to this (which we are using), such as using `auto ref`. But that requires templates, which we cannot use with delegates or virtual methods.
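A short sketch of why `auto ref` only goes so far. `Big`, `passedByRef`, `Sink` and `Handler` are hypothetical names used for illustration only.

```d
// Hypothetical large value type.
struct Big
{
    ubyte[64] payload;
}

// The empty `()` makes this a function template, which is what allows `auto ref`.
// Each call site instantiates either a by-ref or a by-value version.
bool passedByRef()(auto ref const scope Big b)
{
    return __traits(isRef, b);
}

// A virtual method cannot be a template, so it has to commit to one signature.
interface Sink
{
    void put(const scope ref Big b);
}

// A delegate type also has a single fixed signature.
alias Handler = void delegate(const scope ref Big);

unittest
{
    Big lvalue;
    assert(passedByRef(lvalue));  // lvalue: bound by reference
    assert(!passedByRef(Big()));  // rvalue: passed by value
}
```

Callers of `Sink.put` or a `Handler` are back to needing an lvalue, which is the gap the proposal targets.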

Now I don't really like to rant without having a solution to offer. And it turns out, that's the whole motivation for this post. I have a PR that solves *all* those problems at once. All it needs is a bit of attention / review / feedback!

The PR in question is here: https://github.com/dlang/dmd/pull/11000

What does it do?
A.0) It fixes `in` to be an actual storage class, not something that is lowered almost immediately.
   This was necessary for the implementation to work, but has two nice side effects:
     1) it fixes error messages (currently `void foo(in int)` will display as `void foo(const(int))` in error messages);
     2) it fixes header generation (`.di` files) so that `in` is kept instead of seeing `const` or `scope const`, depending on `-preview=in`;
   I think this change has value in itself, so I submitted it as a separate PR (https://github.com/dlang/dmd/pull/11474), which itself needs a tiny adjustment in Phobos (https://github.com/dlang/phobos/pull/7570).

A.1) It gives a mangling to `in`: This is necessary to avoid some ambiguity. The main two user-visible side effects will be that older debuggers won't be able to demangle `in`, and that, once we update druntime, stack traces will show the correct signature for functions using `in` (currently they suffer from the same bug as the error message / header generation). This is also part of the aforementioned PR.

B) It makes `in` take the effect of `ref` when it makes sense. It always passes something by `ref` if the type has elaborate construction / destruction (postblit, copy constructor, destructors). If the type doesn't have any of those it is only passed by `ref` if it cannot be passed in a register. Some types (dynamic arrays, probably AA in the future) are not affected to allow for covariance (more on that later). The heuristics there still need some small improvements, e.g. w.r.t. floating points (currently the heuristic is based on size, and not asking the backend) and small struct slicing, but that should not affect correctness.
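The decision described in B) can be approximated at compile time with `std.traits`. This is only a sketch of the idea, not the PR's implementation: `wouldPassByRef` is a hypothetical helper, and the size threshold `assumedRegisterBytes` is an assumption chosen for illustration (the real rule is meant to come from the backend).

```d
import std.traits : hasElaborateCopyConstructor, hasElaborateDestructor,
    isDynamicArray;

// Assumption for illustration: anything larger than one pointer-sized
// register is treated as "does not fit in a register".
enum size_t assumedRegisterBytes = size_t.sizeof;

// Hypothetical helper mirroring the rules in B).
template wouldPassByRef(T)
{
    static if (isDynamicArray!T)
        // Dynamic arrays stay by value so that covariance keeps working.
        enum bool wouldPassByRef = false;
    else static if (hasElaborateCopyConstructor!T || hasElaborateDestructor!T)
        // Postblit, copy constructor or destructor: always by reference.
        enum bool wouldPassByRef = true;
    else
        // Plain data: by reference only when it is too large.
        enum bool wouldPassByRef = T.sizeof > assumedRegisterBytes;
}

struct Small { int x; }
struct WithDtor { int x; ~this() {} }

static assert(!wouldPassByRef!Small);
static assert(wouldPassByRef!WithDtor);
static assert(wouldPassByRef!(ubyte[64]));
static assert(!wouldPassByRef!(char[]));
```

A size-only cutoff like this is the part most likely to be wrong on a given platform.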

C) It implements covariance rules: if you have a `void toString(scope void delegate(in char[]) sink)` method, you can pass it `void writeToScreen(const scope char[])`. If you have `void output(scope void delegate(in ubyte[64]))` you can pass it `void saveHash(const scope ref ubyte[64])`. Simple stuff.

D) It allows passing rvalues to `in`. Because we know it's `scope`, so it cannot be escaped (allegedly), and it's `const`, so it cannot be modified, it's only logical that you can give it rvalues.
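A minimal sketch of what D) looks like from the caller's side. `getHash` and `firstByte` are hypothetical free functions. The snippet also compiles without `-preview=in`, but then `in` is plain `const` and the 64 bytes are copied; with the switch, the compiler is free to pass the argument by reference instead.

```d
// Hypothetical producer returning a hash by value (an rvalue at the call site).
ubyte[64] getHash()
{
    ubyte[64] h;   // zero-initialised
    h[0] = 42;
    return h;
}

// Hypothetical consumer: `in` expresses "read-only input" and leaves the
// by-value vs. by-reference choice to the compiler.
ubyte firstByte(in ubyte[64] hash)
{
    return hash[0];
}

unittest
{
    // rvalue argument: no named temporary needed
    assert(firstByte(getHash()) == 42);

    // lvalue argument: same signature, same call syntax
    ubyte[64] stored = getHash();
    assert(firstByte(stored) == 42);
}
```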

Interestingly, @benjones pointed out in the PR that this is similar to one of Herb Sutter's proposals for C++: https://youtu.be/qx22oxlQmKc?t=1258

I hope this will generate interest with people hitting the same problem. I tried this with my project (which depends on ~10 libraries including Vibe.d and does C++ interop) and things just worked when changing `scope const auto ref` to `in`, and clearing up a few places where `in` parameters were escaped, or there was both an `in ref` and an `in` overload.

Last, but not least, if this gets accepted it would pave the way for another awesome change: making `checkaction=context` the default for D.
If you look at https://github.com/dlang/druntime/blob/104ac712331e4d3573fc277084334a528b5dadb1/src/core/internal/dassert.d you'll find that sweet `auto ref const scope` everywhere.
July 31, 2020
On Friday, 31 July 2020 at 21:49:25 UTC, Mathias LANG wrote:
> B) It makes `in` take the effect of `ref` when it makes sense.

i like it

> D) It allows passing rvalues to `in`. Because we know it's `scope`, so it cannot be escaped (allegedly), and it's `const`, so it cannot be modified, it's only logical that you can give it rvalues.

i like this too


I've argued before the compiler should be allowed to optimize this in `in` case anyway so yeah you have my support here.
August 01, 2020
On Friday, 31 July 2020 at 21:49:25 UTC, Mathias LANG wrote:
> B) It makes `in` take the effect of `ref` when it makes sense. It always passes something by `ref` if the type has elaborate construction / destruction (postblit, copy constructor, destructors). If the type doesn't have any of those it is only passed by `ref` if it cannot be passed in a register.

You mean if it fits in two registers, it's still passed by reference? 16 bytes is the size of a UUID, which is better passed by value.
August 01, 2020
I like most of your proposal, but

On 31/07/2020 23:49, Mathias LANG wrote:
> B) It makes `in` take the effect of `ref` when it makes sense. It always passes something by `ref` if the type has elaborate construction / destruction (postblit, copy constructor, destructors). If the type doesn't have any of those it is only passed by `ref` if it cannot be passed in a register. Some types (dynamic arrays, probably AA in the future) are not affected to allow for covariance (more on that later). The heuristics there still need some small improvements, e.g. w.r.t. floating points (currently the heuristic is based on size, and not asking the backend) and small struct slicing, but that should not affect correctness.

Please note that many C/C++-ABIs already define similar rules for passing function arguments by value (referencing a copy on the stack). It might not be the best idea to stack two similar, but maybe slightly conflicting rule sets.

Maybe we can leverage that and define that if the ABI uses a reference for an `in`-value, the compiler may/must elide an extra copy. That avoids having to define our own rule set.
August 01, 2020
On Friday, 31 July 2020 at 22:01:06 UTC, Adam D. Ruppe wrote:
> On Friday, 31 July 2020 at 21:49:25 UTC, Mathias LANG wrote:
>> B) It makes `in` take the effect of `ref` when it makes sense.
>
> i like it
>
>> D) It allows passing rvalues to `in`. Because we know it's `scope`, so it cannot be escaped (allegedly), and it's `const`, so it cannot be modified, it's only logical that you can give it rvalues.
>
> i like this too
>
>
> I've argued before the compiler should be allowed to optimize this in `in` case anyway so yeah you have my support here.

I agree. This is the D way. More simplicity via more inference.

Note that this brings the meaning of the `in` parameter qualifier very close (if not equal) to its meaning in Ada.
August 04, 2020
On Saturday, 1 August 2020 at 07:48:10 UTC, Rainer Schuetze wrote:
>
> I like most of your proposal, but
>
> On 31/07/2020 23:49, Mathias LANG wrote:
>> B) It makes `in` take the effect of `ref` when it makes sense. It always passes something by `ref` if the type has elaborate construction / destruction (postblit, copy constructor, destructors). If the type doesn't have any of those it is only passed by `ref` if it cannot be passed in a register. Some types (dynamic arrays, probably AA in the future) are not affected to allow for covariance (more on that later). The heuristics there still need some small improvements, e.g. w.r.t. floating points (currently the heuristic is based on size, and not asking the backend) and small struct slicing, but that should not affect correctness.
>
> Please note that many C/C++-ABIs already define similar rules for passing function arguments by value (referencing a copy on the stack). It might not be the best idea to stack two similar, but maybe slightly conflicting rule sets.
>
> Maybe we can leverage that and define that if the ABI uses a reference for an `in`-value, the compiler may/must elide an extra copy. That avoids having to define our own rule set.

Do you have a link? I did some research beforehand, but all I could find was about NRVO and throwing exceptions, nothing about actually promoting values to references.

Itanium C++ ABI doesn't have anything: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#value-parameter
Nor does MS: https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019#parameter-passing
August 04, 2020

On 04/08/2020 11:35, Mathias LANG wrote:
> On Saturday, 1 August 2020 at 07:48:10 UTC, Rainer Schuetze wrote:
>> Maybe we can leverage that and define that if the ABI uses a reference for an `in`-value, the compiler may/must elide an extra copy. That avoids having to define our own rule set.
> 
> Do you have a link? I did some research beforehand, but all I could find was about NRVO and throwing exceptions, nothing about actually promoting values to references.
> 
> Itanium C++ ABI doesn't have anything: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#value-parameter

Well, this already says as much for non-POD data IIUC.

The System V ABI for C that is used for PODs doesn't seem to use references, though.


> Nor does MS: https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019#parameter-passing
> 

This says for non-register-sized data: "Structs or unions of other sizes are passed as a pointer to memory allocated by the caller."
August 04, 2020
First off, this is a great change and I am excited to be able to use it in my own projects. Thanks for championing this.

On Friday, 31 July 2020 at 21:49:25 UTC, Mathias LANG wrote:
> B) It makes `in` take the effect of `ref` when it makes sense. It always passes something by `ref` if the type has elaborate construction / destruction (postblit, copy constructor, destructors). If the type doesn't have any of those it is only passed by `ref` if it cannot be passed in a register. Some types (dynamic arrays, probably AA in the future) are not affected to allow for covariance (more on that later). The heuristics there still need some small improvements, e.g. w.r.t. floating points (currently the heuristic is based on size, and not asking the backend) and small struct slicing, but that should not affect correctness.

This optimization should be implemented by querying the backend, not calculated from scratch in the frontend, which is redundant and error-prone; the rules in the current PR are not accurate.

The correct rules are complex, platform-dependent, and depend on the types of each function's full parameter list as a whole, not just each parameter individually. I haven't found anywhere that documents them, but by experimentation with LDC I have discovered the following:

1) The size limit for most types to be passed in registers in x86_64 is twice as large as the PR's threshold, at least on LDC. (Array slices are passed by register because of their size and member types, and do not need to be special-cased. A custom slice-like struct with a pointer and a size member will be passed by register, too.)

2) I say *most* types because there are exceptions; __vector types, and *sometimes* structs that transitively contain only a single __vector member (like 3D homogeneous coordinates in graphics programming) are also passed via register, although they may be 8 times the size of a general purpose register when using AVX2 in a 32-bit program, and probably even larger on some other platform.

3) There are limits to how many arguments can be passed via registers. I say "limits", plural, because different data types may consume different types of registers; for example on x86, `int` uses general purpose registers, whereas `float` uses SIMD registers. These limits are architecture-dependent.

August 04, 2020
On Tuesday, 4 August 2020 at 23:18:56 UTC, tsbockman wrote:
>
> This optimization should be implemented by querying the backend, not calculated from scratch in the frontend, which is redundant and error-prone; the rules in the current PR are not accurate.

Indeed. The current rules were put there as a way to get the ball rolling, so to say. My current focus is to get things to compile and pass the tests on Buildkite, then optimize the rules.
The thing that is not going to change is that types that need elaborate copy or destruction, and types that are not copyable, will be passed by ref. Additionally, I want to keep covariance for array types, which requires them to be passed by value (although it can be done in registers). The rest, I don't mind changing.

> The correct rules are complex, platform-dependent, and depend on the types of each function's full parameter list as a whole, not just each parameter individually. I haven't found anywhere that documents them, but by experimentation with LDC I have discovered the following:
>
> [...]

Thanks for the feedback. I'll definitely incorporate it (and Rainer's) into the PR soon-ish, most likely via a call to a backend hook, as is currently done for NRVO.
August 05, 2020
On Friday, 31 July 2020 at 21:49:25 UTC, Mathias LANG wrote:
> B) It makes `in` take the effect of `ref` when it makes sense. [...]
> C) It implements covariance rules
> [...]
> D) It allows passing rvalues to `in`.

This sounds so great! Thank you for improving `in`!

> I hope this will generate interest with people hitting the same problem.

Just yesterday I wrote some new code with `const scope ref` in almost every function to pass large, complex structs. Occasionally, I had to store rvalues temporarily to pass as lvalues (non-templated code). I would rather simply put `in` on those parameters :-) It's a lot easier to grasp function signatures that only use `in` and `out` on parameters (and their effect/purpose is immediately obvious to new D programmers!)