Thread overview
Taking arguments by value or by reference
Oct 03, 2020
Anonymouse
Oct 03, 2020
Max Haughton
Oct 04, 2020
Anonymouse
Oct 04, 2020
Mathias LANG
Oct 05, 2020
Max Haughton
Oct 04, 2020
IGotD-
Oct 04, 2020
Adam D. Ruppe
October 03, 2020
I'm passing structs around (collections of strings) whose .sizeof returns 432.

The readme for 2.094.0 includes the following:

> This release reworks the meaning of in to properly support all those use cases. in parameters will now be passed by reference when optimal, [...]
>
> * Otherwise, if the type's size requires it, it will be passed by reference.
> Currently, types which are over twice the machine word size will be passed by
> reference, however this is controlled by the backend and can be changed based
> on the platform's ABI.

However, I asked in #d a while ago and was told to always pass by value until it breaks, and only then resort to ref.

> [18:32:16] <zorael> at what point should I start passing my structs by ref rather than by value? some are nested in others, so sizeofs range between 120 and 620UL
> [18:33:43] <Herringway> when you start getting stack overflows
> [18:39:09] <zorael> so if I don't need ref for the references, there's no inherent merit to it unless I get in trouble without it?
> [18:39:20] <Herringway> pretty much
> [18:40:16] <Herringway> in many cases the copying is merely theoretical and doesn't actually happen when optimized

I've so far just been using const parameters. What should I be using?
October 03, 2020
On Saturday, 3 October 2020 at 23:00:46 UTC, Anonymouse wrote:
> I'm passing structs around (collections of strings) whose .sizeof returns 432.
>
> The readme for 2.094.0 includes the following:
>
>> This release reworks the meaning of in to properly support all those use cases. in parameters will now be passed by reference when optimal, [...]
>>
>> * Otherwise, if the type's size requires it, it will be passed by reference.
>> Currently, types which are over twice the machine word size will be passed by
>> reference, however this is controlled by the backend and can be changed based
>> on the platform's ABI.
>
> However, I asked in #d a while ago and was told to always pass by value until it breaks, and only then resort to ref.
>
>> [18:32:16] <zorael> at what point should I start passing my structs by ref rather than by value? some are nested in others, so sizeofs range between 120 and 620UL
>> [18:33:43] <Herringway> when you start getting stack overflows
>> [18:39:09] <zorael> so if I don't need ref for the references, there's no inherent merit to it unless I get in trouble without it?
>> [18:39:20] <Herringway> pretty much
>> [18:40:16] <Herringway> in many cases the copying is merely theoretical and doesn't actually happen when optimized
>
> I've so far just been using const parameters. What should I be using?

Firstly, the new in semantics are very new and possibly subtly broken (take a look at the current thread in general).

Secondly, as to the more specific question of how to pass a big struct around, it may be helpful to look at this quick godbolt example (https://d.godbolt.org/z/nPvTWz). Pay attention to the instructions writing to stack memory (or not). A struct that big will be passed around on the stack; whether it actually gets copied depends on the semantics of the struct, among other things.

The guiding principle for your function parameters should be correctness. When I am passing a big struct around: if I want to take ownership of it, I probably want to take it by value; if I want to modify it, I should take it by reference (or by pointer, but don't overcomplicate things; notice that in the previous example they lower to the same thing). If I just want to look at it, it should be taken by const ref if possible (D const isn't the same as C++ const, which may catch you out).

Const-correctness is a rule to live by, especially with a big unwieldy struct.

I would avoid the new in for now, but I would go with const ref from what you've described so far.
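
To make the three cases above concrete, here is a minimal sketch. The `Settings` struct and the function names are hypothetical stand-ins, not taken from the godbolt link. Note that D's `const` is transitive, so a `const ref` parameter also makes everything reachable through the struct read-only.

```d
import std.stdio : writeln;

// Hypothetical stand-in for a larger struct made up of strings.
struct Settings
{
    string[8] names;
    int revision;
}

// Read-only view: nothing is copied at the language level,
// and the callee cannot modify the struct through `s`.
size_t countSet(const ref Settings s)
{
    size_t n;
    foreach (name; s.names)
        if (name.length) ++n;
    return n;
}

// Mutation: the caller's struct is changed in place.
void bump(ref Settings s)
{
    ++s.revision;
}

// Ownership: the callee gets its own copy and may change it freely.
Settings withName(Settings s, size_t i, string name)
{
    s.names[i] = name;
    return s;
}

void main()
{
    Settings a;
    a.names[0] = "alpha";
    bump(a);                          // a.revision is now 1
    auto b = withName(a, 1, "beta");  // a itself is left untouched
    writeln(countSet(a), " ", countSet(b), " ", a.revision); // prints "1 2 1"
}
```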

October 04, 2020
On Saturday, 3 October 2020 at 23:47:32 UTC, Max Haughton wrote:
> The guiding principle for your function parameters should be correctness. When I am passing a big struct around: if I want to take ownership of it, I probably want to take it by value; if I want to modify it, I should take it by reference (or by pointer, but don't overcomplicate things; notice that in the previous example they lower to the same thing). If I just want to look at it, it should be taken by const ref if possible (D const isn't the same as C++ const, which may catch you out).
>
> Const-correctness is a rule to live by, especially with a big unwieldy struct.
>
> I would avoid the new in for now, but I would go with const ref from what you've described so far.

I mostly only want a read-only view of the struct, and whether a copy was done or not is academic. However, profiling showed (what I interpret as) a lot of copying being done in release builds specifically.

https://i.imgur.com/JJzh4Zc.jpg

Naturally, in a situation where I need ref I'd use ref, and in the rare cases where it actually helps to have a mutable copy directly I take it mutable. But if I understand what you're saying, and ignoring -preview=in, you'd recommend I use const ref where I would otherwise use const?

Are there some criteria I can go by when making this decision, or does it always reduce to looking at the disassembly?
October 04, 2020
On Saturday, 3 October 2020 at 23:00:46 UTC, Anonymouse wrote:
> I'm passing structs around (collections of strings) whose .sizeof returns 432.
>
> The readme for 2.094.0 includes the following:
>
>> This release reworks the meaning of in to properly support all those use cases. in parameters will now be passed by reference when optimal, [...]
>>
>> * Otherwise, if the type's size requires it, it will be passed by reference.
>> Currently, types which are over twice the machine word size will be passed by
>> reference, however this is controlled by the backend and can be changed based
>> on the platform's ABI.
>
> However, I asked in #d a while ago and was told to always pass by value until it breaks, and only then resort to ref.
>
>> [18:32:16] <zorael> at what point should I start passing my structs by ref rather than by value? some are nested in others, so sizeofs range between 120 and 620UL
>> [18:33:43] <Herringway> when you start getting stack overflows
>> [18:39:09] <zorael> so if I don't need ref for the references, there's no inherent merit to it unless I get in trouble without it?
>> [18:39:20] <Herringway> pretty much
>> [18:40:16] <Herringway> in many cases the copying is merely theoretical and doesn't actually happen when optimized
>
> I've so far just been using const parameters. What should I be using?

I don't agree with this, especially if the struct is 432 bytes. It takes time and memory to copy such a structure. I always use "const ref" when I pass structures because that's only a pointer. Classes are references by themselves, so it's not applicable there. I use plain "ref" only when I want to modify the contents.

However, there are some exceptions to this rule in D, as D supports slice parameters. In this case you want to pass the slice of the array by value (a copy of the slice itself), often because the slice has been cast from something else. Basically, the array slice parameter becomes an lvalue.
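
A small sketch of the slice case, with made-up function names: a slice is just a pointer plus a length, so passing it by value copies two machine words while the elements stay shared with the caller.

```d
import std.stdio : writeln;

// By value: only the slice (pointer + length) is copied, not the elements.
int sum(const(int)[] values)
{
    int total;
    foreach (v; values)
        total += v;
    return total;
}

// The copied slice still refers to the caller's elements.
void zeroFirst(int[] values)
{
    if (values.length)
        values[0] = 0;
}

void main()
{
    int[1000] storage = 1;            // every element starts as 1

    // storage[0 .. 10] is a temporary slice, passed by value.
    writeln(sum(storage[0 .. 10]));   // prints 10

    zeroFirst(storage[]);
    writeln(storage[0]);              // prints 0: the write went through

    // The slice itself is two machine words, whatever the array length.
    writeln((int[]).sizeof == 2 * size_t.sizeof); // prints true
}
```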

This copying of parameters to the stack is an abomination in computer science, useful in some cases but mostly not. The best would be if the compiler itself could determine what is most efficient. Nim does this, and it was suggested not long ago that the "in" keyword should get a new life as such an optimization. Is that the change that has entered in 2.094.0? Why wasn't this a DIP?

I even see this in some C++ code where strings are passed by value, which means that the string is copied, including a possible memory allocation, which certainly slows things down.

Do not listen to people who say "pass everything by value", because that is in general not ideal in imperative languages.

October 04, 2020
On Sunday, 4 October 2020 at 14:26:43 UTC, Anonymouse wrote:
> [...]
>
> I mostly only want a read-only view of the struct, and whether a copy was done or not is academic. However, profiling showed (what I interpret as) a lot of copying being done in release builds specifically.
>
> https://i.imgur.com/JJzh4Zc.jpg
>
> Naturally, in a situation where I need ref I'd use ref, and in the rare cases where it actually helps to have a mutable copy directly I take it mutable. But if I understand what you're saying, and ignoring -preview=in, you'd recommend I use const ref where I would otherwise use const?
>
> Are there some criteria I can go by when making this decision, or does it always reduce to looking at the disassembly?

If the struct adds overhead to copy, use `const ref`. But if you do, you might end up with another set of problems. Aliasing is one of them, and the dangers of it are discussed at length in the thread about `-preview=in` in general. The other issue is that `const ref` means you cannot pass rvalues.
This is when people usually turn towards `auto ref`. Unfortunately, it requires you to use templates, which is not always possible.

So, in short: `auto ref const` if it's a template and aliasing is not a concern, `const ref` if the copy adds overhead, and add a `const` non-`ref` overload to deal with rvalues if needed. If you want to be a bit more strict, throwing `scope` in the mix is good practice, too.
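
As a rough sketch of these options (the `Big` struct and all function names are made up for illustration): by default a `ref` parameter does not bind to an rvalue, a by-value overload can pick up rvalues and forward them, and a template with `auto ref` chooses per call site.

```d
import std.stdio : writeln;

struct Big
{
    int[108] data;   // 432 bytes, the size mentioned at the top of the thread
}

// Lvalues only: by default an rvalue does not bind to `ref`.
int readOnly(const ref Big b)
{
    return b.data[0];
}

// Non-template workaround: overload pair. Lvalues prefer the ref overload,
// rvalues can only match the by-value one.
int first(const ref Big b)
{
    return b.data[0];
}

int first(const Big b)
{
    return first(b);   // `b` is an lvalue here, so this calls the ref overload
}

// Template version: `auto ref` becomes `ref` for lvalues, by value for rvalues.
bool passedByRef(T)(auto ref const T value)
{
    return __traits(isRef, value);
}

void main()
{
    Big big;
    big.data[0] = 7;

    writeln(readOnly(big));        // prints 7
    // readOnly(Big());            // rejected by default: rvalue passed to ref

    writeln(first(big));           // prints 7 (ref overload)
    writeln(first(Big()));         // prints 0 (by-value overload)

    writeln(passedByRef(big));     // prints true
    writeln(passedByRef(Big()));   // prints false
}
```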

----------

Now, about `-preview=in`: The aim of this switch is to address *exactly* this use case. While it is still experimental and I don't recommend using it in critical projects just yet, giving it a try should be straightforward and any feedback is appreciated.

What I mean by "should be straightforward", is that the only thing `-preview=in` will complain about is `in ref` (it triggers an error).
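
A minimal sketch of what such code can look like, using a hypothetical `Big` struct. It is written so that it compiles both with and without the switch: without `-preview=in`, `in` behaves essentially like `const` and the argument is passed by value; with it, the compiler may pass a struct of this size by reference while still accepting rvalues.

```d
import std.stdio : writeln;

struct Big
{
    int[108] data;   // 432 bytes
}

// One signature for lvalues and rvalues alike.
int first(in Big b)
{
    return b.data[0];
}

void main()
{
    Big big;
    big.data[0] = 3;

    writeln(first(big));     // prints 3
    writeln(first(Big()));   // prints 0: the rvalue is accepted as well
}
```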

The main issue at the moment is that, if you use `dub`, you need to have control over the dependencies to add a configuration, or use `DFLAGS="-preview=in" dub` in order for it to work. Working on a fix to that right now.

For reference, this is what adapting code to use `-preview=in` feels like in my project: https://github.com/Geod24/agora/commit/a52419851a7e6e4ef241c4617ebe0c8cc0ebe5cc
You can see that I added it pretty much everywhere the type `Hash` was used, because `Hash` is a 64-byte struct but I needed to support rvalues.
October 04, 2020
On Sunday, 4 October 2020 at 15:30:48 UTC, IGotD- wrote:
> I don't agree with this, especially if the struct is 432 bytes. It takes time and memory to copy such structure.

If the compiler chooses to inline the function (which happens quite frequently with optimizations turned on), no copy takes place regardless of how you write it if the compiler can see it is unnecessary.

Returning a struct by value rarely means a copy either, since the compiler actually passes a pointer to where it wants the result up front, so it is constructed in place.

So "pass by value" in the language does not necessarily mean big copies in the generated binary. That's why the IRC folks were advising not to worry about it unless a problem comes up and the profiler points here.
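
One way to see where the language itself asks for a copy is to count postblit calls. This is only a sketch with a hypothetical `Tracked` struct: it shows language-level copies, not what the optimizer does afterwards, and for plain structs without a postblit the optimizer may remove further copies once a call is inlined.

```d
import std.stdio : writeln;

struct Tracked
{
    static int copies;          // number of language-level copies so far
    int[108] payload;

    this(this) { ++copies; }    // postblit: runs on the new object after a copy
}

int byValue(Tracked t)
{
    return t.payload[0];
}

int byRef(const ref Tracked t)
{
    return t.payload[0];
}

void main()
{
    Tracked t;

    auto a = byRef(t);
    writeln(Tracked.copies);    // prints 0: nothing was copied

    auto b = byValue(t);
    writeln(Tracked.copies);    // prints 1: the lvalue was copied

    auto c = byValue(Tracked());
    writeln(Tracked.copies);    // prints 1: the rvalue was moved, not copied
}
```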
October 05, 2020
On Sunday, 4 October 2020 at 14:26:43 UTC, Anonymouse wrote:
> On Saturday, 3 October 2020 at 23:47:32 UTC, Max Haughton wrote:
>> The guiding principle for your function parameters should be correctness. When I am passing a big struct around: if I want to take ownership of it, I probably want to take it by value; if I want to modify it, I should take it by reference (or by pointer, but don't overcomplicate things; notice that in the previous example they lower to the same thing). If I just want to look at it, it should be taken by const ref if possible (D const isn't the same as C++ const, which may catch you out).
>>
>> Const-correctness is a rule to live by, especially with a big unwieldy struct.
>>
>> I would avoid the new in for now, but I would go with const ref from what you've described so far.
>
> I mostly only want a read-only view of the struct, and whether a copy was done or not is academic. However, profiling showed (what I interpret as) a lot of copying being done in release builds specifically.
>
> https://i.imgur.com/JJzh4Zc.jpg
>
> Naturally, in a situation where I need ref I'd use ref, and in the rare cases where it actually helps to have a mutable copy directly I take it mutable. But if I understand what you're saying, and ignoring -preview=in, you'd recommend I use const ref where I would otherwise use const?
>
> Are there some criteria I can go by when making this decision, or does it always reduce to looking at the disassembly?

This is a skill you only really hone with experience, but it's not too bad once you're used to it.

For a big struct, I would just stick to expressing what you want it to *do* rather than how you want it to perform. If you want to take ownership you basically have to take by value, but if you (as you said) want a read-only view, definitely use const ref. If I was reading your code, ref immediately tells me not to think about ownership and const ref immediately tells me you just want to look at the goods.

One thing I haven't mentioned so far is that not all types have trivial semantics when it comes to passing them around by value (a postblit may run, or copying may be disabled altogether), so if you are writing generic code it is often best to avoid by-value parameters.
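
As one illustration of non-trivial by-value semantics, here is a sketch with a hypothetical `NoCopy` type whose copying is disabled. A generic function that takes its argument by value cannot be called with an lvalue of that type, while an `auto ref` version accepts it.

```d
import std.stdio : writeln;

struct NoCopy
{
    int id;
    @disable this(this);   // copying this type is not allowed
}

// Generic by-value parameter: needs a copy when given an lvalue.
int idByValue(T)(T x)
{
    return x.id;
}

// Generic auto ref parameter: lvalues go by reference, rvalues are moved.
int idOf(T)(auto ref const T x)
{
    return x.id;
}

// Compile-time check: does passing an lvalue NoCopy by value compile?
enum bool lvalueByValueCompiles =
    __traits(compiles, { NoCopy n; auto r = idByValue(n); });

void main()
{
    auto n = NoCopy(5);
    writeln(idOf(n));                // prints 5
    writeln(idOf(NoCopy(6)));        // prints 6
    writeln(lvalueByValueCompiles);  // prints false
}
```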