rvalues -> ref (yup... again!) (page 7) - D Programming Language Discussion Forum

March 27, 2018

Re: rvalues -> ref (yup... again!)

Posted by H. S. Teoh
in reply to Rubn

On Tue, Mar 27, 2018 at 08:25:36PM +0000, Rubn via Digitalmars-d wrote: [...]
> _D7example__T3fooTSQr3FooZQnFNbNiNfQrZv:
>   push rbp
>   mov rbp, rsp
>   sub rsp, 3104
>   lea rax, [rbp + 16]
>   lea rdi, [rbp - 2048]
>   lea rcx, [rbp - 1024]
>   mov edx, 1024
>   mov rsi, rcx
>   mov qword ptr [rbp - 2056], rdi
>   mov rdi, rsi
>   mov rsi, rax
>   mov qword ptr [rbp - 2064], rcx
>   call memcpy@PLT    <--------------------- hidden copy
[...]

Is this generated by dmd, or gdc/ldc?

Generally, when it comes to performance issues, I don't even bother looking at dmd-generated code anymore.  If the extra copying is still happening with gdc -O2 / ldc -O, then you have a point. Otherwise, it doesn't really say very much.

T

-- 
People tell me that I'm skeptical, but I don't believe them.

March 27, 2018

Re: rvalues -> ref (yup... again!)

Posted by Rubn
in reply to H. S. Teoh

On Tuesday, 27 March 2018 at 20:38:35 UTC, H. S. Teoh wrote:
> On Tue, Mar 27, 2018 at 08:25:36PM +0000, Rubn via Digitalmars-d wrote: [...]
>> _D7example__T3fooTSQr3FooZQnFNbNiNfQrZv:
>>   push rbp
>>   mov rbp, rsp
>>   sub rsp, 3104
>>   lea rax, [rbp + 16]
>>   lea rdi, [rbp - 2048]
>>   lea rcx, [rbp - 1024]
>>   mov edx, 1024
>>   mov rsi, rcx
>>   mov qword ptr [rbp - 2056], rdi
>>   mov rdi, rsi
>>   mov rsi, rax
>>   mov qword ptr [rbp - 2064], rcx
>>   call memcpy@PLT    <--------------------- hidden copy
> [...]
>
> Is this generated by dmd, or gdc/ldc?
>
> Generally, when it comes to performance issues, I don't even bother looking at dmd-generated code anymore.  If the extra copying is still happening with gdc -O2 / ldc -O, then you have a point. Otherwise, it doesn't really say very much.
>
>
> T

It happens with LDC too. I'm not sure how it could know to do that kind of optimization unless it were able to inline every single function called into one function and optimize from there. I don't imagine that's likely, though.

March 27, 2018

Re: rvalues -> ref (yup... again!)

Posted by H. S. Teoh
in reply to Rubn

On Tue, Mar 27, 2018 at 09:52:25PM +0000, Rubn via Digitalmars-d wrote:
> On Tuesday, 27 March 2018 at 20:38:35 UTC, H. S. Teoh wrote:
> > On Tue, Mar 27, 2018 at 08:25:36PM +0000, Rubn via Digitalmars-d wrote: [...]
> > > _D7example__T3fooTSQr3FooZQnFNbNiNfQrZv:
> > >   push rbp
> > >   mov rbp, rsp
> > >   sub rsp, 3104
> > >   lea rax, [rbp + 16]
> > >   lea rdi, [rbp - 2048]
> > >   lea rcx, [rbp - 1024]
> > >   mov edx, 1024
> > >   mov rsi, rcx
> > >   mov qword ptr [rbp - 2056], rdi
> > >   mov rdi, rsi
> > >   mov rsi, rax
> > >   mov qword ptr [rbp - 2064], rcx
> > >   call memcpy@PLT    <--------------------- hidden copy
> > [...]
> > 
> > Is this generated by dmd, or gdc/ldc?
> > 
> > Generally, when it comes to performance issues, I don't even bother looking at dmd-generated code anymore.  If the extra copying is still happening with gdc -O2 / ldc -O, then you have a point. Otherwise, it doesn't really say very much.
> > 
> > 
> > T
> 
> It happens with LDC too. I'm not sure how it could know to do that kind of optimization unless it were able to inline every single function called into one function and optimize from there. I don't imagine that's likely, though.

You'll be surprised.  Don't underestimate the power of modern optimizers.  I've seen LDC do inlining that's so aggressive, that it essentially evaluated an entire series of function calls at compile-time (likely on the IR) and generated a single instruction to load the answer into the return register at runtime. :-D  Of course, it still generated the individual functions, but those are never actually called at runtime.

(On one occasion, this produced odd-looking "benchmark" results where the ldc executable computed the answer in exactly 0ms, whereas everyone else took a lot longer than that. :-D  (Well, it was probably a few nanosecs while the CPU decoded and ran the instruction, but I don't think any benchmark could measure that!))
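A minimal sketch of the kind of code where such folding can happen (the function names are hypothetical, not taken from the benchmark mentioned above). Every input is known at compile time, so an optimizer that inlines `square` and `sumOfSquares` into `answer` may reduce the whole chain to loading one constant. Whether a particular LDC version and flag set actually does so is an assumption to verify in the generated assembly.

```d
// Hypothetical helpers: small, pure, and easy to inline.
int square(int x)
{
    return x * x;
}

// Sums the squares of 1 through n inclusive.
int sumOfSquares(int n)
{
    int total = 0;
    foreach (i; 1 .. n + 1)
        total += square(i);
    return total;
}

// All inputs are constants, so an optimizer may replace this body with a
// single load of 385 (1 + 4 + 9 + ... + 100). The individual functions
// would still be emitted, just never called from here.
int answer()
{
    return sumOfSquares(10);
}
```

Timing a call to `answer()` in a loop would then measure almost nothing, which is how the odd-looking benchmark numbers can arise.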

For your code example, you might want to look at the code generated for callers of the function, since when compiling individual functions in isolation, LDC is obligated to follow the ABI, which could include redundant copying. But if inlining was possible, it could generate very different code.

T

-- 
Dogs have owners ... cats have staff. -- Krista Casada

March 27, 2018

Re: rvalues -> ref (yup... again!)

Posted by kinke
in reply to Rubn

On Tuesday, 27 March 2018 at 21:52:25 UTC, Rubn wrote:
> It happens with LDC too. I'm not sure how it could know to do that kind of optimization unless it were able to inline every single function called into one function and optimize from there. I don't imagine that's likely, though.

It does it in your code sample with `-O`, there's no call to bar and the foo() by-value arg is memcpy'd to the global.

If you compile everything with LTO, your code and all 3rd-party libs as well as druntime/Phobos, LLVM is able to optimize the whole program as if it were inside a single gigantic 'object' file in LLVM bitcode IR, and is thus indeed theoretically able to inline *all* functions.

March 27, 2018

Re: rvalues -> ref (yup... again!)

Posted by Rubn
in reply to kinke

On Tuesday, 27 March 2018 at 23:35:44 UTC, kinke wrote:
> On Tuesday, 27 March 2018 at 21:52:25 UTC, Rubn wrote:
>> It happens with LDC too. I'm not sure how it could know to do that kind of optimization unless it were able to inline every single function called into one function and optimize from there. I don't imagine that's likely, though.
>
> It does it in your code sample with `-O`, there's no call to bar and the foo() by-value arg is memcpy'd to the global.
>
> If you compile everything with LTO, your code and all 3rd-party libs as well as druntime/Phobos, LLVM is able to optimize the whole program as if it were inside a single gigantic 'object' file in LLVM bitcode IR, and is thus indeed theoretically able to inline *all* functions.

A bit off topic now but anyways:

Well, that example I posted didn't do anything, so it was optimized out quite easily; essentially the entire function was excluded. After just adding a few writeln calls, it can no longer remove the function entirely and can't optimize it out. I don't know if you want to try some different options, but `-flto` didn't do anything for it.

https://godbolt.org/g/bLdpnm

import std.stdio : writeln;

struct Foo {
    ubyte[1024] data;

    this(int a)
    {
        data[0] = cast(ubyte)a;
    }
}

void foo(T)(auto ref T t) {
    import std.functional: forward;
    writeln(gfoo.data[0]);
    bar(forward!t);
    writeln(gfoo.data[0]);
}

__gshared Foo gfoo;

void bar(T)(auto ref T t) {
    import std.algorithm.mutation : move;
    writeln(gfoo.data[0]);
    move(t, gfoo);
}

void main() {
    foo(Foo(10));
}

March 28, 2018

Re: rvalues -> ref (yup... again!)

Posted by kinke
in reply to kinke

On Tuesday, 27 March 2018 at 23:35:44 UTC, kinke wrote:
> On Tuesday, 27 March 2018 at 21:52:25 UTC, Rubn wrote:
>> It happens with LDC too. I'm not sure how it could know to do that kind of optimization unless it were able to inline every single function called into one function and optimize from there. I don't imagine that's likely, though.
>
> It does it in your code sample with `-O`, there's no call to bar and the foo() by-value arg is memcpy'd to the global.

For reference: https://run.dlang.io/is/2vDEXP
Note that main() boils down to a `memset(&gfoo, 10, 1024); return 0;`:

_Dmain:
	.cfi_startproc
	pushq	%rax
.Lcfi0:
	.cfi_def_cfa_offset 16
	data16
	leaq	onlineapp.Foo onlineapp.gfoo@TLSGD(%rip), %rdi
	data16
	data16
	rex64
	callq	__tls_get_addr@PLT
	movl	$10, %esi
	movl	$1024, %edx
	movq	%rax, %rdi
	callq	memset@PLT
	xorl	%eax, %eax
	popq	%rcx
	retq

March 28, 2018

Re: rvalues -> ref (yup... again!)

Posted by kinke
in reply to Rubn

On Tuesday, 27 March 2018 at 23:59:09 UTC, Rubn wrote:
> After just adding a few writeln calls, it can no longer remove the function entirely and can't optimize it out.

Well, writeln() here involves number -> string formatting, GC, I/O, template bloat... There are indeed superfluous memcpy's in your foo() there (although the forward and bar calls are still inlined), which after a quick glance seem to be LLVM optimizer shortcomings; the IR emitted by LDC looks fine.
For an arbitrary external function, it all works as it should, boiling down to a single memcpy in foo() and a direct memset in main(): https://run.dlang.io/is/O1aeLK
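A sketch of what such a variant might look like; it is not copied from the linked sample. `observe` and `observed` are hypothetical names standing in for the external function. Here `pragma(inline, false)` only approximates opacity; a stricter test would declare the function `extern(C)` and define it in a separately compiled object file. Filling the whole array in the constructor is likewise an assumption.

```d
import std.algorithm.mutation : move;
import std.functional : forward;

struct Foo
{
    ubyte[1024] data;

    // Assumption: fill the whole buffer, not only data[0].
    this(int a)
    {
        data[] = cast(ubyte) a;
    }
}

__gshared Foo gfoo;
__gshared int observed; // hypothetical accumulator so the calls have an effect

// Hypothetical stand-in for an external function; it replaces writeln so
// that no formatting, GC, or I/O code is pulled into foo() and bar().
pragma(inline, false)
void observe(ubyte value)
{
    observed += value;
}

void bar(T)(auto ref T t)
{
    observe(gfoo.data[0]);
    move(t, gfoo); // copy the argument's bytes into the global
}

void foo(T)(auto ref T t)
{
    observe(gfoo.data[0]);
    bar(forward!t); // pass an rvalue argument on as an rvalue
    observe(gfoo.data[0]);
}
```

Calling `foo(Foo(10))` from main() and compiling with `-O` is then the case to inspect for how many memcpy calls remain.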

March 28, 2018

Re: rvalues -> ref (yup... again!)

Posted by Rubn
in reply to kinke

On Wednesday, 28 March 2018 at 00:56:29 UTC, kinke wrote:
> On Tuesday, 27 March 2018 at 23:59:09 UTC, Rubn wrote:
>> After just adding a few writeln calls, it can no longer remove the function entirely and can't optimize it out.
>
> Well, writeln() here involves number -> string formatting, GC, I/O, template bloat... There are indeed superfluous memcpy's in your foo() there (although the forward and bar calls are still inlined), which after a quick glance seem to be LLVM optimizer shortcomings; the IR emitted by LDC looks fine.
> For an arbitrary external function, it all works as it should, boiling down to a single memcpy in foo() and a direct memset in main(): https://run.dlang.io/is/O1aeLK

Well, something's wrong if writeln prevents the optimization from occurring; if that is the case, then it'd be best to just use printf() instead. Anyway, what small examples show about optimization is usually not what's going to happen in actual code. Functions are rarely that simple, and if adding a single writeln() call is enough to defeat that optimization, I can only imagine what other little things do as well.

March 28, 2018

Re: rvalues -> ref (yup... again!)

Posted by Kagamin
in reply to Manu

On Friday, 23 March 2018 at 22:01:44 UTC, Manu wrote:
> By contrast, people will NOT forgive the fact that they have to change:
>
>     func(f(x), f(y), f(z));
>
> to:
>
>     T temp = f(x);
>     T temp2 = f(y);
>     T temp3 = f(z);
>     func(temp, temp2, temp3);
>
> That's just hideous and indefensible.
>
> A better story would be:
>
>     func(f(x), f(y), f(z));
> =>
>     func(x.f, y.f, z.f);

Another workaround:

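// Wrap the rvalue in a struct. The workaround assumes the compiler accepts
// a field of the returned temporary where a ref argument is expected.
auto r(T)(T a)
{
    struct R { T val; }
    return R(a);
}

void f(in ref int p) {} // empty body added so the example links

int main()
{
    f(1.r.val); // pass the temporary's field by ref
    return 0;
}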

March 28, 2018

Re: rvalues -> ref (yup... again!)

Posted by Timon Gehr
in reply to Manu

On 27.03.2018 20:14, Manu wrote:
> That's exactly what I've been saying. For like, 9 years..
> It looks like this:
> https://github.com/TurkeyMan/DIPs/blob/ref_args/DIPs/DIP1xxx-rval_to_ref.md
>   (contribution appreciated)
> 
> As far as I can tell, it's completely benign, it just eliminates the
> annoying edge cases when interacting with functions that take
> arguments by ref. There's no spill-over effect anywhere that I'm aware
> of, and if you can find a single wart, I definitely want to know about
> it.

???

> I've asked so many times for a technical destruction, nobody will
> present any opposition that is anything other than a rejection *in
> principle*. This is a holy war, not a technical one.

That's extremely unfair. It is just a bad idea to overload D const for this purpose. Remove the "const" requirement and I'm on board.
