About structs and performant handling
Namespace
March 09, 2013
I would first like to apologize for my bad English.

I would like to suggest a possible solution to the current rvalue ref problem (especially for structs).
My suggestion is based on the idea of 'auto ref' and on deadalnix's proposal, which was discussed here: http://forum.dlang.org/thread/funxviipfkdftmdfyrfk@forum.dlang.org?page=1

Small structs are very cheap to move or copy; this is faster than creating them only to pass them by ref.
But bigger structs are not that cheap to copy or move (see also my small benchmark: http://dpaste.1azy.net/edit/b9624e01).
So I would like to suggest a new syntax for this behaviour.
If you want your function/method/whatever to take a struct both as an rvalue and as an lvalue, you declare this parameter with a '&'. (Or whatever; the exact syntax doesn't matter at all. I like the '&' because I know it from C++ and many (C++) newcomers will know what it means. Furthermore, '&' is a lot shorter than e.g. 'auto ref'.)
For example:
[code]
struct A { }
void foo(A& a) { }
[/code]
For such parameters the compiler checks whether the type is a struct and whether its size is greater than some N (maybe 16 - 24 bytes). If not, the '&' is ignored and the function behaves as usual: lvalues are copied and rvalues are moved.
But if the struct size is greater than N, the compiler changes the storage class of this parameter to ref.
Example:
If you have another struct B with the following structure (instead of the lightweight struct A):
[code]
struct B {
public:
	int[100] ids;
}
[/code]
the method 'foo' will be changed to:
[code]
void foo(ref B b) { }
[/code]
In this case lvalues are taken by ref, and if an rvalue is used, a temporary variable is created and passed to the function (like C++ does). Or, if you don't like temporary variables, you could also use a wrapper, something like:
[code]
@property
ref T make(T, Args...)(Args args) {
	import core.stdc.string : memcpy;

	static if (Args.length != 0) {
		// Build the value once into a temporary, then copy it into a
		// static instance so we can return it by ref.
		static T result = void;
		T _temp = T(args);
		memcpy(&result, &_temp, T.sizeof);
	} else {
		// No arguments: hand out a default-initialized static instance.
		static T result;
	}

	// static if does not introduce a scope, so 'result' is visible here.
	return result;
}
[/code]
So the two possible kinds of calls to foo would be:
[code]
B b;
foo(b);
[/code]
and
[code]foo(B());[/code]
which is converted to:
[code]foo(make!B);[/code]

I see a lot of potential in this solution, because the compiler takes care of generating the most powerful/performant code, it's simple (and simple is always good), AND it could solve the rvalue ref problem.
And now you can behead me.
March 09, 2013
> (See also my small benchmark: http://dpaste.1azy.net/edit/b9624e01).
Wrong link...
I meant: http://dpaste.1azy.net/b9624e01
March 09, 2013
Namespace:

> Wrong link...
> I meant: http://dpaste.1azy.net/b9624e01

Benchmarks on dpaste aren't very useful because I think no optimization switches are used, and because the CPU is not under control, so other unknown tasks can steal some of its time.

Bye,
bearophile
March 09, 2013
> Benchmarks on dpaste aren't very useful because I think no optimization switches are used, and because the CPU is not under control, so other unknown tasks can steal some of its time.
>
> Bye,
> bearophile

I used optimization switches:
Application arguments:
-O -release -noboundscheck

But you're right; what should I do?
I can give you my results from my PC:
[quote]
Call b0 (B by ref).  Duration: 259 total, 0.129500 average.
Call b1 (B by move). Duration: 804 total, 0.402000 average.
Call b2 (B by make). Duration: 364 total, 0.182000 average.
Call b3 (B by copy). Duration: 943 total, 0.471500 average.
Call b4 (B by manual move). Duration: 1101 total, 0.550500 average.
Call b5  (A by move). Duration: 17 total, 0.008500 average.
Call b6 (A by copy). Duration: 65 total, 0.032500 average.
Call b7 (A by ref).  Duration: 47 total, 0.023500 average.
Call b8 (A by make). Duration: 54 total, 0.027000 average.
[/quote]
Also compiled with -O -release -noboundscheck, on an Intel i5-2500K CPU at 3.30 GHz.
But the script is there, so you can test it yourself. :)
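If you don't want to click through to dpaste, a minimal sketch of the harness looks roughly like this (not the exact code from the paste; the struct layout, function names and loop count are only placeholders):
[code]
import std.datetime : StopWatch;
import std.stdio : writefln;

struct B { int[100] ids; }           // the "big" struct, 400 bytes

void byCopy(B b)    { b.ids[0] = 1; }
void byRef(ref B b) { b.ids[0] = 1; }

void main() {
    enum N = 1_000_000;
    B b;
    StopWatch sw;

    sw.start();
    foreach (i; 0 .. N) byCopy(b);   // pass the whole struct by value
    sw.stop();
    writefln("by copy: %s msecs", sw.peek().msecs);

    sw.reset();
    sw.start();
    foreach (i; 0 .. N) byRef(b);    // pass only a reference
    sw.stop();
    writefln("by ref:  %s msecs", sw.peek().msecs);
}
[/code]
Compiled with -O -release -noboundscheck it should show the same pattern as the numbers above.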
March 09, 2013
On 03/09/2013 12:19 PM, Namespace wrote:

> But bigger structs are not that cheap to copy or move (see also my small
> benchmark:

I have started working on the DConf presentation about copy and move semantics in D. I have done exactly the same type of tests and was surprised by how much faster pass-by-reference can be.

To be fair, I have also accessed the members of the structs inside the function to see whether the pointer dereferencing in the by-ref case brought any cost. Apparently I have been ignorant of modern CPU designs, because I was surprised to see that pointer dereferencing seemingly had no cost at all. My guess would be that the object is completely inside the processor's cache.

Then I suspected dmd, made similar tests with gcc in C, and saw similar results. So yes, apparently by-ref is faster, at least in some cases.

> For example:
> [code]
> struct A { }
> void foo(A& a) { }
> [/code]
> For such parameters the compiler checks whether the type is a struct and
> whether its size is greater than some N (maybe 16 - 24 bytes). If not, the
> '&' is ignored and the function behaves as usual: lvalues are copied and
> rvalues are moved.
> But if the struct size is greater than N, the compiler changes the storage
> class of this parameter to ref.

I hope others with compiler knowledge will chime in here.

I think the type of the parameter that is passed is intrinsic to how the function gets compiled. I think, for that to work, the compiler would have to compile two versions of the function, one taking by-value and the other taking by-ref.

If what I said above is correct, then of course that wouldn't scale, e.g. we would need four separate compilations of the function if we had two parameters.

Then there would be the issue of finding a naming scheme for these separate versions of the function so that the linker finds the right one. I am making up some names for the linker: foo_val_val(), foo_val_ref(), foo_ref_val(), foo_ref_ref().
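Note that templates with 'auto ref' already do something like this today: each ref/non-ref combination becomes its own instantiation with its own mangled name. A small sketch to illustrate (the names are made up):
[code]
struct B { int[100] ids; }

void foo(T)(auto ref T x) {
    // __traits(isRef, x) is true only in the instantiation that binds by ref
    pragma(msg, T.stringof ~ (__traits(isRef, x) ? " by ref" : " by value"));
}

void main() {
    B b;
    foo(b);    // lvalue  -> instantiates a version with a ref parameter
    foo(B());  // rvalue  -> instantiates a version with a value parameter
}
[/code]
So the scaling problem I mention above is real: with two 'auto ref' parameters a template can already end up with up to four instantiations.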

Others, please correct me if I am wrong above. :)

Ali

March 10, 2013
Am Sat, 09 Mar 2013 15:07:49 -0800
schrieb Ali Çehreli <acehreli@yahoo.com>:

> Apparently I have been ignorant of modern CPU designs, because I was surprised to see that pointer dereferencing seemingly had no cost at all. My guess would be that the object is completely inside the processor's cache.

Be aware of several things playing together here: L1 and L2 cache, as well as prefetching and the order of the data in memory. If you create a few KiB of data and run it through a test, it's all in the L1 cache and blazing fast. If you have a game and load a matrix struct from somewhere scattered in memory, you'll see the massive access penalty.
The modern prefetchers in CPUs keep track of N streams of forward or backward serial memory accesses, so they work perfectly for iterating over an array, for example. They work in the "background" and use free memory bandwidth to load data from RAM into the CPU caches before you actually need it. This hides the memory latency that has grown increasingly large over the past years. It is so important that many don't optimize for CPU cycles anymore, but instead for memory access and cache locality:

* http://en.wikipedia.org/wiki/Judy_array

* http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf

It's easy to underestimate the effects until you benchmark with random access patterns over several MiB of memory and see how you get close to a 100 times slowdown.
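If you want to see it yourself, a rough sketch along these lines should do (the array size is only picked so the data no longer fits in the caches; adjust it for your machine):
[code]
import std.datetime : StopWatch;
import std.random : randomShuffle;
import std.stdio : writefln;

void main() {
    enum N = 8 * 1024 * 1024;               // 32 MiB of ints, well beyond L2/L3
    auto data  = new int[N];
    auto order = new size_t[N];
    foreach (i, ref o; order) o = i;

    StopWatch sw;
    long sum;                                // printed at the end so the loops aren't optimized away

    sw.start();
    foreach (i; 0 .. N) sum += data[i];      // serial access, prefetcher-friendly
    sw.stop();
    writefln("sequential: %s msecs", sw.peek().msecs);

    randomShuffle(order);                    // same data, random walk
    sw.reset();
    sw.start();
    foreach (i; 0 .. N) sum += data[order[i]];
    sw.stop();
    writefln("random:     %s msecs (sum=%s)", sw.peek().msecs, sum);
}
[/code]
Both loops read exactly the same bytes; the second one just destroys the access order (and adds one extra lookup), which is where the slowdown comes from.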

-- 
Marco

March 10, 2013
"Ali Çehreli" <acehreli@yahoo.com> wrote in message news:khgfc6$1m9i$1@digitalmars.com...
>
> To be fair, I have also accessed the members of the structs inside the function to see whether the pointer dereferencing in the by-ref case brought any cost. Apparently I have been ignorant of modern CPU designs, because I was surprised to see that pointer dereferencing seemingly had no cost at all. My guess would be that the object is completely inside the processor's cache.
>

Accessing a member of a struct on the stack:
mov EDX, dword ptr [ESP+stackoffset+memberoffset]

Accessing a member of a struct on the heap:
(assume pointer to struct is in EAX)
mov EDX, dword ptr [EAX+memberoffset]

A lot of the time the heap pointer will be in a register already.  Stack memory will almost always be in the caches, and so will recently used heap memory.

If you want to measure the cost of loading the heap pointer then dereferencing, you might want to mark all the registers as used so the compiler is forced to reload.

e.g. with an asm {} block.
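Something like this, say (just a sketch; I haven't checked exactly which registers dmd actually spills around the empty asm block, and the struct/function names are made up):
[code]
struct B { int[100] ids; }

int sumByRef(ref B b) {
    int total;
    foreach (i; 0 .. 100) {
        asm { }               // dummy asm block: discourages keeping &b cached in a register
        total += b.ids[i];
    }
    return total;
}
[/code]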

>
> > For example:
> > [code]
> > struct A { }
> > void foo(A& a) { }
> > [/code]
> > For such parameters the compiler checks whether the type is a struct and
> > whether its size is greater than some N (maybe 16 - 24 bytes). If not, the
> > '&' is ignored and the function behaves as usual: lvalues are copied and
> > rvalues are moved.
> > But if the struct size is greater than N, the compiler changes the storage
> > class of this parameter to ref.
>
> I hope others with compiler knowledge will chime in here.
>
> I think the type of the parameter that is passed is intrinsic to how the function gets compiled. I think, for that to work, the compiler would have to compile two versions of the function, one taking by-value and the other taking by-ref.
>

A better way to do this would be to change (or extend) the ABI, so that structs over a certain size are always passed by reference with this parameter type.  Then you only need one version of the function.  We could use auto ref for this.


March 10, 2013
> I think the type of the parameter that is passed is intrinsic to how the function gets compiled. I think, for that to work, the compiler would have to compile two versions of the function, one taking by-value and the other taking by-ref.

If we have this, we still have the problem that moving a big struct is slow.
My 'make' is a lot faster. That's why I suggested this new behaviour instead of the old deliberations about 'auto ref'. And since 'auto ref' will probably never work for non-template functions, I thought my idea would be a nice alternative. :)
March 10, 2013
> A better way to do this would be to change (or extend) the ABI, so that
> structs over a certain size are always passed by reference with this
> parameter type.
And what if you want to pass it by value?
I am an opponent of such automatic and uncontrollable compiler intervention. It's better to mark something manually and be on the safe side.

> Then you only need one version of the function.  We could
> use auto ref for this.
'auto ref' will probably never work for non-template functions, as Jonathan said some time ago.
March 10, 2013
"Namespace" <rswhite4@googlemail.com> wrote in message news:qmznbrplexqrdqahgaus@forum.dlang.org...
>> A better way to do this would be to change (or extend) the ABI, so that structs over a certain size are always passed by reference with this parameter type.
> And what if you want to pass it by value?

You don't mark it with '&'.

> I am an opponent of such automatic and uncontrollable compiler intervention. It's better to mark something manually and be on the safe side.
>

Then I don't understand the point.  If it doesn't need to be automatic, you can just do it in your own code (use ref/not ref).

>> Then you only need one version of the function.  We could use auto ref for this.
> 'auto ref' will probably never work for non-template functions, as Jonathan said some time ago.

That is not my understanding.

