Jump to page: 1 26  
Page
Thread overview
Optimize my code =)
Feb 14, 2014
Robin
Feb 14, 2014
Craig Dillabaugh
Feb 14, 2014
John Colvin
Feb 14, 2014
Craig Dillabaugh
Feb 14, 2014
bearophile
Feb 14, 2014
Xinok
Feb 14, 2014
bearophile
Feb 14, 2014
Chris Cain
Feb 14, 2014
bearophile
Feb 14, 2014
thedeemon
Feb 14, 2014
Kapps
Feb 14, 2014
Nick Sabalausky
Feb 14, 2014
bearophile
Feb 15, 2014
Robin
Feb 15, 2014
Stanislav Blinov
Feb 16, 2014
bearophile
Feb 16, 2014
bearophile
Feb 16, 2014
bearophile
Feb 16, 2014
Robin
Feb 17, 2014
Stanislav Blinov
Feb 17, 2014
bearophile
Feb 17, 2014
Robin
Feb 17, 2014
John Colvin
Feb 17, 2014
bearophile
Feb 17, 2014
Robin
Feb 17, 2014
bearophile
Feb 17, 2014
Nick Sabalausky
Feb 18, 2014
Stanislav Blinov
Feb 18, 2014
Stanislav Blinov
Feb 18, 2014
bearophile
Feb 18, 2014
Stanislav Blinov
Feb 18, 2014
bearophile
Feb 18, 2014
Stanislav Blinov
Feb 18, 2014
bearophile
Feb 18, 2014
Stanislav Blinov
Feb 18, 2014
Stanislav Blinov
Feb 18, 2014
Kapps
Feb 18, 2014
Robin
Feb 19, 2014
bearophile
Feb 19, 2014
Stanislav Blinov
Feb 20, 2014
Robin
Feb 20, 2014
bearophile
Feb 21, 2014
John Colvin
Feb 22, 2014
Robin
Feb 18, 2014
bearophile
Feb 17, 2014
Kapps
Feb 17, 2014
bearophile
Feb 17, 2014
Nick Sabalausky
Feb 16, 2014
Nick Sabalausky
Feb 18, 2014
Chris Williams
February 14, 2014
Hiho,

I am fairly new to the D programming and still reading throught the awesome online book at http://ddili.org/ders/d.en/index.html.
However, I think it is missing some things or I am missing glasses and overlooked these parts in the book.^^

Currently I am trying to write a very simple Matrix library for some minor linear algebra operations such as calculating determine, some simple compositions etc. I am aware that this probably has been done already and I am mainly focusing learning D with this project. =)

I already wrote a similar Matrix library for Java and C++ with more or less the same algorithms in order to benchmark these languages.

As I am very new to D I instantly ran into certain optimizing issues. E.g. the simple matrix multiplication based in my java implementation requires about 1.5 secs for multiplying two 1000x1000 matrices, however my D implementation requires 39 seconds without compiler optimizations and about 14 seconds with -inline, -O, -noboundscheck and -release. So I am sure that I just missed some possible optimization routines for D and that I could miss entire patters used in D to write more optimized code - especially when looking at the const and immutability concepts.

I won't paste the whole code, just the important parts.

class Matrix(T = double) {
	private T[] data;
	private Dimension dim;
}

This is my class with its templated data as a one dimensional array (as I don't like jagged-arrays) and a dimension (which is a struct). The dimension object has some util functionality such as getting the total size or mapping (row, col) to a one-dimensional index besides the getter for rows and columns.

this(size_t rows, size_t cols) {
	this.dim = Dimension(rows, cols);
	this.data = new T[this.dim.size];
	enum nil = to!T(0);
	foreach(ref T element; this.data) element = nil;
}

This is one of the constructor. Its task is just to create a new Matrix instance filled with what is 0 for the templated value - don't know if there is a better approach for this. I experienced that floating point values are sadly initialized with nan which isn't what I wanted -> double.init = nan.

I am using opIndex and opIndexAssign in order to access and assign the matrix values:

T opIndex(size_t row, size_t col) const {
	immutable size_t i = this.dim.offset(row, col);
	if (i >= this.dim.size) {
		// TODO - have to learn exception handling in D first. :P
	}
	return this.data[i];
}

Which calls:

size_t size() @property const {
	return this.rows * this.cols;
}

I think and hope that this is getting optimized via inlining. =)
This works similar for opIndexAssign.

The function I am mainly benchmarking is the simple matrix multiplication where one of the multiplied matrices is tranposed first in order to improve cache hit ratio.

Matrix opMul(ref const(Matrix) other) {
	if (this.dim.rows != other.dim.cols || this.dim.cols != ther.dim.rows) {
		// TODO - still have to learn exception handling first ...
	}
	auto m = new Matrix(this.dim.rows, other.dim.cols);
	auto s = new Matrix(other).transposeAssign();
	size_t r1, r2, c;
	T sum;
	for (r1 = 0; r1 < this.dim.rows; ++r1) {
		for (r2 = 0; r2 < other.dim.rows; ++r2) {
			sum = to!T(0);
			for (c = 0; c < this.dim.cols; ++c) {
				sum += this[r1, c] * other[r2, c];
			}
			m[r1, r2] = sum;
		}
	}
	return m;
}

I am aware that there are faster algorithms for matrix multiplication but I am mainly learning how to write efficient D code in general and not based on any algorithm.

This is more or less the same algorithm I am using with java and c++. Both require about 1.5 secs (after java warm-up) to perform this task for two 1000x1000 matrices. D compiled with DMD takes about 14 seconds with all (known-to-me) optimize flag activated. (listed above)

I wanted to make s an immutable matrix in order to hopefully improve performance via this change, however I wasn't technically able to do this to be honest.

Besides that I can't find a way how to make a use of move semantics like in C++. For example a rvalue-move-constructor or a move-assign would be a very nice thing for many tasks.

I am not quite sure also if it is normal in D to make an idup function as slices have which creates an immutable copy of your object. And are there important things I have to do while implementing an idup method for this matrix class?

Another nice thing to know would be if it is possible to initialize an array before it is default initialized with T.init where T is the type of the array's fields. In C++ e.g. there is no default initialization which is nice if you have to initialize every single field anyway. E.g. in a Matrix.random() method which creates a matrix with random values. There it is unnecessary that the (sometimes huge) array is completely initialized with the type's init value.

Thanks in advance!

Robin
February 14, 2014
On Friday, 14 February 2014 at 16:00:09 UTC, Robin wrote:
>
> this(size_t rows, size_t cols) {
> 	this.dim = Dimension(rows, cols);
> 	this.data = new T[this.dim.size];
> 	enum nil = to!T(0);
> 	foreach(ref T element; this.data) element = nil;
> }
>
I am no expert at optimizing D code, so this is a bit of a shot in the dark, but does your speed improve at all if you replace:

> 	this.data = new T[this.dim.size];

with:

this.data.length = this.dim.size
February 14, 2014
On Friday, 14 February 2014 at 16:40:31 UTC, Craig Dillabaugh wrote:
> On Friday, 14 February 2014 at 16:00:09 UTC, Robin wrote:
>>
>> this(size_t rows, size_t cols) {
>> 	this.dim = Dimension(rows, cols);
>> 	this.data = new T[this.dim.size];
>> 	enum nil = to!T(0);
>> 	foreach(ref T element; this.data) element = nil;
>> }
>>
> I am no expert at optimizing D code, so this is a bit of a shot in the dark, but does your speed improve at all if you replace:
>
>> 	this.data = new T[this.dim.size];
>
> with:
>
> this.data.length = this.dim.size

Why would that make a difference here? They're (almost) identical are they not? Certainly the body of the work, the allocation itself, is the same.
February 14, 2014
Robin:

> class Matrix(T = double) {
> 	private T[] data;
> 	private Dimension dim;
> }

Also try "final class" or struct in your benchmark. And try to use ldc2 compiler for performance benchmarks.

Perhaps dim is better const, unless you want to change the shape of the matrix.


> this(size_t rows, size_t cols) {
> 	this.dim = Dimension(rows, cols);
> 	this.data = new T[this.dim.size];
> 	enum nil = to!T(0);
> 	foreach(ref T element; this.data) element = nil;
> }

Better:

this(in size_t rows, in size_t cols) pure nothrow {
    this.dim = Dimension(rows, cols);
    this.data = minimallyInitializedArray!(typeof(data))(this.dim.size);
    this.data[] = to!T(0);
}


> I experienced that floating point values are sadly initialized with nan which isn't what I wanted -> double.init = nan.

This is actually a good feature of D language :-)


> T opIndex(size_t row, size_t col) const {
> 	immutable size_t i = this.dim.offset(row, col);
> 	if (i >= this.dim.size) {
> 		// TODO - have to learn exception handling in D first. :P
> 	}

Look in D docs for the contract programming.

Also add the pure nothrow attributes.


> Which calls:
>
> size_t size() @property const {
> 	return this.rows * this.cols;
> }
>
> I think and hope that this is getting optimized via inlining. =)

Currently class methods are virtual, and currently D compilers are not inlining virtual calls much. The solution is to use a final class, final methods, etc.


> This works similar for opIndexAssign.
>
> The function I am mainly benchmarking is the simple matrix multiplication where one of the multiplied matrices is tranposed first in order to improve cache hit ratio.

Transposing the whole matrix before the multiplication is not efficient. Better to do it more lazily.


> 	auto m = new Matrix(this.dim.rows, other.dim.cols);
> 	auto s = new Matrix(other).transposeAssign();
> 	size_t r1, r2, c;
> 	T sum;
> 	for (r1 = 0; r1 < this.dim.rows; ++r1) {
> 		for (r2 = 0; r2 < other.dim.rows; ++r2) {

Better to define r1, r2 inside the for. Or often even better to use a foreach with interval.


> D compiled with DMD takes about 14 seconds with all (known-to-me) optimize flag activated. (listed above)

DMD is not efficient with floating point values. Try ldc2 or gdc compilers (I suggest ldc2).


> I wanted to make s an immutable matrix in order to hopefully improve performance via this change,

Most times immutability worsens performance, unless you are passing data across threads.


> however I wasn't technically able to do this to be honest.

It's not too much hard to use immutability in D.


> Besides that I can't find a way how to make a use of move semantics like in C++. For example a rvalue-move-constructor or a move-assign would be a very nice thing for many tasks.

That adds too much complexity to .


> Another nice thing to know would be if it is possible to initialize an array before it is default initialized with T.init where T is the type of the array's fields. In C++ e.g. there is no default initialization which is nice if you have to initialize every single field anyway. E.g. in a Matrix.random() method which creates a matrix with random values. There it is unnecessary that the (sometimes huge) array is completely initialized with the type's init value.

It's done by the function that I have used above, from the std.array module.

Bye,
bearophile
February 14, 2014
Craig Dillabaugh:

>> 	this.data = new T[this.dim.size];
>
> with:
>
> this.data.length = this.dim.size

It's the same thing.

Bye,
bearophile
February 14, 2014
On Friday, 14 February 2014 at 16:47:32 UTC, John Colvin wrote:
> On Friday, 14 February 2014 at 16:40:31 UTC, Craig Dillabaugh wrote:
>> On Friday, 14 February 2014 at 16:00:09 UTC, Robin wrote:
>>>
>>> this(size_t rows, size_t cols) {
>>> 	this.dim = Dimension(rows, cols);
>>> 	this.data = new T[this.dim.size];
>>> 	enum nil = to!T(0);
>>> 	foreach(ref T element; this.data) element = nil;
>>> }
>>>
>> I am no expert at optimizing D code, so this is a bit of a shot in the dark, but does your speed improve at all if you replace:
>>
>>> 	this.data = new T[this.dim.size];
>>
>> with:
>>
>> this.data.length = this.dim.size
>
> Why would that make a difference here? They're (almost) identical are they not? Certainly the body of the work, the allocation itself, is the same.

Well, in my defense I did start with a disclaimer.  For some reason I thought using .length was the proper way to size arrays.   It is likely faster if you are resizing an existing array (especially downward), but in this case new memory is being allocated anyway.

In fact I ran some tests (just allocating a matrix over and over) and it seems using new is consistently faster than using .length (by about a second for 50000 matrices).  Shows how much I know.

February 14, 2014
On Friday, 14 February 2014 at 16:00:09 UTC, Robin wrote:
> This is my class with its templated data as a one dimensional array (as I don't like jagged-arrays) and a dimension (which is a struct). The dimension object has some util functionality such as getting the total size or mapping (row, col) to a one-dimensional index besides the getter for rows and columns.

Are you sure you need a class here? If you're not using inheritance, structs can be much faster.


> this(size_t rows, size_t cols) {
> 	this.dim = Dimension(rows, cols);
> 	this.data = new T[this.dim.size];
> 	enum nil = to!T(0);
> 	foreach(ref T element; this.data) element = nil;
> }

You don't need to use std.conv.to here (ditto for when you initialize sum later). This should work and is clearer:

    foreach(ref T element; this.data) element = 0;

(You can do `double item = 0;` and it knows enough to store the double equivalent of 0)

In the case of initializing sum, I'm not sure the compiler is folding `sum = to!T(0);` thus you could be getting a lot of overhead from that. Using std.conv.to is fine but it does actually do checks in the conversion process, so it can be expensive in tight loops. Since you're just using a constant 0 here, there's no reason to use std.conv.to.


> The function I am mainly benchmarking is the simple matrix multiplication where one of the multiplied matrices is tranposed first in order to improve cache hit ratio.

Did you benchmark first to make sure this actually improves performance? It seems to me like doing the transpose would require the same cache-missing that you'd get in the actual use. If you were caching it and using it multiple times, this would probably be more beneficial. General tip for optimizing: benchmark before and after every "optimization" you do. I've been surprised to find some of my ideas for optimizations slowed things down before.


> Another nice thing to know would be if it is possible to initialize an array before it is default initialized with T.init where T is the type of the array's fields.

http://dlang.org/phobos/std_array.html#.uninitializedArray
February 14, 2014
Chris Cain:

> http://dlang.org/phobos/std_array.html#.uninitializedArray

minimallyInitializedArray should be used, because it's safer.

Bye,
bearophile
February 14, 2014
On Friday, 14 February 2014 at 16:00:09 UTC, Robin wrote:
> class Matrix(T = double) {
> T opIndex(size_t row, size_t col) const {

First of all make sure you it's not virtual, otherwise each element access will cost you enough to make it 10x slower than Java.
February 14, 2014
On Friday, 14 February 2014 at 16:56:29 UTC, bearophile wrote:
> Craig Dillabaugh:
>
>>> 	this.data = new T[this.dim.size];
>>
>> with:
>>
>> this.data.length = this.dim.size
>
> It's the same thing.
>
> Bye,
> bearophile

Not quite. Setting length will copy over the existing contents of the array. Using new simply sets every element to .init.

Granted, this.data is empty meaning there's nothing to copy over, so there's a negligible overhead which may be optimized out by the compiler anyways.

There's also the uninitializedArray function:
http://dlang.org/phobos/std_array.html#.uninitializedArray
« First   ‹ Prev
1 2 3 4 5 6