Optimize my code =) (page 4)

On 2/17/2014 4:56 PM, Robin wrote: > > And here is the complete and improved code: > http://dpaste.dzfl.pl/7f8610efa82b > You should get rid of the .dup's in the constructors and postblits. You don't want big arrays ever getting accidentally allocated and copied behind your back when you're only trying to pass things around. Better to provide an explicit .dup function in your Matrix class (just like how D's dynamic arrays work) for when you *intentionally* want a full duplicate. Also, unlike classes, when you're copying a struct there's no need to copy each field individually. Just copy the struct as a whole. In fact, copy constructors and overloading assignments from the same type are generally not needed at all: They aren't going to be any faster than just using D's default behavior of memory-blitting the struct to the new location *if* a copy is actually even needed at all. By providing explicit copy constructors and overloading assignments from the same type, you're forcing the compiler to use a potentially-slower method every time. Finally, and I could be wrong on this part, but I doubt those move constructors are really needed in D. See the top answer here: http://stackoverflow.com/questions/4200190/does-d-have-something-akin-to-c0xs-move-semantics

On Monday, 17 February 2014 at 11:34:43 UTC, bearophile wrote: > >> foreach (ref T element; this.data) { >> element *= scalar; >> } > > Try to use: > > data[] *= scalar; > > >> for (auto i = 0; i < this.dim.size; ++i) { >> this.data[i] += other.data[i]; >> } > > Try to use: > > data[] += other.data[]; > > >> for (auto i = 0; i < this.dim.size; ++i) { >> this.data[i] -= other.data[i]; >> } > > Try to use: > > data[] -= other.data[]; > > While it is cleaner, I remember array operations (such as data[] -= other.data[]) being significantly slower than doing it yourself. I don't know if this has been fixed. Perhaps using -m64 helps this. Also, I stress again that if you aren't compiling for 64-bit (-m64), it's very likely to give a significant improvement (because the optimizer will probably convert some of your operations into SIMD operations). Using LDC / GDC would also give a significant improvement.

Kapps: > I remember array operations (such as data[] -= other.data[]) being significantly slower than doing it yourself. I don't know if this has been fixed. They are not being "fixed". But for arrays as large as the ones here, they are fast enough. --------------------- I have tested the code (all in a single module) using the latest dmd and ldc2 (beta versions in both cases), compiling with: dmd -O -release -inline -noboundscheck matrix.d ldmd2 -O -release -inline -noboundscheck matrix.d Best timings: LDC2: multiplicationTest ... Time required: 2 secs, 522 msecs DMD: multiplicationTest ... Time required: 4 secs, 724 msecs Using link-time optimization, or using AVX/AVX2 on modern CPUs (using the compiler switcher for the AVX) could improve the LDC2 timings a little more. Bye, bearophile

February 18, 2014

Re: Optimize my code =)

Posted by Chris Williams
in reply to Robin

Permalink

Chris Williams

Posted in reply to Robin

Permalink

On Friday, 14 February 2014 at 16:00:09 UTC, Robin wrote:
> this(size_t rows, size_t cols) {
> 	this.dim = Dimension(rows, cols);
> 	this.data = new T[this.dim.size];
> 	enum nil = to!T(0);
> 	foreach(ref T element; this.data) element = nil;
> }

Your foreach here is unnecessary. You can zero out an array using the syntax:

this.data[] = 0; // probably compiles out as memset()

And while I would expect to!T(0) to compile down to something with no particular logic, it still is a function call not a macro, so you're better to let the compiler figure out the best conversion from 0 for type T than to use a library function for it.

> I think and hope that this is getting optimized via inlining. =)
> This works similar for opIndexAssign.

From complaints that I have seen, very little gets inlined at the moment.

> Matrix opMul(ref const(Matrix) other) {
> 	if (this.dim.rows != other.dim.cols || this.dim.cols != ther.dim.rows) {
> 		// TODO - still have to learn exception handling first ...
> 	}
> 	auto m = new Matrix(this.dim.rows, other.dim.cols);
> 	auto s = new Matrix(other).transposeAssign();

Since you don't return s, it's probably better to not allocate an instance. I would also be curious to see your constructor which copies one matrix into a new one.

> Besides that I can't find a way how to make a use of move semantics like in C++. For example a rvalue-move-constructor or a move-assign would be a very nice thing for many tasks.

Do you mean something like this?

this.data[] = other.data[]; // probably compiles out as memcpy()
this.data = other.data.dup; // different syntax, but should be the same

> Another nice thing to know would be if it is possible to initialize an array before it is default initialized with T.init where T is the type of the array's fields.

I believe so, but I would have to scour about to find the syntax. Hopefully, someone else knows.

Okay, here we go... 1) Don't use upper case letters in module names (http://dlang.org/module.html#ModuleDeclaration) 2) As has already been suggested, if you're targeting raw performance, don't use GC. You can always malloc and free your storage. Using C heap has certain implications, but they're not important right now. 3) Ditch extra constructors, they're completely unnecessary. For matrix you only need your non-trivial ctor, postblit and lvalue assignment operator. 4) This: ref Matrix transpose() const { return Matrix(this).transposeAssign(); } just doesn't make any sense. You're returning a reference to a temporary. 5) Use scoped imports, they're good :) 6) Use writefln for formatted output, it's good :) With the above suggestions your code transfrorms into this: http://dpaste.dzfl.pl/9d7feeab59f6 And here are the timings on my machine: $ rdmd -O -release -inline -noboundscheck main.d allocationTest ... Time required: 1 sec, 112 ms, 827 μs, and 3 hnsecs multiplicationTest ... Time required: 1 sec, 234 ms, 417 μs, and 8 hnsecs toStringTest ... Time required: 998 ms, 16 μs, and 2 hnsecs transposeTest ... Time required: 813 ms, 947 μs, and 3 hnsecs scalarMultiplicationTest ... Time required: 105 ms, 828 μs, and 5 hnsecs matrixAddSubTest ... Time required: 240 ms and 384 μs matrixEqualsTest ... Time required: 244 ms, 249 μs, and 8 hnsecs identityMatrixTest ... Time required: 249 ms, 897 μs, and 4 hnsecs LDC yields roughly the same times.

On Tuesday, 18 February 2014 at 03:32:48 UTC, Stanislav Blinov wrote: > 3) Ditch extra constructors, they're completely unnecessary. For matrix you only need your non-trivial ctor, postblit and lvalue assignment operator. Actually rvalue opAssign would be even better :) Also, I forgot to remove it in the code, but that explicit ctor for Dimension is completely unnecessary too.

Stanislav Blinov: > http://dpaste.dzfl.pl/9d7feeab59f6 Few small things should still be improved, but it's an improvement. Perhaps it needs to use a reference counting from Phobos. > LDC yields roughly the same times. This is surprising. Bye, bearophile

On Tuesday, 18 February 2014 at 07:50:23 UTC, bearophile wrote: > Stanislav Blinov: > >> http://dpaste.dzfl.pl/9d7feeab59f6 > > Few small things should still be improved, but it's an improvement. Perhaps it needs to use a reference counting from Phobos. COW for matrices? Aw, come on... :) >> LDC yields roughly the same times. > > This is surprising. To me as well. I haven't yet tried to dig deep though.

On Tuesday, 18 February 2014 at 07:45:18 UTC, bearophile wrote: > Stanislav Blinov: > >> that explicit ctor >> for Dimension is completely unnecessary too. > > I like a constructor(s) like that because it catches bugs like: > > auto d = Dimension(5); Hmmm... yeah, ok, not completely unnecessary :)

Forums