Thread overview
Help optimizing code?
- Lily, Jan 01, 2018
- Adam D. Ruppe, Jan 01, 2018
- user1234, Jan 01, 2018
- user1234, Jan 01, 2018
- Adam D. Ruppe, Jan 01, 2018
- Muld, Jan 01, 2018
- Adam D. Ruppe, Jan 01, 2018
- Muld, Jan 01, 2018
- Uknown, Jan 02, 2018
- Uknown, Jan 02, 2018
January 01, 2018
I started learning D a few days ago, coming from some very basic C++ knowledge, and I'd like some help getting a program to run faster. The code is here: https://github.com/IndigoLily/D-mandelbrot/blob/master/mandelbrot.d

Right now it runs slower than my JavaScript Mandelbrot renderer on the same quality settings, which is clearly ridiculous, but I don't know what to do to fix it. Sorry for the lack of comments, but I can never tell what will and won't be obvious to other people.
January 01, 2018
On Monday, 1 January 2018 at 15:09:53 UTC, Lily wrote:
> I started learning D a few days ago, coming from some very basic C++ knowledge, and I'd like some help getting a program to run faster.

So a few easy things you can do:

1) Use `float` instead of `real`. `real` sucks; it is really slow and weird. Making that one switch doubled the speed on my computer.

2) Preallocate imageData: before the loop, call `imageData.reserve(width*height*3)`. Small savings on my computer, but an easy one.

3) Make sure you use compiler optimization options like `-O` and `-inline` on dmd (or use the gdc or ldc compilers, both of which generally optimize better than dmd out of the box).


And if that isn't enough we can look into smaller things, but overall these brought the time down to about a third of what it was on my box.
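A minimal sketch of tips 1 and 2 together, assuming an RGB byte buffer like the OP's `imageData` and a fixed [-2, 1] x [-1.5, 1.5] viewport. `escapeTime` and `render` are hypothetical names, not functions from mandelbrot.d:

```d
/// Escape-time count for c = cr + ci*i, using float as suggested in tip 1.
int escapeTime(float cr, float ci, int maxIter)
{
    float zr = 0, zi = 0;
    foreach (i; 0 .. maxIter)
    {
        if (zr * zr + zi * zi > 4.0f) // |z|^2 > 4: the orbit has escaped
            return i;
        immutable float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
    }
    return maxIter;
}

/// Renders a greyscale RGB image (3 bytes per pixel) into one preallocated array.
ubyte[] render(int width, int height, int maxIter)
{
    ubyte[] imageData;
    imageData.reserve(width * height * 3); // tip 2: allocate once, up front
    foreach (y; 0 .. height)
    {
        foreach (x; 0 .. width)
        {
            // Assumed viewport: real part in [-2, 1), imaginary part in [-1.5, 1.5).
            immutable float cr = -2.0f + 3.0f * x / width;
            immutable float ci = -1.5f + 3.0f * y / height;
            immutable int n = escapeTime(cr, ci, maxIter);
            // Points in the set are black; the rest get brighter the later they escape.
            immutable ubyte shade = cast(ubyte)(n == maxIter ? 0 : 255 * n / maxIter);
            imageData ~= shade;
            imageData ~= shade;
            imageData ~= shade;
        }
    }
    return imageData;
}
```

Compile it with the tip 3 flags, e.g. `dmd -O -inline`, and compare against the `real` version.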
January 01, 2018
On Monday, 1 January 2018 at 15:09:53 UTC, Lily wrote:
> I started learning D a few days ago, coming from some very basic C++ knowledge, and I'd like some help getting a program to run faster. The code is here: https://github.com/IndigoLily/D-mandelbrot/blob/master/mandelbrot.d
>
> Right now it runs slower than my JavaScript Mandelbrot renderer on the same quality settings, which is clearly ridiculous, but I don't know what to do to fix it. Sorry for the lack of comments, but I can never tell what will and won't be obvious to other people.

- The first thing is to compile with the best options:

    dmd mandelbrot.d -O -release -inline -boundscheck=off

- You append a lot, which can cause reallocations of imageData. Try

    import std.array;
    Appender!(ubyte[]) imageData;

  The code will not have to be changed for "~=" since Appender overloads this operator; use `imageData.data` to get the finished ubyte[] back.

- I'd use "double" instead of "real".
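A small sketch of the Appender suggestion. `buildImage` and `fillColour` are hypothetical stand-ins for the OP's per-pixel code; the sketch also shows that Appender has its own `reserve`, so it combines with the preallocation tip:

```d
import std.array : Appender;

/// Hypothetical stand-in for the OP's loop: every pixel gets the same colour.
ubyte[] buildImage(size_t width, size_t height, ubyte fillColour)
{
    Appender!(ubyte[]) imageData;
    imageData.reserve(width * height * 3); // Appender can preallocate too
    foreach (_; 0 .. width * height)
    {
        // "~=" works as before because Appender overloads the operator.
        imageData ~= fillColour;
        imageData ~= fillColour;
        imageData ~= fillColour;
    }
    return imageData.data; // the finished ubyte[] slice
}
```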
January 01, 2018
On Monday, 1 January 2018 at 15:23:19 UTC, Adam D. Ruppe wrote:
> On Monday, 1 January 2018 at 15:09:53 UTC, Lily wrote:
>> I started learning D a few days ago, coming from some very basic C++ knowledge, and I'd like some help getting a program to run faster.
>
> So a few easy things you can do:
>
> 1) use `float` instead of `real`. real sucks, it is really slow and weird. Making that one switch doubled the speed on my computer.

Yes, I've also advised double. Double is better if the target arch is x86_64, since part of the operations will be done with SSE. With "real" the OP was **sure** to get 100% of the math done on the x87 FPU (although for all the trig stuff there's no choice).

>
> 2) preallocate the imageData. before the loop, `imageData.reserve(width*height*3)`. Small savings on my computer but an easy one.
>
> 3) make sure you use the compiler optimization options like `-O` and `-inline` on dmd (or use the gdc and ldc compilers both of which generally optimize better than dmd out of the box).
>
>
> And if that isn't enough we can look into smaller things, but these overall brought the time down to about 1/3 what it started on my box.


January 01, 2018
On Monday, 1 January 2018 at 15:29:28 UTC, user1234 wrote:
>     dmd mandelbrot.d -O -release -inline -boundscheck=off

-O and -inline are OK, but -release and -boundscheck=off are harmful and shouldn't be used. Yeah, you can squeeze a bit of speed out of them, but there's another way to do it - `.ptr` on the individual accesses or versioning out unwanted `assert` statements - and those avoid the major bug and security baggage that -release and -boundscheck=off bring.

In this program, I didn't see a major improvement with the boundscheck skipping... and in this program, it seems to be written without the bugs, but still, I am against that switch on principle. It is so so so easy to break things with them.
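A sketch of the per-expression `.ptr` pattern, using a hypothetical `sumStrided` function: only the one indexing expression skips the bounds check, and the loop condition is what keeps it in range. Everything else in the program keeps normal checking, with no build flags involved:

```d
/// Hypothetical example: sums every `stride`-th element of `values`.
/// Only the `.ptr` access skips the bounds check; the rest of the program,
/// and even the rest of this function, keeps D's normal checks.
double sumStrided(const(double)[] values, size_t stride) @system
{
    assert(stride > 0, "stride must be positive");
    double total = 0;
    for (size_t i = 0; i < values.length; i += stride)
    {
        // The loop condition already guarantees i < values.length,
        // so the bounds check here would be redundant.
        total += values.ptr[i];
    }
    return total;
}
```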

> - I'd use "double" instead of "real".

On my computer at least, float gave 2x speed compared to double. You could try both though and see which works better.
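One way to try both without duplicating code is to make the iteration a template over the floating-point type and time each instantiation. This is a rough sketch; the grid size, viewport and the `timeGrid` helper are arbitrary assumptions, and the results will depend on the compiler and flags:

```d
import std.datetime.stopwatch : AutoStart, StopWatch;

/// Escape-time iteration, generic over float, double or real.
int escapeTime(T)(T cr, T ci, int maxIter)
{
    T zr = 0, zi = 0;
    foreach (i; 0 .. maxIter)
    {
        if (zr * zr + zi * zi > 4)
            return i;
        immutable T t = zr * zr - zi * zi + cr;
        zi = 2 * zr * zi + ci;
        zr = t;
    }
    return maxIter;
}

/// Iterates over a size x size grid and returns the elapsed milliseconds.
/// The summed counts go into `checksum` so the work can't be optimized away.
long timeGrid(T)(int size, int maxIter, out long checksum)
{
    auto sw = StopWatch(AutoStart.yes);
    checksum = 0;
    foreach (y; 0 .. size)
        foreach (x; 0 .. size)
            checksum += escapeTime!T(cast(T)(-2 + 3.0 * x / size),
                                     cast(T)(-1.5 + 3.0 * y / size), maxIter);
    return sw.peek.total!"msecs";
}
```

For example, with `import std.stdio;`, call `long sum; writeln(timeGrid!float(1000, 500, sum), " ms");` and then the same with `double`.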
January 01, 2018
On Monday, 1 January 2018 at 15:54:33 UTC, Adam D. Ruppe wrote:
> On Monday, 1 January 2018 at 15:29:28 UTC, user1234 wrote:
>>     dmd mandelbrot.d -O -release -inline -boundscheck=off
>
> -O and -inline are OK, but -release and -boundscheck are harmful and shouldn't be used. Yeah, you can squeeze a bit of speed out of them, but there's another way to do it - `.ptr` on the individual accesses or versioning out unwanted `assert` statements - and those avoid major bug and security baggage that -release and -boundscheck=off bring.

If you use .ptr then you get zero detection, even in debug builds.

> In this program, I didn't see a major improvement with the boundscheck skipping... and in this program, it seems to be written without the bugs, but still, I am against that switch on principle. It is so so so easy to break things with them.

This program is relatively small and doesn't look like it does its calculations in realtime. I'd rather there be a potential bug than the program running too slow to be usable, or have zero debugging for indices in debug builds.




January 01, 2018
On Monday, 1 January 2018 at 16:13:37 UTC, Muld wrote:
> If you use .ptr then you get zero detection, even in debug builds.

It is limited to the one expression where you wrote it, instead of on the ENTIRE program like the build switches do.

It is a lot easier to check correctness in an individual expression than it is to check the entire program, including stuff you didn't even realize might have been a problem.

With the .ptr pattern, it is correct by default and you individually change ones you (should) look carefully at. With -boundscheck, it is wrong by default and most people don't even look at it - people suggest it to newbies as an optimization without mentioning how nasty it is.

> I'd rather there be a potential bug than the program running too slow to be usable

That's a ridiculous exaggeration. In this program, I saw a < 1% time difference using those flags. -O -inline make a 50x bigger difference!

> or have zero debugging for indices in debug builds.

You shouldn't be using .ptr until after you've carefully checked and debugged the line of code where you are writing it. That's the beauty of the pattern: it only affects one line of code, so you can test it before you use it without affecting the rest of the program.
January 01, 2018
On Monday, 1 January 2018 at 16:47:40 UTC, Adam D. Ruppe wrote:
> On Monday, 1 January 2018 at 16:13:37 UTC, Muld wrote:
>> If you use .ptr then you get zero detection, even in debug builds.
>
> It is limited to the one expression where you wrote it, instead of on the ENTIRE program like the build switches do.
>
> It is a lot easier to check correctness in an individual expression than it is to check the entire program, including stuff you didn't even realize might have been a problem.
>
> With the .ptr pattern, it is correct by default and you individually change ones you (should) look carefully at. With -boundscheck, it is wrong by default and most people don't even look at it - people suggest it to newbies as an optimization without mentioning how nasty it is.

It won't be just one line, though, when you pretty much have to use it EVERYWHERE to get the optimization you want. It makes more sense to just turn off the check for the entire program and use your own assert()s where they are actually needed. That way you still get the checks in debug builds, and you have asserts where they are actually necessary.

>> I'd rather there be a potential bug than the program running too slow to be usable
>
> That's a ridiculous exaggeration. In this program, I saw a < 1% time difference using those flags. -O -inline make a 50x bigger difference!

Read the sentence right before this. Jesus. People only read what they want.

>> or have zero debugging for indices in debug builds.
>
> You shouldn't be using .ptr until after you've carefully checked and debugged the line of code where you are writing it. That's the beauty of the pattern: it only affects one line of code, so you can test it before you use it without affecting the rest of the program.

It won't just be one line, and that's not beautiful. What happens when code gets refactored? Are you going to keep flip-flopping the source code, rather than a compiler flag or multiple build configurations? How long are you even going to test for? The errors that might occur in the code are probably difficult to detect; if they weren't, bounds checking wouldn't be necessary at all. Just test your code, that's the beauty of testing!

January 02, 2018
On Monday, 1 January 2018 at 15:09:53 UTC, Lily wrote:
> I started learning D a few days ago, coming from some very basic C++ knowledge, and I'd like some help getting a program to run faster. The code is here: https://github.com/IndigoLily/D-mandelbrot/blob/master/mandelbrot.d
>
> Right now it runs slower than my JavaScript Mandelbrot renderer on the same quality settings, which is clearly ridiculous, but I don't know what to do to fix it. Sorry for the lack of comments, but I can never tell what will and won't be obvious to other people.

Hey! I happened to also write a Mandelbrot generator in D. It was based on the C version given on Rosetta Code [0].
Some of the optimizations I used were:

0. Use LDC. It is significantly faster.
1. Utilize the fact that the Mandelbrot set is symmetric about the x-axis. You can halve the time taken.
2. Use std.parallelism for using multiple cores on the CPU
3. Use LDC's @fastmath attribute
4. imageData.reserve(width * height * 3) before the loop
5. [1] is a great article on this specific topic
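A sketch of points 1 and 2 combined, assuming a viewport that is vertically centred on the real axis (otherwise the mirror trick does not apply) and width and height of at least 2. Each parallel iteration computes one row and writes its mirror image, and no two iterations write the same row. `renderSymmetric` is a hypothetical name; LDC's `@fastmath` is left out so the sketch also builds with dmd:

```d
import std.parallelism : parallel;
import std.range : iota;

/// Escape-time count, using float as suggested earlier in the thread.
int escapeTime(float cr, float ci, int maxIter)
{
    float zr = 0, zi = 0;
    foreach (i; 0 .. maxIter)
    {
        if (zr * zr + zi * zi > 4.0f)
            return i;
        immutable float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
    }
    return maxIter;
}

/// Hypothetical renderer: returns iteration counts in row-major order.
/// Assumes width, height >= 2 and ci running from -1.5 to 1.5,
/// so row y is the mirror image of row height-1-y.
int[] renderSymmetric(int width, int height, int maxIter)
{
    auto counts = new int[width * height];
    immutable int half = (height + 1) / 2; // includes the middle row if height is odd
    // Rows of the top half are spread across worker threads.
    foreach (y; parallel(iota(half)))
    {
        immutable float ci = -1.5f + 3.0f * y / (height - 1);
        foreach (x; 0 .. width)
        {
            immutable float cr = -2.0f + 3.0f * x / (width - 1);
            immutable int n = escapeTime(cr, ci, maxIter);
            counts[y * width + x] = n;                // top half
            counts[(height - 1 - y) * width + x] = n; // mirrored bottom half
        }
    }
    return counts;
}
```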

For reference, on my 28 W dual-core i5, a 2560x1600 image took about 2 minutes to render, with 500,000 iterations per pixel.
[2] is my own version.

[0]: https://rosettacode.org/wiki/Mandelbrot_set#PPM_non_interactive
[1]: https://randomascii.wordpress.com/2011/08/13/faster-fractals-through-algebra/
[2]: https://github.com/Sirsireesh/Khoj-2017/blob/master/Mandelbrot-set/mandlebrot.d
January 02, 2018
On Tuesday, 2 January 2018 at 07:17:23 UTC, Uknown wrote:
> [snip]
> 0. Use LDC. It is significantly faster.
> 1. Utilize the fact that the Mandelbrot set is symmetric about the x-axis. You can halve the time taken.
> 2. Use std.parallelism for using multiple cores on the CPU
> 3. Use LDC's @fastmath attribute
> 4. imageData.reserve(width * height * 3) before the loop
> 5. [1] is a great article on this specific topic
> [snip]

Forgot to mention that since you already know where some of the edges are, you can avoid looping unnecessarily through some regions. That saves a lot of time.
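The post doesn't say which regions are meant. One well-known instance of the idea is that the main cardioid and the period-2 bulb lie inside the set and can be recognised with a closed-form test, so their pixels skip the loop entirely; these are also the most expensive pixels, since they always run to `maxIter`. A sketch, with hypothetical function names:

```d
/// True if c = cr + ci*i lies in the main cardioid or the period-2 bulb.
/// Both regions are inside the Mandelbrot set, so such points never escape.
bool inMainCardioidOrBulb(double cr, double ci)
{
    immutable double xq = cr - 0.25;
    immutable double q = xq * xq + ci * ci;
    if (q * (q + xq) <= 0.25 * ci * ci)
        return true;                        // main cardioid
    immutable double xb = cr + 1.0;
    return xb * xb + ci * ci <= 1.0 / 16.0; // period-2 bulb: |c + 1| <= 1/4
}

/// Escape-time count that skips the loop for the two known interior regions.
int escapeTime(double cr, double ci, int maxIter)
{
    if (inMainCardioidOrBulb(cr, ci))
        return maxIter; // known interior: would otherwise run all maxIter steps
    double zr = 0, zi = 0;
    foreach (i; 0 .. maxIter)
    {
        if (zr * zr + zi * zi > 4.0)
            return i;
        immutable double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
    }
    return maxIter;
}
```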