Hello all. I have been working on learning computational photography and have been using D to do that. I recently got some code running that performs chromatic adaptation (white balancing). The output is still not ideal (the image is overexposed) but it does correct color casts. The issue I have is with performance. With a few optimizations found through profiling I have been able to drop processing time from ~10.8 to ~6.2 seconds for a 16 megapixel test image. That still feels too long, however; image editing programs are usually much faster.
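For context, the core idea behind this kind of white balancing can be sketched as a von Kries-style diagonal scaling: each channel is multiplied by the ratio of the destination white point to the source white point. This is a simplification for illustration only; photog's `chromAdapt` may use a more involved transform (e.g. working through a cone-response matrix like Bradford):

```d
import std.math : isClose;

// Von Kries-style adaptation: scale each channel by the ratio of the
// destination white point to the source white point.
double[3] vonKries(double[3] pixel, double[3] srcWhite, double[3] dstWhite)
{
    double[3] result;
    foreach (i; 0 .. 3)
        result[i] = pixel[i] * (dstWhite[i] / srcWhite[i]);
    return result;
}

void main()
{
    // A pixel matching the source white maps exactly to the destination white.
    auto adapted = vonKries([0.9, 1.0, 0.8], [0.9, 1.0, 0.8], [0.95, 1.0, 1.09]);
    assert(isClose(adapted[0], 0.95) && isClose(adapted[2], 1.09));
}
```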
The optimizations that I've implemented:
- Remove `immutable` from constants. The type mismatch between constants (`immutable(double)`) and pixel values (`double`) caused time-consuming checks for compatible types in mir operations and triggered run-time type conversions and memory allocations (sorry if I butchered this description).
- Use `mir.math.common.pow` in place of `std.math.pow`.
- Use `@optmath` for linearization functions (https://github.com/kyleingraham/photog/blob/up-chromadapt-perf/source/photog/color.d#L192 and https://github.com/kyleingraham/photog/blob/up-chromadapt-perf/source/photog/color.d#L318).
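To illustrate the kind of changes above, here is a minimal hypothetical sketch (the names `gamma` and `linearize` are illustrative, not photog's actual code):

```d
import mir.math.common : optmath, pow; // pow here is the fast mir variant

// Constant declared as plain double rather than immutable(double):
// matching the pixel element type avoids the compatible-type checks,
// run-time conversions, and allocations described above.
double gamma = 2.4;

// @optmath permits aggressive floating-point optimization of this function.
@optmath double linearize(double channel)
{
    // Illustrative sRGB-style linearization.
    return channel <= 0.04045
        ? channel / 12.92
        : pow((channel + 0.055) / 1.055, gamma);
}

void main()
{
    assert(linearize(0.02) > 0.0);
    assert(linearize(0.5) < 0.5);
}
```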
Is there anything else I can do to improve performance?
I tested the code under the following conditions:
- Compiled with `dub build --build=release --compiler=ldmd2`
- dub v1.23.0, ldc v1.24.0
- Intel Xeon W-2170B 2.5GHz (4.3GHz turbo)
- Test image
- Test code:
```d
#!/usr/bin/env dub
/+ dub.sdl:
name "photog-test"
dependency "photog" version="~>0.1.1-alpha"
dependency "jpeg-turbod" version="~>0.2.0"
+/
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : read, write;
import std.stdio : writeln, writefln;

import jpeg_turbod;
import mir.ndslice : reshape, sliced;

import photog.color : chromAdapt, Illuminant, rgb2Xyz;
import photog.utils : imageMean, toFloating, toUnsigned;

void main()
{
    const jpegFile = "image-in.jpg";
    auto jpegInput = cast(ubyte[]) jpegFile.read;

    auto dc = new Decompressor();
    ubyte[] pixels;
    int width, height;
    bool decompressed = dc.decompress(jpegInput, pixels, width, height);

    if (!decompressed)
    {
        dc.errorInfo.writeln;
        return;
    }

    auto image = pixels.sliced(height, width, 3).toFloating;

    int err;
    double[] srcIlluminant = image
        .imageMean
        .reshape([1, 1, 3], err)
        .rgb2Xyz
        .field;
    assert(err == 0);

    auto sw = StopWatch(AutoStart.no);
    sw.start;
    auto ca = chromAdapt(image, srcIlluminant, Illuminant.d65).toUnsigned;
    sw.stop;

    auto timeTaken = sw.peek.split!("seconds", "msecs");
    // %03d pads the milliseconds so e.g. 6s 50ms prints as 6.050, not 6.50.
    writefln("%d.%03d seconds", timeTaken.seconds, timeTaken.msecs);

    auto c = new Compressor();
    ubyte[] jpegOutput;
    bool compressed = c.compress(ca.field, jpegOutput, width, height, 90);

    if (!compressed)
    {
        c.errorInfo.writeln;
        return;
    }

    "image-out.jpg".write(jpegOutput);
}
```
Functions found through profiling to be taking most time:
- Chromatic adaptation: https://github.com/kyleingraham/photog/blob/up-chromadapt-perf/source/photog/color.d#L354
- RGB to XYZ: https://github.com/kyleingraham/photog/blob/up-chromadapt-perf/source/photog/color.d#L142
- XYZ to RGB: https://github.com/kyleingraham/photog/blob/up-chromadapt-perf/source/photog/color.d#L268
A profile for the test code is here. The `trace.log.dot` file can be viewed with xdot. The PDF version is here. The profile was generated using:
- Compiled with `dub build --build=profile --compiler=ldmd2`
- Visualized with profdump: `dub run profdump -- -f -d -t 0.1 trace.log trace.log.dot`
The branch containing the optimized code is here: https://github.com/kyleingraham/photog/tree/up-chromadapt-perf
The corresponding release is here: https://github.com/kyleingraham/photog/releases/tag/v0.1.1-alpha
If you've gotten this far, thank you so much for reading. I hope there's enough information here to ease thinking about optimizations.