On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:
> Hi
> I'm playing with CTFE in D. This feature allows for a lot of funny things, e.g. initialisation of immutable data at compile time with the result of some other function (template).
> As a result I get immutable result blobs compiled into the binary. But none of the generating code, because it was already executed by CTFE.
> This worked nicely for several other use cases as well. So far, the results of CTFE and RT were always the same. As expected.
> However, yesterday a unit test started to report that the results created by the same code with the same parameters differ when run in CTFE mode or at runtime.
> ```d
> static immutable ubyte[] burningShipImageCTFE = generateBurningShipImage(twidth, theight, maxIter);
> immutable ubyte[] burningShipImageRT = generateBurningShipImage(twidth, theight, maxIter);
> assert(burningShipImageCTFE == burningShipImageRT, "Same results expected.");
> ```
> I diffed the pictures and indeed some of the pixels in the more complex areas of the BurningShip fractal were clearly and noticeably different.
> Ok, the fractal code uses 'double' floats, which are by their very nature limited in precision. But assuming that the math emulation of CTFE works identically to what the CPU does at runtime, the outcome should be identical.
> Or not, in some cases ;-) E.g. with a fractal equation where the smallest changes can result in big differences.
> And it opens up some questions:
> - Can CTFE be used under all circumstances when float numbers of any precision are involved?
> - Or is this some kind of expected behaviour whenever floats are involved?
> - Is the D CTFE documentation completely covering such possible issues?
> I can imagine that bugs caused by such subtle differences might be very difficult to fix.
> Any experiences or thoughts on this?
Experience: little, as I'm not doing floating-point stuff professionally, but I know my stuff, because in the past, for some years, I did the lecture assistance for a numerical programming course.
The normal use case for floating-point isn't perfectly reproducible results between different optimization levels. However, differences between CTFE and RT are indeed unacceptable for core-language operations. Those are bugs. Of course, results in user code can differ between CTFE and RT due to using `__ctfe` incorrectly (a sketch of that follows below). It might be noteworthy that C++ (at least up to and including C++17) does not allow floating-point types in CTFE (i.e. in `constexpr` execution), and my suspicion is that this is the reason.
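For illustration, here is a minimal sketch of what I mean by incorrect `__ctfe` use; the function and constant names are made up:

```d
// Hypothetical example: branching on __ctfe with numerically
// different code paths makes CTFE and RT results diverge.
double approxPi()
{
    if (__ctfe)
        return 3.14159;               // truncated constant at compile time
    return 3.141592653589793;         // full double precision at run time
}

static immutable ctfePi = approxPi(); // initializer is forced through CTFE

void main()
{
    assert(ctfePi == approxPi());     // fails: the two paths disagree
}
```

That one is on the user, not the compiler; the bug class described above is different, because there the very same code path gives different results.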
Maybe the solution is the same: Remove floating-point operations from CTFE, or at least the ones that could differ from RT. It would be awkward, at least in some people's opinion, because it would mean that in CTFE, only `real` is available, despite it being implementation-defined (it's not just platform-dependent, it's also how expressions are interpreted), while `double` and `float` are seemingly exactly defined. The reason is that `real` can't be optimized to higher precision, as it is by definition the highest-precision format supported by the platform, whereas for the smaller formats, RT results may differ between optimization levels. What the compiler could do, however, is replace `a + b * c` by a fused multiply-add. If it does that consistently across CTFE and RT, as I read the spec, it would be allowed to do that, even for `real`.
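To make the fused-multiply-add point concrete, here is a small sketch. It uses `std.math.fma`; whether you actually see a nonzero difference depends on the target, since the contraction only shows where a genuine fused operation (or a higher-precision intermediate, e.g. 80-bit `real`) is in effect:

```d
import std.math : fma;
import std.stdio : writeln;

void main()
{
    double a = 100_000_001.0;  // exactly representable in double
    double prod = a * a;       // the exact square, 10000000200000001, is
                               // not representable and rounds to a double

    // Two separate operations: the second product rounds the same way,
    // so the subtraction cancels exactly.
    writeln(a * a - prod);     // 0

    // A fused multiply-add keeps the full product internally and exposes
    // the rounding error of `prod` -- 1 here, where a true FMA or an
    // extended-precision intermediate is used.
    writeln(fma(a, a, -prod));
}
```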
The reason D specifies floating-point operations not that precisely is to allow for optimizations. Generally speaking, optimizations require some leeway in the spec. Optimizations are also required not to change observable behavior, but what counts as that is again up to the spec. In C++, the compiler is allowed to optimize away copies, even if those would invoke a copy constructor with observable side effects. As I see it (not my opinion, just what I see and conclude), D deems differences in floating-point operations that stem from carrying them out in higher-than-required precision not an observable side effect, i.e. not one that the optimizer must preserve, even if you can practically observe a difference.
The reason for that is probably that Walter didn't like that other languages nailed down floating-point operations so that you'd get both less precise results and worse performance. That would, for example, be the case on an 80387 coprocessor, and (here's where my knowledge ends) it is probably also true for basically all hardware today if you consider `float` specifically. I know of no hardware that supports single precision but not double precision. Giving you double precision instead of single is at least basically free, possibly even a performance boost, while also giving you more precision.
An algorithm like Kahan summation must be implemented in a way that takes those optimizations into account (see the sketch below). This is exactly like in C++, where signed integer overflow is undefined, not because it's undefined on the hardware, but because it allows for optimizations. In Zig, all integer overflow is undefined for that reason, and for wrap-around or saturated arithmetic there are separate operators. D could easily add specific functions to `core.math` that specify operations as strictly IEEE-754 conforming. Using those, Phobos could give you types that are specified to produce results as specified by IEEE-754, with no interference by the optimizer. You can't actually do the reverse, i.e. provide a type in Phobos that allows for optimizations of that sort while the core-language types are guaranteed to be unoptimized. Such a type would have to be compiler-recognized, i.e. it would end up being a built-in type.
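For reference, here is the textbook Kahan summation, with comments on where precision-related optimizations can break it; this is the standard algorithm, not code from the post above:

```d
// Compensated (Kahan) summation: recovers the low-order bits that
// plain accumulation loses.
double kahanSum(const(double)[] xs)
{
    double sum = 0.0;
    double c = 0.0;                // running compensation
    foreach (x; xs)
    {
        immutable y = x - c;       // apply the previous correction
        immutable t = sum + y;
        // Algebraically, (t - sum) - y is zero; numerically, it is the
        // rounding error of the addition. A compiler that keeps t or sum
        // in higher precision, or reassociates the expression, makes c
        // collapse to 0 and silently defeats the algorithm.
        c = (t - sum) - y;
        sum = t;
    }
    return sum;
}
```

With operations pinned down as IEEE-754 conforming (e.g. via such hypothetical `core.math` functions), `c` would be guaranteed to hold the exact rounding error.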