August 15
On 8/15/24 18:50, Abdulhaq wrote:
> On Thursday, 15 August 2024 at 16:21:35 UTC, Abdulhaq wrote:
>> On Thursday, 15 August 2024 at 09:13:31 UTC, Carsten Schlote
> 
> To clarify a bit more, I'm not just talking about single isolated computations, I'm talking about e.g. matrix multiplication. Different compilers, even LDC vs DMD for example, could optimise the calculation in a different way: loop unrolling, step elimination, etc. Even if the rounding algorithms etc. at the chip level are the same, the way the code is compiled and the calculations sequenced will change the error in the final answer.
> ...

LDC disables -ffast-math by default.

> Then, variations in pipelining and caching at the processor level could also affect the answer.
> ...

No.

> And if you move on to different computing paradigms such as quantum computing and other as yet undiscovered techniques, again the way operations and rounding etc is compounded will cause divergences in computations.
> ...

Yes, if you move on to an analog computing paradigm with imperfect error correction, full reproducibility will go out the window. Floating point is not that though.

> Now, we could insist that we somehow legislate for the way compound calculations are conducted. But that would cripple the speed of calculations for some processor architectures/paradigms for a goal (reproducibility) which is worthy, but for 99% of usages not sufficiently beneficial to pay the big price in performance.
> 
> 

It's really not that expensive; changing the result via optimizations is disabled by default in LDC. And actually, how do you know that the compiler does not pessimize your hand-optimized compound operations?

I am not even against people being able to pass -ffast-math; it should just not destroy the correctness and reproducibility of everyone else's computations.
August 15
On Thursday, 15 August 2024 at 16:51:51 UTC, Timon Gehr wrote:
> On 8/15/24 18:21, Abdulhaq wrote:
>> Why do you want to reproduce the errors?
>
> It's you who calls it "errors". Someone else may just call it "results".
>

Here by "error" I mean where the result is different to a known correct answer. It's a simple matter to propose a method for producing a fractal plot, and to have a notional "correct" plot if calculated to infinite precision. Then, at sufficient depth of e.g. a Mandelbrot plot, at a finite precision, some pixels will differ from the ideal plot produced by the machine of infinite precision. These divergences are, by my definition, errors. Yes, it's all results, and some results are errors.

> Science. Determinism, e.g. blockchain or other deterministic lock-step networking. etc. It's not hard to come up with use cases. Sometimes what exact result you get is not very important, but it is important that you get the same one.
>
>> 

This is why I was careful to say that I was generalising. Of course we can come up with examples where reproducibility is an essential requirement. In this case we can often work out what precision is required to achieve said reproducibility.

>> In general floating point calculations are inexact, and we shouldn't (generalising here) expect to get the same error across different platforms.
>
> It's just not true that floating point calculations are somehow "inexact". You are confusing digital and analog computing. This is still digital computing. Rounding is a deterministic, exact function, like any other.

Fair point. I meant to refer to floating point numbers rather than calculations. There are an infinite number of real numbers that cannot be exactly represented by a floating point number of finite precision. The floating point representations of those numbers are inexact.

If we calculate 1/3 using floating point, the result can be exactly correct in the sense that there is a correct answer in the world of floating point, but the calculated result will be an inexact representation of the correct number, which is what I had in mind.
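
For illustration, a minimal sketch of that point (the printed digits are what a typical IEEE-754 double gives; the exact output may vary with the formatting used):

    import std.stdio : writefln;

    void main()
    {
        // 1.0 / 3.0 is correctly rounded: it is the representable double
        // closest to one third -- but it is not one third.
        double third = 1.0 / 3.0;
        writefln("%.20f", third); // e.g. 0.33333333333333331483, not 0.3333...
    }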


August 16

On Thursday, 15 August 2024 at 10:25:45 UTC, Dom DiSc wrote:
> On Wednesday, 14 August 2024 at 07:54:15 UTC, claptrap wrote:
>
> I agree the compiler should actually use the float precision you explicitly ask for.
>
> But if you do cross compiling, that may simply not be possible (because the target has a different hardware implementation than the hardware the compiler runs on).

So if you have some CTFE function that does a bunch of calculations in double precision and the compiler is running on a platform that only supports float, what happens?

> Also relying on specific inaccuracy of FP calculations is very bad design.

It's not relying on them, it's accounting for them.

August 17

On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:
> Hi
>
> I'm playing with CTFE in D. This feature allows for a lot of funny things, e.g. initialisation of immutable data at compile time with the result of some other function (template).
>
> As a result I get immutable result blobs compiled into the binary, but none of the generating code, because it was already executed by CTFE.
>
> This worked nicely for several other use cases as well. So far the results of CTFE and RT were always the same. As expected.
>
> However, yesterday a unit test started to report that the results created by the same code with the same parameters differ when run in CTFE mode or at runtime.
>
>     static immutable ubyte[] burningShipImageCTFE = generateBurningShipImage(twidth, theight, maxIter);
>     immutable ubyte[] burningShipImageRT = generateBurningShipImage(twidth, theight, maxIter);
>     assert(burningShipImageCTFE == burningShipImageRT, "Same results expected.");
>
> I diffed the pictures and indeed some of the pixels in the more complex areas of the BurningShip fractal were clearly and noticeably different.
>
> OK, the fractal code uses 'double' floats, which are by their very nature limited in precision. But assuming that the math emulation in CTFE works identically to the CPU at runtime, the outcome should be identical.
>
> Or not, in some cases ;-) E.g. with a fractal equation where the smallest changes can result in big differences.
>
> And it opens up some questions:
>
>   • Can CTFE be used under all circumstances when float numbers of any precision are involved?
>   • Or is this some kind of expected behaviour whenever floats are involved?
>   • Is the D CTFE documentation completely covering such possible issues?
>
> I can imagine that bugs caused by such subtle differences might be very difficult to fix.
>
> Any experiences or thoughts on this?

Experience: little, as I'm not doing floating-point stuff professionally, but I know my stuff because, for some years in the past, I did the lecture assistance for a numerical programming course.

The normal use case for floating-point isn't perfectly reproducible results between different optimization levels. However, differences between CTFE and RT are indeed unacceptable for core-language operations. Those are bugs. Of course, results in user code can differ between CTFE and RT due to using __ctfe incorrectly. It might be noteworthy that C++ (at least up to and including C++17) does not allow floating-point types in CTFE (i.e. in constexpr execution), and my suspicion is that this is the reason.
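
For illustration, a minimal hypothetical sketch of how branching on __ctfe with non-equivalent paths makes user code diverge between CTFE and RT:

    // Hypothetical example: the two branches are not equivalent, so the
    // CTFE result differs from the runtime result by construction.
    double approxHalf(double x)
    {
        if (__ctfe)
            return x * 0.5;       // compile-time path
        else
            return x / 2.000001;  // runtime path, subtly different
    }

    immutable ctfeValue = approxHalf(3.0); // initialized via CTFE
    // At runtime, approxHalf(3.0) != ctfeValue.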

Maybe the solution is the same: remove floating-point operations from CTFE, or at least the ones that could differ from RT. It would be awkward, at least in some people's opinion, because that would mean that in CTFE, only real is available, despite it being implementation-defined (it's not just platform-dependent, it's also how expressions are interpreted), while double and float seem exactly defined. The reason is that real can't be optimized to higher precision, as it's by definition the highest-precision format supported by the platform, whereas for the smaller formats, RT results may differ between optimization levels. What the compiler could do, however, is replace a + b * c by a fused multiply-add. If it does that consistently across CTFE and RT, as I read the spec, it would be allowed to do that, even for real.
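
For example, a hedged sketch of that contraction, using std.math.fma to make the fused form explicit (assuming fma maps to a true fused operation on the target):

    import std.math : fma;

    // a * b + c evaluated with and without an explicit fused multiply-add.
    // The optimizer may or may not contract the plain expression; fma is
    // documented to perform the multiply-add with a single rounding.
    double contractionDifference(double a, double b, double c)
    {
        double plain = a * b + c;                 // two roundings (unless contracted)
        double fused = cast(double) fma(a, b, c); // one rounding
        return fused - plain;                     // 0 or roughly one ulp
    }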

The reason D specifies floating-point operations not that precisely is to allow for optimizations. Generally speaking, optimizations require some leeway in the spec. Optimizations are also required not to change observable behavior, but what counts as observable is again up to the spec. In C++, the compiler is allowed to optimize away copies, even if those would invoke a copy constructor that has observable side effects. As I see it (not my opinion, just what I see and conclude), D specifies that differences in floating-point results due to operations being carried out in higher-than-required precision are not an observable side effect, i.e. not one that the optimizer must preserve, even if you can practically observe a difference.

The reason for that is probably because Walter didn't like that other languages nailed down floating-point operations so that you'd get both less precise results and worse performance. That would for example be the case on an 80387 coprocessor, and (here's where my knowledge ends) probably also true for basically all hardware today if you consider float specifically. I know of no hardware, that supports single precision, but not double precision. Giving you double precision instead of single is at least basically free and possibly even a performance boost, while also giving you more precision.

An algorithm like Kahan summation must be implemented in a way that takes those optimizations into account. This is exactly like in C++, signed integer overflow is undefined, not because it's undefined on the hardware, but because it allows for optimizations. In Zig, all integer overflow is undefined for that reason, and for wrap-around or saturated arithmetic, there are separate operators. D could easily add specific functions to core.math that specify operations as specifically IEEE-754 conforming. Using those, Phobos could give you types that are specified to produce results as specified by IEEE-754, with no interference by the optimizer. You can't actually do the reverse, i.e. provide a type in Phobos that allows for optimizations of that sort but the core-language types are guaranteed to be unoptimized. Such a type would have to be compiler-recognized, i.e. it would end up being a built-in type.
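
Regarding Kahan summation, a minimal sketch of why it is fragile under such optimizations: the compensation term is algebraically zero, so a reassociating optimizer (e.g. under fast-math) may delete it and silently degrade the algorithm to naive summation.

    // Kahan (compensated) summation.
    double kahanSum(const double[] xs)
    {
        double sum = 0.0;
        double c = 0.0; // running compensation for lost low-order bits
        foreach (x; xs)
        {
            immutable y = x - c;
            immutable t = sum + y;
            c = (t - sum) - y; // algebraically zero; numerically the lost part of y
            sum = t;
        }
        return sum;
    }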

August 18
On 8/17/24 18:33, Quirin Schroll wrote:
> The normal use case for floating-point isn't perfectly reproducible results between different optimization levels.

I would imagine the vast majority of FLOPs nowadays are used in HPC and AI workloads. Reproducibility is at least a plus, particularly in a research context.

> However, differences between CTFE and RT are indeed unacceptable for core-language operations. Those are bugs.

No, they are not bugs, it's just the same kind of badly designed specification. According to the specification, you can get differences between RT and RT when running the exact same function. Of course you will get differences between CTFE and RT.

> The reason for that is probably because Walter didn't like that other languages nailed down floating-point operations

Probably. C famously nails down floating-point operations, just like it nails down all the other types. D is really well-known for all of its unportable built-in data types, because Walter really does not like nailing things down and this is not one of D's selling points. /s

Anyway, at least LDC is sane on this at runtime by default. Otherwise I would have to switch language for use cases involving floating point, which would probably just make me abandon D in the long run.

> so that you'd get both less precise results *and* worse performance.

Imagine just manually using the data type that is most suitable for your use case.

> That would for example be the case on an 80387 coprocessor, and (here's where my knowledge ends) 

Then your knowledge may be rather out of date. I get the x87 shenanigans, but that's just not very relevant anymore. I am not targeting 32-bit x86 with anything nowadays.

> probably also true for basically all hardware today if you consider `float` specifically. I know of no hardware, that supports single precision, but not double precision. Giving you double precision instead of single is at least basically free and possibly even a performance boost, while also giving you more precision.

It's nonsense. If I want double, I ask for double. Also, it's definitely not true that going to double instead of single precision will boost your performance on a modern machine. If you are lucky it will not slow you down, but if the code can be auto-vectorized (or you are vectorizing manually), you are looking at least at a 2x slowdown.

> 
> An algorithm like Kahan summation must be implemented in a way that takes those optimizations into account.

I.e., do not try to implement this at all with the built-in floating-point types. It's impossible.

> This is exactly like in C++, signed integer overflow is undefined, not because it's undefined on the hardware, but because it allows for optimizations.

If you have to resort to invoking insane C++ precedent in order to defend a point, you have lost the debate. Anyway, it is not at all the same (triggered by overflow vs triggered by default, undefined behavior vs wrong result), and also, in D, signed overflow is actually defined behavior.

> D could easily add specific functions to `core.math` that specify operations as specifically IEEE-754 conforming. Using those, Phobos could give you types that are specified to produce results as specified by IEEE-754, with no interference by the optimizer. 

It does not do that. Anyway, I would expect that to go to std.numeric.

> You can't actually do the reverse, i.e. provide a type in Phobos that allows for optimizations of that sort but the core-language types are guaranteed to be unoptimized.

You say "unoptimized", I hear "not broken".

Anyway, clearly the default should be the variant with less pitfalls. If you really want to add some sort of flexible-precision data types, why not, but there should be a compiler flag to disable it.

> Such a type would have to be compiler-recognized, i.e. it would end up being a built-in type. 

I have no desire at all to suffer from irreproducible behavior because some dependency tried to max out on some irrelevant to me benchmark. I also have no desire at all to suffer from an unnecessary performance penalty just to recover reproducible behavior that is exposed directly by the hardware.

Of course, then there's the issue that libc math functions are not fully precise and have differences between implementations, but at least there seems to be some movement on that front, and this is easy to work around given that the built-in operations are sane.
August 19

On Sunday, 18 August 2024 at 12:57:41 UTC, Timon Gehr wrote:

> [...] Then your knowledge may be rather out of date. I get the x87 shenanigans, but that's just not very relevant anymore. I am not targeting 32-bit x86 with anything nowadays. [...]

I think you got me wrong here on a key aspect: I’m not trying to argue, but to explain what the D spec means and to outline possible rationales for why it is as it is. For example, my x87 knowledge isn’t “outdated”; x87 coprocessors didn’t change. They’re just not practically relevant anymore, but they could have influenced why Walter specified D as he did. I totally agree with you that in the context of modern hardware, D’s float spec makes little sense.

Then I tried to outline a compromise between Walter’s numerously stated opinion and the desires of many D community members, including you.

I’m somewhere between your side and neither side. I know enough about floating-point arithmetic to avoid it as much as I can, and it’s not due to how languages implement it, but how it works in practice. Personally, I wouldn’t even care much if D removed floating-point types entirely. I just see (saw?, years ago) great potential in D and want it to succeed, and I believe having reproducible results would be a big win.

One aspect of immune-to-optimizations types (calling them float32 and float64 for now) would be that if you have both a context in which you want good and fast algebraic results (where using fused multiply-add is welcome) and a context in which reproducibility is required, you could use double in the first context and float64 in the second, while passing -ffast-math. Maybe a pragma on one kind of those functions is better. I don’t know. They’re not mutually exclusive. LDC already has @fastmath, so maybe it can just become official and be part of the D spec?
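
For example, a sketch of that split; @fastmath comes from ldc.attributes and is LDC-specific, so it is guarded with version (LDC), and float64 above remains hypothetical:

    // Fast-math context: reassociation and contraction are welcome here.
    version (LDC)
    {
        import ldc.attributes : fastmath;

        @fastmath
        double dotFast(const double[] a, const double[] b)
        {
            double s = 0.0;
            foreach (i; 0 .. a.length)
                s += a[i] * b[i];
            return s;
        }
    }

    // Reproducible context: plain double today, or a hypothetical float64
    // type that the optimizer is never allowed to touch.
    double dotReproducible(const double[] a, const double[] b)
    {
        double s = 0.0;
        foreach (i; 0 .. a.length)
            s += a[i] * b[i];
        return s;
    }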
