August 06, 2016
On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote:
> No pragmas tied to a specific architecture should be allowed in the language spec, please.

I wholeheartedly agree. However, it's not like FP optimisation pragmas would be specific to any particular architecture. They just describe classes of transformations that are allowed on top of the standard semantics.

For example, whether transforming `a + (b * c)` into a single operation is allowed is not a question of the target architecture at all, but rather whether the implicit rounding after evaluating (b * c) can be skipped or not. While this in turn of course enables the compiler to use FMA instructions on x86/AVX, ARM/NEON, PPC, …, it is not architecture-specific at all on a conceptual level.
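
To make the difference concrete, here is a minimal D sketch using the C runtime's fma, with values chosen so that the rounding of the product actually matters (output may differ if the compiler evaluates intermediates at higher precision, which D permits):

---
import core.stdc.math : fma; // correctly rounded fused multiply-add
import std.stdio : writefln;

void main()
{
    double a = -1.0;
    double b = 1.0 + 0x1p-30; // exactly representable
    double c = 1.0 - 0x1p-30; // b * c == 1 - 2^^-60 exactly

    // Two roundings: b * c rounds up to 1.0, and the residual is lost.
    double separate = a + (b * c); // 0.0

    // One rounding: the exact product takes part in the addition.
    double fused = fma(b, c, a);   // -2^^-60

    writefln("separate: %a", separate); // 0x0p+0
    writefln("fused:    %a", fused);    // -0x1p-60
}
---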

 — David
August 06, 2016
On Saturday, 6 August 2016 at 12:48:26 UTC, Iain Buclaw wrote:
> There are compiler switches for that.  Maybe there should be one pragma to tweak these compiler switches on a per-function basis, rather than separately named pragmas.

This might be a solution for inherently compiler-specific settings (although for LDC we would probably go for "type-safe" UDAs/pragmas instead of parsing faux command-line strings).

Floating point transformation semantics aren't compiler-specific, though. The corresponding options are used commonly enough in certain kinds of code that it doesn't seem prudent to require users to resort to compiler-specific ways of expressing them.
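
To illustrate the "type-safe" part, a purely hypothetical sketch (the fpContract UDA below is invented for illustration and is not part of ldc.attributes):

---
// A typed UDA instead of a faux command-line string: a misspelt name or
// a wrong argument type fails at compile time instead of being parsed.
struct fpContract
{
    bool allow;
}

@fpContract(true)
double dot3(double[3] a, double[3] b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
---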

 — David
August 06, 2016
On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote:
> The LDC fastmath bothers me a lot. It throws away proper NaN and infinity handling, and throws away precision by allowing reciprocal and algebraic transformations.

This is true – and precisely the reason why it is actually defined (ldc.attributes) as

---
alias fastmath = AliasSeq!(llvmAttr("unsafe-fp-math", "true"), llvmFastMathFlag("fast"));
---

This way, users can actually combine different optimisations in a more tasteful manner as appropriate for their particular application.
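
For example, someone who wants reciprocal transformations but intact NaN/infinity handling could apply just that one flag. A sketch (the flag strings follow LLVM's fast-math flags, so the exact set accepted may vary between LDC releases):

---
import ldc.attributes : llvmFastMathFlag;

// "arcp": allow x / y to become x * (1.0 / y), but keep NaN and
// infinity semantics untouched.
@llvmFastMathFlag("arcp")
double normalise(double x, double len)
{
    return x / len;
}
---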

Experience has shown that people – even those intimately familiar with FP semantics – expect a catch-all kitchen-sink switch for all natural optimisations (natural when equating FP values with real numbers). This is why the shorthand exists.

 — David
August 06, 2016
On 8/6/2016 1:06 PM, Ilya Yaroshenko wrote:
> Some applications require exactly the same results across different architectures
> (probably because of business requirements), so this optimization is turned off
> by default in LDC, for example.

Let me rephrase the question - how does fusing them alter the result?
August 06, 2016
On 8/6/2016 2:12 PM, David Nadlinger wrote:
> This is true – and precisely the reason why it is actually defined
> (ldc.attributes) as
>
> ---
> alias fastmath = AliasSeq!(llvmAttr("unsafe-fp-math", "true"),
> llvmFastMathFlag("fast"));
> ---
>
> This way, users can actually combine different optimisations in a more tasteful
> manner as appropriate for their particular application.
>
> Experience has shown that people – even those intimately familiar with FP
> semantics – expect a catch-all kitchen-sink switch for all natural optimisations
> (natural when equating FP values with real numbers). This is why the shorthand
> exists.

I didn't know that, thanks for the explanation. But the same can be done for pragmas, as the second argument isn't just true|false, it's an expression.
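
I.e. something along these lines, presumably (entirely hypothetical syntax; no such pragma exists in the spec today):

---
pragma(fp, AliasSeq!(llvmAttr("unsafe-fp-math", "true"),
                     llvmFastMathFlag("fast")))
double kernel(double a, double b, double c)
{
    return a + b * c;
}
---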
August 06, 2016
On Saturday, 6 August 2016 at 21:56:06 UTC, Walter Bright wrote:
> Let me rephrase the question - how does fusing them alter the result?

There is just one rounding operation instead of two.

Of course, if floating point values are strictly defined as having only a minimum precision, then folding away the rounding after the multiplication is always legal.
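
In other words, with round() denoting rounding to the result type:

---
unfused: round(a + round(b * c))   // two roundings
fused:   round(a + b * c)          // one rounding, never less accurate here
---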

 — David
August 06, 2016
On 8/6/2016 3:14 PM, David Nadlinger wrote:
> On Saturday, 6 August 2016 at 21:56:06 UTC, Walter Bright wrote:
>> Let me rephrase the question - how does fusing them alter the result?
>
> There is just one rounding operation instead of two.

Makes sense.


> Of course, if floating point values are strictly defined as having only a
> minimum precision, then folding away the rounding after the multiplication is
> always legal.

Yup.

So it does make sense that allowing fused operations would be equivalent to having no maximum precision.

August 07, 2016
On Saturday, 6 August 2016 at 21:56:06 UTC, Walter Bright wrote:
> On 8/6/2016 1:06 PM, Ilya Yaroshenko wrote:
>> Some applications require exactly the same results across different architectures
>> (probably because of business requirements), so this optimization is turned off
>> by default in LDC, for example.
>
> Let me rephrase the question - how does fusing them alter the result?

The result becomes more precise, because there is a single rounding instead of two.
August 07, 2016
On Saturday, 6 August 2016 at 22:32:08 UTC, Walter Bright wrote:
> On 8/6/2016 3:14 PM, David Nadlinger wrote:
>> Of course, if floating point values are strictly defined as having only a
>> minimum precision, then folding away the rounding after the multiplication is
>> always legal.
>
> Yup.
>
> So it does make sense that allowing fused operations would be equivalent to having no maximum precision.

Fused operations are mul/div + add/sub only, and fusing does not break compensated subtraction:

---
auto t = a - x + x;
---

So please make fusing a separate pragma.
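
To make the distinction concrete with Kahan summation: contraction leaves the compensation step alone, since there is no multiply to fuse, whereas full fast-math may algebraically fold it away. A sketch:

---
import std.stdio : writeln;

double kahanSum(const double[] xs)
{
    double sum = 0.0, comp = 0.0;
    foreach (x; xs)
    {
        double y = x - comp;
        double t = sum + y;
        comp = (t - sum) - y; // the compensator subtraction; fast-math
                              // may simplify this to 0, contraction cannot
        sum = t;
    }
    return sum;
}

void main()
{
    auto xs = new double[](10_000_000);
    xs[] = 0.1;
    writeln(kahanSum(xs)); // accurate as long as comp is not optimised away
}
---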
August 07, 2016
On 6 August 2016 at 22:12, David Nadlinger via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote:
>>
>> No pragmas tied to a specific architecture should be allowed in the language spec, please.
>
>
> I wholeheartedly agree. However, it's not like FP optimisation pragmas would be specific to any particular architecture. They just describe classes of transformations that are allowed on top of the standard semantics.
>
> For example, whether transforming `a + (b * c)` into a single operation is allowed is not a question of the target architecture at all, but rather whether the implicit rounding after evaluating (b * c) can be skipped or not. While this in turn of course enables the compiler to use FMA instructions on x86/AVX, ARM/NEON, PPC, …, it is not architecture-specific at all on a conceptual level.
>

Well, you get fusedMath for free when turning on -mfma or -mfused-madd - whatever is most relevant for the target.

Try adding -mfma here.  http://goo.gl/xsvDXM