August 06, 2016
On 6 August 2016 at 12:07, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote:
>>
>> On 6 August 2016 at 11:48, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>
>>> On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote:
>>>>
>>>> [...]
>>>
>>>
>>>
>>> OK, then we need a third pragma, `pragma(ieeeRound)`. But `pragma(fusedMath)`
>>> and `pragma(fastMath)` should be provided too.
>>>
>>>> [...]
>>>
>>>
>>>
>>> It allows a compiler to replace two arithmetic operations with a single composed one; see the AVX2 (FMA3 for Intel and FMA4 for AMD) instruction sets.
>>
>>
>> No pragmas tied to a specific architecture should be allowed in the language spec, please.
>
>
> Then Mir will probably drop all compilers but LDC.
> LLVM is tied to the real world, so we can tie D to the real world too. If a
> compiler cannot implement an optimization pragma, then that pragma can just be
> ignored by the compiler.

If you need a function to work with an exclusive instruction set, or with something as specific as composed/fused instructions, then it is common to use an indirect function resolver to choose the most relevant implementation for the system that's running the code (a la @ifunc), and then, for the targeted fusedMath implementation, do it yourself.
August 06, 2016
On Saturday, 6 August 2016 at 11:10:18 UTC, Iain Buclaw wrote:
> On 6 August 2016 at 12:07, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote:
>>>
>>> On 6 August 2016 at 11:48, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>>
>>>> On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote:
>>>>>
>>>>> [...]
>>>>
>>>>
>>>>
>>>> OK, then we need a third pragma, `pragma(ieeeRound)`. But `pragma(fusedMath)`
>>>> and `pragma(fastMath)` should be provided too.
>>>>
>>>>> [...]
>>>>
>>>>
>>>>
>>>> It allows a compiler to replace two arithmetic operations with a single composed one; see the AVX2 (FMA3 for Intel and FMA4 for AMD) instruction sets.
>>>
>>>
>>> No pragmas tied to a specific architecture should be allowed in the language spec, please.
>>
>>
>> Then Mir will probably drop all compilers but LDC.
>> LLVM is tied to the real world, so we can tie D to the real world too. If a
>> compiler cannot implement an optimization pragma, then that pragma can just be
>> ignored by the compiler.
>
> If you need a function to work with an exclusive instruction set, or with something as specific as composed/fused instructions, then it is common to use an indirect function resolver to choose the most relevant implementation for the system that's running the code (a la @ifunc), and then, for the targeted fusedMath implementation, do it yourself.

What do you mean by "do it yourself"? Write code using GCC's FMA intrinsics? Why should I need to do something that can be automated by a compiler? The modern approach is to give the compiler a hint instead of writing specialised code for different architectures.

It seems you have misunderstood me. I don't want to force the compiler to use explicit instruction sets. Instead, I want to give the compiler a hint about which math _transformations_ are allowed. These hints are architecture-independent; a compiler may or may not use them to optimise code.
August 06, 2016
Am Sat, 6 Aug 2016 02:29:50 -0700
schrieb Walter Bright <newshound2@digitalmars.com>:

> On 8/6/2016 1:21 AM, Ilya Yaroshenko wrote:
> > On Friday, 5 August 2016 at 20:53:42 UTC, Walter Bright wrote:
> > 
> >> I agree that the typical summation algorithm suffers from double rounding. But that's one algorithm. I would appreciate if you would review http://dlang.org/phobos/std_algorithm_iteration.html#sum to ensure it doesn't have this problem, and if it does, how we can fix it.
> >
> > Phobos's sum comprises two different algorithms: pairwise summation for random access ranges and Kahan summation for input ranges. Pairwise summation does not require IEEE rounding, but Kahan summation does.
> >
> > The problem with a real-world example is that it depends on optimisation. For example, if all temporary values are rounded, this is not a problem, and if no temporary values are rounded, this is not a problem either. However, if some of them are rounded and others are not, then this will break the Kahan algorithm.
> >
> > Kahan is the shortest and one of the slowest (compared with KBN, for example) summation algorithms. The truth about Kahan is that while we may have it in Phobos, we can use pairwise summation for input ranges without random access, and it will be faster than Kahan. So we don't need Kahan for the current API at all.
> >
> > Mir has both Kahan, which works with 32-bit DMD, and pairwise, which works with input ranges.
> >
> > Kahan, KBN, KB2, and Precise summation always use `real` or `Complex!real` internal values for the 32-bit x86 target. The only problem with Precise summation is that if we need a precise result in double and use real for internal summation, then the last bit will be wrong in 50% of cases.
> >
> > Another good point about Mir's summation algorithms is that they are output ranges. This means they can be used effectively to sum multidimensional arrays, for example. Also, the Precise summator may be used to compute the exact sum of distributed data.
> >
> > When we reach a decision and a solution for the rounding problem, I will
> > make a PR for std.experimental.numeric.sum.
> > 
> >> I hear you. I'd like to explore ways of solving it. Got any ideas?
> >
> > We need to look at the overall picture.
> >
> > It is very important to recognise that the D core team is small and the D community is not yet large enough to attract many new professionals. This means that the time of existing engineers is very valuable for D, and the most important engineer for D is you, Walter.
> >
> > At the same time, we need to move forward quickly with language changes and druntime changes (GC-less fibers, for example).
> >
> > So we need to choose our development options carefully. The most important option for D in the science context is to separate the D programming language from DMD in our minds. I am not asking to remove DMD as the reference compiler. Instead, we can introduce changes to D that cannot be optimally implemented in DMD (because you have many more important things to do for D than optimisation) but will be awesome for our LLVM-based and GCC-based backends.
> >
> > We need 2 new pragmas with the same syntax as `pragma(inline, xxx)`:
> >
> > 1. `pragma(fusedMath)` allows fused mul-add, mul-sub, div-add, and
> >    div-sub operations.
> > 2. `pragma(fastMath)` is equivalent to [1]. This pragma can also be used
> >    to allow extended precision.
> >
> > These should be 2 separate pragmas. The second one may assume the first one.
> >
> > A recent LDC beta has a @fastmath attribute for functions, and it is already used in the Phobos ndslice.algorithm PR and in its Mir mirror. Attributes are an alternative to pragmas, but their syntax would need to be extended; see [2].
> >
> > The old approach is separate compilation, but it is awkward and low-level for users, and it requires significant effort for both small and large projects.
> >
> > [1] http://llvm.org/docs/LangRef.html#fast-math-flags
> > [2] https://github.com/ldc-developers/ldc/issues/1669
> 
> Thanks for your help with this.
> 
> Using attributes for this is a mistake. Attributes affect the interface to a function

This is not true for UDAs. LDC and GDC actually implement @attribute as a UDA. And UDAs used in serialization interfaces, the std.benchmark proposals, ... do not affect the interface either.

> not its internal implementation.

It's possible to reflect on the UDAs of the current function, so this is not true in general:
-----------------------------
@(40) int foo()
{
    mixin("alias thisFunc = " ~ __FUNCTION__ ~ ";");
    return __traits(getAttributes, thisFunc)[0];
}
-----------------------------
https://dpaste.dzfl.pl/aa0615b40adf

I think this restriction is also quite arbitrary. For end users, attributes provide a much nicer syntax than pragmas. Both GDC and LDC already successfully use UDAs for function-specific backend options, so DMD is really the exception here.

Additionally, even according to your own rules, pragma(mangle) should actually be @mangle.
August 06, 2016
On 6 August 2016 at 13:30, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Saturday, 6 August 2016 at 11:10:18 UTC, Iain Buclaw wrote:
>>
>> On 6 August 2016 at 12:07, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>
>>> On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote:
>>>>
>>>>
>>>> On 6 August 2016 at 11:48, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>>>
>>>>>
>>>>> On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote:
>>>>>>
>>>>>>
>>>>>> [...]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> OK, then we need a third pragma, `pragma(ieeeRound)`. But `pragma(fusedMath)`
>>>>> and `pragma(fastMath)` should be provided too.
>>>>>
>>>>>> [...]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> It allows a compiler to replace two arithmetic operations with a single composed one; see the AVX2 (FMA3 for Intel and FMA4 for AMD) instruction sets.
>>>>
>>>>
>>>>
>>>> No pragmas tied to a specific architecture should be allowed in the language spec, please.
>>>
>>>
>>>
>>> Then Mir will probably drop all compilers but LDC.
>>> LLVM is tied to the real world, so we can tie D to the real world too. If a
>>> compiler cannot implement an optimization pragma, then that pragma can just be
>>> ignored by the compiler.
>>
>>
>> If you need a function to work with an exclusive instruction set, or with something as specific as composed/fused instructions, then it is common to use an indirect function resolver to choose the most relevant implementation for the system that's running the code (a la @ifunc), and then, for the targeted fusedMath implementation, do it yourself.
>
>
> What do you mean by "do it yourself"? Write code using GCC's FMA intrinsics? Why should I need to do something that can be automated by a compiler? The modern approach is to give the compiler a hint instead of writing specialised code for different architectures.
>
> It seems you have misunderstood me. I don't want to force the compiler to use explicit instruction sets. Instead, I want to give the compiler a hint about which math _transformations_ are allowed. These hints are architecture-independent; a compiler may or may not use them to optimise code.

There are compiler switches for that.  Maybe there should be one pragma to tweak these compiler switches on a per-function basis, rather than separately named pragmas.  That way you tell the compiler what you want, rather than it being part of the language logic to understand what must be turned on/off internally.

First, assume the language knows nothing about what platform it's running on, then use that as a basis for suggesting new pragmas that should be supported everywhere.
August 06, 2016
On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote:
> On 6 August 2016 at 11:48, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote:
>>>
>
> No pragmas tied to a specific architecture should be allowed in the language spec, please.

Hmmm, that's the whole point of pragmas (at least in C): to specify implementation-specific stuff outside of the language spec. If it's in the language spec, it should be done with language-specific mechanisms.
August 06, 2016
On 6 August 2016 at 16:11, Patrick Schluter via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote:
>>
>> On 6 August 2016 at 11:48, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>
>>> On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote:
>>>>
>>>>
>>
>> No pragmas tied to a specific architecture should be allowed in the language spec, please.
>
>
> Hmmm, that's the whole point of pragmas (at least in C): to specify implementation-specific stuff outside of the language spec. If it's in the language spec, it should be done with language-specific mechanisms.

https://dlang.org/spec/pragma.html#predefined-pragmas

"""
All implementations must support these, even if by just ignoring them.
...
Vendor specific pragma Identifiers can be defined if they are prefixed
by the vendor's trademarked name, in a similar manner to version
identifiers.
"""

So any added pragma that has no vendor prefix must be treated as part of the language in order to conform to the spec.
August 06, 2016
On 8/6/2016 5:09 AM, Johannes Pfau wrote:
> I think this restriction is also quite arbitrary.

You're right that there are gray areas, but the distinction is not arbitrary.

For example, mangling does not affect the interface. It affects the name.

Using an attribute has more downsides, as it affects the whole function rather than just part of it, like a pragma would.

August 06, 2016
On 8/6/2016 2:48 AM, Ilya Yaroshenko wrote:
>> I don't know what the point of fusedMath is.
> It allows a compiler to replace two arithmetic operations with a single composed
> one; see the AVX2 (FMA3 for Intel and FMA4 for AMD) instruction sets.

I understand that, I just don't understand why that wouldn't be done anyway.
August 06, 2016
On 8/6/2016 3:02 AM, Iain Buclaw via Digitalmars-d wrote:
> No pragmas tied to a specific architecture should be allowed in the
> language spec, please.


A good point. On the other hand, a list of them would be nice so implementations don't step on each other.

August 06, 2016
On Saturday, 6 August 2016 at 19:51:11 UTC, Walter Bright wrote:
> On 8/6/2016 2:48 AM, Ilya Yaroshenko wrote:
>>> I don't know what the point of fusedMath is.
>> It allows a compiler to replace two arithmetic operations with a single composed
>> one; see the AVX2 (FMA3 for Intel and FMA4 for AMD) instruction sets.
>
> I understand that, I just don't understand why that wouldn't be done anyway.

Some applications require exactly the same results across different architectures (often because of business requirements). So this optimization is turned off by default in LDC, for example.