June 29, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Russel Winder

On 6/29/2014 2:30 PM, Russel Winder via Digitalmars-d wrote:
> If D is a language that uses the underlying hardware representation then it cannot define the use of specific formats for hardware numbers. Thus, on hardware that provides IEEE754 format hardware float and double can map to the 32-bit and 64-bit IEEE754 numbers offered. However if the hardware does not provide IEEE754 hardware then either D must interpret floating point expressions (as per Java) or it cannot be ported to that architecture. cf. IBM 360.

That's correct. The D spec says IEEE 754.

> PS Walter just wrote that the type real is not defined as float and double are, so it does have a Humpty Dumpty factor even if float and double do not.

It's still IEEE, just the longer lengths if they exist on the hardware.

D is not unique in requiring IEEE 754 floats - Java does, too. So does Javascript.

June 29, 2014 Re: std.math performance (SSE vs. real)

On 29 June 2014 23:20, H. S. Teoh via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Sun, Jun 29, 2014 at 08:54:49AM +0100, Iain Buclaw via Digitalmars-d wrote:
>> On 29 Jun 2014 05:48, "H. S. Teoh via Digitalmars-d" < digitalmars-d@puremagic.com> wrote:
>> >
>> > On Sat, Jun 28, 2014 at 08:41:24PM -0700, Andrei Alexandrescu via
>> Digitalmars-d wrote:
>> > > On 6/28/14, 6:02 PM, Tofu Ninja wrote:
>> > [...]
>> > > >I think this thread needs to refocus on the main point, getting math overloads for float and double and how to mitigate any problems that might arise from that.
>> > >
>> > > Yes please. -- Andrei
>> >
>> > Let's see the PR!
>> >
>>
>> I've already raised one (already linked in this thread).
>
> Are you talking about #2274? Interesting that your implementation is basically identical to my own idea for fixing std.math -- using unions instead of pointer casting.
>
Not really. The biggest speed up was from adding float+double overloads for floor, ceil, isNaN and isInfinity. Firstly, the use of a union itself didn't make much of a dent in the speed up. Removing the slow array copy operation did though. Secondly, unions are required for this particular function (floor) because we need to set bits through type-punning, it just wouldn't work casting to a pointer.
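For context, the union-based type punning being described looks roughly like the sketch below. The names are illustrative and this is not the code from the PR, just the general technique of writing one union member and reading the bits back through another:

    import std.stdio;

    // Sketch only: write the float member, read the uint member, and test
    // the bit pattern directly, instead of casting through a pointer.
    bool isNaNViaUnion(float x)
    {
        union Bits { float f; uint i; }
        Bits b;
        b.f = x;
        // NaN: all exponent bits set and a non-zero mantissa.
        return (b.i & 0x7F80_0000) == 0x7F80_0000 && (b.i & 0x007F_FFFF) != 0;
    }

    void main()
    {
        writeln(isNaNViaUnion(float.nan));      // true
        writeln(isNaNViaUnion(1.5f));           // false
        writeln(isNaNViaUnion(float.infinity)); // false - infinity has a zero mantissa
    }
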
Regards
Iain

June 29, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to John Colvin

On 6/29/2014 2:04 PM, John Colvin wrote:
> Assuming there isn't one, then what is the point of having a type with hardware
> dependant precision?
The point is D is a systems programming language, and the D programmer should not be locked out of the hardware capabilities of the system he is running on.
D should not be constrained to be the least common denominator of all current and future processors.
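The hardware dependence is easy to observe from the built-in type properties. A small sketch - the printed values depend on the target D is compiled for, which is exactly the point:

    import std.stdio;

    void main()
    {
        // .mant_dig is the number of significand bits. float and double are
        // pinned down by IEEE 754; real's properties follow the hardware.
        writefln("float : %s mantissa bits, %s bytes", float.mant_dig, float.sizeof);
        writefln("double: %s mantissa bits, %s bytes", double.mant_dig, double.sizeof);
        writefln("real  : %s mantissa bits, %s bytes", real.mant_dig, real.sizeof);
        // Typical x86/x86-64 output: 24, 53 and 64 bits. On a target without
        // extended precision, real commonly reports the same 53 bits as double.
    }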

June 29, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Russel Winder

On 6/29/14, 11:13 AM, Russel Winder via Digitalmars-d wrote:
> On Sun, 2014-06-29 at 07:59 -0700, Andrei Alexandrescu via Digitalmars-d wrote:
> […]
>
>> A friend who works at a hedge fund (after making the rounds to the NYC large financial companies) told me that's a myth. Any nontrivial calculation involving money (interest, fixed income, derivatives, ...) needs floating point. He never needed more than double.
>
> Very definitely so. Fixed point or integer arithmetic for simple "household" finance fair enough, but for "finance house" calculations you generally need 22+ significant denary digits to meet with compliance requirements.

I don't know of US regulations that ask for such. What I do know is that I gave my hedge fund friend a call (today is his name day so it was as good a pretext as any) and mentioned that some people believe fixed point is used in finance. His answer was: BWAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAAAAAAAAAAAAAAAAAAAAAAA!

I asked how they solve accumulating numeric errors and he said it's handled on a case-by-case basis. Most of the time it's pennies for billions of dollars, so nobody cares. Sometimes reconciliations are needed - so-called RECs - that compare and adjust the outputs of different algorithms.

One nice war story he recalled: someone was storing the number of seconds as a double and truncating it to int where needed. An error of at most one second wasn't important in the context. However, sometimes the second was around midnight, so an error of one second became an error of one day, which was significant. The solution was to use rounding instead of truncation.

Andrei
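A hypothetical reconstruction of that war story, with invented numbers just to show the failure mode:

    import std.math : round;
    import std.stdio;

    void main()
    {
        // The true timestamp is exactly midnight (86400 s into the day), but
        // the stored double has drifted a fraction of a second short of it.
        double seconds = 86_399.9999;

        auto truncated = cast(long) seconds;        // 86399 -> still "yesterday"
        auto rounded   = cast(long) round(seconds); // 86400 -> the correct day

        writeln("truncated: ", truncated);
        writeln("rounded:   ", rounded);
        // Truncation turns a sub-second error into a whole-day error at the
        // boundary; rounding keeps the error bounded by half a second.
    }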

June 29, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Russel Winder

On 6/29/2014 2:45 PM, Russel Winder via Digitalmars-d wrote:
> So D float and double will not work on IBM 360 unless interpreted,

That's right. On the other hand, someone could create a "D360" fork of the language that was specifically targeted to the 360. Nothing wrong with that. Why burden the other 99.999999% of D programmers with 360 nutburger problems?

> I guess we just hope that all future hardware is IEEE754 compliant.

I'm not concerned about it. No CPU maker in their right head would do something different.

I've witnessed decades of "portable" C code where the programmer tried to be "portable" in his use of ints and chars, but never tested it on a machine where those sizes are different, and when it finally was tested it turned out to be broken. Meaning that whether the D spec defines 360 portability or not, there's just no way that FP code is going to be portable to the 360 unless someone actually tests it.

1's complement, 10 bit bytes, 18 bit words, and non-IEEE fp are all DEAD. I can pretty much guarantee you that about zero C/C++ programs will actually work without modification on those systems, despite the claims of the C/C++ Standard. I'd also bet you that most C/C++ code will break if ints are 64 bits, and about 99% will break if you try to compile it with a 16 bit C/C++ compiler. 90% will break if you feed it EBCDIC.

June 30, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Walter Bright

On 28 June 2014 15:16, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 6/27/2014 3:50 AM, Manu via Digitalmars-d wrote:
>> Totally agree.
>> Maintaining commitment to deprecated hardware which could be removed from the silicon at any time is a bit of a problem looking forwards. Regardless of the decision about whether overloads are created, at the very least I'd suggest x64 should define real as double, since the x87 is deprecated, and the x64 ABI uses the SSE unit. It makes no sense at all to use real under any general circumstances in x64 builds.
>>
>> And aside from that, if you *think* you need real for precision, the truth is, you probably have bigger problems. Double already has massive precision. I find it's extremely rare to have precision problems even with float under most normal usage circumstances, assuming you are conscious of the relative magnitudes of your terms.
>
> That's a common perception of people who do not use the floating point unit for numerical work, and whose main concern is speed instead of accuracy.
>
> I've done numerical floating point work. Two common cases where such precision matters:
>
> 1. numerical integration
> 2. inverting matrices
>
> It's amazing how quickly precision gets overwhelmed and you get garbage answers. For example, when inverting a matrix with doubles, the results are garbage for larger than 14*14 matrices or so. There are techniques for dealing with this, but they are complex and difficult to implement.

This is what I was alluding to wrt being aware of the relative magnitudes of terms in operations. You're right that it can be a little complex, but it's usually just a case of rearranging the operations a bit, or worst case, a temporary renormalisation from time to time.

> Increasing the precision is the most straightforward way to deal with it.

Is a 14*14 matrix really any more common than a 16*16 matrix though? It just moves the goal post a bit. Numerical integration will always manage to find its way into crazy big or crazy small numbers. It's all about relative magnitude with floats. 'real' is only good for about 4 more significant digits... I've often thought they went a bit overboard on exponent and skimped on mantissa.

Surely most users would reach for a lib in these cases anyway, and it would be written by an expert. Either way, I don't think it's sensible to have a std api defy the arch ABI.

> Note that the 80 bit precision comes from W.F. Kahan, and he's no fool when dealing with these issues.

I never argued this. I'm just saying I can't see how defying the ABI in a std api could be seen as a good idea applied generally to all software.

> Another boring Boeing anecdote: calculators have around 10 digits of precision. A colleague of mine was doing a multi-step calculation, and rounded each step to 2 decimal points. I told him he needed to keep the full 10 digits. He ridiculed me - but his final answer was off by a factor of 2. He could not understand why, and I'd explain, but he could never get how his 2 places past the decimal point did not work.

Rounding down to 2 decimal points is rather different than rounding from 19 to 15 decimal points.

> Do you think engineers like that will ever understand the problems with double precision, or have the remotest idea how to deal with them beyond increasing the precision? I don't.

I think they would use a library.

Either way, those jobs are so rare, I don't see that it's worth defying the arch ABI across the board for it. I think there should be a 'double' overload. The existing real overload would be chosen when people use the real type explicitly. Another advantage of this is that when people are using the double type, the API will produce the same results on all architectures, including the ones that don't have 'real'.

>> I find it's extremely rare to have precision problems even with float under most normal usage circumstances,
>
> Then you aren't doing numerical work, because it happens right away.

My key skillset includes physics, lighting, rendering, animation. These are all highly numerical workloads. While I am comfortable with some acceptable level of precision loss for performance, I possibly have to worry about maintaining numerical precision even more, since I use low-precision types exclusively. I understand the problem very well, probably better than most.

More often than not, the problems are easily mitigated by rearranging operations such that they are performed against terms of similar relative magnitude, or in some instances by temporarily renormalising terms. I agree these aren't skills that most people have, but most people use libraries for complex numerical work... or would, if such a robust library existed. Thing is, *everybody* will use std.math.
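A classic illustration of both points - double precision being overwhelmed by a perfectly ordinary loop, and the loss being mitigated by rearranging the operations rather than by reaching for more bits - is compensated (Kahan) summation. A sketch in double precision, not std.math code:

    import std.stdio;

    // Naive summation: once the running total dwarfs the terms, their
    // contribution is rounded away entirely.
    double naiveSum(const double[] xs)
    {
        double s = 0.0;
        foreach (x; xs) s += x;
        return s;
    }

    // Kahan (compensated) summation: same precision, but the low-order bits
    // lost in each addition are carried forward instead of discarded.
    double kahanSum(const double[] xs)
    {
        double s = 0.0, c = 0.0;
        foreach (x; xs)
        {
            immutable y = x - c;
            immutable t = s + y;
            c = (t - s) - y;   // the part of y that the addition dropped
            s = t;
        }
        return s;
    }

    void main()
    {
        // One huge term followed by a million small ones.
        auto xs = new double[](1_000_001);
        xs[0] = 1e16;
        xs[1 .. $] = 1.0;
        writefln("naive: %.1f", naiveSum(xs)); // 10000000000000000.0 - the small terms vanish
        writefln("kahan: %.1f", kahanSum(xs)); // within an ulp or two of 10000000001000000
    }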

June 30, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Walter Bright

On 28 June 2014 16:16, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 6/27/2014 10:18 PM, Walter Bright wrote:
>>
>> On 6/27/2014 4:10 AM, John Colvin wrote:
>>>
>>> *The number of algorithms that are both numerically stable/correct and
>>> benefit
>>> significantly from > 64bit doubles is very small.
>>
>>
>> To be blunt, baloney. I ran into these problems ALL THE TIME when doing professional numerical work.
>>
>
> Sorry for being so abrupt. FP is important to me - it's not just about performance, it's also about accuracy.
Well, here's the thing then. Consider that 'real' is actually supported on only a single (long deprecated!) architecture.
I think it's reasonable to see that 'real' is not actually an fp type.
It's more like an auxiliary type, which just happens to be supported
via a completely different (legacy) set of registers on x64 (most
arch's don't support it at all).
In x64's case, it has been deprecated for over a decade now, and may be
removed from the hardware at some unknown time. The moment x64
processors decide to stop supporting 32bit code, the x87 will go away,
and those opcodes will likely be emulated or microcoded.
Interacting real<->float/double means register swapping through
memory. It should be treated the same as float<->simd; they are
distinct (on most arch's).
For my money, x87 can only be considered, at best, a coprocessor (a slow one!), which may or may not exist. Software written today (10+ years after the hardware was deprecated) should probably even consider introducing runtime checks to see if the hardware is even present before making use of it.
It's fine to offer a great precise extended precision library, but I don't think it can be _the_ standard math library which is used by everyone in virtually all applications. It's not a defined part of the architecture, it's slow, and it will probably go away in the future.
It's the same situation with SIMD; on x64, the SIMD unit and the FPU are the same unit, but I don't think it's reasonable to design all the API's around that assumption. Most processors separate the SIMD unit from the FPU, and the language decisions reflect that. We can't make the language treat SIMD just like an FPU extensions on account of just one single architecture... although in that case, the argument would be even more compelling since x64 is actually current and active.

June 30, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Manu

On 6/29/2014 8:22 PM, Manu via Digitalmars-d wrote:
> Well, here's the thing then. Consider that 'real' is only actually supported on only a single (long deprecated!) architecture.

It's news to me that x86, x86-64, etc., are deprecated, despite being used to run pretty much all desktops and laptops and even servers. The 80 bit reals are also part of the C ABI for Linux, OSX, and FreeBSD, 32 and 64 bit.

> I think it's reasonable to see that 'real' is not actually an fp type.

I find that a bizarre statement.

> It's more like an auxiliary type, which just happens to be supported via a completely different (legacy) set of registers on x64 (most arch's don't support it at all).

The SIMD registers are also a "completely different set of registers".

> In x64's case, it is deprecated for over a decade now, and may be removed from the hardware at some unknown time. The moment that x64 processors decide to stop supporting 32bit code, the x87 will go away, and those opcodes will likely be emulated or microcoded. Interacting real<->float/double means register swapping through memory. It should be treated the same as float<->simd; they are distinct (on most arch's).

Since they are part of the 64 bit C ABI, that would seem to be in the category of "nevah hoppen".

> It's the same situation with SIMD; on x64, the SIMD unit and the FPU are the same unit, but I don't think it's reasonable to design all the API's around that assumption. Most processors separate the SIMD unit from the FPU, and the language decisions reflect that. We can't make the language treat SIMD just like an FPU extensions on account of just one single architecture... although in that case, the argument would be even more compelling since x64 is actually current and active.

Intel has yet to remove any SIMD instructions.

June 30, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to deadalnix

On 29 June 2014 10:11, deadalnix via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Saturday, 28 June 2014 at 09:07:17 UTC, John Colvin wrote:
>>
>> On Saturday, 28 June 2014 at 06:16:51 UTC, Walter Bright wrote:
>>>
>>> On 6/27/2014 10:18 PM, Walter Bright wrote:
>>>>
>>>> On 6/27/2014 4:10 AM, John Colvin wrote:
>>>>>
>>>>> *The number of algorithms that are both numerically stable/correct and
>>>>> benefit
>>>>> significantly from > 64bit doubles is very small.
>>>>
>>>>
>>>> To be blunt, baloney. I ran into these problems ALL THE TIME when doing professional numerical work.
>>>>
>>>
>>> Sorry for being so abrupt. FP is important to me - it's not just about performance, it's also about accuracy.
>>
>>
>> I still maintain that the need for the precision of 80bit reals is a niche demand. Its a very important niche, but it doesn't justify having its relatively extreme requirements be the default. Someone writing a matrix inversion has only themselves to blame if they don't know plenty of numerical analysis and look very carefully at the specifications of all operations they are using.
>>
>> Paying the cost of moving to/from the fpu, missing out on increasingly large SIMD units, these make everyone pay the price.
>>
>> inclusion of the 'real' type in D was a great idea, but std.math should be overloaded for float/double/real so people have the choice where they stand on the performance/precision front.
>
>
> Would it make sense to have std.math and std.fastmath, or something along these lines?
I've thought this too.
std.math and std.numeric maybe?
To me, 'fastmath' suggests comfort with approximations/estimates or
other techniques in favour of speed, and I don't think the non-'real'
version should presume that.
It's not that we have a 'normal' one and a 'fast' one. What we have is
a 'slow' one, and the other is merely normal; ie, "std.math".
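
The overload idea John Colvin describes in the quoted text amounts to something like the sketch below. The names and bodies are illustrative only, not std.math's actual implementation - the point is simply that the argument type selects the precision, so float and double callers never get routed through the extended-precision path:

    import std.stdio;

    float truncate(float x)
    {
        import core.stdc.math : truncf;
        return truncf(x);
    }

    double truncate(double x)
    {
        import core.stdc.math : trunc;
        return trunc(x);
    }

    real truncate(real x)
    {
        import core.stdc.math : truncl;
        return truncl(x);
    }

    void main()
    {
        // Each call resolves to the overload matching its argument type.
        writeln(truncate(3.7f)); // float path
        writeln(truncate(3.7));  // double path
        writeln(truncate(3.7L)); // real (extended) path
    }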

June 30, 2014 Re: std.math performance (SSE vs. real)

Posted in reply to Walter Bright

On 30 June 2014 14:15, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 6/29/2014 8:22 PM, Manu via Digitalmars-d wrote:
>> Well, here's the thing then. Consider that 'real' is only actually supported on only a single (long deprecated!) architecture.
>
> It's news to me that x86, x86-64, etc., are deprecated, despite being used to run pretty much all desktops and laptops and even servers. The 80 bit reals are also part of the C ABI for Linux, OSX, and FreeBSD, 32 and 64 bit.

x86_64 and x86 are different architectures, and they have very different ABI's. Nobody is manufacturing x86 (exclusive) cpu's. Current x86_64 cpu's maintain a backwards compatibility mode, but that's not a part of the x86-64 spec, and may go away when x86_64 is deemed sufficiently pervasive and x86 sufficiently redundant.

>> I think it's reasonable to see that 'real' is not actually an fp type.
>
> I find that a bizarre statement.

Well, it's not an fp type as implemented by the standard fp architecture of any cpu except x86, which is becoming less relevant with each passing day.

>> It's more like an auxiliary type, which just happens to be supported via a completely different (legacy) set of registers on x64 (most arch's don't support it at all).
>
> The SIMD registers are also a "completely different set of registers".

Correct, so they are deliberately treated separately. I argued for strong separation between simd and float, and you agreed.

>> In x64's case, it is deprecated for over a decade now, and may be removed from the hardware at some unknown time. The moment that x64 processors decide to stop supporting 32bit code, the x87 will go away, and those opcodes will likely be emulated or microcoded. Interacting real<->float/double means register swapping through memory. It should be treated the same as float<->simd; they are distinct (on most arch's).
>
> Since they are part of the 64 bit C ABI, that would seem to be in the category of "nevah hoppen".

Not in windows. You say they are in linux? I don't know.

"Intel started discouraging the use of x87 with the introduction of the P4 in late 2000. AMD deprecated x87 since the K8 in 2003, as x86-64 is defined with SSE2 support; VIA’s C7 has supported SSE2 since 2005. In 64-bit versions of Windows, x87 is deprecated for user-mode, and prohibited entirely in kernel-mode."

How do you distinguish x87 double and xmm double in C? The only way I know to access x87 is with inline asm.

>> It's the same situation with SIMD; on x64, the SIMD unit and the FPU are the same unit, but I don't think it's reasonable to design all the API's around that assumption. Most processors separate the SIMD unit from the FPU, and the language decisions reflect that. We can't make the language treat SIMD just like an FPU extensions on account of just one single architecture... although in that case, the argument would be even more compelling since x64 is actually current and active.
>
> Intel has yet to remove any SIMD instructions.

Huh? I think you misunderstood my point. I'm saying that fpu/simd units are distinct, and they are distanced by the type system in order to respect that separation.