D floating point maths (page 3)

"Russell Borogove" <kaleja@estarcion.com> wrote in message news:3C6590B5.1000602@estarcion.com... > As an extension of item 2, note that in the FPU, they're not one iota faster, but getting thousands of floats into and out of level-1 cache is much faster than doubles or extendeds. > > That's the main reason that 3D graphics and high-end audio applications, today, use floats instead of the fatter formats. Ok, I hadn't thought of that.

You may have read more recent docs than I have... last I thoroughly checked this out was on Pentium 1 (in fact Intel seems to not want to disclose instruction cycle counts anymore... hard to find this info in latest P3 specs I've read, and I haven't read up on P4 at all aside from SSE stuff)

Sean

"Walter" <walter@digitalmars.com> wrote in message news:a446n5$15hm$3@digitaldaemon.com...
> I know that you can reset the internal calculation precision. I did not know this affected execution time, I've not seen any hint of that in the Intel CPU documentation, though I could have just missed it.

I suppose the definitive way is to write a benchmark.

"Sean L. Palmer" <spalmer@iname.com> wrote in message news:a44i10$19tn$1@digitaldaemon.com...
> You may have read more recent docs than I have... last I thoroughly checked this out was on Pentium 1 (in fact Intel seems to not want to disclose instruction cycle counts anymore... hard to find this info in latest P3 specs I've read, and I haven't read up on P4 at all aside from SSE stuff)
>
> Sean
>
> "Walter" <walter@digitalmars.com> wrote in message news:a446n5$15hm$3@digitaldaemon.com...
> > I know that you can reset the internal calculation precision. I did not know this affected execution time, I've not seen any hint of that in the Intel CPU documentation, though I could have just missed it.
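A benchmark along those lines could start from something like the C sketch below. It is only a sketch under stated assumptions: the function names, the starting value and the divisor are made up for illustration, `clock()` has coarse resolution, and many compilers (x86-64 targets in particular) use SSE rather than the x87 FPU for `float` and `double`, so the numbers may not reflect FDIV at all. Each division depends on the previous result, so the loop is bound by latency rather than throughput.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Generates bench_div_<name>(n, out): runs n dependent divisions in the
   given type, stores the final value in *out and returns elapsed CPU
   seconds. The divisor is volatile so the compiler cannot fold the loop
   or replace the division with a multiplication. */
#define DEFINE_DIV_BENCH(name, type)                          \
    double bench_div_##name(long n, type *out)                \
    {                                                         \
        volatile type divisor = (type)1.000001;               \
        type x = (type)1.0e30;                                \
        clock_t start, end;                                   \
        long i;                                               \
        assert(n > 0 && out != NULL);                         \
        start = clock();                                      \
        for (i = 0; i < n; i++)                               \
            x = x / divisor;                                  \
        end = clock();                                        \
        *out = x;                                             \
        return (double)(end - start) / CLOCKS_PER_SEC;        \
    }

DEFINE_DIV_BENCH(float, float)
DEFINE_DIV_BENCH(double, double)
DEFINE_DIV_BENCH(extended, long double)
```

Calling each function with the same `n` (a few million, say) and comparing the returned times gives a first impression; writing the result through `out` keeps the compiler from discarding the loop.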

> You may have read more recent docs than I have... last I thoroughly checked this out was on Pentium 1 (in fact Intel seems to not want to disclose instruction cycle counts anymore... hard to find this info in latest P3 specs I've read, and I haven't read up on P4 at all aside from SSE stuff)

This is all I found in the Intel optimization manuals:

FDIV: Latency (single, double, extended) cycles:
Pentium Pro : 17, 36, 56
Pentium 2,3 : 18, 32, 38
Pentium 4   : 23, 38, 43

FSQRT: Latency (single, double, extended) cycles:
Pentium 4   : 23, 38, 43

btw,
It is highly recommended not to do any performance-sensitive calculations on x86 in extended precision.
There are no reg-mem floating point instructions for 80-bit floats, everything has to be compiled into ( FLD / stack operations / FST ) form.
Besides, extended precision FLD & FST are much slower than single/double precision.

February 11, 2002

Re: D floating point maths

Posted by Serge K
in reply to Serge K

The same info with additions and corrections:

FDIV: Latency (single, double, extended) cycles:
Pentium Pro : 17, 36, 56
Pentium 2,3 : 18, 32, 38
Pentium 4   : 23, 38, 43
Athlon (K7) : 16, 20, 24

FSQRT: Latency (single, double, extended) cycles:
Pentium 4   : 23, 38, 43
Athlon (K7) : 19, 27, 35

FLD: Latency (single, double, extended) cycles:
Athlon (K7) :  2,  2, 10

FSTP: Latency (single, double, extended) cycles:
Athlon (K7) :  4,  4,  8

I have no latency info for FLD/FSTP on Pentium Pro through Pentium 4,
only the number of micro-ops for FLD/FSTP:

Number of micro-ops (single, double, extended):
FLD  : 1, 1, 4
FSTP : 2, 2, complex instruction
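To make the FDIV figures easier to compare, the C sketch below computes how much slower an extended-precision divide is than a single-precision one on each CPU. The cycle counts are copied from the FDIV table in this post; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* FDIV latencies in cycles, copied from the FDIV table in this post. */
struct fdiv_latency {
    const char *cpu;
    int single_prec;
    int double_prec;
    int extended_prec;
};

const struct fdiv_latency fdiv_table[] = {
    { "Pentium Pro", 17, 36, 56 },
    { "Pentium 2,3", 18, 32, 38 },
    { "Pentium 4",   23, 38, 43 },
    { "Athlon (K7)", 16, 20, 24 },
};
const size_t fdiv_table_len = sizeof fdiv_table / sizeof fdiv_table[0];

/* Latency of an extended-precision FDIV relative to a single-precision one. */
double extended_vs_single(const struct fdiv_latency *row)
{
    assert(row != NULL && row->single_prec > 0);
    return (double)row->extended_prec / row->single_prec;
}
```

By these numbers the penalty is about 3.3x on Pentium Pro (56/17) but only 1.5x on Athlon (24/16), so how much extended precision costs in division latency depends heavily on the CPU.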

> btw,
> It is highly recommended not to do any performance-sensitive calculations
> on x86 in extended precision.
> There are no reg-mem floating point instructions for 80-bit floats,
> everything has to be compiled into ( FLD / stack operations / FST ) form.
> Besides, extended precision FLD & FST are much slower than single/double
> precision.

It should be: ( FLD / stack operations / FSTP ),
since there is no FST for extended precision.
This means there is no way to store a result to memory in extended
precision without popping it off the FPU stack.
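The consequence can be illustrated with a toy C model of the register stack. This is not real FPU code: the struct and function names are hypothetical, and the real x87 stack is an eight-register ring addressed through a TOP pointer rather than a simple array. It only shows that keeping an 80-bit value on the stack after storing it takes an extra instruction, for example FLD ST(0) to duplicate the top before the FSTP.

```c
#include <assert.h>

/* Toy model of the x87 register stack: depth counts occupied slots and
   st[depth - 1] plays the role of ST(0). Illustration only. */
struct fpu_stack {
    long double st[8];
    int depth;
};

/* FLD m80: push a value loaded from memory. */
void fld_m80(struct fpu_stack *fpu, const long double *src)
{
    assert(fpu->depth < 8);
    fpu->st[fpu->depth++] = *src;
}

/* FLD ST(0): push a copy of the current top of stack. */
void fld_st0(struct fpu_stack *fpu)
{
    assert(fpu->depth > 0 && fpu->depth < 8);
    fpu->st[fpu->depth] = fpu->st[fpu->depth - 1];
    fpu->depth++;
}

/* FSTP m80: store the top of stack to memory and pop it. */
void fstp_m80(struct fpu_stack *fpu, long double *dst)
{
    assert(fpu->depth > 0);
    *dst = fpu->st[--fpu->depth];
}

/* Store ST(0) but keep it on the stack: duplicate, then store-and-pop. */
void store_and_keep(struct fpu_stack *fpu, long double *dst)
{
    fld_st0(fpu);
    fstp_m80(fpu, dst);
}
```

With single or double precision a plain FST would do the same job in one instruction; in extended precision the duplicate (or a reload with FLD after the FSTP) is an additional cost on top of the slower load and store.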
