February 10, 2002
"Russell Borogove" <kaleja@estarcion.com> wrote in message news:3C6590B5.1000602@estarcion.com...
> As an extension of item 2, note that in the FPU, they're not one iota faster, but getting thousands of floats into and out of level-1 cache is much faster than doubles or extendeds.
>
> That's the main reason that 3D graphics and high-end audio applications, today, use floats instead of the fatter formats.
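A rough back-of-the-envelope sketch of the cache argument. It assumes a 16 KB L1 data cache, which is an illustrative parameter and not a figure from the thread, and it ignores cache-line and associativity effects. The 10-byte width is the raw x87 extended format. Compilers often pad it to 12 or 16 bytes in memory, which makes extended precision look even worse.

```python
import struct

# Bytes per element. Single and double use the struct module's standard sizes.
# 10 bytes is the unpadded x87 80-bit extended format.
ELEMENT_BYTES = {
    "single": struct.calcsize("<f"),  # 4
    "double": struct.calcsize("<d"),  # 8
    "extended": 10,
}

def values_per_cache(cache_bytes: int, fmt: str) -> int:
    """Number of whole values of the given format that fit in cache_bytes."""
    return cache_bytes // ELEMENT_BYTES[fmt]

# Assumed L1 data cache size, for illustration only.
L1_BYTES = 16 * 1024

for fmt in ELEMENT_BYTES:
    print(f"{fmt:>8}: {values_per_cache(L1_BYTES, fmt)} values")
```

Under this assumption, single precision fits 4096 values, double fits 2048, and unpadded extended fits 1638.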

Ok, I hadn't thought of that.


February 10, 2002
You may have read more recent docs than I have... The last time I checked this thoroughly was on the Pentium 1. (In fact, Intel no longer seems to want to disclose instruction cycle counts: this info is hard to find in the latest P3 specs I've read, and I haven't read up on the P4 at all aside from the SSE material.)

Sean

"Walter" <walter@digitalmars.com> wrote in message news:a446n5$15hm$3@digitaldaemon.com...
> I know that you can reset the internal calculation precision. I did not know
> this affected execution time, I've not seen any hint of that in the Intel CPU documentation, though I could have just missed it.
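For reference, the internal calculation precision lives in bits 8 and 9 (the precision-control, or PC, field) of the x87 FPU control word. The sketch below only does the bit arithmetic on a 16-bit control-word value. Actually loading the result into the FPU is done with the FLDCW instruction or a C runtime wrapper such as MSVC's `_controlfp`. The function and constant names are hypothetical.

```python
PC_SHIFT = 8
PC_MASK = 0b11 << PC_SHIFT  # 0x0300

# Encodings of the precision-control (PC) field.
PC_SINGLE = 0b00    # 24-bit significand
PC_RESERVED = 0b01  # not a valid setting
PC_DOUBLE = 0b10    # 53-bit significand
PC_EXTENDED = 0b11  # 64-bit significand

# Control word after FINIT: exceptions masked, round to nearest, extended precision.
FINIT_DEFAULT_CW = 0x037F

def precision_control(cw: int) -> int:
    """Extract the PC field from a 16-bit x87 control word."""
    return (cw & PC_MASK) >> PC_SHIFT

def with_precision_control(cw: int, pc: int) -> int:
    """Return cw with its PC field replaced; all other bits are kept."""
    if pc not in (PC_SINGLE, PC_DOUBLE, PC_EXTENDED):
        raise ValueError("reserved or invalid precision-control value")
    return (cw & ~PC_MASK & 0xFFFF) | (pc << PC_SHIFT)

print(hex(with_precision_control(FINIT_DEFAULT_CW, PC_SINGLE)))  # 0x7f
```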



February 10, 2002
I suppose the definitive way is to write a benchmark.

"Sean L. Palmer" <spalmer@iname.com> wrote in message news:a44i10$19tn$1@digitaldaemon.com...
> You may have read more recent docs than I have... last I thoroughly checked
> this out was on Pentium 1 (in fact Intel seems to not want to disclose instruction cycle counts anymore... hard to find this info in latest P3 specs I've read, and I haven't read up on P4 at all aside from SSE stuff)
>
> Sean


February 10, 2002
> You may have read more recent docs than I have... last I thoroughly checked
> this out was on Pentium 1 (in fact Intel seems to not want to disclose instruction cycle counts anymore... hard to find this info in latest P3 specs I've read, and I haven't read up on P4 at all aside from SSE stuff)

This is everything I found in the Intel optimization manuals:

FDIV latency in cycles (single, double, extended):
Pentium Pro  : 17, 36, 56
Pentium 2, 3 : 18, 32, 38
Pentium 4    : 23, 38, 43

FSQRT latency in cycles (single, double, extended):
Pentium 4    : 23, 38, 43

BTW, performance-sensitive calculations on x86 should really not be done in
extended precision.
There are no reg-mem floating-point instructions for 80-bit floats, so
everything has to be compiled into (FLD / stack operations / FST) form.
Besides, extended-precision FLD and FST are much slower than their
single/double-precision versions.



February 11, 2002
The same info with additions and corrections:

FDIV latency in cycles (single, double, extended):
Pentium Pro  : 17, 36, 56
Pentium 2, 3 : 18, 32, 38
Pentium 4    : 23, 38, 43
Athlon (K7)  : 16, 20, 24

FSQRT latency in cycles (single, double, extended):
Pentium 4    : 23, 38, 43
Athlon (K7)  : 19, 27, 35

FLD latency in cycles (single, double, extended):
Athlon (K7)  :  2,  2, 10

FSTP latency in cycles (single, double, extended):
Athlon (K7)  :  4,  4,  8
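To put the tables in relative terms, here is a small sketch that divides each latency by the single-precision latency on the same CPU. The numbers are the ones listed in this post; the dictionary layout and the `slowdown` helper are hypothetical. Latency ratios ignore throughput, pipelining, and memory traffic, so they are only a rough indication.

```python
# Latencies in cycles as (single, double, extended), from the tables in this post.
FDIV = {
    "Pentium Pro": (17, 36, 56),
    "Pentium 2, 3": (18, 32, 38),
    "Pentium 4": (23, 38, 43),
    "Athlon (K7)": (16, 20, 24),
}
FSQRT = {"Pentium 4": (23, 38, 43), "Athlon (K7)": (19, 27, 35)}
FLD = {"Athlon (K7)": (2, 2, 10)}
FSTP = {"Athlon (K7)": (4, 4, 8)}

PRECISIONS = ("single", "double", "extended")

def slowdown(table, cpu, precision):
    """Latency of `precision` divided by the single-precision latency on `cpu`."""
    lat = table[cpu]
    return lat[PRECISIONS.index(precision)] / lat[0]

for name, table in (("FDIV", FDIV), ("FSQRT", FSQRT), ("FLD", FLD), ("FSTP", FSTP)):
    for cpu in table:
        print(f"{name:5} {cpu:12} double x{slowdown(table, cpu, 'double'):.2f}  "
              f"extended x{slowdown(table, cpu, 'extended'):.2f}")
```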

I have no info about FLD/FSTP latency on the Pentium Pro through Pentium 4,
only the number of micro-ops for FLD/FSTP:

Number of micro-ops (single, double, extended):
FLD  : 1, 1, 4
FSTP : 2, 2, complex instruction

> btw,
> It is highly not recomended to do any performance sensitive calculations on
> x86 in extended precision.
> There are no reg-mem floating point instructions for 80bit floats,
> everything have to be compiled into ( FLD / stack operations / FST ) form.
> Besides, extended precision FLD & FST are much slower than single/double
> precision.

It should be (FLD / stack operations / FSTP), since there is no FST for
extended precision.
This means there is no way to store a result to memory without popping it
off the FPU stack.
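A toy model of the consequence (a hypothetical class, not a real emulator): because an 80-bit store is only available as FSTP, which pops ST(0), code that still needs the value afterwards typically duplicates it first with `FLD ST(0)` and then stores the copy.

```python
class X87StackModel:
    """Toy model of the eight-register x87 stack; ST(0) is the last list element."""

    DEPTH = 8

    def __init__(self):
        self.stack = []
        self.memory = {}

    def _push(self, value):
        if len(self.stack) == self.DEPTH:
            raise OverflowError("x87 stack overflow")
        self.stack.append(value)

    def fld_mem(self, addr):
        """FLD from memory: push the value stored at addr."""
        self._push(self.memory[addr])

    def fld_st(self, i):
        """FLD ST(i): push a copy of ST(i)."""
        self._push(self.stack[-1 - i])

    def fstp_m80(self, addr):
        """FSTP m80: store ST(0) in extended format, then pop it."""
        self.memory[addr] = self.stack.pop()

fpu = X87StackModel()
fpu.memory["x"] = 1.5
fpu.fld_mem("x")        # ST(0) = x
fpu.fld_st(0)           # duplicate: ST(0) = ST(1) = x
fpu.fstp_m80("result")  # store the copy and pop it
print(fpu.stack, fpu.memory["result"])  # [1.5] 1.5
```

Without the `fld_st(0)` step, the store would leave the modeled stack empty and the value would have to be reloaded from memory.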


