February 10, 2002 Re: D floating point maths

Posted in reply to Russell Borogove

"Russell Borogove" <kaleja@estarcion.com> wrote in message news:3C6590B5.1000602@estarcion.com...
> As an extension of item 2, note that in the FPU, they're not one iota faster, but getting thousands of floats into and out of level-1 cache is much faster than doubles or extendeds.
>
> That's the main reason that 3D graphics and high-end audio applications, today, use floats instead of the fatter formats.

Ok, I hadn't thought of that.
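The cache argument above comes down to element size. A minimal sketch in C follows; the helper names are hypothetical, and the 16 KB cache size mentioned in the usage note is only an assumption for illustration. Note that `sizeof(long double)` varies by compiler and platform, and is commonly 12 or 16 bytes on x86 because the 80-bit format is padded.

```c
#include <assert.h>
#include <stddef.h>

/* How many elements of elem_size bytes fit in a cache of cache_bytes. */
size_t elems_in_cache(size_t cache_bytes, size_t elem_size)
{
    assert(elem_size > 0);
    return cache_bytes / elem_size;
}

/* Bytes occupied by `count` values in each floating-point format. */
size_t float_bytes(size_t count)    { return count * sizeof(float); }
size_t double_bytes(size_t count)   { return count * sizeof(double); }
size_t extended_bytes(size_t count) { return count * sizeof(long double); }
```

Calling `elems_in_cache(16 * 1024, sizeof(float))` and the same with `sizeof(double)` shows the point: with 4-byte floats and 8-byte doubles, the same cache holds twice as many floats, so a loop over a large array touches half as many cache lines.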
February 10, 2002 Re: D floating point maths

Posted in reply to Walter

You may have read more recent docs than I have... last I thoroughly checked this out was on Pentium 1 (in fact Intel seems to not want to disclose instruction cycle counts anymore... hard to find this info in latest P3 specs I've read, and I haven't read up on P4 at all aside from SSE stuff)

Sean

"Walter" <walter@digitalmars.com> wrote in message news:a446n5$15hm$3@digitaldaemon.com...
> I know that you can reset the internal calculation precision. I did not know this affected execution time, I've not seen any hint of that in the Intel CPU documentation, though I could have just missed it.
February 10, 2002 Re: D floating point maths

Posted in reply to Sean L. Palmer

I suppose the definitive way is to write a benchmark.

"Sean L. Palmer" <spalmer@iname.com> wrote in message news:a44i10$19tn$1@digitaldaemon.com...
> You may have read more recent docs than I have... last I thoroughly checked this out was on Pentium 1 (in fact Intel seems to not want to disclose instruction cycle counts anymore... hard to find this info in latest P3 specs I've read, and I haven't read up on P4 at all aside from SSE stuff)
>
> Sean
>
> "Walter" <walter@digitalmars.com> wrote in message news:a446n5$15hm$3@digitaldaemon.com...
> > I know that you can reset the internal calculation precision. I did not know this affected execution time, I've not seen any hint of that in the Intel CPU documentation, though I could have just missed it.
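A benchmark along those lines could look like the rough sketch below, written in C. The function names and the macro are hypothetical, and the numbers it produces depend heavily on the compiler and its flags: many modern x86-64 compilers use SSE for `float` and `double` and x87 only for `long double`, while an x87-only build may evaluate all three at the FPU's current internal precision. Treat it as a starting point, not a definitive measurement.

```c
#include <assert.h>
#include <time.h>

/*
 * Defines NAME(iters, result): runs `iters` dependent divisions in type T
 * and returns the elapsed CPU time in seconds. Each division consumes the
 * previous quotient, so the loop measures latency rather than throughput.
 * The divisor is volatile so the compiler cannot fold the loop away, and
 * the final value is written to *result so the work is not discarded.
 */
#define DEFINE_DIV_BENCH(NAME, T)                         \
    double NAME(long iters, T *result)                    \
    {                                                     \
        volatile T divisor = (T)1.000001;                 \
        T x = (T)1.0e30;                                  \
        clock_t start, stop;                              \
        long i;                                           \
        assert(iters >= 0 && result != 0);                \
        start = clock();                                  \
        for (i = 0; i < iters; ++i)                       \
            x = x / divisor;                              \
        stop = clock();                                   \
        *result = x;                                      \
        return (double)(stop - start) / CLOCKS_PER_SEC;   \
    }

DEFINE_DIV_BENCH(bench_div_float, float)
DEFINE_DIV_BENCH(bench_div_double, double)
DEFINE_DIV_BENCH(bench_div_extended, long double)
```

Calling each function with the same iteration count (say, ten million) and comparing the returned times gives a first impression of whether division cost differs by precision on a given machine. The iteration count should stay small enough that `x` does not shrink into the denormal range, which would distort the timing.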
February 10, 2002 Re: D floating point maths

Posted in reply to Sean L. Palmer

> You may have read more recent docs than I have... last I thoroughly checked this out was on Pentium 1 (in fact Intel seems to not want to disclose instruction cycle counts anymore... hard to find this info in latest P3 specs I've read, and I haven't read up on P4 at all aside from SSE stuff)

That's all I found in the Intel optimization manuals:

FDIV latency in cycles (single, double, extended):
  Pentium Pro  : 17, 36, 56
  Pentium 2, 3 : 18, 32, 38
  Pentium 4    : 23, 38, 43

FSQRT latency in cycles (single, double, extended):
  Pentium 4    : 23, 38, 43

btw, it is strongly recommended not to do any performance-sensitive calculations on x86 in extended precision. There are no reg-mem floating point instructions for 80-bit floats; everything has to be compiled into ( FLD / stack operations / FST ) form. Besides, extended precision FLD & FST are much slower than single/double precision.
February 11, 2002 Re: D floating point maths

Posted in reply to Serge K

The same info with additions and corrections:

FDIV latency in cycles (single, double, extended):
  Pentium Pro  : 17, 36, 56
  Pentium 2, 3 : 18, 32, 38
  Pentium 4    : 23, 38, 43
  Athlon (K7)  : 16, 20, 24

FSQRT latency in cycles (single, double, extended):
  Pentium 4    : 23, 38, 43
  Athlon (K7)  : 19, 27, 35

FLD latency in cycles (single, double, extended):
  Athlon (K7)  : 2, 2, 10

FSTP latency in cycles (single, double, extended):
  Athlon (K7)  : 4, 4, 8

I have no info about FLD/FSTP latency on Pentium Pro through Pentium 4, only the number of micro-ops.

FLD/FSTP number of micro-ops (single, double, extended):
  FLD  : 1, 1, 4
  FSTP : 2, 2, complex instruction

> btw, it is strongly recommended not to do any performance-sensitive calculations on x86 in extended precision.
> There are no reg-mem floating point instructions for 80-bit floats; everything has to be compiled into ( FLD / stack operations / FST ) form.
> Besides, extended precision FLD & FST are much slower than single/double precision.

It should be ( FLD / stack operations / FSTP ), since there is no FST for extended precision. That means there is no way to store a result in memory without popping it off the FPU stack.
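To put the posted figures in perspective, here is a small worked calculation in C. The cycle counts are copied from the tables in this thread; the struct and function names are hypothetical. Simply adding the FLD and FSTP latencies is a simplifying assumption, since real hardware may overlap them with other work.

```c
#include <assert.h>

/* Cycle counts for one instruction at single, double and extended precision. */
struct latency { int single_p, double_p, extended_p; };

/* Figures as posted in the thread. */
static const struct latency fdiv_ppro   = {17, 36, 56};
static const struct latency fdiv_p4     = {23, 38, 43};
static const struct latency fdiv_athlon = {16, 20, 24};
static const struct latency fld_athlon  = { 2,  2, 10};
static const struct latency fstp_athlon = { 4,  4,  8};

/* How many times slower the extended form is than the single form. */
double extended_over_single(struct latency l)
{
    assert(l.single_p > 0);
    return (double)l.extended_p / l.single_p;
}

/* Cost of one load plus one store at each precision, latencies simply added. */
struct latency load_store_cost(struct latency fld, struct latency fstp)
{
    struct latency sum;
    sum.single_p   = fld.single_p   + fstp.single_p;
    sum.double_p   = fld.double_p   + fstp.double_p;
    sum.extended_p = fld.extended_p + fstp.extended_p;
    return sum;
}
```

With these numbers, `load_store_cost(fld_athlon, fstp_athlon)` gives 6 cycles for single or double and 18 for extended on the Athlon, and `extended_over_single(fdiv_athlon)` is 1.5, against roughly 3.3 for `fdiv_ppro`. So the penalty for extended precision varies a lot between the processors listed.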
Copyright © 1999-2021 by the D Language Foundation