flat assembler
Message board for the users of flat assembler.
Great read about Haswell CPU Microarchitecture
Melissa 11 Jun 2013, 20:23
tthsqe wrote:
Does anybody know if vdivpd/vsqrtpd is still split internally (thus doubling the latency of divpd/sqrtpd) on Haswell? I noticed this when there was not a 2:1 performance ratio for AVX:SSE code for the mandelbox on Sandy/Ivy Bridge. Unfortunately, the mandelbox uses a divide, and this ratio is only 1.3:1.

I asked in the clax newsgroup and got this list: http://users.atw.hu/instlatx64/GenuineIntel00306C3_Haswell_InstLatX64.txt
It seems the situation is the same as for previous generations.
tthsqe 11 Jun 2013, 20:32
Thanks for the info. I was previously doing the divide BEFORE clamping to [0.25,1]. I saw a decent improvement by dividing AFTER clamping to [0.25,1], as the divpd instruction is faster on powers of 2. The improvement is good enough that the rcps+newton solution is no longer better.
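
For readers following along, here is a minimal scalar sketch in C of the two approaches mentioned above. It is not the poster's code. It assumes the divide in question is the mandelbox sphere-fold scale 1/clamp(r2, 0.25, 1), so that a clamped divisor is exactly 0.25 or 1. The function names are hypothetical, and the single-precision divide only stands in for a hardware reciprocal estimate, which would be less accurate.

```c
/* Clamp x to the closed interval [lo, hi]. */
static double clamp_d(double x, double lo, double hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Divide AFTER clamping: the divisor is always in [0.25, 1], and is
 * exactly 0.25 or 1 (a power of 2) whenever the clamp is active.
 * Assumption: r2 is the squared radius used by the sphere fold. */
static double fold_scale_divide(double r2) {
    return 1.0 / clamp_d(r2, 0.25, 1.0);
}

/* Alternative: low-precision reciprocal estimate refined by
 * Newton-Raphson, x <- x * (2 - d * x). Each step roughly doubles
 * the number of correct bits. The float divide below is only a
 * stand-in for a hardware estimate instruction. */
static double recip_newton(double d) {
    double x = (double)(1.0f / (float)d);  /* initial estimate */
    x = x * (2.0 - d * x);                 /* refinement step 1 */
    x = x * (2.0 - d * x);                 /* refinement step 2 */
    x = x * (2.0 - d * x);                 /* refinement step 3 */
    return x;
}

/* Same scale factor, computed without a double-precision divide. */
static double fold_scale_newton(double r2) {
    return recip_newton(clamp_d(r2, 0.25, 1.0));
}
```

Both functions should return the same scale up to rounding. Which one is faster depends on the CPU and on how often the clamp is active, so it is worth timing both on the target machine.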