flat assembler
Message board for the users of flat assembler.

Great read about Haswell CPU Microarchitecture

hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode 09 Jun 2013, 12:04
randall wrote:
...as expected... Speed improvement in current programs (not compiled for the new architecture) is from 5% to 20%.

OK, "as expected", you say, for normal applications. I read the docs again, and the only parameter tied to this expectation seems to be
the ROB (reorder buffer) expansion to 192 entries, about +15% over Sandy Bridge.
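For reference, the percentage follows from the entry counts, assuming Sandy Bridge's commonly cited 168-entry reorder buffer (a figure not stated in this thread):

```latex
\frac{192}{168} = \frac{8}{7} \approx 1.143 \quad\Rightarrow\quad \text{about } +14\%
```

So "+15%" is a slight round-up of roughly 14.3%.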

Supposing the tests there are correct, we should admit that something is wrong with the uop cache, either in its design or
in how the test software exercises it.

It should be 2 loads + 1 store and 5 instructions per cycle!
Otherwise the 2 additional ports end up practically unused.

In practice, when using vectors and FMA (where there are practically no instruction-cache misses), everything seems to run fine, as announced by Intel.
Cheers,

_________________
⠓⠕⠏⠉⠕⠙⠑
Melissa



Joined: 12 Apr 2012
Posts: 125
Melissa 11 Jun 2013, 20:23
tthsqe wrote:
Does anybody know if vdivpd/vsqrtpd is still split internally on Haswell (thus having double the latency of divpd/sqrtpd)? I noticed this when AVX:SSE code for the mandelbox did not reach a 2:1 performance ratio on Sandy/Ivy Bridge. Unfortunately, the mandelbox uses a divide, and this ratio is only 1.3:1.


I asked in the clax newsgroup and got this list:
http://users.atw.hu/instlatx64/GenuineIntel00306C3_Haswell_InstLatX64.txt

It seems the situation is the same as for previous generations.
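A rough Amdahl's-law model shows how a split divider can hold the AVX:SSE ratio at 1.3:1. Assume (as a simplification, not a measurement) that AVX doubles throughput for everything except the divides, and that a 256-bit vdivpd, being split into two 128-bit halves, gains nothing over two divpd. With f the fraction of SSE runtime spent in divides and S the AVX:SSE speedup:

```latex
S = \frac{T_{\mathrm{SSE}}}{T_{\mathrm{AVX}}}
  = \frac{1}{\frac{1-f}{2} + f}
  = \frac{2}{1+f},
\qquad
S = 1.3 \;\Rightarrow\; f = \frac{2}{1.3} - 1 \approx 0.54
```

Under these assumptions, the observed 1.3:1 corresponds to roughly half of the SSE runtime going to divides.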
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 11 Jun 2013, 20:32
Thanks for the info. I was previously doing the divide BEFORE clamping to [0.25,1]. I saw a decent improvement by dividing AFTER clamping to [0.25,1], as the divpd instruction is faster on powers of 2 (every input outside the range becomes exactly 0.25 or 1, both powers of 2). The improvement is large enough that the rcpps+Newton solution is not better.

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.