Intel releases Haswell's new AVX2 instructions
See the PDF: http://software.intel.com/file/36945
PDEP / PEXT look interesting. It looks like a lot of silicon will be needed to support just these two. I wonder where they are useful?
|13 Jun 2011, 08:16||
Very interesting instructions, I'd love to read the rationale for their implementation.
PEXT is a move-mask operation with a bit selector; I can kind of see how this would be convenient.
But is PDEP just the inverse of PEXT, added for the sake of symmetry...?
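To make the gather/scatter relationship concrete, here is a minimal portable sketch in C of what the two instructions compute. The names `pext_ref`, `pdep_ref` and `pext_pdep_demo` are made up for this illustration; this is a bit-at-a-time software model, not how the hardware does it.

```c
#include <assert.h>
#include <stdint.h>

/* PEXT model: gather the bits of src selected by mask into the low
   bits of the result, keeping their relative order. */
uint64_t pext_ref(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t out_bit = 1;                      /* next result bit to fill */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);  /* lowest set bit of mask */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return result;
}

/* PDEP model: scatter the low bits of src to the positions selected
   by mask, keeping their relative order. */
uint64_t pdep_ref(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t in_bit = 1;                       /* next source bit to read */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);
        if (src & in_bit)
            result |= lowest;
        in_bit <<= 1;
        mask &= mask - 1;
    }
    return result;
}

/* Hypothetical usage: extract the high nibble, then put it back. */
void pext_pdep_demo(void)
{
    uint64_t mask = 0xF0;
    assert(pext_ref(0xB6, mask) == 0xB);
    assert(pdep_ref(0xB, mask) == 0xB0);
    /* PDEP undoes PEXT on the bits covered by the mask. */
    assert(pdep_ref(pext_ref(0xB6, mask), mask) == (0xB6 & mask));
}
```

So PDEP is the inverse of PEXT only on the masked bits; everything outside the mask is lost by PEXT and zeroed by PDEP.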
Maybe custom error correction/parity bit encoding?
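As a rough illustration of the parity-encoding idea, here is a sketch of a Hamming(7,4) code in C where PDEP places the data and parity bits and PEXT pulls the data back out. Everything here is an assumption made for the example: the bit layout (codeword position i stored in bit i, bit 0 unused), the helper names `pdep32`, `pext32`, `syndrome`, `hamming74_encode`, `hamming74_decode`, and the software stand-ins for the instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-ins for 32-bit PDEP/PEXT (hypothetical helpers). */
uint32_t pdep32(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    for (uint32_t in_bit = 1; mask != 0; in_bit <<= 1, mask &= mask - 1)
        if (src & in_bit)
            result |= mask & (~mask + 1);   /* lowest set bit of mask */
    return result;
}

uint32_t pext32(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    for (uint32_t out_bit = 1; mask != 0; out_bit <<= 1, mask &= mask - 1)
        if (src & mask & (~mask + 1))
            result |= out_bit;
    return result;
}

#define DATA_MASK   0xE8u   /* codeword positions 3, 5, 6, 7 */
#define PARITY_MASK 0x16u   /* codeword positions 1, 2, 4    */

/* XOR of the positions (1..7) of all set codeword bits. */
unsigned syndrome(uint32_t cw)
{
    unsigned s = 0;
    for (unsigned pos = 1; pos <= 7; pos++)
        if (cw & (1u << pos))
            s ^= pos;
    return s;
}

uint32_t hamming74_encode(uint32_t data)
{
    uint32_t cw = pdep32(data & 0xFu, DATA_MASK);   /* scatter data bits */
    /* Bit k of the syndrome is exactly the parity bit for position 2^k. */
    return cw | pdep32(syndrome(cw), PARITY_MASK);
}

uint32_t hamming74_decode(uint32_t cw)
{
    unsigned s = syndrome(cw);
    if (s != 0)
        cw ^= 1u << s;                 /* correct a single flipped bit */
    return pext32(cw, DATA_MASK);      /* gather data bits */
}

void hamming74_demo(void)
{
    uint32_t cw = hamming74_encode(0xBu);
    assert(hamming74_decode(cw) == 0xBu);
    assert(hamming74_decode(cw ^ (1u << 5)) == 0xBu);   /* one-bit error */
}
```

With the real instructions, each `pdep32`/`pext32` call would be a single operation, which is where the convenience would come from.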
Or perhaps the intended use is more mathematically motivated, like checking whether an integer N is a multiple of some polynomial by extracting the necessary bit patterns with PEXT and comparing them?
|13 Jun 2011, 16:46||
Remember the undocumented IBTS/XBTS instructions from the first-generation 80386? It seems they have finally come back to this idea.
|13 Jun 2011, 18:17||
The 'tables.inc' file uses the DW directive.
Will that be enough as the number of elements grows?
Are you planning to use DD instead?
Will reserved fields appear as well, say, for CPU version (.386) control?
Why don't you use a macro for that purpose?
CPU equ etc1
CPU equ etc2
Do you have any unrealized, desired, undesired, or disputed plans for fasm?
|13 Jun 2011, 18:52||
I recently discovered this document through a blog's comment section (I couldn't find it again) where they discussed BMI taking up a lot of silicon.
In it you can find that butterfly circuitry would not require more transistors (they even claim it takes fewer) to implement shifts, rotates, and the PDEP/PEXT pair, which allows almost any permutation.
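To give a feel for why a logarithmic number of stages is enough, here is a C sketch of the well-known parallel-suffix "compress" routine from Hacker's Delight, which computes a 32-bit PEXT in five stages that move bits by 1, 2, 4, 8 and 16 positions. This is only a software analogue of a staged network, not the circuit described in that document; the names `pext32_staged`, `pext32_naive` and `pext32_staged_demo` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-at-a-time reference, used only to cross-check the staged version. */
uint32_t pext32_naive(uint32_t x, uint32_t m)
{
    uint32_t result = 0;
    for (uint32_t out_bit = 1; m != 0; out_bit <<= 1, m &= m - 1)
        if (x & m & (~m + 1))          /* test x at lowest set mask bit */
            result |= out_bit;
    return result;
}

/* Staged PEXT: each selected bit must move right by the number of
   zero mask bits below it; stage i handles bit i of that distance. */
uint32_t pext32_staged(uint32_t x, uint32_t m)
{
    x &= m;                            /* drop unselected bits */
    uint32_t mk = ~m << 1;             /* marks the zeros to the right */
    for (int i = 0; i < 5; i++) {
        /* Parallel suffix XOR: parity of marked zeros below each bit. */
        uint32_t mp = mk ^ (mk << 1);
        mp ^= mp << 2;
        mp ^= mp << 4;
        mp ^= mp << 8;
        mp ^= mp << 16;
        uint32_t mv = mp & m;          /* bits that move in this stage */
        m = (m ^ mv) | (mv >> (1 << i));
        uint32_t t = x & mv;
        x = (x ^ t) | (t >> (1 << i));
        mk &= ~mp;
    }
    return x;
}

void pext32_staged_demo(void)
{
    assert(pext32_staged(0xB6u, 0xF0u) == 0xBu);
    assert(pext32_staged(0x12345678u, 0x0F0F00FFu) ==
           pext32_naive(0x12345678u, 0x0F0F00FFu));
}
```

The fixed stage count, independent of the mask, is the property that would map naturally onto a fixed-depth network in hardware.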
I've found that the throughput is actually 1 clock (with 3 clock latency):
instruction u p0 p1 p5 p6 p23 p4 p7 tp l
pext/dep r,r,r 1 1 1 3
pext/dep r,r,m 2 1 1 1 3
bextr r,r,r 2 x x x x 1 2
bextr r,m,r 3 x x x x 1 1 2
blsr/i/msk r,r 1 x x 0.5 1
blsr/i/msk r,m 2 x x 1 0.5 1
l/tzcnt r,r 1 1 1 3
l/tzcnt r,m 2 1 1 1 3
bzhi r,r,r 1 x x 0.5 1
bzhi r,m,r 2 x x 1 0.5 1
My guess is that if they allowed pext/pdep to go to any port (0, 1, 5, 6), which would mean 4 times the transistors, it could have the throughput of a NOP or ADD r,r on Haswell right now.
|09 Jul 2013, 21:41||