flat assembler
Message board for the users of flat assembler.

SSE5 vs AVX

Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7734
Location: Kraków, Poland
Tomasz Grysztar
With today's breakfast I was skimming through the documents on the AMD SSE5 and Intel AVX extensions^. The impression I got from looking at them both is that this is actually a similar situation to the one we had with AMD 3DNow! and Intel SSE. They go in a similar direction in some aspects, although through different and incompatible designs (especially in the instruction encoding), but Intel's solution goes much further and also extends the already existing instructions.

To show the analogy as I see it: AMD 3DNow! added some MMX-like instructions that operated on floating-point values, and so did SSE, which at the same time introduced the 128-bit registers and (from SSE2 on) extended the MMX instructions so that they could operate not only on the old 64-bit MMX registers, but also on the new 128-bit SSE ones. The MMX instructions were covered, but the floating-point 3DNow! instructions were not (for the obvious reason that they were never implemented in Intel processors). Also, the instruction encoding for 3DNow! was very specific (0F 0F opcode with an additional opcode byte at the very end of the instruction) and was not followed by SSE. For this reason 3DNow! never evolved any further, and AMD chose not only to implement SSE in its processors, but also to add its own new instructions (like SSE4a) within the framework introduced by SSE.
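To make the encoding difference concrete, here is a rough Python sketch that builds the bytes for one 3DNow! instruction and one SSE instruction in their register-to-register forms. The helper names are made up for this illustration; only the opcode values come from the instruction sets themselves.

```python
# Sketch: byte layout of a 3DNow! instruction versus an SSE one.
# Register-to-register forms only; the helper names are hypothetical.

def modrm_reg_reg(dest: int, src: int) -> int:
    """ModRM byte with mod=11 (register direct), reg=dest, rm=src."""
    return 0xC0 | ((dest & 7) << 3) | (src & 7)

def encode_3dnow(suffix: int, dest: int, src: int) -> bytes:
    """3DNow!: 0F 0F, then ModRM, and only then the byte selecting the operation."""
    return bytes([0x0F, 0x0F, modrm_reg_reg(dest, src), suffix])

def encode_sse_ps(opcode: int, dest: int, src: int) -> bytes:
    """Unprefixed SSE packed-single form: 0F, opcode, then ModRM."""
    return bytes([0x0F, opcode, modrm_reg_reg(dest, src)])

PFADD_SUFFIX = 0x9E  # pfadd mm, mm/m64
ADDPS_OPCODE = 0x58  # addps xmm, xmm/m128

pfadd = encode_3dnow(PFADD_SUFFIX, dest=0, src=1)   # pfadd mm0, mm1
addps = encode_sse_ps(ADDPS_OPCODE, dest=0, src=1)  # addps xmm0, xmm1

print(pfadd.hex(" "))  # 0f 0f c1 9e
print(addps.hex(" "))  # 0f 58 c1
```

The point of the comparison is only where the operation-selecting byte sits: at the very end for 3DNow!, right after the 0F escape for SSE.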

Now with SSE5 AMD introduces three- and four-operand instructions instead of the standard destination-source scheme (it achieves this with a DREX byte in the instruction code). That is a nice thing, however AVX not only does the same for all the already existing SSE instructions, but also extends them with the option of operating on new 256-bit registers (and with the new VEX prefix it generally achieves this without making the instruction code any longer). And since SSE5, because of its completely different encoding scheme, is again not likely to evolve once AVX jumps in, I predict that the same fate awaits it as the one 3DNow! suffered.
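As a rough illustration of the instruction-length remark, the Python sketch below assembles the two-byte VEX prefix by hand and compares a legacy SSE2 encoding with its 256-bit AVX counterpart. It assumes register-to-register operands in registers 0-7; the function and constant names are invented for the example.

```python
# Sketch: two-byte VEX prefix versus a legacy 66-prefixed SSE2 encoding.
# Assumes registers 0-7 and register-direct operands; names are hypothetical.

def vex2(l256: bool, pp: int, vvvv: int = 0, r: int = 0) -> bytes:
    """Two-byte VEX: C5, then [~R | ~vvvv | L | pp] packed into one byte.

    R and vvvv are stored inverted; L selects 256-bit operation;
    pp stands in for the legacy 66/F3/F2 prefix. The 0F escape is implied.
    """
    second = ((~r & 1) << 7) | ((~vvvv & 0xF) << 3) | (int(l256) << 2) | (pp & 3)
    return bytes([0xC5, second])

PP_NONE, PP_66 = 0, 1
MODRM_REG0_REG1 = 0xC1  # mod=11, reg=0, rm=1

# addpd xmm0, xmm1 (legacy: 66 prefix, 0F escape, opcode, ModRM)
legacy_addpd = bytes([0x66, 0x0F, 0x58, MODRM_REG0_REG1])
# vaddpd ymm0, ymm0, ymm1 (VEX replaces both the 66 prefix and the 0F escape)
vex_addpd_256 = vex2(True, PP_66) + bytes([0x58, MODRM_REG0_REG1])

# addps xmm0, xmm1 has no legacy prefix to absorb, so its VEX form is longer
legacy_addps = bytes([0x0F, 0x58, MODRM_REG0_REG1])
vex_addps_256 = vex2(True, PP_NONE) + bytes([0x58, MODRM_REG0_REG1])

print(legacy_addpd.hex(" "), "->", vex_addpd_256.hex(" "))  # 66 0f 58 c1 -> c5 fd 58 c1
print(legacy_addps.hex(" "), "->", vex_addps_256.hex(" "))  # 0f 58 c1 -> c5 fc 58 c1
```

So the 66-prefixed instruction keeps its length of four bytes while gaining a third operand and 256-bit width, whereas an unprefixed one such as addps grows by a single byte.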

Well, those are just some thoughts I wanted to share with you. Smile
___
^ I plan to implement them in the 1.69.x development line, with the final release 1.70 in mind. I have started thinking about it already, because notable additions to fasm require a research and design phase, which usually goes on for weeks or months just inside my head.
And my first impression is that, even though it looks like quite a serious extension, it should not be a big deal for fasm: just adding some new table items and a dozen new routines should be enough. None of the existing code base should need to be modified, and thus it is not likely that adding those new instructions will break some of the existing ones, which is quite good news.
Post 17 Feb 2009, 18:35
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17342
Location: In your JS exploiting you and your system
revolution
We haven't even seen any silicon yet and already you will support the new instructions. Now you are getting ahead of the chip makers.

It is going to be a while till we see AVX silicon. No need to rush the project.
Post 17 Feb 2009, 18:42
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7734
Location: Kraków, Poland
Tomasz Grysztar
revolution wrote:
No need to rush the project.
Check out how long it took me to (almost) finish the 1.67.x development line. Certainly I don't rush. Wink That thing about the future roadmap for fasm was just a sidenote.
Post 17 Feb 2009, 18:59
bitRAKE



Joined: 21 Jul 2003
Posts: 2914
Location: [RSP+8*5]
bitRAKE
Can't wait to read your elegant solution. Intel has an AVX emulator. So, you'll be able to test without silicon. Maybe my next laptop will be AVX capable - I should start saving now. Very Happy
Post 18 Feb 2009, 02:42
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17342
Location: In your JS exploiting you and your system
revolution
I think one major difference that can be seen between Intel and AMD is that AMD only extends things incrementally, whereas Intel will create something new.

Look at X86-64 versus Itanium. AMD only extended the existing instructions with x86-64. Intel completely changed the system for Itanium.

Look at 3DNow! versus SSE. AMD only extended the use of the existing FPU registers. Intel created a whole new set of registers for SSE.

Look at SSE5 versus AVX ...

Now you see the pattern. AMD extend, Intel expand. But that doesn't mean one method is better than the other. Clearly Intel lost with the Itanium and later it won with the SSE. Now who will win with SSE5/AVX? Tomasz has predicted a win for AVX. Anyone wish to predict a win for SSE5?
Post 18 Feb 2009, 02:58
bitRAKE



Joined: 21 Jul 2003
Posts: 2914
Location: [RSP+8*5]
bitRAKE
Intel would need to adopt SSE5 - why would they do such a thing?
Post 18 Feb 2009, 03:03
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
MazeGen
For AVX, here is the latest revision 004 of the manual: http://software.intel.com/file/8418

Agner Fog wrote an interesting comparison of SSE5 vs. AVX.
Post 18 Feb 2009, 09:11
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
MazeGen wrote:
For AVX, here is the latest revision 004 of the manual: http://software.intel.com/file/8418

Agner Fog wrote an interesting comparison of SSE5 vs. AVX.


And yet he proclaims AVX the "winner" even though neither is in hardware yet. Weird.
Post 18 Feb 2009, 21:51
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
bitRAKE wrote:
Intel would need to adopt SSE5 - why would they do such a thing?


Compatibility? Fun? Boredom? (I dunno, I highly doubt they'd do it since they never copied 3DNow! ... and yes, I know, SSE1, blah blah blah.)
Post 18 Feb 2009, 21:52
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17342
Location: In your JS exploiting you and your system
revolution
asmcoder wrote:
1 simple question, why use mmx, sse, fpu?
Because the world is full of real numbers. Integers don't fit everything. How to store 3.1415826535 in an integer?
Post 18 Feb 2009, 22:01
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
There are a hundred ways to do the same thing. But when somebody tells you "this way is better than that way", it's usually political or marketing bias, especially with computers. Everybody has to reinvent the wheel just so they can patent it and get loads more money. If it ain't new, they can't sell it as well.

FPU is mostly for scientific stuff.
MMX / 3DNow! was for 3D gaming and multimedia.
SSE / SSE2 was for faster maths and gaming.
SSE3 / SSE4.1 / SSE4.2 / SSE4a are marketing gimmicks.
SSE5 and AVX are (so far) vaporware.

Corrections welcome. Smile
Post 18 Feb 2009, 22:20
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
revolution wrote:
asmcoder wrote:
1 simple question, why use mmx, sse, fpu?
Because the world is full of real numbers. Integers don't fit everything. How to store 3.1415826535 in an integer?
Just like any other integer, but with the location of the decimal point prepended (or stored in another integer)?
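A minimal Python sketch of that idea, assuming the "location of the decimal point" is kept as a second integer counting the digits after the point. The function names are hypothetical and no rounding or arithmetic is attempted.

```python
# Sketch: a decimal fraction stored as two integers (digits, scale).
# scale = how many of the digits sit after the decimal point.
from typing import Tuple

def to_scaled(text: str) -> Tuple[int, int]:
    """Split '3.14' into (314, 2): all digits as one integer plus the scale."""
    whole, _, frac = text.partition(".")
    return int(whole + frac), len(frac)

def from_scaled(digits: int, scale: int) -> str:
    """Rebuild the decimal text from the integer and the point position."""
    sign = "-" if digits < 0 else ""
    body = str(abs(digits)).rjust(scale + 1, "0")  # keep a leading 0 before the point
    if scale == 0:
        return sign + body
    return f"{sign}{body[:-scale]}.{body[-scale:]}"

digits, scale = to_scaled("3.1415826535")
print(digits, scale)               # 31415826535 10
print(from_scaled(digits, scale))  # 3.1415826535
```

Note that this particular value already needs more than 32 bits for the digits alone, so the "other integer" approach trades register width for exactness.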


rugxulo wrote:

SSE5 and AVX are (so far) vaporware.

Corrections welcome. Smile
AVX will only be vaporware if it isn't released in 2010 (or sooner), and SSE5 if not in 2011 (or sooner).
Post 18 Feb 2009, 22:47
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
I read through all of Agner Fog's discussion with the Intel people and I really got smarter. One thing both parties seem to blame is Microsoft: because it's so big, they can't make great changes to their ISA. It would break many M$ drivers etc.
Well, Intel is partly guilty, because what they should have done is sit down at a round table and discuss the advances planned for the next few years. They would have saved at least Windows 7... or maybe they still can!?
I really like what AVX is all about. Whereas the SSE versions (pick a number) were just "bugfixes" or "features" added to the already good architecture, AVX seems to be really thought through and, on the plus side, it updates all SSE instructions to use wider registers (except the instructions where SSE addresses MMX).

Vaporware? I don't think so. I tend to agree with Agner Fog here and say: "Win for the AVX"; "Intel:AMD score 2:1" Smile
Post 19 Feb 2009, 09:11
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7734
Location: Kraków, Poland
Tomasz Grysztar
So, definitely, Intel won this one. SSE5 as it was first proposed no longer exists.
http://en.wikipedia.org/wiki/XOP_instruction_set
Post 20 Sep 2009, 23:15
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
Do we *really* need new instructions? Why can't they just make the CPU more intelligent so that it parallelizes instructions better (more deeply)? Like having 2 SSE instructions be the same as 1 AVX instruction, or something like that.

You barely find programs using SSE2!

@revolution: what you said about floating point and real numbers made me giggle. Floats aren't for fractions, fixed point handles that much more intuitively (how many people use scientific notation for fractions? Wink). Floats are for a high dynamic range, as implied by the scientific notation.
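The dynamic-range point can be sketched in a few lines of Python. This assumes a 32-bit Q16.16 fixed-point format as the comparison (my choice for the example, not something from the thread) against an IEEE single-precision float; the function names are made up.

```python
# Sketch: dynamic range of a 32-bit float versus 32-bit Q16.16 fixed point.
# Q16.16 is an assumed format here: 16 integer bits, 16 fraction bits.
import struct

def as_float32(x: float) -> float:
    """Round-trip x through a 4-byte IEEE single to see what survives."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def as_q16_16(x: float) -> float:
    """Round-trip x through a signed 32-bit value scaled by 2**16."""
    raw = round(x * 65536)
    if not -2**31 <= raw < 2**31:
        raise OverflowError(f"{x!r} does not fit in Q16.16")
    return raw / 65536

# Fractions in a modest range: fixed point is exact and evenly spaced.
print(as_q16_16(0.5), as_q16_16(1234.25))

# Very small and very large magnitudes: the float keeps about 7 significant
# digits at both ends, while the fixed-point value underflows or overflows.
print(as_float32(1e-30), as_q16_16(1e-30))
try:
    as_q16_16(1e30)
except OverflowError as err:
    print("fixed point:", err)
print(as_float32(1e30))
```

Both formats spend the same 32 bits; the float spends some of them on an exponent, which is where the range comes from.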

_________________
Previously known as The_Grey_Beast
Post 21 Sep 2009, 22:29
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17342
Location: In your JS exploiting you and your system
revolution
Borsuc wrote:
Why can't they just make the CPU more intelligent so that it parallelizes instructions better (more deeply)? Like having 2 SSE instructions be the same as 1 AVX instruction, or something like that.
Well, I'm sure that if it were so simple someone would have done it already. AVX doubles the available register storage space. That can't easily be emulated by an instruction encoding built around a half-size register set.
Post 22 Sep 2009, 09:47
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
No I meant why can't they do this:

Code:
add eax, 5
add ebx, 5    


and transform it into

Code:
add rax, 5    


Of course rax wouldn't actually exist in this case. Instead of having one "special" "double-size" instruction, is it so hard to just look ahead at the NEXT instruction, see if it's the same operation (e.g. addition), and use the same unit as for the "double-size" instruction, but put the second result in, for instance, 'ebx' above?

And of course, like I said, rax wouldn't even exist anyway (well it is useful for addressing but AVX isn't Wink).

_________________
Previously known as The_Grey_Beast
Post 23 Sep 2009, 18:26
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17342
Location: In your JS exploiting you and your system
revolution
Borsuc: Do you realise just how much silicon it would take to have the CPU recognise such things? Perhaps even at the expense of having to reduce the clock frequencies to allow for all the extra circuitry to complete its tasks. Nothing comes without cost and what you propose would cost a lot in terms of power usage and time delays taken (not to mention the engineering dollar costs also).
Post 24 Sep 2009, 00:36
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
With x86 we have always had the problem that we are not able to "look ahead" because of the variable-length instructions (up to 15 bytes). Finding the instruction boundaries is O(N). What we can gain from "RISC-like" architectures is that if the instructions are all the same size, or at least easily predictable, the lookup can be anywhere between O(1) and O(log2 N). N here is the byte count (not the instruction count), although I don't know if that changes anything algorithmically.
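A toy Python sketch of that difference follows. The variable-length format used here is invented for the example (the first byte of each instruction simply holds its total length) and is not how x86 actually encodes lengths; it only shows why every earlier instruction has to be visited.

```python
# Sketch: locating the n-th instruction in a variable-length stream
# versus a fixed-width one. The length-in-first-byte format is hypothetical.

def nth_start_variable(code: bytes, n: int) -> int:
    """Walk the stream: each length is only known after reading that instruction."""
    offset = 0
    for _ in range(n):
        offset += code[offset]  # first byte = total length of this instruction
    return offset

def nth_start_fixed(n: int, width: int = 4) -> int:
    """Fixed-width instructions: the offset is a single multiplication."""
    return n * width

# Four toy instructions of lengths 1, 3, 2 and 4 bytes.
stream = bytes([1, 3, 0, 0, 2, 0, 4, 0, 0, 0])

print([nth_start_variable(stream, i) for i in range(4)])  # [0, 1, 4, 6]
print([nth_start_fixed(i) for i in range(4)])             # [0, 4, 8, 12]
```

Real decoders find several boundaries per cycle in hardware, but the dependency is the same: the start of instruction n depends on the lengths of all instructions before it.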

Anyway, it's always easier to add a prefix (like REX) to say that now it is e.g. RAX we use, and if you omit the prefix, just use EAX.
If you prefix an SSE instruction with VEX, then you could be using the ymm0..15 registers instead of xmm0..15.

An example: ABC tells the CPU to add 1 to EAX, RABC tells the CPU to add 1 to RAX. What you suggest is for the CPU to optimize the sequence ABCABC, but that makes it 2 bytes longer, and don't forget that the CPU has to look at each byte and decipher it and then has to CMP one ABC with the other ABC to see if they are equal. And if they are not? Those are wasted cycles. You would need double the energy. That is pointless.
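The ABC / RABC picture maps onto real bytes roughly as in this Python sketch, which uses the `add reg, imm8` form with the constant 5 from the earlier code example. The helper name is made up, and only registers 0-7 are handled.

```python
# Sketch: "ABC" versus "RABC" in actual x86-64 bytes for add reg, imm8.
# Helper name is hypothetical; registers 0-7 only (eax/rax = 0, ebx/rbx = 3).

REX_W = 0x48  # REX prefix with only the W bit set: 64-bit operand size

def add_imm8(reg: int, imm: int, wide: bool = False) -> bytes:
    """Encode add r32, imm8 (opcode 83 /0 ib), or add r64, imm8 when wide."""
    body = bytes([0x83, 0xC0 | (reg & 7), imm & 0xFF])  # ModRM: mod=11, reg=/0, rm=reg
    return (bytes([REX_W]) if wide else b"") + body

EAX, EBX = 0, 3

pair = add_imm8(EAX, 5) + add_imm8(EBX, 5)  # add eax, 5 / add ebx, 5
single = add_imm8(EAX, 5, wide=True)        # add rax, 5

print(pair.hex(" "))    # 83 c0 05 83 c3 05
print(single.hex(" "))  # 48 83 c0 05
print(len(pair) - len(single))  # 2
```

The one-byte prefix carries the "make it wider" information directly, while the two-instruction sequence costs two extra bytes before the CPU has even compared the pair.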
Post 24 Sep 2009, 06:29
Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.