flat assembler
Message board for the users of flat assembler.

256 bit SSE aka AVX

revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20432
Location: In your JS exploiting you and your system
revolution 07 Apr 2008, 00:08
Read about it here:

There is even a programming manual:
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP 07 Apr 2008, 00:17
Wow! I wonder how many AVX registers there will be... (too lazy to read manual)

edit: They've got a typo in the PDF bookmarks, Chapter 5.1 spells "instruction" as "instructlon" :) I think they're in a hurry

edit2: WOW! 16 YMM registers, all 256 bits apiece, that'll be fun. That is, for 64-bit processors only :( Should be the standard come 2010.
bitRAKE



Joined: 21 Jul 2003
Posts: 4061
Location: vpcmpistri
bitRAKE 07 Apr 2008, 01:06
...looks like they've started at the wrong end of their naming convention, lol. :D (...XMM, YMM, ZMM, ...now they are stuck. Maybe, AAMM, or ANM is next?)

I'm VEXed by the new prefix (multiple bytes!). :lol:

What is with all the clearing of the upper bits during smaller operations? I'd rather use the upper bits for something useful, or explicitly clear them if needed.

Built in AES is bloat, imho.

Many instructions just condense a couple of operations into one (with a longer encoding) - seems kind of silly. (e.g. VBROADCAST)

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20432
Location: In your JS exploiting you and your system
revolution 07 Apr 2008, 01:26
From what I gather the encoding is not much longer. It uses 0xC4 (three-byte prefix) and 0xC5 (two-byte prefix), replacing LES and LDS. Besides, it doesn't matter much because you can operate on double the data, so even if it was a bit longer it would still be a net gain.
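
To make the 0xC4/0xC5 point concrete, here is a rough Python sketch (not fasm, and not taken from the manual) that splits a VEX prefix into its fields. The function name `decode_vex` and the dictionary keys are made up for illustration, and the sketch assumes the bytes passed in really do start with a VEX prefix.

```python
def decode_vex(code):
    """Split a VEX prefix into fields. Sketch only: assumes `code` starts with VEX."""
    if code[0] == 0xC5:
        # Two-byte form: C5 [R vvvv L pp]
        b = code[1]
        return {
            "length": 2,
            "R": ((b >> 7) & 1) ^ 1,          # stored inverted
            "X": 0, "B": 0, "W": 0,           # not encodable in the short form
            "map": 1,                         # short form implies the 0F opcode map
            "vvvv": ((b >> 3) & 0xF) ^ 0xF,   # extra source register, stored inverted
            "L": (b >> 2) & 1,                # 0 = 128-bit, 1 = 256-bit
            "pp": b & 3,                      # 0 = none, 1 = 66, 2 = F3, 3 = F2
        }
    if code[0] == 0xC4:
        # Three-byte form: C4 [R X B mmmmm] [W vvvv L pp]
        b1, b2 = code[1], code[2]
        return {
            "length": 3,
            "R": ((b1 >> 7) & 1) ^ 1,
            "X": ((b1 >> 6) & 1) ^ 1,
            "B": ((b1 >> 5) & 1) ^ 1,
            "map": b1 & 0x1F,                 # 1 = 0F, 2 = 0F 38, 3 = 0F 3A
            "W": (b2 >> 7) & 1,
            "vvvv": ((b2 >> 3) & 0xF) ^ 0xF,
            "L": (b2 >> 2) & 1,
            "pp": b2 & 3,
        }
    raise ValueError("not a VEX prefix")


# vaddps ymm0, ymm1, ymm2 in its short and long encodings
short_form = decode_vex(bytes([0xC5, 0xF4, 0x58, 0xC2]))
long_form = decode_vex(bytes([0xC4, 0xE1, 0x74, 0x58, 0xC2]))
```

Both encodings describe the same instruction; the short one is a single byte longer than the old two-operand SSE form (0F 58 C2), which fits the "not much longer" remark. In 32-bit code the old LES/LDS opcodes remain usable because they need a memory operand, whereas a VEX prefix always has the top two bits of its second byte set.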

The major problem will be what to call the m256 data type?

The existing DQWORD (ugly as it is) could be OWORD (much nicer).

But for 32 bytes?
DDQWORD?
QQWORD?
DOWORD?
HWORD (hexadec-word)?
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP 07 Apr 2008, 01:38
They should be shot for having AES instructions. Never mind the fact that it only supports AES-128 (sorry for using 'only'), but it takes upwards of 17K of lookup tables to accomplish a fast AES software implementation!

Why did they have to announce AES instructions right when I'm devoting all of my time to AES software? That's just not funny. They're definitely out to get me.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20432
Location: In your JS exploiting you and your system
revolution 07 Apr 2008, 01:42
AlexP wrote:
They should be shot for having AES instructions. Never mind the fact that it only supports AES-128 (sorry for using 'only'), but it takes upwards of 17K of lookup tables to accomplish a fast AES software implementation!

Why did they have to announce AES instructions right when I'm devoting all of my time to AES software? That's just not funny. They're definitely out to get me.
Yep, it looks like they saw you doing the AES on this board and said to themselves, "let's get those asm buggers, our CPU is not allowed to be used with asm".
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP 07 Apr 2008, 01:44
:) At least it's only in 64-bit mode, I was afraid I'd have to re-wire my processor to take those guys out.
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 07 Apr 2008, 02:05
what if new data names were based on exponents?
Code:
2^5 = 32
fIve
Iword => 256 bits or 32 bytes

then, 
2^6 = 64
Six
Sword => 512 bits or 64 bytes

2^7 = 128
seVen
Vword => 1024 bits or 128 bytes

2^8 = 256
Eight
Eword => 2048 bits or 256 bytes

2^9 = 512
Nine
Nword => 4096 bits or 512 bytes

no need for 2^10 bytes for a long time, i hope. ;)
    

it would be pretty good if Intel stopped increasing the number of bits and started enhancing the capabilities of the µP with a very good mechanism other than multicore (4 GByte L1 cache? 4 GHz FSB?). the 32-bit versions of the µP are very good for programming (unlike the ugly 16-bit ones) and haven't shown all their capabilities yet. we just know the limits.
Alphonso



Joined: 16 Jan 2007
Posts: 295
Alphonso 07 Apr 2008, 02:28
Here I am thinking about buying a new notebook with a Penryn CPU (6MB L2) + Blu-ray, they're only just starting to appear here, and Intel are already talking about releasing Nehalem towards the end of 2008! Then there's USB 3.0 (4.8 Gbit/s) coming in 2009. How do you keep up! :evil:

256-bit data? Extended WORD (EWORD) maybe. The only problem I can see with that is the next natural progression would be FWORD, and that would cause trouble posting code as the FWORD is banned on a lot of sites. :roll:
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20432
Location: In your JS exploiting you and your system
revolution 07 Apr 2008, 02:46
Erm, there is already an fword data type since the 8086 days.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20432
Location: In your JS exploiting you and your system
revolution 07 Apr 2008, 02:50
AlexP wrote:
:) At least it's only in 64-bit mode, I was afraid I'd have to re-wire my processor to take those guys out.
It can also be used in 32bit modes (with some limitations, 8 registers only etc.)
Alphonso



Joined: 16 Jan 2007
Posts: 295
Alphonso 07 Apr 2008, 03:40
Alphonso wrote:
the FWORD is banned on a lot of sites.
revolution wrote:
Erm, there is already an fword data type since the 8086 days.
It seemed funny at the time when I wrote it. Oh well :(
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20432
Location: In your JS exploiting you and your system
revolution 07 Apr 2008, 03:45
The fword is not used much now. It is an ugly monstrosity of a data type (and a word, hehe). 6 bytes, dword+word=fword.
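
For anyone who has not met the 6-byte layout, here is a small Python sketch of how a dword offset plus a word selector sit in memory as a far pointer of the kind `jmp fword [...]` reads. The helper names `pack_fword` and `unpack_fword` and the sample values are invented for illustration.

```python
import struct


def pack_fword(offset, selector):
    """Lay out a 16:32 far pointer as stored in memory: dword offset first, then word selector."""
    # "<" means little-endian with no padding, so the result is exactly 4 + 2 = 6 bytes.
    return struct.pack("<IH", offset, selector)


def unpack_fword(raw):
    """Inverse of pack_fword: returns (offset, selector)."""
    return struct.unpack("<IH", raw)


# Hypothetical values: offset 0x00401000 in the segment selected by 0x0008
raw = pack_fword(0x00401000, 0x0008)
offset, selector = unpack_fword(raw)
```

The offset comes first and the selector occupies the top two bytes, which is why the type is neither a clean power of two nor naturally aligned.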

So go ahead and use the fword as much as you like.

Just for fun:
Code:
jmp fword[hell]    
aka
Code:
goto fucking hell    
:lol:
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 07 Apr 2008, 03:55
what would happen to assembly if quantic computers come to be democratised by intel?

Code:
goto nowhere?
    

what kind of data types? based on powers of ?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20432
Location: In your JS exploiting you and your system
revolution 07 Apr 2008, 04:22
edfed wrote:
what would happen to assembly if quantic computers come to be democratised by intel?

Code:
goto nowhere?
    

what kind of data types? based on powers of ?
I assume you are referring to quantum computing?

Even a quantum computer still has binary as its basis, called qubits. So, by extension, qubyte, quword, dquword, qquword, dqquword, etc.
daniel.lewis



Joined: 28 Jan 2008
Posts: 92
daniel.lewis 07 Apr 2008, 06:22
:? Well, 256-bit is great except they screwed it up. They implemented AES, but can you do things like truncating at a specified byte? No. So these are still only useful for integers, and strings still need to be read in byte by byte (yay for 8-bit computing!)

Unless someone can tell me:

Given an SSE register XMM0, with a value of:

0x2347_5128_7234_5263___0023_2347_7345_1013

How can I efficiently truncate everything after one of {0,52} such that the result will be:

0x2347_5128_7234_5200___0000_0000_0000_0000

My line of thought is to use SSE registers to scan a string for either end of string or some significant token character. The pattern would double in efficiency when applied to AVX.

_________________
dd 0x90909090 ; problem solved.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20432
Location: In your JS exploiting you and your system
revolution 07 Apr 2008, 06:36
daniel.lewis wrote:
:? Well, 256-bit is great except they screwed it up. They implemented AES, but can you do things like truncating at a specified byte? No. So these are still only useful for integers, and strings still need to be read in byte by byte (yay for 8-bit computing!)

Unless someone can tell me:

Given an SSE register XMM0, with a value of:

0x2347_5128_7234_5263___0023_2347_7345_1013

How can I efficiently truncate everything after one of {0,52} such that the result will be:

0x2347_5128_7234_5200___0000_0000_0000_0000

My line of thought is to use SSE registers to scan a string for either end of string or some significant token character. The pattern would double in efficiency when applied to AVX.
Have a look at SSE4; it has string functions like the ones you mention.
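
To pin down what is being asked for, here is a plain Python model of the operation (a scalar reference, not SSE code). It assumes the hex value in the question is read left to right as the byte order of the string, and the name `truncate_after` is made up. As far as I know, the SSE4.2 string compares (PCMPISTRI and PCMPISTRM in their "equal any" mode) are the instructions that can produce the index or mask of the first matching byte in one step; the masking afterwards is still up to the programmer.

```python
def truncate_after(data, stops):
    """Keep bytes up to and including the first one found in `stops`; zero the rest."""
    for i, b in enumerate(data):
        if b in stops:
            # Pad with zero bytes so the result keeps the register width.
            return data[: i + 1] + bytes(len(data) - i - 1)
    return data  # no stop byte found: nothing to truncate


# The 16-byte value from the question, truncated after the first 0x00 or 0x52
block = bytes.fromhex("23475128723452630023234773451013")
result = truncate_after(block, b"\x00\x52")
```

Here the first stop byte is the 0x52 at index 6, so `result` is the requested 0x2347_5128_7234_5200 followed by zeros. The same model works unchanged for a 32-byte block, which is the AVX case the question has in mind.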
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
MazeGen 07 Apr 2008, 07:48
Wow. They changed instruction format strongly. This really doesn't look like ol' good x86 :-(
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 07 Apr 2008, 07:53
it's because they are afraid of FASM power. they want to make ALL-in-ASM coding IMPOSSIBLE.

TRUE, not PARANOIA.

256-bit operands, to do what, please? do we need such big variables?
Machines still have enough power to do all we need for daily usage.

Moore's law is true, but it does not have to be respected for eternity.
Sounds like a devil's action. I hope they will introduce DRM in their µP and link it to Vista, just to show how they are going the wrong way.

:evil:

Code:
My ideal µP:

/!\ No native 64 bits please /!\

PIV HT compliant with 64 dwords general registers
F = 4GHz
L1 cache = 2 GBytes code & 2 GBytes data
L2 cache = 4 GBytes 
FSB bandwidth = 4GHz
FANLESS
---------------------------
My ideal MotherBoard & chipset:

/!\ No "designed for Window$" /!\
/!\ No "dumb" BIOS Please     /!\
/!\        BOOT instantly           /!\

64 GBytes Static RAM with 1 Farad capacitor as battery.
1024 GBytes Static RAM with battery As hard drive
4GHz Graphix With X86/SSE based instruction set. 4000*3000*32 bpp
8* ISA slots        
8* PCI slots 
4* AGP slots
4* PCIe slots                        ALL THESE WITH 4GHz Frequency Limit
8* IDE Slots
8* SATA Slots
1* Floppy Slot
8* 32 bits I/O Open collector connectable to any I/O Port (IN OUT)
8* USB 3.0 connectors
4.1 AUDIO with integrated HIFI Class AB Amplifier
FULL FANLESS

note, it is a pure Electronician-Hacker config. pretty good if it exists. then, i'll enjoy the soldering iron.
    


instead of ADDing some unneeded feature, why not SHL the existing ones?

Give a non-commercial reason, please.

not a pure delirium of utopia land.
PCs are PCs, nothing else. do they want to put it in our brain? All is based on SECURITY, ANTIVIRUS, ENCRYPTION, BIZNESS, BABYLON????

I hope you will all enjoy a machine like the one i imagine, if it exists one day. :)
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20432
Location: In your JS exploiting you and your system
revolution 07 Apr 2008, 11:13
edfed: What is wrong with 64bit registers? You listed 64GB RAM but you only want a 32bit CPU. Kinda limiting don't ya think?

And why only 4.1 audio, surely you mean 7.1?

I don't like the idea of SRAM as hard-drive replacement. If the battery dies then you lose all your info. I think NVRAM would be better.

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.