flat assembler
Message board for the users of flat assembler.

256 bit SSE aka AVX
edfed
Joined: 20 Feb 2006
Posts: 4237
Location: 2018
yeah, 7.1, will be 8 channels then. the same for input, oversampled @ 10 MHz like some oscilloscopes, 4 line in and 4 differential inputs.

NVRAM? but it's not "immortal". a mix between nvram and sram would be good.

64GB of ram, with a real-protected-mode of course. not this ugly e64 mode that wastes some bytes for nothing.

effective address = segment(16) shl 20 + offset(32)

exactly like real mode, but with more address space.
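
A quick sketch of that address calculation, in Python purely for illustration. The function name `rpm_address` is made up here, and masking the segment to 16 bits and the offset to 32 bits is an assumption about how such a mode would behave:

```python
def rpm_address(segment: int, offset: int) -> int:
    """Hypothetical 'real-protected-mode' address: segment(16) shl 20 + offset(32)."""
    segment &= 0xFFFF        # assumption: segment register is 16 bits wide
    offset &= 0xFFFFFFFF     # assumption: offset is 32 bits wide
    return (segment << 20) + offset


# Segment bases sit 1 MB apart, so 65536 segments give 64 GB worth of bases.
print(hex(rpm_address(0x0001, 0x00000000)))  # 0x100000
print(hex(rpm_address(0x0002, 0x00000010)))  # 0x200010
print(rpm_address(0xFFFF, 0) // 2**20)       # 65535 (MB)
```

As in real mode, different segment:offset pairs can name the same byte, for example `rpm_address(0, 0x100000)` and `rpm_address(1, 0)`.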

this would beat all core2duo or gillette quatro ;)
Post 07 Apr 2008, 11:50

f0dder
Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
bitRAKE wrote:
Built in AES is bloat, imho.

AlexP wrote:
They should be shot for having AES instructions. Nevermind the fact that it only supports AES-128, (sry for using 'only'), but it takes upwards of 17K of lookup tables to accomplish a fast AES software implementation!

IMHO hardware crypto is a decent enough idea, if implemented correctly (I hope you're wrong about supporting only 128-bit keys). Just look at the AES speeds VIA achieved with their "PadLock" stuff, considerably better than they could have done in software with the limited processing speed of their CPUs.

OK, so it might seem a bit queer to have hardware crypto in a regular desktop CPU, but it has potential when coupled with full-disk encryption (whether that be Microsoft's BitLocker or the open-source TrueCrypt).

bitRAKE wrote:
...looks like they've started at the wrong end of their naming convention, lol. :D :D (...XMM, YMM, ZMM, ...now they are stuck. Maybe, AAMM, or ANM is next?)
Hehe :). Generalize it, R[0-n] for "regular" (integer) registers, RSbitsize[0-n] for SIMD registers...

revolution wrote:
The major problem will be what to call the m256 data type?
We should move away from the xword naming scheme, and use something including the bitsize instead. Very unambiguous when dealing with assembly, where you don't care so much about the interpretation of a variable.

edfed wrote:
256 bits operands, to do what, please? do we need so big variables?
Machines have still enough power to do all we need for daily usages.
Obviously you won't need this for browsing the internet and writing a word document... but see beyond your own nose, and think high-power video/audio processing.

revolution wrote:
I don't like the idea of SRAM as hard-drive replacement. If the battery dies then you lose all your info. I think NVRAM would be better.
The advantage of regular ram is that (currently, at least :)) it's blindingly fast, and cheaper than flashram. When your computer is plugged into the wall socket, the ram is kept refreshed by the PCI standby voltage, the battery can last 20+ hours in case of poweroff, and you can combine it with a flashram backup device... this already exists, but is prohibitively expensive.

Intel is said to come out with 200MB/s read, 100MB/s write flashdrives at the end of 2008, which is going to be very interesting. Considering that the SRAM based drive offerings are still connected through ATA/SATA and thus can't come even close to utilizing their full potential, Intel's new flashram based drive will be an interesting competitor, especially because the storage is nonvolatile, without resorting to batteries and backup drives. Hopefully by 2010 capacity will double, and then we can see prices drop so you don't have to sell a kidney in order to buy one :)
Post 07 Apr 2008, 11:57

vid
Verbosity in development
Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
edfed: lack of knowledge and too many drugs, not paranoia. Those designers most probably never heard about FASM, and they know x86 assembly better than you ever would.
Post 07 Apr 2008, 12:00

revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17278
Location: In your JS exploiting you and your system
edfed wrote:
yeah, 7.1, will be 8 channels then. the same for input, oversampled @ 10 MHz like some oscilloscopes, 4 line in and 4 differential inputs.

NVRAM? but it's not "immortal". a mix between nvram and sram would be good.

64GB of ram, with a real-protected-mode of course. not this ugly e64 mode that wastes some bytes for nothing.

effective address = segment(16) shl 20 + offset(32)

exactly like real mode, but with more address space.

this would beat all core2duo or gillette quatro ;)
Segments? You mean you think they are a good thing? You're kidding, right?

NVRAM (like FRAM or MRAM) will likely hold data for longer than an HDD. Neither is immortal, but then neither is the user.

I think the main problem with your ideal PC is the cost. No average Joe could afford it, and those that could afford it would simply buy a super-computer and get better scaling.
Post 07 Apr 2008, 12:01

edfed
kidding? no way.
there are some advantages in the UGLY RM segmentation.

if made in factories (in poor asian countries :) ), in large quantities, using the same technology as the future tera scale, then: this kind of computer will cost something like 100 € to make one. after shipping and marketing, this cost will be 2000 €.


Last edited by edfed on 07 Apr 2008, 12:18; edited 1 time in total
Post 07 Apr 2008, 12:15

revolution
edfed wrote:
there are some advantages in the UGLY RM segmentation.
Hmm, really? Example?
edfed wrote:
if made in factories (in poor asian countries :) ), in large quantities, using the same technology as the future tera scale, then: this kind of computer will cost something like 100 € to make only one. after shipping and marketing, this cost will be 2000 €.
I think it is already being done that way, so what is different with your way?
Post 07 Apr 2008, 12:17

edfed
example: in rm, it's really a limit, but it permits wrapping around the segment. it permits physically separating the segments and ... doesn't need the protection mechanism and the TLB.
no need of gdt. no need of idt.
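
For reference, the wrap-around mentioned here can be sketched as follows. This models classic 16-bit real mode (physical address = segment * 16 + offset, with 16-bit offset arithmetic); the helper names are made up for the illustration:

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Classic real-mode translation: segment * 16 + offset, no tables involved."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def advance(offset: int, step: int) -> int:
    """16-bit offset arithmetic: stepping past 0xFFFF wraps to the segment start."""
    return (offset + step) & 0xFFFF


off = advance(0xFFFE, 4)                      # wraps around to 0x0002
print(hex(off))                               # 0x2
print(hex(real_mode_address(0xB800, off)))    # 0xb8002
```

The translation is pure arithmetic, which is why no GDT or page tables are needed; it is also why nothing stops one program from reaching another's memory.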

just change the production scheme.

there are some theories and practices in industry i cannot explain in english. but modifying the production is a cost at the start, and after, it costs nothing. that's why the electronic components are so cheap.
Post 07 Apr 2008, 12:30

f0dder
For x86, paging is the way to go. Please forget everything about segments or unprotected modes.

If we were to design a completely new CPU, some other method would possibly be better - an always-implicit "base register" combined with a decent number of flexible MTRRs. Would mean somewhat less flexibility than paging, but would be a cleaner architecture.
Post 07 Apr 2008, 12:36

revolution
I refuse to use a processor without a protection mechanism. I want stability, not chaos. RM, with segments, and no protection is just not going to be reliable for people. How to secure your kernel from errant code?
Post 07 Apr 2008, 12:38

edfed
don't do errant code.
forbid the segment mov in RPM.

fodder: i know the goals of multimedia. i don't ignore it.
revolution and fodder: i know the protection problems.
vid: i know my own ignorance and hope it will not be limited.

but... when technology has choices, why is it always bizness oriented. nowadays, multimedia is ONLY bizness. and i dislike bizness.
Post 07 Apr 2008, 12:47

f0dder
Quote:
don't do errant code.
Not an option, we're only humans, and humans make mistakes.

Segments are bad, 100% linear memory space with MTRRs and an always-implicit base-register (loaded per-process on task switch) is a cleaner approach.
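
A minimal sketch of the base-register idea, under stated assumptions: the `Process` class and `translate` function are hypothetical, the bounds check is an addition not mentioned above (included only because some limit is needed for protection), and MTRRs are not modelled at all.

```python
class Process:
    """Hypothetical per-process state for a base-register scheme (not a real CPU design)."""

    def __init__(self, base: int, size: int):
        self.base = base   # loaded into the implicit base register on task switch
        self.size = size   # assumption: a bound is checked as well, for protection


def translate(proc: Process, address: int) -> int:
    """Map a process-local linear address to a physical one."""
    if not 0 <= address < proc.size:
        raise MemoryError(f"fault: {address:#x} outside process space")
    return proc.base + address


a = Process(base=0x100000, size=0x10000)
b = Process(base=0x200000, size=0x10000)
print(hex(translate(a, 0x1234)))  # 0x101234
print(hex(translate(b, 0x1234)))  # 0x201234
```

Both processes use the same linear address and land in different physical memory, with no segment registers visible to the program.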
Post 07 Apr 2008, 12:54

revolution
edfed wrote:
but... when technology has choices, why is it always bizness oriented. nowadays, multimedia is ONLY bizness. and i dislike bizness.
Without the business where will the money come from to create the technology? It doesn't come for free, it costs plenty to build a fab (the upfront investment cost), and it costs plenty to pay the designers (the staffing expenses), and it costs plenty to tell the public what you have on offer (the marketing expenses), and it costs plenty to build up stocks (the materials, storage and manufacturing expenses), and probably lots more expenses that I don't know about.
Post 07 Apr 2008, 13:05

revolution
edfed wrote:
don't do errant code.
Ahh, if only life were so simple where everyone wrote perfect code and no one wrote malware ... dreaming.

Hehe, just use your recent experience with erasing your own HDD to see how easy it can be to mess things up big time.
Post 07 Apr 2008, 13:07

edfed
totally agree with you. but i want to dream a bit.

PC and ONLY PC all the winter incites me to dream about a utopian future.

and this is not only dreams.

now, i'll stop trolling about this.
Post 07 Apr 2008, 13:20

AlexP
Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
Yes, edfed, it did explicitly say AES-128 is only supported by those instructions. I would like to edit my statement earlier, it takes over 20,000 bytes of lookup tables for a speedy AES routine, nevermind the state or schedules :). If you're wondering, the latter half of tables are implementation-specific. I shall release it in due time...

[edit]: I wonder... If those AES instructions only do one round of 128-bit per call (I say 'call' because they're probably very complex), then I guess you could expand it to be AES-256? I didn't read specifics.
Post 07 Apr 2008, 13:44

bitRAKE
Joined: 21 Jul 2003
Posts: 2915
Location: [RSP+8*5]
edfed, protection will be required until the design is more fault tolerant. For example, if there were 1024 cores then losing one wouldn't be a problem - it could be restarted. Of course, protection would be needed at a higher level to ensure a large number of cores don't fail. I think of it like cells in the body - if any one cell dies there is no problem.

Wouldn't it be nice if computers worked similarly - a healthy computer runs very fast while a virii laden one would still operate, but just slower. Not that different from the way things are now because virii cannot propagate if they take down the machine - rather more important for virii to hide.

If we ignore malicious code, greater complexity always leads to errors when the pressures of resource limits force choices. Even a healthy computer increases tasks and collects garbage; or the environment in which it operates changes to the point where it is no longer adequate.

It is wrong to see this as a corporate process - it is a process of all complex systems I am aware of. I just hope my knowledge of it will help me make better decisions.

_________________
¯\(°_o)/¯ unlicense.org
Post 07 Apr 2008, 16:19

r22
Joined: 27 Dec 2004
Posts: 805
Except for the
OP DST, SRC1, SRC2 [, IMM/SRC3]
convention, AVX seems like a logical extension of SSE.
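
To make the difference concrete, here is a toy model of the two operand conventions. It is only a sketch: registers are modelled as Python lists of four floats, the `regs` dictionary and helper functions are made up for the illustration, and nothing about wider registers or encoding is modelled.

```python
regs = {"xmm0": [1.0, 2.0, 3.0, 4.0], "xmm1": [10.0, 20.0, 30.0, 40.0], "xmm2": None}


def addps(dst: str, src: str) -> None:
    """SSE style 'addps dst, src': dst is both an input and the output."""
    regs[dst] = [a + b for a, b in zip(regs[dst], regs[src])]


def vaddps(dst: str, src1: str, src2: str) -> None:
    """AVX style 'vaddps dst, src1, src2': both sources stay intact."""
    regs[dst] = [a + b for a, b in zip(regs[src1], regs[src2])]


vaddps("xmm2", "xmm0", "xmm1")   # xmm0 and xmm1 are preserved
print(regs["xmm2"])              # [11.0, 22.0, 33.0, 44.0]
addps("xmm0", "xmm1")            # xmm0 is overwritten with the sum
print(regs["xmm0"])              # [11.0, 22.0, 33.0, 44.0]
```

With the two-operand form, keeping the old value of the destination costs an extra register copy first; the three-operand form avoids that.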

With SSE4.2 and AVX, Intel is rolling out a lot of new extensions pretty much at once. This seems like it will cause problems for AMD keeping up with Intel's new tech, and problems for developers wanting to use the latest extensions but having different versions between AMD and Intel.

After reading the manual I'm disappointed by ONLY 128bit AES support. Also, the irony of a 256bit extension only supporting 128bit didn't escape me.

Between CRC32, AES and the Carry-less Multiplication it looks like Intel wants a stronger position in embedded devices (Atom 2.0 I guess).

The SSE4.2 and AVX extensions will ensure optimizing compilers will be stumped for at least a few more years.

All Hail Intel's New Vapor-ware!
Post 07 Apr 2008, 17:36

revolution
I've only briefly looked at it but it looks like you define your own key schedule (or use the helper instruction if you want) and then do a 128bit encryption. That allows you to use any key size as defined in the Rijndael spec, 256, 224, 192, 160 or 128. Seems quite flexible to me.

Of course if you want the 256bit block size then you have to forgo the hardware speed-up.

Personally, I think the 128bit block size is perfectly adequate. You still get the 256bit key size which a lot of currently available software supports.
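
The reason a 128-bit round primitive can serve larger keys is that the key size only changes the key schedule and the number of rounds. A small sketch of that arithmetic, using the Rijndael rule Nr = max(Nk, Nb) + 6 (Nk and Nb counted in 32-bit words); the function name is made up, and whether the new instructions are used exactly this way is an assumption based on the description above:

```python
def rijndael_rounds(key_bits: int, block_bits: int = 128) -> int:
    """Number of rounds Nr = max(Nk, Nb) + 6, with Nk/Nb counted in 32-bit words."""
    allowed = (128, 160, 192, 224, 256)
    if key_bits not in allowed or block_bits not in allowed:
        raise ValueError("Rijndael sizes are 128..256 bits in 32-bit steps")
    return max(key_bits // 32, block_bits // 32) + 6


for key_bits in (128, 192, 256):
    nr = rijndael_rounds(key_bits)
    # One 128-bit round key per round, plus one for the initial AddRoundKey.
    print(key_bits, nr, nr + 1)
# 128 10 11
# 192 12 13
# 256 14 15
```

So a 256-bit key means 14 rounds and 15 round keys, each still 128 bits wide, which matches a per-round instruction that takes one 128-bit round key.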

The other thing to watch out for is whether it is really very much faster than an optimised software solution. I still remember the 80386 days when you could write faster single-precision floating point code with the CPU than could be done on the 80387.
Post 07 Apr 2008, 17:53

AlexP
I just don't like that it USES A 128-bit KEY for a round... I guess that you could do that, but then xor it with another 128-bits of key to achieve a very odd-looking AES-256 routine... Still, I doubt they will have more than the substitution boxes, which means all the matrix multiplications have to be done (nevermind dozen+ xoring). Good luck Intel, you should have put me on your team!
Post 07 Apr 2008, 20:45

Remy Vincent
Joined: 16 Sep 2005
Posts: 155
Location: France
revolution wrote:
I refuse to use a processor without a protection mechanism. I want stability, not chaos. RM, with segments, and no protection is just not going to be reliable for people. How to secure your kernel from errant code?


HELLO revolution, what do you think of this kind of solution?
- You buy a processor, then you set your own OPCODES.
- For example, """JE rel8""" has an OPCODE of 0x74 right ?
- What if you buy your own processor, setting the OPCODE for """JE rel8""" to 0x01,...
- And so on for each OPCODE ?
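
A toy illustration of the remapping described in the list above. Only the `JE rel8` opcode (0x74) is taken from the post; the `REMAP` table and `remap_opcode` helper are hypothetical, and the sketch only touches the first byte of a single instruction. A real scheme would need a full one-to-one table, since 0x01 is already a standard x86 opcode (a form of ADD).

```python
JE_REL8 = 0x74                      # standard x86 opcode for JE rel8
REMAP = {JE_REL8: 0x01}             # hypothetical private encoding from the post
UNMAP = {v: k for k, v in REMAP.items()}


def remap_opcode(instruction: bytes, table: dict) -> bytes:
    """Swap only the leading opcode byte; operand bytes are left untouched."""
    first = table.get(instruction[0], instruction[0])
    return bytes([first]) + instruction[1:]


je_plus_5 = bytes([JE_REL8, 0x05])            # je +5
private = remap_opcode(je_plus_5, REMAP)
print(private.hex())                          # 0105
print(remap_opcode(private, UNMAP).hex())     # 7405
```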

After having changed the OPCODES of each instruction, WHAT ABOUT running your programs inside a VMWARE window, that is supposed to be SAFE!!

Would you say it's stability, or would you say it's just chaos ???

I'm sorry, but I feel like I'm not concerned with things like """secure your kernel from errant code""". Are you, really ?
Post 07 Apr 2008, 21:11
Copyright © 1999-2020, Tomasz Grysztar.
