flat assembler
Message board for the users of flat assembler.

256 bit SSE aka AVX

rugxulo

Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
revolution wrote:
MazeGen wrote:
I always wondered why Intel keeps supporting the never-documented SETALC instruction. There must be some good reason, but what is it?
This may be something we will never find out. :(


Well, the obvious guess is that either Windows (which 95% of PCs have run at one point) or something inside Intel (their famous compiler?) uses it. Besides, I don't think that one measly instruction takes up much silicon, so why not keep it?
Post 14 Apr 2008, 20:59
bitRAKE

Joined: 21 Jul 2003
Posts: 3055
Location: vpcmipstrm
Am I missing something? Intel hasn't dropped any instructions that I know of. Maybe someone is talking about long mode? I don't recall AMD dropping SALC in long mode. It would have been a good idea to completely clean the ABI for long mode; I wonder why AMD opted for such minimal changes.

_________________
¯\(°_o)/¯ unlicense.org
Post 14 Apr 2008, 22:04
rugxulo

Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
They dropped LOADALL286 and LOADALL386, as well as some undocumented 386 instructions (IBTS?) whose encodings were reused by later processors. Oh, and don't forget "POP CS", which isn't available either. :P

And no, they didn't drop SALC; someone was just wondering why they hadn't, since it was never officially documented. But obviously, people began using it once they discovered it, so dropping it would break something (at least TetrOS). ;)
Post 14 Apr 2008, 22:07
f0dder

Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
bitRAKE wrote:
It would have been a good idea to completely clean the ABI for long mode; I wonder why AMD opted for such minimal changes.
More extensive cleanups would have required massive changes to the (already pretty complex?) x86 decoder frontend...

I still believe x64 wasn't the best idea in the world, and that we'd have been better off with a somewhat more "sane" architecture (and a more powerful one, e.g. with an explicit destination register, ARM-style), but with powerful virtualization/emulation hooks.
Post 14 Apr 2008, 22:14
chaoscode

Joined: 21 Nov 2006
Posts: 64
I think a new operating mode would be a good idea: alongside 64-bit mode, a 64-bit RISC mode would be nice, switchable by a flag in the paging tables, so that different modes could be used and mixed ^^.
Post 15 Apr 2008, 15:42
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17715
Location: In your JS exploiting you and your system
Be careful what you all ask for, because you just might get it; look at the Itanium mess. That was a RISC, load/op/store, dest/source1/source2 architecture. It failed, miserably. Why? Because it wasn't compatible with existing code. Therein lies the difficulty: all that legacy code, and programmers' desire not to throw it away for something new.
Post 15 Apr 2008, 15:48
f0dder

Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
revolution: the Itanium didn't exactly "fail miserably"; it's used for high-end servers and the like... It was never meant as a desktop replacement (afaik, anyway), and the price tag shows it. The original Itanium had x86 hardware emulation, but it was horribly slow; software emulation turned out faster.

Also, Itanium is pretty funky: as I understand it, each opcode is more like an "instruction bundle", which probably makes writing a decent compiler for it somewhat complicated?

I don't think going entirely RISC (Itanium isn't RISC, btw, it's EPIC :)) is a good idea either. For the kind of work x86 is doing today, we need the complex SSE instructions for optimal speed. And obviously, no instruction built into the CPU should be beatable by a sequence of other instructions doing the same job (e.g., LOOP vs. DEC+JNZ).

I'm currently mulling over out-of-order vs. in-order execution, paging vs. MTRRs (but lots of them!), whether user/supervisor is enough or if there should be multiple rings, how virtualization should be implemented (Vanderpool and Pacifica aren't perfect, but x86 isn't the easiest platform to virtualize), what facilities one could provide to make emulating other CPUs easier, etc. etc. There are pro and con arguments for most of these, and it certainly wouldn't be easy to come up with a "perfect" CPU design. But x86 is a goddamn patchwork, with so much legacy and so many hacks.
Post 15 Apr 2008, 16:05
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17715
Location: In your JS exploiting you and your system
I predict Itanium will die. It is only still going because Intel don't want to throw away the billions they invested in it. I say it failed because it cannot compete with current x86 CPUs: it costs more and gives almost nothing extra.

I still see the EPIC thing as RISC in disguise. Just because they stick 3 instructions together and call it a bundle doesn't magically make it something other than RISC. Out-of-order x86 cores can perform just as well as a bundling compiler, and at less expense to the user.
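For readers who haven't looked at IA-64: as I understand the published encoding, a bundle is a 128-bit unit made of a 5-bit template field (telling the hardware which kind of execution unit each slot targets and where the stops fall) followed by three 41-bit instruction slots, since 5 + 3 × 41 = 128. The Python sketch below only splits and packs those fields under that assumed bit layout; it does not decode the instructions or interpret the template, and the function names are made up for illustration.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
BUNDLE_BITS = TEMPLATE_BITS + 3 * SLOT_BITS   # 5 + 123 = 128


def split_bundle(bundle):
    """Split a 128-bit bundle into (template, [slot0, slot1, slot2]).

    Assumed layout: template in bits 0-4, slot 0 in bits 5-45,
    slot 1 in bits 46-86, slot 2 in bits 87-127.
    """
    if not 0 <= bundle < (1 << BUNDLE_BITS):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
             for i in range(3)]
    return template, slots


def pack_bundle(template, slots):
    """Inverse of split_bundle: build a 128-bit bundle from its fields."""
    if not 0 <= template < (1 << TEMPLATE_BITS) or len(slots) != 3:
        raise ValueError("need a 5-bit template and exactly three slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


b = pack_bundle(0x10, [1, 2, 3])
print(split_bundle(b))   # (16, [1, 2, 3])
```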

There is no perfect CPU design, different goals and needs mean we need different CPUs. My TV remote doesn't need an Itanium in there!
Post 15 Apr 2008, 16:15
MazeGen

Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
rugxulo wrote:
They dropped LOADALL286 and LOADALL386, as well as some undocumented 386 instructions (IBTS?) whose encodings were reused by later processors. Oh, and don't forget "POP CS", which isn't available either. :P


LOADALLs were never documented.

rugxulo wrote:

And no, they didn't drop SALC; someone was just wondering why they hadn't, since it was never officially documented.


AMD has documented SALC for a long time. Only Intel pretends it doesn't exist (0xD6 is documented as reserved and undefined, but it never causes #UD).

rugxulo wrote:

But obviously, people began using it once they discovered it, so dropping it would break something (at least TetrOS). ;)

Both Intel and AMD dropped SALC in 64-bit mode. And a few more documented instructions are dropped in 64-bit mode.
Post 15 Apr 2008, 16:51
rugxulo

Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
MazeGen wrote:

Both Intel and AMD dropped SALC in 64-bit mode. And a few more documented instructions are dropped in 64-bit mode.


Well, there is no V86 mode under 64-bit mode either, so 16-bit stuff won't work at all anyway. And that's a bigger "drop" than little ol' SALC. :P
Post 16 Apr 2008, 01:24
f0dder

Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
rugxulo wrote:
MazeGen wrote:

Both Intel and AMD dropped SALC in 64-bit mode. And a few more documented instructions are dropped in 64-bit mode.

Well, there is no V86 mode under 64-bit mode either, so 16-bit stuff won't work at all anyway. And that's a bigger "drop" than little ol' SALC. :P

...and if you're masochistic enough to want to run 16-bit apps, any CPU that has 64-bit long mode support should be quite fast enough to emulate 16-bit apps :)

_________________
carpe noctem
Post 16 Apr 2008, 11:20
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17715
Location: In your JS exploiting you and your system
salc ~= sbb al,al
The AL result is the same; only a few flags might end up different (SALC leaves the flags alone, while SBB updates them). So I don't really care for salc, especially since it is not officially documented and might just catch me out one day by crashing my program on some unknown CPU variant.
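To spell out which flags differ, below is a small Python model (not assembly) of the two instructions as I understand them. SALC (opcode 0xD6) sets AL to 0xFF when CF is set and to 0x00 otherwise, leaving every flag alone; SBB AL,AL computes AL - (AL + CF), which yields the same AL but rewrites ZF, SF, OF, AF and PF, while CF keeps its incoming value. The flag rules are my own derivation for this AL,AL case only, the `long_mode` switch just mirrors the earlier remark that SALC is gone in 64-bit mode, and the byte strings show one valid encoding of each (one byte for SALC, two for SBB AL,AL), which is where the size advantage comes from.

```python
# One valid encoding of each instruction (SBB r/m8, r8 = 18 /r; ModRM C0 = AL, AL).
ENCODINGS = {"salc": bytes([0xD6]), "sbb al,al": bytes([0x18, 0xC0])}


def salc(al, flags, long_mode=False):
    """SALC: AL = 0xFF if CF else 0x00. No flags are modified.

    The incoming AL is ignored; it is a parameter only so that both
    functions share one signature.
    """
    if long_mode:
        raise ValueError("#UD: SALC is invalid in 64-bit mode")
    return (0xFF if flags["CF"] else 0x00), dict(flags)


def sbb_al_al(al, flags):
    """SBB AL,AL: AL = (AL - AL - CF) mod 256, with arithmetic flags updated."""
    cf = flags["CF"]
    result = (al - al - cf) & 0xFF          # 0x00 or 0xFF, independent of AL
    new = dict(flags)
    new["CF"] = cf                          # a borrow occurs exactly when CF was 1
    new["ZF"] = int(result == 0)
    new["SF"] = result >> 7
    new["OF"] = 0                           # x - x - CF never overflows (signed)
    new["AF"] = cf                          # low nibble borrows exactly when CF was 1
    new["PF"] = int(bin(result).count("1") % 2 == 0)   # 0x00 and 0xFF: even parity
    return result, new


flags = {"CF": 1, "ZF": 1, "SF": 0, "OF": 1, "AF": 0, "PF": 0}
print(salc(0x42, flags)[1] == flags)   # True
print(sbb_al_al(0x42, flags)[1])       # {'CF': 1, 'ZF': 0, 'SF': 1, 'OF': 0, 'AF': 1, 'PF': 1}
```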
Post 16 Apr 2008, 13:40
bitRAKE

Joined: 21 Jul 2003
Posts: 3055
Location: vpcmipstrm
It's only really handy for size optimization.

_________________
¯\(°_o)/¯ unlicense.org
Post 16 Apr 2008, 14:21
rugxulo

Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
f0dder wrote:

...and if you're masochistic enough to want to run 16-bit apps, any CPU that has 64-bit long mode support should be quite fast enough to emulate 16-bit apps :)


I wouldn't call it masochistic to run pre-existing 16-bit apps. Besides, I definitely do NOT think DOSBox is "fast enough." It's good, but it could (in theory) be much, much better.
Post 21 Apr 2008, 18:04
mattst88

Joined: 12 May 2006
Posts: 260
Location: South Carolina
To reply to those who think processors shouldn't have AES and other complicated instructions:

I guess you must hate VAX, then, as it had instructions to evaluate polynomials and to insert into and delete from queues. :)

I read somewhere that instruction encodings on VAX ranged from 1 to 50 bytes.
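For the curious, the work a polynomial-evaluation instruction such as the VAX's POLY does is essentially Horner's rule over a table of coefficients: one multiply and one add per coefficient. The Python sketch below assumes the table runs from the highest-degree coefficient down to the constant term (my recollection of the VAX convention, not checked against the manual) and ignores the floating-point rounding and operand details the real instruction specifies.

```python
def poly(x, coefficients):
    """Evaluate a polynomial at x using Horner's rule.

    Assumed ordering: coefficients[0] is the highest-degree term and
    coefficients[-1] is the constant term, so a degree-d polynomial
    needs d + 1 entries.
    """
    if not coefficients:
        raise ValueError("need at least one coefficient")
    result = coefficients[0]
    for c in coefficients[1:]:
        result = result * x + c      # one multiply and one add per term
    return result


print(poly(2, [3, 0, -1, 5]))   # 27, i.e. 3*2**3 - 2 + 5
```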
Post 27 Apr 2008, 01:31
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17715
Location: In your JS exploiting you and your system
mattst88 wrote:
I read somewhere that instruction encodings on VAX ranged from 1 to 50 bytes.
I am curious to see what a 50-byte instruction looks like. Are you sure you are not mixing it up with FORTRAN? I know FORTRAN can do polynomial evaluation.
Post 27 Apr 2008, 05:40
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17715
Location: In your JS exploiting you and your system
I just downloaded the instruction set manual for the VAX-11/780. It seems it does have polynomial evaluation and queue (linked list) insertion and deletion instructions.

The introductory text says that instructions vary from 1 to 30 bytes in length. After a quick scan, the longest would appear to be INDEX, any of the "move translated characters" instructions, or packed arithmetic with six operands following. But the GOTO instruction takes an arbitrary number of destination operands; the manual does not seem to count those as part of the instruction, but if it did, you could have some extremely long instructions (i.e. megabytes).
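The queue instructions operate on doubly linked lists in which each entry begins with a forward link and a backward link. The Python sketch below models only the pointer updates that an insert-after-predecessor (INSQUE-style) and a remove-entry (REMQUE-style) operation perform on a circular list with a header entry; it deliberately ignores the VAX's memory layout, condition codes and interlocked variants, and the class and function names are mine.

```python
class Entry:
    """A queue entry (or the queue header) with forward and backward links."""

    def __init__(self, value=None):
        self.value = value
        self.flink = self   # forward link; an empty header points to itself
        self.blink = self   # backward link


def insque(entry, pred):
    """Insert entry immediately after pred (four pointer writes)."""
    succ = pred.flink
    entry.flink = succ
    entry.blink = pred
    succ.blink = entry
    pred.flink = entry


def remque(entry):
    """Unlink entry from whatever queue it is in and return it."""
    entry.blink.flink = entry.flink
    entry.flink.blink = entry.blink
    entry.flink = entry.blink = entry
    return entry


def values(header):
    """Walk the forward links from the header and collect entry values."""
    out, e = [], header.flink
    while e is not header:
        out.append(e.value)
        e = e.flink
    return out


head = Entry()
a, b, c = Entry("a"), Entry("b"), Entry("c")
insque(a, head)
insque(c, a)
insque(b, a)
print(values(head))   # ['a', 'b', 'c']
remque(b)
print(values(head))   # ['a', 'c']
```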
Post 27 Apr 2008, 08:11
Azu

Joined: 16 Dec 2008
Posts: 1159
tom tobias wrote:
bitRAKE wrote:
...Too bad the stack machine idea didn't stay mainstream. ...
Not as far as I am concerned. "Stack" architecture has been obsolete, in my opinion, for four decades already...I despise trying to debug a program on a "stack" based architecture....I am oriented to the opposite direction, everything spelled out, unequivocal, rather than SP - 221.....
revolution wrote:

Indeed letting everyone choose their own opcodes would be complete chaos.
Well, I agree, and would go a step further. The best design is one which ELIMINATES 90% of all instructions, hence, almost all opcodes would be eliminated as well...
I think it is very important to offer the user the opportunity, NO, the obligation, to define many of the operations now performed by intel's architecture. I believe we need at most a dozen native cpu instructions, from which all of the others can be derived, as needed, in each application. What would be useful is a list of the absolute minimum necessary instructions for the cpu to possess.... any thoughts?
Smile
Only one is necessary for Turing completeness: loop.
Have fun. :)
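A well-known concrete example of a single-instruction machine is SUBLEQ ("subtract and branch if less than or equal to zero"). It is a different instruction from the loop mentioned above, but it shows how little a Turing-complete machine (given unbounded memory) really needs. The Python sketch below is a minimal SUBLEQ interpreter; the halt-on-negative-jump convention, the step limit and the little addition program are my own choices for illustration.

```python
def run_subleq(mem, max_steps=100_000):
    """Run a SUBLEQ program in place and return the final memory.

    Each instruction is three words a, b, c:
        mem[b] -= mem[a]; if mem[b] <= 0 jump to c, else fall through.
    A negative jump target halts the machine (a common convention).
    """
    pc = 0
    for _ in range(max_steps):
        if pc < 0:
            return mem
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    raise RuntimeError("step limit reached")


def add_program(a, b):
    """Hypothetical demo program: leaves a + b in cell 10.

    Cells 9, 10 and 11 hold A, B and a scratch cell Z (initially 0).
    """
    A, B, Z = 9, 10, 11
    return [
        A, Z, 3,     # Z -= A -> Z = -A; continue at 3 either way
        Z, B, 6,     # B -= Z -> B = B + A; continue at 6 either way
        Z, Z, -1,    # Z -= Z -> Z = 0, and 0 <= 0 jumps to -1 (halt)
        a, b, 0,     # data cells A, B, Z
    ]


print(run_subleq(add_program(7, 5))[10])   # 12
```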

revolution wrote:
sakeniwefu wrote:
In the end, all opcodes turn into a handful of uops in current Intels and AMDs, don't they? If we could program in uops directly, performance wouldn't get any worse.
The uop coding would expand the code size a great deal. It is a very expanded form of the instructions. Memory overhead would be large.
If you had to inline all of them, sure, it would take a lot of space. If, on the other hand, you could make opcodes out of them, the same way you make functions out of opcodes, they would only need to be loaded once at the start of the program ^^ (maybe modified at runtime occasionally, but probably not often).
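A back-of-the-envelope way to frame this code-size argument: if every use of an operation is expanded inline into its uops, size grows with uses × uops × uop width, whereas storing the uop sequence once and referring to it by a short user-defined opcode costs one table entry plus one short opcode per use. All the widths below are invented for illustration; real uop formats are not publicly documented.

```python
def inline_bytes(uses, uops_per_op, uop_bytes):
    """Every use is expanded in place into its full uop sequence."""
    return uses * uops_per_op * uop_bytes


def table_bytes(uses, uops_per_op, uop_bytes, opcode_bytes):
    """The uop sequence is stored once; each use is a short custom opcode."""
    return uops_per_op * uop_bytes + uses * opcode_bytes


# Hypothetical numbers: 3 uops per operation, 16-byte uops, 2-byte custom opcodes.
for uses in (1, 10, 1000):
    print(uses, inline_bytes(uses, 3, 16), table_bytes(uses, 3, 16, 2))
# 1 48 50
# 10 480 68
# 1000 48000 2048
```

With these made-up numbers the shared table only loses when an operation is used once, the same trade-off as calling a function versus inlining it; the sketch says nothing about the cost of fetching and decoding through that extra indirection.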
Post 02 Aug 2009, 17:07
mattst88

Joined: 12 May 2006
Posts: 260
Location: South Carolina
You responded to a 15 month old thread for... that?
Post 03 Aug 2009, 04:18
Azu

Joined: 16 Dec 2008
Posts: 1159
Should I have waited until it turned 18? :?
Post 03 Aug 2009, 06:35

Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.