flat assembler
Message board for the users of flat assembler.

Thread: xmm registers

LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 10 Nov 2006, 16:07
LocoDelAssembly wrote:
smiddy, do you mean http://softpixel.com/~cwright/programming/simd/cpuid.php ? I think that site assumes the code runs under an OS that supports SSE, so doing what you suggest in boot code may not work so well.


And the OS might not support SSE, but as a user-mode program you can't modify CR4, so you have to use some tricks to make sure you won't get a #UD (invalid opcode) exception even when the processor supports SSE. Look at smiddy's posts.
smiddy



Joined: 31 Oct 2004
Posts: 557
smiddy 10 Nov 2006, 16:18
Mark Larson wrote:
That's for the OS to support it. The OS does it on boot up. You don't have to do that yourself in your code.


If you're like me and writing your own OS, then you need to do it yourself. Hence my testing of it above.

Just an FYI for folks: Bochs sets CR4 bit 9 (OSFXSR) at power-on (bad Bochs); Virtual PC doesn't, VMware doesn't, and neither do any of my own PCs (both AMD and Intel CPUs), so it has to be set explicitly. If you run the EXE on a Windows box, it will recognize that it is already running in protected mode and not try to toggle CR4 bit 9 (it won't work there, and no results are reported).
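For the own-OS case, the SSE initialization steps in Intel's manuals come down to four control-register bits: clear CR0.EM (bit 2), set CR0.MP (bit 1), set CR4.OSFXSR (bit 9), and set CR4.OSXMMEXCPT (bit 10) if the kernel provides a handler for SIMD floating-point exceptions. The C sketch below only computes the new register values so it can run anywhere; the helper name `sse_enable_values` is hypothetical, and the real writes have to be done in ring 0 with `mov cr0, eax` / `mov cr4, eax` in the boot code.

```c
#include <stdint.h>

/* Control-register bits involved in enabling SSE. */
#define CR0_MP         (UINT32_C(1) << 1)  /* monitor coprocessor: set */
#define CR0_EM         (UINT32_C(1) << 2)  /* x87 emulation: must be clear */
#define CR4_OSFXSR     (UINT32_C(1) << 9)  /* OS supports FXSAVE/FXRSTOR and SSE */
#define CR4_OSXMMEXCPT (UINT32_C(1) << 10) /* OS handles SIMD FP exceptions (#XM) */

typedef struct {
    uint32_t cr0;
    uint32_t cr4;
} ctrl_regs;

/* Hypothetical helper: given the current CR0/CR4 values, return the values a
   ring-0 loader would write back to enable SSE. In fasm the CR4 part would be
   roughly:  mov eax, cr4 / or eax, 0x600 / mov cr4, eax  */
ctrl_regs sse_enable_values(ctrl_regs in)
{
    ctrl_regs out = in;
    out.cr0 &= ~CR0_EM;                     /* no x87 emulation */
    out.cr0 |= CR0_MP;
    out.cr4 |= CR4_OSFXSR | CR4_OSXMMEXCPT; /* bits 9 and 10 = 0x600 */
    return out;
}
```

The function is idempotent, so calling it on registers that are already set up (as under Bochs) changes nothing.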

This brings up another item on my list: what the heck would you use these registers for (rhetorically speaking, of course)? Based on this, I get the impression that these registers allow you to do multiple 32-bit floating-point operations (four per 128-bit register) during one call (probably not the best word). I was hoping for 128-bit arithmetic, but on further investigation they allow parallel arithmetic associated with video and sound, or more to the point, multimedia items/devices. What are others' takes on this? Is what I'm writing wrong? This is what I've gathered so far in the few days and few hours I've spent reviewing XMM registers.
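smiddy's impression matches how SSE works: a 128-bit XMM register holds four 32-bit floats, and a packed instruction such as ADDPS operates on all four lanes at once (SSE2 adds packed 64-bit doubles, two per register, and packed integers). It is not 128-bit arithmetic on a single number. A minimal C sketch using the standard `_mm_add_ps` intrinsic, with a plain loop as a fallback when the compiler does not target SSE; the function name `add4` is made up for illustration.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Adds four pairs of floats; compiles to a single ADDPS when SSE is enabled. */
void add4(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);            /* load 4 floats (unaligned OK) */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); /* 4 additions in one instruction */
#else
    for (int i = 0; i < 4; i++)             /* scalar fallback */
        out[i] = a[i] + b[i];
#endif
}
```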
Mark Larson



Joined: 04 Nov 2006
Posts: 13
Mark Larson 10 Nov 2006, 16:19
smiddy wrote:
I think all you need to do is:

Code:
   mov eax,1
   cpuid

   test edx, 1 shl 25                ; SSE feature flag is bit 25 of EDX
   jnz .UseXMM
    


It looks like they are extensions of the FPU ST registers, though I don't know if you are required to init the FPU, but I would assume so based on the extension. I have never used them myself, but would like to...
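For reference, CPUID leaf 1 reports FXSR in EDX bit 24, SSE in bit 25 and SSE2 in bit 26, so a mask written as a long binary literal is easy to get off by one; spelling out the shift (`1 shl 25` in fasm) avoids that. The C sketch below decodes an EDX value you already obtained, for example from the `cpuid` snippet quoted above; the struct and function names are hypothetical. These bits only say that the processor supports SSE, not that the OS has enabled it.

```c
#include <stdint.h>

/* Feature flags in EDX returned by CPUID with EAX = 1. */
#define CPUID1_EDX_FXSR (UINT32_C(1) << 24) /* FXSAVE/FXRSTOR */
#define CPUID1_EDX_SSE  (UINT32_C(1) << 25)
#define CPUID1_EDX_SSE2 (UINT32_C(1) << 26)

typedef struct {
    int fxsr;
    int sse;
    int sse2;
} sse_caps;

/* Hypothetical helper: decode the SSE-related bits of CPUID.1:EDX. */
sse_caps decode_cpuid1_edx(uint32_t edx)
{
    sse_caps caps;
    caps.fxsr = (edx & CPUID1_EDX_FXSR) != 0;
    caps.sse  = (edx & CPUID1_EDX_SSE) != 0;
    caps.sse2 = (edx & CPUID1_EDX_SSE2) != 0;
    return caps;
}
```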

Here is where I found some information.



I've been doing SSE/SSE2 programming for many years. You do not have to init the FPU. I posted this a while ago on c.l.a.x (comp.lang.asm.x86) when someone was asking for examples of SSE/SSE2 programs.



I've done a lot of SSE and SSE2 programming over the years. I have an optimization website that goes over some basic tricks to speed up code with SSE/SSE2 (along with other tricks).
http://www.mark.masmcode.com/

P4s and up on the Intel side run SSE/SSE2 code very fast, so I've used that advantage a lot to make code run extremely fast.

converting a string to a qword using SSE2
http://www.oldboard.assemblercode.com/index.php?topic=4253.msg28940#m...

SSE2 quaternion multiply
http://www.oldboard.assemblercode.com/index.php?topic=3469.0

Mersenne Twister Random Number Generator in SSE2
http://www.oldboard.assemblercode.com/index.php?topic=3565.0

My account on masmforum got messed up (all these links are for masmforum), so some messages will say they are from hutch-- instead of marklarson. You can tell it's really me because it'll say "guest" under "hutch--".

Counting the number of lines in a file using SSE2
http://www.oldboard.assemblercode.com/index.php?topic=2692.msg18800#m...

string copy using SSE2
http://www.oldboard.assemblercode.com/index.php?topic=2632.msg18047#m...

Computing MD5 using SSE2
http://www.oldboard.assemblercode.com/index.php?topic=2921.0

I am working on a raytracer that I haven't finished yet. You can use scalar SSE code just like x87 FP code (you don't do anything in parallel; it's a single floating-point value you are operating on). Scalar SSE code is faster than x87 code on a P4 (not sure about AMD).

http://www.masm32.com/board/index.php?topic=1140.0
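To make the scalar case concrete: instructions ending in SS (MULSS, ADDSS, SQRTSS) work only on the lowest 32-bit float of an XMM register, so they can replace x87 code one value at a time. The sketch below computes a 3D vector length the way a raytracer might, using the standard SSE intrinsics; the function name `vec3_length` is hypothetical, and this is not code from Mark's raytracer.

```c
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Length of (x, y, z) computed with scalar SSE: only lane 0 is used. */
float vec3_length(float x, float y, float z)
{
#if defined(__SSE__)
    __m128 vx = _mm_set_ss(x);                  /* lane 0 = x, other lanes 0 */
    __m128 vy = _mm_set_ss(y);
    __m128 vz = _mm_set_ss(z);
    __m128 sum = _mm_add_ss(_mm_mul_ss(vx, vx), /* MULSS, ADDSS */
                            _mm_mul_ss(vy, vy));
    sum = _mm_add_ss(sum, _mm_mul_ss(vz, vz));
    return _mm_cvtss_f32(_mm_sqrt_ss(sum));     /* SQRTSS, then read lane 0 */
#else
    return sqrtf(x * x + y * y + z * z);        /* plain C fallback */
#endif
}
```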

Line counting again, but this time I have two different versions using two different algorithms. If you scroll down, the second one posted is done in a non-intuitive manner.

http://www.masm32.com/board/index.php?topic=5434.msg40666#msg40666
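The basic idea behind SSE2 line counting, as a hedged sketch (Mark's actual code is only linked above and is not reproduced here): compare 16 bytes against '\n' at once with PCMPEQB, collapse the result into a 16-bit mask with PMOVMSKB, and count the set bits. `count_newlines` and `popcount16` are hypothetical names; the tail, and builds without SSE2, fall back to a byte loop.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Portable bit count (clears the lowest set bit each iteration). */
static unsigned popcount16(unsigned m)
{
    unsigned n = 0;
    while (m) {
        m &= m - 1;
        n++;
    }
    return n;
}

size_t count_newlines(const char *buf, size_t len)
{
    size_t count = 0, i = 0;
#if defined(__SSE2__)
    const __m128i nl = _mm_set1_epi8('\n');              /* 16 copies of '\n' */
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i eq = _mm_cmpeq_epi8(chunk, nl);          /* PCMPEQB: 0xFF where equal */
        unsigned mask = (unsigned)_mm_movemask_epi8(eq); /* PMOVMSKB: 1 bit per byte */
        count += popcount16(mask);
    }
#endif
    for (; i < len; i++)                                 /* remaining bytes */
        if (buf[i] == '\n')
            count++;
    return count;
}
```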

_________________
BIOS programmers do it fastest! ;)
Mark Larson



Joined: 04 Nov 2006
Posts: 13
Mark Larson 10 Nov 2006, 16:23
Madis731 wrote:
What do you mean??? You don't have to enable the xmm registers. They are there all the time. The only problem with OSes not supporting them is that you lose your context when multiple threads access these registers. You can FXSAVE and FXRSTOR, however...


This isn't entirely correct. Windows XP (and I think 2000) automatically turns on SSE2 support. Linux does the same thing. It has to be enabled or it won't work; it's not the same as the FPU or ALU registers. I wrote some code in the BIOS to do a memory test using SSE2. I had forgotten to turn on SSE2 support, and it didn't work. Once I flipped the magic switch, it worked great. Older Windows OSes won't turn it on, because they don't know about it. Same with DOS. So if you have one of those, you will have to turn it on manually if you want to use it.

I am not sure which Windows version first turned it on, so Win98 might support it.

smiddy



Joined: 31 Oct 2004
Posts: 557
smiddy 10 Nov 2006, 17:13
Mark, awesome post! Thanks for the wealth of information. Once I get the opportunity to look at all these links I will.
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 10 Nov 2006, 18:58
The latest stable version of OpenWatcom is 1.6.

Quote:

List of changes in OpenWatcom C/C++ 1.4

  • Default behaviour of inline assembler has changed. The CPU optimization level (-4, -5, -6) now implies the available instruction set: -5 implies MMX and 3DNow!, -6 also implies SSE/SSE2/SSE3. Also note that any CPU setting override now reverts to default at the end of each inline assembly block.
  • The CauseWay DOS extender now supports SSE instructions on plain DOS.

List of changes in OpenWatcom C/C++ 1.3

  • Support for SSE, SSE2, SSE3 and 3DNow! instruction sets has been added. Affected tools are the assembler (wasm), as well as all x86 compilers, disassembler and debugger. The debugger now also supports MMX registers formatted as floats (for 3DNow!) as well as a new XMM register window for SSE.
  • Inline assembler directives .MMX, .K3D, .XMM, .XMM2 and .XMM3 are now supported in the _asm as well as #pragma aux style inline assembler interface. Note: .MMX directive is now required (in addition to .586) to use MMX instructions.



Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.