Message board for the users of flat assembler.
xmm registers
smiddy, do you mean http://softpixel.com/~cwright/programming/simd/cpuid.php? I think that site assumes the code runs under an OS that already supports SSE, so doing what you suggest in boot code may not work so well.
Also, the OS might not support SSE, and as a user-mode program you can't modify CR4, so you have to use some tricks to make sure you won't get a #UD (invalid opcode) exception even when the processor supports SSE. Look at smiddy's posts.
10 Nov 2006, 16:07
Mark Larson wrote:
That's for the OS to support. The OS does it at boot. You don't have to do it yourself in your code.
If you're writing your own OS, as I am, then you do need to do it yourself. Hence my testing of it above.
Just an FYI for folks: Bochs sets CR4 bit 9 (OSFXSR) at power-on (bad Bochs); Virtual PC doesn't, VMware doesn't, and neither do any of my own PCs, so it has to be set explicitly (on both AMD and Intel CPUs). If you run the EXE on a Windows box, it recognizes that it is already running in protected mode and doesn't try to toggle CR4 bit 9 (that wouldn't work, and no results are shown).
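The write itself is privileged (a `mov cr4, eax` executed in ring 0), so it can't be demonstrated from a normal program. As a sketch of which bits are involved, assuming the procedure Intel documents for enabling SSE (clear CR0.EM, set CR0.MP, set CR4.OSFXSR and CR4.OSXMMEXCPT), the C below only computes the new register values. The helper names are hypothetical; the actual moves to CR0/CR4 belong in the kernel's assembly.

```c
#include <stdint.h>

/* Control-register bits used when enabling SSE (Intel SDM Vol. 3). */
#define CR0_MP         (1u << 1)   /* monitor coprocessor */
#define CR0_EM         (1u << 2)   /* x87 emulation: must be clear for SSE */
#define CR4_OSFXSR     (1u << 9)   /* OS supports FXSAVE/FXRSTOR and SSE */
#define CR4_OSXMMEXCPT (1u << 10)  /* OS handles SIMD FP exceptions (#XM) */

/* Hypothetical helper: CR0 value to write back when enabling SSE. */
static uint32_t sse_cr0(uint32_t cr0)
{
    return (cr0 & ~CR0_EM) | CR0_MP;
}

/* Hypothetical helper: CR4 value to write back when enabling SSE. */
static uint32_t sse_cr4(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```

For example, a CR4 of 0 becomes 0x600 (bits 9 and 10), which is the state Bochs apparently already provides for bit 9.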
This brings up another item on my list: what the heck would you use these registers for (rhetorically speaking, of course)? My impression is that this set of registers lets you do multiple 32-bit floating-point operations with a single instruction. I was hoping for 128-bit arithmetic, but on further investigation it allows parallel arithmetic associated with video and sound, or more to the point, multimedia. What are others' takes on this? Is what I'm writing wrong? This is what I've gathered in the few days and few hours I've spent reviewing the XMM registers.
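To make the "several 32-bit floats at once" point concrete, here is a minimal sketch assuming a compiler that ships the standard `<xmmintrin.h>` SSE intrinsics. One ADDPS (`_mm_add_ps`) adds four independent float lanes. There is no single 128-bit integer add in SSE/SSE2, although SSE2 can also treat the same register as two doubles or as 8/16/32/64-bit integer lanes. The function name is illustrative.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Adds four pairs of floats with one ADDPS: the XMM register holds four
   independent 32-bit floats, not one 128-bit number. */
static void add4_packed(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);  /* unaligned load of a[0..3] */
    __m128 vb = _mm_loadu_ps(b);  /* unaligned load of b[0..3] */
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

With `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, `out` becomes `{11, 22, 33, 44}` from a single arithmetic instruction.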
10 Nov 2006, 16:18
I think all you need to do is:
I've been doing SSE/SSE2 programming for many years. You do not have to initialize the FPU. I posted the following a while ago on c.l.a.x (comp.lang.asm.x86) when someone was asking about SSE/SSE2 programs.
I've done a lot of SSE and SSE2 programming over the years. I have an optimization website that goes over some basic tricks to speed up code with SSE/SSE2 (along with other tricks).
P4s and up on the Intel side run SSE/SSE2 code very fast, so I've used that advantage a lot to make code run extremely fast.
converting a string to a qword using SSE2
SSE2 quaternion multiply
Mersenne Twister Random Number Generator in SSE2
My account on masmforum got messed up (all these links are for masmforum), so some messages will say they are from hutch- instead of marklarson. The way to tell it's the real me is that it'll say "guest".
Counting the number of lines in a file using SSE2
string copy using SSE2
Computing MD5 using SSE2
I am working on a raytracer that I haven't finished yet. You can use scalar SSE code just like FP code (you don't do things in parallel; it's a single floating-point value you operate on). Scalar code is faster on a P4 (not sure about AMD).
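A rough illustration of scalar SSE used like x87 code, assuming `<xmmintrin.h>`: each `_ss` intrinsic computes only the lowest 32-bit lane and copies the upper three lanes from its first operand. The 3-component dot product is a hypothetical raytracer-style helper, not code from the post.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Dot product of two 3-component vectors using only scalar SSE
   (MULSS/ADDSS): lane 0 carries the value, much like an x87 FMUL/FADD chain. */
static float dot3_scalar(const float *a, const float *b)
{
    __m128 acc = _mm_mul_ss(_mm_load_ss(&a[0]), _mm_load_ss(&b[0]));
    acc = _mm_add_ss(acc, _mm_mul_ss(_mm_load_ss(&a[1]), _mm_load_ss(&b[1])));
    acc = _mm_add_ss(acc, _mm_mul_ss(_mm_load_ss(&a[2]), _mm_load_ss(&b[2])));
    return _mm_cvtss_f32(acc);  /* extract lane 0 as a plain float */
}
```

For `(1, 2, 3) · (4, 5, 6)` this returns 32.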
Line counting again, but I actually have 2 different versions using 2 different algorithms. If you scroll down, the second one posted is done in a non-intuitive manner.
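The posted line-counting code isn't reproduced here, so this is a generic sketch of one common SSE2 approach, not either of the algorithms mentioned: compare 16 bytes at a time against '\n' with PCMPEQB, pack the result into a bit mask with PMOVMSKB, and count the set bits. It assumes `<emmintrin.h>`; the function name is illustrative.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Counts '\n' bytes 16 at a time. PCMPEQB turns matching bytes into 0xFF,
   PMOVMSKB packs the top bit of each byte into a 16-bit mask. */
static size_t count_newlines_sse2(const char *buf, size_t len)
{
    const __m128i nl = _mm_set1_epi8('\n');
    size_t count = 0, i = 0;

    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, nl));
        while (mask) {       /* portable popcount: clear the lowest set bit */
            mask &= mask - 1;
            count++;
        }
    }
    for (; i < len; i++)     /* scalar tail for the last < 16 bytes */
        count += (buf[i] == '\n');
    return count;
}
```

A hardware POPCNT would replace the inner loop on newer CPUs, but it didn't exist on the P4, so the bit-clearing loop keeps the sketch SSE2-only.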
BIOS programmers do it fastest!
10 Nov 2006, 16:19
What do you mean??? You don't have to enable the xmm registers. They are there all the time. The only problem with OSes not supporting them is that you lose your context when multiple threads access these registers. You can use FXSAVE and FXRSTOR, however...
This isn't entirely correct. Windows XP (and I think 2000) automatically turns on SSE2 support. Linux does the same thing. It has to be enabled or it won't work; it's not the same as the FPU or ALU registers. I wrote some code in the BIOS to do a memory test using SSE2. I had forgotten to turn on SSE2 support, and it didn't work. Once I flipped the magic switch, it worked great. Older Windows OSes won't turn it on, because they don't know about it. Same with DOS. So if you're running one of those, you will have to turn it on manually if you want to use it.
I'm not sure in which Windows version it started being turned on, so Win98 might support it.
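On the FXSAVE/FXRSTOR point: a sketch of the save half, assuming GCC or Clang inline assembly on x86. FXSAVE writes the x87, MXCSR and XMM state into a 512-byte, 16-byte-aligned area, which is what an SSE-aware OS keeps per thread; FXRSTOR loads it back. The names are hypothetical, and FXRSTOR is left out because restoring register state behind the compiler's back from C can clobber registers it is using.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>  /* _mm_getcsr reads MXCSR directly, handy for checking */

/* FXSAVE/FXRSTOR operate on a 512-byte image that must be 16-byte aligned. */
typedef struct {
    _Alignas(16) uint8_t bytes[512];
} fxsave_area;

/* Store the x87, MXCSR and XMM register state into *area. */
static void save_fpu_sse_state(fxsave_area *area)
{
    __asm__ volatile("fxsave %0" : "=m"(*area));
}

/* MXCSR lives at byte offset 24 of the FXSAVE image. */
static uint32_t saved_mxcsr(const fxsave_area *area)
{
    uint32_t value;
    memcpy(&value, area->bytes + 24, sizeof value);
    return value;
}
```

After `save_fpu_sse_state(&area)`, `saved_mxcsr(&area)` should equal `_mm_getcsr()`, which is an easy way to confirm the layout.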
10 Nov 2006, 16:23
Mark, awesome post! Thanks for the wealth of information. I'll look at all these links once I get the opportunity.
10 Nov 2006, 17:13
The latest stable version of OpenWatcom is 1.6.
10 Nov 2006, 18:58
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.