flat assembler
Message board for the users of flat assembler.

xmm registers

rhyno_dagreat



Joined: 31 Jul 2006
Posts: 487
Location: Maryland, Unol Daleithiau
rhyno_dagreat 03 Nov 2006, 19:59
Is there something special I need to do in order to enable XMM registers or are they already usable from the start? Thanks!

-Rhyno

LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 03 Nov 2006, 20:21
You have to enable it

AMD64 Architecture Programmer’s Manual Volume 2: System programming wrote:

11.3 Enabling 128-Bit Media Instructions
Use of the 128-bit media instructions requires system software
to support SSE, SSE2, and/or SSE3 features, but also the FXSAVE
and FXRSTOR instructions, which are used to save
and restore the 128-bit media state (see “FXSAVE and
FXRSTOR Instructions” on page 354). When these instructions
are supported, system software must set CR4.OSFXSR=1 to let
the processor know that the software uses these instructions.
When the processor detects CR4.OSFXSR=1, it allows
execution of the 128-bit media instructions. If system software
does not set CR4.OSFXSR to 1, attempts to execute 128-bit
media instructions cause an invalid-opcode exception (#UD).
System software must also clear the CR0.EM (emulate
coprocessor) bit to 0, otherwise an attempt to execute a 128-bit
media instruction causes a #UD exception.
System software should also set the CR0.MP (monitor
coprocessor) bit to 1. When CR0.EM=0 and CR0.MP=1, all
media instructions, x87 instructions, and the FWAIT/WAIT
instructions cause a device-not-available exception (#NM) when
the CR0.TS bit is set. System software can use the #NM
exception to perform lazy context switching, saving and
restoring media and x87 state only when necessary after a task
switch. See “CR0 Register” on page 53 for more information.
System software must supply an exception handler if unmasked
128-bit media floating-point exceptions are allowed to occur.
When an unmasked exception is detected, the processor
transfers control to the SIMD floating-point exception (#XF)
handler provided by the operating system. System software
must let the processor know that the #XF handler is available
by setting CR4.OSXMMEXCPT to 1. If this bit is set to 1, the
processor transfers control to the #XF handler when it detects
an unmasked exception, otherwise a #UD exception occurs.
When the processor detects a masked exception, it handles it in
a default manner regardless of the CR4.OSXMMEXCPT value.
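
For boot code or a hobby kernel, the steps in the quoted section might look like the following fasm sketch. It assumes the code runs at privilege level 0 and that CPUID has already reported FXSR and SSE support. The label enable_sse is made up for illustration, and the OSXMMEXCPT line should only be kept if a #XF handler is actually installed.

```
; Sketch only: must run at CPL 0 (own kernel or boot code).
enable_sse:
        mov     eax, cr0
        and     eax, not (1 shl 2)      ; CR0.EM = 0 (no x87 emulation)
        or      eax, 1 shl 1            ; CR0.MP = 1 (monitor coprocessor)
        mov     cr0, eax

        mov     eax, cr4
        or      eax, 1 shl 9            ; CR4.OSFXSR = 1: allow SSE and FXSAVE/FXRSTOR
        or      eax, 1 shl 10           ; CR4.OSXMMEXCPT = 1: assumes a #XF handler exists
        mov     cr4, eax
        ret
```

Without the OSFXSR bit set, the first SSE instruction raises #UD, which is the behaviour the manual text describes.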

smiddy



Joined: 31 Oct 2004
Posts: 557
smiddy 03 Nov 2006, 20:28
I think all you need to do is:

Code:
   mov eax,1                ; CPUID function 1: feature flags
   cpuid

   test edx,1 shl 25        ; SSE is bit 25 of EDX (bit 24 is FXSR)
   jnz .UseXMM


It looks like they are extensions of the FPU ST registers, though I don't know if you are required to init the FPU, but I would assume so based on the extension. I have never used them myself, but would like to...

Here is where I found some information.
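
A slightly fuller check, as a sketch: CPUID function 1 reports FXSR in EDX bit 24 and SSE in EDX bit 25, and both matter because enabling SSE goes through CR4.OSFXSR. The label check_sse and the constant SSE_BITS are hypothetical, and the code assumes the CPUID instruction itself is available. Note that the XMM registers are separate from the x87 ST registers (it is the MMX registers that alias them), and that CPUID only says what the processor supports, not whether the OS has enabled it.

```
SSE_BITS = (1 shl 24) or (1 shl 25)     ; FXSR and SSE feature flags

check_sse:
        mov     eax, 1                  ; CPUID function 1: feature information
        cpuid                           ; clobbers eax, ebx, ecx, edx
        and     edx, SSE_BITS           ; keep only the two bits of interest
        cmp     edx, SSE_BITS           ; ZF = 1 if both are set
        ret
```

A caller would do "call check_sse" followed by "jne" to a fallback path that avoids SSE.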

rhyno_dagreat 03 Nov 2006, 20:32
Thanks y'all.

LocoDelAssembly 03 Nov 2006, 23:40
smiddy, do you mean http://softpixel.com/~cwright/programming/simd/cpuid.php ? I think that site assumes the code is executed under an OS that supports SSE, so doing what you suggest in boot code may not work so well.

smiddy 04 Nov 2006, 02:35
LocoDelAssembly wrote:
smiddy, do you mean http://softpixel.com/~cwright/programming/simd/cpuid.php ? I think that site assumes the code is executed under an OS that supports SSE, so doing what you suggest in boot code may not work so well.


I think the combination of what you presented and what I presented would work, but I don't know; I need to try it. I was unaware of having to turn it on in the control register. That is where the OS would do it, and since I write my own OS, that is how I'd do it. Under Windows or Linux I suspect it is already turned on; you just need to check to make certain, otherwise you'll get an invalid opcode.

Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 08 Nov 2006, 13:16
:| What do you mean??? You don't have to enable xmm registers. They are there all the time. The only problem with OSes not supporting them is that you lose your context when multiple threads access these registers. You can FXSAVE and FXRSTOR, however...

smiddy 08 Nov 2006, 13:36
Madis731 wrote:
:| What do you mean??? You don't have to enable xmm registers. They are there all the time. The only problem with OSes not supporting them is that you lose your context when multiple threads access these registers. You can FXSAVE and FXRSTOR, however...


I assume this means that if you were to multiprocess, you'd have to concern yourself with saving and restoring those registers per process, is that right (at least in my own case)? And I assume FXSAVE and FXRSTOR require a memory location to store and retrieve the information for those registers, is that right?

Do you have any examples of using these registers? The references I find online are not as comprehensive as I'd prefer, and I haven't perused any books on the subject, if there are any.

Goplat



Joined: 15 Sep 2006
Posts: 181
Goplat 08 Nov 2006, 15:57
Madis731 wrote:
:| What do you mean??? You don't have to enable xmm registers. They are there all the time. The only problem with OSes not supporting them is that you lose your context when multiple threads access these registers. You can FXSAVE and FXRSTOR, however...


According to Intel manuals, SSE instructions raise an exception if the OS doesn't set the CR4.OSFXSR bit.

smiddy 08 Nov 2006, 18:13
I'll try to put an example together tonight, based on my limited knowledge of the registers and their uses, to see if I can use them. I will have to test the CR4 bit first: if it isn't off, I'll turn it off and then run the opcodes for the xmm registers. Then, with it off, I will turn it on, run them again, and print my results here.

Hey Madis731, is there some xmm code in MenuetOS? (I know, why don't I just look, right?)
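
For what it's worth, the CR4 check described at the start of this post could be sketched like this. It only works at privilege level 0 (your own OS or boot code); under Windows or Linux a user-mode program cannot read CR4, so there you have to rely on the OS. The label names are illustrative, and the sketch assumes CPUID has already reported SSE and that CR0.EM is clear.

```
test_osfxsr:
        mov     eax, cr4
        test    eax, 1 shl 9            ; CR4.OSFXSR
        jnz     .sse_on
        or      eax, 1 shl 9            ; not set yet: enable SSE/FXSAVE support
        mov     cr4, eax
.sse_on:
        xorps   xmm0, xmm0              ; raises #UD if OSFXSR were still clear
        ret
```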

smiddy 09 Nov 2006, 02:23
I found this. More to follow.

LocoDelAssembly 09 Nov 2006, 02:56
I've just finished my own test. http://ellocodelassembler.googlepages.com/SSETest.zip

This was the result: [image]

And the complete video, just for fun :P http://ellocodelassembler.googlepages.com/IMG_4948.AVI

Madis731 09 Nov 2006, 13:30
@Everybody: Sorry, I didn't know setting the bit was so important. I thought they could be used just like any other registers :(

@smiddy: There is full FPU/MMX/SSE (all) support in the 64-bit version of MenuetOS, but I can't give you the code unless Ville allows it, sorry!

The general set-up is so easy, though, that I'll put some pseudocode here:
Code:
fxsave [some_mem_512_this_context] ; The actual fill depends on the CPU
fxrstor [some_mem_512_other_context] ; The other thread's context is restored
    


Btw, the memory locations mustn't have any size type, because FASM doesn't support "the qqqqword" size :P
Example:
some_mem_512_this_context: rb 512
some_mem_512_other_context: rb 512
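
One detail worth adding: FXSAVE and FXRSTOR require the 512-byte area to be aligned on a 16-byte boundary, otherwise the instruction faults (normally with #GP). A minimal sketch, assuming the image is loaded at an address that is itself a multiple of 16; the names ctx_current and ctx_next are just placeholders:

```
        fxsave  [ctx_current]           ; save x87/MMX/SSE state of the outgoing context
        fxrstor [ctx_next]              ; load the incoming context
                                        ; (ctx_next must hold an earlier fxsave image)

        ; ... rest of the code ...

align 16                                ; FXSAVE area must be 16-byte aligned
ctx_current:    rb 512
ctx_next:       rb 512                  ; 512 is a multiple of 16, so still aligned
```

The labels use the colon form so that no operand size is attached to them.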

LocoDelAssembly 09 Nov 2006, 16:07
Tip: IDA Pro gave me a good listing from MenuetOS of how to detect Long Mode capable processors ;)

BTW, thanks Madis for saying that the source is not publicly available. I felt very stupid for not finding it in the floppy image :P

smiddy 09 Nov 2006, 17:47
Bummer, I didn't know that MenuetOS went closed source. I was actually referring to the 32-bit source, which I have a copy of, but didn't find any xmm registers used within it.

@LocoDelAssembly, I liked your test. If you don't mind, when I get the chance I am going to combine it with the Agner Fog code which tests the CPUID bits etcetera. I will also force a CR4 bit change if it isn't done and run a test.

@Madis731, can't that just be:
Code:
...

fxsave [some_mem_512_this_context]        ; The actual fill depends on the CPU 
fxrstor [some_mem_512_other_context]      ; The other thread's context is restored

...

some_mem_512_this_context:   rb (16 * 8)  ; 8 for 32-bit machines 16^2 for 64-bit?
some_mem_512_other_context:  rb (16 * 8)  ; same as above
    


I am of course assuming only 8 registers for xmm only...

AH [EXPLETIVE]POO[/EXPLETIVE], FXSAVE and FXRSTOR need 16 bytes by 32, ICK! And a ton of reserved stuff is within that. It saves all the ST or MM and XMM registers, along with other stuff. DAH, 512 bytes, thus your comment on the qqqqword size. Man, am I a dork or what? :D

If you look at the sandpile you get an idea of what has to happen.
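
As a rough map of that 512-byte area, here is a sketch with a few of the offsets documented for FXSAVE in the Intel and AMD manuals. The constant names and the ctx label are made up for illustration, and the XMM slots are only guaranteed to be filled when CR4.OSFXSR is set.

```
FXSAVE_FCW      = 0     ; x87 control word (2 bytes)
FXSAVE_FSW      = 2     ; x87 status word (2 bytes)
FXSAVE_MXCSR    = 24    ; SSE control/status register (4 bytes)
FXSAVE_ST0      = 32    ; ST0-ST7 / MM0-MM7, 16 bytes per slot
FXSAVE_XMM0     = 160   ; XMM0-XMM7, 16 bytes each (XMM8-XMM15 follow in 64-bit mode)
FXSAVE_SIZE     = 512

        fxsave  [ctx]
        mov     eax, dword [ctx + FXSAVE_MXCSR]         ; saved MXCSR
        mov     edx, dword [ctx + FXSAVE_XMM0 + 3*16]   ; low dword of saved XMM3

align 16
ctx:    rb FXSAVE_SIZE
```

So the eight XMM registers alone take 128 bytes, but they sit at offset 160 inside the larger image, which is why the buffer has to be the full 512 bytes.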

Thanks for the information, gents (I'm assuming you're all men, sorry if I'm mistaken, I can't really see you). ;)

LocoDelAssembly 09 Nov 2006, 18:32
Quote:

@LocoDelAssembly, I liked your test. If you don't mind, when I get the chance I am going to combine it with the Agner Fog code which tests the CPUID bits etcetera.

Sure, no problem. Just remember that if there is a floppy disk in the drive when you execute the .com file, the first sector of the floppy disk will be overwritten without warning. You can remove the "makeInstaller" keyword if you just want to produce a 512-byte binary file.
Quote:
Thanks for the information, gents (I'm assuming you're all men, sorry if I'm mistaken, I can't really see you). ;)

Well, at least Madis and I are boys :P

Madis731 10 Nov 2006, 08:04
@smiddy, sure you can see us: http://board.flatassembler.net/topic.php?t=4074&postdays=0&postorder=asc&start=20
and the reserved space is now very little already with 64-bit: http://www.sandpile.org/aa64/fp_new.htm

smiddy 10 Nov 2006, 10:56
Thanks Madis731...I think my photo is around here somewhere, on the same post, man am I losing it or what? http://board.flatassembler.net/topic.php?t=4074&postdays=0&postorder=asc&start=3

Yeah, I think that sandpile site is a great thing...I just never took much time to look at MMX or SSE.

smiddy 10 Nov 2006, 11:44
Hi All,

I hope this is useful to you, enjoy!


Description: Test CPU
--------

This set of files:

BootTest.ASM - Source code of boot sector to boot from the floppy the TESTCPU.EXE program.
BootTest.bin - Compiled BootTest.ASM source, look at source for instructions.
TestCPU.ASM - Source code of T

Download
Filename: TestCPU.zip
Filesize: 8.93 KB
Downloaded: 849 Time(s)


Mark Larson



Joined: 04 Nov 2006
Posts: 13
Mark Larson 10 Nov 2006, 15:54
LocoDelAssembly wrote:
You have to enable it

AMD64 Architecture Programmer’s Manual Volume 2: System programming wrote:

11.3 Enabling 128-Bit Media Instructions [...]


That's for the OS to support it. The OS does it on boot up. You don't have to do that yourself in your code.

_________________
BIOS programmers do it fastest! ;)

Copyright © 1999-2025, Tomasz Grysztar.