flat assembler
Message board for the users of flat assembler.

Index > Main > Saving SSE state

system error



Joined: 01 Sep 2013
Posts: 670
system error 24 Feb 2015, 23:30
Code:
format pe console
include 'win32ax.inc'

call test_dump
call exit

test_dump:
        push    ebp
        mov     ebp,esp
        sub     esp,512
        lea     edi,[ebp-512]
        fxsave  [edi]           ;???
        ;...
        lea     edi,[ebp-512]
        fxrstor [edi]
        mov     esp,ebp
        pop     ebp
        ret    


Why does this one raise an Access Violation?
system error



Joined: 01 Sep 2013
Posts: 670
system error 24 Feb 2015, 23:46
I tested the code on Windows 8 (64-bit).
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1178
Location: Unknown
HaHaAnonymous 24 Feb 2015, 23:54
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 17:51; edited 1 time in total
system error



Joined: 01 Sep 2013
Posts: 670
system error 25 Feb 2015, 00:14
HaHaAnonymous wrote:
Quote:

Why does this one raise an Access Violation?

Intel Manual wrote:

The first byte of the data should be located on a 16-byte boundary.


LOL. I forgot about that. Thanks.
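
For reference, a minimal sketch of how the save area could be aligned, assuming it is fine to round ESP down inside the routine (EBP still holds the original stack pointer, so the epilogue stays the same). The import section and the ExitProcess call are my own additions so that it assembles standalone, and the code between fxsave and fxrstor is assumed to leave ESP where it found it.

```asm
format PE console
entry start
include 'win32a.inc'

section '.text' code readable executable
start:
        call    test_dump
        invoke  ExitProcess,0

test_dump:
        push    ebp
        mov     ebp,esp
        sub     esp,512         ; reserve the 512-byte FXSAVE area
        and     esp,-16         ; round ESP down to a 16-byte boundary
        fxsave  [esp]           ; ESP is now aligned, with at least 512 bytes below EBP
        ;... (must leave ESP unchanged by the time fxrstor runs)
        fxrstor [esp]
        mov     esp,ebp         ; undoes both the sub and the rounding
        pop     ebp
        ret

section '.idata' import data readable writeable
library kernel32,'KERNEL32.DLL'
import kernel32,ExitProcess,'ExitProcess'
```

Using [esp] directly also avoids clobbering EDI, which the 32-bit calling conventions expect a routine to preserve.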

One more question, if you don't mind.

Why would putchar (from msvcrt.dll) mess with the SSE registers? This is insanely annoying because every time I use it, it clears xmm0 to xmm5 for no apparent reason.
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1178
Location: Unknown
HaHaAnonymous 25 Feb 2015, 00:22
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 17:51; edited 1 time in total
system error



Joined: 01 Sep 2013
Posts: 670
system error 25 Feb 2015, 00:36
HaHaAnonymous wrote:
Quote:

Why would putchar (from msvcrt.dll) mess with the SSE registers?

I do not know. But this may explain: https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx


That is annoying. I don't understand why a function as simple as putchar would really need to mess with big fat extended registers. FAIL design! LOL
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Tyler 25 Feb 2015, 01:15
Regardless of whether it does or doesn't, you shouldn't depend on a function leaving those registers alone unless that is guaranteed by the calling convention.
system error



Joined: 01 Sep 2013
Posts: 670
system error 25 Feb 2015, 01:47
Tyler wrote:
Regardless of whether it does or doesn't, you shouldn't depend on a function leaving those registers alone unless that is guaranteed by the calling convention.


That's a problem if, for example, you are creating a general-purpose library where a simple routine like dispChar (which is central to information retrieval in text-based and string-based systems) would have to save and restore the SSE state every time. With 100 routines that depend on one dispChar, you end up saving and restoring the XMMs in 100 places. This bloat is contagious and really is a BAD design. I am glad Linux doesn't share this disease.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20445
Location: In your JS exploiting you and your system
revolution 25 Feb 2015, 02:24
Then don't store your data in the XMM registers. There is no sense in saving and restoring it 100 times when instead you can put it in memory once and read it when required.

Your argument could be extended to RAX, or any other register. There comes a point where a trade-off has to be made. If everything were saved across all system calls then things would get saved far more often. But the saving is hidden behind the inscrutable OS call routine, so people don't realise how much extra work is being done. Whether your code saves it or the OS code saves it makes no difference to the performance. But if the OS saves fewer things and the user code only saves things when needed, then you get a performance boost.
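
As a rough sketch of the "put it in memory" idea, assuming a 32-bit console program that imports putchar from msvcrt.dll: the value lives in a 16-byte-aligned slot, gets written back before the call, and is reloaded only where it is needed again. The names value and saved are made up for the illustration.

```asm
format PE console
entry start
include 'win32a.inc'

section '.data' data readable writeable
align 16
value   dd 1,2,3,4              ; hypothetical 16 bytes of working data
saved   rb 16                   ; spill slot, 16-byte aligned because it follows value

section '.text' code readable executable
start:
        movdqa  xmm0,dqword [value]
        ; ... work with xmm0 ...
        movdqa  dqword [saved],xmm0     ; spill before the call, xmm0 is volatile
        cinvoke putchar,'A'             ; free to trash xmm0..xmm5
        movdqa  xmm0,dqword [saved]     ; reload only when the value is needed again
        invoke  ExitProcess,0

section '.idata' import data readable writeable
library kernel32,'KERNEL32.DLL',\
        msvcrt,'MSVCRT.DLL'
import kernel32,ExitProcess,'ExitProcess'
import msvcrt,putchar,'putchar'
```

If several calls happen in a row, the reload can simply wait until after the last one, so the cost is one store and one load rather than a pair per call.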
system error



Joined: 01 Sep 2013
Posts: 670
system error 25 Feb 2015, 04:23
revolution wrote:
Then don't store your data in the XMM registers. There is no sense in saving and restoring it 100 times when instead you can put it in memory once and read it when required.

Your argument could be extended to RAX, or any other register. There comes a point where a trade-off has to be made. If everything were saved across all system calls then things would get saved far more often. But the saving is hidden behind the inscrutable OS call routine, so people don't realise how much extra work is being done. Whether your code saves it or the OS code saves it makes no difference to the performance. But if the OS saves fewer things and the user code only saves things when needed, then you get a performance boost.


idk revo.

I can live with the OS using up most GP registers. But involving fat registers such as XMM and YMM just to print a character is totally incomprehensible. You lose your XMM content right after the next unrelated line, like this:

Code:
movdqa  xmm0, dqword [byebye]   ; byebye must be 16-byte aligned for movdqa
cinvoke putchar, 'A'            ;----> bye bye XMM0


putchar is not even an FP routine!

That should not happen, because XMMs are specialized registers where user code should be given priority.

IMO, XMM registers are for the users / applications, not for the OS. If the OS takes up most of the GP registers, then the XMMs should be left alone for user code to use.
system error



Joined: 01 Sep 2013
Posts: 670
system error 25 Feb 2015, 04:43
Ok I get it. Calling a putchar from msvcrt is like calling the entire COUNTER-STRIKE game to load! Because they both use SSE registers and they're both MATH-intensive. LOOOLL!
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20445
Location: In your JS exploiting you and your system
revolution 25 Feb 2015, 06:35
XMM registers are used for more than just floating point. They can also be used for integer arithmetic/boolean operations and for general data movement. I assume putchar places a character on the screen for viewing? If so, then naturally that involves copying the character's bitmap data to the display memory, so why not use XMM and do it efficiently?
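
To illustrate the data movement point, here is a small sketch that copies 16 bytes of text through xmm0 with no floating point involved. It says nothing about what putchar really does internally; the buffers src and dst and the use of puts from msvcrt.dll are only assumptions for the demo.

```asm
format PE console
entry start
include 'win32a.inc'

section '.data' data readable writeable
src     db 'sixteen bytes...',0 ; exactly 16 characters plus a terminator
dst     rb 17                   ; zero-filled by the loader, so byte 16 stays 0

section '.text' code readable executable
start:
        movdqu  xmm0,dqword [src]       ; load 16 bytes, no alignment required
        movdqu  dqword [dst],xmm0       ; store them with one instruction
        cinvoke puts,dst                ; show the copied string
        invoke  ExitProcess,0

section '.idata' import data readable writeable
library kernel32,'KERNEL32.DLL',\
        msvcrt,'MSVCRT.DLL'
import kernel32,ExitProcess,'ExitProcess'
import msvcrt,puts,'puts'
```

Library copy and fill routines commonly move blocks this way, which is one reason a plain character routine can end up touching the XMM registers.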
system error



Joined: 01 Sep 2013
Posts: 670
system error 25 Feb 2015, 08:34
revolution wrote:
XMM registers are used for more than just floating point. They can also be used for integer arithmetic/boolean operations and for general data movement. I assume putchar places a character on the screen for viewing? If so, then naturally that involves copying the character's bitmap data to the display memory, so why not use XMM and do it efficiently?


We are talking about CLI-based character rendering here, revo (which I believe has its own font ROM in the video hardware), not a graphical MS Word. The BIOS doesn't need SSE registers to display a 'C', nor did the DOOM engine. They all work efficiently without SSE registers. I am not even asking for blinking text. Just a plain char to the DOS console, and I have to go through the entire SSE documentation for that?? LOL
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20445
Location: In your JS exploiting you and your system
revolution 25 Feb 2015, 08:50
The ROM fonts would only be active in full screen text mode in older versions of Windows (note that newer versions don't support this mode). But Windows can also display the console in graphics mode. Perhaps it also makes a graphical copy in case the user turns off full screen text? Anyhow, I don't know the details of what Windows does exactly, but it does still follow the calling convention as stated above, so there is no error or bug. And we can't go around having some functions follow one set of rules and others follow different sets of rules; it would all get too confusing and disorderly. If you really need to know exactly what Windows uses the lower XMM registers for, then you can use a debugger to discover what it is doing.
Feryno



Joined: 23 Mar 2005
Posts: 514
Location: Czech republic, Slovak republic
Feryno 26 Feb 2015, 13:53
system error wrote:
Why would putchar (from msvcrt.dll) mess with the SSE registers? This is insanely annoying because every time I use it, it clears xmm0 to xmm5 for no apparent reason.
...
I tested the code on Windows 8 (64-bit)

On the x64 MS Windows kernel, ring 3 switches to ring 0 using the syscall instruction, which transfers execution to KiSystemCall64 regardless of whether the app is 64-bit or a 32-bit app running in the compatibility submode of long mode.
On the way back from ring 0 to ring 3, the procedure KiSystemServiceExit ends by executing something like this:
Code:
pxor xmm0,xmm0
pxor xmm1,xmm1
pxor xmm2,xmm2
pxor xmm3,xmm3
pxor xmm4,xmm4
pxor xmm5,xmm5
mov rcx,[rbp+CONTEXT.RIP] ; get RIP pointing after the syscall instruction
mov r11,[rbp+CONTEXT.RFLAGS] ; RFLAGS
mov rbp,r9
mov rsp,r8
swapgs
sysretq
    

Under MS Windows x64, the nonvolatile XMM registers are xmm6...xmm15. To access xmm8...xmm15 your executable cannot be 32-bit; you have to port it to x64.
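
A minimal 64-bit sketch of relying on a nonvolatile register, assuming the 64-bit msvcrt.dll is used for putchar and that the code runs directly at the process entry point (a routine that returns to a caller would have to save and restore xmm6 itself). The label value is hypothetical.

```asm
format PE64 console
entry start
include 'win64a.inc'

section '.data' data readable writeable
align 16
value   dd 1,2,3,4              ; hypothetical 16 bytes of working data

section '.text' code readable executable
start:
        sub     rsp,8                   ; make the stack 16-byte aligned for the calls
        movdqa  xmm6,dqword [value]     ; xmm6 is nonvolatile in the Microsoft x64 convention
        invoke  putchar,'A'             ; callee must hand xmm6 back unchanged
        ; xmm6 is expected to still hold the data here, xmm0..xmm5 are not
        invoke  ExitProcess,0

section '.idata' import data readable writeable
library kernel32,'KERNEL32.DLL',\
        msvcrt,'MSVCRT.DLL'
import kernel32,ExitProcess,'ExitProcess'
import msvcrt,putchar,'putchar'
```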
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20445
Location: In your JS exploiting you and your system
revolution 26 Feb 2015, 15:09
So I guess that the pxor instructions are a security precaution to ensure no out-of-process information leakage in case those registers were used for something sensitive.
system error



Joined: 01 Sep 2013
Posts: 670
system error 26 Feb 2015, 15:56
Feryno wrote:
system error wrote:
Why would putchar (from msvcrt.dll) mess with the SSE registers? This is insanely annoying because every time I use it, it clears xmm0 to xmm5 for no apparent reason.
...
I tested the code on Windows 8 (64-bit)

On the x64 MS Windows kernel, ring 3 switches to ring 0 using the syscall instruction, which transfers execution to KiSystemCall64 regardless of whether the app is 64-bit or a 32-bit app running in the compatibility submode of long mode.
On the way back from ring 0 to ring 3, the procedure KiSystemServiceExit ends by executing something like this:
Code:
pxor xmm0,xmm0
pxor xmm1,xmm1
pxor xmm2,xmm2
pxor xmm3,xmm3
pxor xmm4,xmm4
pxor xmm5,xmm5
mov rcx,[rbp+CONTEXT.RIP] ; get RIP pointing after the syscall instruction
mov r11,[rbp+CONTEXT.RFLAGS] ; RFLAGS
mov rbp,r9
mov rsp,r8
swapgs
sysretq
    

Under MS Windows x64, the nonvolatile XMM registers are xmm6...xmm15. To access xmm8...xmm15 your executable cannot be 32-bit; you have to port it to x64.


Yeah. Actually the code runs perfectly on a 32-bit CPU. The only problem is when I run it on a 64-bit OS. That means I now have to save and restore manually whenever a string routine and a math routine cross paths.
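
Something like the following wrapper is what I have in mind, as a sketch only: a hypothetical dispChar that takes the character on the stack and preserves xmm0..xmm5 around the putchar call, so the save and restore lives in one place. movdqu is used so the stack slots need no special alignment.

```asm
format PE console
entry start
include 'win32a.inc'

section '.text' code readable executable
start:
        push    'A'
        call    dispChar
        invoke  ExitProcess,0

; dispChar (hypothetical): prints the character passed on the stack,
; keeps xmm0..xmm5 intact for the caller, and removes its own argument
dispChar:
        push    ebp
        mov     ebp,esp
        sub     esp,6*16                ; room for six XMM registers
        movdqu  [esp],xmm0
        movdqu  [esp+16],xmm1
        movdqu  [esp+32],xmm2
        movdqu  [esp+48],xmm3
        movdqu  [esp+64],xmm4
        movdqu  [esp+80],xmm5
        mov     eax,[ebp+8]             ; the character argument
        cinvoke putchar,eax             ; cinvoke cleans up its own argument
        movdqu  xmm0,[esp]
        movdqu  xmm1,[esp+16]
        movdqu  xmm2,[esp+32]
        movdqu  xmm3,[esp+48]
        movdqu  xmm4,[esp+64]
        movdqu  xmm5,[esp+80]
        mov     esp,ebp
        pop     ebp
        ret     4

section '.idata' import data readable writeable
library kernel32,'KERNEL32.DLL',\
        msvcrt,'MSVCRT.DLL'
import kernel32,ExitProcess,'ExitProcess'
import msvcrt,putchar,'putchar'
```

EAX, ECX and EDX are still clobbered, as with any ordinary call.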
system error



Joined: 01 Sep 2013
Posts: 670
system error 26 Feb 2015, 16:05
revolution wrote:
The ROM fonts would only be active in full screen text mode in older versions of Windows (note that newer versions don't support this mode). But Windows can also display the console in graphics mode. Perhaps it also makes a graphical copy in case the user turns off full screen text? Anyhow, I don't know the details of what Windows does exactly, but it does still follow the calling convention as stated above, so there is no error or bug. And we can't go around having some functions follow one set of rules and others follow different sets of rules; it would all get too confusing and disorderly. If you really need to know exactly what Windows uses the lower XMM registers for, then you can use a debugger to discover what it is doing.


Regardless of the calling convention, I think SSE registers should be left alone, at least in the string or char routines. They can find excuses in DirectX routines, but a putchar?? LOL. It's overkill.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20445
Location: In your JS exploiting you and your system
revolution 26 Feb 2015, 16:11
You need to tell MS about your suggestion. Perhaps they will like it, make a new calling convention, apply it to all string and char routines, and then tell everyone to change their C code for the new convention.
system error



Joined: 01 Sep 2013
Posts: 670
system error 26 Feb 2015, 16:17
revolution wrote:
So I guess that the pxor instructions are a security precaution to ensure no out-of-process information leakage in case those registers were used for something sensitive.
LOL.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
