flat assembler
Message board for the users of flat assembler.

are the avx ymm registers saved in context structure?

tthsqe 21 May 2011, 23:37
I am trying to update fdbg so that I can debug AVX programs, but I am having trouble reading the upper 128 bits of the ymm registers.
Does anyone know where they are?

Alphonso 22 May 2011, 10:31
Don't know if this helps as you still need to know the memory location of xsave[opt].

Code:
format PE GUI 4.0
include 'win32a.inc'

;-----------------------------------------------
section '.text' code readable executable
;===============================================
                                        ; check for AVX support before running this!
        vlddqu  xmm0,[xmmreg]           ; 128 bit
        vlddqu  ymm1,[ymmreg]           ; 256 bit
        mov     eax,-1                  ; lazy, not the right way: request every
        mov     edx,-1                  ; state component (EDX:EAX is the XSAVE mask)
        xsave   [Reg]                   ; save + extended states

        cinvoke wsprintf,Buff,wsformat,\
                dword[Reg+160],dword[Reg+164],dword[Reg+168],dword[Reg+172],\              ; xmm0
                dword[Reg+576+16],dword[Reg+576+20],dword[Reg+576+24],dword[Reg+576+28],\  ; ymm1 upper
                dword[Reg+176],dword[Reg+180],dword[Reg+184],dword[Reg+188],\              ; ymm1 (lower [xmm1])
                dword[Reg+576+16],dword[Reg+576+20],dword[Reg+576+24],dword[Reg+576+28]    ; ...  (upper)
        invoke  MessageBox,0,Buff,Tit,0

exit:

        invoke  ExitProcess,0



;-----------------------------------------------
section '.data' data readable writeable
;===============================================
align 64
  Reg:
                rd 1000h                ; being lazy again
  Tit           db 'XSAVE',0
  wsformat      db 'xmm0',10,'0x%016I64X%016I64X',10,10
                db 'ymm1 upper[255:128]',10,'0x%016I64X%016I64X',10,10
                db 'ymm1',10,'0x%016I64X%016I64X%016I64X%016I64X',0
  Buff          rb 200h
align 64
  ymmreg:
                dq 1111222233334444h,5555666677778888h,99990000aaaabbbbh,0ccccddddeeeeffffh
  xmmreg:
                dq 1122334455667788h,9900aabbccddeeffh

;----------------------------------------------
section '.idata' import data readable writeable
;===============================================

     library kernel32,'KERNEL32.DLL',\
               user32,'USER32.DLL'

             include 'api\kernel32.inc'
             include 'api\user32.inc'    


Attachment: xsave.png (from chapter 13.10.4 of 253668.pdf, issue 38)
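The offsets used in the code above (Reg+160 for xmm0, Reg+576 for the upper halves) can be turned into a small reader for a raw XSAVE dump. This is only a sketch: it assumes the standard (non-compacted) XSAVE layout with the upper halves at offset 576 and a 16-byte stride per register, and `read_ymm` is a made-up helper name, not part of any API.

```python
import struct

LEGACY_XMM_BASE = 160   # XMM0 inside the 512-byte legacy (FXSAVE) region
YMM_HI_BASE = 576       # upper halves of YMM0..15 (assumed standard format)
REG_STRIDE = 16         # each half-register is 16 bytes


def read_ymm(xsave_area: bytes, index: int) -> int:
    """Return YMM<index> as a 256-bit integer assembled from both halves."""
    if not 0 <= index <= 15:
        raise ValueError("register index must be 0..15")
    lo_off = LEGACY_XMM_BASE + REG_STRIDE * index
    hi_off = YMM_HI_BASE + REG_STRIDE * index
    if len(xsave_area) < hi_off + REG_STRIDE:
        raise ValueError("buffer too small to hold the upper halves")
    lo = xsave_area[lo_off:lo_off + REG_STRIDE]
    hi = xsave_area[hi_off:hi_off + REG_STRIDE]
    # Memory is little-endian, so the low half comes first.
    return int.from_bytes(lo + hi, "little")


# Demo: place the ymmreg pattern from the post into the slots of ymm1.
pattern = struct.pack("<4Q", 0x1111222233334444, 0x5555666677778888,
                      0x99990000AAAABBBB, 0xCCCCDDDDEEEEFFFF)
area = bytearray(576 + 256)
area[160 + 16:160 + 32] = pattern[:16]
area[576 + 16:576 + 32] = pattern[16:]
print(f"ymm1 = {read_ymm(bytes(area), 1):064X}")
```

The helper only interprets a buffer that something else (XSAVE, or a debugger API) has already filled in; it does not answer where Windows keeps that buffer.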



tthsqe 22 May 2011, 11:50
Yes - I read the documentation and am aware of where they should be. The problem is that after a call to
GetThreadContext thread_handle, context_structure
the ymm fields are 0 in the structure even though the registers in the debuggee contain nonzero values. Is a different API function required?

Alphonso 22 May 2011, 13:14

tthsqe 22 May 2011, 13:55
Been there too.
I would like to know how these functions work (arguments, return values):
InitializeContext
CopyContext
but I couldn't find any documentation.
Could it be that the context structure needs to be initialized in a certain way so that the ymm states will be filled in upon a call to GetThreadContext?
I guess I will try experimenting by guessing the function parameters .........

These ones probably don't do anything that cpuid can't:
LocateXStateFeature
GetXStateFeaturesMask
SetXStateFeaturesMask

vid 22 May 2011, 18:33
Do you call GetThreadContext with CONTEXT_FULL flag set? Check out Windows SDK headers for other CONTEXT_### flags.

tthsqe 23 May 2011, 00:51
Vanilla fdbg uses CONTEXT_ALL, which is giving me the problems discussed above. I am in the process of downloading the Windows SDK right now - I hope it has all of the answers.

definitions:
Code:
CONTEXT_AMD64                   = 100000h
CONTEXT_CONTROL                 = CONTEXT_AMD64 or 01h
CONTEXT_INTEGER                 = CONTEXT_AMD64 or 02h
CONTEXT_SEGMENTS                = CONTEXT_AMD64 or 04h
CONTEXT_FLOATING_POINT          = CONTEXT_AMD64 or 08h
CONTEXT_DEBUG_REGISTERS         = CONTEXT_AMD64 or 10h
CONTEXT_FULL                    = CONTEXT_CONTROL or CONTEXT_INTEGER or CONTEXT_FLOATING_POINT
CONTEXT_ALL                     = CONTEXT_CONTROL or CONTEXT_INTEGER or CONTEXT_SEGMENTS or CONTEXT_FLOATING_POINT or CONTEXT_DEBUG_REGISTERS

Feryno 23 May 2011, 07:38
I have been looking hard for the header files but still can't get hold of them (my SDK and WDK are about 1 year old, from the RTM version, not yet for SP1)

I disassembled something;
to obtain ymm, the mask is perhaps
CONTEXT_AMD64 or 20h
or perhaps
CONTEXT_AMD64 or 40h

but it is even more complicated than that - see the link posted by Alphonso

Madis731 23 May 2011, 08:23
tthsqe - a dumb question: do you happen to have a non-Windows 7 OS?
wiki wrote:

Operating system support

AVX adds new register-state through the 256-bit wide YMM register-file, so explicit operating system support is required to properly save & restore AVX's new registers between context switches. The following operating system versions will support AVX:

* Linux: supported since kernel version 2.6.30,[2] released on June 9, 2009.[3]
* Windows: supported in Windows 7 SP1 and Windows Server 2008 R2 SP1.[4]; hotfix 2517374 available for non-SP1 version of Windows Server 2008 R2.[5];

Feryno 23 May 2011, 12:45
it is possible to download win 2008 server R2 SP1 directly from microsoft
it expires in 10 days
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=ba571339-5436-4cf5-9c37-6ed7dab6f781&displaylang=en
if you patch bootsector (boot via BIOS, not EFI boot) and you don't let the OS run for more than few hours then you have always the same date and OS never expires
it requires only to patch about 30 bytes of bootsector to set the same date every boot - by programming CMOS ports - e.g. overwrite some useless string in bootsector with such useful instructions
you can't sort new files by date then, but...

tthsqe 24 May 2011, 02:44
Ok. I found in the SDK (7.1):

CONTEXT_XSAVE = CONTEXT_AMD64 | 0x020

So I do
Code:
mov [ThrdContext.ContextFlags],CONTEXT_ALL or 0x20    

and then call GetThreadContext. According to the SDK this should return a CONTEXT_EX structure. I do not know the size of the structure so I reserved 8 kbytes for it. The problem is that this 8 kbytes is the same whether or not I include the flag 0x020 or write different values to the ymm registers.

I am just going to assume it is broken and wait for more documentation.
In the meantime, I'll improve other parts of fdbg.
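For reference, the flag arithmetic can be checked outside the debugger. The sketch below rebuilds the CONTEXT_* values from the definitions earlier in the thread and compares the flag tried here (or 0x20) with the value 100040h that InitializeContext is given later in the thread. The name `CONTEXT_XSTATE_SP1` and the helper `requests_xstate` are made up for illustration, and treating the 40h bit as the one that selects the extended state is an assumption based on that later post.

```python
CONTEXT_AMD64 = 0x100000
CONTEXT_CONTROL = CONTEXT_AMD64 | 0x01
CONTEXT_INTEGER = CONTEXT_AMD64 | 0x02
CONTEXT_SEGMENTS = CONTEXT_AMD64 | 0x04
CONTEXT_FLOATING_POINT = CONTEXT_AMD64 | 0x08
CONTEXT_DEBUG_REGISTERS = CONTEXT_AMD64 | 0x10
CONTEXT_ALL = (CONTEXT_CONTROL | CONTEXT_INTEGER | CONTEXT_SEGMENTS |
               CONTEXT_FLOATING_POINT | CONTEXT_DEBUG_REGISTERS)

# Assumption: the value that works later in this thread (100040h).
CONTEXT_XSTATE_SP1 = CONTEXT_AMD64 | 0x40


def requests_xstate(flags: int) -> bool:
    """True if every bit of the assumed xstate flag is present in flags."""
    return flags & CONTEXT_XSTATE_SP1 == CONTEXT_XSTATE_SP1


for label, flags in (("CONTEXT_ALL or 0x20", CONTEXT_ALL | 0x20),
                     ("CONTEXT_ALL or 0x40", CONTEXT_ALL | 0x40)):
    print(f"{label} = {flags:06X}h, requests xstate: {requests_xstate(flags)}")
```

If the assumption holds, the value with 0x20 never asks for the extended state at all, which would fit the buffer coming back unchanged.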

Feryno 24 May 2011, 07:34
this is correct as you already wrote:
CONTEXT_XSAVE = CONTEXT_AMD64 | 0x020

for 32 bits (32-bit win or maybe for 32-bit emulated subsystem of 64-bit win?) the mask is 40h

try these extra things:
zero the whole xsave area
set the first byte of the xsave header to 7 (111b = enabled_YMM or enabled_XMM or enabled_FPU); the header starts at +512 from the start of the FPU/SSE area, right after its 512 bytes

I don't plan to get an AVX CPU, but I have studied this feature a lot and I'll try to help you as much as possible (I have an AVX-capable win installed and may trace the kernel to find out some secrets; it will be a bit complicated with an improper CPU, but I may try it anyway)
bit 2 of the first byte of the xsave header must be set to 1 to allow the XSAVE instruction to save YMM_H into the area (high halves of YMM; for Intel they should be at +64 bytes from the start of the xsave header = 64 bytes after the value you set to 7)
bit 1 set to 1 to store XMM (low halves of YMM)
bit 0 set to 1 to store FPU
the mask is then 111b = 7
put the byte 7 into the first byte of the xsave header, which is +512 bytes from the start of the FPU area; the rest of the xsave area should be zeroed
if that doesn't help, then perhaps call kernel32.dll GetXStateFeaturesMask, set bit 2 of the result to 1 (maybe better to OR it with 7 to be sure) and then pass it to SetXStateFeaturesMask
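The three header bits described above can be decoded from a dumped xsave area like this. It is a sketch under the same assumptions as the post: the header sits at +512 from the start of the FPU/SSE area and its first 8 bytes hold the feature bitmap. `saved_components` is a hypothetical helper name.

```python
import struct

XSAVE_HEADER_OFFSET = 512      # header follows the 512-byte FPU/SSE area
XSTATE_BITS = (
    (1 << 0, "FPU"),           # bit 0: x87 state
    (1 << 1, "XMM"),           # bit 1: low halves of YMM
    (1 << 2, "YMM_H"),         # bit 2: high halves of YMM
)


def saved_components(xsave_area: bytes) -> list:
    """List the components flagged in the first qword of the xsave header."""
    (bitmap,) = struct.unpack_from("<Q", xsave_area, XSAVE_HEADER_OFFSET)
    return [name for bit, name in XSTATE_BITS if bitmap & bit]


# Demo: a zeroed area whose header byte is set to 7, as suggested above.
area = bytearray(576 + 256)
area[XSAVE_HEADER_OFFSET] = 7
print(saved_components(bytes(area)))
```

Running it on a file dump of the context buffer would show at a glance whether anything beyond the hand-written 7 was ever stored.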

do you have an AMD or Intel CPU?

tthsqe 25 May 2011, 00:28
I modified the calls to GetThreadContext as:
Code:
align 16
GetThreadContext_debuggee_CONTEXT_ALL:
; input: ECX dwThreadId
;        ThreadContext buffer
; output:set Carry Flag if error
        push    rax rcx rdx r8 r9 r10 r11
        sub     rsp,8*(4+0)

virtual at rdx
ThrdContext       CONTEXT64
end virtual

        mov     eax,ecx
        call    TID2hThread
        jc      GetThreadContext_debuggee_CONTEXT_ALL_epilogue
        mov     rcx,rdx
        lea     rdx,[ThreadContext]
        mov     dword[ThrdContext.ContextFlags],CONTEXT_ALL or 0x20
        xor     eax,eax
        mov     qword[ThrdContext.FltSave+512+8*0],rax  ; clear the header of X_SAVE
        mov     qword[ThrdContext.FltSave+512+8*1],rax  ; (the first 64 bytes = qwords 0..7)
        mov     qword[ThrdContext.FltSave+512+8*2],rax  ;
        mov     qword[ThrdContext.FltSave+512+8*3],rax  ;
        mov     qword[ThrdContext.FltSave+512+8*4],rax  ;
        mov     qword[ThrdContext.FltSave+512+8*5],rax  ;
        mov     qword[ThrdContext.FltSave+512+8*6],rax  ;
        mov     qword[ThrdContext.FltSave+512+8*7],rax  ;
        add     eax,7
        mov     byte[ThrdContext.FltSave+512],al        ; enabled_YMM | enabled_XMM | enabled_FPU

        call    qword [GetThreadContext]

; dump the 8 KB context buffer to a file for inspection
        push    rax
        push    rsi
        invoke  CreateFileA,write_filename,GENERIC_WRITE,0,0,CREATE_ALWAYS,0,0
        mov     rsi,rax
        invoke  WriteFile,rsi,ThreadContext,8*1024,temp,0
        invoke  CloseHandle,rsi
        pop     rsi
        pop     rax

; If the function succeeds, the return value is nonzero. If the function fails, the return value is zero.
        sub     eax,1                           ; set Carry flag if API returned 0
GetThreadContext_debuggee_CONTEXT_ALL_epilogue:
        lea     rsp,[rsp+8*(4+0)]               ; LEA, unlike ADD, doesn't touch the carry flag
        pop     r11 r10 r9 r8 rdx rcx rax
        ret


But still the whole output file consists of NULL's after the 07h that I set. Seems to not work.

If I knew the parameters to GetXStateFeaturesMask, etc., I could try this as well.
Also, aren't we overlapping the Vector registers and Special debug control registers part of the context structure?

Code:
struct    CONTEXT64
; Register parameter home addresses.
; N.B. These fields are for convenience - they could be used to extend the context record in the future.
        P1Home                  rq      1
        P2Home                  rq      1
        P3Home                  rq      1
        P4Home                  rq      1
        P5Home                  rq      1
        P6Home                  rq      1
; Control flags.
        ContextFlags            rd      1
        MxCsr                   rd      1
; Segment Registers and processor flags.
        SegCs                   rw      1
        SegDs                   rw      1
        SegEs                   rw      1
        SegFs                   rw      1
        SegGs                   rw      1
        SegSs                   rw      1
...
...
...

        R15                     rq      1
; Program counter.
        Rip                     rq      1
; Floating point state.
        FltSave                 XMM_SAVE_AREA32
; Vector registers.
        VectorRegister          rb      16*26
        VectorControl           rq      1
; Special debug control registers.
        DebugControl            rq      1
        LastBranchToRip         rq      1
        LastBranchFromRip       rq      1
        LastExceptionToRip      rq      1
        LastExceptionFromRip    rq      1
ends
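The overlap question can be checked by computing the field offsets. The sketch below mirrors the structure with ctypes; the part elided by "..." above is filled in from the CONTEXT definition in the Windows SDK as I understand it (EFlags, six debug registers, sixteen general registers), so treat those fields and the grouped array names (`PHome`, `Seg`, `Dr`, `Gpr`) as assumptions made for brevity.

```python
import ctypes


class Context64(ctypes.Structure):
    _fields_ = [
        ("PHome", ctypes.c_uint64 * 6),          # P1Home..P6Home
        ("ContextFlags", ctypes.c_uint32),
        ("MxCsr", ctypes.c_uint32),
        ("Seg", ctypes.c_uint16 * 6),            # SegCs..SegSs
        ("EFlags", ctypes.c_uint32),             # assumed (elided above)
        ("Dr", ctypes.c_uint64 * 6),             # assumed: Dr0-3, Dr6, Dr7
        ("Gpr", ctypes.c_uint64 * 16),           # assumed: Rax..R15
        ("Rip", ctypes.c_uint64),
        ("FltSave", ctypes.c_uint8 * 512),       # XMM_SAVE_AREA32
        ("VectorRegister", ctypes.c_uint8 * (16 * 26)),
        ("VectorControl", ctypes.c_uint64),
        ("DebugControl", ctypes.c_uint64),
        ("LastBranchToRip", ctypes.c_uint64),
        ("LastBranchFromRip", ctypes.c_uint64),
        ("LastExceptionToRip", ctypes.c_uint64),
        ("LastExceptionFromRip", ctypes.c_uint64),
    ]


flt_save_end = Context64.FltSave.offset + Context64.FltSave.size
print(f"FltSave          at {Context64.FltSave.offset:#x}")
print(f"FltSave+512      =  {flt_save_end:#x}")
print(f"VectorRegister   at {Context64.VectorRegister.offset:#x}")
print(f"sizeof(CONTEXT)  =  {ctypes.sizeof(Context64):#x}")
```

With this layout FltSave+512 lands exactly on VectorRegister, so writes at FltSave+512 would indeed go into the vector register fields of a plain CONTEXT rather than into a separate xsave header. The total size of 4D0h also matches the `lea rdi,[rcx+000004D0]` seen in the RtlInitializeExtendedContext listing later in the thread.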


The funny thing is that the upper 128 bits of the ymm registers work correctly (see commented line). Since fdbg doesn't even know about them, something else is at work here.
This is the test code I am running:
Code:
format PE64 GUI 4.0
include 'win64a.inc'

;----------------------------------------------- 
section '.text' code readable executable 
;===============================================

        vmovapd  ymm0,[ymmreg]          
        vmovapd  ymm1,[ymmreg]           
        vmovapd  ymm15,[ymmreg]           
        and     rsp,-32
        sub     rsp,32
        vmovapd  [rsp],ymm0        ; all 32 correct bytes of ymm0 are now visible on the stack in fdbg
        invoke  ExitProcess,0 



;----------------------------------------------- 
section '.data' data readable writeable 
;===============================================
  ymmreg: 
                dq 1111222233334444h,5555666677778888h,99990000aaaabbbbh,0ccccddddeeeeffffh

;---------------------------------------------- 
section '.idata' import data readable writeable 
;=============================================== 

  dd 0,0,0,RVA kernel_name,RVA kernel_table
  dd 0,0,0,0,0

  kernel_table:
    ExitProcess dq RVA _ExitProcess
    dq 0

  kernel_name db 'KERNEL32.DLL',0

  _ExitProcess dw 0
    db 'ExitProcess',0                

Feryno 25 May 2011, 06:13
Hi, please wait, I'll download SDK 7.1 (I saw it months ago, but its release date confused me: the 2010 date is older than the release of SP1 in 2011, so I ignored it until now, but it really is version 7.1, which is a sign of SP1)
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=35aeda01-421d-4ba5-b44b-543dc8c33a20
http://download.microsoft.com/download/F/1/0/F10113F5-B750-4969-A255-274341AC6BCE/GRMSDKX_EN_DVD.iso

It seems to be necessary to go the official way as MS described and to use these 2 calls
InitializeContext
CopyContext

the low halves of YMM should be aliased in XMM registers
the upper halves should be at higher offsets in context

I'll extract necessary things from SDK and single step kernel and prepare something (to know how to pass params to the above APIs, the structure of extended context). Then you'll have to test that (I have only AVX OS, not AVX CPU).

see this (win initializes a lot of things + does aligning also)

Code:
000000007708D850   48895C2408              ntdll.RtlInitializeContext: mov [rsp+08],rbx
000000007708D855    48896C2410              mov [rsp+10],rbp
000000007708D85A    4889742418              mov [rsp+18],rsi
000000007708D85F    57                      push rdi
000000007708D860    4883EC20                sub rsp,20
000000007708D864  488B442450              mov rax,[rsp+50]
000000007708D869    498BF1                  mov rsi,r9
000000007708D86C  498BE8                  mov rbp,r8
000000007708D86F  488BFA                  mov rdi,rdx
000000007708D872 A80F                    test al,0F
000000007708D874  740B                    jz 000000007708D881
000000007708D876 B9090000C0              mov ecx,C0000009
000000007708D87B    E840FFFFFF              call 000000007708D7C0           ; ntdll.RtlRaiseStatus
000000007708D880      CC                      int3
000000007708D881        4883627800              and qword [rdx+78],00
000000007708D886       4883A2A000000000        and qword [rdx+000000A0],00
000000007708D88E 41B800020000            mov r8d,00000200
000000007708D894    44894244                mov [rdx+44],r8d
000000007708D898    48C7829000000001000000  mov qword [rdx+00000090],00000001
000000007708D8A3   48898298000000          mov [rdx+00000098],rax
000000007708D8AA      48C782A800000004000000  mov qword [rdx+000000A8],00000004
000000007708D8B5   48C782B000000005000000  mov qword [rdx+000000B0],00000005
000000007708D8C0   48C782B800000008000000  mov qword [rdx+000000B8],00000008
000000007708D8CB   48C782C80000000A000000  mov qword [rdx+000000C8],0000000A
000000007708D8D6   48C782D00000000B000000  mov qword [rdx+000000D0],0000000B
000000007708D8E1   48C782D80000000C000000  mov qword [rdx+000000D8],0000000C
000000007708D8EC   48C782E00000000D000000  mov qword [rdx+000000E0],0000000D
000000007708D8F7   48C782E80000000E000000  mov qword [rdx+000000E8],0000000E
000000007708D902   48C782F00000000F000000  mov qword [rdx+000000F0],0000000F
000000007708D90D   488D8F00010000          lea rcx,[rdi+00000100]
000000007708D914      33D2                    xor edx,edx
000000007708D916 E8B555F8FF              call 0000000077012ED0           ; ntdll.memset
000000007708D91B      488B5C2430              mov rbx,[rsp+30]
000000007708D920    B87F020000              mov eax,0000027F                ; FPU control word default value
000000007708D925    41BB801F0000            mov r11d,00001F80               ; MxCsr default value
000000007708D92B       66898700010000          mov [rdi+00000100],ax           ; control word
000000007708D932      48B8708090A0C0D0E0F0    mov rax,F0E0D0C0A0908070
000000007708D93C    4889B7F8000000          mov [rdi+000000F8],rsi
000000007708D943      488B742440              mov rsi,[rsp+40]
000000007708D948    4889AF80000000          mov [rdi+00000080],rbp
000000007708D94F      488B6C2438              mov rbp,[rsp+38]
000000007708D954    488987C0000000          mov [rdi+000000C0],rax
000000007708D95B      44895F34                mov [rdi+34],r11d               ; Context.MxCsr
000000007708D95F     44899F18010000          mov [rdi+00000118],r11d         ; Context.FltSave.MxCsr
000000007708D966     C747300B001000          mov dword [rdi+30],0010000B     ; Context.ContextFlags
000000007708D96D      4883C420                add rsp,20
000000007708D971  5F                      pop rdi
000000007708D972     C3                      ret
    


Code:
0000000077066D40  48895C2408              ntdll.RtlInitializeExtendedContext: mov [rsp+08],rbx
0000000077066D45    4889742410              mov [rsp+10],rsi
0000000077066D4A    57                      push rdi
0000000077066D4B    4883EC20                sub rsp,20
0000000077066D4F  448BD2                  mov r10d,edx
0000000077066D52        4C8BD9                  mov r11,rcx
0000000077066D55 488D542448              lea rdx,[rsp+48]
0000000077066D5A    418BCA                  mov ecx,r10d
0000000077066D5D        498BF0                  mov rsi,r8
0000000077066D60  33FF                    xor edi,edi
0000000077066D62 E8C9FDFFFF              call 0000000077066B30
0000000077066D67       85C0                    test eax,eax
0000000077066D69        0F88CC000000            js 0000000077066E3B
0000000077066D6F 418BD2                  mov edx,r10d
0000000077066D72        81E200000100            and edx,00010000
0000000077066D78    7411                    jz 0000000077066D8B
0000000077066D7A 498D4B03                lea rcx,[r11+03]
0000000077066D7E    4883E1FC                and rcx,FFFFFFFFFFFFFFFC        ; align context at dword (align 4)
0000000077066D82  488DB9CC020000          lea rdi,[rcx+000002CC]
0000000077066D89      EB32                    jmp 0000000077066DBD
0000000077066D8B        410FBAE214              bt r10d,14
0000000077066D90  7315                    jnc 0000000077066DA7
0000000077066D92        498D4B0F                lea rcx,[r11+0F]
0000000077066D96    4883E1F0                and rcx,FFFFFFFFFFFFFFF0        ; align Context at dqword (align 10h)
0000000077066D9A       44895130                mov [rcx+30],r10d               ; Context.ContextFlags
0000000077066D9E      488DB9D0040000          lea rdi,[rcx+000004D0]
0000000077066DA5      EB19                    jmp 0000000077066DC0
0000000077066DA7        410FBAE213              bt r10d,13
0000000077066DAC  7319                    jnc 0000000077066DC7
0000000077066DAE        498D4B0F                lea rcx,[r11+0F]
0000000077066DB2    4883E1F0                and rcx,FFFFFFFFFFFFFFF0
0000000077066DB6    488DB9700A0000          lea rdi,[rcx+00000A70]
0000000077066DBD      448911                  mov [rcx],r10d
0000000077066DC0      8BC7                    mov eax,edi
0000000077066DC2 2BC1                    sub eax,ecx
0000000077066DC4 89470C                  mov [rdi+0C],eax
0000000077066DC7    8B4F0C                  mov ecx,[rdi+0C]
0000000077066DCA    8BC1                    mov eax,ecx
0000000077066DCC F7D8                    neg eax
0000000077066DCE     894708                  mov [rdi+08],eax
0000000077066DD1    8907                    mov [rdi],eax
0000000077066DD3       8D4118                  lea eax,[rcx+18]
0000000077066DD6    894704                  mov [rdi+04],eax
0000000077066DD9    85D2                    test edx,edx
0000000077066DDB        7414                    jz 0000000077066DF1
0000000077066DDD B820000100              mov eax,00010020                ; CONTEXT_i386 or 20h (CONTEXT_EXTENDED_REGISTERS)
0000000077066DE2     4423D0                  and r10d,eax
0000000077066DE5        443BD0                  cmp r10d,eax
0000000077066DE8        7407                    jz 0000000077066DF1
0000000077066DEA C7470CCC000000          mov dword [rdi+0C],000000CC
0000000077066DF1 F644244802              test byte [rsp+48],02
0000000077066DF6       7433                    jz 0000000077066E2B
0000000077066DF8 33D2                    xor edx,edx
0000000077066DFA 488D5F57                lea rbx,[rdi+57]
0000000077066DFE    4883E3C0                and rbx,FFFFFFFFFFFFFFC0
0000000077066E02    448D4240                lea r8d,[rdx+40]
0000000077066E06    488BCB                  mov rcx,rbx
0000000077066E09 E8C2C0FAFF              call 0000000077012ED0           ; ntdll.memset
0000000077066E0E      2BDF                    sub ebx,edi
0000000077066E10 895F10                  mov [rdi+10],ebx
0000000077066E13    8B0425E803FE7F          mov eax,[7FFE03E8]              ; []=00000240
0000000077066E1A       0500FEFFFF              add eax,FFFFFE00
0000000077066E1F    894714                  mov [rdi+14],eax
0000000077066E22    2B07                    sub eax,[rdi]
0000000077066E24       03C3                    add eax,ebx
0000000077066E26 894704                  mov [rdi+04],eax
0000000077066E29    EB0B                    jmp 0000000077066E36
0000000077066E2B        83671400                and dword [rdi+14],00
0000000077066E2F       C7471019000000          mov dword [rdi+10],00000019
0000000077066E36 48893E                  mov [rsi],rdi
0000000077066E39       33C0                    xor eax,eax
0000000077066E3B 488B5C2430              mov rbx,[rsp+30]
0000000077066E40    488B742438              mov rsi,[rsp+38]
0000000077066E45    4883C420                add rsp,20
0000000077066E49  5F                      pop rdi
0000000077066E4A     C3                      ret


0000000077066B30 0FBAE110                        bt ecx,10
0000000077066B34   4C8BC2                          mov r8,rdx
0000000077066B37  7308                            jnc 0000000077066B41
0000000077066B39        F7C180FFFEFF                    test ecx,FFFEFF80
0000000077066B3F   741C                            jz 0000000077066B5D
0000000077066B41 0FBAE114                        bt ecx,14
0000000077066B45   7308                            jnc 0000000077066B4F
0000000077066B47        F7C1A0FFEF27                    test ecx,27EFFFA0
0000000077066B4D   740E                            jz 0000000077066B5D
0000000077066B4F 0FBAE113                        bt ecx,13
0000000077066B53   734C                            jnc 0000000077066BA1
0000000077066B55        F7C1C0FFF727                    test ecx,27F7FFC0
0000000077066B5B   7544                            jnz 0000000077066BA1
0000000077066B5D        41B940000100                    mov r9d,00010040
0000000077066B63    8BC1                            mov eax,ecx
0000000077066B65 BA01000000                      mov edx,00000001
0000000077066B6A    4123C1                          and eax,r9d
0000000077066B6D 413BC1                          cmp eax,r9d
0000000077066B70 740B                            jz 0000000077066B7D
0000000077066B72 B840001000                      mov eax,00100040
0000000077066B77    23C8                            and ecx,eax
0000000077066B79 3BC8                            cmp ecx,eax
0000000077066B7B 7519                            jnz 0000000077066B96
0000000077066B7D        48F70425E003FE7FFCFFFFFF        test qword [7FFE03E0],FFFFFFFC  ; []=0000000000000000
0000000077066B89       7506                            jnz 0000000077066B91
0000000077066B8B        B8BB0000C0                      mov eax,C00000BB
0000000077066B90    C3                              ret
0000000077066B91 BA03000000                      mov edx,00000003
0000000077066B96    4D85C0                          test r8,r8
0000000077066B99  7403                            jz 0000000077066B9E
0000000077066B9B 418910                          mov [r8],edx
0000000077066B9E        33C0                            xor eax,eax
0000000077066BA0 C3                              ret
0000000077066BA1 B80D0000C0                      mov eax,C000000D
0000000077066BA6    C3                              ret    
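My reading of the AMD64 path through RtlInitializeExtendedContext above, written out as plain arithmetic. This is a sketch of what the listing appears to do when the xstate flag is accepted, not a documented algorithm. The chunk names All, Legacy and XState are taken from the CONTEXT_EX definition in the SDK as I understand it, and `layout_extended_context` is a made-up name.

```python
CONTEXT64_SIZE = 0x4D0     # lea rdi,[rcx+000004D0]
CONTEXT_EX_SIZE = 0x18     # three (Offset, Length) dword pairs at rdi+0..+14
XSAVE_LEGACY_SIZE = 0x200  # add eax,FFFFFE00 drops the 512 legacy bytes
XSAVE_ALIGN = 0x40         # lea rbx,[rdi+57] / and rbx,...C0


def layout_extended_context(buffer_addr: int, enabled_xsave_size: int) -> dict:
    """Addresses and chunk values as the listing seems to compute them."""
    ctx = (buffer_addr + 0xF) & ~0xF                  # CONTEXT aligned to 16
    ctx_ex = ctx + CONTEXT64_SIZE                     # CONTEXT_EX follows it
    xstate = (ctx_ex + CONTEXT_EX_SIZE + XSAVE_ALIGN - 1) & ~(XSAVE_ALIGN - 1)
    legacy_offset = ctx - ctx_ex                      # negative: points back
    xstate_offset = xstate - ctx_ex
    xstate_length = enabled_xsave_size - XSAVE_LEGACY_SIZE
    return {
        "context": ctx,
        "context_ex": ctx_ex,
        "xstate": xstate,
        "All": (legacy_offset, xstate_length - legacy_offset + xstate_offset),
        "Legacy": (legacy_offset, CONTEXT64_SIZE),
        "XState": (xstate_offset, xstate_length),
    }


# Demo: 240h is the value shown at [7FFE03E8] in the listing (no AVX);
# 340h is what a machine with AVX enabled would presumably report.
for size in (0x240, 0x340):
    info = layout_extended_context(0x1001, size)
    print(hex(size), {k: tuple(map(hex, v)) if isinstance(v, tuple) else hex(v)
                      for k, v in info.items()})
```

If this reading is right, the XState chunk starts with the 64-byte xsave header, so the upper ymm halves would begin 64 bytes after context_ex + XState.Offset. That position is worth comparing with what LocateXStateFeature returns.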

Alphonso 25 May 2011, 08:26
tthsqe wrote:

If I knew the parameters to GetXStateFeaturesMask, etc., I could try this as well.



Seems to be something like...
Code:
  GetXStateFeaturesMask,pContext,pXState                ; returns true if okay
                                                        ; and XState of the context structure

  SetXStateFeaturesMask,pContext,[Xstate],0             ; Not sure about the last one (0); probably the
                                                        ; high dword of a 64-bit mask.
                                                        ; Returns true if it thinks it's successful

  LocateXStateFeature,pContext,[query],pSize            ; Returns position in context structure
                                                        ; and size for query 0,1,2?


SetXStateFeaturesMask appears to need some other bits set in the Context, maybe this is what InitializeContext does???

Edit: Okay, InitializeContext makes things much easier; at least in 32-bit I now have the upper bits of the ymm's. I'll try 64-bit and post some code.

Alphonso 25 May 2011, 12:43
Okay, hope this helps. Feel free to make corrections.
Code:
format PE64 GUI 6.0
include 'win64a.inc'

;-----------------------------------------------
section '.text' code readable executable
;===============================================
        sub     rsp,5*8

        invoke  InitializeContext,InitContext,100040h,AlignedContext,BytesRequired    ; Yes, 100040h.!
        cmp     rax,0
        jz      exit                                                                  ; If 0 then check buffer big enough
                                                                                      ; by check BytesRequired.
        mov     rbx,[AlignedContext]                                                  ; Our Aligned Context pointer.

        invoke  CreateThread,0,0,AVXThread,0,CREATE_SUSPENDED,ThreadID                ; Test thread.
        mov     [hThread],rax
        invoke  ResumeThread,[hThread]
        invoke  Sleep,50                                                              ; Give the thread a chance to switch.

        invoke  SetXStateFeaturesMask,rbx,7,0                                         ; Still need to set this.!
        cmp     rax,0
        jz      exit

        invoke  GetXStateFeaturesMask,rbx,XState
        cmp     rax,0
        jz      exit                                                                  ; Function failed.
        cmp     [XState],7                                                            ; See if now AVX ready.
        jne     exit

        invoke  GetThreadContext,[hThread],rbx
        cmp     rax,0
        jz      exit

        invoke  LocateXStateFeature,rbx,2,Length                                      ; Not so sure about this.
        cmp     rax,0                                                                 ; See what you think.
        jz      exit
        mov     rdi,rax                                                               ; Pointer to start of upper Ymm.

        invoke  wsprintf,Buff,wsformat,\
                qword[rbx+200h],qword[rbx+208h],\                                     ; Lower Ymm0 (Xmm0).
                qword[rdi],qword[rdi+8h]                                              ; Upper Ymm0.
        invoke  MessageBox,0,Buff,Tit,0

 exit:
        invoke  ExitProcess,0

;-----------------------------------------------
 proc   AVXThread
;===============================================
        push    rbx
        mov     rbx,3                                                                 ; Number of loops before terminating AVXThread.
 @@:
        vlddqu  ymm0,[ymmreg]                                                         ; Store some numbers.
        vlddqu  ymm1,[ymmreg]
        vlddqu  ymm2,[ymmreg]
        vlddqu  ymm3,[ymmreg]
        vlddqu  ymm4,[ymmreg]
        vlddqu  ymm5,[ymmreg]
        vlddqu  ymm6,[ymmreg]
        vlddqu  ymm7,[ymmreg]
        vlddqu  ymm8,[ymmreg]
        vlddqu  ymm9,[ymmreg]
        vlddqu  ymm10,[ymmreg]
        vlddqu  ymm11,[ymmreg]
        vlddqu  ymm12,[ymmreg]
        vlddqu  ymm13,[ymmreg]
        vlddqu  ymm14,[ymmreg]
        vlddqu  ymm15,[ymmreg]
        invoke  Sleep,200
        dec     rbx
        jnz     @b
        pop     rbx
        ret
 endp
;-----------------------------------------------
section '.data' data readable writeable
;===============================================
  InitContext           rb 1000h
  ymmreg:               dq 1111222233334444h,5555666677778888h,99990000aaaabbbbh,0ccccddddeeeeffffh
  AlignedContext        dq ?
  BytesRequired         dq 1000h
  XState                dq ?
  Length                dq ?
  YmmUpperStart         dq ?
  ThreadID              dq ?
  hThread               dq ?
  Tit                   db 'XSAVE',0
  wsformat              db 'Ymm0 : %016I64X%016I64X%016I64X%016I64X',0
  Buff                  rb 100h
;----------------------------------------------
section '.idata' import data readable writeable
;===============================================
     library kernel32,'KERNEL32.DLL',\
               user32,'USER32.DLL',\
               kern32,'KERNEL32.DLL'
             include 'api\kernel32.inc'
             include 'api\user32.inc'

     import  kern32,\
            GetEnabledXStateFeatures,'GetEnabledXStateFeatures',\
            InitializeContext,'InitializeContext',\
            GetXStateFeaturesMask,'GetXStateFeaturesMask',\
            SetXStateFeaturesMask,'SetXStateFeaturesMask',\
            LocateXStateFeature,'LocateXStateFeature'      
Post 25 May 2011, 12:43
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 25 May 2011, 13:56
It works!!! Way to go Alphonso!
You must be good at analysing that kernel code.
↑↑↑ This was my initial reaction

↓↓↓ But then I realised:
Why the *** would M$ make you go through so many kernel functions that are included ONLY in the latest service pack for the latest version of Windows?

Unless there is some kind of conditional import function, one would need two versions of the .exe file

....

The answer to the thread's title is NO!


EDIT: or maybe yes. I took out all of the unnecessary kernel calls involved in initializing the structure. Alphonso, see if it works for you.
Also, to follow up on your question on losing the upper bits of ymm0-ymm5:
your LOWER 128 bits are getting clobbered to 0 by the call to Sleep (or print in that case); the UPPER ones are intact. Mine are also zero.
Code:
format PE64 GUI 6.0
include 'win64a.inc'


struct  XMM_SAVE_AREA32
        ControlWord             rw      1
        StatusWord              rw      1
        TagWord                 rb      1
        Reserved1               rb      1
        ErrorOpcode             rw      1
        ErrorOffset             rd      1
        ErrorSelector           rw      1
        Reserved2               rw      1
        DataOffset              rd      1
        DataSelector            rw      1
        Reserved3               rw      1
        MxCsr                   rd      1
        MxCsr_Mask              rd      1
        FloatRegisters          rb      8*16     
        XmmRegisters            rb      16*16   ; 0x1A0 
        Reserved4               rb      96
ends

struct  CONTEXT64_WITH_XSTATE
; Register parameter home addresses.
        P1Home                  rq      1       ; 0x000
        P2Home                  rq      1
        P3Home                  rq      1
        P4Home                  rq      1
        P5Home                  rq      1
        P6Home                  rq      1
; Control flags.
        ContextFlags            rd      1       ; 0x030
        MxCsr                   rd      1
; Segment Registers and processor flags.
        SegCs                   rw      1
        SegDs                   rw      1
        SegEs                   rw      1
        SegFs                   rw      1
        SegGs                   rw      1
        SegSs                   rw      1
        EFlags                  rd      1
; Debug registers.
        Dr0                     rq      1
        Dr1                     rq      1
        Dr2                     rq      1
        Dr3                     rq      1
        Dr6                     rq      1
        Dr7                     rq      1
; Integer registers.
        Rax                     rq      1
        Rcx                     rq      1
        Rdx                     rq      1
        Rbx                     rq      1
        Rsp                     rq      1
        Rbp                     rq      1
        Rsi                     rq      1
        Rdi                     rq      1
        R8                      rq      1
        R9                      rq      1
        R10                     rq      1
        R11                     rq      1
        R12                     rq      1
        R13                     rq      1
        R14                     rq      1
        R15                     rq      1
; Program counter.
        Rip                     rq      1       ; 0x0F8
; Floating point state.
        FltSave                 XMM_SAVE_AREA32 ; 0x100
; Vector registers.
        VectorRegister          rb      16*26   ; 0x300
        VectorControl           rq      1
; Special debug control registers.
        DebugControl            rq      1
        LastBranchToRip         rq      1
        LastBranchFromRip       rq      1
        LastExceptionToRip      rq      1
        LastExceptionFromRip    rq      1
;;;;;;;;;;;;; Extra Stuff ;;;;;;;;;;;;;;;;;;;;;;;;;;;;  
        XShitToInit             rb      48      ; 0x4D0
        XSaveHeader             rb      64      ; 0x500
        YmmUpper                rb      16*16   ; 0x540
                                rb      256     ; 0x640, spare room after the part described above
ends                                            ; 0x740




;Structure Context

;----------------------------------------------- 
section '.text' code readable executable 
;=============================================== 
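             push  rbp                          ; rsp is 16-byte aligned after this push
              mov  eax,1
            cpuid
              and  ecx,0x18000000               ; CPUID.1:ECX bit 27 = OSXSAVE, bit 28 = AVX
              cmp  ecx,0x18000000
              jne  exit
              mov  ecx,0                        ; select XCR0
           xgetbv
              and  eax,0x06                     ; bit 1 = SSE state, bit 2 = AVX state enabled by the OS
              cmp  eax,0x06
              jne  exit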



        invoke  CreateThread,0,0,AVXThread,0,CREATE_SUSPENDED,ThreadID                ; Test thread. 
        mov     [hThread],rax 
        invoke  ResumeThread,[hThread] 
        invoke  Sleep,50                                                              ; Give the thread a chance to switch. 




;        invoke  InitializeContext,InitContext,100040h,AlignedContext,BytesRequired    ; Yes, 100040h.!
;        cmp     rax,0                                                                 ; If 0 then check buffer big enough
;        jz      exit                                                                  ; by check BytesRequired.
;        mov     rbx,[AlignedContext]                                                  ; Our Aligned Context pointer.
;        invoke  SetXStateFeaturesMask,rbx,7,0                                         ; Still need to set this.!
;        cmp     rax,0
;        jz      exit
;;;;;;;;;;;;;;;;;;; achieved with below

        mov     dword[ThreadContext.ContextFlags],0x0010005F          ; let's get everything
        vpxor    xmm0,xmm0,xmm0
        vmovdqa  xword[ThreadContext.XShitToInit+16*0],xmm0                            ;
        vmovdqa  yword[ThreadContext.XShitToInit+16*1],ymm0                            ; 48 bytes
        vmovdqa  yword[ThreadContext.XSaveHeader+16*0],ymm0                             ;
        vmovdqa  yword[ThreadContext.XSaveHeader+16*2],ymm0                             ; 64 bytes
        mov     eax,0xFFFFFB30
        mov     dword[ThreadContext.XShitToInit+4*0],eax
        mov     dword[ThreadContext.XShitToInit+4*1],0x0640                             ; the size of context struct ?
        mov     dword[ThreadContext.XShitToInit+4*2],eax
        mov     dword[ThreadContext.XShitToInit+4*3],0x04D0                             ; offset of ShitToInit ?
        mov     dword[ThreadContext.XShitToInit+4*4],0x0030
        mov     dword[ThreadContext.XShitToInit+4*5],0x0140                             ; Size of XSave struct ?
        mov     dword[ThreadContext.XSaveHeader+4*0],0x0004


;        invoke  GetXStateFeaturesMask,rbx,XState
;        cmp     rax,0
;        jz      exit                                                                  ; Function failed.
;        cmp     [XState],7                                                            ; See if now AVX ready.
;        jne     exit
;;;;;;;;;;;;;;;;;;;;;; achieved with cpuid



        invoke  GetThreadContext,[hThread],ThreadContext
        cmp     rax,0 
        jz      exit 


;        invoke  LocateXStateFeature,rbx,2,Length                                      ; Not so sure about this.
;        cmp     rax,0                                                                 ; See what you think.
;        jz      exit
;        mov     rdi,rax                                                               ; Pointer to start of upper Ymm.
;;;;;;;;;;;;;;; upon return, rdi = rbx + 0x540


; let's read ymm15
        invoke  wsprintf,Buff,wsformat,\ 
                qword[ThreadContext.FltSave.XmmRegisters+16*15],qword[ThreadContext.FltSave.XmmRegisters+16*15+8],\
                qword[ThreadContext.YmmUpper+16*15],qword[ThreadContext.YmmUpper+16*15+8]
        invoke  MessageBox,0,Buff,Tit,0 

 exit: 
        invoke  ExitProcess,0 

;----------------------------------------------- 
 proc   AVXThread 
;=============================================== 
        push    rbx 
        mov     rbx,3                                                                 ; Number of loops before terminating AVXThread. 
 @@: 
        vlddqu  ymm0,[ymmreg]                                                         ; Store some numbers. 
        vlddqu  ymm1,[ymmreg] 
        vlddqu  ymm2,[ymmreg] 
        vlddqu  ymm3,[ymmreg] 
        vlddqu  ymm4,[ymmreg] 
        vlddqu  ymm5,[ymmreg] 
        vlddqu  ymm6,[ymmreg] 
        vlddqu  ymm7,[ymmreg] 
        vlddqu  ymm8,[ymmreg] 
        vlddqu  ymm9,[ymmreg] 
        vlddqu  ymm10,[ymmreg] 
        vlddqu  ymm11,[ymmreg] 
        vlddqu  ymm12,[ymmreg] 
        vlddqu  ymm13,[ymmreg] 
        vlddqu  ymm14,[ymmreg] 
        vlddqu  ymm15,[ymmreg] 
        invoke  Sleep,200 
        dec     rbx 
        jnz     @b 
        pop     rbx 
        ret 
 endp 
;----------------------------------------------- 
section '.data' data readable writeable 
;===============================================

ThreadContext            CONTEXT64_WITH_XSTATE

  ;InitContext           rb 1000h
  ymmreg:               dq 1111222233334444h,5555666677778888h,99990000aaaabbbbh,0ccccddddeeeeffffh
;  AlignedContext        dq ?
;  BytesRequired         dq 1000h
;  XState                dq ?
;  Length                dq ?
;  YmmUpperStart         dq ?
  ThreadID              dq ? 
  hThread               dq ? 
  Tit                   db 'XSAVE',0 
  wsformat              db 'Ymm15: %016I64X%016I64X%016I64X%016I64X',0
  Buff                  rb 100h 
;---------------------------------------------- 
section '.idata' import data readable writeable 
;=============================================== 
     library kernel32,'KERNEL32.DLL',\ 
               user32,'USER32.DLL';,\
    ;           kern32,'KERNEL32.DLL'
             include 'api\kernel32.inc' 
             include 'api\user32.inc' 

    ; import  kern32,\
    ;        GetEnabledXStateFeatures,'GetEnabledXStateFeatures',\
    ;        InitializeContext,'InitializeContext',\
    ;        GetXStateFeaturesMask,'GetXStateFeaturesMask',\
    ;        SetXStateFeaturesMask,'SetXStateFeaturesMask',\
    ;        LocateXStateFeature,'LocateXStateFeature'    
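
The six dwords written into XShitToInit line up with the CONTEXT_EX header that winnt.h declares (three offset/length pairs: All, Legacy, XState), which may be what the "?" comments are guessing at. A minimal sketch under that assumption; the field names Offs and Len are shortened for illustration and are not the SDK spellings, and the struct only covers the first 24 of the 48 bytes:

```asm
; Assumption: XShitToInit overlays the CONTEXT_EX header from winnt.h.
; Offsets are signed and relative to the start of CONTEXT_EX (0x4D0 in the struct above).
struct  CONTEXT_CHUNK
        Offs                    rd      1       ; signed byte offset from CONTEXT_EX
        Len                     rd      1       ; length in bytes
ends

struct  CONTEXT_EX
        All                     CONTEXT_CHUNK   ; -0x4D0, 0x640: CONTEXT + header + XSAVE area
        Legacy                  CONTEXT_CHUNK   ; -0x4D0, 0x4D0: the plain CONTEXT
        XState                  CONTEXT_CHUNK   ;  0x030, 0x140: XSAVE header (64) + YmmUpper (256)
ends
```

Read this way, 0xFFFFFB30 is just -0x4D0 stored as a dword, and 0x4D0 + 0x30 lands on XSaveHeader at 0x500.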
Post 25 May 2011, 13:56
Alphonso



Joined: 16 Jan 2007
Posts: 295
Alphonso 26 May 2011, 10:49
tthsqe wrote:
I took out all of the unnecessary kernel calls involved in initializing the structure. Alphonso, see if it works for you.
It works for me, good job :)

That SetXStateFeaturesMask call is most likely just SetXStateFeaturesMask,pContext,[XstateMask]: the mask is 64-bit, which is why the 32-bit call requires an extra dword.

Not sure why MS are taking so long to sort out AVX support. You'd think that if they are going to tell you which functions are required, they would at least post the API function descriptions on the online MSDN. Seems they have had their own problems as well; wish I could download the fix for it. :(


tthsqe wrote:
Also, to follow up on your question on losing the upper bits of ymm0-ymm5:
your LOWER 128 bits are getting clobbered to 0 by the call to Sleep (or print in that case); the UPPER ones are intact. Mine are also zero.
Yes, it was the lower bits (xmm), not the upper bits. I must have been having a bad case of aixelsyd that day :D
Thanks for pointing that out, I've corrected the post to say lower bits.
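
One way to check that it is the API call, rather than the context capture, that zeroes the low halves is to have the test thread spin instead of calling Sleep. A rough sketch, assuming the ymmreg data from the posted code; AVXSpinThread is a made-up name, and the thread never returns, so it only ends when the process exits:

```asm
 proc   AVXSpinThread                   ; hypothetical replacement for AVXThread
        vlddqu  ymm0,[ymmreg]           ; load the test pattern once
        vlddqu  ymm15,[ymmreg]
  .spin:
        pause                           ; spin-wait hint
        jmp     .spin                   ; no API call here, so nothing touches xmm0-xmm5
 endp
```

xmm0-xmm5 are volatile in the Win64 calling convention, so any API call is free to overwrite them; a thread that makes no calls should show the loaded pattern in both halves.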


Last edited by Alphonso on 26 May 2011, 10:58; edited 1 time in total
Post 26 May 2011, 10:49
Feryno



Joined: 23 Mar 2005
Posts: 516
Location: Czech republic, Slovak republic
Feryno 26 May 2011, 10:58
the extended context must be aligned at 40h (because of xstate)
erasing the low parts of ymm0-ymm5 is normal, as the OS destroys xmm0-xmm5 in API calls; the damage is done when the thread calls Sleep. if you replace the call to Sleep with jmp $-2, I'm sure the registers stay intact
the context flag really must be CONTEXT_AMD64 or 40h, which contradicts SDK 7.1, where the mask should be ... or 20h (the mask with 20h is refused by InitializeContext; only the mask with 40h passes InitializeContext)
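
The 40h alignment can be had either from the assembler or at run time. A minimal sketch of both, assuming the CONTEXT64_WITH_XSTATE struct from tthsqe's post; AlignedThreadContext and RawBuffer are made-up names for illustration:

```asm
; Static case (data section): ask the assembler for the boundary.
align 64                                        ; 40h boundary; XSaveHeader at +0x500 is then 40h-aligned too
  AlignedThreadContext  CONTEXT64_WITH_XSTATE   ; hypothetical label, struct from tthsqe's post

; Dynamic case (code section): round a pointer up inside an over-sized buffer.
        lea     rax,[RawBuffer+3Fh]             ; RawBuffer is hypothetical, with at least 3Fh spare bytes
        and     rax,-40h                        ; clear the low 6 bits -> next 40h boundary
        mov     [AlignedContext],rax
```

When InitializeContext is available it hands back an already aligned pointer, so the manual rounding would only matter for a hand-built context.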

it is possible to produce a single exe that handles both situations, with and without AVX, by a simple design, no matter whether the CPU or the OS version is old:
Code:
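lea rdx,[string_InitializeContext]      ; name of the function to look up
mov rcx,[hModule]                       ; module handle (presumably kernel32.dll)
call [GetProcAddress]
mov [InitializeContext_address],rax     ; 0 if this OS does not export the function


.....
lea r9,[BytesRequired]                  ; in/out: buffer length
lea r8,[AlignedContext]                 ; receives the aligned context pointer
mov edx,100040h                         ; CONTEXT_AMD64 or 40h
lea rcx,[context]                       ; raw buffer
mov rax,[InitializeContext_address]
or rax,rax
jz fall_back_to_legacy_way ; skip AVX and go to call GetThreadContext
call rax
...

; data section
string_InitializeContext db 'InitializeContext',0
InitializeContext_address dq ?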
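
The 100040h passed in edx can be read as a combination of winnt.h flag bits. A sketch of the constants in fasm syntax; the values are those of current winnt.h, and the name CONTEXT_ALL_XSTATE is invented here for the 10005Fh that tthsqe's version writes into ContextFlags. If the include files already define any of these names, the duplicates can be dropped:

```asm
CONTEXT_AMD64           = 100000h
CONTEXT_CONTROL         = CONTEXT_AMD64 or 01h
CONTEXT_INTEGER         = CONTEXT_AMD64 or 02h
CONTEXT_SEGMENTS        = CONTEXT_AMD64 or 04h
CONTEXT_FLOATING_POINT  = CONTEXT_AMD64 or 08h
CONTEXT_DEBUG_REGISTERS = CONTEXT_AMD64 or 10h
CONTEXT_XSTATE          = CONTEXT_AMD64 or 40h  ; = 100040h, the value InitializeContext accepts

; hypothetical convenience name: all legacy parts plus xstate = 10005Fh
CONTEXT_ALL_XSTATE      = CONTEXT_CONTROL or CONTEXT_INTEGER or CONTEXT_SEGMENTS or \
                          CONTEXT_FLOATING_POINT or CONTEXT_DEBUG_REGISTERS or CONTEXT_XSTATE
```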
Post 26 May 2011, 10:58
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 26 May 2011, 16:17
Good stuff. Everything is working on Vista AND Win7 (using the kernel functions with 40h). I haven't applied the hotfix though. I'll post the AVX compatible fdbg when I polish the disassembler and it becomes clear if we need to use 20h or 40h.
Debugging the disassembler for the debugger in itself does not work very well. :)
Post 26 May 2011, 16:17
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.