flat assembler
Message board for the users of flat assembler.
Alphonso 22 May 2011, 10:31
Don't know if this helps as you still need to know the memory location of xsave[opt].
Code:
    format PE GUI 4.0
    include 'win32a.inc'
    ;-----------------------------------------------
    section '.text' code readable executable
    ;===============================================
    ; check for AVX b4 running this!
    vlddqu xmm0,[xmmreg]    ; 128bit
    vlddqu ymm1,[ymmreg]    ; 256 bit
    mov eax,-1              ; lazy, not the right way
    mov edx,-1              ;
tthsqe 22 May 2011, 11:50
Yes - I read the documentation and am aware of where they should be. The problem is that after a call to
GetThreadContext thread_handle, context_structure
the ymm fields in the structure are 0, even though the registers in the debuggee contain nonzero values. Is a different API function required?
Alphonso 22 May 2011, 13:14
tthsqe 22 May 2011, 13:55
Been there too.
I would like to know how these functions work (arguments, return values):
InitializeContext
CopyContext
but I couldn't find any documentation. Could it be that the context structure needs to be initialized in a certain way so that the ymm states will be filled in upon a call to GetThreadContext? I guess I will try experimenting by guessing the function parameters...
These ones probably don't do anything that cpuid can't:
LocateXStateFeature
GetXStateFeaturesMask
SetXStateFeaturesMask
vid 22 May 2011, 18:33
Do you call GetThreadContext with CONTEXT_FULL flag set? Check out Windows SDK headers for other CONTEXT_### flags.
tthsqe 23 May 2011, 00:51
Vanilla fdbg uses CONTEXT_ALL, which is giving me the problems discussed above. I am in the process of downloading the Windows SDK right now - I hope it has all of the answers.
Definitions:
Code:
    CONTEXT_AMD64 = 100000h
    CONTEXT_CONTROL = CONTEXT_AMD64 or 01h
    CONTEXT_INTEGER = CONTEXT_AMD64 or 02h
    CONTEXT_SEGMENTS = CONTEXT_AMD64 or 04h
    CONTEXT_FLOATING_POINT = CONTEXT_AMD64 or 08h
    CONTEXT_DEBUG_REGISTERS = CONTEXT_AMD64 or 10h
    CONTEXT_FULL = CONTEXT_CONTROL or CONTEXT_INTEGER or CONTEXT_FLOATING_POINT
    CONTEXT_ALL = CONTEXT_CONTROL or CONTEXT_INTEGER or CONTEXT_SEGMENTS or CONTEXT_FLOATING_POINT or CONTEXT_DEBUG_REGISTERS
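As a quick sanity check on these definitions, the combined masks can be worked out numerically. The sketch below uses Python purely as a calculator; the constant names mirror the fasm definitions, and the helper `requests` is a hypothetical name introduced only for this illustration. It suggests that CONTEXT_ALL comes to 10001Fh, so bits 20h and 40h are not part of it and would have to be OR-ed in separately.

```python
# ContextFlags values for AMD64, copied from the definitions above.
CONTEXT_AMD64 = 0x100000
CONTEXT_CONTROL = CONTEXT_AMD64 | 0x01
CONTEXT_INTEGER = CONTEXT_AMD64 | 0x02
CONTEXT_SEGMENTS = CONTEXT_AMD64 | 0x04
CONTEXT_FLOATING_POINT = CONTEXT_AMD64 | 0x08
CONTEXT_DEBUG_REGISTERS = CONTEXT_AMD64 | 0x10

CONTEXT_FULL = CONTEXT_CONTROL | CONTEXT_INTEGER | CONTEXT_FLOATING_POINT
CONTEXT_ALL = (CONTEXT_CONTROL | CONTEXT_INTEGER | CONTEXT_SEGMENTS
               | CONTEXT_FLOATING_POINT | CONTEXT_DEBUG_REGISTERS)


def requests(flags, feature):
    """Return True if every bit of `feature` is present in `flags`.

    Hypothetical helper, not part of any Windows API.
    """
    return (flags & feature) == feature


if __name__ == "__main__":
    print(f"CONTEXT_FULL = {CONTEXT_FULL:06X}h")
    print(f"CONTEXT_ALL  = {CONTEXT_ALL:06X}h")
    # Check whether the two candidate extra bits are already covered.
    for extra in (0x20, 0x40):
        covered = requests(CONTEXT_ALL, extra)
        print(f"bit {extra:02X}h covered by CONTEXT_ALL: {covered}")
```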
Feryno 23 May 2011, 07:38
Looking hard for the header files, but still unable to find them (my SDK and WDK are about 1 year old, from the RTM version, not for SP1 yet).
I disassembled something: to obtain ymm, the mask is perhaps CONTEXT_AMD64 or 20h, or perhaps CONTEXT_AMD64 or 40h, but it is even more complicated - see the link posted by Alphonso.
Madis731 23 May 2011, 08:23
tthsqe - a dumb question: do you happen to have a non-Windows 7 OS?
wiki wrote:
Feryno 23 May 2011, 12:45
It is possible to download Win 2008 Server R2 SP1 directly from Microsoft; it expires in 10 days:
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=ba571339-5436-4cf5-9c37-6ed7dab6f781&displaylang=en
If you patch the boot sector (booting via BIOS, not EFI) and you don't let the OS run for more than a few hours, then you always have the same date and the OS never expires. It only requires patching about 30 bytes of the boot sector to set the same date on every boot by programming the CMOS ports, e.g. overwrite some useless string in the boot sector with such useful instructions. You can't sort new files by date then, but...
tthsqe 24 May 2011, 02:44
Ok. I found in the SDK (7.1):
CONTEXT_XSAVE = CONTEXT_AMD64 | 0x020
So I do
Code:
    mov [ThrdContext.ContextFlags],CONTEXT_ALL or 0x20
and then call GetThreadContext. According to the SDK this should return a CONTEXT_EX structure. I do not know the size of the structure, so I reserved 8 KB for it. The problem is that these 8 KB come back the same whether or not I include the flag 0x020 or write different values to the ymm registers. I am just going to assume it is broken and wait for more documentation.
In the meantime, I'll improve other parts of fdbg.
Feryno 24 May 2011, 07:34
This is correct, as you already wrote:
CONTEXT_XSAVE = CONTEXT_AMD64 | 0x020
For 32 bits (32-bit Windows, or maybe the emulated 32-bit subsystem of 64-bit Windows?) the mask is 40h.
Try these extra things:
zero the whole xsave area
set the first byte of the xsave header to 7 (111b = enabled_YMM or enabled_XMM or enabled_FPU); it is the byte at +512 from the start of the FPU/SSE area
I don't plan to obtain an AVX CPU, but I have studied this feature a lot and I'll try to help you as much as possible (I have installed an AVX-capable Windows, so I may trace the kernel and find out some secrets; it will be a bit complicated with an improper CPU, but I may try it anyway).
Bit 2 of the first byte of the xsave header must be set to 1 to allow the XSAVE instruction to save YMM_H into the area (the high halves of YMM; for Intel they should be at +64 bytes after the start of the xsave header = 64 bytes after the value you set to 7).
Bit 1 set to 1 stores XMM (the low halves of YMM).
Bit 0 set to 1 stores FPU.
The mask is then 111b = 7. Put the byte 7 into the first byte of the xsave header, which is +512 bytes after the start of the FPU area (the FPU control word is at offset 0 of that area; MXCSR is at +18h); the rest of the xsave area should be zeroed.
If that doesn't help, then perhaps call kernel32.dll GetXStateFeaturesMask, set bit 2 of the result to 1 (maybe better to OR it with 7 to be sure), and then pass it to SetXStateFeaturesMask.
Do you have an AMD or Intel CPU?
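The offsets in this post can be put into numbers. The following is a minimal sketch in Python (used only as a calculator) that assumes the standard XSAVE layout described above: 512 bytes of FPU/SSE state, a 64-byte header, then the upper halves of ymm0..ymm15. The 576-byte offset of the upper halves is the figure given for Intel; in general the CPU reports it through CPUID leaf 0Dh, sub-leaf 2, so treat the constant as an assumption. All names are made up for this illustration.

```python
# Standard XSAVE area layout as described in the post above.
LEGACY_AREA_SIZE = 512     # FPU/SSE state, including xmm0..xmm15
XSAVE_HEADER_SIZE = 64     # header; its first byte holds the feature bits
YMM_HI128_SIZE = 16 * 16   # upper 128 bits of ymm0..ymm15

XSAVE_HEADER_OFFSET = LEGACY_AREA_SIZE
# Assumption: upper halves start 64 bytes after the header start.
YMM_HI128_OFFSET = XSAVE_HEADER_OFFSET + XSAVE_HEADER_SIZE

# Feature bits in the first byte of the header.
FEATURE_FPU = 1 << 0
FEATURE_XMM = 1 << 1
FEATURE_YMM = 1 << 2


def ymm_hi128_offset(register):
    """Offset of the upper half of ymm<register> inside the xsave area."""
    if not 0 <= register <= 15:
        raise ValueError("register must be in 0..15")
    return YMM_HI128_OFFSET + 16 * register


def make_xsave_area():
    """Zeroed xsave area with FPU, XMM and YMM requested in the header."""
    area = bytearray(YMM_HI128_OFFSET + YMM_HI128_SIZE)
    area[XSAVE_HEADER_OFFSET] = FEATURE_FPU | FEATURE_XMM | FEATURE_YMM
    return area


if __name__ == "__main__":
    area = make_xsave_area()
    print(f"header at +{XSAVE_HEADER_OFFSET}, mask byte = {area[512]}")
    print(f"ymm0 upper half at +{ymm_hi128_offset(0)}")
    print(f"total size = {len(area)} bytes ({len(area):X}h)")
```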
tthsqe 25 May 2011, 00:28
I modified the calls to GetThreadContext as:
Code:
    align 16
    GetThreadContext_debuggee_CONTEXT_ALL:
    ; input:  ECX dwThreadId
    ;         ThreadContext buffer
    ; output: set Carry Flag if error
        push rax rcx rdx r8 r9 r10 r11
        sub rsp,8*(4+0)
    virtual at rdx
    ThrdContext CONTEXT64
    end virtual
        mov eax,ecx
        call TID2hThread
        jc GetThreadContext_debuggee_CONTEXT_ALL_epilogue
        mov rcx,rdx
        lea rdx,[ThreadContext]
        mov dword[ThrdContext.ContextFlags],CONTEXT_ALL or 0x20
        xor eax,eax
        mov qword[ThrdContext.FltSave+512+8*0],rax ; clear the header of X_SAVE
        mov qword[ThrdContext.FltSave+512+8*1],rax ; (the first 64 bytes)
        mov qword[ThrdContext.FltSave+512+8*2],rax ;
        mov qword[ThrdContext.FltSave+512+8*3],rax ;
        mov qword[ThrdContext.FltSave+512+8*4],rax ;
        mov qword[ThrdContext.FltSave+512+8*5],rax ;
        mov qword[ThrdContext.FltSave+512+8*6],rax ;
        mov qword[ThrdContext.FltSave+512+8*7],rax ;
        add eax,7
        mov byte[ThrdContext.FltSave+512],al ; enabled_YMM | enabled_XMM | enabled_FPU
        call qword [GetThreadContext]
        push rax
        push rsi
        ; dump the returned buffer to a file for inspection
        invoke CreateFileA,write_filename,GENERIC_WRITE,0,0,CREATE_ALWAYS,0,0
        mov rsi,rax
        invoke WriteFile,rsi,ThreadContext,8*1024,temp,0
        invoke CloseHandle,rsi
        pop rsi
        pop rax
    ; If the function succeeds, the return value is nonzero. If the function fails, the return value is zero.
        sub eax,1 ; set Carry flag if API return 0
    GetThreadContext_debuggee_CONTEXT_ALL_epilogue:
        lea rsp,[rsp+8*(4+0)] ; this doesn't touch carry flag instead of ADD
        pop r11 r10 r9 r8 rdx rcx rax
        ret
But still the whole output file consists of NULLs after the 07h that I set. Seems to not work. If I knew the parameters to GetXStateFeaturesMask etc., I could try this as well. Also, aren't we overlapping the Vector registers and Special debug control registers part of the context structure?
Code:
    struct CONTEXT64
    ; Register parameter home addresses.
    ; N.B. These fields are for convenience - they could be used to extend the context record in the future.
      P1Home rq 1
      P2Home rq 1
      P3Home rq 1
      P4Home rq 1
      P5Home rq 1
      P6Home rq 1
    ; Control flags.
      ContextFlags rd 1
      MxCsr rd 1
    ; Segment Registers and processor flags.
      SegCs rw 1
      SegDs rw 1
      SegEs rw 1
      SegFs rw 1
      SegGs rw 1
      SegSs rw 1
      ...
      ...
      ...
      R15 rq 1
    ; Program counter.
      Rip rq 1
    ; Floating point state.
      FltSave XMM_SAVE_AREA32
    ; Vector registers.
      VectorRegister rb 16*26
      VectorControl rq 1
    ; Special debug control registers.
      DebugControl rq 1
      LastBranchToRip rq 1
      LastBranchFromRip rq 1
      LastExceptionToRip rq 1
      LastExceptionFromRip rq 1
    ends
The funny thing is that the upper 128 bits of the ymm registers work correctly (see commented line). Since fdbg doesn't even know about them, something else is at work here. This is the test code I am running:
Code:
    format PE64 GUI 4.0
    include 'win64a.inc'
    ;-----------------------------------------------
    section '.text' code readable executable
    ;===============================================
    vmovapd ymm0,[ymmreg]
    vmovapd ymm1,[ymmreg]
    vmovapd ymm15,[ymmreg]
    and rsp,-32
    sub rsp,32
    vmovapd [rsp],ymm0 ; all 32 correct bytes of ymm0 are now visible on the stack in fdbg
    invoke ExitProcess,0
    ;-----------------------------------------------
    section '.data' data readable writeable
    ;===============================================
    ymmreg: dq 1111222233334444h,5555666677778888h,99990000aaaabbbbh,0ccccddddeeeeffffh
    ;----------------------------------------------
    section '.idata' import data readable writeable
    ;===============================================
    dd 0,0,0,RVA kernel_name,RVA kernel_table
    dd 0,0,0,0,0
    kernel_table:
    ExitProcess dq RVA _ExitProcess
    dq 0
    kernel_name db 'KERNEL32.DLL',0
    _ExitProcess dw 0
    db 'ExitProcess',0
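On the overlap question, working out the field offsets suggests the answer is yes: FltSave is 512 bytes long, so FltSave+512 is the first byte of VectorRegister. The sketch below uses Python only as a calculator. The fields elided with "..." in the structure above are filled in from the AMD64 CONTEXT layout in the SDK header, so that part is an assumption of this example, and `layout` is a hypothetical helper.

```python
# Field sizes in bytes for the AMD64 CONTEXT record, in declaration order.
# The EFlags, Dr* and integer-register fields are elided in the post above
# and are filled in here from the SDK layout (assumption).
SEGMENTS = ("SegCs", "SegDs", "SegEs", "SegFs", "SegGs", "SegSs")
DEBUG_REGS = ("Dr0", "Dr1", "Dr2", "Dr3", "Dr6", "Dr7")
INTEGER_REGS = ("Rax", "Rcx", "Rdx", "Rbx", "Rsp", "Rbp", "Rsi", "Rdi",
                "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15")
TRAILING = ("DebugControl", "LastBranchToRip", "LastBranchFromRip",
            "LastExceptionToRip", "LastExceptionFromRip")

FIELDS = (
    [(f"P{i}Home", 8) for i in range(1, 7)]
    + [("ContextFlags", 4), ("MxCsr", 4)]
    + [(name, 2) for name in SEGMENTS]
    + [("EFlags", 4)]
    + [(name, 8) for name in DEBUG_REGS]
    + [(name, 8) for name in INTEGER_REGS]
    + [("Rip", 8), ("FltSave", 512)]
    + [("VectorRegister", 16 * 26), ("VectorControl", 8)]
    + [(name, 8) for name in TRAILING]
)


def layout(fields):
    """Return ({field name: offset}, total size) for packed fields."""
    offsets, position = {}, 0
    for name, size in fields:
        offsets[name] = position
        position += size
    return offsets, position


OFFSETS, CONTEXT_SIZE = layout(FIELDS)

if __name__ == "__main__":
    for name in ("ContextFlags", "Rip", "FltSave", "VectorRegister"):
        print(f"{name:15s} +{OFFSETS[name]:03X}h")
    print(f"FltSave+512     +{OFFSETS['FltSave'] + 512:03X}h")
    print(f"sizeof(CONTEXT) {CONTEXT_SIZE:03X}h")
```

If these offsets are right, a CONTEXT64 buffer ends at 4D0h and has no room for an xsave header, so whatever extended state the OS returns would have to live past the end of the legacy structure.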
Feryno 25 May 2011, 06:13
Hi, please wait, I'll download SDK 7.1 (I saw it months ago, but its release date confused me: the 2010 date is older than the SP1 release in 2011, so I had ignored it until now, but it really is version 7.1, which is a sign of SP1).
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=35aeda01-421d-4ba5-b44b-543dc8c33a20
http://download.microsoft.com/download/F/1/0/F10113F5-B750-4969-A255-274341AC6BCE/GRMSDKX_EN_DVD.iso
It seems to be necessary to go the official way as MS described and to use these 2 calls:
InitializeContext
CopyContext
The low halves of YMM should be aliased in the XMM registers; the upper halves should be at higher offsets in the context.
I'll extract the necessary things from the SDK, single-step the kernel, and prepare something (to know how to pass params to the above APIs, and the structure of the extended context). Then you'll have to test that (I have only an AVX OS, not an AVX CPU).
See this (Windows initializes a lot of things and does aligning also):
Code:
    ntdll.RtlInitializeContext:
    000000007708D850 48895C2408  mov [rsp+08],rbx
    000000007708D855 48896C2410  mov [rsp+10],rbp
    000000007708D85A 4889742418  mov [rsp+18],rsi
    000000007708D85F 57  push rdi
    000000007708D860 4883EC20  sub rsp,20
    000000007708D864 488B442450  mov rax,[rsp+50]
    000000007708D869 498BF1  mov rsi,r9
    000000007708D86C 498BE8  mov rbp,r8
    000000007708D86F 488BFA  mov rdi,rdx
    000000007708D872 A80F  test al,0F
    000000007708D874 740B  jz 000000007708D881
    000000007708D876 B9090000C0  mov ecx,C0000009
    000000007708D87B E840FFFFFF  call 000000007708D7C0 ; ntdll.RtlRaiseStatus
    000000007708D880 CC  int3
    000000007708D881 4883627800  and qword [rdx+78],00
    000000007708D886 4883A2A000000000  and qword [rdx+000000A0],00
    000000007708D88E 41B800020000  mov r8d,00000200
    000000007708D894 44894244  mov [rdx+44],r8d
    000000007708D898 48C7829000000001000000  mov qword [rdx+00000090],00000001
    000000007708D8A3 48898298000000  mov [rdx+00000098],rax
    000000007708D8AA 48C782A800000004000000  mov qword [rdx+000000A8],00000004
    000000007708D8B5 48C782B000000005000000  mov qword [rdx+000000B0],00000005
    000000007708D8C0 48C782B800000008000000  mov qword [rdx+000000B8],00000008
    000000007708D8CB 48C782C80000000A000000  mov qword [rdx+000000C8],0000000A
    000000007708D8D6 48C782D00000000B000000  mov qword [rdx+000000D0],0000000B
    000000007708D8E1 48C782D80000000C000000  mov qword [rdx+000000D8],0000000C
    000000007708D8EC 48C782E00000000D000000  mov qword [rdx+000000E0],0000000D
    000000007708D8F7 48C782E80000000E000000  mov qword [rdx+000000E8],0000000E
    000000007708D902 48C782F00000000F000000  mov qword [rdx+000000F0],0000000F
    000000007708D90D 488D8F00010000  lea rcx,[rdi+00000100]
    000000007708D914 33D2  xor edx,edx
    000000007708D916 E8B555F8FF  call 0000000077012ED0 ; ntdll.memset
    000000007708D91B 488B5C2430  mov rbx,[rsp+30]
    000000007708D920 B87F020000  mov eax,0000027F ; FPU control word default value
    000000007708D925 41BB801F0000  mov r11d,00001F80 ; MxCsr default value
    000000007708D92B 66898700010000  mov [rdi+00000100],ax ; control word
    000000007708D932 48B8708090A0C0D0E0F0  mov rax,F0E0D0C0A0908070
    000000007708D93C 4889B7F8000000  mov [rdi+000000F8],rsi
    000000007708D943 488B742440  mov rsi,[rsp+40]
    000000007708D948 4889AF80000000  mov [rdi+00000080],rbp
    000000007708D94F 488B6C2438  mov rbp,[rsp+38]
    000000007708D954 488987C0000000  mov [rdi+000000C0],rax
    000000007708D95B 44895F34  mov [rdi+34],r11d ; Context.MxCsr
    000000007708D95F 44899F18010000  mov [rdi+00000118],r11d ; Context.FltSave.MxCsr
    000000007708D966 C747300B001000  mov dword [rdi+30],0010000B ; Context.ContextFlags
    000000007708D96D 4883C420  add rsp,20
    000000007708D971 5F  pop rdi
    000000007708D972 C3  ret
Code:
    ntdll.RtlInitializeExtendedContext:
    0000000077066D40 48895C2408  mov [rsp+08],rbx
    0000000077066D45 4889742410  mov [rsp+10],rsi
    0000000077066D4A 57  push rdi
    0000000077066D4B 4883EC20  sub rsp,20
    0000000077066D4F 448BD2  mov r10d,edx
    0000000077066D52 4C8BD9  mov r11,rcx
    0000000077066D55 488D542448  lea rdx,[rsp+48]
    0000000077066D5A 418BCA  mov ecx,r10d
    0000000077066D5D 498BF0  mov rsi,r8
    0000000077066D60 33FF  xor edi,edi
    0000000077066D62 E8C9FDFFFF  call 0000000077066B30
    0000000077066D67 85C0  test eax,eax
    0000000077066D69 0F88CC000000  js 0000000077066E3B
    0000000077066D6F 418BD2  mov edx,r10d
    0000000077066D72 81E200000100  and edx,00010000
    0000000077066D78 7411  jz 0000000077066D8B
    0000000077066D7A 498D4B03  lea rcx,[r11+03]
    0000000077066D7E 4883E1FC  and rcx,FFFFFFFFFFFFFFFC ; align context at dword (align 4)
    0000000077066D82 488DB9CC020000  lea rdi,[rcx+000002CC]
    0000000077066D89 EB32  jmp 0000000077066DBD
    0000000077066D8B 410FBAE214  bt r10d,14
    0000000077066D90 7315  jnc 0000000077066DA7
    0000000077066D92 498D4B0F  lea rcx,[r11+0F]
    0000000077066D96 4883E1F0  and rcx,FFFFFFFFFFFFFFF0 ; align Context at dqword (align 10h)
    0000000077066D9A 44895130  mov [rcx+30],r10d ; Context.ContextFlags
    0000000077066D9E 488DB9D0040000  lea rdi,[rcx+000004D0]
    0000000077066DA5 EB19  jmp 0000000077066DC0
    0000000077066DA7 410FBAE213  bt r10d,13
    0000000077066DAC 7319  jnc 0000000077066DC7
    0000000077066DAE 498D4B0F  lea rcx,[r11+0F]
    0000000077066DB2 4883E1F0  and rcx,FFFFFFFFFFFFFFF0
    0000000077066DB6 488DB9700A0000  lea rdi,[rcx+00000A70]
    0000000077066DBD 448911  mov [rcx],r10d
    0000000077066DC0 8BC7  mov eax,edi
    0000000077066DC2 2BC1  sub eax,ecx
    0000000077066DC4 89470C  mov [rdi+0C],eax
    0000000077066DC7 8B4F0C  mov ecx,[rdi+0C]
    0000000077066DCA 8BC1  mov eax,ecx
    0000000077066DCC F7D8  neg eax
    0000000077066DCE 894708  mov [rdi+08],eax
    0000000077066DD1 8907  mov [rdi],eax
    0000000077066DD3 8D4118  lea eax,[rcx+18]
    0000000077066DD6 894704  mov [rdi+04],eax
    0000000077066DD9 85D2  test edx,edx
    0000000077066DDB 7414  jz 0000000077066DF1
    0000000077066DDD B820000100  mov eax,00010020 ; CONTEXT_i386 or 20h (CONTEXT_EXTENDED_REGISTERS); this path is taken only for 32-bit flags
    0000000077066DE2 4423D0  and r10d,eax
    0000000077066DE5 443BD0  cmp r10d,eax
    0000000077066DE8 7407  jz 0000000077066DF1
    0000000077066DEA C7470CCC000000  mov dword [rdi+0C],000000CC
    0000000077066DF1 F644244802  test byte [rsp+48],02
    0000000077066DF6 7433  jz 0000000077066E2B
    0000000077066DF8 33D2  xor edx,edx
    0000000077066DFA 488D5F57  lea rbx,[rdi+57]
    0000000077066DFE 4883E3C0  and rbx,FFFFFFFFFFFFFFC0
    0000000077066E02 448D4240  lea r8d,[rdx+40]
    0000000077066E06 488BCB  mov rcx,rbx
    0000000077066E09 E8C2C0FAFF  call 0000000077012ED0 ; ntdll.memset
    0000000077066E0E 2BDF  sub ebx,edi
    0000000077066E10 895F10  mov [rdi+10],ebx
    0000000077066E13 8B0425E803FE7F  mov eax,[7FFE03E8] ; []=00000240
    0000000077066E1A 0500FEFFFF  add eax,FFFFFE00
    0000000077066E1F 894714  mov [rdi+14],eax
    0000000077066E22 2B07  sub eax,[rdi]
    0000000077066E24 03C3  add eax,ebx
    0000000077066E26 894704  mov [rdi+04],eax
    0000000077066E29 EB0B  jmp 0000000077066E36
    0000000077066E2B 83671400  and dword [rdi+14],00
    0000000077066E2F C7471019000000  mov dword [rdi+10],00000019
    0000000077066E36 48893E  mov [rsi],rdi
    0000000077066E39 33C0  xor eax,eax
    0000000077066E3B 488B5C2430  mov rbx,[rsp+30]
    0000000077066E40 488B742438  mov rsi,[rsp+38]
    0000000077066E45 4883C420  add rsp,20
    0000000077066E49 5F  pop rdi
    0000000077066E4A C3  ret

    0000000077066B30 0FBAE110  bt ecx,10
    0000000077066B34 4C8BC2  mov r8,rdx
    0000000077066B37 7308  jnc 0000000077066B41
    0000000077066B39 F7C180FFFEFF  test ecx,FFFEFF80
    0000000077066B3F 741C  jz 0000000077066B5D
    0000000077066B41 0FBAE114  bt ecx,14
    0000000077066B45 7308  jnc 0000000077066B4F
    0000000077066B47 F7C1A0FFEF27  test ecx,27EFFFA0
    0000000077066B4D 740E  jz 0000000077066B5D
    0000000077066B4F 0FBAE113  bt ecx,13
    0000000077066B53 734C  jnc 0000000077066BA1
    0000000077066B55 F7C1C0FFF727  test ecx,27F7FFC0
    0000000077066B5B 7544  jnz 0000000077066BA1
    0000000077066B5D 41B940000100  mov r9d,00010040
    0000000077066B63 8BC1  mov eax,ecx
    0000000077066B65 BA01000000  mov edx,00000001
    0000000077066B6A 4123C1  and eax,r9d
    0000000077066B6D 413BC1  cmp eax,r9d
    0000000077066B70 740B  jz 0000000077066B7D
    0000000077066B72 B840001000  mov eax,00100040
    0000000077066B77 23C8  and ecx,eax
    0000000077066B79 3BC8  cmp ecx,eax
    0000000077066B7B 7519  jnz 0000000077066B96
    0000000077066B7D 48F70425E003FE7FFCFFFFFF  test qword [7FFE03E0],FFFFFFFC ; []=0000000000000000
    0000000077066B89 7506  jnz 0000000077066B91
    0000000077066B8B B8BB0000C0  mov eax,C00000BB
    0000000077066B90 C3  ret
    0000000077066B91 BA03000000  mov edx,00000003
    0000000077066B96 4D85C0  test r8,r8
    0000000077066B99 7403  jz 0000000077066B9E
    0000000077066B9B 418910  mov [r8],edx
    0000000077066B9E 33C0  xor eax,eax
    0000000077066BA0 C3  ret
    0000000077066BA1 B80D0000C0  mov eax,C000000D
    0000000077066BA6 C3  ret
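One way to read the two routines above is as a table lookup on the architecture bit in ContextFlags. The Python sketch below mirrors that reading. Every number is taken from the listing, but the interpretation is mine and not confirmed by any documentation: the architecture labels in the comments, and the idea that the `test ecx,...` masks list the bits that must be clear. `flags_accepted` and `legacy_placement` are hypothetical names.

```python
# (architecture bit, bits that must be clear, legacy CONTEXT size, alignment)
# Numbers come from the listing above; the architecture labels are a guess.
ARCH_RULES = (
    (1 << 0x10, 0xFFFEFF80, 0x2CC, 4),    # 32-bit x86
    (1 << 0x14, 0x27EFFFA0, 0x4D0, 16),   # AMD64
    (1 << 0x13, 0x27F7FFC0, 0xA70, 16),   # third architecture (IA-64?)
)


def flags_accepted(context_flags):
    """Mirror the flag checks in the helper at 0000000077066B30."""
    return any((context_flags & arch_bit) and not (context_flags & forbidden)
               for arch_bit, forbidden, _size, _align in ARCH_RULES)


def legacy_placement(buffer_address, context_flags):
    """Return (aligned CONTEXT address, address just past it), or None.

    Follows the bit-10h / bit-14h / bit-13h dispatch order of the listing.
    """
    for arch_bit, _forbidden, size, align in ARCH_RULES:
        if context_flags & arch_bit:
            base = (buffer_address + align - 1) & ~(align - 1)
            return base, base + size
    return None


if __name__ == "__main__":
    for flags in (0x100020, 0x100040, 0x10005F, 0x010040):
        print(f"{flags:06X}h accepted: {flags_accepted(flags)}")
    base, end = legacy_placement(0x1008, 0x100040)
    print(f"CONTEXT at {base:X}h, extension record at {end:X}h")
```

Under this reading, an AMD64 mask containing 20h fails the check while one containing 40h passes it.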
Alphonso 25 May 2011, 08:26
tthsqe wrote:
Seems to be something like...
Code:
    GetXStateFeaturesMask,pContext,pXState      ; returns true if okay
                                                ; and XState of the context structure
    SetXStateFeaturesMask,pContext,[Xstate],0   ; Not sure about the last one (0)
                                                ; Returns true if it thinks it's successful
SetXStateFeaturesMask appears to need some other bits set in the Context, maybe this is what InitializeContext does???
Edit: Okay, InitializeContext makes things much easier; at least in 32-bit I now have the upper bits of the ymm's. I'll try 64-bit and post some code.
Alphonso 25 May 2011, 12:43
Okay, hope this helps. Feel free to make corrections.
Code:
    format PE64 GUI 6.0
    include 'win64a.inc'
    ;-----------------------------------------------
    section '.text' code readable executable
    ;===============================================
    sub rsp,5*8
    invoke InitializeContext,InitContext,100040h,AlignedContext,BytesRequired ; Yes, 100040h.!
    cmp rax,0
    jz exit                         ; If 0 then check buffer big enough
                                    ; by check BytesRequired.
    mov rbx,[AlignedContext]        ; Our Aligned Context pointer.
    invoke CreateThread,0,0,AVXThread,0,CREATE_SUSPENDED,ThreadID ; Test thread.
    mov [hThread],rax
    invoke ResumeThread,[hThread]
    invoke Sleep,50                 ; Give the thread a chance to switch.
    invoke SetXStateFeaturesMask,rbx,7,0 ; Still need to set this.!
    cmp rax,0
    jz exit
    invoke GetXStateFeaturesMask,rbx,XState
    cmp rax,0
    jz exit                         ; Function failed.
    cmp [XState],7                  ; See if now AVX ready.
    jne exit
    invoke GetThreadContext,[hThread],rbx
    cmp rax,0
    jz exit
    invoke LocateXStateFeature,rbx,2,Length ; Not so sure about this.
    cmp rax,0                       ; See what you think.
    jz exit
    mov rdi,rax                     ; Pointer to start of upper Ymm.
    invoke wsprintf,Buff,wsformat,\
           qword[rbx+200h],qword[rbx+208h],\ ; Lower half; note +200h is the Xmm6 slot (Xmm0 is at +1A0h).
           qword[rdi],qword[rdi+8h]          ; Upper Ymm0.
    invoke MessageBox,0,Buff,Tit,0
    exit:
    invoke ExitProcess,0
    ;-----------------------------------------------
    proc AVXThread
    ;===============================================
    push rbx
    mov rbx,3                       ; Number of loops before terminating AVXThread.
    @@:
    vlddqu ymm0,[ymmreg]            ; Store some numbers.
    vlddqu ymm1,[ymmreg]
    vlddqu ymm2,[ymmreg]
    vlddqu ymm3,[ymmreg]
    vlddqu ymm4,[ymmreg]
    vlddqu ymm5,[ymmreg]
    vlddqu ymm6,[ymmreg]
    vlddqu ymm7,[ymmreg]
    vlddqu ymm8,[ymmreg]
    vlddqu ymm9,[ymmreg]
    vlddqu ymm10,[ymmreg]
    vlddqu ymm11,[ymmreg]
    vlddqu ymm12,[ymmreg]
    vlddqu ymm13,[ymmreg]
    vlddqu ymm14,[ymmreg]
    vlddqu ymm15,[ymmreg]
    invoke Sleep,200
    dec rbx
    jnz @b
    pop rbx
    ret
    endp
    ;-----------------------------------------------
    section '.data' data readable writeable
    ;===============================================
    InitContext rb 1000h
    ymmreg: dq 1111222233334444h,5555666677778888h,99990000aaaabbbbh,0ccccddddeeeeffffh
    AlignedContext dq ?
    BytesRequired dq 1000h
    XState dq ?
    Length dq ?
    YmmUpperStart dq ?
    ThreadID dq ?
    hThread dq ?
    Tit db 'XSAVE',0
    wsformat db 'Ymm0 : %016I64X%016I64X%016I64X%016I64X',0
    Buff rb 100h
    ;----------------------------------------------
    section '.idata' import data readable writeable
    ;===============================================
    library kernel32,'KERNEL32.DLL',\
            user32,'USER32.DLL',\
            kern32,'KERNEL32.DLL'
    include 'api\kernel32.inc'
    include 'api\user32.inc'
    import kern32,\
           GetEnabledXStateFeatures,'GetEnabledXStateFeatures',\
           InitializeContext,'InitializeContext',\
           GetXStateFeaturesMask,'GetXStateFeaturesMask',\
           SetXStateFeaturesMask,'SetXStateFeaturesMask',\
           LocateXStateFeature,'LocateXStateFeature'
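The message box in this program prints four qwords in memory order, lowest first: two from the XMM slot in the legacy area and two from the upper-half area returned by LocateXStateFeature. To compare that against a register value written most significant qword first, the four pieces can be joined as in the Python sketch below. `join_ymm` is a hypothetical helper and the test values are the `ymmreg` data from the program.

```python
MASK64 = (1 << 64) - 1


def join_ymm(xmm_low_qword, xmm_high_qword, upper_low_qword, upper_high_qword):
    """Combine four qwords (lowest first) into one 256-bit integer."""
    value = 0
    pieces = (xmm_low_qword, xmm_high_qword, upper_low_qword, upper_high_qword)
    for index, qword in enumerate(pieces):
        value |= (qword & MASK64) << (64 * index)
    return value


# The four qwords at ymmreg, in memory order.
YMMREG = (0x1111222233334444, 0x5555666677778888,
          0x99990000AAAABBBB, 0xCCCCDDDDEEEEFFFF)

if __name__ == "__main__":
    ymm = join_ymm(*YMMREG)
    print(f"ymm (high to low) = {ymm:064X}")
    print(f"xmm (low half)    = {ymm & ((1 << 128) - 1):032X}")
    print(f"upper half        = {ymm >> 128:032X}")
```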
tthsqe 25 May 2011, 13:56
It works!!! Way to go Alphonso!
You must be good at analysing that kernel code.
↑↑↑ This was my initial reaction ↓↓↓
But then I realised: Why the *** would M$ make you go through so many kernel functions that are included ONLY in the latest service pack for the latest version of Windows? Unless there is some kind of conditional import function, one would need two versions of the .exe file...
The answer to the thread's title is NO!
EDIT: or maybe yes. I took out all of the unnecessary kernel calls involved in initializing the structure. Alphonso, see if it works for you.
Also, to follow up on your question on losing the upper bits of ymm0-ymm5: your LOWER 128 bits are getting clobbered to 0 by the call to Sleep (or print in that case); the UPPER ones are intact. Mine are also zero.
Code:
    format PE64 GUI 6.0
    include 'win64a.inc'

    struct XMM_SAVE_AREA32
      ControlWord rw 1
      StatusWord rw 1
      TagWord rb 1
      Reserved1 rb 1
      ErrorOpcode rw 1
      ErrorOffset rd 1
      ErrorSelector rw 1
      Reserved2 rw 1
      DataOffset rd 1
      DataSelector rw 1
      Reserved3 rw 1
      MxCsr rd 1
      MxCsr_Mask rd 1
      FloatRegisters rb 8*16
      XmmRegisters rb 16*16         ; 0x1A0
      Reserved4 rb 96
    ends

    struct CONTEXT64_WITH_XSTATE
    ; Register parameter home addresses.
      P1Home rq 1                   ; 0x000
      P2Home rq 1
      P3Home rq 1
      P4Home rq 1
      P5Home rq 1
      P6Home rq 1
    ; Control flags.
      ContextFlags rd 1             ; 0x030
      MxCsr rd 1
    ; Segment Registers and processor flags.
      SegCs rw 1
      SegDs rw 1
      SegEs rw 1
      SegFs rw 1
      SegGs rw 1
      SegSs rw 1
      EFlags rd 1
    ; Debug registers.
      Dr0 rq 1
      Dr1 rq 1
      Dr2 rq 1
      Dr3 rq 1
      Dr6 rq 1
      Dr7 rq 1
    ; Integer registers.
      Rax rq 1
      Rcx rq 1
      Rdx rq 1
      Rbx rq 1
      Rsp rq 1
      Rbp rq 1
      Rsi rq 1
      Rdi rq 1
      R8 rq 1
      R9 rq 1
      R10 rq 1
      R11 rq 1
      R12 rq 1
      R13 rq 1
      R14 rq 1
      R15 rq 1
    ; Program counter.
      Rip rq 1                      ; 0x0F8
    ; Floating point state.
      FltSave XMM_SAVE_AREA32       ; 0x100
    ; Vector registers.
      VectorRegister rb 16*26       ; 0x300
      VectorControl rq 1
    ; Special debug control registers.
      DebugControl rq 1
      LastBranchToRip rq 1
      LastBranchFromRip rq 1
      LastExceptionToRip rq 1
      LastExceptionFromRip rq 1
    ;;;;;;;;;;;;; Extra Stuff ;;;;;;;;;;;;;;;;;;;;;;;;;;;;
      XShitToInit rb 48             ; 0x4D0
      XSaveHeader rb 64             ; 0x500
      YmmUpper rb 16*16             ; 0x540
      rb 256
    ends                            ; 0x640
    ;Structure Context
    ;-----------------------------------------------
    section '.text' code readable executable
    ;===============================================
    push rbp
    mov eax,1
    cpuid
    and ecx,0x18000000
    cmp ecx,0x18000000
    jne exit
    mov ecx,0
    xgetbv
    and eax,0x06
    cmp eax,0x06
    jne exit
    invoke CreateThread,0,0,AVXThread,0,CREATE_SUSPENDED,ThreadID ; Test thread.
    mov [hThread],rax
    invoke ResumeThread,[hThread]
    invoke Sleep,50                 ; Give the thread a chance to switch.
    ; invoke InitializeContext,InitContext,100040h,AlignedContext,BytesRequired ; Yes, 100040h.!
    ; cmp rax,0                     ; If 0 then check buffer big enough
    ; jz exit                       ; by check BytesRequired.
    ; mov rbx,[AlignedContext]      ; Our Aligned Context pointer.
    ; invoke SetXStateFeaturesMask,rbx,7,0 ; Still need to set this.!
    ; cmp rax,0
    ; jz exit
    ;;;;;;;;;;;;;;;;;;; achieved with below
    mov dword[ThreadContext.ContextFlags],0x0010005F ; let's get everything
    vpxor xmm0,xmm0,xmm0
    vmovdqa xword[ThreadContext.XShitToInit+16*0],xmm0 ;
    vmovdqa yword[ThreadContext.XShitToInit+16*1],ymm0 ; 48 bytes
    vmovdqa yword[ThreadContext.XSaveHeader+16*0],ymm0 ;
    vmovdqa yword[ThreadContext.XSaveHeader+16*2],ymm0 ; 64 bytes
    mov eax,0xFFFFFB30
    mov dword[ThreadContext.XShitToInit+4*0],eax
    mov dword[ThreadContext.XShitToInit+4*1],0x0640 ; the size of context struct ?
    mov dword[ThreadContext.XShitToInit+4*2],eax
    mov dword[ThreadContext.XShitToInit+4*3],0x04D0 ; offset of ShitToInit ?
    mov dword[ThreadContext.XShitToInit+4*4],0x0030
    mov dword[ThreadContext.XShitToInit+4*5],0x0140 ; Size of XSave struct ?
    mov dword[ThreadContext.XSaveHeader+4*0],0x0004
    ; invoke GetXStateFeaturesMask,rbx,XState
    ; cmp rax,0
    ; jz exit                       ; Function failed.
    ; cmp [XState],7                ; See if now AVX ready.
    ; jne exit
    ;;;;;;;;;;;;;;;;;;;;;; achieved with cpuid
    invoke GetThreadContext,[hThread],ThreadContext
    cmp rax,0
    jz exit
    ; invoke LocateXStateFeature,rbx,2,Length ; Not so sure about this.
    ; cmp rax,0                     ; See what you think.
    ; jz exit
    ; mov rdi,rax                   ; Pointer to start of upper Ymm.
    ;;;;;;;;;;;;;;; upon return, rdi = rbx + 0x540
    ; let's read ymm15
    invoke wsprintf,Buff,wsformat,\
           qword[ThreadContext.FltSave.XmmRegisters+16*15],qword[ThreadContext.FltSave.XmmRegisters+16*15+8],\
           qword[ThreadContext.YmmUpper+16*15],qword[ThreadContext.YmmUpper+16*15+8]
    invoke MessageBox,0,Buff,Tit,0
    exit:
    invoke ExitProcess,0
    ;-----------------------------------------------
    proc AVXThread
    ;===============================================
    push rbx
    mov rbx,3                       ; Number of loops before terminating AVXThread.
    @@:
    vlddqu ymm0,[ymmreg]            ; Store some numbers.
    vlddqu ymm1,[ymmreg]
    vlddqu ymm2,[ymmreg]
    vlddqu ymm3,[ymmreg]
    vlddqu ymm4,[ymmreg]
    vlddqu ymm5,[ymmreg]
    vlddqu ymm6,[ymmreg]
    vlddqu ymm7,[ymmreg]
    vlddqu ymm8,[ymmreg]
    vlddqu ymm9,[ymmreg]
    vlddqu ymm10,[ymmreg]
    vlddqu ymm11,[ymmreg]
    vlddqu ymm12,[ymmreg]
    vlddqu ymm13,[ymmreg]
    vlddqu ymm14,[ymmreg]
    vlddqu ymm15,[ymmreg]
    invoke Sleep,200
    dec rbx
    jnz @b
    pop rbx
    ret
    endp
    ;-----------------------------------------------
    section '.data' data readable writeable
    ;===============================================
    ThreadContext CONTEXT64_WITH_XSTATE
    ;InitContext rb 1000h
    ymmreg: dq 1111222233334444h,5555666677778888h,99990000aaaabbbbh,0ccccddddeeeeffffh
    ; AlignedContext dq ?
    ; BytesRequired dq 1000h
    ; XState dq ?
    ; Length dq ?
    ; YmmUpperStart dq ?
    ThreadID dq ?
    hThread dq ?
    Tit db 'XSAVE',0
    wsformat db 'Ymm15: %016I64X%016I64X%016I64X%016I64X',0
    Buff rb 100h
    ;----------------------------------------------
    section '.idata' import data readable writeable
    ;===============================================
    library kernel32,'KERNEL32.DLL',\
            user32,'USER32.DLL';,\
    ;       kern32,'KERNEL32.DLL'
    include 'api\kernel32.inc'
    include 'api\user32.inc'
    ; import kern32,\
    ;        GetEnabledXStateFeatures,'GetEnabledXStateFeatures',\
    ;        InitializeContext,'InitializeContext',\
    ;        GetXStateFeaturesMask,'GetXStateFeaturesMask',\
    ;        SetXStateFeaturesMask,'SetXStateFeaturesMask',\
    ;        LocateXStateFeature,'LocateXStateFeature'
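The six dwords written by hand to XShitToInit become easier to read if they are taken as three (offset, length) pairs, with the offsets signed and relative to the start of that 24-byte record. That matches the shape of the SDK's CONTEXT_EX record, with chunks named All, Legacy and XState, but mapping it onto this code is my assumption and not something confirmed in the thread. A small Python sketch of the decoding, with hypothetical helper names:

```python
import struct

CONTEXT_EX_OFFSET = 0x4D0   # where XShitToInit starts in the structure above

# The six dwords the code above stores at XShitToInit+4*0 .. +4*5.
RAW = struct.pack("<6I", 0xFFFFFB30, 0x0640, 0xFFFFFB30, 0x04D0, 0x0030, 0x0140)


def decode_chunks(blob):
    """Decode three (signed offset, unsigned length) pairs (assumed layout)."""
    values = struct.unpack("<iIiIiI", blob[:24])
    names = ("All", "Legacy", "XState")   # assumed chunk names
    return {name: (values[2 * i], values[2 * i + 1])
            for i, name in enumerate(names)}


def absolute_ranges(chunks, base=CONTEXT_EX_OFFSET):
    """Turn record-relative chunks into (start, end) offsets in the buffer."""
    return {name: (base + offset, base + offset + length)
            for name, (offset, length) in chunks.items()}


if __name__ == "__main__":
    chunks = decode_chunks(RAW)
    for name, (start, end) in absolute_ranges(chunks).items():
        offset, length = chunks[name]
        print(f"{name:6s} offset {offset:+#x} length {length:#x} "
              f"-> bytes {start:#x}..{end:#x}")
```

Read this way, FFFFFB30h is simply -4D0h: the whole buffer and the legacy CONTEXT both start 4D0h bytes before the record, and the xsave header plus upper ymm halves occupy 500h..640h, which agrees with the offsets in the structure comments.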
Alphonso 26 May 2011, 10:49
tthsqe wrote: I took out all of the unnecessary kernel calls involved in initializing the structure. Alphonso, see if it works for you.
That SetXStateFeaturesMask call will likely be just SetXStateFeaturesMask,pContext,[XstateMask]: if the mask is 64-bit, that would be why the 32-bit call requires an extra dword.
Not sure why MS are taking so long to sort out AVX support. You'd think that if they are going to tell you what functions are required, they would at least post the API function descriptions on the online MSDN. Seems they have had their own problems as well; wish I could download the fix for it.
tthsqe wrote: Also, to follow up on your question on losing the upper bits of ymm0-ymm5,
Thanks for pointing that out, I've corrected the post for lower bits.
Last edited by Alphonso on 26 May 2011, 10:58; edited 1 time in total
Feryno 26 May 2011, 10:58
The extended context must be aligned at 40h (because of xstate).
Erasing the low parts of ymm0-ymm5 is normal, as the OS destroys xmm0-xmm5 in API calls; the damage is done when the thread calls Sleep. If you replace the call to Sleep with jmp $-2, I'm sure the registers will stay intact.
The context flag really must be CONTEXT_AMD64 or 40h, which contradicts SDK 7.1, where the mask should be ... or 20h (the mask with 20h is refused by InitializeContext; only the mask with 40h passes InitializeContext).
It is possible to produce a single exe capable of handling both situations, with and without AVX, by simple design, no matter whether the CPU or the OS version is old:
Code:
    lea rdx,[string_InitializeContext]
    mov rcx,[hModule]
    call [GetProcAddress]
    mov [InitializeContext_address],rax
    .....
    lea r9,[BytesRequired]
    lea r8,[AlignedContext]
    mov edx,100040h
    lea rcx,[context]
    mov rax,[InitializeContext_address]
    or rax,rax
    jz fall_back_to_legacy_way      ; skip AVX and go to call GetThreadContext
    call rax
    ...
    data section
    string_InitializeContext db 'InitializeContext',0
    InitializeContext_address dq ?
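For the 40h alignment requirement, the usual trick when the buffer is not already aligned is to reserve a little extra and round the start address up. A minimal sketch of the arithmetic in Python (used only as a calculator; the helper names are hypothetical, and 640h is the total size worked out earlier in the thread):

```python
def align_up(address, alignment=0x40):
    """Round `address` up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (address + alignment - 1) & ~(alignment - 1)


def worst_case_buffer(size, alignment=0x40):
    """Bytes to reserve so that an aligned block of `size` always fits."""
    return size + alignment - 1


if __name__ == "__main__":
    raw = 0x403008                      # example unaligned buffer address
    print(f"raw {raw:X}h -> aligned {align_up(raw):X}h")
    print(f"reserve {worst_case_buffer(0x640):X}h bytes for a 640h context")
```

This is the same add-then-mask pattern as the `lea rbx,[rdi+57]` / `and rbx,FFFFFFFFFFFFFFC0` pair in the disassembly earlier in the thread.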
tthsqe 26 May 2011, 16:17
Good stuff. Everything is working on Vista AND Win7 (using the kernel functions with 40h). I haven't applied the hotfix though. I'll post the AVX compatible fdbg when I polish the disassembler and it becomes clear if we need to use 20h or 40h.
Debugging the disassembler for the debugger in itself does not work very well.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.