flat assembler
Message board for the users of flat assembler.
Feryno 19 Nov 2024, 16:31
Hi, I abandoned 32 bits about 15-20 years ago.
But in the proc DispatchControl you are using the ESI register. Shouldn't you save and restore it with push esi in the proc prologue \ pop esi in the proc epilogue? Or at least you can try another register like EDX. You are also touching the EBX register without saving/restoring it.
If you have signature issues (Vista and newer versions) you can press F8 during boot and choose Disable driver signature enforcement (and on the latest W10, W11 also disable something like memory protection; IIRC, I forgot its exact name). If you need an install image of XP x64 let me know. I like it as it boots extremely fast and does not yet include driver signatures.
32 bits are almost dead today, I suggest you switch to x64.
Core i7 19 Nov 2024, 19:54
@Feryno, I have Win7 x64, but it gives a BSOD. That's why the tests are on XP x32.
I sign the driver with the DSEO program; it immediately sets the bcdedit options as in the picture. For x64, we align the stack with "sub rsp, 8" on entry. Is it necessary to do this for drivers? I tried it inside DriverEntry(), but then the 2 parameters are ahead of me like that. Or is the stack already aligned in the kernel, unlike userSpace?
Code:
    push  rbp
    mov   rbp,rsp
    sub   rsp,8
    ......
The archive contains my source code for x64 and the signed driver. If it's not too much trouble, maybe you could check it yourself? The archive also contains a crash dump from the MiniDump folder. Thank you!
Core i7 19 Nov 2024, 20:48
Tried pusha/popa - KeGetCurrentIrql() on XP x32 still stops.
Now I looked in the debugger, and it turns out the HAL works via ACPI. Is this normal?
Code:
    kd> !lmi hal.dll
    Loaded Module Info: [hal.dll]
             Module: halaacpi   <--------//
       Base Address: 806ec000
         Image Name: halaacpi.dll
       Machine Type: 332 (I386)
         Time Stamp: 47f3693c Wed Apr 02 15:08:44 2008
               Size: 20380
           CheckSum: 226f6
    Characteristics: 212e
    Debug Data Dirs: Type  Size     VA  Pointer
                 CODEVIEW    25,   768,     768 RSDS - GUID: {CBF22E4D-CBE4-4036-8359-B5229D03A3AB}
                   Age: 1, Pdb: halaacpi.pdb
         Image Type: MEMORY   - Image read successfully from loaded memory.
        Symbol Type: PDB      - Symbols loaded successfully from symbol server.
                     c:\symbols\halaacpi.pdb\CBF22E4DCBE440368359B5229D03A3AB1\halaacpi.pdb
        Load Report: public symbols, not source indexed
                     c:\symbols\halaacpi.pdb\CBF22E4DCBE440368359B5229D03A3AB1\halaacpi.pdb
    kd>
Feryno 20 Nov 2024, 06:11
Hi, the 32-bit version gets the IRQL from the APIC page, as you disassembled:
Quote:
    806ee2e8 a18000feff mov eax,dword ptr ds:[FFFE0080h]

On x64 the stack is a little more complicated. Yes, you need to align the stack to 10h, because the kernel uses XMM registers with the faster aligned instructions like movdqa [rsp+30h],xmm0.

Another thing with the stack: you have to reserve 4 more qwords. I expect that in early versions of the OS all the parameters were passed on the stack (as in 32 bits), but later the developers wanted to speed things up and switched to passing the first 4 parameters in registers (RCX, RDX, R8, R9). That freed the lowest 4 qwords on the stack, and the kernel uses that space for storing some registers ("red area"; Microsoft's x64 calling convention documentation calls it the home or shadow space). The kernel does things like:
Code:
    krnl_proc:
        sub   rsp,28h
        mov   [rsp+40h],rbx    ; save RBX in the "red area"
        ...
        mov   rbx,[rsp+40h]
        add   rsp,28h
        ret

Another thing: prologues like push rbp \ mov rbp,rsp are useless and only waste processor time and stack space. You do not need RBP for accessing the stack, because x64 allows using RSP directly. An x64 procedure changes the stack in its prologue (a few push instructions and one sub rsp instruction) and then the rest of the proc does not change RSP at all (except when executing a call instruction, which of course changes RSP), so the RBP register is free to use. So the x64 kernel procedures look like:
Code:
    ; prologue:
        push  rbx rsi rdi
        sub   rsp,60h
    ; the procedure itself:
        ...
    ; epilogue:
        add   rsp,60h
        pop   rdi rsi rbx
        ret

You can learn a lot by disassembling ntoskrnl.exe and any sys files (drivers).

When you load your dmp file into kd/windbg and issue the command !analyze -v, the interesting finding is here (the stack trace; the same can be obtained with the command k):
Code:
    STACK_TEXT:
    fffff880`02f96618 fffff800`0297c132 : fffff880`00000000 00000000`0000001c 00000000`00000000 00000000`000007ff : nt!memmove+0x1e5
    fffff880`02f96620 fffff800`0297bfeb : fffffa80`02a3f5e0 fffff880`031ae274 fffff880`00000001 fffff8a0`021e3df0 : nt!ObpCaptureObjectName+0x102
    fffff880`02f966a0 fffff800`0297fe35 : fffffa80`012bcb40 fffffa80`012b0240 00000000`0000006c fffff800`028f48a8 : nt!ObpCaptureObjectCreateInformation+0x279
    fffff880`02f96720 fffff800`028f31be : 00000000`00000000 00000000`00000001 fffff880`031ae274 00000000`00000022 : nt!ObCreateObject+0x75
    fffff880`02f96790 fffff880`031ae361 : fffff880`02f96938 00000000`20200001 00000000`0000000d fffff880`031ae316 : nt!IoCreateDevice+0x16e
    fffff880`02f96910 fffff880`02f96938 : 00000000`20200001 00000000`0000000d fffff880`031ae316 00000000`00000100 : Win64_Legacy+0x361
    fffff880`02f96918 00000000`20200001 : 00000000`0000000d fffff880`031ae316 00000000`00000100 00000000`00000000 : 0xfffff880`02f96938
    fffff880`02f96920 00000000`0000000d : fffff880`031ae316 00000000`00000100 00000000`00000000 fffff880`031ae28c : 0x20200001
    fffff880`02f96928 fffff880`031ae316 : 00000000`00000100 00000000`00000000 fffff880`031ae28c fffffa80`012ea0b8 : 0xd
    fffff880`02f96930 00000000`00000100 : 00000000`00000000 fffff880`031ae28c fffffa80`012ea0b8 00000000`00000000 : Win64_Legacy+0x316
    fffff880`02f96938 00000000`00000000 : fffff880`031ae28c fffffa80`012ea0b8 00000000`00000000 fffff800`02a6e477 : 0x100

The crash is in the memmove proc, which got a wrong input param, certainly a pointer to some unallocated memory. That happened while executing IoCreateDevice, which was called from your driver, and the call should return to Win64_Legacy+0x361 (the stack trace shows the return addresses left on the stack by all the call instructions).

There is something wrong with how you passed the input params:
Code:
    .text:000000000010031A   sub   rsp, 40h
    .text:000000000010031E   mov   rcx, [rbp+DriverObject]   ; DriverObject
    .text:0000000000100322   mov   rdx, 0                    ; DeviceExtensionSize
    .text:0000000000100329   mov   r8, offset DeviceName     ; DeviceName
    .text:0000000000100333   mov   r9, 22h                   ; DeviceType
    .text:000000000010033A   mov   qword ptr [rsp+40h+DeviceCharacteristics], 100h ; DeviceCharacteristics
    .text:0000000000100343   mov   qword ptr [rsp+40h+Exclusive], 0 ; Exclusive
    .text:000000000010034C   mov   rax, (offset SymbolicLinkName.Buffer+4)
    .text:0000000000100356   mov   [rsp+40h+DeviceObject], rax ; DeviceObject
    .text:000000000010035B   call  cs:IoCreateDevice

Especially these 2 lines look very strange to me (like a mistake):
Code:
    .text:000000000010034C   mov   rax, (offset SymbolicLinkName.Buffer+4)
    .text:0000000000100356   mov   [rsp+40h+DeviceObject], rax ; DeviceObject

It should be something like:
    lea rax,[rsp+...]   ; a pointer to 1 empty qword on the stack where you receive the device object on return from IoCreateDevice

It should point to this in your source code:
Code:
    pDevObj dq 0
but it points into the middle of the preceding
Code:
    plinkName UNICODE_STRING
so IoCreateDevice seems to destroy something in the unicode string by writing the DeviceObject there.

Btw, the unicode string for x64 should look like:
Code:
    struc UNICODE_STRING {
      .Length         rw 1
      .MaximumLength  rw 1   ; usually size in Length + 2
                      rd 1   ; padding
      .Buffer         rq 1   ; pointer to string
    }
Note that the 32-bit version does not use padding, and the buffer pointer is not a qword but only a dword.
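The padding rule for UNICODE_STRING described above can be checked with a small Python sketch. It is only an illustration under one assumption: that each scalar field is aligned to its own size, which is how the C compiler lays out this structure. The `layout` helper is hypothetical and not part of any SDK.

```python
def layout(fields, pointer_size):
    """Return ({name: offset}, total_size) using natural alignment.

    fields is a list of (name, size); the size "ptr" stands for a
    pointer-sized field (4 bytes on x86, 8 bytes on x64).
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size in fields:
        if size == "ptr":
            size = pointer_size
        align = size  # assumption: scalars align to their own size
        offset = (offset + align - 1) // align * align  # round up
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


UNICODE_STRING = [("Length", 2), ("MaximumLength", 2), ("Buffer", "ptr")]

x86_offsets, x86_size = layout(UNICODE_STRING, pointer_size=4)
x64_offsets, x64_size = layout(UNICODE_STRING, pointer_size=8)

print("x86:", x86_offsets, "size", x86_size)
print("x64:", x64_offsets, "size", x64_size)
```

On x64 the Buffer field lands at offset 8, so 4 bytes of padding sit between MaximumLength and Buffer. fasm does not insert that padding by itself, which is why the struc needs the explicit `rd 1`.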
Core i7 20 Nov 2024, 10:31
Feryno, great job - thanks for the technical explanation!
I tried "!analyze -v" in the debugger, but I didn't understand anything. For some reason I didn't pay attention to UNICODE_STRING - now I added "Padding", but the problem remains. And regarding stack alignment on entry, does it mean that DriverEntry() has to be implemented manually, without the "proc" macro? Apparently fasm is not designed for the x64 case where a procedure receives control directly at the entry point - it is not clear how to align the stack. I tried to align the stack like this - the crash dump still shows a similar "memmove" error:
Code:
    format pe64 wdm native 6.0 at 0x10000
    entry  @driverEntry
    ;....
    @driverEntry:
           sub   rsp,8          ; align
           mov   rax,[rsp+8]    ; rax = i/o manager retAddr
           mov   [rsp],rax      ; fix
           jmp   DriverEntry
           nop
    align 16
    proc   DriverEntry, pDrvObj, pRegPath, resvR8, resvR9
           mov   [pDrvObj], rcx
           mov   [pRegPath],rdx
           ;......
           ret
    endp
It's just inconvenient to access arguments in the code via rsp - it's nicer when aliases like pDrvObject, pRegPath, and others are assigned. Thank you again, there are no more questions for now.
Feryno 20 Nov 2024, 13:36
Yes, I do it manually, it generates nicer looking code.
No need to jump into the entry. Here is one of my drivers in disasm; note that all numbers are hexadecimal even though they do not end with "h":
Code:
    0000000000010240: push rbx                      ; push 1 reg, now the RSP is aligned at 10h
    0000000000010241: sub rsp,60                    ; reserve some stack space for local variables and input params (5th, 6th, 7th and so on) for kernel calls
    0000000000010245: lea rbx,[rcx]                 ; rbx = pDriverObject
    ; next few lines were deleted as they are my driver specific
    ; ...
    ; only the next is important
    000000000001027E: lea r8,[00000000000103A0]     ; DriverUnload
    0000000000010285: lea r9,[0000000000010430]     ; DispatchCreateClose
    000000000001028C: lea r10,[0000000000010450]    ; DispatchControl
    0000000000010293: mov [rbx+68],r8               ; [rbx + DriverObject.DriverUnload]
    0000000000010297: mov [rbx+70],r9               ; [rbx + DriverObject.MajorFunction + IRP_MJ_CREATE_OFFSET]
    000000000001029B: mov [rbx+00000080],r9         ; [rbx + DriverObject.MajorFunction + IRP_MJ_CLOSE_OFFSET]
    00000000000102A2: mov [rbx+000000E0],r10        ; [rbx + DriverObject.MajorFunction + IRP_MJ_DEVICE_CONTROL_OFFSET]
    00000000000102A9: lea rdx,[0000000000010330]    ; cusDevice_string
    00000000000102B0: lea rcx,[rsp+40]              ; UNICODE_STRING
    00000000000102B5: call qword [00000000000144F0] ; call [RtlInitUnicodeString]
    00000000000102BB: lea rax,[rsp+38]              ; pointer to 1 qword where we get DeviceObject
    00000000000102C0: mov [rsp+30],rax              ; 7th param for the call = pointer to DeviceObject
    00000000000102C5: and byte [rsp+28],00          ; 6th param for the call = FALSE
    00000000000102CA: and dword [rsp+20],00         ; 5th param for the call
    00000000000102CF: mov r9d,00000022              ; 4th param = FILE_DEVICE_UNKNOWN
    00000000000102D5: lea r8,[rsp+40]               ; 3rd param = pointer to UNICODE_STRING which was initialized previously
    00000000000102DA: xor edx,edx                   ; 2nd param = 0
    00000000000102DC: lea rcx,[rbx]                 ; 1st param = pDriverObject
    00000000000102DF: call qword [0000000000014440] ; call [IoCreateDevice]
    00000000000102E5: test eax,eax
    00000000000102E7: jnz 000000000001031E
    00000000000102E9: lea rdx,[0000000000010350]    ; cusSymbolicLink
    00000000000102F0: lea rcx,[rsp+50]              ; pointer to UNICODE_STRING
    00000000000102F5: call qword [00000000000144F0] ; call [RtlInitUnicodeString]
    00000000000102FB: lea rdx,[rsp+40]              ; pointer to UNICODE_STRING initialized previously
    0000000000010300: lea rcx,[rsp+50]
    0000000000010305: call qword [0000000000014448] ; call [IoCreateSymbolicLink]
    000000000001030B: test eax,eax
    000000000001030D: jz 000000000001031E
    000000000001030F: mov ebx,eax                   ; save error code
    0000000000010311: mov rcx,[rsp+38]              ; DeviceObject
    0000000000010316: call qword [0000000000014450] ; call [IoDeleteDevice]
    000000000001031C: mov eax,ebx                   ; error code into eax to return it to the kernel
    000000000001031E: add rsp,60
    0000000000010322: pop rbx
    0000000000010323: ret
That's the whole DriverEntry procedure with everything necessary. I can try to rewrite that into something more flexible...

I looked into your driver again, and these offsets are also not aligned:
Code:
    .text:0000000000100476   mov qword ptr [rsi+64h], offset sub_1004B0
    .text:000000000010047E   mov qword ptr [esi+6Ch], offset sub_100520
    .text:0000000000100487   mov qword ptr [esi+7Ch], offset sub_100590
    .text:0000000000100490   mov qword ptr [esi+0DCh], offset sub_100600
The offsets should be 68h, 70h, 80h, 0E0h. Also, the 2nd, 3rd and 4th should be addressed via RSI like the first one, not via ESI, which is the 64-bit pointer truncated to 32 bits.
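The alignment bookkeeping in the listing above can be followed with a few lines of Python. This is a rough sketch, not a model of any real tool: the entry value of RSP is hypothetical, and the only assumptions are the ones stated in the thread (RSP is 16-byte aligned before a call, the call pushes an 8-byte return address, and the first four arguments own the lowest four qwords of the frame).

```python
def rsp_after_prologue(entry_rsp, pushes, frame_bytes):
    """RSP after `pushes` 8-byte pushes and one `sub rsp, frame_bytes`."""
    return entry_rsp - 8 * pushes - frame_bytes


def is_aligned_16(rsp):
    """True when RSP is a multiple of 10h, as aligned XMM stores require."""
    return rsp % 16 == 0


def stack_arg_offset(n):
    """Offset from RSP, just before CALL, of the n-th argument (1-based)."""
    if n < 5:
        raise ValueError("arguments 1-4 travel in RCX, RDX, R8, R9")
    return 8 * (n - 1)  # slots 0..3 are the home space of the register args


# Hypothetical entry value: the caller was aligned, then CALL pushed
# the return address, so RSP % 16 == 8 on entry.
ENTRY_RSP = 0x1008

feryno = rsp_after_prologue(ENTRY_RSP, pushes=1, frame_bytes=0x60)  # push rbx / sub rsp,60h
first_try = rsp_after_prologue(ENTRY_RSP, pushes=1, frame_bytes=0x08)  # push rbp / sub rsp,8

print(is_aligned_16(feryno))
print(is_aligned_16(first_try))
print([hex(stack_arg_offset(n)) for n in (5, 6, 7)])
```

The offsets for the 5th, 6th and 7th arguments come out as 20h, 28h and 30h, which matches the `[rsp+20]`, `[rsp+28]` and `[rsp+30]` stores made before the IoCreateDevice call in the disassembly.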
Feryno 20 Nov 2024, 14:22
I created you a very simple x64 driver which you can modify as you like. I hope it will help you.
Core i7 20 Nov 2024, 15:56
Feryno wrote:
    the offsets should be 68h, 70h, 80h, 0E0h
Yes, that's because UNICODE_STRING was missing 4 bytes in the DRIVER_OBJECT struct. Now the offsets are correct. Also changed esi to rsi. Thank you very much for your example - very useful!
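A short Python sketch shows how the missing 4 bytes shift every later field. It sums field sizes the way a fasm struc does, with no automatic padding. The field list follows the x64 DRIVER_OBJECT from wdm.h; the assumption that all other padding in the original include file was already correct is mine, and the helper names are hypothetical.

```python
IRP_MJ_CREATE = 0x00
IRP_MJ_CLOSE = 0x02
IRP_MJ_DEVICE_CONTROL = 0x0E


def driver_object_offsets(unicode_string_size):
    """Offsets of x64 DRIVER_OBJECT fields for a given UNICODE_STRING size.

    Sizes are summed in order, as a fasm struc would do, so explicit
    padding entries are listed as fields of their own.
    """
    fields = [
        ("Type", 2), ("Size", 2), ("_pad0", 4),
        ("DeviceObject", 8),
        ("Flags", 4), ("_pad1", 4),
        ("DriverStart", 8),
        ("DriverSize", 4), ("_pad2", 4),
        ("DriverSection", 8),
        ("DriverExtension", 8),
        ("DriverName", unicode_string_size),  # embedded UNICODE_STRING
        ("HardwareDatabase", 8),
        ("FastIoDispatch", 8),
        ("DriverInit", 8),
        ("DriverStartIo", 8),
        ("DriverUnload", 8),
        ("MajorFunction", 8 * 28),  # IRP_MJ_MAXIMUM_FUNCTION + 1 entries
    ]
    offsets, offset = {}, 0
    for name, size in fields:
        offsets[name] = offset
        offset += size
    return offsets


def dispatch_offsets(unicode_string_size):
    """Offsets of DriverUnload and the three dispatch slots used in the thread."""
    o = driver_object_offsets(unicode_string_size)
    base = o["MajorFunction"]
    return [
        o["DriverUnload"],
        base + 8 * IRP_MJ_CREATE,
        base + 8 * IRP_MJ_CLOSE,
        base + 8 * IRP_MJ_DEVICE_CONTROL,
    ]


padded = dispatch_offsets(16)    # UNICODE_STRING with the 4-byte padding
unpadded = dispatch_offsets(12)  # UNICODE_STRING without it

print([hex(x) for x in padded])
print([hex(x) for x in unpadded])
```

With the 16-byte UNICODE_STRING the result is 68h, 70h, 80h, 0E0h; with the 12-byte one it is 64h, 6Ch, 7Ch, 0DCh, the same values that appeared in the disassembly of the broken driver.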
Core i7 22 Nov 2024, 06:28
In general, I managed to write a framework for the x64 driver.
It turns out that there is no need to align the stack with "sub rsp,8" on entry, because in dll/sys only whole "proc" procedures ever receive control, and they usually have arguments. As a result, the prologue will have "push rbp", which is equivalent to "sub rsp,8". When calling the next procedure, everything repeats, and the stack stays correct until the "ret" instruction. In x64 mode, the "invoke" macro counts the number of parameters for the function and allocates a frame of the required size on the stack itself. For example, if IoCreateDevice() has 7 parameters, then fasm will put "sub rsp,8*7" before the call and "add rsp,8*7" after it. If we want to get rid of the constant sub/add, we can place all function calls inside the "frame/endf" macros; then the frame will be global for the entire procedure, for example:
Code:
    ;//============= Default compiling
    .text:00000000000102C0   push  rbp                  <-------- sub rsp,8
    .text:00000000000102C1   mov   rbp, rsp
    .text:00000000000102C4   mov   [rbp+arg_0], rcx
    .text:00000000000102C8   mov   [rbp+arg_8], rdx
    .text:00000000000102C8
    .text:00000000000102CC   sub   rsp, 20h
    .text:00000000000102D0   mov   rcx, offset DestinationString ; DestinationString
    .text:00000000000102DA   mov   rdx, offset SourceString ; "\\Device\\TestDrv"
    .text:00000000000102E4   call  cs:RtlInitUnicodeString
    .text:00000000000102EA   add   rsp, 20h
    .text:00000000000102EE   sub   rsp, 20h
    .text:00000000000102F2   mov   rcx, offset stru_102A0 ; DestinationString
    .text:00000000000102FC   mov   rdx, offset aDosdevicesTest ; "\\DosDevices\\TestDrv"
    .text:0000000000010306   call  cs:RtlInitUnicodeString
    .text:000000000001030C   add   rsp, 20h

    ;//=================== Frame/Endf macro
    .text:00000000000102CC   sub   rsp, 20h
    .text:00000000000102D0   mov   rcx, offset DestinationString ; DestinationString
    .text:00000000000102DA   mov   rdx, offset SourceString ; "\\Device\\TestDrv"
    .text:00000000000102E4   call  cs:RtlInitUnicodeString
    .text:00000000000102EA   mov   rcx, offset stru_102A0 ; DestinationString
    .text:00000000000102F4   mov   rdx, offset aDosdevicesTest ; "\\DosDevices\\TestDrv"
    .text:00000000000102FE   call  cs:RtlInitUnicodeString
    .text:0000000000010304   add   rsp, 20h

Here is the code of my driver - it works fine, only DispatchUnload fails. On x32 this design worked, but on x64 for some reason there is an error. I tried passing the parameters via mov/lea, but with no result. What is interesting is that in x64 mode the KeGetCurrentIrql() function causes a BSOD on the virtual machine, so I read directly from the CR8 register, where APIC.TPR keeps the current IRQL value. If you disassemble KeGetCurrentIrql(), it also simply has "mov rax,cr8", but the function call fails (I don't understand why). It remains to figure out DispatchUnload():
Code:
    format   pe64 wdm native 6.0 at 0x10000
    entry    DriverEntry
    include 'win64ax.inc'
    include 'equates\wdm64.inc'

    ;//****** STRING *************
    section '.data' data readable writeable notpageable
    DevName     du  '\Device\TestDrv',0
    linkName    du  '\DosDevices\TestDrv',0
    align 16
    pDevName    UNICODE_STRING
    plinkName   UNICODE_STRING
    pDevObj     dq  0
    Status      dq  0

    ;//****** MAIN ***************
    section '.text' code readable executable notpageable
    proc  DriverEntry pDrvObject, pRegPath
             mov     [pDrvObject],rcx
             mov     [pRegPath], rdx

             invoke  RtlInitUnicodeString,pDevName,DevName
             invoke  RtlInitUnicodeString,plinkName,linkName
            cinvoke  DbgPrint,<'Init string OK!',0>

             invoke  IoCreateDevice,[pDrvObject],0,pDevName,\
                                    FILE_DEVICE_UNKNOWN,\
                                    FILE_DEVICE_SECURE_OPEN,0,pDevObj
             or      rax,rax
             je      @f
             mov     [Status],rax
            cinvoke  DbgPrint,<'CreateDevice error: %I64x',0>,rax
             jmp     @exit

    @@:     cinvoke  DbgPrint,<'CreateDevice OK! %I64x',0>,[pDevObj]
             invoke  IoCreateSymbolicLink,plinkName,pDevName
             or      rax,rax
             je      @f
             mov     [Status],rax
            cinvoke  DbgPrint,<'Create link ERROR! %I64x',0>,rax
             jmp     @exit

    @@:     cinvoke  DbgPrint,<'Create link OK!',10,'--------',0>

    ;// Register callbacks
             push    rsi r10 r11 r12 r13
             lea     r10,[CodebyUnload]
             lea     r11,[CodebyCreate]
             lea     r12,[CodebyClose]
             lea     r13,[CodebyControl]
             mov     rsi,[pDrvObject]
             mov     [rsi + DRIVER_OBJECT.DriverUnload] ,r10
             mov     [rsi + DRIVER_OBJECT.IRP_MJ_CREATE],r11
             mov     [rsi + DRIVER_OBJECT.IRP_MJ_CLOSE] ,r12
             mov     [rsi + DRIVER_OBJECT.IRP_MJ_DEVICE_CONTROL],r13
             pop     r13 r12 r11 r10 rsi

             mov     [Status],STATUS_SUCCESS
    @exit:   mov     rax,[Status]
             ret
    endp

    ;//******************************************
    ;//***********  Callback array  *************
    ;//******************************************
    align 16
    proc  CodebyUnload pDrvObject
             mov     [pDrvObject],rcx

             invoke  RtlInitUnicodeString,plinkName,linkName
             invoke  IoDeleteSymbolicLink,plinkName
             mov     rbx,[pDrvObject]
             mov     rcx,[rbx + DRIVER_OBJECT.DeviceObject]
             invoke  IoDeleteDevice,rcx
            cinvoke  DbgPrint,<'Unload driver OK!',0>
             mov     eax,STATUS_SUCCESS
             ret
    endp
    ;//----------------
    align 16
    proc  CodebyCreate pDevObj, pIrp
             mov     [pDevObj],rcx
             mov     [pIrp], rdx

            cinvoke  DbgPrint,<'Dispatch Create! IRP: %I64x',10,' ',0>,[pIrp]
             mov     rax,[pIrp]
             mov     qword[rax + _IRP.IoStatus],STATUS_SUCCESS   ; Status
             mov     qword[rax + _IRP.IoStatus+8],0              ; Information
             invoke  IoCompleteRequest,[pIrp],IO_NO_INCREMENT
             mov     eax,STATUS_SUCCESS
             ret
    endp
    ;//----------------
    align 16
    proc  CodebyClose pDevObj, pIrp
             mov     [pDevObj],rcx
             mov     [pIrp], rdx

            cinvoke  DbgPrint,<'Dispatch Close! IRP: %I64x',10,'--------',0>,[pIrp]
             mov     rax,[pIrp]
             mov     qword[rax + _IRP.IoStatus],STATUS_SUCCESS   ; Status
             mov     qword[rax + _IRP.IoStatus+8],0              ; Information
             invoke  IoCompleteRequest,[pIrp],IO_NO_INCREMENT
             mov     eax,STATUS_SUCCESS
             ret
    endp
    ;//----------------
    align 16
    proc  CodebyControl pDevObj, pIrp     ; 0x220004
             mov     [pDevObj],rcx
             mov     [pIrp], rdx

             push    rsi rbx
             mov     rsi,[pIrp]
             mov     rsi,[rsi + _IRP.CurrentStackLocation]
             lea     rsi,[rsi + IO_STACK_LOCATION.Parameters]
             mov     rbx,[rsi + PARAM_IO_CONTROL.IoControlCode]

             mov     rax,cr8           ; invoke KeGetCurrentIrql
            cinvoke  DbgPrint,<'Dispatch IoControl',10,\
                               'IOCTL: %08x IRP: %I64x IRQL: %d',10,' ',0>,rbx,[pIrp],rax
             pop     rbx rsi

             mov     rax,[pIrp]
             mov     qword[rax + _IRP.IoStatus],STATUS_SUCCESS   ; Status
             mov     qword[rax + _IRP.IoStatus+8],0              ; Information
             invoke  IoCompleteRequest,[pIrp],IO_NO_INCREMENT
             mov     eax,STATUS_SUCCESS
             ret
    endp

    ;//**************************
    section '.idata' import data readable writeable
    library  ntoskrnl,"ntoskrnl.exe"
    include 'api\ntoskrnl.inc'

    ;//**************************
    section '.reloc' fixups data readable discardable
    if $-$$
       dd  0,8
    end if

Feryno, thank you very much for helping me figure this all out! My error was that when registering the callback array inside DriverEntry(), I passed the procedure addresses via "mov", but "lea" was needed. Therefore, I got relative offsets inside the *.sys file, without taking the base in memory into account. I saw this in the program WinObjExp64. After loading the driver into the kernel, it can conveniently display the already filled structures DriverObject, DeviceObject and others (in WinDbg this is not very clear).

And here is the user-mode program that makes a request to the driver with the code IOCTL=0x00220004 (DevUnknown, Operation=1, Buffered):
Code:
    format   pe64 console
    include 'win64ax.inc'
    entry    start
    ;//-----------
    .data
    devName    db  '\\.\TestDrv',0
    devHndl    dq  0
    inBuff     dq  0
    outBuff    rb  1024
    retSize    dq  0
    buff       db  0
    ;//-----------
    section '.code' code readable executable
    start:   sub     rsp,8
             invoke  SetConsoleTitle,<'*** DriverTest ***',0>

             invoke  CreateFile,devName,GENERIC_READ,FILE_SHARE_READ,0,OPEN_EXISTING,0,0
             mov     [devHndl],rax
             cmp     rax,-1
             jnz     @f
            cinvoke  printf,'Open error!'
             jmp     @exit

    @@:      invoke  DeviceIoControl,rax,0x220004,\
                                     inBuff,8,outBuff,1024,retSize,0

    @exit:   invoke  CloseHandle,[devHndl]
            cinvoke  _getch
            cinvoke  exit, 0
    ;//-----------
    section '.idata' import data readable
    library  msvcrt,'msvcrt.dll',kernel32,'kernel32.dll'
    include 'api\msvcrt.inc'
    include 'api\kernel32.inc'
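The IOCTL value 0x00220004 used above packs four fields into one dword. A minimal Python sketch of the bit layout of the WDK CTL_CODE macro confirms the breakdown given in the post; the function names `ctl_code` and `decode_ioctl` are mine, chosen only for this illustration.

```python
FILE_DEVICE_UNKNOWN = 0x22
METHOD_BUFFERED = 0
FILE_ANY_ACCESS = 0


def ctl_code(device_type, function, method, access):
    """Pack an I/O control code using the CTL_CODE bit layout."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


def decode_ioctl(code):
    """Split an I/O control code back into its four fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": (code >> 14) & 0x3,
        "function": (code >> 2) & 0xFFF,
        "method": code & 0x3,
    }


code = ctl_code(FILE_DEVICE_UNKNOWN, 1, METHOD_BUFFERED, FILE_ANY_ACCESS)
print(hex(code))
print(decode_ioctl(0x220004))
```

Decoding 0x220004 gives device type 22h (FILE_DEVICE_UNKNOWN), function 1, method 0 (buffered) and access 0, which agrees with "DevUnknown, Operation=1, Buffered".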
revolution 22 Nov 2024, 08:14
Core i7 wrote:
    It turns out that there is no need to align the stack "sub rsp,8" at the input, because in dll/sys only whole "proc" procedures always receive control, and there are usually arguments. As a result, the prologue will have "push rbp", which is equivalent to "sub rsp,8".
Core i7 23 Nov 2024, 04:29
It turns out that DispatchUnload returned an error because of the "WDM" option in the format directive. It sets DllCharacteristics=0x2000 (WdmDriver) in the PE file header. In system drivers this flag is cleared. Now my x64 driver unloads. So you just need to specify the format like this (I don't know why I added WDM there):
Code:
    format pe64 native 6.0 at 0x10000
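To check the flag in a built file without a PE viewer, something like the following Python sketch could be used. It assumes a well-formed PE image and relies on the DllCharacteristics field sitting 70 bytes into the optional header, which holds for both PE32 and PE32+. The function names are hypothetical.

```python
import struct

IMAGE_DLLCHARACTERISTICS_WDM_DRIVER = 0x2000


def dll_characteristics(image):
    """Return the DllCharacteristics word from the bytes of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)  # offset of PE header
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = e_lfanew + 4 + 20  # skip signature and COFF file header
    (value,) = struct.unpack_from("<H", image, optional_header + 70)
    return value


def is_wdm_flag_set(image):
    """True when the WDM driver bit (0x2000) is set in DllCharacteristics."""
    return bool(dll_characteristics(image) & IMAGE_DLLCHARACTERISTICS_WDM_DRIVER)
```

A call such as `is_wdm_flag_set(open("TestDrv.sys", "rb").read())`, where the file name is only an example, would then be expected to return True for the build made with the "wdm" keyword and False for the build without it.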
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.