flat assembler
Message board for the users of flat assembler.
Windows 11 Upgrade
Total Votes: 13
Feryno 24 Oct 2021, 09:16
sleepsleep wrote:
The most performance-expensive operations are VM exits (and then VM entries), so the fastest hypervisor is the one that enables the fewest VM exits. On Intel there are unconditional VM exits, like CPUID, and conditional VM exits, like certain instructions (modifications of some MSRs, CRx registers, etc.). The conditional ones are where there is room to optimize. Moreover, Hyper-V uses the same prologue and epilogue for all VM exits, and that executes more than 100 instructions. This is overkill, e.g. for a CPUID VM exit, which could be handled in about 20 asm instructions including prologue and epilogue.

I always measure cpuid and syscall cycles before starting the hv and while the hv is running. It looks like this (values in CPU cycles, hex, leading zeros dropped):

```
Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz

        cpuid                   syscall
        before hv  hv running   before hv  hv running
cpu00   4F6        FBA          80F        1FB6
cpu01   1BD        E70          78A        2573
cpu02   1BD        EB2          717        21D6
cpu03   1AD        E80          559        21E7
cpu04   1BD        ED3          7BC        21E7
cpu05   1AD        E91          78B        2111
cpu06   1BE        EC3          717        2194
cpu07   1BD        E91          5EE        21E7
```

The cycles are measured on every CPU core. The syscall / sysretq instruction pair is measured as well, because that is the purpose of developing this hypervisor (monitoring which kernel-mode services user-mode apps call). Most of the increase in cpuid CPU cycles (from about 1B0h to about 0E00h) is due to the VM exit and VM entry. Handling cpuid itself is done by circa 20 FASM instructions with a very limited prologue/epilogue (no need to push/pop registers like rbp, rsi, rdi, r8...r15, etc.), so from that you can approximate how much the VM exit / VM entry itself costs.

The CPU may seem a strange choice, but it was the first one with the VMCS shadowing feature, which I needed for developing nested hypervisors. I also always need a motherboard with a serial port to send information during debugging (server motherboards and a few workstation motherboards have a COM port).

When Hyper-V is running you can't start your own hypervisor as a driver from the running OS. The only way is to start it before the OS (using UEFI or BIOS to load your hv), and then you need nesting, so your hypervisor runs as the parent and Hyper-V as a child (Hyper-V does not know that). But that grows your hypervisor significantly (the executable goes from 10 kB to 35 kB), and in this case you also need to handle ACPI S3 sleep (so that after resume from sleep your hv is started again first, and only then does Hyper-V with the OS resume).

sleepsleep wrote:
It works well when your Windows (and therefore the OS loader too) is installed via BIOS to an MBR-partitioned disk. You may try it when booting from UEFI, but I suppose that in such a case CSM is not loaded, so the 512-byte boot sector cannot call 16-bit BIOS interrupts (e.g. to access the disk). I use this only because I sometimes have more than 5 operating systems installed on one HDD (MBR-formatted, booting them via BIOS), and this is the quickest way to start them all from one boot menu.

These are the steps I use to add a 512-byte MBR/BIOS loader to the Windows boot menu. The a00.sdb file is an image of the boot sector and is stored in the root directory of the Windows boot partition:

Code:
```
set ENTRY_GUID={46595952-454E-4F50-4747-554944FEDCBA}
bcdedit -create %ENTRY_GUID% -d "hypervisor" -application BOOTSECTOR
bcdedit -set %ENTRY_GUID% device partition=%SYSTEMDRIVE%
bcdedit -set %ENTRY_GUID% path \a00.sdb
bcdedit -displayorder %ENTRY_GUID% -addlast
bcdedit -timeout 5
```

You can also create the entry without defining its GUID and then use the GUID that bcdedit generates; there is no need to use the exact GUID above.

By the way, such an HDD of mine usually has 3 primary partitions and 1 extended partition containing a few logical partitions (e.g. Linux does not protest being installed there and booted from a logical partition inside the extended partition). Because of that, the Windows installer does not create a separate boot partition and everything is nicely stored in one partition: the bootmgr file and the BOOT directory are in the same partition as the WINDOWS directory, since the installer has no space left in which to create those separate partitions.

sleepsleep wrote:
We will see how long this "feature" stays there.

Here is the source code for a program which measures cpuid cycles on every CPU. You can test how many cycles the execution of cpuid costs when Hyper-V is not running and compare it with the value when it is running. The reported CPU cycles are not exactly cpuid alone; they also include a few cycles for the rdtsc instruction. Hyper-V may or may not configure rdtsc to cause a VM exit too, which may further increase the reported value.

Code:
```
format PE64 console at 100000000h on 'nul'
entry start

include '%fasminc%\win64a.inc'

section '.code' code readable executable

start:
	push	rbx rbp
	sub	rsp,8*(4+7)		; 4 shadow-space qwords + 7 locals, rsp stays 16-byte aligned

	lea	r8,[rsp+8*(4+5)]	; lpSystemAffinityMask
	lea	rdx,[rsp+8*(4+4)]	; lpProcessAffinityMask
	or	rcx,-1			; hProcess = current process
	call	[GetProcessAffinityMask]
	or	eax,eax
	jz	exit

	mov	rbp,[rsp+8*(4+5)]	; SystemAffinityMask (all CPUs should run)
	mov	rdx,rbp			; dwProcessAffinityMask = SystemAffinityMask
	or	rcx,-1			; hProcess = current process
	call	[SetProcessAffinityMask]
	or	eax,eax
	jz	exit

	xor	ebx,ebx			; CPU counter
align 10h
L0:	btr	rbp,rbx			; skip CPUs not present in the mask
	jnc	L9
	lea	rax,[rsp+8*(4+3)]
	mov	[rsp+8*(4+1)],rax	; lpThreadId
	mov	dword [rsp+8*(4+0)],CREATE_SUSPENDED	; dwCreationFlags
	mov	r9,rbx			; lpParameter = argument for new thread = CPU number
	lea	r8,[per_cpu_thread]	; lpStartAddress
	xor	edx,edx			; dwStackSize = default
	xor	ecx,ecx			; lpThreadAttributes = NULL
	call	[CreateThread]
	or	rax,rax
	jz	L9
	mov	[rsp+8*(4+4)],rax	; thread handle
	xor	edx,edx
	bts	rdx,rbx			; dwThreadAffinityMask = this CPU only
	mov	rcx,[rsp+8*(4+4)]
	call	[SetThreadAffinityMask]
	or	eax,eax
	jz	L8
	mov	rcx,[rsp+8*(4+4)]
	call	[ResumeThread]
	cmp	eax,-1
	jz	L8
	or	edx,-1			; INFINITE timeout
	mov	rcx,[rsp+8*(4+4)]
	call	[WaitForSingleObject]
	sub	eax,WAIT_FAILED
	jnz	L8
	lea	rdx,[cpuid_cycles]	; wait failed: rax = 0, so clear this CPU's result
	lock and [rdx+rbx*8],rax
L8:	mov	rcx,[rsp+8*(4+4)]
	call	[CloseHandle]
L9:	inc	ebx
	cmp	bl,64
	jc	L0

	lea	rdi,[cpuid_cycles_msg_ascii]
	cld
	mov	eax,80000002h		; processor brand string, bytes 0..15
	cpuid
	stosd
	xchg	ebx,eax
	stosd
	xchg	ecx,eax
	stosd
	xchg	edx,eax
	stosd
	mov	eax,80000003h		; bytes 16..31
	cpuid
	stosd
	xchg	ebx,eax
	stosd
	xchg	ecx,eax
	stosd
	xchg	edx,eax
	stosd
	mov	eax,80000004h		; bytes 32..47
	cpuid
	stosd
	xchg	ebx,eax
	stosd
	xchg	ecx,eax
	stosd
	xchg	edx,eax
	stosd
	mov	ecx,3*4*4		; strip trailing spaces and NULs
@@:	cmp	byte [rdi-1],' '
	jnbe	@f
	dec	rdi
	loop	@b
@@:	mov	al,0Dh
	stosb
	mov	al,0Ah
	stosb

	lea	rsi,[cpuid_cycles]
	lea	rbx,[hex_trans]
	xor	edx,edx			; counter
align 10h
convert_L0:
	cmp	qword [rsi],0		; no result for this CPU
	jz	convert_L8
	mov	eax,'cpu'		; 'cpu' + NUL, the NUL is overwritten below
	stosd
	mov	al,dl			; high hex digit of the CPU number
	shr	al,4
	xlat	[rbx]
	mov	[rdi-1],al
	mov	al,dl			; low hex digit of the CPU number
	and	al,1111b
	xlat	[rbx]
	stosb
	mov	al,' '
	stosb
	mov	r8,[rsi]		; 16 hex digits of the cycle count
	repeat 16
	rol	r8,4
	mov	al,r8b
	and	al,1111b
	xlat	[rbx]
	stosb
	end repeat
	mov	al,0Dh
	stosb
	mov	al,0Ah
	stosb
convert_L8:
	lodsq				; rsi+8
	inc	edx
	cmp	dl,64
	jc	convert_L0
	mov	byte [rdi-2],0		; the last CR LF is not written

	push	STD_OUTPUT_HANDLE
	pop	rcx
	call	[GetStdHandle]
	push	rax
	pop	rcx			; hFile = stdout handle
if INVALID_HANDLE_VALUE = -1
	inc	rax
else
	sub	rax,INVALID_HANDLE_VALUE
end if
	jz	exit
	and	qword [rsp+8*(4+0)],0	; lpOverlapped = NULL
	lea	r9,[rsp+8*(4+1)]	; lpNumberOfBytesWritten
	lea	r8d,[rdi-2]
	lea	rdx,[cpuid_cycles_msg_ascii]	; lpBuffer
	sub	r8d,edx			; nNumberOfBytesToWrite
;	mov	rcx,rcx
	call	[WriteFile]
;	or	eax,eax
;	jz	exit
;	cmp	[rsp+8*(4+1)],size
;	jnz	...

exit:	xor	ecx,ecx
	call	[ExitProcess]
	xor	eax,eax
	add	rsp,8*(4+7)
	pop	rbp rbx
	ret

align 10h
per_cpu_thread:
	push	rbx rbp rsi rdi
	; rcx = parameter passed to thread = cpu number
	mov	rbp,rcx
	; we should disable interrupts or elevate IRQL, but that is not possible in ring3
	; at worst something will interrupt the following code and the reported CPUID cycles will be higher than without interruption
	rdtsc
	mov	esi,eax
	mov	edi,edx
	xor	eax,eax			; cpuid leaf 0
	cpuid
	rdtsc
	sub	eax,esi			; edx:eax = 64-bit TSC difference
	sbb	edx,edi
	lea	rcx,[cpuid_cycles]
	mov	[rcx+rbp*8+0],eax
	mov	[rcx+rbp*8+4],edx
	pop	rdi rsi rbp rbx
	xor	eax,eax
	ret

align 10h
hex_trans db '0123456789ABCDEF'

section '.bss' readable writeable

cpuid_cycles:
	times 64 dq 0		; up to 64 CPUs supported
cpuid_cycles_msg_ascii	rb 3*4*4+2	; 48-byte brand string + CR LF
	rb	64*25		; one line per CPU:
;	db 'cpu00 FFFFFFFFFFFFFFFF', 0Dh, 0Ah
;	...
;	db 'cpu3F 0000000000000000',0

section '.idata' import data readable writeable

	dd 0,0,0, RVA kernel_name, RVA kernel_table
	dd 0,0,0, 0, 0

kernel_table:
	CloseHandle		dq RVA _CloseHandle
	CreateThread		dq RVA _CreateThread
	ExitProcess		dq RVA _ExitProcess
	GetModuleHandleA	dq RVA _GetModuleHandleA
	GetProcessAffinityMask	dq RVA _GetProcessAffinityMask
	GetStdHandle		dq RVA _GetStdHandle
	ResumeThread		dq RVA _ResumeThread
	SetProcessAffinityMask	dq RVA _SetProcessAffinityMask
	SetThreadAffinityMask	dq RVA _SetThreadAffinityMask
	WaitForSingleObject	dq RVA _WaitForSingleObject
	WriteFile		dq RVA _WriteFile
	dq 0

kernel_name	db 'KERNEL32.DLL',0

_CloseHandle		db 0,0,'CloseHandle',0
_CreateThread		db 0,0,'CreateThread',0
_ExitProcess		db 0,0,'ExitProcess',0
_GetModuleHandleA	db 0,0,'GetModuleHandleA',0
_GetProcessAffinityMask	db 0,0,'GetProcessAffinityMask',0
_GetStdHandle		db 0,0,'GetStdHandle',0
_ResumeThread		db 0,0,'ResumeThread',0
_SetProcessAffinityMask	db 0,0,'SetProcessAffinityMask',0
_SetThreadAffinityMask	db 0,0,'SetThreadAffinityMask',0
_WaitForSingleObject	db 0,0,'WaitForSingleObject',0
_WriteFile		db 0,0,'WriteFile',0
```
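The estimate of the VM exit / VM entry cost described in this post can be worked through from the cpuid figures measured earlier (about 1B0h cycles natively, about 0E00h with the hv running). This is only a back-of-the-envelope sketch. It assumes that the roughly 20-instruction cpuid handler and the rdtsc overhead are small enough to ignore, so the difference approximates one VM exit plus VM entry round trip on this E3-1230 v3:

$$
\begin{aligned}
\Delta_{\text{cpuid}} &\approx t_{\text{hv running}} - t_{\text{no hv}} \\
&\approx \mathtt{0E00h} - \mathtt{1B0h} = 3584 - 432 = 3152 \ \text{cycles}
\end{aligned}
$$

Per core, e.g. cpu07: 0E91h - 1BDh = 3729 - 445 = 3284 cycles. For cpu01 through cpu07, the difference lies between 3251 and 3350 cycles. cpu00 is an outlier because its native reading (4F6h = 1270 cycles) is much higher than on the other cores.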
sleepsleep 24 Oct 2021, 16:37
This cycles stuff is far beyond what I am capable of digesting, and bcdedit is also something I haven't used. Let me try this on my VM, thanks.
macgub 25 Oct 2021, 13:55
Code:
```
	lea	rax,[rsp+8*(4+3)]
	mov	[rsp+8*(4+1)],rax
	mov	dword [rsp+8*(4+0)],CREATE_SUSPENDED
	mov	r9,rbx			; argument for new thread = CPU number
	lea	r8,[per_cpu_thread]
	xor	edx,edx
	xor	ecx,ecx
	call	[CreateThread]
	or	rax,rax
	jz	L9
```
Code:
```
per_cpu_thread:
	push	rbx rbp rsi rdi
	; rcx = parameter passed to thread = cpu number
	mov	rbp,rcx
	;....
```
So, as I see it, we can pass a parameter to the thread. Via rbx, rcx, r9? It's not clear to me. Can I also pass a parameter to a proc via CreateThread in 32-bit code? Where does it reside? In 32-bit code parameters are passed via the stack, so maybe:
Code:
```
dword [ebp+4]?
dword [ebp+8]?
```
Thanks Feryno for your code, it gives me some hope of solving a few problems I have in a clean, decent way.
macgub 25 Oct 2021, 14:43
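OK, I got the kick to check it. x32dbg says it's true:
Code:
```
	; dwCreationFlags = 0 (NORMAL_PRIORITY_CLASS is a CreateProcess flag, not a CreateThread one)
	; lpThreadId = address of the DWORD that receives the thread ID
	invoke	CreateThread,NULL,NULL,ThreadFunction,\
		0xa33,0,\
		ThreadID0
```
0xa33 is the passed parameter. At thread entry it shows up in ebx. Fun! (ebx is just what the thread start-up code leaves there; the documented place for a stdcall thread procedure's parameter is the stack: [esp+4] on entry, i.e. [ebp+8] after the usual push ebp / mov ebp,esp prologue.)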
Picnic 27 Oct 2021, 11:47
donn wrote:
Windows 11 Upgrade: Yay or Nay and why?
What? But I just installed Windows 10. No, not for now, I guess.
wizgogo 28 Oct 2021, 03:25
The ability to install Android apps is a Yay, but the minimum system requirements are a Nay.
sinsi 07 Nov 2021, 02:44
How the #!*% can you break something so basic? The LEDs on my keyboard don't light up. The lock keys work; just the LEDs don't light up.
Windows 11 Pro Insider Preview, dev version, build 22494.1000. Of course I can't revert to Windows 10; I have to do a full reinstall... sucked in, me.
revolution 07 Nov 2021, 05:53
sinsi wrote:
How the #!*% can you break something so basic?
They fired all the testers and use paying customers to figure out their problems for them? They are so busy writing code to grab ever more of your data that stuff like making the system work for you is not a priority any more? All of the above?
Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.