flat assembler
Message board for the users of flat assembler.

Transparent CPU instruction set emulation

Matrix



Joined: 04 Sep 2004
Posts: 1169
Location: Overflow
Matrix 14 Dec 2025, 03:29
Hello friends!

I recently ran into a new problem:
Let's say you have a CPU from 2012 that lacks the particular AVX instruction set a new program requires in order to run.

Your options:

    Find another program
    Buy a new CPU
    Recompile the program if you have the source
    Emulate the missing instructions, for example with Intel SDE (Software Development Emulator)
    Cry in the corner
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20795
Location: In your JS exploiting you and your system
revolution 14 Dec 2025, 03:49
This is what a lot of old software would do when the FPU was absent.

Catch illegal instruction exceptions, emulate them, continue execution.

It isn't hard, but it is tedious to get it all working correctly and seamlessly.
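
To give a feel for the "emulate them, continue execution" step, here is a minimal C sketch of what the core of such a handler might look like. It is not taken from any existing project: the EmuContext struct, the emulate_one function, and the idea of keeping the YMM registers in emulator-owned memory are assumptions made for illustration. It assumes 64-bit code and recognizes exactly one encoding, the register form of 256-bit VADDPS with a 2-byte VEX prefix (C5 xx 58 /r). It returns the instruction length so that a real handler could advance the saved instruction pointer by that amount before resuming.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emulator state: 16 YMM registers kept in ordinary memory,
   since a CPU without AVX has no hardware YMM state to read or write. */
typedef struct {
    float ymm[16][8];
} EmuContext;

/* Try to emulate the instruction at `code` (the faulting address).
   Returns the instruction length in bytes so the caller can advance the
   saved instruction pointer, or 0 if the bytes are not recognized.
   Only handles C5 <vex> 58 <modrm> with mod=11, L=1, pp=00, that is
   VADDPS ymm_dst, ymm_src1, ymm_src2 with register operands. */
size_t emulate_one(EmuContext *ctx, const uint8_t *code, size_t avail)
{
    if (avail < 4 || code[0] != 0xC5 || code[2] != 0x58)
        return 0;

    uint8_t vex   = code[1];
    uint8_t modrm = code[3];

    int r_bit = !((vex >> 7) & 1);      /* VEX.R is stored inverted    */
    int src1  = (~(vex >> 3)) & 0x0F;   /* VEX.vvvv is stored inverted */
    int l_bit = (vex >> 2) & 1;         /* 1 = 256-bit operation       */
    int pp    = vex & 3;                /* 00 = no SIMD prefix (PS)    */

    if (!l_bit || pp != 0 || (modrm >> 6) != 3)
        return 0;                       /* not the form handled here */

    int dst  = ((modrm >> 3) & 7) | (r_bit << 3);
    int src2 = modrm & 7;               /* 2-byte VEX cannot extend r/m */

    /* Compute into a temporary first so dst may alias a source. */
    float result[8];
    for (int i = 0; i < 8; i++)
        result[i] = ctx->ymm[src1][i] + ctx->ymm[src2][i];
    for (int i = 0; i < 8; i++)
        ctx->ymm[dst][i] = result[i];

    return 4;
}
```

For example, the bytes C5 F4 58 C2 encode vaddps ymm0, ymm1, ymm2, so emulate_one would add the emulated ymm1 and ymm2 into ymm0 and report a length of 4. The tedious part mentioned above is everything this sketch leaves out: memory operands, the 3-byte VEX form, the remaining opcodes, and keeping the emulated state consistent with the real XMM registers.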



Matrix 14 Dec 2025, 04:12
Hey revolution!

Yeah, and it would be very cool if the kernel could do this automatically for missing instructions when required.

I would also like to point out a very cool related open-source project:
The Open-source PlayStation 3 Emulator

It has evolved significantly in recent years.


revolution 14 Dec 2025, 04:53
Matrix wrote:
Yeah, and it would be very cool if the kernel could do this automatically for missing instructions when required.
Which kernel, which OS, is this intended for? I don't know of any existing kernel of any OS that will do this.

The most likely existing solution is a library that can be linked to. Maybe Linux has a library that can be injected with LD_PRELOAD? But that would require that the program is dynamically linked and uses the "standard" loader (the ELF interpreter, ld.so) and C libraries.



Matrix 14 Dec 2025, 16:09
I was thinking about Linux.

But right now it should already be possible to start a cmd.exe or a terminal under SDE (as root) that will "emulate" instruction sets for the programs started within it.

And decompile/recompile-style emulation should be very fast, especially if it only replaces less than 1% of the program code, for example swapping AVX instructions for compatible ones...


revolution 14 Dec 2025, 19:25
The emulation code doesn't need to run as root. An emulator program could start the target as a child process and "debug" it to catch illegal instruction exceptions.
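
A minimal Linux sketch of that idea, using ptrace from an unprivileged parent, might look like the following. The function name run_traced_child is made up, and to keep the demo portable the child uses raise(SIGILL) as a stand-in for actually executing an unsupported instruction. A real emulator would exec the target program instead.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a child under ptrace and counts the SIGILL stops the parent
   intercepts. Returns the child's exit status, or -1 on error. */
int run_traced_child(int *sigill_seen)
{
    *sigill_seen = 0;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced by the parent; no root needed. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGILL);   /* placeholder for the illegal instruction */
        _exit(42);       /* reached only if the parent lets us continue */
    }

    for (;;) {
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return -1;
        if (WIFSTOPPED(status)) {
            int sig = WSTOPSIG(status);
            if (sig == SIGILL) {
                /* A real emulator would read the registers and code bytes
                   here, emulate the instruction, and advance the child's
                   instruction pointer before continuing. */
                (*sigill_seen)++;
                sig = 0;  /* swallow the signal: the child never sees it */
            }
            /* Other signals are passed through to the child unchanged. */
            ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig);
        }
    }
}
```

With this stand-in child, the parent should see one SIGILL stop and the child should then exit with status 42. With a real faulting instruction, swallowing the signal without advancing the instruction pointer would just fault again, so the emulate-and-skip step is not optional.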
bitRAKE



Joined: 21 Jul 2003
Posts: 4325
Location: vpcmpistri
bitRAKE 15 Dec 2025, 08:40
Matrix wrote:
I recently ran into a new problem:
Let's say you have a CPU from 2012 that lacks the particular AVX instruction set a new program requires in order to run
The length of the newer instruction encodings makes patch-in-place extremely easy. The first illegal instruction exception at a given site is costly, since it requires detection and routing to emulation code. Once the site is patched, subsequent executions of that instruction run the emulation code directly.

This is a low-effort solution with very good performance. Furthermore, it can be developed incrementally, instruction by instruction, once the initial groundwork for capturing illegal instructions is complete.

On Windows, there are several ways to capture undefined-instruction faults (#UD) in another process: an in-process VEH (vectored exception handler), the Windows Hypervisor Platform (WHP) / virtualization, the Windows debugging API, and a kernel-mode driver. Your choice might be guided by other goals, but injecting a DLL that calls AddVectoredExceptionHandler into the target process is very easy: no context switches to another process are needed, and you get full control within the address space with little effort (VirtualProtect). If the instructions to replace are shorter than 5 bytes (the size of a near JMP with a 32-bit displacement), your code will need to become more creative: trampolines, code caves, intelligent patching, ... (these are always a one-time cost per instruction, regardless of complexity)
Code:
format PE64 NX GUI 6.0
entry start

include 'win64ax.inc'

section '.data' data readable writeable
    ; [!] IMPORTANT: Always use ABSOLUTE PATHS for injection. 
    ; The target process might have a different Current Working Directory.
    target_path db 'C:\Path\To\TargetApp.exe', 0
    dll_path    db 'C:\Path\To\YourVEH.dll', 0 
    dll_len     = $ - dll_path

    sinfo       STARTUPINFO          ; Not 'si': that is a register name in fasm
    pi          PROCESS_INFORMATION
    
    remote_mem  dq ?     ; Address of string in target process
    h_inj_th    dq ?     ; Handle to injection thread
    
    k32_lib     db 'kernel32.dll', 0
    loadlib_fn  db 'LoadLibraryA', 0

section '.code' code readable executable

start:
    ; At entry RSP is 8 mod 16; realign it to 16 bytes for the API calls below
    sub     rsp, 8

    ; Initialize startup info size (Critical for CreateProcess)
    mov     [sinfo.cb], sizeof.STARTUPINFO

    ; 1. Launch Target in SUSPENDED state
    ;    The main thread halts at RtlUserThreadStart (before any app code runs).
    invoke  CreateProcessA, 0, target_path, 0, 0, FALSE, \
            CREATE_SUSPENDED, 0, 0, sinfo, pi
    
    test    eax, eax        ; BOOL result lives in EAX
    jz      .exit

    ; 2. Allocate memory in target for the DLL path
    invoke  VirtualAllocEx, [pi.hProcess], 0, dll_len, \
            MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE
    
    mov     [remote_mem], rax
    test    rax, rax
    jz      .fail

    ; 3. Write the DLL path string into the target
    invoke  WriteProcessMemory, [pi.hProcess], [remote_mem], \
            dll_path, dll_len, 0
    test    eax, eax
    jz      .fail

    ; 4. Resolve LoadLibraryA address
    ;    (Assumption: kernel32.dll is mapped at the same address in this process
    ;    and in the target. That holds for processes of the same bitness within
    ;    one boot session, but not for a 32-bit target.)
    invoke  GetModuleHandleA, k32_lib
    invoke  GetProcAddress, rax, loadlib_fn
    
    ; 5. Create Remote Thread to execute LoadLibraryA("YourVEH.dll")
    invoke  CreateRemoteThread, [pi.hProcess], 0, 0, \
            rax,            \ ; lpStartAddress (LoadLibraryA)
            [remote_mem],   \ ; lpParameter (Path String)
            0, 0
            
    mov     [h_inj_th], rax
    test    rax, rax
    jz      .fail

    ; 6. Wait for Injection Thread to finish
    ;    This ensures DllMain runs and installs the VEH *before* we resume.
    invoke  WaitForSingleObject, [h_inj_th], INFINITE
    invoke  CloseHandle, [h_inj_th]

    ; 7. Resume the Main Thread
    ;    The target now runs with your VEH active from the very first instruction.
    invoke  ResumeThread, [pi.hThread]
    jmp     .cleanup

.fail:
    ; Injection failed: do not leave the suspended target behind
    invoke  TerminateProcess, [pi.hProcess], 1

.cleanup:
    invoke  CloseHandle, [pi.hProcess]
    invoke  CloseHandle, [pi.hThread]

.exit:
    invoke  ExitProcess, 0

section '.idata' import data readable writeable
    library kernel32,'KERNEL32.DLL'
    include 'api\kernel32.inc'    
* Mostly, assuming the application isn't trying to make your job difficult - which is also easy to do, imho.
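
To make the 5-byte limit above concrete, here is a small C sketch of how the bytes for a patch-in-place jump could be computed. The function name build_jmp_patch and its parameters are made up for this illustration. It only builds the bytes (E9 followed by a 32-bit displacement, padded with NOPs to the length of the replaced instruction) and does not touch any process memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Builds the bytes that would overwrite an instruction of `insn_len`
   bytes located at address `site`, so that execution jumps to `target`.
   Layout: E9 <rel32>, then single-byte NOP (0x90) padding.
   Returns the number of bytes written, or 0 if no patch is possible
   (instruction shorter than 5 bytes, or target out of rel32 range). */
size_t build_jmp_patch(uint8_t *out, uint64_t site, uint64_t target,
                       size_t insn_len)
{
    if (insn_len < 5)
        return 0;  /* needs a trampoline, code cave, or similar trick */

    /* rel32 is relative to the address of the byte after the JMP. */
    int64_t disp = (int64_t)(target - (site + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return 0;  /* emulation code must be within +/-2 GiB of the site */

    uint32_t rel32 = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;
    /* Emit the displacement little-endian, byte by byte, so the builder
       gives the same result on any host. */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(rel32 >> (8 * i));

    memset(out + 5, 0x90, insn_len - 5);
    return insn_len;
}
```

For an 8-byte instruction at 0x1000 redirected to 0x2000, this produces E9 FB 0F 00 00 90 90 90. An actual patcher would still have to make the page writable (VirtualProtect), deal with other threads that may be executing the site, and have the emulation stub jump back to the address right after the replaced instruction.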

_________________
¯\(°_o)/¯ AI may [not] have aided with the above reply.



Matrix 17 Dec 2025, 19:43
Just a tip, guys:
https://www.techpowerup.com/cpu-specs/xeon-w3690.c929
https://cpulist.com/lga-1366-cpu-list/

Systems like these, even in dual-CPU configurations, are getting dirt cheap now, in case you want an inexpensive workstation. Server farms are throwing them away: workstations, servers...
The downside is that they do not support modern instruction sets, and new programs are being released that will not run on them for that reason. They usually support (and often come with) 128 GB, 256 GB, or 384 GB of triple/quad-channel DDR3 ECC memory.
Copyright © 1999-2025, Tomasz Grysztar.