[fasmg] Revisiting the minimal PE64

bitRAKE



Joined: 21 Jul 2003
Posts: 2795
Location: dank orb
randall's recent post had me diving into the details of Crinkler and some of my older work (here and elsewhere). Hadn't used Crinkler in many years - it mostly crashed on 64-bit back then. Today it seems very stable and improved.

So, this is the first proof of concept - two other parts are needed to create a general framework: the hash collision avoidance system, and a compression algorithm. But this first part works:
Code:
format binary as 'exe'  ; legacy extension specifier
include 'cpu\x64.inc'   ; 64-bit code
use64                   ; set mode

macro align? scale
        while ($-$$) and (scale-1)
                rb 1
        end while
end macro

; zero section stub (minimal requirements with padding)
BASE = $000000
if BASE = 0 ; $10000 default on zero
        IMAGEBASE = $10000
else
        IMAGEBASE = BASE
end if
org IMAGEBASE

Entry:
        db "MZ"         ; pop r10
        jmp _Entry      ; db ?,?                ;

        dw "PE",0,$8664,0 ; x86-64, no sections
rb 14
        dw $102         ; IMAGE_FILE_EXECUTABLE_IMAGE | IMAGE_FILE_32BIT_MACHINE
        dw $20B         ; IMAGE_NT_OPTIONAL_HDR64_MAGIC
_Entry:
        jmp __Entry
        rb 12

        dd Entry - IMAGEBASE    ; 002C  start execution here
        dd ?            ; BaseOfCode
        dq BASE         ; image base address to load PE
        dd 4            ; 003C offset to PE header / .SectionAlignment
        dd 4            ; FileAlignment
        dw ?,?,?,?,4,?  ; MajorSubsystemVersion
        dd ?            ; Win32VersionValue
        dd IMAGESIZE    ; SizeOfImage ($40=smallest)
        dd ?,?
        dw 2            ; IMAGE_SUBSYSTEM_WINDOWS_GUI

        dw ?
        dq ?,?,?,? ; stack/heap
        dd ? ; loader flags
        dd ? ; NumberOfRvaAndSizes
__Entry:
;===============================================================================

; FYI: this gets compressed

; Initial Program Execution Context:
;       https://docs.microsoft.com/en-us/windows/desktop/api/winternl/ns-winternl-peb
;       https://stackoverflow.com/questions/41392014/what-does-windows-do-before-main-is-called

; these parameters are generated by separate program
DLLS equ "kernel32","user32"
HASH.Found_Key = 50127          ; search for value that eliminates collisions
HASH.Found_Shift = 17           ; [8,30] scales hash to table size

macro hash_call dll,name
        match many,DLLS
        iterate one,many
                if +one = +dll
                        break
                end if
                if %=%%
                        err "Need to define DLL order for ",dll
                end if
        end iterate
        else
                err "DLL ordering required."
        end match

        local hash,b
        hash = 0
        virtual at 0
                db name,0
                repeat $
                        load b:1 from $$+%-1
                        hash = ((hash and $FFFFFF00) or b) * HASH.Found_Key
                        hash = (hash and $FFFFFF00) or ((hash shl 1) and $FE)
                end repeat
                assert (hash and $FF) = 0
                hash = hash shr HASH.Found_Shift
        end virtual
        call qword [HASH.Table + hash*8]
end macro

;$77 bytes ; 19 more than 32-bit
        mov rax,[rcx+24]                ; PEB_LDR_DATA
        mov rax,[rax+16]                ; head of double linked list
        mov rax,[rax]                   ; forward link to next item
        mov rax,[rax]                   ; forward again
        mov rax,[rax+48]                ; DLL base address (kernel32)

        lea edi,[DLL_List]              ; #32#
Find_Exports:
        push rax
        pop rbp
        mov eax,[rbp+3Ch]       ; IMAGE_DOS_HEADER.e_lfanew
        mov ebx,[rax+rbp+88h]   ; DataDirectory[0] = IMAGE_EXPORT_DIRECTORY
        add rbx,rbp
        mov ecx,[rbx+18h]       ; IMAGE_EXPORT_DIRECTORY.NumberOfNames
Function_Hashing:
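        mov eax,[rbx+32]        ; IMAGE_EXPORT_DIRECTORY.AddressOfNames
        add rax,rbp
        mov esi,[rax+rcx*4-4]   ; RVA of name rcx-1 (names are walked last to first)
        add rsi,rbp                     ; Function String

        mov eax,[rbx+36]        ; IMAGE_EXPORT_DIRECTORY.AddressOfNameOrdinals
        add rax,rbp
        movzx edx,word [rax+rcx*2-2]    ; ordinal index for this name
        mov eax,[rbx+28]        ; IMAGE_EXPORT_DIRECTORY.AddressOfFunctions
        add rax,rbp
        mov edx,[rax+rdx*4]
        add rdx,rbp                     ; Function address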

        xor eax,eax ; salt with : mov eax,[edi-4]
.hash:
        lodsb
        imul eax,eax,HASH.Found_Key
        add al,al ; try rol/ror
        jnz .hash
        shr eax,HASH.Found_Shift
        mov [HASH.Table+rax*8],rdx      ; insert into table
        loop Function_Hashing

        push rdi
        pop rcx
        hash_call "kernel32","LoadLibraryA"
        repnz scasb ; replaces : add edi,DLL_List.max_length ; #32#
        test rax,rax
        jnz Find_Exports
;===============================================================================

; Put real program here
int3

;===============================================================================

DLL_List:
        match many,DLLS
        iterate one,many
                if %=1
                        assert +one = "kernel32"
                else
                        db one,0
                end if
        end iterate
        else
                err "DLL ordering required."
        end match
        db 0,0


align 8
HASH.Table:
        rq 1 shl (32-HASH.Found_Shift)  ; needed space

; TODO: set correct image size or greater
IMAGESIZE := $-IMAGEBASE

; padding to minimal file size
section 0
        while $% < 268
                db $FF
        end while    
In terms of size, the API method used by Crinkler is superior to my previous work. There is still room for improvement in both the 32-bit and 64-bit code.
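
For checking hash values outside of fasmg, the hash in the listing can be modelled in a few lines of Python. This is only a sketch, assuming the listing is read as follows: `api_hash` follows the compile-time loop in `hash_call`, and `api_hash_runtime` follows the `lodsb` / `imul` / `add al,al` loop. Both function names are made up for the example.

```python
MASK32 = 0xFFFFFFFF

def api_hash(name: bytes, key: int, shift: int) -> int:
    """Model of the compile-time loop in hash_call (hypothetical helper name)."""
    h = 0
    for b in name + b"\0":                            # the macro hashes the terminating zero too
        h = (((h & 0xFFFFFF00) | b) * key) & MASK32   # lodsb ; imul eax,eax,key
        h = (h & 0xFFFFFF00) | ((h << 1) & 0xFE)      # add al,al
    assert h & 0xFF == 0                              # same check as the macro's assert
    return h >> shift                                 # shr eax,shift gives the table index

def api_hash_runtime(memory: bytes, key: int, shift: int) -> int:
    """Model of the run-time loop: it stops when AL becomes zero, not on a length."""
    eax, i = 0, 0
    while True:
        eax = (eax & 0xFFFFFF00) | memory[i]          # lodsb
        i += 1
        eax = (eax * key) & MASK32                    # imul eax,eax,key
        al = (eax << 1) & 0xFF                        # add al,al
        eax = (eax & 0xFFFFFF00) | al
        if al == 0:                                   # jnz .hash falls through
            break
    return eax >> shift

if __name__ == "__main__":
    KEY, SHIFT = 50127, 17                            # values from the listing
    for name in (b"LoadLibraryA", b"MessageBoxA"):
        print(name.decode(), api_hash(name, KEY, SHIFT))
```

With an odd key, the run-time loop can only stop on a byte of 0 or $80, so plain ASCII names always end at the terminating zero and both models agree. With HASH.Found_Shift = 17 the index has 15 bits, so HASH.Table reserves 32768 × 8 = 262144 bytes.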

Next I'm going to work on the API hash search program. I stole the database from Crinkler as a starting point, rather than scanning all supported Windows versions. It's fairly easy to code something to strip the names from the DLLs.
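
Stripping the names could look roughly like this in Python. It is a sketch only, assuming 64-bit (PE32+) DLL files read from disk; `export_names` is a hypothetical helper, not part of any existing tool. The header offsets are the same ones the loader code uses (e_lfanew at 3Ch, export directory at +88h, NumberOfNames at +18h, AddressOfNames at +20h). The one extra step is mapping RVAs to file offsets through the section table, which the loader doesn't need because it works on a mapped image.

```python
import struct

def export_names(data: bytes) -> list:
    """List the exported names of a PE32+ file held in `data` (hypothetical helper)."""
    pe, = struct.unpack_from("<I", data, 0x3C)               # IMAGE_DOS_HEADER.e_lfanew
    if data[pe:pe + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    section_count, = struct.unpack_from("<H", data, pe + 6)
    optional_size, = struct.unpack_from("<H", data, pe + 20)
    optional = pe + 24
    if struct.unpack_from("<H", data, optional)[0] != 0x20B:
        raise ValueError("only PE32+ (64-bit) images are handled")
    export_rva, = struct.unpack_from("<I", data, pe + 0x88)  # DataDirectory[0]
    if export_rva == 0:
        return []

    # On disk, RVAs have to be mapped to file offsets through the section table.
    sections = []
    table = optional + optional_size
    for i in range(section_count):
        vsize, vaddr, rsize, rptr = struct.unpack_from("<IIII", data, table + 40 * i + 8)
        sections.append((vaddr, max(vsize, rsize), rptr))

    def offset(rva: int) -> int:
        for vaddr, size, rptr in sections:
            if vaddr <= rva < vaddr + size:
                return rva - vaddr + rptr
        raise ValueError("RVA %#x is outside every section" % rva)

    directory = offset(export_rva)
    count, = struct.unpack_from("<I", data, directory + 0x18)      # NumberOfNames
    names_rva, = struct.unpack_from("<I", data, directory + 0x20)  # AddressOfNames
    names = offset(names_rva)
    result = []
    for i in range(count):
        name_rva, = struct.unpack_from("<I", data, names + 4 * i)
        start = offset(name_rva)
        result.append(data[start:data.index(b"\0", start)])
    return result
```

The names come back in export-table order, which matters because the loader walks them from last to first when it fills the table.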

After that will be the de-/compression.

...and then sandwich it all together. Tasty!

_________________
¯\(°_o)/¯ unlicense.org
Post 25 May 2019, 01:00
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 7322
Location: Kraków, Poland
Looks promising!
I guess the hash search might be too slow as a fasmg script, so it is going to be a regular program, am I right?
Post 25 May 2019, 12:29
bitRAKE



Definitely. There's no way to guarantee this method works in all cases. The same function name in multiple DLLs always collides, but that is only a problem if the function is used. The ordering of functions between DLL versions can create corner cases only detectable by testing every specific DLL ordering in use - something Crinkler doesn't do because it's so rarely a problem and it blows up the search space.
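
A brute-force search for a usable key can be sketched as below. This is an illustration under stated assumptions, not how Crinkler or hashsrch actually works: the names `has_conflict` and `search_key` are invented, only odd keys are tried (so the run-time loop cannot end early on ASCII names), and the check is deliberately conservative: it rejects a key whenever a used name shares a slot with any different name, ignoring the order in which the loader fills the table.

```python
MASK32 = 0xFFFFFFFF

def api_hash(name: bytes, key: int, shift: int) -> int:
    """Python model of the loader's hash (terminating zero included)."""
    h = 0
    for b in name + b"\0":
        h = (((h & 0xFFFFFF00) | b) * key) & MASK32
        h = (h & 0xFFFFFF00) | ((h << 1) & 0xFE)
    return h >> shift

def has_conflict(used, exports, key, shift):
    """True if any used name shares a table slot with a different name."""
    slots = {}
    for name in used:
        slot = api_hash(name, key, shift)
        if slots.setdefault(slot, name) != name:
            return True
    # Any other export landing on a used slot could overwrite the entry.
    return any(slots.get(api_hash(n, key, shift), n) != n for n in exports)

def search_key(used, exports, shift, keys=range(3, 1 << 16, 2)):
    """Return the first odd key without conflicts, or None if none is found."""
    for key in keys:
        if not has_conflict(used, exports, key, shift):
            return key
    return None

if __name__ == "__main__":
    used = [b"LoadLibraryA", b"MessageBoxA"]
    exports = used + [b"GetProcAddress", b"MessageBoxW"]   # stand-in for the full name database
    print(search_key(used, exports, shift=17))
```

The same name exported from two DLLs is not reported here, since the check compares names only; that is the "always collides" case and needs the per-DLL ordering to resolve.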

Present build plan:
  • del config
  • fasmg file.asm
  • hashsrch
  • fasmg file.asm
  • compress
The first fasmg pass creates a list of used functions, which hashsrch will use to create the configuration for the loader. Then the compressor will pack and merge the stub/depacker.

Having the other parts in fasmg would maybe restrict them too much. Anything for a few bytes usually means a time trade off - both in the hash search and compression. Crinkler can permute code section ordering which is a nice feature I'm still brainstorming about. Of course, we have more flexibility in asm, but to automatically produce a large number of variations does require another abstraction, imho.

Having the compressor in fasmg is tempting.

Post 25 May 2019, 17:54
Tomasz Grysztar
bitRAKE wrote:
Crinkler can permute code section ordering which is a nice feature I'm still brainstorming about. Of course, we have more flexibility in asm, but to automatically produce a large number of variations does require another abstraction, imho.
This reminds me of a curiosity snippet I made in another thread. Probably not viable here, but it was a fun thing to try.
Post 25 May 2019, 18:05
bitRAKE



Wow, you went full caps-lock on that code! LOL, that's not your style. Do you realize how confused someone coming from another assembler would be with that? Or, from many other languages for that matter. There is like an imaginary loop on every listing - other assemblers/compilers would just throw an error.

I will surely use something like that if I write a compressor in fasmg.

Post 26 May 2019, 03:41
Mikl___



Joined: 30 Dec 2014
Posts: 113
A working PE64 with an import; the size of the exe file is 268 bytes (posted 11 Jun 2015, 09:28): https://board.flatassembler.net/topic.php?t=17955
Code:
format binary as 'exe'

IMAGE_DOS_SIGNATURE             equ 5A4Dh
IMAGE_NT_SIGNATURE              equ 00004550h
PROCESSOR_AMD_X8664             equ 8664h
IMAGE_SCN_CNT_CODE              equ 00000020h
IMAGE_SCN_MEM_READ              equ 40000000h
IMAGE_SCN_MEM_WRITE             equ 80000000h
IMAGE_SCN_CNT_INITIALIZED_DATA  equ 00000040h
IMAGE_SUBSYSTEM_WINDOWS_GUI     equ 2
IMAGE_NT_OPTIONAL_HDR64_MAGIC   equ 20Bh
IMAGE_FILE_RELOCS_STRIPPED      equ 1
IMAGE_FILE_EXECUTABLE_IMAGE     equ 2
IMAGE_DLLCHARACTERISTICS_TERMINAL_SERVER_AWARE equ 8000h

include 'win64a.inc'
org 0
use64
IMAGE_BASE = 400000h
Signature:              dw IMAGE_DOS_SIGNATURE,0
ntHeader                dd IMAGE_NT_SIGNATURE;'PE'
;image_header--------------------------
.Machine                dw PROCESSOR_AMD_X8664
.Count_of_section       dw 0;2
.TimeStamp              dd 0
.Symbol_table_offset    dd 0;ntHeader
.Symbol_table_count     dd 0
.Size_of_optional_header dw EntryPoint-optional_header
.Characteristics        dw 0x20 or IMAGE_FILE_RELOCS_STRIPPED or IMAGE_FILE_EXECUTABLE_IMAGE
;20h Handle >2Gb addresses
;-------------------------------------
optional_header:
.Magic_optional_header  dw IMAGE_NT_OPTIONAL_HDR64_MAGIC
.Linker_version_major_and_minor dw 9 
.Size_of_code           dd 0
.Size_of_init_data      dd 0;xC0
.Size_of_uninit_data    dd 0
.entry_point            dd EntryPoint
.base_of_code           dd ntHeader
.image_base             dq IMAGE_BASE
.section_alignment      dd 4
.file_alignment         dd 4
.OS_version_major_minor dw 5,2
.image_version_major_minor dd 0
.subsystem_version_major_minor dw 5,2
.Win32_version          dd 0
.size_of_image          dd EndOfImage
.size_of_header         dd EntryPoint
.checksum               dd 0
.subsystem              dw IMAGE_SUBSYSTEM_WINDOWS_GUI
.DLL_flag               dw IMAGE_DLLCHARACTERISTICS_TERMINAL_SERVER_AWARE
.Stack_allocation       dq 0x100000
.Stack_commit           dq 0x1000
.Heap_allocation        dq 0x100000
.Heap_commit            dq 0x1000
.loader_flag            dd 0
.number_of_dirs         dd (EntryPoint-export_RVA_size)/8
export_RVA_size        dq 0
.import_RVA             dd import_
.import_size            dd end_import-import_
;------------------------------------------------
EntryPoint:
   enter 20h,0        ; space for 4 arguments + 16byte aligned stack
   xor ecx, ecx                   ; 1. argument: rcx = hWnd = NULL
   mov r9, rcx                    ; 4. argument: r9d = uType = MB_OK = 0
   mov edx,MsgCaption+IMAGE_BASE  ; 2. argument: edx = window text
   mov r8,rdx                     ; 3. argument: r8  = caption
   call [MessageBox]
   leave
   ret
;------------------------------------------------
MsgCaption      db "Iczelion's tutorial #2a",0
;-------------------------------------------------
Import_Table:
user32_table:
MessageBox  dq _MessageBox
import_:
dd 0,0,0,user32_dll,user32_table
dd 0
user32_dll    db "user32",0,0
dw 0
_MessageBox     db 0,0,"MessageBoxA"
end_import:
times 268-end_import db 0  ;filling up to 268 bytes
EndOfImage:        
Post 26 May 2019, 06:54
bitRAKE



There are good references on the limitations of different execution environments:
https://github.com/corkami/pocs/tree/master/PE

Using an import section is not size efficient beyond a single function. I'm in search of something more general and functional on most 64-bit versions of Windows.

Post 26 May 2019, 07:21
Mikl___



Hi, bitRAKE!
The minimal size of a PE64 is 268 bytes, and it doesn't matter whether the file has an import section or not.
Post 26 May 2019, 07:42
bitRAKE



Yes, but I don't mean minimal in the absolute sense, rather the minimum overhead for programs in general (not really, but more so than the other extreme). :D

Post 26 May 2019, 08:10