flat assembler
Message board for the users of flat assembler.

Windows on ARM64 - simple example with fasmg

Tomasz Grysztar 14 Oct 2022, 19:04
I have finally got my hands on an ARM64-based machine running Windows, and - obviously - one of the very first things I wanted to try was to assemble some new PE files.

It turned out to be very easy with already available resources. I used the existing aarch64 includes for fasmg (made by tthsqe) and the standard PE formatter that comes with fasmg.

My first working example, analogous to the basic PE demos from fasm packages:
format binary as 'exe'

PE.Settings.Magic = 0x20B
PE.Settings.Machine = IMAGE_FILE_MACHINE_ARM64
PE.Settings.ImageBase = 0x140000000
include 'format/pe.inc'

include 'aarch64.inc'
define xIP0? x16        ; Windows-specific aliases
define xIP1? x17

section '.text' code readable executable

        entry $

                mov     x0,0
                adr     x1,_message
                adr     x2,_caption
                mov     x3,0
                bl      MessageBoxA

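                bl      ExitProcess

  MessageBoxA:                          ; import stub: jump through the pointer in the import table
                adr     xip0,imp__MessageBoxA
                ldr     xip0,[xip0]
                br      xip0

  ExitProcess:                          ; import stub
                adr     xip0,imp__ExitProcess
                ldr     xip0,[xip0]
                br      xip0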

section '.data' data readable writeable

  _caption db 'Windows on ARM64',0
  _message db 'Hello, world of assembly!',0

section '.idata' import data readable writeable

  dd 0,0,0,RVA kernel_name,RVA kernel_table
  dd 0,0,0,RVA user_name,RVA user_table
  dd 0,0,0,0,0

  kernel_table:
    imp__ExitProcess dq RVA _ExitProcess
    dq 0
  user_table:
    imp__MessageBoxA dq RVA _MessageBoxA
    dq 0

  kernel_name db 'KERNEL32.DLL',0
  user_name db 'USER32.DLL',0

  _ExitProcess dw 0
    db 'ExitProcess',0
  _MessageBoxA dw 0
    db 'MessageBoxA',0

section '.reloc' fixups data readable discardable

  if $=$$
    dd 0,8              ; if there are no fixups, generate dummy entry
  end if    
(Tested on Windows 11 on ARM.)

This might encourage me to work on CALM-based ARM64 instruction set myself... But no promises. And my PE tutorial deserves another chapter.
Tomasz Grysztar 17 Oct 2022, 16:12
Taking the universal_template.asm from the PE tutorial as a starting point, just a couple of minor changes suffice to turn it into a valid ARM64 PE.

Obviously, we need to change the CPU instruction set. Replace
use 'x64.inc'
with
use 'aarch64.inc'

define xIP0? x16
define xIP1? x17    
These additional register aliases are related to Windows conventions: x16 and x17 are designated as "intra-procedure call scratch registers", and we are going to use one of them under this alias to make import stubs for calling the Windows API.

The default image base needs to be different for ARM64, with a value above 32-bit range:
DEFAULT_IMAGE_BASE := 0x140000000    
The change of target architecture should be quite obvious:
        .Machine                        dw IMAGE_FILE_MACHINE_ARM64    
And since we have the image base above the 4G boundary, we may as well add the "large address aware" flag for consistency:
        .Characteristics                dw IMAGE_FILE_EXECUTABLE_IMAGE + IMAGE_FILE_LARGE_ADDRESS_AWARE    
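For reference, here is a small Python sketch of the numbers behind these header changes. The constant values are the ones documented in the PE/COFF specification; the script itself is only an illustration and not part of the template.

```python
import struct

# Values as documented in the PE/COFF specification.
IMAGE_FILE_MACHINE_ARM64 = 0xAA64
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020

# Image base used in the template above.
DEFAULT_IMAGE_BASE = 0x140000000

# The Characteristics field is a bit mask, so the flags simply combine.
characteristics = IMAGE_FILE_EXECUTABLE_IMAGE | IMAGE_FILE_LARGE_ADDRESS_AWARE

# Both header fields are little-endian 16-bit words, as emitted by "dw".
machine_bytes = struct.pack('<H', IMAGE_FILE_MACHINE_ARM64)
characteristics_bytes = struct.pack('<H', characteristics)

# The image base does not fit in 32 bits, hence the 64-bit (PE32+) format.
base_above_4g = DEFAULT_IMAGE_BASE > 0xFFFFFFFF

print(hex(characteristics), machine_bytes.hex(), characteristics_bytes.hex(), base_above_4g)
```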
Now all that is left is to replace the x86 code.

                mov     x0,0
                adr     x1,MessageString
                adr     x2,CaptionString
                mov     x3,0
                bl      stub_MessageBoxA

                bl      stub_ExitProcess

  stub_MessageBoxA:
                adr     xip0,MessageBoxA
                ldr     xip0,[xip0]
                br      xip0

  stub_ExitProcess:
                adr     xip0,ExitProcess
                ldr     xip0,[xip0]
                br      xip0
While x86 code could also use import stubs to avoid the indirect call instruction (which has a slightly longer opcode than the direct one), in the case of ARM they are pretty much necessary: there is no counterpart of x86's indirect call through a memory operand, so the pointer has to be loaded into a register first. We are using ADR to get the address of the pointer; it is encoded as a PC-relative value and does not require relocation. The standard stubs in Windows are similar, but they use ADRP to get the address of the page containing the pointer, and then LDR adds the offset within the page, which would look something like this:
                adrp    xip0,MessageBoxA
                ldr     xip0,[xip0,(MessageBoxA-IMAGE_BASE) and 0FFFh]    
It allows a longer range between the stub code and the import table (about ±4 GiB), since ADR alone only reaches within ±1 MiB of the instruction. For a tiny example constructed in assembly this does not matter, though.
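To make the arithmetic concrete, here is a rough Python sketch of how the ADR immediate and the ADRP/LDR split are computed. The helper names and the two addresses are made up for illustration; the bit layout follows the A64 encoding of ADR (a signed 21-bit byte offset split into immlo and immhi).

```python
# Sketch only: helper names are hypothetical and the addresses are made up.
IMAGE_BASE = 0x140000000

def encode_adr(rd, pc, target):
    """Encode ADR Xd,target; only the distance target - pc enters the opcode."""
    offset = target - pc
    if not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("ADR reaches only about +/-1 MiB from the instruction")
    imm = offset & 0x1FFFFF              # signed 21-bit byte offset
    immlo, immhi = imm & 3, imm >> 2     # low 2 bits and high 19 bits
    return 0x10000000 | (immlo << 29) | (immhi << 5) | rd

def adrp_split(pc, target):
    """Return (page delta used by ADRP, low 12 bits used as the LDR offset)."""
    page_delta = (target >> 12) - (pc >> 12)   # counted in 4 KiB pages
    if not -(1 << 20) <= page_delta < (1 << 20):
        raise ValueError("ADRP reaches only about +/-4 GiB from the instruction")
    return page_delta, target & 0xFFF

stub = IMAGE_BASE + 0x1020       # assumed address of the stub
pointer = IMAGE_BASE + 0x3010    # assumed address of the import pointer

print(hex(encode_adr(16, stub, pointer)))
print(adrp_split(stub, pointer))
```

Moving the stub and the pointer together by the same amount leaves the ADR opcode unchanged, which is why no relocation is needed. Because the image base is page-aligned, `target & 0xFFF` gives the same value as the `(MessageBoxA-IMAGE_BASE) and 0FFFh` expression above.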

Note: this variant does not work with tthsqe's aarch64.inc, because the current implementation there tries to generate a superfluous relocation record for ADRP. But it is easy to fix; it works for me with a simple correction. If this becomes an official part of the tutorial, I'm going to provide a corrected aarch64.inc in the tutorial packages.

And that is all - after applying these changes and assembling with fasmg, we get a working template for the new architecture (tested on Windows 11 on ARM).
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.