flat assembler
Message board for the users of flat assembler.

[fasm2] Programming with your x86 ...
bitRAKE



Joined: 21 Jul 2003
Posts: 4162
Location: vpcmpistri
bitRAKE 09 Feb 2025, 01:40
fasm2 adds several features that allow a lot of customization in how x86 is used, but there aren't many examples using them. So I wanted to create a thread exploring instruction selection and usage options.

If one can read CALM syntax, the \include\x86-2.inc file begins with the configuration details. Some of these features concern how instructions are encoded: assumptions made about the runtime environment, or constraints imposed by the programmer. These ideas have been in the works for a long time.

I want to start with a simple conditional compilation example:
Code:
        {bss:8} .pt POINT
        GetCursorPos & .pt
        ; :Note: point structure data is passed in a register
        MonitorFromPoint qword [.pt], MONITOR_DEFAULTTONEAREST
        xchg rcx, rax
        {data:8} .mi MONITORINFOEXA cbSize: sizeof .mi
        GetMonitorInfoA rcx, & .mi

.xstyle = WS_EX_TOPMOST ; Just to make debugging easier.
.style = WS_POPUP or WS_SIZEBOX or WS_VISIBLE

; First resolve the width/height for the minimal window size; and then
; resolve the initial window coordinates within the mouse monitor. Unless
; the window style changes these dimensions should remain the minimal.
        {data:16} .rc RECT 0, 0, MINIMUM_ASPECT*BOARD_WIDTH, MINIMUM_ASPECT*BOARD_HEIGHT
        AdjustWindowRect & .rc, dword .style, FALSE, .xstyle

        {bss:8} .min_width      dd ?
        {bss:8} .min_height     dd ?
        if x86.simd < x86.SSE2.simd ; 64-bit always supports SSE2+
                mov eax, [.rc.right]
                mov ecx, [.rc.bottom]
                sub eax, [.rc.left]
                sub ecx, [.rc.top]
                mov [.min_width], eax
                mov [.min_height], ecx

                mov eax, [.mi.rcMonitor.right]
                mov ecx, [.mi.rcMonitor.bottom]
                sub eax, [.mi.rcMonitor.left]   ; monitor width
                sub ecx, [.mi.rcMonitor.top]    ; monitor height
                sar eax, 1
                sar ecx, 1
                ; adjust by 1/2 initial window size
                sub eax, (DEFAULT_ASPECT*BOARD_WIDTH) shr 1
                sub ecx, (DEFAULT_ASPECT*BOARD_HEIGHT) shr 1
                add eax, [.mi.rcMonitor.left]
                add ecx, [.mi.rcMonitor.top]
                mov [.rc.left], eax
                mov [.rc.top], ecx
                ; initial window size
                mov [.rc.right], DEFAULT_ASPECT*BOARD_WIDTH
                mov [.rc.bottom], DEFAULT_ASPECT*BOARD_HEIGHT
        else if x86.simd < x86.AVX.simd
                movq xmm0, qword [.rc.right]
                movq xmm1, qword [.rc.left]
                psubd xmm0, xmm1
                movq qword [.min_width], xmm0

                mov rax, (((DEFAULT_ASPECT*BOARD_HEIGHT) shr 1) shl 32) \
                        or ((DEFAULT_ASPECT*BOARD_WIDTH) shr 1)
                movq xmm0, qword [.mi.rcMonitor.right]
                movq xmm1, qword [.mi.rcMonitor.left]
                movq xmm2, rax
                shl rax, 1
                psubd xmm0, xmm1
                psrad xmm0, 1
                psubd xmm0, xmm2
                paddd xmm0, xmm1
                movq qword [.rc.left], xmm0
                mov qword [.rc.right], rax
        else ; AVX+
                vmovq xmm0, qword [.rc.right]
                vpsubd xmm0, xmm0, dqword [.rc.left]
                vmovq qword [.min_width], xmm0

                mov rax, (((DEFAULT_ASPECT*BOARD_HEIGHT) shr 1) shl 32) \
                        or ((DEFAULT_ASPECT*BOARD_WIDTH) shr 1)
                vmovq xmm0, qword [.mi.rcMonitor.right]
                vmovq xmm1, qword [.mi.rcMonitor.left]
                vmovq xmm2, rax
                shl rax, 1
                vpsubd xmm0, xmm0, xmm1
                vpsrad xmm0, xmm0, 1
                vpsubd xmm0, xmm0, xmm2
                vpaddd xmm0, xmm0, xmm1
                vmovq qword [.rc.left], xmm0
                mov qword [.rc.right], rax
        end if
        AdjustWindowRect & .rc, .style, 0, .xstyle    
In the above code, different implementations are assembled based on how the assembler is configured. "use AMD64" will select the SSE2 code, while "use AMD64,AVX" or higher will select the AVX code. I've left the generic code in for illustrative purposes; since 64-bit always supports SSE2, it will never be assembled here. An AVX512 implementation is missing, but could easily be added.

Extending this idea, it's easy to see how a library of functions could be created in which all implementations of a particular function coexist in the same file. At assemble time, the execution environment being built for then selects the appropriate code.
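
As a rough sketch of that idea (not taken from any existing library), a routine could carry both bodies behind one label. The name add4_dwords and its register usage are made up for illustration; only the if/else test on x86.simd mirrors the code above.
Code:
; Hypothetical routine: adds the four dwords at [rdx] to the four dwords at [rcx].
; Only one of the two bodies is assembled, depending on the active "use" setting.
add4_dwords:
        if x86.simd < x86.AVX.simd      ; SSE2 encoding
                movdqu xmm0, [rcx]
                movdqu xmm1, [rdx]
                paddd xmm0, xmm1
                movdqu [rcx], xmm0
        else                            ; AVX+ (VEX encoding)
                vmovdqu xmm0, [rcx]
                vpaddd xmm0, xmm0, [rdx]
                vmovdqu [rcx], xmm0
        end if
        ret

Callers simply call add4_dwords. The choice is fixed at assemble time, so there is no runtime dispatch, but the resulting binary only runs on processors that match the use setting.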

There are many use cases for these features. More to come ....

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup



bitRAKE 10 Feb 2025, 04:39
What is meant by "your x86" in the post title? Well, we all come to programming from different directions, and we have different responsibilities to ourselves and to others. This creates expectations about syntax. Coming from fasm, there aren't as many surprises in the x86 instruction syntax. Yet if you're working with GNU AS, or perhaps another assembler, it's a different story.
Code:
; Checking of elapsed time is done in the raw-tick-time domain. Rather than
; transforming raw-ticks into milliseconds, it's more efficient to transform
; the turn-time delta into the raw-tick-time domain. Note: The tick time
; typically has a granularity of 10-16 milliseconds.
        {bss:8} .turn_time dq ?
        mov rax, ((1+STEP_IN_MILLISECONDS) shl 24)-1 ; round up?
        xor edx, edx
        mov ecx, [dword TickCountMultiplier] ; scaled period (in 100ns units) / 10000
        div rcx
        mov [.turn_time], rax    
Depending on one's perspective, the above code could raise a number of questions. Why does TickCountMultiplier need an explicit dword? What does .turn_time refer to? TickCountMultiplier is in KUSER_SHARED_DATA, at an absolute address mapped into every process's address space. It's qualified with dword to ensure it is interpreted as an absolute address, because the default is automatic RIP-relative addressing. .turn_time, by contrast, is ordinary program data and gets the RIP-relative default. fasm2 makes it possible to turn off automatic RIP-relative addressing, or to set the absolute interpretation explicitly for a single line:
Code:
use norip
        mov ecx, [TickCountMultiplier]
        div rcx
        mov [rip + .turn_time], rax
use ripauto ; back to default setting    
Or with settings on individual lines:
Code:
{absolute}      mov ecx, [TickCountMultiplier]
                div rcx
{norip}         mov [rip + .turn_time], rax    
In this way (and others) fasm2 allows the x86 syntax to meet your expectations.
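
To make the difference concrete, here is a sketch of the two addressing forms next to the bytes they should encode to in 64-bit mode. The value 0x7FFE0004 assumes the usual KUSER_SHARED_DATA base of 0x7FFE0000 with TickCountMultiplier at offset 4, and .turn_time is the variable from the listing above. The byte sequences are derived from the x86-64 ModRM/SIB rules, not captured from assembler output, so treat them as expected rather than verified.
Code:
TickCountMultiplier = 0x7FFE0004        ; assumed: KUSER_SHARED_DATA + 4

        ; absolute: ModRM selects a SIB byte with no base and no index,
        ; followed by a 32-bit displacement holding the address itself
        mov ecx, [dword TickCountMultiplier]    ; 8B 0C 25 04 00 FE 7F

        ; RIP-relative (the automatic default): the 32-bit displacement is
        ; the distance from the end of the instruction to the target
        mov [.turn_time], rax                   ; 48 89 05 <rel32>

The absolute form can only reach addresses that fit in a sign-extended 32-bit displacement, which is the case for 0x7FFE0004.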




bitRAKE 06 Mar 2025, 12:43
In the first post I touched on using a range of SIMD instructions in response to the use setting. Another way to integrate these settings into your program is to organize the code into sections that are checked at runtime, providing a broader range of support - which is typical.

The feature sets cover: x86.cpu, x86.fpu, x86.simd, and x86.ext.

The last one - x86.ext - consists of several flags. At first we might just think of operating on or examining these bit flags, but there is something special about them: they correspond to the bits of the feature flags returned by CPUID.

This alignment with CPUID allows us to prefix relevant sections of the program with code to verify extension support:
Code:
temp = x86.ext and 0xFFFF_FFFF
if temp
        mov eax, 1
        cpuid
        and ecx, temp
        cmp ecx, temp
        jnz CPU_not_supported
end if
if x86.ext shr 32
        xor ecx, ecx
        lea eax, [rcx+7]
        cpuid

        iterate REG, ebx,ecx,edx
                temp = (x86.ext shr (%*32)) and 0xFFFF_FFFF
                if temp
                        and REG, temp
                        cmp REG, temp
                        jnz CPU_not_supported
                end if
        end iterate
end if    
... perhaps this would make a good macro?
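
One possible shape for such a macro, as an untested sketch: the name verify_x86_ext and its fail parameter are my own invention, and the body is just the check above with the jump target turned into an argument. Keep in mind that cpuid overwrites eax, ebx, ecx and edx, so the caller has to preserve rbx if it matters.
Code:
; Hypothetical macro: jumps to `fail` if any extension enabled in x86.ext
; is not reported by CPUID on the running processor.
macro verify_x86_ext fail*
        local mask
        mask = x86.ext and 0xFFFF_FFFF          ; CPUID leaf 1, ECX
        if mask
                mov eax, 1
                cpuid
                and ecx, mask
                cmp ecx, mask
                jnz fail
        end if
        if x86.ext shr 32                       ; CPUID leaf 7, subleaf 0
                xor ecx, ecx
                lea eax, [rcx+7]
                cpuid
                iterate REG, ebx,ecx,edx
                        mask = (x86.ext shr (%*32)) and 0xFFFF_FFFF
                        if mask
                                and REG, mask
                                cmp REG, mask
                                jnz fail
                        end if
                end iterate
        end if
end macro

        verify_x86_ext CPU_not_supported        ; usage, at program start

When x86.ext is zero the macro generates no code at all, just like the original.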

Checks for processor and OS support would also be needed.
(See examples\win64avx512\ in the fasm2 distribution.)
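
For AVX, for example, the OS side usually means confirming OSXSAVE and then reading XCR0 to see whether the OS saves the vector state. A minimal sketch, assuming the same CPU_not_supported label as above and that the active use settings accept xgetbv; it is not the code from the distribution example.
Code:
        mov eax, 1
        cpuid
        and ecx, (1 shl 27) or (1 shl 28)       ; OSXSAVE and AVX bits
        cmp ecx, (1 shl 27) or (1 shl 28)
        jnz CPU_not_supported
        xor ecx, ecx                            ; select XCR0
        xgetbv                                  ; edx:eax = XCR0
        and eax, 110b                           ; XMM (bit 1) and YMM (bit 2) state
        cmp eax, 110b
        jnz CPU_not_supported                   ; OS does not save AVX state

AVX-512 additionally requires XCR0 bits 5 to 7 (opmask and both ZMM state components), so the mask would become 0xE6.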

Copyright © 1999-2025, Tomasz Grysztar.