flat assembler
Message board for the users of flat assembler.

RSI and RDI in x64

Andy



Joined: 17 Oct 2011
Posts: 55
Andy 11 Nov 2023, 23:42
It's probably not the best way, but I have this piece of x86 code that compares two data structures byte by byte and returns the maximum absolute difference between the two bytes at the same offset in each structure.

Code:
use32
mov esi, dword [esp + 4]      ; Pointer to first structure of bytes
mov edi, dword [esp + 8]      ; Pointer to second structure of bytes
mov ecx, dword [esp + 12]     ; Number of bytes in each structure
xor edx, edx                  ; Max difference at start is zero

next:
mov al, [esi]
mov bl, [edi]
cmp al, bl
ja skip_swap    ; If al > bl the order is already right; otherwise swap so al >= bl
mov ah, al
mov al, bl
mov bl, ah
skip_swap:
sub al, bl      ; al = al - bl (absolute difference, since al >= bl)
cmp al, dl      ; Compare with max difference until now
jb absdiff      ; If al >= dl Then dl = al
mov dl, al
absdiff:
inc esi
inc edi
loop next
mov eax, edx
ret 12    
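
For checking an assembly routine like this against known inputs, a minimal C reference of the same computation might look like the sketch below. The function name `max_abs_diff` and its signature are illustrative assumptions, not part of the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model: largest |a[i] - b[i]| over n bytes. */
static uint8_t max_abs_diff(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t max = 0;                          /* like DL: starts at zero */
    for (size_t i = 0; i < n; i++) {
        /* Order the pair so the subtraction cannot go negative. */
        uint8_t hi = a[i] > b[i] ? a[i] : b[i];
        uint8_t lo = a[i] > b[i] ? b[i] : a[i];
        uint8_t d = (uint8_t)(hi - lo);
        if (d > max)
            max = d;
    }
    return max;
}
```

Unlike a LOOP-based routine entered with a zero count, this model simply returns 0 when `n` is 0.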


Now I am trying to do the same for x64, but for some reason I can't use RSI and RDI properly as in the code above, so I tried to work directly with the R8 and R9 registers, but the application crashes after the first 1024 bytes.
Code:
use64
; RCX = Number of bytes in each structure
; RDX = Max difference so far (passed in as zero)
; R8 = Pointer to first structure of bytes 
; R9 = Pointer to second structure of bytes 

xor rax, rax
xor rbx, rbx

next:
mov al, [r8]
mov bl, [r9]
cmp al, bl
ja skipswap
mov ah, al
mov al, bl
mov bl, ah
skipswap:
sub al, bl
cmp al, dl
jb skipdiff
mov dl, al
skipdiff:
inc r8
inc r9
loop next
mov rax, rdx
ret    


Any help is much appreciated.
bitRAKE



Joined: 21 Jul 2003
Posts: 4071
Location: vpcmpistri
bitRAKE 12 Nov 2023, 00:09
The primary difference is the registers cleared. The 64-bit version clears RAX/RBX, but the 32-bit version clears EDX. Does the 64-bit version initialize EDX elsewhere?
Code:
        xor eax, eax
next:
        mov al, [r8 + rcx - 1]
        sub al, [r9 + rcx - 1]
        ja skipswap
        neg al
skipswap:
        cmp al, ah
        jb skipdiff
        mov ah, al
skipdiff:
        loop next
        movzx eax, ah
        ret    
... should perform the same operation - greatest absolute difference.
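
To see why the SUB / JA / NEG sequence yields an absolute difference, here is a rough C model of that loop. It is only a sketch: the name `max_abs_diff_neg` is made up for illustration, and `max` stands in for AH.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the SUB / JA / NEG loop. It walks backwards from
   index n-1 down to 0, the way [r8 + rcx - 1] does as LOOP counts RCX down. */
static uint8_t max_abs_diff_neg(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t max = 0;                       /* plays the role of AH */
    for (size_t i = n; i > 0; i--) {
        uint8_t x = a[i - 1], y = b[i - 1];
        uint8_t d = (uint8_t)(x - y);      /* SUB: result wraps modulo 256 */
        if (x <= y)                        /* JA not taken: borrow or zero */
            d = (uint8_t)(0u - d);         /* NEG: turns the wrapped value into y - x */
        if (d >= max)                      /* JB not taken */
            max = d;
    }
    return max;
}
```

For example, with x = 10 and y = 12 the subtraction wraps to 254, and negating 254 modulo 256 gives 2. One difference from the assembly: this model returns 0 for n = 0, whereas LOOP entered with RCX = 0 decrements first and keeps iterating.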

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Andy



Joined: 17 Oct 2011
Posts: 55
Andy 12 Nov 2023, 00:26
Not just EDX: all parameters are passed from a high-level language. If I understand it correctly, the x64 calling convention says that the first 4 integer parameters are passed in RCX, RDX, R8, and R9. So in RCX I have the number of bytes, in RDX I pass 0 at the start, and R8 and R9 are pointers to the data structures. Now that you pointed out that the 64-bit version clears RAX/RBX, I push them onto the stack before working with these registers and restore them before returning. I did the same with RDI and RSI, and it seems to work fine.

Code:
use64

push rsi
push rdi
push rax
push rbx

mov rsi, r8
mov rdi, r9
xor rax, rax
xor rbx, rbx

next:
mov al, [rsi]
mov bl, [rdi]
cmp al, bl
ja skipswap
mov ah, al
mov al, bl
mov bl, ah
skipswap:
sub al, bl
cmp al, dl
jb skipdiff
mov dl, al
skipdiff:
inc rsi
inc rdi
loop next

pop rbx
pop rax
pop rdi
pop rsi

mov rax, rdx
ret    


Thank you very much.
Andy



Joined: 17 Oct 2011
Posts: 55
Andy 12 Nov 2023, 00:49
bitRAKE wrote:
The primary difference is the registers cleared. The 64-bit version clears RAX/RBX, but the 32-bit version clears EDX. Does the 64-bit version initialize EDX elsewhere?
Code:
        xor eax, eax
next:
        mov al, [r8 + rcx - 1]
        sub al, [r9 + rcx - 1]
        ja skipswap
        neg al
skipswap:
        cmp al, ah
        jb skipdiff
        mov ah, al
skipdiff:
        loop next
        movzx eax, ah
        ret    
... should perform the same operation - greatest absolute difference.


Tried this and also works great. Thank you.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20445
Location: In your JS exploiting you and your system
revolution 12 Nov 2023, 04:36
Andy wrote:
... RAX/RBX I pushed them into stack before working with these registers and restore them before return. I did it with RDI and RSI also and seems to work fine.
Of those four registers, rbx, rsi and rdi need to be preserved. The other, rax, can be freely modified.

Note that if you want to call another HLL function from within your code then there is a requirement to have the stack pointer, rsp, correctly aligned to 0 mod 16 before the call, else the code can potentially crash.
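
The alignment arithmetic can be sketched in C. This assumes the Microsoft x64 convention that the thread appears to describe (RSP is 8 mod 16 at function entry, and the callee expects 32 bytes of shadow space); the helper name `frame_adjust` is hypothetical.

```c
/* Hypothetical helper: how many bytes to subtract from RSP before a CALL so
   that RSP is 0 mod 16 at the call, given the bytes already pushed since
   function entry and the space to reserve for the callee (32 bytes of
   shadow space under the assumed Microsoft x64 convention). */
static unsigned frame_adjust(unsigned pushed_bytes, unsigned reserve_bytes)
{
    /* At entry RSP is 8 mod 16: the caller was aligned at its CALL, and the
       CALL then pushed an 8-byte return address. */
    unsigned used = 8 + pushed_bytes + reserve_bytes;
    unsigned pad = (16 - used % 16) % 16;  /* extra bytes to reach 0 mod 16 */
    return reserve_bytes + pad;
}
```

With the four pushes from the code above (32 bytes), `frame_adjust(32, 32)` gives 40, which corresponds to a `sub rsp, 40` before the call and a matching `add rsp, 40` after it.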
Andy



Joined: 17 Oct 2011
Posts: 55
Andy 12 Nov 2023, 05:39
Thank you for the clarification. I figured it out later, while reading more about the x64 calling convention, that RAX doesn't need to be saved.

Quote:
The x64 ABI considers registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15, and XMM6-XMM15 nonvolatile. They must be saved and restored by a function that uses them.
Andy



Joined: 17 Oct 2011
Posts: 55
Andy 12 Nov 2023, 13:16
I was thinking about a SIMD version of the code above, but I have one more question. I am not sure whether I understand how MPSADBW works and whether it helps me. If I have this sample data in XMM1 and XMM2, will the result after executing MPSADBW be as I anticipate below?

XMM1: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
XMM2: 00 00 00 00 00 00 00 00 00 00 06 05 04 03 02 01
MPSADBW
XMM1: 00 00 00 00 00 00 00 00 00 00 00 00 00 0B 00 0A

From what I read, MPSADBW computes the absolute differences of quadruplets of 8-bit unsigned integers from the first register against those in the second register, and stores the 16-bit results in the first register. I am also not sure that I fully understand the last operand; it looks like some kind of offset from which the quadruplets are formed.
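
One way to check an expectation like this is a scalar model of the instruction. The C sketch below follows the documented behaviour of the 128-bit form of MPSADBW; the function name `mpsadbw_model` and the separate output array are illustrative assumptions, since the real instruction overwrites its first operand.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of MPSADBW dst, src, imm8 (128-bit form).
   Bytes are indexed from the least significant byte (index 0). */
static void mpsadbw_model(uint16_t out[8], const uint8_t dst[16],
                          const uint8_t src[16], unsigned imm8)
{
    unsigned src_off = (imm8 & 3u) * 4u;         /* imm8[1:0]: which 4-byte block of src */
    unsigned dst_off = ((imm8 >> 2) & 1u) * 4u;  /* imm8[2]: start at byte 0 or 4 of dst */

    /* Eight overlapping windows in dst, each compared against the SAME
       four src bytes; the window slides by one byte per result word. */
    for (unsigned i = 0; i < 8; i++) {
        unsigned sum = 0;
        for (unsigned j = 0; j < 4; j++)
            sum += (unsigned)abs((int)dst[dst_off + i + j] - (int)src[src_off + j]);
        out[i] = (uint16_t)sum;                  /* word i of the result */
    }
}
```

Assuming the dump above is written most significant byte first (so 01 is byte 0 of XMM2), this model gives 000A in all eight words for imm8 = 0 and 000B in all eight words for imm8 = 1, rather than one 0A and one 0B. Because each result word is a sum of four differences, MPSADBW does not produce a per-byte maximum; a common alternative for that is PSUBUSB in both directions, POR to combine them, then PMAXUB to keep a running maximum.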

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.