flat assembler
Message board for the users of flat assembler.
Index
> Main > best way to do.... |
Author |
|
tthsqe 20 Jun 2011, 20:23
What is the best way to do:
( 32 bit unsigned ) * ( 32 bit signed ) -> 64 bit signed on 32 bit x86? SSE/MMX welcome |
|||
20 Jun 2011, 20:23 |
|
bitshifter 20 Jun 2011, 22:35
can we use the fpu?
Code: ;data a dd 1234 b dd -2 c rq 1 ;code fild [a] fimul [b] fistp [c] |
|||
20 Jun 2011, 22:35 |
|
tthsqe 20 Jun 2011, 23:21
OK - maybe I should make my motivations clear.
I'm trying to decide between two representations for fixed point multi-precision numbers. (the goal is speed) 1. store the number in 2's complement 2. store the sign and absolute value. With 1, add/sub are fast and multiply requires these mixed unsigned/signed multiplications With 2, add/sub becomes a little awkward but all multiplications are unsigned. Any ideas? |
|||
20 Jun 2011, 23:21 |
|
JohnFound 20 Jun 2011, 23:24
unsigned х abs(signed) (by "mul") instruction will give you 64bit result, that have to be inverted if the signed number is negative.
|
|||
20 Jun 2011, 23:24 |
|
ctl3d32 20 Jun 2011, 23:25
SSE2:
Code: format PE GUI 4.0 entry start include 'win32ax.inc' section '.text' code readable executable start: invoke GetModuleHandle,0 mov [h_process],eax cvtss2sd xmm0,dword[numbers] cvtss2sd xmm1,dword[numbers+4] mulpd xmm0,xmm1 movsd [result],xmm0 cinvoke sprintf,caption,fmt,double[result] invoke MessageBox,HWND_DESKTOP,caption,title,MB_OK exit: invoke ExitProcess,0 section '.bss' data readable writable align 16 h_process dd ? ;handles to the executable process. result dq ? caption rb 40h section '.data' data readable writable align 16 title db 'SSE2 Multiplication:',0 fmt db 'The result is %f',0 numbers dd 1000.0,-5.5 section '.idata' import data readable writeable library kernel32,'KERNEL32.DLL',\ user32,'USER32.DLL',\ msvcrt,'MSVCRT.DLL' include 'API\KERNEL32.INC' include 'API\USER32.INC' import msvcrt,\ sprintf,'sprintf' |
|||
20 Jun 2011, 23:25 |
|
LocoDelAssembly 21 Jun 2011, 02:23
My implementation of JonhFound's idea:
Code: ; EAX = unsigned; EDX = signed ; Take EDX sign mov ecx, edx add ecx, ecx sbb ecx, ecx ; Negate EDX if it was negative at entry xor edx, ecx sub edx, ecx mul edx ; Negate result if EDX was negative at entry xor eax, ecx xor edx, ecx sub eax, ecx sbb edx, ecx ; Signed result in EDX:EAX; ECX destroyed The "Take EDX sign" part could be replaced with: Code: mov ecx, edx sar ecx, 31 Yet another method: Code: ; Take EDX sign xor ecx, ecx cmp edx, $8000'0000 adc ecx, -1 |
|||
21 Jun 2011, 02:23 |
|
tthsqe 21 Jun 2011, 07:47
After testing LocoDelAssembly implementation of JonhFound's idea, I found that
Code: mov rax,qword[a] ; signed mov rcx,rax sar rcx,63 ; sar is 1 cycle faster that add/sbb combo xor rax,rcx sub rax,rcx mul qword[b] ; unsigned xor rax,rcx xor rdx,rcx sub rax,rcx sbb rdx,rcx mov qword[c+8*0],rax ; signed mov qword[c+8*1],rdx ; is not bad at 5.33 cycles. the 32x32->64 instead of 64x64->128 was actualy SLOWER ????. While (correct me if this code is not correct) Code: movd xmm0,dword[a] ; signed movd xmm1,dword[b] ; unsigned movdqa xmm2,xmm1 pmuludq xmm1,xmm0 psllq xmm2,32 psllq xmm0,32 psubq xmm2,xmm1 blendvpd xmm1,xmm2 movq qword[c],xmm1 ; signed is 3.33 cycles but only works for 32x32->64 Cycle count is for a loop consiting of just the above instructions. I think that the conditional negation latency might hold back the multiplications enough that approach #2 might be better. New question: how about this: say we have Code: a_sign dq ? ; sign mask for a a dq ?,?,?,? ; absolute value of a stored in 4 legs b_sign dq ? ; dido for b b dq ?,?,?,? c_sign dq ? ; and c c dq ?,?,?,? what tricks can we pull to do c = a + b |
|||
21 Jun 2011, 07:47 |
|
idle 21 Jun 2011, 14:10
High Level Assembler
http://webster.cs.ucr.edu/ ide: http://sites.google.com/site/highlevelassembly/downloads/hide downloading everything |
|||
21 Jun 2011, 14:10 |
|
Enko 21 Jun 2011, 15:40
idle wrote: High Level Assembler No fasm Iside |
|||
21 Jun 2011, 15:40 |
|
tthsqe 21 Jun 2011, 15:55
OK - but I didn't have to download everything. With the usual 2's complement way of representing integers, it looks like a multiprecision c = a*b is implemented as:
c_sign = 0 if a < 0, a = neg(a); c_sign ^= 1; if b < 0, b = neg(b); c_sign ^= 1; c = a *b (unsigned!!) if c_sign, c = neg(c) I seriously doubt this is the fastest way |
|||
21 Jun 2011, 15:55 |
|
r22 21 Jun 2011, 17:29
@LocoDelAssembly
If we trash 1 more register and interleave the instructions we can avoid the partial stall Code: ;;; let EAX = signed ;;; let ECX = unsigned MOV EBX, EAX CDQ ;;; EDX = EAX.sign ? -1 : 0 NEG EBX ;;; EBX = -EAX, SF = (-EAX).sign ? 1 : 0 XCHG EDX, ECX ;;; ECX = EAX.sign ? -1 : 0 CMOVNS EAX, EBX ;;; EAX = abs(EAX) ... @tthsqe With x86-64 32bU X 32bS = 64bS is pretty trivial, stop living in the past with your 32bit requirements Code: *** edit *** this code was removed because it apparently used an instruction set that only exists in my imagination. Last edited by r22 on 22 Jun 2011, 02:01; edited 1 time in total |
|||
21 Jun 2011, 17:29 |
|
tthsqe 21 Jun 2011, 20:48
Code: MOVZX RDX, dword[a] ;unsigned MOVSX RAX, dword[b] ;signed IMUL RDX ; RAX signed On x84-64, one would hope to accomplish 64bU X 64bS = 128bS. This is what the posted code does. |
|||
21 Jun 2011, 20:48 |
|
LocoDelAssembly 21 Jun 2011, 22:08
r22, "MOVZX RDX, dword[a]" has to be replaced with "MOV EDX, dword [a]" because the former doesn't exist. Same problem with "MOVSX RAX, dword[b]" but I don't know of a proper substitute.
[edit]Removed comment about your ABS(EAX) code, I was confused.[/edit] |
|||
21 Jun 2011, 22:08 |
|
revolution 21 Jun 2011, 23:13
LocoDelAssembly wrote: Same problem with "MOVSX RAX, dword[b]" but I don't know of a proper substitute. Code: mov eax,dword[b] cdqe |
|||
21 Jun 2011, 23:13 |
|
LocoDelAssembly 21 Jun 2011, 23:32
Yes! I don't know why I thought that the "convert to sign-extended" instructions family always used DX/EDX/RDX as part of the destination.
|
|||
21 Jun 2011, 23:32 |
|
tthsqe 22 Jun 2011, 01:07
Code: Same problem with "MOVSX RAX, dword[b]" but I don't know of a proper substitute. There is a movsxd rax,dword[b] command |
|||
22 Jun 2011, 01:07 |
|
r22 22 Jun 2011, 01:59
LocoDelAssembly wrote: r22, "MOVZX RDX, dword[a]" has to be replaced with "MOV EDX, dword [a]" because the former doesn't exist. Same problem with "MOVSX RAX, dword[b]" but I don't know of a proper substitute. Drat! foiled by Intel/AMD again... apparently I've only ever used MOV[S/Z]X with byte sized source operands, so I've managed to keep myself completely ignorant to the instructions' lack of dword mem -> qword encoding. :oops: |
|||
22 Jun 2011, 01:59 |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.