flat assembler
Message board for the users of flat assembler.
macomics 20 Oct 2025, 08:55
I think there is much more to be gained by dropping the conditional loop and hard-coding the desired equation as a fixed instruction sequence that follows the mathematical form of the calculation.
macomics 20 Oct 2025, 09:32
Let the assembler generate the required sequence of mulss instructions for the specific exponent, with no loops or conditional branches.
Code:

    macro int_power_inline p, x, n {
        local t
        if n = 0
            movss xmm0, [const1]
            movss p, xmm0
        else if n = 1
            movss xmm0, x
            movss p, xmm0
        else if n = -1
            movss xmm0, [const1]
            movss xmm1, x
            divss xmm0, xmm1
            movss p, xmm0
        else
            if n < 0
                t = 0 - n
            else
                t = n
            end if
            movss xmm0, [const1]
            movss xmm1, x
            while t > 1
                if t and 1
                    mulss xmm0, xmm1
                end if
                mulss xmm1, xmm1
                t = t shr 1
            end while
            mulss xmm0, xmm1
            if n < 0
                movss xmm1, [const1]
                divss xmm1, xmm0
                movss p, xmm1
            else
                movss p, xmm0
            end if
        end if
    }
revolution 20 Oct 2025, 13:31
macomics wrote: Let the assembler generate the required sequence of mulss instructions for the specific exponent, with no loops or conditional branches.
Roman 20 Oct 2025, 15:02
A little optimization of macomics' code.
Code:

        movss xmm1, [float_value]   ; float_value: dword holding the base
        mov ecx, 13                 ; exponent
        call xmpow

    ; in: xmm1 = base, ecx = exponent; out: xmm0 = base^ecx
    xmpow:
        movss xmm0, [.const1]
        jecxz .exit                 ; exponent 0: return 1.0
    @@:
        cmp ecx, 1
        jz .final
        test ecx, 1
        jz .next
        mulss xmm0, xmm1
    .next:
        mulss xmm1, xmm1
        shr ecx, 1
        jmp @b
    .final:
        mulss xmm0, xmm1
    .exit:
        ret
    .const1 dd 1.0
macomics 20 Oct 2025, 15:27
revolution wrote:
I checked and everything is ok. It generates what I need. The last mulss is just for the highest bit in the exponent.
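To illustrate the remark about the last mulss, here is a hand-traced sketch of what the first version of the macro should emit for an exponent of 5 (binary 101). This is not assembler output copied from the thread; the [esp] operands, the use32 flat layout and the comments are assumptions made for illustration.

```asm
; Hand-traced expansion of:  int_power_inline [esp], [esp], 5   (t = 5 = 101b)
use32
    movss xmm0, [const1]    ; xmm0 = 1.0
    movss xmm1, [esp]       ; xmm1 = x
    mulss xmm0, xmm1        ; t = 5, bit 0 set: xmm0 = x
    mulss xmm1, xmm1        ; xmm1 = x^2, t becomes 2
    mulss xmm1, xmm1        ; t = 2, bit 0 clear: square only, xmm1 = x^4, t becomes 1
    mulss xmm0, xmm1        ; loop stopped at t = 1: highest bit, xmm0 = x * x^4 = x^5
    movss [esp], xmm0
    ret

const1 dd 1.0
```

The while loop only handles the bits below the top one, so the final mulss is what folds the highest set bit into the result.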
macomics 20 Oct 2025, 15:44
Roman wrote: A little optimization of macomics' code.
Roman 20 Oct 2025, 18:14
The int_power_inline macro is good, and in some cases it is the best variant.
But sometimes we need to calculate with the xmpow procedure. I have started writing physics for my 3D game, and I need to use the xmpow procedure there.
revolution 20 Oct 2025, 22:56
macomics wrote: I checked and everything is ok. It generates what I need.
Code:

    const1:
    int_power_inline [esp],[esp],2

Code:

    movss xmm0,dword [0x0]
    movss xmm1,dword [esp]
    mulss xmm1,xmm1
    mulss xmm0,xmm1
    movss dword [esp],xmm0

Code:

    movss xmm0,dword [esp]
    mulss xmm0,xmm0
    movss dword [esp],xmm0
macomics 21 Oct 2025, 00:21
I agree. We can get rid of the multiplication by 1.
Code:

    macro int_power_inline p, x, n {
        local f, t
        if n = 0
            movss xmm0, [const1]
            movss p, xmm0
        else if n = 1
            movss xmm0, x
            movss p, xmm0
        else if n = -1
            movss xmm0, [const1]
            movss xmm1, x
            divss xmm0, xmm1
            movss p, xmm0
        else
            f = 0
            if n < 0
                t = 0 - n
            else
                t = n
            end if
            movss xmm1, x
            while t > 1
                if t and 1
                    if f = 0
                        f = 1
                        movss xmm0, xmm1
                    else
                        mulss xmm0, xmm1
                    end if
                end if
                mulss xmm1, xmm1
                t = t shr 1
            end while
            if f = 0
                if n < 0
                    movss xmm0, [const1]
                    divss xmm0, xmm1
                    movss p, xmm0
                else
                    movss p, xmm1
                end if
            else
                mulss xmm0, xmm1
                if n < 0
                    movss xmm1, [const1]
                    divss xmm1, xmm0
                    movss p, xmm1
                else
                    movss p, xmm0
                end if
            end if
        end if
    }
revolution 21 Oct 2025, 00:34
A flaw in using n as a raw value in "t = 0 - n" shows up with this:
Code:

    int_power_inline [esp],[esp],-1-1

Code:

    t = 0 - (n)
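The difference between the raw and the parenthesized form can be checked at assembly time. The sketch below assumes fasm 1, where macro arguments are substituted as text and the assert directive is available; check_negate, raw and safe are hypothetical names introduced only for this illustration.

```asm
; Sketch: negate a macro argument with and without parentheses.
macro check_negate n {
    local raw, safe
    raw  = 0 - n          ; for n = -1-1 this line reads: 0 - -1-1
    safe = 0 - (n)        ; for n = -1-1 this line reads: 0 - (-1-1)
    assert safe = 2       ; the intended magnitude of -2 (holds for this argument only)
    assert raw <> safe    ; the raw form does not give the intended value
}

check_negate -1-1
```

Under ordinary left-to-right evaluation, 0 - -1-1 comes out as 0 rather than 2, so the macro would treat the exponent as if it had no magnitude at all. Wrapping n in parentheses wherever it appears inside an expression avoids this.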
bitRAKE 21 Oct 2025, 00:34
What about something like:
Code:

    ; input: XMM0, ECX
    ; output XMM0 <- XMM0^ECX

    ; First, handle special cases: 0, 1 and powers of two:
    .reduce:
        shr ecx, 1
        jz .done
        jc .first_bit
        vmulss xmm0, xmm0, xmm0
        jmp .reduce
    .first_bit:
        vmovss xmm1, xmm0
    ; perhaps replace above, for no merge dependency from prior xmm1
    ;   vmovaps xmm1, xmm0

    ; Second hot loop to complete calculation:
    .square:
        vmulss xmm1, xmm1, xmm1
        shr ecx, 1
        jnc .square
        vmulss xmm0, xmm0, xmm1
        jnz .square
    .not_zero:
        retn
    .done:
        jc .not_zero    ; predict backward likely jump
        mov ecx, 1f     ; 0x3f800000
        vmovd xmm0, ecx
        retn

    ; + No memory access.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.