flat assembler
Message board for the users of flat assembler.

Windows forum thread: how do x^6 on sse or avx ?

revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20753
Location: In your JS exploiting you and your system
revolution 20 Oct 2025, 08:35
It is possible to use fewer multiplies if the code is permitted to include DIV.

For example, n=127 can be done with 7 multiplies and one division (square up to x^128, then divide by x), compared to 12 multiplies if DIV is not used. This can be a win if DIV costs less than 5 multiplies.
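As a rough check of those counts, here is a small Python sketch. It is not from the thread, and the function names are made up for illustration. It assumes the "12 multiplies" figure refers to plain square-and-multiply, and it evaluates x^127 as x^128 / x.

```python
def binary_mul_count(n):
    """Multiplies used by plain square-and-multiply for n >= 1
    (no multiply by 1.0): one squaring per bit below the top bit,
    plus one multiply per additional set bit."""
    return (n.bit_length() - 1) + (bin(n).count("1") - 1)


def pow_127_with_div(x):
    """x**127 computed as x**128 / x: 7 squarings and 1 division."""
    y = x
    for _ in range(7):      # x^2, x^4, ..., x^128
        y = y * y
    return y / x


print(binary_mul_count(127))     # 12
print(pow_127_with_div(2.0) == 2.0 ** 127)
```

Whether the division pays off still depends on the relative cost of DIVSS and MULSS on the target CPU, which this sketch does not model.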
macomics



Joined: 26 Jan 2021
Posts: 1197
Location: Russia
macomics 20 Oct 2025, 08:55
I think there is much more to be gained by dropping the conditional loop and hard-coding the instruction sequence for the desired expression inline, following the mathematical form of the calculation.
macomics 20 Oct 2025, 09:32
Let the assembler generate the desired sequence of mulss without cycles and conditions, depending on the specific exponent.
Code:
macro int_power_inline p, x, n { local t
  if n = 0
    movss xmm0, [const1]
    movss p, xmm0
  else if n = 1
    movss xmm0, x
    movss p, xmm0
  else if n = -1
    movss xmm0, [const1]
    movss xmm1, x
    divss xmm0, xmm1
    movss p, xmm0
  else
    if n < 0
      t = 0 - n
    else
      t = n
    end if
    movss xmm0, [const1]
    movss xmm1, x
    while t > 1
      if t and 1
        mulss xmm0, xmm1
      end if
      mulss xmm1, xmm1
      t = t shr 1
    end while
    mulss xmm0, xmm1
    if n < 0
      movss xmm1, [const1]
      divss xmm1, xmm0
      movss p, xmm1
    else
      movss p, xmm0
    end if
  end if
}    
revolution 20 Oct 2025, 13:31
macomics wrote:
Let the assembler generate the desired sequence of mulss without cycles and conditions, depending on the specific exponent.
<snip>
The code posted generates one more multiply than necessary for n>1
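To make that count concrete, here is a Python sketch that mirrors the assembly-time loop in the general branch of the quoted macro and lists the mulss lines it would emit. The helper names are invented for this illustration, and only n > 1 is modelled.

```python
def first_macro_ops(n):
    """Mirror the while-loop of the first int_power_inline (n > 1)
    and return the mulss lines it emits, in order."""
    ops = []
    t = n
    while t > 1:
        if t & 1:
            ops.append("mulss xmm0, xmm1")
        ops.append("mulss xmm1, xmm1")
        t >>= 1
    ops.append("mulss xmm0, xmm1")   # final multiply for the top bit
    return ops


def needed(n):
    """Square-and-multiply count when nothing is multiplied by 1.0."""
    return (n.bit_length() - 1) + (bin(n).count("1") - 1)


for n in (2, 3, 13):
    print(n, len(first_macro_ops(n)), needed(n))
# 2 2 1
# 3 3 2
# 13 6 5
```

The extra instruction is whichever "mulss xmm0, xmm1" runs first, because xmm0 still holds the 1.0 loaded from const1 at that point.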
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 20 Oct 2025, 15:02
A small optimization of macomics' code:
Code:
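movss xmm1, [float_value] ; float_value: placeholder name for the input variable
mov  ecx,13               ; unsigned integer exponent
call xmpow                ; result is returned in xmm0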

xmpow:
     movss xmm0, [.const1] 
     jecxz .exit
@@:  cmp ecx, 1
     jz .final
     test ecx, 1
     jz .next
      mulss xmm0, xmm1
.next:
      mulss xmm1, xmm1
      shr ecx, 1
      jmp @b
.final: mulss xmm0, xmm1 
.exit: ret

.const1 dd 1.0    
macomics 20 Oct 2025, 15:27
revolution wrote:
macomics wrote:
Let the assembler generate the desired sequence of mulss without cycles and conditions, depending on the specific exponent.
<snip>
The code posted generates one more multiply than necessary for n>1


I checked and everything is ok. It generates what I need.
The last mulss is just for the highest bit in the exponent.


Attachment: Снимок экрана_20251020_192323.png (screenshot, 246.17 KB)


macomics 20 Oct 2025, 15:44
Roman wrote:
A small optimization of macomics' code.
Take a look at the latest changes in my code. There are other optimizations in the loop.
Roman 20 Oct 2025, 18:14
The int_power_inline macro is good, and in some cases it is the best variant.
But sometimes we need to do the calculation with the xmpow procedure.
I have started writing the physics for my 3D game, and there I need the xmpow procedure.
revolution 20 Oct 2025, 22:56
macomics wrote:
I checked and everything is ok. It generates what I need.
The last mulss is just for the highest bit in the exponent.
I put this:
Code:
const1:
int_power_inline [esp],[esp],2    
It generated this:
Code:
        movss xmm0,dword [0x0]
        movss xmm1,dword [esp]
        mulss xmm1,xmm1
        mulss xmm0,xmm1
        movss dword [esp],xmm0    
That has two multiplies. But it can be done with one.
Code:
        movss xmm0,dword [esp]
        mulss xmm0,xmm0
        movss dword [esp],xmm0    
The "const1" is never required, unless it needs to compute for n<=0. Starting at the highest bit and working down allows the code to generate the minimal multiplies.
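A Python sketch of that highest-bit-first idea follows. It is only an illustration: the emitter, the tiny interpreter and the register roles are assumptions, not code from the thread. It starts with xmm0 = x, so no 1.0 constant is loaded for n >= 1.

```python
def msb_first_ops(n):
    """Instruction list for x**n, n >= 1, scanning the exponent from the
    bit below the top one downwards. xmm0 starts as x."""
    ops = ["movss xmm0, x"]
    if n & (n - 1):                          # not a power of two: keep a copy of x
        ops.append("movss xmm1, xmm0")
    for bit in range(n.bit_length() - 2, -1, -1):
        ops.append("mulss xmm0, xmm0")       # square the running result
        if (n >> bit) & 1:
            ops.append("mulss xmm0, xmm1")   # fold in one more factor of x
    return ops


def run(ops, x):
    """Tiny interpreter for the instruction forms emitted above."""
    reg = {"x": x}
    for op in ops:
        name, args = op.split(" ", 1)
        dst, src = [a.strip() for a in args.split(",")]
        reg[dst] = reg[src] if name == "movss" else reg[dst] * reg[src]
    return reg["xmm0"]


print(msb_first_ops(2))   # ['movss xmm0, x', 'mulss xmm0, xmm0']
print(run(msb_first_ops(13), 2.0) == 2.0 ** 13)
```

For n=2 this gives the same single multiply as the hand-written version above. The count is the square-and-multiply count without the multiply by 1; the sketch makes no claim that it is the shortest possible sequence for every n.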
macomics 21 Oct 2025, 00:21
I agree. We can get rid of the multiplication by 1:
Code:
macro int_power_inline p, x, n { local f, t
  if n = 0
    movss xmm0, [const1]
    movss p, xmm0
  else if n = 1
    movss xmm0, x
    movss p, xmm0
  else if n = -1
    movss xmm0, [const1]
    movss xmm1, x
    divss xmm0, xmm1
    movss p, xmm0
  else
    f = 0
    if n < 0
      t = 0 - n
    else
      t = n
    end if
    movss xmm1, x
    while t > 1
      if t and 1
        if f = 0
          f = 1
          movss xmm0, xmm1
        else
          mulss xmm0, xmm1
        end if
      end if
      mulss xmm1, xmm1
      t = t shr 1
    end while
    if f = 0
      if n < 0
        movss xmm0, [const1]
        divss xmm0, xmm1
        movss p, xmm0
      else
        movss p, xmm1
      end if
    else
      mulss xmm0, xmm1
      if n < 0
        movss xmm1, [const1]
        divss xmm1, xmm0
        movss p, xmm1
      else
        movss p, xmm0
      end if
    end if
  end if
}    
revolution 21 Oct 2025, 00:34
Using n as a raw value in "t = 0 - n" breaks with an argument like this:
Code:
int_power_inline [esp],[esp],-1-1    
The macro pastes the argument in as text, giving "t = 0 - -1-1", which is not the negation of -2. The n needs to be put inside parentheses:
Code:
t = 0 - (n)    
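The effect of the missing parentheses can be reproduced outside the assembler. In this Python sketch the expand helper is a made-up stand-in for macro argument substitution, and Python's own expression evaluator stands in for fasm's, which is an assumption.

```python
def expand(template, n_text):
    """Mimic macro argument substitution: paste the argument text in place of n."""
    return template.replace("n", n_text)


arg = "-1-1"                     # the argument from the example above
raw = expand("0 - n", arg)       # "0 - -1-1"
safe = expand("0 - (n)", arg)    # "0 - (-1-1)"

print(raw, "=", eval(raw))       # 0 - -1-1 = 0
print(safe, "=", eval(safe))     # 0 - (-1-1) = 2
```

With the raw form t comes out as 0 rather than 2, so the squaring loop would never run for an exponent of -2.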
bitRAKE



Joined: 21 Jul 2003
Posts: 4308
Location: vpcmpistri
bitRAKE 21 Oct 2025, 00:34
What about something like:
Code:
; input: XMM0, ECX
; output XMM0 <- XMM0^ECX

; First, handle special cases: 0, 1 and powers of two:

.reduce:
        shr ecx, 1
        jz .done
        jc .first_bit
        vmulss  xmm0, xmm0, xmm0
        jmp .reduce

.first_bit:
        vmovss xmm1, xmm0
; perhaps replace above, for no merge dependency from prior xmm1
;       vmovaps xmm1, xmm0

; Second hot loop to complete calculation:

.square:
        vmulss xmm1, xmm1, xmm1
        shr ecx, 1
        jnc .square
        vmulss xmm0, xmm0, xmm1
        jnz .square
.not_zero:
        retn

.done:  jc .not_zero ; predict backward likely jump
        mov ecx, 1f ; 0x3f800000
        vmovd xmm0, ecx
        retn

; + No memory access.    
edit: ... test/timing of algorithm.
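For anyone who wants to check the control flow before timing it, here is a Python model of the routine above. It is a sketch under assumptions: the shr1 helper and the function name are invented, ecx is treated as an unsigned value, and only the carry and zero flags of "shr ecx, 1" are modelled.

```python
def shr1(ecx):
    """Model of 'shr ecx, 1': returns (new ecx, CF, ZF)."""
    new = ecx >> 1
    return new, ecx & 1, new == 0


def pow_two_loops(x, ecx):
    xmm0 = x
    # .reduce: strip trailing zero bits, squaring xmm0 for each one
    while True:
        ecx, cf, zf = shr1(ecx)
        if zf:                          # jz .done
            # CF set: the exponent's only set bit was just shifted out.
            # CF clear: the exponent was 0, so the result is 1.0.
            return xmm0 if cf else 1.0
        if cf:                          # jc .first_bit
            break
        xmm0 = xmm0 * xmm0
    xmm1 = xmm0                         # .first_bit
    # .square: keep squaring xmm1, multiply into xmm0 on each set bit
    while True:
        xmm1 = xmm1 * xmm1
        ecx, cf, zf = shr1(ecx)
        if not cf:                      # jnc .square
            continue
        xmm0 = xmm0 * xmm1
        if zf:                          # jnz .square not taken: fall through to retn
            return xmm0


print(all(pow_two_loops(2.0, n) == 2.0 ** n for n in range(64)))
```

The model only checks the arithmetic and branching; it says nothing about latency, branch prediction or the vmovss merge dependency mentioned in the comments.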
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.