flat assembler
Message board for the users of flat assembler.

Index > Windows > how do x^6 on sse or avx ?

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 16 Oct 2025, 22:10
I search simple way do this x*x*x*x*x*x.
I not like do 5 times mulss xmm1,xmm1.
I searching couple sse or avx asm commands for this task.

I found this.
Code:
pow(base,power) = exp( power * log(base) )    

In this case do 5 times mulss xmm1,xmm1 not bad solution.

AVX have vhaddps horizontal apply.
Maybe exist horizontal multiplication?
ymm0 could multiply 8 floats numbers.
Post 16 Oct 2025, 22:10
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20753
Location: In your JS exploiting you and your system
revolution 16 Oct 2025, 23:42
A power of 6 can be done with three multiplies.

a = x*x ; x^2
b = a*x ; x^3
c = b*b ; x^6
Post 16 Oct 2025, 23:42
View user's profile Send private message Visit poster's website Reply with quote
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 17 Oct 2025, 00:08
Sorry my mistake.
Code:
;x^8
align 32
x dd 1f,2f,3f,4f,5f,6f,7f,8f

vmovaps ymm0,yword [x]
    vmulps ymm0, ymm0, ymm0    ; x^2
    vmulps ymm0, ymm0, ymm0    ; x^4
    vmulps ymm0, ymm0, ymm0    ; x^8  

;after  in ymm0 1f^8 ,2f^8 ,3f^8 ,4f^8 ,5f^8 ,6f^8 ,7f^8 ,8f^8 
    

Code:
x dd 3.0

movss xmm1,[x]
movss xmm0,xmm1
mulss xmm0,xmm0 ;x^2
mulss xmm0,xmm0 ;x^4
mulss xmm1,xmm0 ;x^5 xmm1=3^5=243
    
Post 17 Oct 2025, 00:08
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20753
Location: In your JS exploiting you and your system
revolution 17 Oct 2025, 06:30
In general x-to-power-of-n can be computed in no more than [ bsr(n) + popcnt(n) - 1 ] multiplies.
Post 17 Oct 2025, 06:30
View user's profile Send private message Visit poster's website Reply with quote
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 17 Oct 2025, 14:37
Quote:

[ bsr(n) + popcnt(n) - 1 ]

Very interesting see example how do this.
Post 17 Oct 2025, 14:37
View user's profile Send private message Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8464
Location: Kraków, Poland
Tomasz Grysztar 17 Oct 2025, 17:14
Roman wrote:
Quote:

[ bsr(n) + popcnt(n) - 1 ]

Very interesting see example how do this.
Take the binary representation of the number, for example 13 = 1101b:
13 = 1101b = 1\cdot{2^3} + 1\cdot{2^2} + 0\cdot{2^1} + 1\cdot{2^0}

x^{13} = x^{2^3+2^2+2^0} = x^{2^3}\cdot{x^{2^2}}\cdot{x}

So if we have the square powers of x ready, we only need to perform 2 more multiplications - this is the "popcnt(n) - 1" portion of revolution's formula.

Now, taking into consideration that x^{2^i}=(x^{2^{i-1}})^2 it follows that we can compute x^{2^i} by squaring i times, and we get all the intermediate squares on the way. This is the "bsr(n)" part of the formula.

For n = 13 we get this sequence of multiplications:
a = x*x
b = a*a
c = b*b
x^13 = c*b*x
For final calculation you take the product of all the consecutive squares and remove the ones that correspond to zeros in the binary representation of n. Here, because 13 = 1101b, it meant removing "a" from "c*b*a*x" (as "a" corresponds to the only zero in 1101b).
Post 17 Oct 2025, 17:14
View user's profile Send private message Visit poster's website Reply with quote
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 18 Oct 2025, 08:13
Quote:

[ bsr(n) + popcnt(n) - 1 ]

I thinked this code help do two multiplication.
But if not, this code only confusing and complicating. But not help.
Post 18 Oct 2025, 08:13
View user's profile Send private message Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 2685
Furs 18 Oct 2025, 21:40
Roman wrote:
Quote:

[ bsr(n) + popcnt(n) - 1 ]

I thinked this code help do two multiplication.
But if not, this code only confusing and complicating. But not help.
No, it tells you how many multiplications you will need.

It doesn't do the calculation itself.
Post 18 Oct 2025, 21:40
View user's profile Send private message Reply with quote
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 19 Oct 2025, 00:18
And for this reason I better do this instead bsr(n) + popcnt(n) - 1
Code:
movss xmm0, [x] 
movss xmm1, xmm0
mov ecx, pow-1
Lp: mulss xmm1, xmm0
dec ecx
jnz Lp
    
Post 19 Oct 2025, 00:18
View user's profile Send private message Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 2685
Furs 19 Oct 2025, 21:24
Roman wrote:
And for this reason I better do this instead bsr(n) + popcnt(n) - 1
Code:
movss xmm0, [x] 
movss xmm1, xmm0
mov ecx, pow-1
Lp: mulss xmm1, xmm0
dec ecx
jnz Lp
    
Sure, if you don't mind doing 6 multiplications instead of 3 for the same thing.
Post 19 Oct 2025, 21:24
View user's profile Send private message Reply with quote
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 20 Oct 2025, 04:50
O.
I understand.
Code:
x dd 2.0

    mov eax,8
    mov ecx, eax        

    bsr ebx, eax        ; EBX = position of highest set bit

    popcnt eax, ecx     ; EAX = number of set bits

    ;add eax, ebx
    ;dec eax
    

         movss xmm0, [x]
         movss xmm1, xmm0
         lea ecx,[ eax-1+ebx]
aaLp:    mulss xmm1, xmm1
         dec ecx
         jnz aaLp    

xmm1=2f^8=256f
Post 20 Oct 2025, 04:50
View user's profile Send private message Reply with quote
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 20 Oct 2025, 06:31
ok. How I must do if 2^10 ?

I try my code and get 2^16=65536 but this is not right.
Post 20 Oct 2025, 06:31
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20753
Location: In your JS exploiting you and your system
revolution 20 Oct 2025, 06:44
This is the first link I found using duckduckgo:

https://cp-algorithms.com/algebra/binary-exp.html
Post 20 Oct 2025, 06:44
View user's profile Send private message Visit poster's website Reply with quote
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 20 Oct 2025, 06:47
I write this code. its ok for 2^10=1024
Code:
 
x dd 2.0

    mov eax,8+2
    mov ecx, eax        ; Save original n
    
    ; bsr(n)
    bsr ebx, eax        ; EBX = position of highest set bit

    ; popcnt(n)  
    popcnt eax, ecx     ; EAX = number of set bits 

       movss xmm0, [x]
       movss xmm1, xmm0
       mov ecx,ebx
aaLp:  mulss xmm1, xmm1
       dec ecx
       jnz aaLp

       mov ecx,eax
baLp:  mulss xmm1, xmm0
       dec ecx
       jnz baLp            
Post 20 Oct 2025, 06:47
View user's profile Send private message Reply with quote
macomics



Joined: 26 Jan 2021
Posts: 1196
Location: Russia
macomics 20 Oct 2025, 06:50
let n = 9 = 1001b; x = 3
p = x^8 * x
let a = x * x
p = a^4 * x
let b = a * a; (or x*x*x*x)
p = b^2 * x
let c = b * b
p = c * x

in total (4 mul)
a = x * x
b = a * a
c = b * b
p = c * x

verify: 3^9=19683 | a=3*3=9; b=9*9=81; c=81*81=6561; p=6561*3=19683
---------------------


let n = 27 = 11011b; x = 2
p = x^16 * x^8 * x^2 * x
let a = x * x
p = a^8 * a^4 * a * x
let b = a * a
p = b^4 * b^2 * a * x
let c = b * b
p = c^2 * c * a * x
let d = c * c
p = d * c * a * x

in total (7 mul)
a = x * x
b = a * a
c = b * b
d = c * c
p = d * c * a * x

verify: 2^27=134217728 | a=2*2=4; b=4*4=16; c=16*16=256; d=256*256=65536; p=65536*256*4*2=134217728

https://dzen.ru/b/ZLRuI_S-kniRjdTN

ADD: fast exponentiation polynomial (in common)
p=(n[high_bit]*x^(high_bit+1)) * (n[high_bit-1]*x^high_bit) * ... * (n[2]*x^3) * (n[1]*x^2) * (n[0]*x^1)

Code:
// Pseudocode 2^27
const x = 2, n = 27;
const v = highBitSet(n); // v = 4
let p = 1;
for (let i = 0, u = x; i <= v; i += 1) {
  if (n & 2^i !== 0) p *= u;
  if (i < v) u *= u; // reduce one extra mul
}
    


Last edited by macomics on 20 Oct 2025, 07:10; edited 3 times in total
Post 20 Oct 2025, 06:50
View user's profile Send private message Reply with quote
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 20 Oct 2025, 07:04
my code2. 2^9=512
Code:
   x dd 2.0
    mov eax,8+1
    mov ecx, eax        ; Save original n
    
    ; bsr(n)
    bsr ebx, eax        ; EBX = position of highest set bit

    ; popcnt(n)  
    popcnt eax, ecx     ; EAX = number of set bits


 rept 1 {
        movss xmm0, [x]
        movss xmm1, xmm0
        mov ecx,ebx
aaLp:   mulss xmm1, xmm1
        dec ecx
        jnz aaLp

        cmp al,1
        jbe @f
        mov ecx,eax
        dec ecx
baLp:
        mulss xmm1, xmm0
        dec ecx
        jnz baLp
@@:
}                   
Post 20 Oct 2025, 07:04
View user's profile Send private message Reply with quote
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 20 Oct 2025, 07:14
Intel not doing his job well !
The shame not have on modern CPU pow asm instruction!
Must be one asm command mulxss xmm1,reg or xmm or memory.
mulxss equivalent my code from previous message.
Post 20 Oct 2025, 07:14
View user's profile Send private message Reply with quote
macomics



Joined: 26 Jan 2021
Posts: 1196
Location: Russia
macomics 20 Oct 2025, 07:21
Code:
.data
  const1 dd 1.0
  x dd 2.0
  n dd 27
  p dd ?
.code
  movss xmm1, [x] ; u
  movss xmm0, [const1] ; p
  mov ecx, [n] ; 27
  jecxz @exit
  cmp ecx, 1
  ja @f
  movss [p], xmm1
  jmp @quit
  @@:
    shr ecx, 1
    jnc @next
      mulss xmm0, xmm1
    @next:
    mulss xmm1, xmm1
    cmp ecx, 1
    ja @b
  mulss xmm0, xmm1
  @exit:
  movss [p], xmm0
  @quit:    

ADD:
Roman wrote:
mulxss equivalent my code from previous message.
@Ronan, your code is incorrect!


Last edited by macomics on 20 Oct 2025, 07:52; edited 5 times in total
Post 20 Oct 2025, 07:21
View user's profile Send private message Reply with quote
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 20 Oct 2025, 07:31
Intel will write correct:)
Post 20 Oct 2025, 07:31
View user's profile Send private message Reply with quote
Roman



Joined: 21 Apr 2012
Posts: 2016
Roman 20 Oct 2025, 07:40
Thanks macomics for good code.
Post 20 Oct 2025, 07:40
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.