how do x^6 on sse or avx ?

Index > Windows > how do x^6 on sse or avx ?

Goto page 1, 2 Next

Author

Thread

Roman

Joined: 21 Apr 2012
Posts: 2048

Roman 16 Oct 2025, 22:10

I search simple way do this x*x*x*x*x*x.
I not like do 5 times mulss xmm1,xmm1.
I searching couple sse or avx asm commands for this task.

I found this.

Code:

pow(base,power) = exp( power * log(base) )

In this case do 5 times mulss xmm1,xmm1 not bad solution.

AVX have vhaddps horizontal apply.
Maybe exist horizontal multiplication?
ymm0 could multiply 8 floats numbers.

16 Oct 2025, 22:10

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20864
Location: In your JS exploiting you and your system

revolution 16 Oct 2025, 23:42

A power of 6 can be done with three multiplies.

a = x*x ; x^2
b = a*x ; x^3
c = b*b ; x^6

16 Oct 2025, 23:42

Roman

Joined: 21 Apr 2012
Posts: 2048

Roman 17 Oct 2025, 00:08

Sorry my mistake.

Code:

;x^8
align 32
x dd 1f,2f,3f,4f,5f,6f,7f,8f

vmovaps ymm0,yword [x]
    vmulps ymm0, ymm0, ymm0    ; x^2
    vmulps ymm0, ymm0, ymm0    ; x^4
    vmulps ymm0, ymm0, ymm0    ; x^8  

;after  in ymm0 1f^8 ,2f^8 ,3f^8 ,4f^8 ,5f^8 ,6f^8 ,7f^8 ,8f^8

Code:

x dd 3.0

movss xmm1,[x]
movss xmm0,xmm1
mulss xmm0,xmm0 ;x^2
mulss xmm0,xmm0 ;x^4
mulss xmm1,xmm0 ;x^5 xmm1=3^5=243

17 Oct 2025, 00:08

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20864
Location: In your JS exploiting you and your system

revolution 17 Oct 2025, 06:30

In general x-to-power-of-n can be computed in no more than [ bsr(n) + popcnt(n) - 1 ] multiplies.

17 Oct 2025, 06:30

Roman

Joined: 21 Apr 2012
Posts: 2048

Roman 17 Oct 2025, 14:37

Quote:

[ bsr(n) + popcnt(n) - 1 ]

Very interesting see example how do this.

17 Oct 2025, 14:37

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8501
Location: Kraków, Poland

Tomasz Grysztar 17 Oct 2025, 17:14

Roman wrote:

Quote:

[ bsr(n) + popcnt(n) - 1 ]

Very interesting see example how do this.

Take the binary representation of the number, for example 13 = 1101b:
$13 = 1101b = 1\cdot{2^3} + 1\cdot{2^2} + 0\cdot{2^1} + 1\cdot{2^0}$

$x^{13} = x^{2^3+2^2+2^0} = x^{2^3}\cdot{x^{2^2}}\cdot{x}$

So if we have the square powers of x ready, we only need to perform 2 more multiplications - this is the "popcnt(n) - 1" portion of revolution's formula.

Now, taking into consideration that $x^{2^i}=(x^{2^{i-1}})^2$ it follows that we can compute $x^{2^i}$ by squaring i times, and we get all the intermediate squares on the way. This is the "bsr(n)" part of the formula.

For n = 13 we get this sequence of multiplications:
a = x*x
b = a*a
c = b*b
x^13 = c*b*x
For final calculation you take the product of all the consecutive squares and remove the ones that correspond to zeros in the binary representation of n. Here, because 13 = 1101b, it meant removing "a" from "c*b*a*x" (as "a" corresponds to the only zero in 1101b).

17 Oct 2025, 17:14

Roman

Joined: 21 Apr 2012
Posts: 2048

Roman 18 Oct 2025, 08:13

Quote:

[ bsr(n) + popcnt(n) - 1 ]

I thinked this code help do two multiplication.
But if not, this code only confusing and complicating. But not help.

18 Oct 2025, 08:13

Furs

Joined: 04 Mar 2016
Posts: 2712

Furs 18 Oct 2025, 21:40

Roman wrote:

Quote:

[ bsr(n) + popcnt(n) - 1 ]

I thinked this code help do two multiplication.
But if not, this code only confusing and complicating. But not help.

No, it tells you how many multiplications you will need.

It doesn't do the calculation itself.

18 Oct 2025, 21:40

Roman

Joined: 21 Apr 2012
Posts: 2048

Roman 19 Oct 2025, 00:18

And for this reason I better do this instead bsr(n) + popcnt(n) - 1

Code:

movss xmm0, [x] 
movss xmm1, xmm0
mov ecx, pow-1
Lp: mulss xmm1, xmm0
dec ecx
jnz Lp

19 Oct 2025, 00:18

Furs

Joined: 04 Mar 2016
Posts: 2712

Furs 19 Oct 2025, 21:24

Roman wrote:

And for this reason I better do this instead bsr(n) + popcnt(n) - 1
Code:
movss xmm0, [x] 
movss xmm1, xmm0
mov ecx, pow-1
Lp: mulss xmm1, xmm0
dec ecx
jnz Lp
    

Sure, if you don't mind doing 6 multiplications instead of 3 for the same thing.

19 Oct 2025, 21:24

Roman

Joined: 21 Apr 2012
Posts: 2048

Roman 20 Oct 2025, 04:50

O.
I understand.

Code:

x dd 2.0

    mov eax,8
    mov ecx, eax        

    bsr ebx, eax        ; EBX = position of highest set bit

    popcnt eax, ecx     ; EAX = number of set bits

    ;add eax, ebx
    ;dec eax
    

         movss xmm0, [x]
         movss xmm1, xmm0
         lea ecx,[ eax-1+ebx]
aaLp:    mulss xmm1, xmm1
         dec ecx
         jnz aaLp

xmm1=2f^8=256f

20 Oct 2025, 04:50

Roman

Joined: 21 Apr 2012
Posts: 2048

Roman 20 Oct 2025, 06:31

ok. How I must do if 2^10 ?

I try my code and get 2^16=65536 but this is not right.

20 Oct 2025, 06:31

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20864
Location: In your JS exploiting you and your system

revolution 20 Oct 2025, 06:44

This is the first link I found using duckduckgo:

https://cp-algorithms.com/algebra/binary-exp.html

20 Oct 2025, 06:44

Roman

Joined: 21 Apr 2012
Posts: 2048

Roman 20 Oct 2025, 06:47

I write this code. its ok for 2^10=1024

Code:

 
x dd 2.0

    mov eax,8+2
    mov ecx, eax        ; Save original n
    
    ; bsr(n)
    bsr ebx, eax        ; EBX = position of highest set bit

    ; popcnt(n)  
    popcnt eax, ecx     ; EAX = number of set bits 

       movss xmm0, [x]
       movss xmm1, xmm0
       mov ecx,ebx
aaLp:  mulss xmm1, xmm1
       dec ecx
       jnz aaLp

       mov ecx,eax
baLp:  mulss xmm1, xmm0
       dec ecx
       jnz baLp

20 Oct 2025, 06:47

macomics

Joined: 26 Jan 2021
Posts: 1208
Location: Russia

macomics 20 Oct 2025, 06:50

let n = 9 = 1001b; x = 3
p = x^8 * x
let a = x * x
p = a^4 * x
let b = a * a; (or x*x*x*x)
p = b^2 * x
let c = b * b
p = c * x

in total (4 mul)
a = x * x
b = a * a
c = b * b
p = c * x

verify: 3^9=19683 | a=3*3=9; b=9*9=81; c=81*81=6561; p=6561*3=19683
---------------------

let n = 27 = 11011b; x = 2
p = x^16 * x^8 * x^2 * x
let a = x * x
p = a^8 * a^4 * a * x
let b = a * a
p = b^4 * b^2 * a * x
let c = b * b
p = c^2 * c * a * x
let d = c * c
p = d * c * a * x

in total (7 mul)
a = x * x
b = a * a
c = b * b
d = c * c
p = d * c * a * x

verify: 2^27=134217728 | a=2*2=4; b=4*4=16; c=16*16=256; d=256*256=65536; p=65536*256*4*2=134217728

https://dzen.ru/b/ZLRuI_S-kniRjdTN

ADD: fast exponentiation polynomial (in common)
p=(n[high_bit]*x^(high_bit+1)) * (n[high_bit-1]*x^high_bit) * ... * (n[2]*x^3) * (n[1]*x^2) * (n[0]*x^1)

Code:

// Pseudocode 2^27
const x = 2, n = 27;
const v = highBitSet(n); // v = 4
let p = 1;
for (let i = 0, u = x; i <= v; i += 1) {
  if (n & 2^i !== 0) p *= u;
  if (i < v) u *= u; // reduce one extra mul
}

Last edited by macomics on 20 Oct 2025, 07:10; edited 3 times in total

20 Oct 2025, 06:50

Roman

Joined: 21 Apr 2012
Posts: 2048

Roman 20 Oct 2025, 07:04

my code2. 2^9=512

Code:

   x dd 2.0
    mov eax,8+1
    mov ecx, eax        ; Save original n
    
    ; bsr(n)
    bsr ebx, eax        ; EBX = position of highest set bit

    ; popcnt(n)  
    popcnt eax, ecx     ; EAX = number of set bits


 rept 1 {
        movss xmm0, [x]
        movss xmm1, xmm0
        mov ecx,ebx
aaLp:   mulss xmm1, xmm1
        dec ecx
        jnz aaLp

        cmp al,1
        jbe @f
        mov ecx,eax
        dec ecx
baLp:
        mulss xmm1, xmm0
        dec ecx
        jnz baLp
@@:
}

20 Oct 2025, 07:04

Roman

Joined: 21 Apr 2012
Posts: 2048

Roman 20 Oct 2025, 07:14

Intel not doing his job well !
The shame not have on modern CPU pow asm instruction!
Must be one asm command mulxss xmm1,reg or xmm or memory.
mulxss equivalent my code from previous message.

20 Oct 2025, 07:14

macomics

Joined: 26 Jan 2021
Posts: 1208
Location: Russia

macomics 20 Oct 2025, 07:21

Code:

.data
  const1 dd 1.0
  x dd 2.0
  n dd 27
  p dd ?
.code
  movss xmm1, [x] ; u
  movss xmm0, [const1] ; p
  mov ecx, [n] ; 27
  jecxz @exit
  cmp ecx, 1
  ja @f
  movss [p], xmm1
  jmp @quit
  @@:
    shr ecx, 1
    jnc @next
      mulss xmm0, xmm1
    @next:
    mulss xmm1, xmm1
    cmp ecx, 1
    ja @b
  mulss xmm0, xmm1
  @exit:
  movss [p], xmm0
  @quit:

ADD:

Roman wrote:

mulxss equivalent my code from previous message.

@Ronan, your code is incorrect!

Last edited by macomics on 20 Oct 2025, 07:52; edited 5 times in total

20 Oct 2025, 07:21

Roman

Joined: 21 Apr 2012
Posts: 2048

Roman 20 Oct 2025, 07:31

Intel will write correct:)

20 Oct 2025, 07:31

Roman

Joined: 21 Apr 2012
Posts: 2048

Roman 20 Oct 2025, 07:40

Thanks macomics for good code.

20 Oct 2025, 07:40

Goto page 1, 2 Next

< Last Thread | Next Thread >

Forum Rules:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum