flat assembler
Message board for the users of flat assembler.
![]() Goto page 1, 2 Next |
Author |
|
revolution 16 Oct 2025, 23:42
A power of 6 can be done with three multiplies.
a = x*x ; x^2 b = a*x ; x^3 c = b*b ; x^6 |
|||
![]() |
|
Roman 17 Oct 2025, 00:08
Sorry my mistake.
Code: ;x^8 align 32 x dd 1f,2f,3f,4f,5f,6f,7f,8f vmovaps ymm0,yword [x] vmulps ymm0, ymm0, ymm0 ; x^2 vmulps ymm0, ymm0, ymm0 ; x^4 vmulps ymm0, ymm0, ymm0 ; x^8 ;after in ymm0 1f^8 ,2f^8 ,3f^8 ,4f^8 ,5f^8 ,6f^8 ,7f^8 ,8f^8 Code: x dd 3.0 movss xmm1,[x] movss xmm0,xmm1 mulss xmm0,xmm0 ;x^2 mulss xmm0,xmm0 ;x^4 mulss xmm1,xmm0 ;x^5 xmm1=3^5=243 |
|||
![]() |
|
revolution 17 Oct 2025, 06:30
In general x-to-power-of-n can be computed in no more than [ bsr(n) + popcnt(n) - 1 ] multiplies.
|
|||
![]() |
|
Roman 17 Oct 2025, 14:37
Quote:
Very interesting see example how do this. |
|||
![]() |
|
Tomasz Grysztar 17 Oct 2025, 17:14
Roman wrote:
![]() ![]() So if we have the square powers of x ready, we only need to perform 2 more multiplications - this is the "popcnt(n) - 1" portion of revolution's formula. Now, taking into consideration that ![]() ![]() For n = 13 we get this sequence of multiplications: a = x*x b = a*a c = b*b x^13 = c*b*x For final calculation you take the product of all the consecutive squares and remove the ones that correspond to zeros in the binary representation of n. Here, because 13 = 1101b, it meant removing "a" from "c*b*a*x" (as "a" corresponds to the only zero in 1101b). |
|||
![]() |
|
Roman 18 Oct 2025, 08:13
Quote:
I thinked this code help do two multiplication. But if not, this code only confusing and complicating. But not help. |
|||
![]() |
|
Furs 18 Oct 2025, 21:40
Roman wrote:
It doesn't do the calculation itself. |
|||
![]() |
|
Roman 19 Oct 2025, 00:18
And for this reason I better do this instead bsr(n) + popcnt(n) - 1
Code: movss xmm0, [x] movss xmm1, xmm0 mov ecx, pow-1 Lp: mulss xmm1, xmm0 dec ecx jnz Lp |
|||
![]() |
|
Furs 19 Oct 2025, 21:24
Roman wrote: And for this reason I better do this instead bsr(n) + popcnt(n) - 1 |
|||
![]() |
|
Roman 20 Oct 2025, 04:50
O.
I understand. Code: x dd 2.0 mov eax,8 mov ecx, eax bsr ebx, eax ; EBX = position of highest set bit popcnt eax, ecx ; EAX = number of set bits ;add eax, ebx ;dec eax movss xmm0, [x] movss xmm1, xmm0 lea ecx,[ eax-1+ebx] aaLp: mulss xmm1, xmm1 dec ecx jnz aaLp xmm1=2f^8=256f |
|||
![]() |
|
Roman 20 Oct 2025, 06:31
ok. How I must do if 2^10 ?
I try my code and get 2^16=65536 but this is not right. |
|||
![]() |
|
revolution 20 Oct 2025, 06:44
|
|||
![]() |
|
Roman 20 Oct 2025, 06:47
I write this code. its ok for 2^10=1024
Code: x dd 2.0 mov eax,8+2 mov ecx, eax ; Save original n ; bsr(n) bsr ebx, eax ; EBX = position of highest set bit ; popcnt(n) popcnt eax, ecx ; EAX = number of set bits movss xmm0, [x] movss xmm1, xmm0 mov ecx,ebx aaLp: mulss xmm1, xmm1 dec ecx jnz aaLp mov ecx,eax baLp: mulss xmm1, xmm0 dec ecx jnz baLp |
|||
![]() |
|
macomics 20 Oct 2025, 06:50
let n = 9 = 1001b; x = 3
p = x^8 * x let a = x * x p = a^4 * x let b = a * a; (or x*x*x*x) p = b^2 * x let c = b * b p = c * x in total (4 mul) a = x * x b = a * a c = b * b p = c * x verify: 3^9=19683 | a=3*3=9; b=9*9=81; c=81*81=6561; p=6561*3=19683 --------------------- let n = 27 = 11011b; x = 2 p = x^16 * x^8 * x^2 * x let a = x * x p = a^8 * a^4 * a * x let b = a * a p = b^4 * b^2 * a * x let c = b * b p = c^2 * c * a * x let d = c * c p = d * c * a * x in total (7 mul) a = x * x b = a * a c = b * b d = c * c p = d * c * a * x verify: 2^27=134217728 | a=2*2=4; b=4*4=16; c=16*16=256; d=256*256=65536; p=65536*256*4*2=134217728 https://dzen.ru/b/ZLRuI_S-kniRjdTN ADD: fast exponentiation polynomial (in common) p=(n[high_bit]*x^(high_bit+1)) * (n[high_bit-1]*x^high_bit) * ... * (n[2]*x^3) * (n[1]*x^2) * (n[0]*x^1) Code: // Pseudocode 2^27 const x = 2, n = 27; const v = highBitSet(n); // v = 4 let p = 1; for (let i = 0, u = x; i <= v; i += 1) { if (n & 2^i !== 0) p *= u; if (i < v) u *= u; // reduce one extra mul } Last edited by macomics on 20 Oct 2025, 07:10; edited 3 times in total |
|||
![]() |
|
Roman 20 Oct 2025, 07:04
my code2. 2^9=512
Code: x dd 2.0 mov eax,8+1 mov ecx, eax ; Save original n ; bsr(n) bsr ebx, eax ; EBX = position of highest set bit ; popcnt(n) popcnt eax, ecx ; EAX = number of set bits rept 1 { movss xmm0, [x] movss xmm1, xmm0 mov ecx,ebx aaLp: mulss xmm1, xmm1 dec ecx jnz aaLp cmp al,1 jbe @f mov ecx,eax dec ecx baLp: mulss xmm1, xmm0 dec ecx jnz baLp @@: } |
|||
![]() |
|
Roman 20 Oct 2025, 07:14
Intel not doing his job well !
The shame not have on modern CPU pow asm instruction! Must be one asm command mulxss xmm1,reg or xmm or memory. mulxss equivalent my code from previous message. |
|||
![]() |
|
macomics 20 Oct 2025, 07:21
Code: .data const1 dd 1.0 x dd 2.0 n dd 27 p dd ? .code movss xmm1, [x] ; u movss xmm0, [const1] ; p mov ecx, [n] ; 27 jecxz @exit cmp ecx, 1 ja @f movss [p], xmm1 jmp @quit @@: shr ecx, 1 jnc @next mulss xmm0, xmm1 @next: mulss xmm1, xmm1 cmp ecx, 1 ja @b mulss xmm0, xmm1 @exit: movss [p], xmm0 @quit: ADD: Roman wrote: mulxss equivalent my code from previous message. Last edited by macomics on 20 Oct 2025, 07:52; edited 5 times in total |
|||
![]() |
|
Roman 20 Oct 2025, 07:31
Intel will write correct:)
|
|||
![]() |
|
Roman 20 Oct 2025, 07:40
Thanks macomics for good code.
|
|||
![]() |
|
Goto page 1, 2 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.