flat assembler
Message board for the users of flat assembler.
Index
> Windows > Sieve of Eratosthenes crash |
Author |
|
iklin 24 Dec 2004, 18:46
You use JL and JGE instead of JB and JAE.
Try to look into fasm's doc. |
|||
24 Dec 2004, 18:46 |
|
Kris_A 24 Dec 2004, 19:11
Worked great, thanks I'll have to remember that in the future...
|
|||
24 Dec 2004, 19:11 |
|
Madis731 26 Dec 2004, 19:40
Hi, if you're interested I have a 16% speed improvement to your code without needing to change vary much.
It starts like this: Code: .code start: ; Note: EAX reserved for MUL\loops\etc MOV ebx, 2 ; Prime counter set to min=2 MOV edi, 5 ; Outer loop ctr 2-4 are checked MOV edx, 0 ; Inner loop ctr ;------------------------------ ; Reset sieve array ;------------------------------ xor eax,eax ;It's good to zero out eax too mov [sieve+eax*4+00],1 mov [sieve+eax*4+04],1 mov [sieve+eax*4+08],0 mov [sieve+eax*4+12],0 mov [sieve+eax*4+16],1 mov [sieve+eax*4+20],0 mov eax,6 myloop: mov [sieve+eax*4+00],1 mov [sieve+eax*4+04],0 mov [sieve+eax*4+08],1 mov [sieve+eax*4+12],1 mov [sieve+eax*4+16],1 mov [sieve+eax*4+20],0 add eax,6 cmp eax,size - (size mod 6) jc myloop ...but if you need further improvement you should make your initialized table 2 times smaller so you check ONLY odd numbers and this way you can make your code 50% faster. You might want to try filtering out %3-s at the start but it only gives you about 15% more performance (because 6, 12 etc are already divizable by two). P.S. BTW, nice code you got there. How did you come to that algorithm - something on the net or your own ideas? |
|||
26 Dec 2004, 19:40 |
|
r22 27 Jan 2005, 02:09
;something like this?
XOR eax,eax PXOR mm0,mm0 MOVQ qword[sieve+eax*8],mm0 MOVQ qword[sieve+eax*8+8],mm0 MOVQ qword[sieve+eax*8+16],mm0 MOVQ qword[sieve+eax*8+24],mm0 ADD eax,4 reset: MOVQ qword[sieve+eax*8],mm0 MOVQ qword[sieve+eax*8+8],mm0 MOVQ qword[sieve+eax*8+16],mm0 MOVQ qword[sieve+eax*8+24],mm0 ADD eax,4 CMP eax, (size / 8) ;better used on large SIZEs JB reset |
|||
27 Jan 2005, 02:09 |
|
Madis731 27 Jan 2005, 10:41
Yes MMX gains speed even more, but my idea was to show the techique
behind my algorith. That was to precalculate some known iterations like even numbers (2,4,6,...) and mod(3)=0 numbers like 3,6,9,... Your code combined with mine makes the code smaller and it has less moves look@this: Code: .code start: ; Note: EAX reserved for MUL\loops\etc MOV ebx, 2 ; Prime counter set to min=2 MOV edi, 5 ; Outer loop ctr 2-4 are checked MOV edx, 0 ; Inner loop ctr ;------------------------------ ; Reset sieve array ;------------------------------ pattern1 dq (1 shl 32 + 1) ;2^32+1 pattern2 dq (0 shl 32 + 0) ;Zero pattern3 dq (0 shl 32 + 1) ;One movq mm0,[pattern1] movq mm1,[pattern2] movq mm2,[pattern3] xor eax,eax ;It's good to zero out eax too mov [sieve+eax*8+00],mm0 mov [sieve+eax*8+08],mm1 mov [sieve+eax*8+16],mm2 mov eax,3 myloop: mov [sieve+eax*8+00],mm2 mov [sieve+eax*8+08],mm0 mov [sieve+eax*8+16],mm2 add eax,3 cmp eax,size - (size mod 6) jc myloop |
|||
27 Jan 2005, 10:41 |
|
r22 28 Jan 2005, 03:30
the benchmark on that must be something compared to the original
|
|||
28 Jan 2005, 03:30 |
|
Madis731 28 Jan 2005, 13:32
Sorry, I didn't follow - was it a positive or negative comment^o)
Kris_A's original or some other original on google? |
|||
28 Jan 2005, 13:32 |
|
codelab 02 Oct 2006, 20:47
Hi, all, Another optimization of primesearching:
Code: ; simple prime search by masking out divisible numbers with increasing step thus leaving primes ; 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 ;2: X X X X X X X X X ;3: X X X X X X ;5: X X X include 'win32ax.inc' maxp=1000000 sqrmaxp=1000 .code sieve: mov esi, pbuf mov eax, 2 stepfill: mov ecx, eax l1: add ecx, eax cmp ecx, maxp jge l2 mov byte [esi+ecx],1 jmp l1 l2: inc eax cmp byte [esi+eax],0 jne l2 cmp eax, sqrmaxp jle stepfill invoke ExitProcess,0 section '.data' data readable writeable pbuf db 1,1,maxp-2 dup(0) .end sieve had fewer iterations caused by the startstep is less or equal to sqrt(prime range) |
|||
02 Oct 2006, 20:47 |
|
codelab 03 Oct 2006, 09:38
Im terrible sorry, had a second dwell about it, starting values for stepfilling shall be a prime squared as should JGE l2 be JG l2:
Code: ; simple prime search by masking out divisible numbers ; with increasing step thus leaving primes ; 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 ;2: X X X X X X X X X ;3: X X X X X X ;5: X X X include 'win32ax.inc' maxp=1000000 sqrmaxp=1000 .code sieve: mov esi, pbuf mov edi, 2 stepfill: mov eax, edi mul eax mov ecx, eax l1: cmp ecx, maxp jg l2 mov byte [esi+ecx],1 add ecx, edi jmp l1 l2: inc edi cmp byte [esi+edi],0 jne l2 cmp edi, sqrmaxp jle stepfill invoke ExitProcess,0 section '.data' data readable writeable pbuf db 1,1,maxp-2 dup(0) .end sieve |
|||
03 Oct 2006, 09:38 |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.