flat assembler
Message board for the users of flat assembler.

Index > Tutorials and Examples > [Algorithms] Finding primes ...

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
bitRAKE



Joined: 21 Jul 2003
Posts: 4227
Location: vpcmpistri
bitRAKE 21 Mar 2025, 13:13
Typically, in practical use, only a couple primes are needed - usually as part of a larger algorithm and during program initialization. We can get away with a much simpler function than if we were trying to list a bunch of primes.
Code:
; Reduce RCX if not prime (or two for 0 and 1). (no data version)
find_prime:
        cmp rcx, 3              ; Special cases are clamped to prime two.
        jc .two
        bt ecx, 0               ; Adjust for even numbers, and counteract
        adc rcx, 1              ; below subtraction by two.
        jc .over                ; Two overflow case require special handling.
.next_candidate:
        push 1
        push 1
        pop r9                  ; 
        pop r8                  ; N, divisor
        sub rcx, 2              ; prior odd canidate

.try:   lea rdx, [4*r8 + 4]     ; next odd square, (N+2)² =  +4N +4
        sub r8, -2              ; next odd number, N+2
        add r9, rdx
        jc .prime
        mov rax, rcx
        xor edx, edx
        cmp rcx, r9             ; If the square of the divisor exceeds
        jc .prime               ; the number it must be prime.
        div r8
        test rdx, rdx
        jnz .try

        cmp rcx, 3+1
        jnc .next_candidate
        push 3
        pop rcx
.prime: retn

.two:   push 2
        pop rcx
        retn

.over:  push (1 shl 64) - 59 ; largest prime < 2^64
        pop rcx
        retn

; ECX is always prime, CF=1    
Just to be clear, this is a very slow method to discover many primes, but if you need a couple - it's perfectly fine for the whole 64-bit range. It's designed to always return a prime. So, for zero and one the prime two is returned. Note how all the overflow conditions are handled to support the full unsigned range.

At the other end of the spectrum is the prime sieve.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 21 Mar 2025, 13:13
View user's profile Send private message Visit poster's website Reply with quote
Mike Gonta



Joined: 26 Dec 2010
Posts: 245
Mike Gonta 22 Mar 2025, 13:27
Here is a simple algorithm from the-idiom.
Code:
to decide if a @number is prime is:
  it's false if the number is < 2;
  it's true if the number is 2;
  put the number - 1 into another number;
  loop
    it's false if the number is divisible by the other number;
    subtract 1 from the other number;
    it's true if the other number is 1;
  repeat;    

_________________
Mike Gonta
look and see - many look but few see

https://mikegonta.com
Post 22 Mar 2025, 13:27
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4227
Location: vpcmpistri
bitRAKE 22 Mar 2025, 20:51
That's essentially it, but I've halved the loop and it works bottom up - because of the likelihood of divisibility by smaller numbers. Working bottom up also requires an upper bound - the square root of the candidate number.

The prime/2 plus the prime gap times the first divisor determines how long it takes to find a prime. This can be substantial.
Code:
        or rbx, -1
.more:  mov rcx, rbx
        call find_prime

; The `wsprintf` function supports the "I64" prefix for use with 64-bit
; values - otherwise they will be truncated. (Use with: d, i, o, u, x, or X.)

        mov r8, rcx
        wsprintfA & .buffer, <A "%I64u, ">, r8
        xchg r8d, eax ; characters
        WriteConsoleA [.hStdOut], & .buffer, r8d, 0, 0

        shr rbx, 1
        jnz .more    
... taking noticeable time on the top end. If one was expecting many values in this range, a better choice would be a strong pseudoprime test.
Wikipedia wrote:
For example, there is no composite < 2^64 that is a strong pseudoprime to all of the seven bases 2, 325, 9375, 28178, 450775, 9780504, and 1795265022.
Implementation is much more complex, though. And it's slow enough that trial division is still faster on the lower end.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 22 Mar 2025, 20:51
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4227
Location: vpcmpistri
bitRAKE 23 Mar 2025, 21:54
Understanding the Prime-Finding Algorithm: From Theory to Implementation

When developing algorithms to find prime numbers, we often begin with simple approaches and refine them through several optimizations. The assembly code provided is a sophisticated implementation of prime number detection, incorporating multiple clever optimizations. Let's break down how this algorithm was developed, starting from first principles.

The Fundamental Concept: Testing for Primality

A prime number is defined as a natural number greater than 1 that cannot be formed by multiplying two smaller natural numbers. The most straightforward approach to determine if a number is prime is to check if it's divisible by any number from 2 up to the number itself (excluding 1 and itself).

However, we can immediately make two key observations:
  1. We only need to check divisibility up to the square root of the number
  2. If a number isn't prime, we can find the next prime by decrementing and testing again

Initial Optimization: Square Root Boundary

The first significant optimization comes from mathematics: if a number n is not prime, it must have a divisor less than or equal to √n. This dramatically reduces our search space.

In our assembly code, this principle appears as:
Code:
        cmp rcx, r9             ; If the square of the divisor exceeds
        jc .prime               ; the number it must be prime.    

Arithmetic Progression Optimization

When testing consecutive odd numbers, calculating the square of each divisor would be inefficient. Instead, the algorithm uses an arithmetic progression to track the squares:
Code:
        lea rdx, [4*r8 + 4]     ; next odd square, (N+2)² =  +4N +4
        sub r8, -2              ; next odd number, N+2
        add r9, rdx    

This clever approach avoids multiplication operations. If we have the square of n, we can compute the square of n+2 using:
(n+2)² = n² + 4n + 4

By tracking both the current divisor (n) and its square (n²), we can efficiently generate the next values using addition.

Focusing on Odd Numbers

Another key optimization is testing only odd divisors (except for the special case of 2):
Code:
        bt ecx, 0               ; Adjust for even numbers, and counteract
        adc rcx, 1              ; below subtraction by two.    

This effectively halves the number of divisibility tests needed. The algorithm uses bit testing (bt) to check if the number is even, and adjusts accordingly.

Handling Edge Cases

The algorithm carefully handles several edge cases:

1. Numbers less than 3 are clamped to the prime 2:
Code:
        cmp rcx, 3              ; Special cases are clamped to prime two.
        jc .two    

2. Overflow conditions are properly managed:
Code:
        jc .over                ; Two overflow case require special handling.    


3. The largest prime less than 2⁶⁴ is returned for overflow cases:
Code:
.over:  push (1 shl 64) - 59    ; largest prime < 2^64
        pop rcx
        retn    

The Complete Algorithm Flow
  1. Initially check for special cases (numbers < 3, overflows)
  2. Ensure we're working with odd numbers
  3. For each candidate:
    • Begin with divisor 1 and prepare to increment to 3
    • For each odd divisor:
      1. Compute the next odd square using arithmetic progression
      2. Check if we've exceeded the square root boundary
      3. Perform division to test for primality
      4. If divisible, move to the next candidate
      5. If not divisible by any value up to square root, we've found a prime

Efficiency Considerations

The algorithm makes excellent use of register operations and minimizes memory access. By using arithmetic progressions instead of multiplication, it avoids costly operations. The focus on odd numbers immediately halves the work compared to naive approaches.

Final Thoughts

This assembly implementation demonstrates how mathematical understanding combined with low-level optimization techniques can produce highly efficient algorithms. The progression from the basic primality test to this optimized version shows the value of refining algorithms through mathematical insights and careful implementation.

What makes this algorithm particularly elegant is how it combines theoretical insights (like the square root boundary and odd-number testing) with practical assembly-level optimizations like register usage and arithmetic progressions.
Post 23 Mar 2025, 21:54
View user's profile Send private message Visit poster's website Reply with quote
six_L



Joined: 03 Jan 2005
Posts: 15
six_L 24 Mar 2025, 17:08
Hello,bitRAKE
interesting the algorithm.
Quote:
97, 97, 97, 97, 89, 89, 89, 89, 89, 89,
89, 89, 83, 83, 83, 83, 83, 83, 79, 79,
79, 79, 73, 73, 73, 73, 73, 73, 71, 71,
67, 67, 67, 67, 61, 61, 61, 61, 61, 61,
59, 59, 53, 53, 53, 53, 53, 53, 47, 47,
47, 47, 47, 47, 43, 43, 43, 43, 41, 41,
37, 37, 37, 37, 31, 31, 31, 31, 31, 31,
29, 29, 23, 23, 23, 23, 23, 23, 19, 19,
19, 19, 17, 17, 13, 13, 13, 13, 11, 11,
7, 7, 7, 7, 5, 5, 3, 3, 2, 2,

input 100, why did output many repetitive primes?
Post 24 Mar 2025, 17:08
View user's profile Send private message Reply with quote
Feryno



Joined: 23 Mar 2005
Posts: 515
Location: Czech republic, Slovak republic
Feryno 24 Mar 2025, 17:39
Great example and great explanation!
It is optimized for minimal size of the code.
If someone needs speed optimization, the loops should begin at 16 bytes boundary and also the used push/pop instructions are slower than a mov instruction. Something like this may run faster in some situations - when the loops are repeated a lot of times:

Code:
align 10h
find_prime:
        cmp rcx, 3              ; Special cases are clamped to prime two.
        jc .two
        bt ecx, 0               ; Adjust for even numbers, and counteract
        adc rcx, 1              ; below subtraction by two.
        jc .over                ; Two overflow case require special handling.
        mov r10d, 1             ; processor zero extends this mov from 32 to 64 bits

align 10h
.next_candidate:
        mov r9d, r10d           ; 
        mov r8d, r10d           ; N, divisor
        sub rcx, 2              ; prior odd candidate

align 10h
.try:   lea rdx, [4*r8 + 4]     ; next odd square, (N+2)² =  +4N +4
    


IIRC moves like mov r10d, 1 are faster than mov r10, 1 and then perhaps mov r8d, r10d faster than mov r8, r10 but I'm not sure about this. Moves that do not require REX prefix (like mov edx, eax) are 1 byte smaller than those which do require it (mov r8d, eax) - I'm sure at least in this. When the alignment adds a lot of dummy NOP instructions, the code size increases and then the CPU uses more time to load it into cache so in some situations smaller code with unaligned loops runs faster.

I remember a calculation of all primes upto some number using Eratosthenes sieve which I did decades ago and used maybe 8 or 16 MB of memory for the sieve (half of the RAM installed in the machine). I did that in not too much clever way because storing all numbers as a bit array, I could double the calculation range when storing only odd numbers.
Post 24 Mar 2025, 17:39
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4227
Location: vpcmpistri
bitRAKE 24 Mar 2025, 21:11
six_L wrote:
input 100, why did output many repetitive primes?
The purpose of the algorithm is to find one prime at or below the input number (or two). Often in algorithms we need a prime number of a certain bit length. So, we can generate a random number of that bit length and then the prime less than it. For example, a layered hash table.

That's also the reason I've coded it for size - to amplify the fact it is not for generating a list of primes.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup


Last edited by bitRAKE on 25 Mar 2025, 01:15; edited 3 times in total
Post 24 Mar 2025, 21:11
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4227
Location: vpcmpistri
bitRAKE 24 Mar 2025, 21:15
Feryno wrote:
Great example and great explanation!
A.I. is getting so good. The model (Claude 3.7) made a couple errors that I corrected, and I made the formatting of the text more condensed.

Is there a difference between NOP spacing the code, and just using sufficient encodings to get the alignment? fasm2 allows such flexibility, but I haven't coded this for speed, per se.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 24 Mar 2025, 21:15
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4227
Location: vpcmpistri
bitRAKE 24 Mar 2025, 23:15
Claude 3.7 missed a couple things of note: the numbers 3-8 directly return the close odd primes: 3,5,7 - without division. I count that as a corner case as well. The carry being set is superfluous, but a product of my design methodology - allowing the algorithm to be adapted to a number of scenarios. For example, I don't see any point in returning 0|1. Yet, a use might need to error on their use - easy to make that change.
Post 24 Mar 2025, 23:15
View user's profile Send private message Visit poster's website Reply with quote
macomics



Joined: 26 Jan 2021
Posts: 1149
Location: Russia
macomics 25 Mar 2025, 03:42
@bitRAKE & @Feryno, this is how it happens to align start of the loop to 16 bytes.
Code:
; Reduce RCX if not prime (or two for 0 and 1). (no data version)
align 16
find_prime:
        ; cmp  rcx, 3           ; Special cases are clamped to prime two.
        db 0x48, 0x81, 0xF9, 0x03, 0x00, 0x00, 0x00
        jc  .two
        push r10
        ; push 1
        ; pop  r10
        db 0x49, 0xBA, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
        bt   ecx, 0             ; Adjust for even numbers, and counteract
        adc  rcx, r10           ; below subtraction by two.
        jc  .over               ; Two overflow case require special handling.
        xchg ax, ax             ; 2 bytes NOP

; align 16
.next_candidate:
        mov  r8, r10            ; N, divisor
        mov  r9, r10            ; 
        ; sub  rcx, 2           ; prior odd canidate
        db 0x48, 0x81, 0xE9, 0x02, 0x00, 0x00, 0x00
        db 0x66, 0x67, 0x90     ; 3 bytes NOP

; align 16
.try:
        lea  rdx, [4 * r8 + 4]  ; next odd square, (N+2)² =  +4N +4
        add  r8, 2              ; next odd number, N+2
        add  r9, rdx
        jc  .prime
        mov  rax, rcx
        xor  edx, edx
        cmp  rcx, r9            ; If the square of the divisor exceeds
        jc  .prime              ; the number it must be prime.
        div  r8
        test rdx, rdx
        jnz .try

        cmp  rcx, 3+1
        jnc .next_candidate
        lea  rcx, [3 * r10]

.prime:
        pop  r10
        retn

.over:
        push -59 ; (1 shl 64) - 59 ; largest prime < 2^64
        pop  rcx
        pop  r10
        retn

.two:
        push 2
        pop  rcx
        retn

; ECX is always prime, CF=1    
Post 25 Mar 2025, 03:42
View user's profile Send private message Reply with quote
macomics



Joined: 26 Jan 2021
Posts: 1149
Location: Russia
macomics 25 Mar 2025, 04:11
without r10
Code:
; Reduce RCX if not prime (or two for 0 and 1). (no data version)
align 16
find_prime:
        cmp  rcx, 3             ; Special cases are clamped to prime two.
        jc  .two
        bt   ecx, 0             ; Adjust for even numbers, and counteract
        adc  rcx, 1             ; below subtraction by two.
        jc  .over               ; Two overflow case require special handling.

; align 16
.next:
        mov  r8,  1             ; N, divisor
        mov  r9, r8             ; 
        sub  rcx, 2             ; prior odd canidate
        xchg ax, ax             ; 2 bytes NOP

; without NOP
;       sub  rcx, r8
;       sub  rcx, r9

; align 16
.try:
        lea  rdx, [4 * r8 + 4]  ; next odd square, (N+2)² =  +4N +4
        add  r8,  2             ; next odd number, N+2
        add  r9, rdx
        jc  .prime
        mov  rax, rcx
        xor  edx, edx
        cmp  rcx, r9            ; If the square of the divisor exceeds
        jc  .prime              ; the number it must be prime.
        div  r8
        test rdx, rdx
        jnz .try

        cmp  rcx, 3+1
        jnc .next
        mov  rcx, 3

; align 16
.prime:
        retn

align 16
.over:
        ; push -59 ; (1 shl 64) - 59 ; largest prime < 2^64
        ; pop  rcx
        mov rcx, -59
        retn

align 16
.two:
        mov rcx, 2
        retn

; ECX is always prime, CF=1    
Post 25 Mar 2025, 04:11
View user's profile Send private message Reply with quote
six_L



Joined: 03 Jan 2005
Posts: 15
six_L 25 Mar 2025, 09:59
Hello,bitRAKE
Thanks for your respones.

when I input (2^64 - 1 = 0FFFFFFFFFFFFFFFEh = 18446744073709551614d ), It was crashed.

Regards
six_L
Post 25 Mar 2025, 09:59
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4227
Location: vpcmpistri
bitRAKE 25 Mar 2025, 10:34
six_L wrote:
when I input (2^64 - 1 = 0FFFFFFFFFFFFFFFEh = 18446744073709551614d ), It was crashed.
How is that possible?
2^64-1 = 0xFFFFFFFFFFFFFFFF, which overflows, returning 2^64 - 59
0xFFFFFFFFFFFFFFFE, gets processed as 0xFFFFFFFFFFFFFFFD, which is divisible by 13, ... finally it returns 2^64 - 59.

I have no clue what you are experiencing.

Believe it or not, I did write test cases for this algorithm - just as a sanity check:
Code:
; https://en.wikipedia.org/wiki/Prime-counting_function
; Some known values, primes under a power of ten:
{const:8} label π10_Table:8
.temp = 1
iterate p,\
        4,\
        25,\
        168,\
        1229,\
        9592,\
        78498,\
        664579,\
        5761455,\
        50847534,\
        455052511,\
        4118054813,\
        37607912018,\
        346065536839,\
        3204941750802,\
        29844570422669,\
        279238341033925,\
        2623557157654233,\
        24739954287740860,\
        234057667276344607,\
        2220819602560918840,\
        21127269486018731928,\
        201467286689315906290,\
        1925320391606803968923,\
        18435599767349200867866,\
        176846309399143769411680,\
        1699246750872437141327603,\
        16352460426841680446427399,\
        157589269275973410412739598,\
        1520698109714272166094258063
        .temp = temp * 10
        if .temp < 1 shl 64
                {const:8} dq p
        else
                break
        end if
end iterate    
Code:
{const:16} label Two_Table:1
; Visit the Prime Pages: https://t5k.org/lists/2small/0bit.html
; For given <power> of two, the negative <offsets> are the distance to primes:
; (i.e. 2^8 - 5 = 251 is prime.)
iterate <power, offsets>,\
        5,      <1,3,9,13,15,19,21,25,27,29>,\
        6,      <3,5,11,17,21,23,27,33,35,41>,\
        7,      <1,15,19,21,25,27,31,39,45,49>,\
        8,      <5,15,17,23,27,29,33,45,57,59>,\
        9,      <3,9,13,21,25,33,45,49,51,55>,\
        10,     <3,5,11,15,27,33,41,47,53,57>,\
        11,     <9,19,21,31,37,45,49,51,55,61>,\
        12,     <3,5,17,23,39,45,47,69,75,77>,\
        13,     <1,13,21,25,31,45,69,75,81,91>,\
        14,     <3,15,21,23,35,45,51,65,83,111>,\
        15,     <19,49,51,55,61,75,81,115,121,135>,\
        16,     <15,17,39,57,87,89,99,113,117,123>,\
        17,     <1,9,13,31,49,61,63,85,91,99>,\
        18,     <5,11,17,23,33,35,41,65,75,93>,\
        19,     <1,19,27,31,45,57,67,69,85,87>,\
        20,     <3,5,17,27,59,69,129,143,153,185>,\
        21,     <9,19,21,55,61,69,105,111,121,129>,\
        22,     <3,17,27,33,57,87,105,113,117,123>,\
        23,     <15,21,27,37,61,69,135,147,157,159>,\
        24,     <3,17,33,63,75,77,89,95,117,167>,\
        25,     <39,49,61,85,91,115,141,159,165,183>,\
        26,     <5,27,45,87,101,107,111,117,125,135>,\
        27,     <39,79,111,115,135,187,199,219,231,235>,\
        28,     <57,89,95,119,125,143,165,183,213,273>,\
        29,     <3,33,43,63,73,75,93,99,121,133>,\
        30,     <35,41,83,101,105,107,135,153,161,173>,\
        31,     <1,19,61,69,85,99,105,151,159,171>,\
        32,     <5,17,65,99,107,135,153,185,209,267>,\
        33,     <9,25,49,79,105,285,301,303,321,355>,\
        34,     <41,77,113,131,143,165,185,207,227,281>,\
        35,     <31,49,61,69,79,121,141,247,309,325>,\
        36,     <5,17,23,65,117,137,159,173,189,233>,\
        37,     <25,31,45,69,123,141,199,201,351,375>,\
        38,     <45,87,107,131,153,185,191,227,231,257>,\
        39,     <7,19,67,91,135,165,219,231,241,301>,\
        40,     <87,167,195,203,213,285,293,299,389,437>,\
        41,     <21,31,55,63,73,75,91,111,133,139>,\
        42,     <11,17,33,53,65,143,161,165,215,227>,\
        43,     <57,67,117,175,255,267,291,309,319,369>,\
        44,     <17,117,119,129,143,149,287,327,359,377>,\
        45,     <55,69,81,93,121,133,139,159,193,229>,\
        46,     <21,57,63,77,167,197,237,287,305,311>,\
        47,     <115,127,147,279,297,339,435,541,619,649>,\
        48,     <59,65,89,93,147,165,189,233,243,257>,\
        49,     <81,111,123,139,181,201,213,265,283,339>,\
        50,     <27,35,51,71,113,117,131,161,195,233>,\
        51,     <129,139,165,231,237,247,355,391,397,439>,\
        52,     <47,143,173,183,197,209,269,285,335,395>,\
        53,     <111,145,231,265,315,339,343,369,379,421>,\
        54,     <33,53,131,165,195,245,255,257,315,327>,\
        55,     <55,67,99,127,147,169,171,199,207,267>,\
        56,     <5,27,47,57,89,93,147,177,189,195>,\
        57,     <13,25,49,61,69,111,195,273,363,423>,\
        58,     <27,57,63,137,141,147,161,203,213,251>,\
        59,     <55,99,225,427,517,607,649,687,861,871>,\
        60,     <93,107,173,179,257,279,369,395,399,453>,\
        61,     <1,31,45,229,259,283,339,391,403,465>,\
        62,     <57,87,117,143,153,167,171,195,203,273>,\
        63,     <25,165,259,301,375,387,391,409,457,471>,\
        64,     <59,83,95,179,189,257,279,323,353,363>,\
\; Not used for testing, currently:
        65,     <49,79,115,141,163,229,301,345,453,493>,\
        66,     <5,45,173,203,275,297,387,401,443,495>,\
        67,     <19,31,49,57,61,75,81,165,181,237>,\
        68,     <23,83,125,147,149,167,285,315,345,357>,\
        69,     <19,91,93,103,129,153,165,201,255,385>,\
        70,     <35,71,167,215,263,267,273,447,473,585>,\
        71,     <231,325,411,435,441,465,559,577,601,721>,\
        72,     <93,107,129,167,249,269,329,347,429,473>,\
        73,     <69,181,199,273,319,433,475,501,523,645>,\
        74,     <35,45,57,135,153,237,257,275,461,465>,\
        75,     <97,207,231,271,279,289,325,381,409,427>,\
        76,     <15,63,117,123,143,189,215,267,285,347>,\
        77,     <33,43,145,163,195,261,295,379,433,451>,\
        78,     <11,95,111,123,147,153,191,263,303,507>,\
        79,     <67,199,249,277,355,367,405,447,477,511>,\
        80,     <65,93,117,143,285,317,549,645,765,933>,\
        81,     <51,63,163,205,333,349,429,433,481,553>,\
        82,     <57,113,185,315,363,365,375,453,623,635>,\
        83,     <55,97,117,121,139,285,307,405,429,561>,\
        84,     <35,69,213,215,333,399,525,563,587,597>,\
        85,     <19,61,181,295,411,433,469,519,531,823>,\
        86,     <35,41,65,71,113,255,261,293,357,461>,\
        87,     <67,129,181,195,201,217,261,277,289,339>,\
        88,     <299,455,483,563,605,719,735,743,753,797>,\
        89,     <1,21,31,49,69,99,103,265,321,441>,\
        90,     <33,41,53,75,227,263,273,291,297,317>,\
        91,     <45,81,111,201,315,339,567,619,655,771>,\
        92,     <83,149,197,317,363,419,485,497,519,537>,\
        93,     <25,51,79,105,273,405,489,553,571,579>,\
        94,     <3,11,105,173,273,297,321,395,407,431>,\
        95,     <15,37,211,339,387,415,441,447,555,561>,\
        96,     <17,87,93,147,165,189,237,243,315,347>,\
        97,     <141,165,349,399,453,595,729,741,859,885>,\
        98,     <51,65,107,117,141,227,273,363,471,525>,\
        99,     <115,145,247,319,381,427,675,717,1207,1231>,\
        100,    <15,99,153,183,267,285,357,479,603,833>,\

; Note: the largest gap present above is 84. So, we can represent all the
; offsets as byte values: start at zero and every byte is added to the prior
; total, which is to be subtracted from the power of two.

        assert power - 4 = % ; no missing lines

; Note: Only values of 64-bits are less are being used.
        if power < 65
                .last = 0
;               db power
                iterate O,offsets
                        {const:16} db O - .last
                        .last = O
                end iterate
                {const:16} align 16 ; shift power to find offsets
        end if

; Range checking, used to debug list:
        .last = 0
        iterate O,offsets
                assert .last < O & %% = 10 ; increasing values
                if O - .last > 255 ; byte value?
                        repeat 1,p:power,o:O
                        display 10,9,'warning: 2^',`p,'-',`o,' exceeds byte scheme.'
                        end repeat
                end if
                .last = O
        end iterate
end iterate    
... covering both power of ten (counts) and power of two (skips) and bounds checking. fasm2 makes this easier.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 25 Mar 2025, 10:34
View user's profile Send private message Visit poster's website Reply with quote
six_L



Joined: 03 Jan 2005
Posts: 15
six_L 25 Mar 2025, 12:45
Hello,bitRAKE
It's my fault, the output buffer definition is too small.

A great algorithm !
All is fine.
18446744073709551614 >= Prime ( 18446744073709551557 )

if using the Eratosthenes sieve algorithm, It consumes a lot of memory[1048575(T)].

Quote:
2^64 = 18446744073709551615
18446744073709551615/2 = 9223372036854775807
9223372036854775807/8 = 1152921504606846975 (Bytes)
1152921504606846975 (Bytes)/1024 = 1125899906842623(KB)
1125899906842623(KB)/1024 = 1099511627775(M)
1099511627775(M)/1024 = 1073741823(G)
1073741823(G)/1024 = 1048575(T)

Quote:
Intel(R) Core(TM) i5-9400H CPU @ 2.50GHz
in: 0FFFFFFFFFFFFFFFEh
133597482 clock cycles, (bitRAKE:find_prime)x1, Prime: 18446744073709551557


Thank again.
Post 25 Mar 2025, 12:45
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4227
Location: vpcmpistri
bitRAKE 25 Mar 2025, 20:38
six_L wrote:
if using the Eratosthenes sieve algorithm, It consumes a lot of memory
I linked my bit sieve in the first post - very little memory and great for listing primes, or testing primality in a loop.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 25 Mar 2025, 20:38
View user's profile Send private message Visit poster's website Reply with quote
six_L



Joined: 03 Jan 2005
Posts: 15
six_L 26 Mar 2025, 02:55
Hello,bitRAKE
Quote:
At the other end of the spectrum is the prime sieve.

Thanks you for the link.

I haven't figured out your macro yet.
1, PRIME_LIMIT = ?
2, How do use the macros "PRIME_TAB_Initialize" and Print a prime?
Post 26 Mar 2025, 02:55
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4227
Location: vpcmpistri
bitRAKE 26 Mar 2025, 04:02
It's a little convoluted because it does positive and negative logic (primes can be 0-bits or 1-bits)*.

Looking at problems 3, we can see initialize requires:
Code:
; somewhere you can put data:
PRIME_LIMIT = 1_000_000 ; the limit of covered numbers
INCLUDE "prime_sieve.inc"

; in the code section:
PRIME_TAB_Initialize    
... and then NEXT_PRIME REG64 acts as an iterator with the register to index into the prime table. If REG64 is the table iterator then REG64*2+1 is the prime number to print -- that is how the bit indices map to number values.

It might be helpful to examine its use in other problems, too.

I'm happy to create a simplified example of just the bit sieve. You'd be surprised how fast a billion primes can be generated. Just post a template and I'll adapt the code to it.

* If you strip out the neg_logic (just assume it's zero) then it'll simplify the code. You could search the board as well - I've posted it here long ago without the polarity diversity.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 26 Mar 2025, 04:02
View user's profile Send private message Visit poster's website Reply with quote
Mike Gonta



Joined: 26 Dec 2010
Posts: 245
Mike Gonta 26 Mar 2025, 21:26
From the-idiom - what's your license bitRAKE?
FASMg is the hidden back end to generate the-idiom code from assembly language.
Code:
to decide if a @number is prime is:       
  $8B442404;                         //   mov eax, [esp+4]
  $83F802;                           //   cmp eax, 2
  $743B;                             //   je .exit
  $83F801;                           //   cmp eax, 1
  $7504;                             //   jne .1
  $F7D0;                             //   not eax
  $EB30;                             //   jmp .cmp_exit
                                     // .1:
  $0FBAE000;                         //   bt eax, 0
  $7327;                             //   jnc .setc_exit
  $89C1;                             //   mov ecx, eax
  $BE01000000;                       //   mov esi, 1
  $89F7;                             //   mov edi, esi
                                     // .loop:
  $8D14B504000000;                   //   lea edx, [esi*4+4]
  $83C602;                           //   add esi, 2
  $01D7;                             //   add edi, edx
  $7210;                             //   jc .setc_exit
  $89C8;                             //   mov eax, ecx
  $31D2;                             //   xor edx, edx
  $39F8;                             //   cmp eax, edi
  $7208;                             //   jc .setc_exit
  $F7F6;                             //   div esi
  $89D0;                             //   mov eax, edx
  $85C0;                             //   test eax, eax
  $75E2;                             //   jne .loop
                                     // .setc_exit:
  $0F92C0;                           //   setc al
                                     // .cmp_exit:
  $3C01;                             //   cmp al, 1
                                     // .exit:
  $C20400;                           //   ret 4    

_________________
Mike Gonta
look and see - many look but few see

https://mikegonta.com
Post 26 Mar 2025, 21:26
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4227
Location: vpcmpistri
bitRAKE 26 Mar 2025, 23:07
Mike Gonta wrote:
From the-idiom - what's your license bitRAKE?
Public Domain.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 26 Mar 2025, 23:07
View user's profile Send private message Visit poster's website Reply with quote
six_L



Joined: 03 Jan 2005
Posts: 15
six_L 27 Mar 2025, 10:48
Hello,bitRAKE
bitRAKE wrote:
You could search the board as well - I've posted it here long ago without the polarity diversity.


I have still not understood how the P003.asm works.
I searched the board again and again, could not find the worked codes(exe) of P003.asm.
Would you give me a direct link?

Regards
six_L
Post 27 Mar 2025, 10:48
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.