flat assembler
Message board for the users of flat assembler. Index > Main > the old square root problem, again
Author
 Thread  magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr 14 Aug 2015, 22:09
Hi all,

Instead of trying to re-invent the wheel...
I have an integer, x, for argument sake let x = 123456789 and I want to know if it is a perfect square (i.e. it's square root is an integer).
Assume I have list of square integers that includes all squares from 1 to x^4.
What is the best way of finding a reasonable approximation for a < sqrt(x) and a b > sqrt(x) where a and b are integers?

If these are reasonable approximations then a simple and quick rep, scas will find whether x is a perfect square or not. There would be no need to check sqrt whether by fp or gmp.

Any ideas would be greatly appreciated,

magic² 14 Aug 2015, 22:09  revolution When all else fails, read the source Joined: 24 Aug 2004 Posts: 19871 Location: In your JS exploiting you and your system revolution 14 Aug 2015, 22:47 There is a whole topic about this. Started by you. http://board.flatassembler.net/topic.php?t=13324 14 Aug 2015, 22:47
magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr 14 Aug 2015, 23:18
revolution wrote:

http://board.flatassembler.net/topic.php?t=13324

Hi revolution This is not the same question, hence the new thread. In the other thread I was asking the best way to *calculate* if a number was a perfect square.
In this case I am not trying to find the square root, just a 'reasonable' a < square root and a 'reasonable' b > square root. Then, as I said in my OP, if I have a list of squares from a to b, a simple scan would find if the number is in the list. without having to scan from 1² to x/2.

Bearing in mind that I need to do this for huge numbers (10^30+) a huge number of times, then as an example, I can say that for integer x then 1 <= sqrt x <= x/2 which is fine for small values of x, i.e. 1 <= sqrt 1,000,000 <= 500,000 leaves a relatively quick rep, scas of a list of squares between these values. When dealing with x = 490,000,000,000,000,000,000,000,000,000 then rep,scas from 1 to 245,000,000,000,000,000,000,000,000,000 on a list of squares is a long way to go to get the correct value of 700,000,000,000,000.
What I am asking is, what are better ways of getting estimates for a and b rather than the trivial a=1 <= sqrt x <= b=x/2

magic² 14 Aug 2015, 23:18  redsock

Joined: 09 Oct 2009
Posts: 407
Location: Australia
redsock 15 Aug 2015, 06:15
does the normal "fastest integer square root" method still do the trick (regardless of whether translated for bigints or not)?

http://www.codecodex.com/wiki/Calculate_an_integer_square_root ? 15 Aug 2015, 06:15  redsock

Joined: 09 Oct 2009
Posts: 407
Location: Australia
redsock 15 Aug 2015, 07:30
Sitting here on my day off (Saturday night), and thought I'd have a quick play with this to see what you're really up against I wrote this code against my own HeavyThing library's bigint implementation (https://2ton.com.au/HeavyThing/), and the code looks like:
Code:
```include '../../ht_defaults.inc'
include '../../ht.inc'

; two arguments: rdi == destination bigint, rsi == source bigint (not destroyed)
; performs an integer square root (truncating the decimals)
falign
isquare:
prolog  isquare
push    r12 r13 r14 r15 rdi
mov     rdi, rsi
call    bigint\$new_copy
mov     r12, rax                        ; x = source
call    bigint\$new
mov     r13, rax                        ; y = 0
call    bigint\$new
mov     r14, rax
call    bigint\$new
mov     r15, rax                        ; temp = 0
mov     rdi, r12
call    bigint\$bitcount
and     eax, not 1
jz      .alldone
mov     rdi, r14
mov     esi, eax
call    bigint\$bitset                   ; z = highest power of four <= source
calign
.outer:
mov     rdi, r15
mov     rsi, r13
call    bigint\$assign
mov     rdi, r15
mov     rsi, r14
call    bigint\$add_unsigned             ; temp = y + z
mov     rdi, r12
mov     rsi, r15
call    bigint\$compare_unsigned
cmp     eax, 0
jl      .nomod
mov     rdi, r12
mov     rsi, r15
call    bigint\$subtract                 ; x -= temp
mov     rdi, r15
mov     rsi, r14
call    bigint\$assign
mov     rdi, r15
mov     esi, 1
call    bigint\$shl
mov     rdi, r13
mov     rsi, r15
call    bigint\$add_unsigned             ; y += z * 2
.nomod:
mov     rdi, r13
mov     esi, 1
call    bigint\$shr                      ; y /= 2
mov     rdi, r14
mov     esi, 2
call    bigint\$shr                      ; z /= 4
mov     rdi, r14
call    bigint\$is_zero
test    eax, eax
jz      .outer
.alldone:
pop     rdi
mov     rsi, r13
call    bigint\$assign
; cleanup
mov     rdi, r15
pop     r15
call    bigint\$destroy
mov     rdi, r14
pop     r14
call    bigint\$destroy
mov     rdi, r13
pop     r13
call    bigint\$destroy
mov     rdi, r12
pop     r12
call    bigint\$destroy
epilog

public _start
falign
_start:
call    ht\$init

; timing test for 1,000,000 of these operations:
mov     ebx, 1000000

mov     rdi, .test
mov     esi, .testlen
call    bigint\$new_encoded
mov     r12, rax                ; our source
call    bigint\$new
mov     r13, rax                ; our destination
calign
.timing_loop:
mov     rdi, r13
mov     rsi, r12
call    isquare
sub     ebx, 1
jnz     .timing_loop

; debug our result (compare with wolframalpha to base 16 if you are interested)
mov     rdi, r13
call    bigint\$debug

mov     eax, syscall_exit
xor     edi, edi
syscall
dalign
.test:
; we want: 490,000,000,000,000,000,000,000,000,000, which is (in big endian form):
db      0x06, 0x2f, 0x46, 0x80, 0x09, 0xe5, 0x15, 0x74, 0x94, 0x10, 0x00, 0x00, 0x00
; or little endian:
; db    0x00, 0x00, 0x00, 0x10, 0x94, 0x74, 0x15, 0xe5, 0x09, 0x80, 0x46, 0x2f, 0x06
.testlen = \$ - .test

include '../../ht_data.inc'
```
Note that you'll likely have to adjust the include '../../ht goods if you move it somewhere else relative to my library and want to play around with it.

Anyway, 1M tests on my local dev environment doing the big integer square root only (older 3930k virtual machine) runs the entire test in 4.14 seconds, or 241,545 per second for a single thread.

(and that is with your example number of 490,000,000,000,000,000,000,000,000,000)

And then if I modify the code again to do the square (only the timing loop itself with the relevant modification bit is included here):
Code:
```calign
.timing_loop:
mov     rdi, r13
mov     rsi, r12
call    isquare

; modified bit to square the result so we can compare it to our source
mov     rdi, r13
call    bigint\$square
; and final comparison against the source to see if it was a perfect square or not
mov     rdi, r12
mov     rsi, r13
call    bigint\$compare_unsigned
; if eax == 0 here, perfect square it is
; end modified bit

sub     ebx, 1
jnz     .timing_loop

; debug our result (compare with wolframalpha to base 16 if you are interested)
; (which in our modified form will be equal to our input)
mov     rdi, r13
call    bigint\$debug

mov     eax, syscall_exit
xor     edi, edi
syscall
```

And this runs the entire 1M test in 4.215 seconds, or 237,247 per second to do the "is it a perfect square root" per your commentary...

Obviously the square operation is not expensive for such small numbers as your example...

That algorithm (the one I just hacked together for my bigint isquare) is by no means heavily optimised, and doesn't even follow the current accepted fastest method.

Did I misinterpret your Q, or am I on the right track here. 15 Aug 2015, 07:30  magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr 15 Aug 2015, 09:22
Hi Redsock,

I haven't had a chance to look through your code yet but thanks for the time you put in From your text it seems that you think I am trying to 'calculate' the square root. Here is a *small* example...
Assume I have

x dd 489
sqrs dd 1, 4, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961

I am trying to figure if there is a way to get a good approximation for the square root of x, for example sqrt(289) < sqrt(x) < sqrt(625)
I could then just do a repne scasd through 324, 361, 400, 441, 484, 529, 576 to see if x is in the list, i.e. a perfect square. There would be no need to scan through the whole array, or to take any square roots.

Thanks,

magic² 15 Aug 2015, 09:22  redsock

Joined: 09 Oct 2009
Posts: 407
Location: Australia
redsock 15 Aug 2015, 09:27
Isn't "integer square roots" equal to approximated results to begin with?

I kinda see where you are going, so perhaps the Q is better left to hardcore math heads rather than programmer heads like me You are looking for an equation to reduce the complexity of approximating square roots (which again IMO integer square roots already are) sorry I wasn't of more service. Please update here when you find the answer as now I am invested in the outcome!

Cheers 15 Aug 2015, 09:27  magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr 15 Aug 2015, 16:16
Hi Redsock,

Here's the actual problem I'm trying to overcome. I require numbers that are the sum of 2 squares in *at least* four ways. Although these can be found with prime factors of the form 4k + 3, when these factors have an even exponent, and can also have multiple factors of 2, I am interested in a very particular type. Any number that only has distinct prime factors of the form 4k + 1. If a number has 3 of these distinct factors it will be the sum of 2 squares in exactly four ways. The lowest of these is

1105 = 33^2 + 4^2 = 32^2 + 9^2 = 31^2 + 12^2 = 24^2 + 23^2

with more that 3 distinct prime factors 4k+1 the result with be the sum of 2 squares in an increasing amount of ways, which I also want.

Now, the problem...
(1) I can produce a list of various m^2 + n^2 = x, then check if x has factors of the of the required format, then find other m^2 + n^2 that equal the same x. This is very inefficient.
(2) I can multiply prime numbers of the required format to give me an x.
i.e. 5 * 13 * 17 = 1105. I know these numbers have the required property but do not know what the four ways that they are the sum of 2 squares is without checking.

These can be found in 2 ways (starting with a list of square numbers from 1 up)...
(2a) I can subtract increasing m^2 (m starting at 1) from x up to a limit of sqrt(x/2) and check if x - m^2 is a perfect square, or
(2b) I think this will be my preferred option unless someone proposes a better solution... as (2a) subtract increasing m^2 (m starting at 1) from x but this time up to a limit where x < m^2 and store each result of x - m^2, call these numbers n^2. The if I check each n^2 against the limit reached for m^2 in decreasing values, every value of n^2 that occurs in m^2 will be known to be a perfect square.

In English it sounds a lot more complex than it actually is. Here is a run for x = 5*13*17 = 1105

1105 - 1^2 = 1104
1105 - 2^2 = 1101
1105 - 3^2 = 1096
1105 - 4^2 = 1089
1105 - 5^2 = 1080
1105 - 6^2 = 1069
1105 - 7^2 = 1056
1105 - 8^2 = 1041
1105 - 9^2 = 1024
1105 - 10^2 = 1005
1105 - 11^2 = 984
1105 - 12^2 = 961
1105 - 13^2 = 936
1105 - 14^2 = 909
1105 - 15^2 = 880
1105 - 16^2 = 849
1105 - 17^2 = 816
1105 - 18^2 = 781
1105 - 19^2 = 744
1105 - 20^2 = 705
1105 - 21^2 = 664
1105 - 22^2 = 621
1105 - 23^2 = 576
1105 - 24^2 = 529
1105 - 25^2 = 480
1105 - 26^2 = 429
1105 - 27^2 = 376
1105 - 28^27 = 321
1105 - 29^2 = 264
1105 - 30^3 = 205
1105 - 31^2 = 144
1105 - 32^2 = 81
1105 - 33 ^2 = 16

checking values on the rhs against values in the middle the only numbers that appear in both are 16, 81, 144, 529, 576, 961, 1024, 1089. These give for four required sets.
I think this will be the most efficient as it only uses subtraction and checking.
However, when I get to *large* numbers (at *least* 10^25), which I will, this is still a very long-winded approach.

Unfortunately, I do not believe it is possible to *calculate* what the numbers are in the different ways a particular number is the sum of 2 squares.

i.e. I can say with certainty that

982,451,653 * 982,451,629 * 982,451,609 = 948,273,322,896,298,772,346,363,833

is the sum of 2 squares in exactly four ways, finding these four ways is not so easy, lol.

Thanks again and sorry for the lllooooonnnngggggggggg post,

magic² 15 Aug 2015, 16:16  Xorpd! Joined: 21 Dec 2006 Posts: 161 Xorpd! 15 Aug 2015, 20:19 While I was riding the bus, I saw an address that was 6161. Obvious prime factorization was 61*101. We have 61 = 5²+6² = |5+6i|², 101 = 1²+10² = |1+10i|^2. Then (5+6i)(1+10i) = -55+56i, so 6161 = |55+56i|² = 55²+56². Again, (5+6i)(1-10i) = 65-44i, so 6161 = |65+44i|² = 65²+44². Easy enough to do this in your head provided you are using public transportation and not driving. So if you can factor the number you need only solve three relatively easy problems instead of one really hard problem. But perhaps I don't understand the purpose of your research? 15 Aug 2015, 20:19
magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr 16 Aug 2015, 19:35
You're a genius Xorpd! I had tried something similar but kept tripping myself up.
Your solution also gives me the following for a number with 3 factors...
Let x = (a^2+b^2)(c^2+d^2)(e^2+f^2) then

For the big example I gave
x = 982,451,653 * 982,451,629 * 982,451,609 = 948,273,322,896,298,772,346,363,833

we get

x = 27817821433933^2 + 13207654355188^2
x = 30453641615477^2 + 4566074380952^2
x = 29922054378227^2 + 7276261724452^2
x = 30745933821163^2 + 1720719721792^2 Many thanks,

magic² 16 Aug 2015, 19:35  Xorpd! Joined: 21 Dec 2006 Posts: 161 Xorpd! 16 Aug 2015, 21:40 I offer a brief bibliography that may increase your enjoyment of this sort of problem. Recreations in the Theory of Numbers: the Queen of Mathematics Entertains, A. H. Beiler Introduction to Number Theory, James E. Shockley An Introduction to the Theory of Numbers, G. H. Hardy and Edward M. Wright Advanced Number Theory, Harvey Cohn 16 Aug 2015, 21:40
magicSqr

Joined: 27 Aug 2011
Posts: 105
magicSqr 16 Aug 2015, 22:11
Thanks again  16 Aug 2015, 22:11  Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First  Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum