flat assembler
Message board for the users of flat assembler.

 Index > Windows > Specific usage of bignum multiplication Goto page 1, 2  Next
Author
AlexP

Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
I've read all that I can find in two day's time for the most efficient way of multiplying together two bignum's, but most are for a certain range (way above my applications). So, I decided that it would be best to start a thread devoted to this specific matter, not a generic one like the bignum thread or the several division threads.

My range is for the RSA public key (n = pq) multiplication, where the size of n is between 512 and 16384 bits. This gives the bit length of p and q (assuming that they are of equal length) to be between 256 and 8192 bits. This gives the number of applications of whatever multplication algorithm I use to be between 8 and 256 (assuming it inputs two dwords). My question is what is the most efficient way to multiply the two together for this range?

I have looked up every algo I could find, and I would not like to do an FFT implementation for such a low range, or a SchÃ¶nhage-Strassen algorithm. I would rather have either of these choices, and I really need help on the subject because I've never done this before:

http://en.wikipedia.org/wiki/Multiplication_algorithm

1) "Schoolboy multiplication", aka multiply every dword in one array with every dword in the other, running time is O(n^2) I believe. An example is in the bignum thread.
http://board.flatassembler.net/topic.php?p=65726

2) "Peasant algorithm", quoted as "For each '1' bit in the multiplier, shift the multiplicand an appropriate amount and then sum the shifted values."
http://en.wikipedia.org/wiki/Peasant_multiplication

3) "Karatsuba", running time O(N^1.58) or so, it uses a divide-and-conquer method for each of two operands.
http://en.wikipedia.org/wiki/Karatsuba_algorithm

4) "Shift+Add", I do believe this is the peasant algorithm...
http://courses.cs.vt.edu/~cs1104/BuildingBlocks/multiply.040.html

Well, I would love advice from someone who has done this before. As I said, I do not want to choose anything more complex because I've read that it only saves time for VERY large numbers, like tens of thousands of digits. Anything at all would be helpful!
14 Apr 2008, 01:46
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17478
revolution
I used the basic schoolboy for the small numbers used in RSA. Because of the overhead, Karatsuba doesn't become faster until a few times bigger numbers. FFT does not become useful until ~10000 digits (depending upon the implementation).
14 Apr 2008, 02:43
AlexP

Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Yeah, I read that FFT is for tens of thousands of digits... I'm going to stick with the schoolboy, but I will do time tests to check if the shift/add (aka peasant, aka egyption, aka russian, you get it..) algorithm would work any better. Might as well, I never settle for something that's easy .

PS: Do you think it was a good idea to allow RSA key sizes up to 16,384 bits? I don't know how long encryption/decryption would take for that size, but I'll try it out whenever I get it working...
14 Apr 2008, 02:48
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17478
revolution
For the key size, sure. I set mine for any length and the code automatically chooses the multiplication algo based upon the size of the inputs. It switches to Karatsuba at a certain point and then to FFT after that. I tested some very large pq pairs just for fun, but it did take a while (many days) to find the primes when I got to >20000 digits for p and q. Usually the GCD(p,q) was very large and I would then search for another pair.

Forget about the peasant algo, it can't be faster than the schoolboy mult, too much overhead and not enough work done for each pass.
14 Apr 2008, 02:53
AlexP

Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Okay, and I didn't see FFT or Karatsuba in your code... I guess I wasn't reading it right, could you explain what the Karatsuba looks like? (In the macro code you used)
14 Apr 2008, 02:56
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17478
revolution
Oh no, not in the macros I posted here. The code I am talking about is asm code, which has not been posted here.
14 Apr 2008, 02:58
AlexP

Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Okay, so your code that is not posted I'm assuming you will not even give me a glimpse of?

Quote:
Usually the GCD(p,q) was very large and I would then search for another pair.
Isn't the GCD of any two prime numbers 1? Wait, that might be something else...
14 Apr 2008, 02:59
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17478
revolution
AlexP wrote:
Okay, so your code that is not posted I'm assuming you will not even give me a glimpse of?
It is owned by my company. I cannot post it.
AlexP wrote:
Quote:
Usually the GCD(p,q) was very large and I would then search for another pair.
Isn't the GCD of any two prime numbers 1? Wait, that might be something else...
Sorry, I meant GCD(p-1,q-1) for the totient function.
14 Apr 2008, 03:11
AlexP

Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Okay, just making sure you weren't crazy about GCD(p,q).

Well, maybe one day I can spend my time working like you do, and I can have my own RSA lib!

Q1: Does your company allow you to use your own code? (fear of losing it/decompilation?) Or do they do like the movie Paycheck and wipe your memory after an assignment .

Q2: How early did you start work like this?! I mean, were you > 15 years of age when you started out in this line of study?
14 Apr 2008, 03:19
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17478
revolution
A1: I usually don't offer my own code.

A2: 13yrs old when I started with Z80 assembly to compute a few thousand digits of PI.
14 Apr 2008, 03:33
AlexP

Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
R1: Ohhh, so they force it from you? (I'm kidding...)

R2: Darn! Well, maybe I'm not too late to start learning this.... I did try the digits of PI thing, but all I had to go on was some C code, and it didn't work ... It was my first main project in ASM too..
14 Apr 2008, 11:54
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17478
revolution
AlexP wrote:
R2: Darn! Well, maybe I'm not too late to start learning this.... I did try the digits of PI thing, but all I had to go on was some C code, and it didn't work ... It was my first main project in ASM too..
Follow the algo's, not someone's buggy C code.
14 Apr 2008, 12:06
AlexP

Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Actually it was C# code, and I gave up that project because it was my first, and I could not get it working! < I wonder why >. I will re-visit it eventually if it hits my interest...

Question: The Miller-Rabin test seems nice, it will be tough to actually implement though, but how many applications do you think I should apply? I mean, if I'm scanning a range of thousands or tens of thousands of numbers (maybe more...) and testing for primality, how many times do you think I should use it?
14 Apr 2008, 14:18
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17478
revolution
AlexP wrote:
Question: The Miller-Rabin test seems nice, it will be tough to actually implement though, but how many applications do you think I should apply? I mean, if I'm scanning a range of thousands or tens of thousands of numbers (maybe more...) and testing for primality, how many times do you think I should use it?
To achieve a probability of error less than 2^100 for a random k-bit integer, one should choose the number t of trials according to the following list.
Code:
```k in bits: t
0- 160:34
161- 163:33
164- 166:32
167- 169:31
170- 173:30
174- 177:29
178- 181:28
182- 185:27
186- 190:26
191- 195:25
196- 201:24
202- 208:23
209- 215:22
216- 222:21
223- 231:20
232- 241:19
242- 252:18
253- 264:17
265- 278:16
279- 294:15
295- 313:14
314- 334:13
335- 360:12
361- 392:11
393- 430:10
431- 479: 9
480- 542: 8
543- 626: 7
627- 746: 6
747- 926: 5
927-1232: 4
1233-1853: 3
1854-  oo: 2    ```
14 Apr 2008, 15:35
r22

Joined: 27 Dec 2004
Posts: 805
r22
@people who like code

You can always use SSE to speed it up the multiplication.
PMULUDQ
Then once you have the Qwords just use an offset ADC to combine them back into a result.

***DISCLAIMER*** The following was written while at work (in the browser) so it's UNtested. The compiler that resides between my ears has validated it, but its been known to make up opcodes, misspell label names and not check array bounds.

Code:
```.data
align 16
p rd 512
;; d1, 0, d2, 0, d3, 0, etc...
q rd 512
;; d1, 0, d2, 0, d3, 0, etc...
ntemp rd 512 + 16;; padding so carry code doesn't bomb
;; q1l, q1h, q2l, q2h, etc...
nresult rd 512 + 16;; padding so carry code doesn't bomb

.code
Mul8192By8192To16384: ;;SSE2 compatible (for legacies sake)
;; ASSUME nresult is zeroed, and p and q are filled in the correct format

push ebp ebx esi edi

xor esi,esi ;;multiplier index (copied dword) p [dwX][0][dwX][0]

.multiplier:

;;clear ntemp
pxor xmm0,xmm0
mov eax,512*8-16
.clearntemp:
movdqa [ntemp+eax],xmm0
sub eax,16
jns .clearntemp

movd xmm0, [esi+p]
xor edi,edi ;;multiplicand index (unpacked dwords) q [dw1][0][dw2][0]
;;and temp result n index [dq1][dq2]
pshufd xmm0, xmm0, 11000100b ;;copy dword to upper

.multiply:
movdqa xmm1, [edi+q]

;;multiply d2dq
pmuludq xmm1,xmm0
;;store temp result
movdqa [edi+ntemp],xmm1

;;increment
test edi,512*4
jz .multiply

;;compress ntemp and add to nresult
;;start by adding high dword of qword with low dword of next qword and correcting for carry
mov edx,4
mov eax,[edx+ntemp+4]
lea ecx,[edx+8]
.carry:
lea ecx,[ecx+4]
jc .carry
cmp edx,512*4+4
;;compress [d1][d2][~][d3][~][d4] etc... into [d1][d2][d3][d4] etc...
mov edx,8
mov ecx,12
.compress:
mov eax,[ecx]
mov [edx],eax
cmp ecx,512*4+4
jl .compress

mov edi,esi ;;nresult offset
xor ecx,ecx ;;ntemp offset
shl edi,1 ;;/2 qword->dword
clc ;;shl might does this ? already
mov eax,[ecx+ntemp]
lea edx,[edi+4]
.carr:
lea edx,[edx+4]
jc .carr
test edi,512*4

test esi,512*4
jz .multiplier

pop ebp edi esi ebx
ret 0
```

If you notice errors repost the code snippet fixed for the benefit of future readers of the thread. If it's totally unusable I'll just edit/remove it.
14 Apr 2008, 17:08
AlexP

Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Yeah, I was thinking about using large SSE instructions, but I have not studied SSE and decided against it. I will try your code later, thank for the go at it!

PS: Isn't there supposed to be two loops? One inner loop to multiply one dword in array_1 by every dword in array_2, and another outer loop to increment the dword in array_1? This is what I found in another implementation...

Revolution: I should ask, where did you get those statistics from? Last I checked, the Rabin was 1/4 wrong when it reported prime.
14 Apr 2008, 20:35
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17478
revolution
AlexP wrote:
Revolution: I should ask, where did you get those statistics from? Last I checked, the Rabin was 1/4 wrong when it reported prime.
Yes, a common trap for newbies. A little more reading and you will discover that 1/4 is a worst case upper bound. But my source was:
Code:
```IEEE P1363 / D13 (Draft Version 13). Standard Specifications for Public Key Cryptography
Annex A (Informative).
Number-Theoretic Background.
A.15.2 Testing a Randomly Generated Integer for Primality    ```
If that source is wrong then all the banking websites are going to be easily hacked.
15 Apr 2008, 00:36
AlexP

Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Wow, I didn't know something like that existed. I'll have to read that. And yes, I am a newbie. I just got to the section of the 'pseudoprimes' chapter where I glanced at the other side of the page, and saw '1/4', and the relative equations. Would be nice if they were wrong though . I'll get those specs to study.

I'm glad that someone actually took the incentive to list the steps for Rabin step-by-step . I will have fun figuring out how to do modular exponentiation, but that is what researching is for. The other problem, finding n-1 = w(2^v) has caught my attention as a fun little problem. I'm thinking that there HAS to be some easier way to figure that out in binary... I'll get it sooner or later, thanks again
15 Apr 2008, 00:55
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17478
revolution
If you like the IEEE P1363 then you should also get X9.62 standard. But be careful, both documents are buggy, you need to compare to each other and do your own tests to check which is the correct information. Even where the two docs agree you must also check against other sources for correctness.
15 Apr 2008, 01:08
AlexP

Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Hmm, I tried some x9.31 I believe, apparently I have to buy them?! ...

PS: is X9.62 about ECC?! Everywhere I go I cannot get near an ANSII spec because I have to pay for them, or login. Is there some site I can go to to view ANSII specs?
15 Apr 2008, 01:29
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsCompiler InternalsIDE DevelopmentOS ConstructionNon-x86 architecturesHigh Level LanguagesProgramming Language DesignProjects and IdeasExamples and Tutorials Other----------------FeedbackHeapTest Area
Goto page 1, 2  Next

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum