flat assembler
Message board for the users of flat assembler.

Index > Windows > Specific usage of bignum multiplication

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
I've read all that I can find in two day's time for the most efficient way of multiplying together two bignum's, but most are for a certain range (way above my applications). So, I decided that it would be best to start a thread devoted to this specific matter, not a generic one like the bignum thread or the several division threads.

My range is for the RSA public key (n = pq) multiplication, where the size of n is between 512 and 16384 bits. This gives the bit length of p and q (assuming that they are of equal length) to be between 256 and 8192 bits. This gives the number of applications of whatever multplication algorithm I use to be between 8 and 256 (assuming it inputs two dwords). My question is what is the most efficient way to multiply the two together for this range?

I have looked up every algo I could find, and I would not like to do an FFT implementation for such a low range, or a Schönhage-Strassen algorithm. I would rather have either of these choices, and I really need help on the subject because I've never done this before:

http://en.wikipedia.org/wiki/Multiplication_algorithm

1) "Schoolboy multiplication", aka multiply every dword in one array with every dword in the other, running time is O(n^2) I believe. An example is in the bignum thread.
http://board.flatassembler.net/topic.php?p=65726

2) "Peasant algorithm", quoted as "For each '1' bit in the multiplier, shift the multiplicand an appropriate amount and then sum the shifted values."
http://en.wikipedia.org/wiki/Peasant_multiplication

3) "Karatsuba", running time O(N^1.58) or so, it uses a divide-and-conquer method for each of two operands.
http://en.wikipedia.org/wiki/Karatsuba_algorithm

4) "Shift+Add", I do believe this is the peasant algorithm...
http://courses.cs.vt.edu/~cs1104/BuildingBlocks/multiply.040.html

Well, I would love advice from someone who has done this before. As I said, I do not want to choose anything more complex because I've read that it only saves time for VERY large numbers, like tens of thousands of digits. Anything at all would be helpful!
Post 14 Apr 2008, 01:46
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17669
Location: In your JS exploiting you and your system
revolution
I used the basic schoolboy for the small numbers used in RSA. Because of the overhead, Karatsuba doesn't become faster until a few times bigger numbers. FFT does not become useful until ~10000 digits (depending upon the implementation).
Post 14 Apr 2008, 02:43
View user's profile Send private message Visit poster's website Reply with quote
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Yeah, I read that FFT is for tens of thousands of digits... I'm going to stick with the schoolboy, but I will do time tests to check if the shift/add (aka peasant, aka egyption, aka russian, you get it..) algorithm would work any better. Might as well, I never settle for something that's easy Smile.

PS: Do you think it was a good idea to allow RSA key sizes up to 16,384 bits? I don't know how long encryption/decryption would take for that size, but I'll try it out whenever I get it working...
Post 14 Apr 2008, 02:48
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17669
Location: In your JS exploiting you and your system
revolution
For the key size, sure. I set mine for any length and the code automatically chooses the multiplication algo based upon the size of the inputs. It switches to Karatsuba at a certain point and then to FFT after that. I tested some very large pq pairs just for fun, but it did take a while (many days) to find the primes when I got to >20000 digits for p and q. Usually the GCD(p,q) was very large and I would then search for another pair.

Forget about the peasant algo, it can't be faster than the schoolboy mult, too much overhead and not enough work done for each pass.
Post 14 Apr 2008, 02:53
View user's profile Send private message Visit poster's website Reply with quote
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Okay, and I didn't see FFT or Karatsuba in your code... I guess I wasn't reading it right, could you explain what the Karatsuba looks like? (In the macro code you used)
Post 14 Apr 2008, 02:56
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17669
Location: In your JS exploiting you and your system
revolution
Oh no, not in the macros I posted here. The code I am talking about is asm code, which has not been posted here.
Post 14 Apr 2008, 02:58
View user's profile Send private message Visit poster's website Reply with quote
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Okay, so your code that is not posted I'm assuming you will not even give me a glimpse of?

Quote:
Usually the GCD(p,q) was very large and I would then search for another pair.
Isn't the GCD of any two prime numbers 1? Wait, that might be something else...
Post 14 Apr 2008, 02:59
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17669
Location: In your JS exploiting you and your system
revolution
AlexP wrote:
Okay, so your code that is not posted I'm assuming you will not even give me a glimpse of?
It is owned by my company. I cannot post it.
AlexP wrote:
Quote:
Usually the GCD(p,q) was very large and I would then search for another pair.
Isn't the GCD of any two prime numbers 1? Wait, that might be something else...
Sorry, I meant GCD(p-1,q-1) for the totient function.
Post 14 Apr 2008, 03:11
View user's profile Send private message Visit poster's website Reply with quote
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Smile Okay, just making sure you weren't crazy about GCD(p,q).

Well, maybe one day I can spend my time working like you do, and I can have my own RSA lib!

Q1: Does your company allow you to use your own code? (fear of losing it/decompilation?) Or do they do like the movie Paycheck and wipe your memory after an assignment Smile.

Q2: How early did you start work like this?! I mean, were you > 15 years of age when you started out in this line of study?
Post 14 Apr 2008, 03:19
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17669
Location: In your JS exploiting you and your system
revolution
A1: I usually don't offer my own code.

A2: 13yrs old when I started with Z80 assembly to compute a few thousand digits of PI.
Post 14 Apr 2008, 03:33
View user's profile Send private message Visit poster's website Reply with quote
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
R1: Ohhh, so they force it from you? (I'm kidding...)

R2: Darn! Well, maybe I'm not too late to start learning this.... I did try the digits of PI thing, but all I had to go on was some C code, and it didn't work Sad... It was my first main project in ASM too..
Post 14 Apr 2008, 11:54
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17669
Location: In your JS exploiting you and your system
revolution
AlexP wrote:
R2: Darn! Well, maybe I'm not too late to start learning this.... I did try the digits of PI thing, but all I had to go on was some C code, and it didn't work Sad... It was my first main project in ASM too..
Follow the algo's, not someone's buggy C code.
Post 14 Apr 2008, 12:06
View user's profile Send private message Visit poster's website Reply with quote
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Actually it was C# code, and I gave up that project because it was my first, and I could not get it working! < I wonder why Smile >. I will re-visit it eventually if it hits my interest...

Question: The Miller-Rabin test seems nice, it will be tough to actually implement though, but how many applications do you think I should apply? I mean, if I'm scanning a range of thousands or tens of thousands of numbers (maybe more...) and testing for primality, how many times do you think I should use it?
Post 14 Apr 2008, 14:18
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17669
Location: In your JS exploiting you and your system
revolution
AlexP wrote:
Question: The Miller-Rabin test seems nice, it will be tough to actually implement though, but how many applications do you think I should apply? I mean, if I'm scanning a range of thousands or tens of thousands of numbers (maybe more...) and testing for primality, how many times do you think I should use it?
To achieve a probability of error less than 2^100 for a random k-bit integer, one should choose the number t of trials according to the following list.
Code:
k in bits: t
   0- 160:34
 161- 163:33
 164- 166:32
 167- 169:31
 170- 173:30
 174- 177:29
 178- 181:28
 182- 185:27
 186- 190:26
 191- 195:25
 196- 201:24
 202- 208:23
 209- 215:22
 216- 222:21
 223- 231:20
 232- 241:19
 242- 252:18
 253- 264:17
 265- 278:16
 279- 294:15
 295- 313:14
 314- 334:13
 335- 360:12
 361- 392:11
 393- 430:10
 431- 479: 9
 480- 542: 8
 543- 626: 7
 627- 746: 6
 747- 926: 5
 927-1232: 4
1233-1853: 3
1854-  oo: 2    
Post 14 Apr 2008, 15:35
View user's profile Send private message Visit poster's website Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22
@people who like code

You can always use SSE to speed it up the multiplication.
PMULUDQ
Then once you have the Qwords just use an offset ADC to combine them back into a result.

***DISCLAIMER*** The following was written while at work (in the browser) so it's UNtested. The compiler that resides between my ears has validated it, but its been known to make up opcodes, misspell label names and not check array bounds.

Code:
.data
align 16
p rd 512
;; d1, 0, d2, 0, d3, 0, etc...
q rd 512
;; d1, 0, d2, 0, d3, 0, etc...
ntemp rd 512 + 16;; padding so carry code doesn't bomb
;; q1l, q1h, q2l, q2h, etc...
nresult rd 512 + 16;; padding so carry code doesn't bomb

.code
Mul8192By8192To16384: ;;SSE2 compatible (for legacies sake)
;; ASSUME nresult is zeroed, and p and q are filled in the correct format

push ebp ebx esi edi

xor esi,esi ;;multiplier index (copied dword) p [dwX][0][dwX][0]

.multiplier:

;;clear ntemp
pxor xmm0,xmm0
mov eax,512*8-16
.clearntemp:
movdqa [ntemp+eax],xmm0
sub eax,16
jns .clearntemp

;;load
movd xmm0, [esi+p]
xor edi,edi ;;multiplicand index (unpacked dwords) q [dw1][0][dw2][0]
;;and temp result n index [dq1][dq2]
pshufd xmm0, xmm0, 11000100b ;;copy dword to upper

.multiply:
movdqa xmm1, [edi+q]

;;multiply d2dq
pmuludq xmm1,xmm0
;;store temp result
movdqa [edi+ntemp],xmm1

;;increment
add edi,16
test edi,512*4
jz .multiply

;;compress ntemp and add to nresult
;;start by adding high dword of qword with low dword of next qword and correcting for carry
mov edx,4
.addwithcarrypropegation:
mov eax,[edx+ntemp+4]
add [edx+ntemp],eax
lea ecx,[edx+8]
.carry:
adc [ecx+ntemp],0
lea ecx,[ecx+4]
jc .carry
add edx,8
cmp edx,512*4+4
jl .addwithcarrypropegation
;;compress [d1][d2][~][d3][~][d4] etc... into [d1][d2][d3][d4] etc...
mov edx,8
mov ecx,12
.compress:
mov eax,[ecx]
mov [edx],eax
add ecx,8
add edx,4
cmp ecx,512*4+4
jl .compress

;;add ntemp to nresult
mov edi,esi ;;nresult offset
xor ecx,ecx ;;ntemp offset
shl edi,1 ;;/2 qword->dword
clc ;;shl might does this ? already
.addwithcarry:
mov eax,[ecx+ntemp]
adc [edi+nresult],eax
lea edx,[edi+4]
.carr:
adc [edx+nresult],0
lea edx,[edx+4]
jc .carr
add ecx,4
add edi,4
test edi,512*4
jz .addwithcarry

add esi,8
test esi,512*4
jz .multiplier

pop ebp edi esi ebx
ret 0
    


If you notice errors repost the code snippet fixed for the benefit of future readers of the thread. If it's totally unusable I'll just edit/remove it.
Post 14 Apr 2008, 17:08
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Yeah, I was thinking about using large SSE instructions, but I have not studied SSE and decided against it. I will try your code later, thank for the go at it!

PS: Isn't there supposed to be two loops? One inner loop to multiply one dword in array_1 by every dword in array_2, and another outer loop to increment the dword in array_1? This is what I found in another implementation...

Revolution: I should ask, where did you get those statistics from? Last I checked, the Rabin was 1/4 wrong when it reported prime.
Post 14 Apr 2008, 20:35
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17669
Location: In your JS exploiting you and your system
revolution
AlexP wrote:
Revolution: I should ask, where did you get those statistics from? Last I checked, the Rabin was 1/4 wrong when it reported prime.
Yes, a common trap for newbies. A little more reading and you will discover that 1/4 is a worst case upper bound. But my source was:
Code:
IEEE P1363 / D13 (Draft Version 13). Standard Specifications for Public Key Cryptography
Annex A (Informative).
Number-Theoretic Background.
A.15.2 Testing a Randomly Generated Integer for Primality    
If that source is wrong then all the banking websites are going to be easily hacked.
Post 15 Apr 2008, 00:36
View user's profile Send private message Visit poster's website Reply with quote
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Wow, I didn't know something like that existed. I'll have to read that. And yes, I am a newbie. I just got to the section of the 'pseudoprimes' chapter where I glanced at the other side of the page, and saw '1/4', and the relative equations. Would be nice if they were wrong though Smile. I'll get those specs to study.

I'm glad that someone actually took the incentive to list the steps for Rabin step-by-step Smile. I will have fun figuring out how to do modular exponentiation, but that is what researching is for. The other problem, finding n-1 = w(2^v) has caught my attention as a fun little problem. I'm thinking that there HAS to be some easier way to figure that out in binary... I'll get it sooner or later, thanks again
Post 15 Apr 2008, 00:55
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17669
Location: In your JS exploiting you and your system
revolution
If you like the IEEE P1363 then you should also get X9.62 standard. But be careful, both documents are buggy, you need to compare to each other and do your own tests to check which is the correct information. Even where the two docs agree you must also check against other sources for correctness.
Post 15 Apr 2008, 01:08
View user's profile Send private message Visit poster's website Reply with quote
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
Hmm, I tried some x9.31 I believe, apparently I have to buy them?! ...

PS: is X9.62 about ECC?! Everywhere I go I cannot get near an ANSII spec because I have to pay for them, or login. Is there some site I can go to to view ANSII specs?
Post 15 Apr 2008, 01:29
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.