flat assembler
Message board for the users of flat assembler.
Index
> Main > More faster realization of CRC32algorithm. Goto page Previous 1, 2 
Author 

rugxulo
There's a thread on news://comp.compression about speedy CRC algorithms (with an ANSI C implementation and a link to Intel's fancy speedup).


12 Mar 2008, 03:22 

bitRAKE
Intel's paper has the assembly code for the SliceBy4 algorithm. Although, they compare warm/cold input data, all results assume the 8k table is in the cache.
Code: CRC.Slice4: mov ecx,p_buf mov edx,buf_length cmp edx,0 je .x add edx,ecx or eax,1 push esi push edi .0: xor eax,[ecx] add ecx,4 movzx esi,al mov edi,[table_o56+esi*4] movzx esi,ah xor edi,[table_o48+esi*4] bswap eax movzx esi,ah xor edi,[table_o40+esi*4] movzx esi,al xor edi,[table_o32+esi*4] cmp ecx,edx jb .0 pop edi pop esi not eax


12 Mar 2008, 05:35 

baldr
Hi there!
Anybody noticed that remainder table can be folded? Sincerely yours, Sergio. 

19 Mar 2008, 19:43 

bitRAKE
what table? how folded? I didn't really notice. Can it be used to improve speed?


20 Mar 2008, 02:42 

baldr
For example, crc32_table[byte1 ^ byte2] == crc32_table[byte1] ^ crc32_table[byte2]; it's by design.
Values in crc32_table are remainders from polynomial division by 0x04C11DB7 modulo2. Intel paper gives partial insight in polynomial math. In short, you can calculate CRC32 for 16bit value using only crc32_table, though look it up several times (three, to be exact). Intel use another approach, they precalculate several tables. Seems strange to me, entire paper is about memory footprint... 

27 Jul 2008, 00:00 

baldr
Is anybody still interested in inner workings
of CRC calculation? 

13 Aug 2008, 21:54 

MCD
I just had a flash of a genius and got an idea about how to speed up CRCcalculations with ring theory from linear algebra, modulo arithmetics and divide & conquer algorithms, but I first have to check this out with pen and paper.
I will post back if I ever get something more tangible and sophisticated. 

13 Aug 2008, 22:26 

MCD
Just say if noone is interested in it since I don't want to post it if noone has any interest


22 Aug 2008, 09:53 

Grom PE
MCD, I'm interested.


22 Aug 2008, 10:12 

MCD
Fine. As I'm currently a "little" busy and I am actually "developping" the algorithm, including it's theoretical concepts (regardless if they actually already exist), I'm going to post my thoughts step by step over the next months
Also, because I want to make it as easily understandable as possible, I'm going to go real slow here and won't show any code untill the very end, which is quiet unusual for me. Also, any questions to this topic or help is apreciated. Code: Notation:  N: set of natural numbers Z: set of integer numbers #: digitwise concatenation of several numbers, e.g. 12#34 = 1234 if a number X is composed of digits like x[n]#x[n1]# ... #x[1]#x[0], then x[i]: denotes the ith digit dup(x,n): = x#x#x#...#x duplicates digit x n times and concatenates them into a number. +: usual addition in N : usual subtraction in N *: usual multiplication in N /: usual division in N (rounded down) mod: remainder of the above division in N ^: bitwise xor, also addition mod 2 in N °: bitwise xormultiply in N (like usual binary integer multiply, but instead of adding bits, just xoring them together) %: bitwise xordivide in N. The inverse of the above multiply. crc: The remainder of the above bitwise xordivide. also A+(B) is written as AB (( For those familiar with abstract algebra:  First, we need to recall some (basic) laws of modulo arithmetic with / and mod, especially its linearity  An euclidian ring is an expansion of a ring with the inverse of the multiply operation, called divide "/" and something like a remainder "mod" of that divide, e.g. where the euclidian algorithm is appropriable.  I use the structural homomorphism between the euclidian domains (Z,+,,*,/,mod) and (N,^,^,°,%,crc) (hope I'm gonna finish my proof) so that all laws from usual modulo arithmetics is also available with crcarithmetic!!!  I try to chose a fortunate polynomial expansion of some number in the euclidian ring of (N,^,^,°,%,crc) into multiple ^ed numbers with crcarithmetic (derived from modarithmetic) such that we can calculate each partialnumber in parallel and combine them at the end again with crcarithmetics. bitRAKE wrote: what table? how folded? I didn't really notice. The CRC of a number to a polynomial, denoted as X CRC P is some kind of a linear mapping, e.g.: Code: homogeneous: (a°X) CRC P = ( a ° (X CRC P) ) CRC P additive: (X ^ Y) CRC P = ( (X CRC P) ^ (Y CRC P) ) CRC P we can combine this to: (a°X ^ Y) CRC P = ( a ° (X CRC P) ^ (Y CRC P) ) CRC P for all a, X and Y out of N, preferably binary numbers. Thus, you can fold CRCcalculations. bitRAKE wrote: Can it be used to improve speed? )) _________________ MCD  the inevitable return of the Mad Computer Doggy __/ .+~ .  Last edited by MCD on 25 Aug 2008, 10:12; edited 4 times in total 

24 Aug 2008, 06:53 

MCD
Fine. Assume you got a number X and another P and you want to compute the remainder of X to P.
Code: Take X=39 and P=7 for example: 39 mod 7 = 4 of course applying modulo operations twice doesn't change the result: Code: (39 mod 7) mod 7 = 4 mod 7 = 4 = 39 mod 7 the general rule to this is: (X mod P) mod P = X mod P This is called idempotence. if you evoke the additive law from modulo arithmetic, you can rewrite this example as: Code: (15 + 24) mod 7 = ( (15 mod 7) + (24 mod 7) ) mod 7 = (8 + 3) mod 7 = 11 mod 7 = 4 = 39 mod 7 the general rule to this is: (X1 + X2) mod P = ((X1 mod P) + (X2 mod P)) mod P if you apply the homogenic law from modulo arithmetic, you can rewrite this example as: Code: (3 * 13) mod 7 = (3 * (13 mod 7)) mod 7 = (3 * 6) mod 7 = 18 mod 7 = 4 = 39 mod 7 the general rule to this is: (a*X) mod P = (a*(X mod P)) mod P we say, the modulo operation is a linear map or linear function modulo P. Now image your number X is composed of 2 digits, x1 and x0 like X= x1#x0. Then you can decompose your number X like this: Code: 39 mod 7 = ( 30 + 9 ) mod 7 = ( 30 mod 7 + 9 mod 7 ) mod 7 = ( (3*10) mod 7 + 9 mod 7 ) mod 7 = ( ((3 mod 7)*10) mod 7 + 9 ) mod 7 = ( (30) mod 7 + 9 ) mod 7 = ( 2 + 9 ) mod 7 = 11 mod 7 = 4 = 39 mod 7 another longer example: 5427 mod 7 = (5000 + 400 + 20 +7) mod 7 = ( (5*1000) mod 7 + (4*100) mod 7 + (2*10) mod 7 + 7 mod 7 ) mod 7 = ( (5 mod 7)*1000 + (4 mod 7)*100 + (2 mod 7)*10 + 7 mod 7 ) mod 7 ;using Horner scheme = ( (( (( ((5 mod 7)*10) mod 7 + 4 mod 7 )*10) mod 7 + 2 mod 7 )*10) mod 7 + 7 mod 7 ) mod 7 = ((((((5 mod 7)#0 mod 7 + 4) mod 7)#0 mod 7 + 2) mod 7)#0 mod 7 + 7) mod 7 We notice 2 interesting properties in the above example's last line which we gonna take advantage of later: I.] each X in "X mod P" above is <= 60 = "highest remainder of all digits in number base system" * "number base" = (P1) * "number base" II.] the number of different X's in above "(Xi mod P)#0 mod P" operations is limited to "number base"+P The general pattern goes like this: Code: x[n]#...#x[1]#x[0] mod P = (((...(x[n] mod P)#0 mod P ... + x[1]) mod P)#0 mod P + x[0]) mod P _________________ MCD  the inevitable return of the Mad Computer Doggy __/ .+~ .  Last edited by MCD on 25 Aug 2008, 03:36; edited 4 times in total 

24 Aug 2008, 14:19 

MCD
We could also use just any polynomial expansion. For example, another one that splits the number into an upper and lower part for parallel multicore optimization, or into a split(1 shl k) expansion like it's needed for SIMDoptimization.
We want the remainder of the decimal number 261783695427. To achieve this, we gonna use a split4 expansion of that number: Code: 261783695427 mod 7 = (200080005000 + 60003000400 + 1000600020 + 700090007) mod 7 = (200080005000 mod 7 + 60003000400 mod 7 + 1000600020 mod 7 + 700090007 mod 7) mod 7 = ( (2*100000000000) mod 7 + (8*10000000) mod 7 + (5*1000) mod 7 + ( 6*10000000000) mod 7 + ( 3*1000000) mod 7 + ( 4*100) mod 7 + ( 1*1000000000) mod 7 + ( 6*100000) mod 7 + ( 2*10) mod 7 + ( 7*100000000) mod 7 + ( 9*10000) mod 7 + ( 7*1) mod 7 ) mod 7 = ( ((2 mod 7)*100000000000) mod 7 + ((8 mod 7)*10000000) mod 7 + ((5 mod 7)*1000) mod 7 + ( (6 mod 7)*10000000000) mod 7 + ( (3 mod 7)*1000000) mod 7 + ( (4 mod 7)*100) mod 7 + ( (1 mod 7)*1000000000) mod 7 + ( (6 mod 7)*100000) mod 7 + ( (2 mod 7)*10) mod 7 + ( (7 mod 7)*100000000) mod 7 + ( (9 mod 7)*10000) mod 7 + ( (7 mod 7)*1) mod 7 ) mod 7 ;using Horner scheme = ( ( ( ( (2 mod 7)*10000 mod 7 + 8 mod 7 )*10000 mod 7 + 5 mod 7 ) mod 7)*1000 mod 7 + ( ( ( (6 mod 7)*10000 mod 7 + 3 mod 7 )*10000 mod 7 + 4 mod 7 ) mod 7)*100 mod 7 + ( ( ( (1 mod 7)*10000 mod 7 + 6 mod 7 )*10000 mod 7 + 2 mod 7 ) mod 7)*10 mod 7 + ( ( ( (7 mod 7)*10000 mod 7 + 9 mod 7 )*10000 mod 7 + 7 mod 7 ) mod 7)*1 mod 7 ) mod 7 = ( (( (( (2 mod 7)*10000 mod 7 + 8 ) mod 7)*10000 mod 7 + 5 ) mod 7)*1000 mod 7 + (( (( (6 mod 7)*10000 mod 7 + 3 ) mod 7)*10000 mod 7 + 4 ) mod 7)*100 mod 7 + (( (( (1 mod 7)*10000 mod 7 + 6 ) mod 7)*10000 mod 7 + 2 ) mod 7)*10 mod 7 + (( (( (7 mod 7)*10000 mod 7 + 9 ) mod 7)*10000 mod 7 + 7 ) mod 7)*1 mod 7 ) mod 7 = ( (( (( (2 mod 7)#0000 mod 7 + 8 ) mod 7)#0000 mod 7 + 5 ) mod 7)#000 mod 7 + (( (( (6 mod 7)#0000 mod 7 + 3 ) mod 7)#0000 mod 7 + 4 ) mod 7)#00 mod 7 + (( (( (1 mod 7)#0000 mod 7 + 6 ) mod 7)#0000 mod 7 + 2 ) mod 7)#0 mod 7 + (( (( (7 mod 7)#0000 mod 7 + 9 ) mod 7)#0000 mod 7 + 7 ) mod 7) mod 7 ) mod 7 now we are actually computing it Code: = ( (( (( 2#0000 mod 7 + 8 ) mod 7)#0000 mod 7 + 5 ) mod 7)#000 mod 7 + (( (( 6#0000 mod 7 + 3 ) mod 7)#0000 mod 7 + 4 ) mod 7)#00 mod 7 + (( (( 1#0000 mod 7 + 6 ) mod 7)#0000 mod 7 + 2 ) mod 7)#0 mod 7 + (( (( 0#0000 mod 7 + 9 ) mod 7)#0000 mod 7 + 7 ) mod 7) mod 7 ) mod 7 = ( (( (( 20000 mod 7 + 8 ) mod 7)#0000 mod 7 + 5 ) mod 7)#000 mod 7 + (( (( 60000 mod 7 + 3 ) mod 7)#0000 mod 7 + 4 ) mod 7)#00 mod 7 + (( (( 10000 mod 7 + 6 ) mod 7)#0000 mod 7 + 2 ) mod 7)#0 mod 7 + (( (( 00000 mod 7 + 9 ) mod 7)#0000 mod 7 + 7 ) mod 7) mod 7 ) mod 7 = ( (( (( 1 + 8 ) mod 7)#0000 mod 7 + 5 ) mod 7)#000 mod 7 + (( (( 3 + 3 ) mod 7)#0000 mod 7 + 4 ) mod 7)#00 mod 7 + (( (( 4 + 6 ) mod 7)#0000 mod 7 + 2 ) mod 7)#0 mod 7 + (( (( 0 + 9 ) mod 7)#0000 mod 7 + 7 ) mod 7) mod 7 ) mod 7 = ( (( (9 mod 7)#0000 mod 7 + 5 ) mod 7)#000 mod 7 + (( (6 mod 7)#0000 mod 7 + 4 ) mod 7)#00 mod 7 + (( (10 mod 7)#0000 mod 7 + 2 ) mod 7)#0 mod 7 + (( (9 mod 7)#0000 mod 7 + 7 ) mod 7) mod 7 ) mod 7 = ( (( 2#0000 mod 7 + 5 ) mod 7)#000 mod 7 + (( 6#0000 mod 7 + 4 ) mod 7)#00 mod 7 + (( 3#0000 mod 7 + 2 ) mod 7)#0 mod 7 + (( 2#0000 mod 7 + 7 ) mod 7) mod 7 ) mod 7 = ... = ( 6#000 mod 7 + 0#00 mod 7 + 0#0 mod 7 + 1 mod 7 ) mod 7 = (1 + 0 + 0 + 1) mod 7 = 2 = 261083695427 mod 7 III.]You may have noticed that however long our number might get, each digit is processed the same way, that is "(D mod P)#0000 mod P" when D denotes each single digit. IV.]Also, every line (row) is processed the same way, except for the very end. This allows us to use SIMD parallelizing. The general pattern to this expansion is: Code: ;if X is a number with n digits indexed from n1 downto 0: X mod P = ( x[n1]# ... #x[1]#x[0] ) mod P = ( (( (( (x[n1] mod P)#0000 mod P + ... + x[7] ) mod P)#0000 mod P + x[3] ) mod P)#000 mod P + (( (( (x[n2] mod P)#0000 mod P + ... + x[6] ) mod P)#0000 mod P + x[2] ) mod P)#00 mod P + (( (( (x[n3] mod P)#0000 mod P + ... + x[5] ) mod P)#0000 mod P + x[1] ) mod P)#0 mod P + (( (( (x[n4] mod P)#0000 mod P + ... + x[4] ) mod P)#0000 mod P + x[0] ) mod P) mod P ) mod P ;or even more general for evey different split(1 shl k) expansion: = ( (((((x[(n1)k+k1] mod P)#dup(k,0) mod P + ... + x[1k+k1]) mod P)#dup(k,0) mod P + x[0k+k1]) mod P)#dup(0,k1) mod P ... + (((((x[(n1)k+ 1] mod P)#dup(k,0) mod P + ... + x[1k+ 1]) mod P)#dup(k,0) mod P + x[0k+ 1]) mod P)#dup(0,1) mod P + (((((x[(n1)k+ 0] mod P)#dup(k,0) mod P + ... + x[1k+ 0]) mod P)#dup(k,0) mod P + x[0k+ 0]) mod P)#dup(0,0) mod P ) mod P V.] Of course, all of the the above and previous rules are independent from the number system used for every variable. Of course we gonna later use binary, hex or even byte, word ord dword "digits". _________________ MCD  the inevitable return of the Mad Computer Doggy __/ .+~ .  

25 Aug 2008, 01:33 

MCD
*And now for something completely different!*
(I skip binary addition here for brevity since you all are supposed to know it) You all probably remember the pen&paper multiply algorithm from school with decimal digits. Of course this works exactly the same for binary digits: Code: First, we need the binary addition and multiplication "tables" (which are among the smallest of every positional number system): +0 1 *0 1 + + 00 1 00 0 11 ¹0 10 1 We take 143d = 11d * 13d = 1011b * 1101b as an example: 1011 *1101  1011 0000 1011 +1011 ¹¹¹¹  10001111 = 143d ;The ¹ should be a small "1" denoting a carry. Just say me if they don't look like small "1"s for you. of course this multiply works with the same principle in every basen positional number system. But we can also compute a multiply related to the ^ modulo 2 add (xor): Code: We need a different "addition" (mod 2, xor) table, but the xor"multiplication" table remains the same as above: ^0 1 °0 1 + + 00 1 00 0 11 0 10 1 Basically, the ^ is just lacking the carry. We take the same example as above: 1011 °1101  1011 0000 1011 ^1011  1111111 = 127d There are many things that work exactly the same with ^ and ° as with +, and *. NOTE:This does NOT mean they must yield the same result for every terms or numbers! We start with similarities between ^ and +: Code: G0: Xoring 2 natural numbers will always yield a natural number exactly as adding 2 integer numbers will alway yield an integer number: 5^7 in N is as true as 5+7 in Z is. This is called the "closure law": for all A,B in N: A^B in N G1: Xoring natural numbers doesn't depend on the order of evaluation (parentheses) exactly as adding integer numbers doesn't depend on the order of its evaluation: 5^(7^3) = 5^4 = 1 = 2^3 = (5^7)^3 is as true as 5+(7+3) = 5+10 = 15 = 12+3 = (5+7)+3 is. This is called the "associative law": for all A,B,C in N: A^(B^C) = A^B^C = (A^B)^C G4: Xoring natural numbers doesn't depend on the order of its operands exactly as adding integer numbers doesn't depend on the order of its operands: 5^7 = 2 = 7^5 is as true as 5+7 = 12 = 7+5 is. This is called the "commutative law": for all A,B in N: A^B = B^A G2: Xoring special (here: a unique) numbers to any other natural number doesn't change it exactly as adding 0 to any integer number doesn't change it: 7^0 = 7 = 0^7 is as true as 7+0 = 7 = 0+7 is. This is called the "existence of a neutral (here: additive identity) element": there exists 0L,0R in N: for all A in N: 0L^A = A = A^0R Because of G4, these 0L and 0R are the same and we denote this unique neutral element as 0. We can now rewrite the above equation: there exists exactly one 0 in N: for all A in N: 0^A = A = A^0 G3: To every natural number we can xor a corresponding (here: unique) natural number to yield 0, exactly as we can add a negative integer number to a corresponding positive integer number and viceversa to yield 0: 7^(7) = 0 = (7)^7 7+(7) = 0 = (7)+7 This is called the "existence of an (here: additive) inverse element": for all A in N: there exist AL,AR in N: AL^A = 0 = A^AR Because of G4, AL and AR are the same and denoted as A. We can rewrite the above equation: for all A in N: there exist exactly one A in N: A^A = 0 = A^A X1: Every number xor'ed 2 times with another number yields the initial number: 7^(7^5) = 5 This is called an "involution" when you apply a function 2 times to get the identity: for all A,B in N: A^(A^B) = B Let B = 0, we get: for all A in N: A^A = 0 <=> A = A We conclude from the last equation that under every involution every element is said to be selfinverse, such that we will spare the  sign under ^ in the future. G1..3 form a structure called a group. G1..4 form a more special structure called an abelian (or a commutativ) group. Now some more laws including the ° and *: Code: R0: The closure law also applies to both ° and *. R1: The associative law also applies to both ° and *: 3°(5°7) = 3°27 = 45 = 15°7 = (3°5)°7 is as true as 3*(5*7) = 3*35 = 105 = 15*7 = (3*5)*7 is. R4: The commutative law also applies to both ° and *: 5°7 = 27 = 7°5 is as true as 5*7 = 35 = 7*5 is. R2: ° and * both have a (unique, because of R4) identity element denoted as 1. NOT R3: ° has no inverse elements in the natural numbers except for 1, thus the identity element 1 is the only unit under °. * has no inverse elements in the integer numbers except for 1 and 1, thus the identity element 1 and its additive inverse 1 are the only units under *. R5: You can factorize and expand any term consisting of ^ and ° as you can factorize and expand any term consisting only of +, and °: 5°(3^7) = 5°4 = 20 = 15 ^ 27 = 5°3 ^ 5°7 101°(11^111) = 101°100 = 10100 = 1111 ^ 11011 = 101°11 ^ 101°111 This is called the "distributive law": for all A,B,C in N: A°(B^C) = A°B ^ A°C Because of G0..4,R0,R1,R5 each of (Z,+,,*) and (N,^,^,°) are structures called a ring. Adding R2,R4 would specialize them into unitary, commutative rings, which are also called a integral domains. Basically all usual calculus with integer numbers is done in an integral domain. Now, if both (Z,+,,*) and (N,+,,*) integral domains would also have a kind of a "nonbijective multiplicative inverse" operation called the "truncated division" and a "nonbijective remainder of the truncated division" operation called the "remainder" such that having both the "truncated quotients" and "remainders" would make the decompositions bijective, then both integral domains would also be euclidian domains which would allow us to transfer all(!) calculations from (Z,+,,*,...) to (N,^,^,°,...)! This is the case for (Z,+,,*,/,mod) > (N,^,^,°,%,crc), so we can simply transfer our modulo calculation method with polynomial expansion over to crc calculation with polynomial expansion, especially our split(1 shl k) expansion which is suitable for SIMD computations. I'm currently workin on a complete proof, but I'm not gonna post it here if noone wants it. P.S.: Just say if I'm too verbose _________________ MCD  the inevitable return of the Mad Computer Doggy __/ .+~ .  

25 Aug 2008, 03:47 

revolution
Did you ever learn something called polynomial multiplication in school? Also known as carryless multiply.
Strange, someone else has just asked me about that! 

25 Aug 2008, 11:29 

iic2
revolution, JIC, don't forget that all new inventions and ideas did not come from text books and (real) computing is only so many years old. Point is... Can it Work??? Did it work ??? A new twist from Jupiter, a shot in the dark that could be made possible.
Even Bill Gates once said "no one will ever need more than 64kb", and days latter, something strange came about. Here are some old files I keep about crc, the stuff everyone may already know about. I post them JIC. But there is one more I got to find based on TASM i think. I'm talking about (D) Thomas, the wizard we know of. Not sure if what im posting is the one. It been a while. Again, they are all well know example from day 1999 or earler. Maybe by next year I can do the math 2.


25 Aug 2008, 18:02 

revolution
iic2: I was just trying to help the poster to read some related topics. It may save the poster from reinventing work already known. That is not to say the work above won't reveal some nice insights into CRC. And I certainly hope that with the appropriate references the poster can accelerate the work.
Last edited by revolution on 26 Aug 2008, 00:07; edited 1 time in total 

25 Aug 2008, 20:22 

iic2
I/anyone who know you, always know you go beyond and force reality to do things as know proper and no one will ever knock that down. OK, never sorry for, but, it was only JIC. Solutions is alway only one step away...


25 Aug 2008, 23:37 

Goto page Previous 1, 2 < Last Thread  Next Thread > 
Forum Rules:

Copyright © 19992020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.