flat assembler
Message board for the users of flat assembler.

Index > Main > More faster realization of CRC32-algorithm.

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
Garthower



Joined: 21 Apr 2006
Posts: 158
Location: Ukraine
Garthower 13 Feb 2007, 23:21
Hi all.

I have found here such realization:
Code:
proc CRC32Update uses esi ebx, CRC,Buf,Len

        mov     eax,[CRC]
        push    eax
        mov     eax,dword [eax]
        mov     esi,[Buf]

        mov     ecx,[Len]
        or      ecx,ecx
        jz      .end
.loop:
        mov      edx,eax
        shl      edx,8
        mov      bl,[esi]
        shr      eax,24
        xor      al,bl
        mov      eax,dword [crc32_table + eax * 4]
        xor      eax,edx

        inc      esi
        dec      ecx
        jnz      .loop

.end:
        mov      ecx,eax
        pop      eax
        mov      [eax],ecx

        ret

crc32_table     dd $00000000, $04c11db7, $09823b6e, $0d4326d9
                dd $130476dc, $17c56b6b, $1a864db2, $1e475005
                dd $2608edb8, $22c9f00f, $2f8ad6d6, $2b4bcb61
                dd $350c9b64, $31cd86d3, $3c8ea00a, $384fbdbd
                dd $4c11db70, $48d0c6c7, $4593e01e, $4152fda9
                dd $5f15adac, $5bd4b01b, $569796c2, $52568b75
                dd $6a1936c8, $6ed82b7f, $639b0da6, $675a1011
                dd $791d4014, $7ddc5da3, $709f7b7a, $745e66cd
                dd $9823b6e0, $9ce2ab57, $91a18d8e, $95609039
                dd $8b27c03c, $8fe6dd8b, $82a5fb52, $8664e6e5
                dd $be2b5b58, $baea46ef, $b7a96036, $b3687d81
                dd $ad2f2d84, $a9ee3033, $a4ad16ea, $a06c0b5d
                dd $d4326d90, $d0f37027, $ddb056fe, $d9714b49
                dd $c7361b4c, $c3f706fb, $ceb42022, $ca753d95
                dd $f23a8028, $f6fb9d9f, $fbb8bb46, $ff79a6f1
                dd $e13ef6f4, $e5ffeb43, $e8bccd9a, $ec7dd02d
                dd $34867077, $30476dc0, $3d044b19, $39c556ae
                dd $278206ab, $23431b1c, $2e003dc5, $2ac12072
                dd $128e9dcf, $164f8078, $1b0ca6a1, $1fcdbb16
                dd $018aeb13, $054bf6a4, $0808d07d, $0cc9cdca
                dd $7897ab07, $7c56b6b0, $71159069, $75d48dde
                dd $6b93dddb, $6f52c06c, $6211e6b5, $66d0fb02
                dd $5e9f46bf, $5a5e5b08, $571d7dd1, $53dc6066
                dd $4d9b3063, $495a2dd4, $44190b0d, $40d816ba
                dd $aca5c697, $a864db20, $a527fdf9, $a1e6e04e
                dd $bfa1b04b, $bb60adfc, $b6238b25, $b2e29692
                dd $8aad2b2f, $8e6c3698, $832f1041, $87ee0df6
                dd $99a95df3, $9d684044, $902b669d, $94ea7b2a
                dd $e0b41de7, $e4750050, $e9362689, $edf73b3e
                dd $f3b06b3b, $f771768c, $fa325055, $fef34de2
                dd $c6bcf05f, $c27dede8, $cf3ecb31, $cbffd686
                dd $d5b88683, $d1799b34, $dc3abded, $d8fba05a
                dd $690ce0ee, $6dcdfd59, $608edb80, $644fc637
                dd $7a089632, $7ec98b85, $738aad5c, $774bb0eb
                dd $4f040d56, $4bc510e1, $46863638, $42472b8f
                dd $5c007b8a, $58c1663d, $558240e4, $51435d53
                dd $251d3b9e, $21dc2629, $2c9f00f0, $285e1d47
                dd $36194d42, $32d850f5, $3f9b762c, $3b5a6b9b
                dd $0315d626, $07d4cb91, $0a97ed48, $0e56f0ff
                dd $1011a0fa, $14d0bd4d, $19939b94, $1d528623
                dd $f12f560e, $f5ee4bb9, $f8ad6d60, $fc6c70d7
                dd $e22b20d2, $e6ea3d65, $eba91bbc, $ef68060b
                dd $d727bbb6, $d3e6a601, $dea580d8, $da649d6f
                dd $c423cd6a, $c0e2d0dd, $cda1f604, $c960ebb3
                dd $bd3e8d7e, $b9ff90c9, $b4bcb610, $b07daba7
                dd $ae3afba2, $aafbe615, $a7b8c0cc, $a379dd7b
                dd $9b3660c6, $9ff77d71, $92b45ba8, $9675461f
                dd $8832161a, $8cf30bad, $81b02d74, $857130c3
                dd $5d8a9099, $594b8d2e, $5408abf7, $50c9b640
                dd $4e8ee645, $4a4ffbf2, $470cdd2b, $43cdc09c
                dd $7b827d21, $7f436096, $7200464f, $76c15bf8
                dd $68860bfd, $6c47164a, $61043093, $65c52d24
                dd $119b4be9, $155a565e, $18197087, $1cd86d30
                dd $029f3d35, $065e2082, $0b1d065b, $0fdc1bec
                dd $3793a651, $3352bbe6, $3e119d3f, $3ad08088
                dd $2497d08d, $2056cd3a, $2d15ebe3, $29d4f654
                dd $c5a92679, $c1683bce, $cc2b1d17, $c8ea00a0
                dd $d6ad50a5, $d26c4d12, $df2f6bcb, $dbee767c
                dd $e3a1cbc1, $e760d676, $ea23f0af, $eee2ed18
                dd $f0a5bd1d, $f464a0aa, $f9278673, $fde69bc4
                dd $89b8fd09, $8d79e0be, $803ac667, $84fbdbd0
                dd $9abc8bd5, $9e7d9662, $933eb0bb, $97ffad0c
                dd $afb010b1, $ab710d06, $a6322bdf, $a2f33668
                dd $bcb4666d, $b8757bda, $b5365d03, $b1f740b4
endp
    


But it not so is pleasant to me, since processes for one cycle of one byte. Whether there are at somebody ideas how it to accelerate, and, maybe, examples of realization?
Post 13 Feb 2007, 23:21
View user's profile Send private message Visit poster's website MSN Messenger ICQ Number Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 14 Feb 2007, 00:07
on what CPU should your code run? if it should run only on CPUs that support SSSE3/SSE4 (suplimentary SSE3), than you could do the whole stuff in just ONE instruction! Very Happy
They have the "CRC32" instruction, but I still don't know its usage.

else, I have optimized your code a little:
Code:
proc    CRC32Update uses esi, CRC,Buf,Len
 
        mov     eax,[CRC]
        push    eax
        mov     eax,dword [eax]
        mov     esi,[Buf]
        mov     ecx,[Len]
;<try>
        test    ecx,ecx
        jz      .end
;<or>;most P6 and AMD CPUs like jecxz
;       jecxz   .end
;</end try>
        add     esi,ecx
        neg     ecx
.loop:
        mov     edx,eax
        shr     eax,24
        xor     al,[esi+ecx]
        shl     edx,8
        mov     eax,dword [crc32_table + eax * 4]
        xor     eax,edx
        inc     ecx     ;add ecx,1 if Pentium4
        jnz     .loop

.end:
        pop     esi
        mov     [esi],eax
        ret

crc32_table     dd $00000000, $04c11db7, $09823b6e, $0d4326d9
                dd $130476dc, $17c56b6b, $1a864db2, $1e475005 
                dd $2608edb8, $22c9f00f, $2f8ad6d6, $2b4bcb61 
                dd $350c9b64, $31cd86d3, $3c8ea00a, $384fbdbd 
                dd $4c11db70, $48d0c6c7, $4593e01e, $4152fda9 
                dd $5f15adac, $5bd4b01b, $569796c2, $52568b75 
                dd $6a1936c8, $6ed82b7f, $639b0da6, $675a1011 
                dd $791d4014, $7ddc5da3, $709f7b7a, $745e66cd 
                dd $9823b6e0, $9ce2ab57, $91a18d8e, $95609039 
                dd $8b27c03c, $8fe6dd8b, $82a5fb52, $8664e6e5 
                dd $be2b5b58, $baea46ef, $b7a96036, $b3687d81 
                dd $ad2f2d84, $a9ee3033, $a4ad16ea, $a06c0b5d 
                dd $d4326d90, $d0f37027, $ddb056fe, $d9714b49 
                dd $c7361b4c, $c3f706fb, $ceb42022, $ca753d95 
                dd $f23a8028, $f6fb9d9f, $fbb8bb46, $ff79a6f1 
                dd $e13ef6f4, $e5ffeb43, $e8bccd9a, $ec7dd02d 
                dd $34867077, $30476dc0, $3d044b19, $39c556ae 
                dd $278206ab, $23431b1c, $2e003dc5, $2ac12072 
                dd $128e9dcf, $164f8078, $1b0ca6a1, $1fcdbb16 
                dd $018aeb13, $054bf6a4, $0808d07d, $0cc9cdca 
                dd $7897ab07, $7c56b6b0, $71159069, $75d48dde 
                dd $6b93dddb, $6f52c06c, $6211e6b5, $66d0fb02 
                dd $5e9f46bf, $5a5e5b08, $571d7dd1, $53dc6066 
                dd $4d9b3063, $495a2dd4, $44190b0d, $40d816ba 
                dd $aca5c697, $a864db20, $a527fdf9, $a1e6e04e 
                dd $bfa1b04b, $bb60adfc, $b6238b25, $b2e29692 
                dd $8aad2b2f, $8e6c3698, $832f1041, $87ee0df6 
                dd $99a95df3, $9d684044, $902b669d, $94ea7b2a 
                dd $e0b41de7, $e4750050, $e9362689, $edf73b3e 
                dd $f3b06b3b, $f771768c, $fa325055, $fef34de2 
                dd $c6bcf05f, $c27dede8, $cf3ecb31, $cbffd686 
                dd $d5b88683, $d1799b34, $dc3abded, $d8fba05a 
                dd $690ce0ee, $6dcdfd59, $608edb80, $644fc637 
                dd $7a089632, $7ec98b85, $738aad5c, $774bb0eb 
                dd $4f040d56, $4bc510e1, $46863638, $42472b8f 
                dd $5c007b8a, $58c1663d, $558240e4, $51435d53 
                dd $251d3b9e, $21dc2629, $2c9f00f0, $285e1d47 
                dd $36194d42, $32d850f5, $3f9b762c, $3b5a6b9b 
                dd $0315d626, $07d4cb91, $0a97ed48, $0e56f0ff 
                dd $1011a0fa, $14d0bd4d, $19939b94, $1d528623 
                dd $f12f560e, $f5ee4bb9, $f8ad6d60, $fc6c70d7 
                dd $e22b20d2, $e6ea3d65, $eba91bbc, $ef68060b 
                dd $d727bbb6, $d3e6a601, $dea580d8, $da649d6f 
                dd $c423cd6a, $c0e2d0dd, $cda1f604, $c960ebb3 
                dd $bd3e8d7e, $b9ff90c9, $b4bcb610, $b07daba7 
                dd $ae3afba2, $aafbe615, $a7b8c0cc, $a379dd7b 
                dd $9b3660c6, $9ff77d71, $92b45ba8, $9675461f 
                dd $8832161a, $8cf30bad, $81b02d74, $857130c3 
                dd $5d8a9099, $594b8d2e, $5408abf7, $50c9b640 
                dd $4e8ee645, $4a4ffbf2, $470cdd2b, $43cdc09c 
                dd $7b827d21, $7f436096, $7200464f, $76c15bf8 
                dd $68860bfd, $6c47164a, $61043093, $65c52d24 
                dd $119b4be9, $155a565e, $18197087, $1cd86d30 
                dd $029f3d35, $065e2082, $0b1d065b, $0fdc1bec 
                dd $3793a651, $3352bbe6, $3e119d3f, $3ad08088 
                dd $2497d08d, $2056cd3a, $2d15ebe3, $29d4f654 
                dd $c5a92679, $c1683bce, $cc2b1d17, $c8ea00a0 
                dd $d6ad50a5, $d26c4d12, $df2f6bcb, $dbee767c 
                dd $e3a1cbc1, $e760d676, $ea23f0af, $eee2ed18 
                dd $f0a5bd1d, $f464a0aa, $f9278673, $fde69bc4 
                dd $89b8fd09, $8d79e0be, $803ac667, $84fbdbd0 
                dd $9abc8bd5, $9e7d9662, $933eb0bb, $97ffad0c 
                dd $afb010b1, $ab710d06, $a6322bdf, $a2f33668 
                dd $bcb4666d, $b8757bda, $b5365d03, $b1f740b4 
endp
    

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||


Last edited by MCD on 14 Feb 2007, 00:43; edited 1 time in total
Post 14 Feb 2007, 00:07
View user's profile Send private message Reply with quote
Garthower



Joined: 21 Apr 2006
Posts: 158
Location: Ukraine
Garthower 14 Feb 2007, 00:15
And procedure should be carried out on processors with support SSE3 a maximum.

SSE4 are not supported by any processor since still they are not realized in one chip. There is only a white-paper, that they will be Smile At least up to processors Intel Daul-Core. I don't have information, whether so it with Intel Extreme with four cores.
Post 14 Feb 2007, 00:15
View user's profile Send private message Visit poster's website MSN Messenger ICQ Number Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 14 Feb 2007, 01:02
hmm, maybe unrolling the loop may further fasten it?
Post 14 Feb 2007, 01:02
View user's profile Send private message Reply with quote
Garthower



Joined: 21 Apr 2006
Posts: 158
Location: Ukraine
Garthower 14 Feb 2007, 07:49
Thanks for your posts, but now I understand, that this algorithm on the present to accelerate it will not be possible, it specific and is adhered to 32 bits. But I have thought up analogue. To consider the contorl sum separately on even and not even bytes, and in the end apply XOR of one control sum to another that the end result was 32 bit. Certainly, such algorithm will not be we shall combine with old, but it enables to use SIMD instructions not losing uniqueness of an end result.
Post 14 Feb 2007, 07:49
View user's profile Send private message Visit poster's website MSN Messenger ICQ Number Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 14 Feb 2007, 16:21
How just got a theoretical though of how to speed up the CRC32 algorithm, but I'm not sure if this is possible. You can try the following:

1.) calculate 2 CRC32 values in paralle, where 1 calculates the CRC32 of all even and the other calculates the CRC32 of all odd dwords, so that both data sources are interleaved (like the left and right signals from stereo PCM data usually are)

2.) next final step is to combine those 2 CRC32 values

the problem about this is that I don't know how to properly combine those CRC32 values neither do I know if this is theoretically possible? I'm not much into hash/checksum/cryptography, so I can't say/show that.
Post 14 Feb 2007, 16:21
View user's profile Send private message Reply with quote
Goplat



Joined: 15 Sep 2006
Posts: 181
Goplat 15 Feb 2007, 02:21
CRC has to be done sequentially; there isn't much room for algorithmic optimization. You could process 2 bytes at a time instead of 1 but that would require a 65536 entry table (256kB), which would probably make it slower.
Post 15 Feb 2007, 02:21
View user's profile Send private message Reply with quote
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 15 Feb 2007, 02:25
Goplat: are you sure there isn't a way to process it in chunks, and "tie those chunks together"?
Post 15 Feb 2007, 02:25
View user's profile Send private message Visit poster's website Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 15 Feb 2007, 05:07
Unrolling :
- 3X unroll will probably give you a ~10 speed improvement on a loop that tight.
- For the sake of branch prediction with the unrolls you may want to use the follow structure.
Code:
mov edx,eax
.loop:
...
inc ecx
jz .end
mov edx,eax
...
inc ecx
jz .end
mov edx,eax
...
inc ecx
jz .end
mov edx,eax
jmp .loop
.end:
    


Addressing the LUT :
- Hold crc32_table in another register [reg1 + reg2 * x] runs faster over many iterations than [Address + reg2 * x]. ~2-5% improvement

Alignment :
- Align [.loop:] to 16 bytes. Use correct padding with 90h and 66h bytes.
Here's a poorly made macro that does it for you.
Code:
macro AMDPad16
{
    virtual
        align 16
        a = $-$$
    end virtual
    if a=1
       db 90h
    end if
    if a=2
       db 66h,90h
    end if
    if a=3
       db 66h,66h,90h
    end if
    if a=4
       db 66h,66h,66h,90h
    end if
    if a=5
       db 66h,66h,90h,66h,90h
    end if
    if a=6
       db 66h,66h,90h,66h,66h,90h
    end if
    if a=7
       db 66h,66h,66h,90h,66h,66h,90h
    end if
    if a=8
       db 66h,66h,66h,90h,66h,66h,66h,90h
    end if
    if a=9
       db 66h,66h,90h,66h,66h,90h,66h,66h,90h
    end if
    if a=10
       db 66h,66h,66h,90h,66h,66h,90h,66h,66h,90h
    end if
    if a=11
       db 66h,66h,66h,90h,66h,66h,66h,90h,66h,66h,90h
    end if
    if a=12
       db 66h,66h,66h,90h,66h,66h,66h,90h,66h,66h,66h,90h
    end if
    if a=13
       db 66h,66h,66h,90h,66h,66h,90h,66h,66h,90h,66h,66h,90h
    end if
    if a=14
       db 66h,66h,66h,90h,66h,66h,66h,90h,66h,66h,90h,66h,66h,90h
    end if
    if a=15
       db 66h,66h,66h,90h,66h,66h,66h,90h,66h,66h,66h,90h,66h,66h,90h
    end if
}
    

2-5% improvement.

Bigger LUT :
- If you can keep it under a half a MB (~65535 * Cool then it should be fine. Before the algorithm starts just use PREFETCH loop to go through the hash in 64byte stride. Then it'll be in the cache. This would only be acceptable on newer processors which seem to always have at least 1 MB of L2 per core AND only if the buffer length was sufficiently long to make ti worth while to precache the LUT. 30% improvement under the right conditions.

Why no parallelism :
- The next byte from the buffer gets XORd with the first byte of the current CRC value.
- The SSE4 CRC32 opcode will probably run as badly is the string opcodes Razz Hopefully not.
Post 15 Feb 2007, 05:07
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 15 Feb 2007, 16:22
r22 wrote:

- The SSE4 CRC32 opcode will probably run as badly is the string opcodes Razz Hopefully not.

sure, the CRC32 instruction won't be as fast as simple logical or arithmical instructions like XOR, TEXT, ADD, SUB..., but it will probably be faster than the loop versions.
Post 15 Feb 2007, 16:22
View user's profile Send private message Reply with quote
Goplat



Joined: 15 Sep 2006
Posts: 181
Goplat 15 Feb 2007, 16:56
Garthower wrote:
Code:
        mov      edx,eax
        shl      edx,8
        mov      bl,[esi]
        shr      eax,24
        xor      al,bl
        mov      eax,dword [crc32_table + eax * 4]
        xor      eax,edx    


This could be done with fewer instructions as
Code:
        mov     ebx,eax
        shr     ebx,24
        xor     bl,[esi]
        shl     eax,8
        xor     eax,dword [crc32_table + ebx * 4]    
Post 15 Feb 2007, 16:56
View user's profile Send private message Reply with quote
Octavio



Joined: 21 Jun 2003
Posts: 366
Location: Spain
Octavio 15 Feb 2007, 23:24
see Zlib source code
Post 15 Feb 2007, 23:24
View user's profile Send private message Visit poster's website Reply with quote
DOS386



Joined: 08 Dec 2006
Posts: 1905
DOS386 17 Feb 2007, 01:27
Quote:
on what CPU should your code run? if it should run only on CPUs that support SSSE3/SSE4 (suplimentary SSE3), than you could do the whole stuff in just ONE instruction! They have the "CRC32" instruction, but I still don't know its usage.


Well, one day SSSSSSSSSSE99 will have instructions for CRC32, MD5, SHA-160,
SHA-512, SHA-4096, RSA, Rijndael, Twofish, Deflate, LZMA, PPMD, MPEG, ...
... and maybe also for booting and shutting down Vista ... by now wasting
much more time than "only" table-based CRC32 code in ASM Laughing

No thanks, prefer 80386 compatible code ...

Quote:
see Zlib source code
In FASM by now ?

_________________
Bug Nr.: 12345

Title: Hello World program compiles to 100 KB !!!

Status: Closed: NOT a Bug
Post 17 Feb 2007, 01:27
View user's profile Send private message Reply with quote
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 17 Feb 2007, 01:47
NTOSKRNL_VXE: CRC is pretty simple to implement, which is why it's now being added to silicon. And the reasoning for it is stated in the Intel provided manual. At first, I thought it was rather silly, but consider a software iSCSI stack running on a gigabit LAN...

MPEG is already being offloaded to GPUs, which are more suitable for that task.

Quote:

see Zlib source code
Quote:

In FASM by now ?


Isn't the choice of assembler irrelevant? x86 assembly is x86 assembly. Sure, AT&T syntax is awful, but it's still readable (and there's a masm version too, anyway).
Post 17 Feb 2007, 01:47
View user's profile Send private message Visit poster's website Reply with quote
SomeoneNew



Joined: 12 Aug 2006
Posts: 54
SomeoneNew 06 Mar 2008, 11:57
Sorry to revive an "old" thread... but I always wanted to know why 90% of the CRC32 codes I've seen uses a look-up table, is it for speed reasons?.

I have a CRC32 generator that doesn't use a look-up table and runs faster than one that actually does (although it might be a bad implementation, that I do not know).

Please explain me Smile

Thanks

_________________
Im new, sorry if I bothered with any stupid question Smile
Post 06 Mar 2008, 11:57
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20460
Location: In your JS exploiting you and your system
revolution 06 Mar 2008, 12:02
It is for speed, table methods are much faster, usually by a factor of 8 as most algos these days use an 8bitx32bit LUT.

But also 16bitx32bit LUT's are used in some cases where the ultimate speed is required (or desired), bit it needs 256k of memory just for the LUT and is usually slower on a cached CPU system.
Post 06 Mar 2008, 12:02
View user's profile Send private message Visit poster's website Reply with quote
SomeoneNew



Joined: 12 Aug 2006
Posts: 54
SomeoneNew 06 Mar 2008, 12:18
So perhaps this overhead explains why the LUT one is slower in my case?, although as I said I don't have access to that CRC32 generator so I can't know for sure.

Are there any good crc32 codes I could take a look at that use a LUT?

Thanks.
Post 06 Mar 2008, 12:18
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20460
Location: In your JS exploiting you and your system
revolution 06 Mar 2008, 12:30
SomeoneNew wrote:
So perhaps this overhead explains why the LUT one is slower in my case?, although as I said I don't have access to that CRC32 generator so I can't know for sure.
maybe
SomeoneNew wrote:
Are there any good crc32 codes I could take a look at that use a LUT?
This may no be good but I have a CRC library for both table and non-table computations. You are welcome to test them for speed and compare.
Post 06 Mar 2008, 12:30
View user's profile Send private message Visit poster's website Reply with quote
SomeoneNew



Joined: 12 Aug 2006
Posts: 54
SomeoneNew 06 Mar 2008, 13:00
Thanks, I'll take a look!
Was this the fruit of a coding exercise?, I can't see why would someone really need such an extensive crc32 library :p
Post 06 Mar 2008, 13:00
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4081
Location: vpcmpistri
bitRAKE 06 Mar 2008, 17:14
author of code unknown
Code:
CRC32   proc lData:DWORD, ptrData:DWORD

     push    esi
 push    ecx
 push    edx

     mov     esi, ptrData
        xor     edx, edx
    or      eax, -1
     mov     ecx, lData

CRC32_loop:
   mov     dl, [esi]
   xor     dl, al
      shr     eax, 8
      xor     eax, [crc32_table + 4*edx]
  inc     esi
 dec     ecx
 jnz     CRC32_loop

      not     eax

     pop     edx
 pop     ecx
 pop     esi
 ret

; 1K CRC32 Table
crc32_table      \
  dd 000000000h,077073096h,0EE0E612Ch,0990951BAh,0076DC419h,0706AF48Fh,0E963A535h,09E6495A3h
  dd 00EDB8832h,079DCB8A4h,0E0D5E91Eh,097D2D988h,009B64C2Bh,07EB17CBDh,0E7B82D07h,090BF1D91h
  dd 01DB71064h,06AB020F2h,0F3B97148h,084BE41DEh,01ADAD47Dh,06DDDE4EBh,0F4D4B551h,083D385C7h
  dd 0136C9856h,0646BA8C0h,0FD62F97Ah,08A65C9ECh,014015C4Fh,063066CD9h,0FA0F3D63h,08D080DF5h
  dd 03B6E20C8h,04C69105Eh,0D56041E4h,0A2677172h,03C03E4D1h,04B04D447h,0D20D85FDh,0A50AB56Bh
  dd 035B5A8FAh,042B2986Ch,0DBBBC9D6h,0ACBCF940h,032D86CE3h,045DF5C75h,0DCD60DCFh,0ABD13D59h
  dd 026D930ACh,051DE003Ah,0C8D75180h,0BFD06116h,021B4F4B5h,056B3C423h,0CFBA9599h,0B8BDA50Fh
  dd 02802B89Eh,05F058808h,0C60CD9B2h,0B10BE924h,02F6F7C87h,058684C11h,0C1611DABh,0B6662D3Dh
  dd 076DC4190h,001DB7106h,098D220BCh,0EFD5102Ah,071B18589h,006B6B51Fh,09FBFE4A5h,0E8B8D433h
  dd 07807C9A2h,00F00F934h,09609A88Eh,0E10E9818h,07F6A0DBBh,0086D3D2Dh,091646C97h,0E6635C01h
  dd 06B6B51F4h,01C6C6162h,0856530D8h,0F262004Eh,06C0695EDh,01B01A57Bh,08208F4C1h,0F50FC457h
  dd 065B0D9C6h,012B7E950h,08BBEB8EAh,0FCB9887Ch,062DD1DDFh,015DA2D49h,08CD37CF3h,0FBD44C65h
  dd 04DB26158h,03AB551CEh,0A3BC0074h,0D4BB30E2h,04ADFA541h,03DD895D7h,0A4D1C46Dh,0D3D6F4FBh
  dd 04369E96Ah,0346ED9FCh,0AD678846h,0DA60B8D0h,044042D73h,033031DE5h,0AA0A4C5Fh,0DD0D7CC9h
  dd 05005713Ch,0270241AAh,0BE0B1010h,0C90C2086h,05768B525h,0206F85B3h,0B966D409h,0CE61E49Fh
  dd 05EDEF90Eh,029D9C998h,0B0D09822h,0C7D7A8B4h,059B33D17h,02EB40D81h,0B7BD5C3Bh,0C0BA6CADh
  dd 0EDB88320h,09ABFB3B6h,003B6E20Ch,074B1D29Ah,0EAD54739h,09DD277AFh,004DB2615h,073DC1683h
  dd 0E3630B12h,094643B84h,00D6D6A3Eh,07A6A5AA8h,0E40ECF0Bh,09309FF9Dh,00A00AE27h,07D079EB1h
  dd 0F00F9344h,08708A3D2h,01E01F268h,06906C2FEh,0F762575Dh,0806567CBh,0196C3671h,06E6B06E7h
  dd 0FED41B76h,089D32BE0h,010DA7A5Ah,067DD4ACCh,0F9B9DF6Fh,08EBEEFF9h,017B7BE43h,060B08ED5h
  dd 0D6D6A3E8h,0A1D1937Eh,038D8C2C4h,04FDFF252h,0D1BB67F1h,0A6BC5767h,03FB506DDh,048B2364Bh
  dd 0D80D2BDAh,0AF0A1B4Ch,036034AF6h,041047A60h,0DF60EFC3h,0A867DF55h,0316E8EEFh,04669BE79h
  dd 0CB61B38Ch,0BC66831Ah,0256FD2A0h,05268E236h,0CC0C7795h,0BB0B4703h,0220216B9h,05505262Fh
  dd 0C5BA3BBEh,0B2BD0B28h,02BB45A92h,05CB36A04h,0C2D7FFA7h,0B5D0CF31h,02CD99E8Bh,05BDEAE1Dh
  dd 09B64C2B0h,0EC63F226h,0756AA39Ch,0026D930Ah,09C0906A9h,0EB0E363Fh,072076785h,005005713h
  dd 095BF4A82h,0E2B87A14h,07BB12BAEh,00CB61B38h,092D28E9Bh,0E5D5BE0Dh,07CDCEFB7h,00BDBDF21h
  dd 086D3D2D4h,0F1D4E242h,068DDB3F8h,01FDA836Eh,081BE16CDh,0F6B9265Bh,06FB077E1h,018B74777h
  dd 088085AE6h,0FF0F6A70h,066063BCAh,011010B5Ch,08F659EFFh,0F862AE69h,0616BFFD3h,0166CCF45h
  dd 0A00AE278h,0D70DD2EEh,04E048354h,03903B3C2h,0A7672661h,0D06016F7h,04969474Dh,03E6E77DBh
  dd 0AED16A4Ah,0D9D65ADCh,040DF0B66h,037D83BF0h,0A9BCAE53h,0DEBB9EC5h,047B2CF7Fh,030B5FFE9h
  dd 0BDBDF21Ch,0CABAC28Ah,053B39330h,024B4A3A6h,0BAD03605h,0CDD70693h,054DE5729h,023D967BFh
  dd 0B3667A2Eh,0C4614AB8h,05D681B02h,02A6F2B94h,0B40BBE37h,0C30C8EA1h,05A05DF1Bh,02D02EF8Dh

CRC32   endp    

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 06 Mar 2008, 17:14
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.