My Inflate implementation. (was Working with bit streams...)

Index > Projects and Ideas > My Inflate implementation. (was Working with bit streams...)

Goto page Previous 1, 2

Author

Thread

JohnFound

Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria

JohnFound 02 Dec 2014, 09:43

revolution wrote:

I expect many people are confused by your response since it appears to go against the initial purpose.

This contradiction is actually false. Yes I want to know what is the fastest way of handling bit streams, as an algorithms, not CPU specific optimizations techniques.

For example, using double shifts (shld) proved to be too slow, compared with simple register logical instructions.

I expected some ideas about using MMX (for example) or whatever of this kind.

Also, when I started the topic, I though that the bottleneck will be the memory access, so reading dword/qword data will speed the processing. This assumption was false and in fact, the memory read speed is not important at all at least in context of the Inflate algorithm.

Notice, that I am searching a high ratio of speed/size not only one of them.

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9

02 Dec 2014, 09:43

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20989
Location: In your JS exploiting you and your system

revolution 02 Dec 2014, 10:34

SHLD, MMX and memory access times are all CPU specific optimisations and have nothing to do with algorithms. So the apparent contradiction is still there.

When doing algorithmic analysis usually the asymptotic bounds are considered because in the absence of a specific CPU or system there is probably no other way to compare them.

Implementing algorithms in a CPU instruction set and then comparing the runtime is not a way to determine which algorithm is "faster". It will only give you a way to determine which algorithm is best suited to the system under test within the finite bounds tested.

02 Dec 2014, 10:34

JohnFound

Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria

JohnFound 02 Dec 2014, 10:56

Whatever... I think I found really fast solution for Huffman tree traversing. Working...

02 Dec 2014, 10:56

JohnFound

Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria

JohnFound 06 Dec 2014, 12:31

News: I tried to implement the Huffman codes decoding by not traversing the binary tree, but searching in one big lookup table. It was in hope, that this technique will be much faster.

This is branch: Click

Well yes, it is faster. But not as faster as I expected. In addition the code became too ugly and I decided the price is too high for so small performance gain.

Anyway, optimizing the previous implementation for size, now it is under 1K without any performance loss: deflate.asm

@redsock - sorry, I didn't answered on your previous post earlier. So, about CRC32 - gzip uses CRC32 checksum, simply because its implementation of Inflate misses a lot of checks inside the inner loops. This way on broken stream, it generate broken data and understands it only after the whole stream is processed, even if the invalid data is at the beginning of the stream.

On the other hand, my implementation has full set of integrity checks inside the main loops, so it detects the invalid data very early in the processing. This way, I simply don't need CRC32 as a way to detect errors in the stream.

06 Dec 2014, 12:31

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20989
Location: In your JS exploiting you and your system

revolution 06 Dec 2014, 12:35

JohnFound wrote:

This way, I simply don't need CRC32 as a way to detect errors in the stream.

Well I hope you are correct about that. Right now the thought in my head is "famous last words", hehe Razz

06 Dec 2014, 12:35

JohnFound

Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria

JohnFound 06 Dec 2014, 12:39

revolution wrote:

JohnFound wrote:
This way, I simply don't need CRC32 as a way to detect errors in the stream.
Well I hope you are correct about that. Right now the thought in my head is "famous last words", hehe

CRC32 also does not guarantees 100% error free decompression.

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9

06 Dec 2014, 12:39

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20989
Location: In your JS exploiting you and your system

revolution 06 Dec 2014, 12:47

JohnFound wrote:

CRC32 also does not guarantees 100% error free decompression.

Neither does a hash function. But they both do a very good job of it. I wonder how the false negative rates of CRC32 compares to just checking for compressed data consistency, and thus how much trouble you are potentially exposing the code to? I can't find any details about this so perhaps no one has done a study?

06 Dec 2014, 12:47

JohnFound

Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria

JohnFound 06 Dec 2014, 14:05

It depends on what is considered "error". My implementation does not consider as error the difference between the uncompressed stream and the original data, because it knows nothing about the original data. It consider an errors only when the stream is not valid deflate stream.

If the user of the library knows something about the original data (for example checksum, or even the original file) he can choose whether to compare the result data to this information or not.

06 Dec 2014, 14:05

Matrix

Joined: 04 Sep 2004
Posts: 1169
Location: Overflow

Matrix 06 Dec 2014, 19:42

JohnFound wrote:

News: I tried to implement the Huffman codes decoding by not traversing the binary tree, but searching in one big lookup table. It was in hope, that this technique will be much faster.

This is branch: Click

Well yes, it is faster. But not as faster as I expected. In addition the code became too ugly and I decided the price is too high for so small performance gain.

Anyway, optimizing the previous implementation for size, now it is under 1K without any performance loss: deflate.asm

@redsock - sorry, I didn't answered on your previous post earlier. So, about CRC32 - gzip uses CRC32 checksum, simply because its implementation of Inflate misses a lot of checks inside the inner loops. This way on broken stream, it generate broken data and understands it only after the whole stream is processed, even if the invalid data is at the beginning of the stream.

On the other hand, my implementation has full set of integrity checks inside the main loops, so it detects the invalid data very early in the processing. This way, I simply don't need CRC32 as a way to detect errors in the stream.

Ugly looks is never a big price for performance. Consider pretty macros.

crc32 serves data integrity purposes, we may replace it with hsa1sum of a whole uncompresed block.

06 Dec 2014, 19:42

Goto page Previous 1, 2

< Last Thread | Next Thread >

Forum Rules:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum