flat assembler
Message board for the users of flat assembler.
 Home   FAQ   Search   Register 
 Profile   Log in to check your private messages   Log in 
flat assembler > Projects and Ideas > My Inflate implementation. (was Working with bit streams...)

Goto page Previous  1, 2
Author
Thread Post new topic Reply to topic
JohnFound



Joined: 16 Jun 2003
Posts: 3459
Location: Bulgaria

revolution wrote:
I expect many people are confused by your response since it appears to go against the initial purpose.



This contradiction is actually false. Yes I want to know what is the fastest way of handling bit streams, as an algorithms, not CPU specific optimizations techniques.

For example, using double shifts (shld) proved to be too slow, compared with simple register logical instructions.

I expected some ideas about using MMX (for example) or whatever of this kind.

Also, when I started the topic, I though that the bottleneck will be the memory access, so reading dword/qword data will speed the processing. This assumption was false and in fact, the memory read speed is not important at all at least in context of the Inflate algorithm.

Notice, that I am searching a high ratio of speed/size not only one of them.

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Post 02 Dec 2014, 09:43
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15302
Location: Bigweld Industries
SHLD, MMX and memory access times are all CPU specific optimisations and have nothing to do with algorithms. So the apparent contradiction is still there.

When doing algorithmic analysis usually the asymptotic bounds are considered because in the absence of a specific CPU or system there is probably no other way to compare them.

Implementing algorithms in a CPU instruction set and then comparing the runtime is not a way to determine which algorithm is "faster". It will only give you a way to determine which algorithm is best suited to the system under test within the finite bounds tested.
Post 02 Dec 2014, 10:34
View user's profile Send private message Visit poster's website Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3459
Location: Bulgaria
Whatever... I think I found really fast solution for Huffman tree traversing. Working...
Post 02 Dec 2014, 10:56
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3459
Location: Bulgaria
News: I tried to implement the Huffman codes decoding by not traversing the binary tree, but searching in one big lookup table. It was in hope, that this technique will be much faster.

This is branch: Click

Well yes, it is faster. But not as faster as I expected. In addition the code became too ugly and I decided the price is too high for so small performance gain.

Anyway, optimizing the previous implementation for size, now it is under 1K without any performance loss: deflate.asm

@redsock - sorry, I didn't answered on your previous post earlier. So, about CRC32 - gzip uses CRC32 checksum, simply because its implementation of Inflate misses a lot of checks inside the inner loops. This way on broken stream, it generate broken data and understands it only after the whole stream is processed, even if the invalid data is at the beginning of the stream.

On the other hand, my implementation has full set of integrity checks inside the main loops, so it detects the invalid data very early in the processing. This way, I simply don't need CRC32 as a way to detect errors in the stream.
Post 06 Dec 2014, 12:31
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15302
Location: Bigweld Industries

JohnFound wrote:
This way, I simply don't need CRC32 as a way to detect errors in the stream.

Well I hope you are correct about that. Right now the thought in my head is "famous last words", hehe Razz
Post 06 Dec 2014, 12:35
View user's profile Send private message Visit poster's website Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3459
Location: Bulgaria

revolution wrote:

JohnFound wrote:
This way, I simply don't need CRC32 as a way to detect errors in the stream.

Well I hope you are correct about that. Right now the thought in my head is "famous last words", hehe Razz



CRC32 also does not guarantees 100% error free decompression.

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Post 06 Dec 2014, 12:39
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15302
Location: Bigweld Industries

JohnFound wrote:
CRC32 also does not guarantees 100% error free decompression.

Neither does a hash function. But they both do a very good job of it. I wonder how the false negative rates of CRC32 compares to just checking for compressed data consistency, and thus how much trouble you are potentially exposing the code to? I can't find any details about this so perhaps no one has done a study?
Post 06 Dec 2014, 12:47
View user's profile Send private message Visit poster's website Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3459
Location: Bulgaria
It depends on what is considered "error". My implementation does not consider as error the difference between the uncompressed stream and the original data, because it knows nothing about the original data. It consider an errors only when the stream is not valid deflate stream.

If the user of the library knows something about the original data (for example checksum, or even the original file) he can choose whether to compare the result data to this information or not.
Post 06 Dec 2014, 14:05
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow

JohnFound wrote:
News: I tried to implement the Huffman codes decoding by not traversing the binary tree, but searching in one big lookup table. It was in hope, that this technique will be much faster.

This is branch: Click

Well yes, it is faster. But not as faster as I expected. In addition the code became too ugly and I decided the price is too high for so small performance gain.

Anyway, optimizing the previous implementation for size, now it is under 1K without any performance loss: deflate.asm

@redsock - sorry, I didn't answered on your previous post earlier. So, about CRC32 - gzip uses CRC32 checksum, simply because its implementation of Inflate misses a lot of checks inside the inner loops. This way on broken stream, it generate broken data and understands it only after the whole stream is processed, even if the invalid data is at the beginning of the stream.

On the other hand, my implementation has full set of integrity checks inside the main loops, so it detects the invalid data very early in the processing. This way, I simply don't need CRC32 as a way to detect errors in the stream.



Ugly looks is never a big price for performance. Consider pretty macros.

crc32 serves data integrity purposes, we may replace it with hsa1sum of a whole uncompresed block.
Post 06 Dec 2014, 19:42
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page Previous  1, 2

< Last Thread | Next Thread >

Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Powered by phpBB © 2001-2005 phpBB Group.

Main index   Download   Documentation   Examples   Message board
Copyright © 2004-2016, Tomasz Grysztar.