flat assembler
Message board for the users of flat assembler.

Index > Main > data compression theories and implementation

Author
Thread Post new topic Reply to topic
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
typedef
I was thinking of a data compression method. I don't know if it's already been implemented. The idea is to have a byte lookup table and also a known file format lookup table. Would this even work considering that most files are structured(have headers)?
would it not just increase the size instead of compressing it?
Post 10 Aug 2011, 15:27
View user's profile Send private message Reply with quote
cod3b453



Joined: 25 Aug 2004
Posts: 619
cod3b453
A number of algorithms use lookup tables, trees or "dictionaries" of common sequences. The important question is what are you compressing and how does the algorithm handle it. If your data is random it is likely the "compressed" result is larger but if your data is a smaller set of values or repeats or has common patterns, it is easier to compress.
Post 10 Aug 2011, 19:24
View user's profile Send private message Reply with quote
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
typedef
I haven't come with any code yet as I was just thinking. The way I thought it would be would only work on known file formats and less likely on unstructured binary data. I was thinking that, if the file has a known format and it has pre-defined flags, those flags(ie DWORD) can be put in a look-up table and then in the compressed file replace that field with a byte that points to location of the flag in the look-up table.

Example, take an .EXE file.
Let's take the CPU type field.
and here's the look up table for the EXE header CPU type field.

0. i386 ; WORD this is our cpu type
1. AMD ; WORD
3. etc...; WORD

on compression, this field will be reduced to a byte, 0 in this case.
and then on decompression, the deflater will replace this field with a WORD therefore re-building a valid PE.

But then, I'm trying to think how that can be applied to a block of data such as a floating number field. ?
Hhmm...
Post 10 Aug 2011, 20:45
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
Byte is too coarse of a unit. That is why most compressors use bits. Most used fields take 2 bits (so they will compress the most), and the less the fields are used, the longer are the bitstreams. The bitstreams can even be larger than a DWORD, depending on your dictionay size...
...this is called http://en.wikipedia.org/wiki/Huffman_coding

7-zip uses multiple compression types on archives which contain for example txt, exe, mp3, wav etc.


Description: half of the files considered as text, other half - executable
Filesize: 2.9 KB
Viewed: 2402 Time(s)

7-zip.png



_________________
My updated idol Very Happy http://www.agner.org/optimize/
Post 12 Aug 2011, 07:20
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
typedef
^^Hmm.., I read the wikipedia page. It seems so complicated the first time.
Post 12 Aug 2011, 18:01
View user's profile Send private message Reply with quote
DOS386



Joined: 08 Dec 2006
Posts: 1901
DOS386
typedef wrote:
I was thinking of a data compression method. I don't know if it's already been implemented. The idea is to have a byte lookup table and also a known file format lookup table. Would this even work considering that most files are structured(have headers)?
would it not just increase the size instead of compressing it?


This won't work. Problems:

1. You are limited to known file types

2. You could "optimize" some inefficient headers (like the PE header, or even the silly "This program can't be run in DOS mode, go online and pirate some Windaube, see http://www.best*********4free.net/index.html, dude"), but not the other file data

3. You can compress such headers using ordinary algo's too - no need for your "technology"

4. Check out RLE, LZ77, Huffman, LZW84, Deflate (= LZ77+Huffman), and come back when you understand them all Smile
Post 17 Aug 2011, 04:12
View user's profile Send private message Reply with quote
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
typedef
DOS386 wrote:
typedef wrote:
I was thinking of a data compression method. I don't know if it's already been implemented. The idea is to have a byte lookup table and also a known file format lookup table. Would this even work considering that most files are structured(have headers)?
would it not just increase the size instead of compressing it?


This won't work. Problems:

1. You are limited to known file types

2. You could "optimize" some inefficient headers (like the PE header, or even the silly "This program can't be run in DOS mode, go online and pirate some Windaube, see http://www.best*********4free.net/index.html, dude"), but not the other file data

3. You can compress such headers using ordinary algo's too - no need for your "technology"

4. Check out RLE, LZ77, Huffman, LZW84, Deflate (= LZ77+Huffman), and come back when you understand them all Smile

Very Happy

PS: Your weird link did not work either
Post 17 Aug 2011, 04:55
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.

Website powered by rwasa.