flat assembler
Message board for the users of flat assembler.

How to write simple packer?

OzzY



Joined: 19 Sep 2003
Posts: 1029
Location: Everywhere
OzzY 24 Aug 2005, 00:36
Hi! I'm thinking about making a text packer.
I have a text like this:

tttttffffaaaaaaabbb

and it gets packed to

t5f4a7b3

What I mean is to pack the text based on how many times each char repeats.
I think you understand what I mean.
Are there other ways of doing simple packers like this? And how could such logic be implemented?
Please post some code.

Sorry for my bad English.
Thanks!
bazik



Joined: 28 Jul 2003
Posts: 34
Location: .de
bazik 24 Aug 2005, 00:45
OzzY wrote:
Hi! I'm thinking about making a text packer.
I have a text like this:

tttttffffaaaaaaabbb

and it gets packed to

t5f4a7b3


In which language spoken on Earth do you have words with the same character more than 2 or 3 times in a row? :D

I posted a *simple* compression routine on this forum some time ago. Just search for 'compression' and my username.
OzzY 24 Aug 2005, 00:45
well... no language, but imagine for example "Hello"... it has a double "l" :D
but is there any other way to have fewer bytes per char?

Also, I did a search but found nothing really simple... :(
bazik 24 Aug 2005, 01:19
OzzY wrote:
well... no language, but imagine for example "Hello"... it has a double "l" :D
but is there any other way to have fewer bytes per char?

Also, I did a search but found nothing really simple... :(


Sorry, I mixed up which forum I posted it on ;)

Source is here: http://board.win32asmcommunity.net/index.php?topic=2031.0
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 24 Aug 2005, 01:51
Do you want to write/understand the code, or just have some compression code that works?
crc



Joined: 21 Jun 2003
Posts: 637
Location: Penndel, PA [USA]
crc 24 Aug 2005, 01:54
Quote:
but is there any other way to have fewer bytes per char?


You could always use a variant of Huffman encoding, like Chuck Moore does in ColorForth. See http://colorforth.com/chars.html for his page on the encodings (averaging something like 5.2 bits per character, rather than 8).
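
For anyone who wants to see the idea in code, here is a minimal sketch of plain textbook Huffman coding in Python. It is not ColorForth's actual table, which is a fixed encoding described on the linked page; the function name `huffman_codes` is made up for this example. Frequent characters end up with short bit strings and rare ones with long bit strings.

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free bit string for every distinct character in text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {char: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)   # the two lightest subtrees...
        w2, _, b = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in a.items()}
        merged.update({ch: "1" + code for ch, code in b.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))  # ...become one subtree
        tie += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
bits = "".join(codes[ch] for ch in "aaaabbc")
print(codes, len(bits), "bits instead of", 8 * 7)
```

The code table (or the frequencies) has to be stored along with the packed bits so the unpacker can rebuild it, which is why this only pays off on longer inputs.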
OzzY 24 Aug 2005, 01:58
I want to write and understand. And it has to be very simple (I don't mind if it compresses really badly or is slow).
The one that bazik posted isn't simple. :(
f0dder 24 Aug 2005, 02:03
I would suggest you start by looking up RLE, run-length encoding, which is basically the scheme that you're talking about in your first post.

LZ compression, like bazik posted, really should be accompanied by text material and probably some graphical illustrations as well :)
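
A minimal sketch of the char-plus-count RLE from the first post, in Python rather than assembly so the logic is easy to follow. The names `rle_encode` and `rle_decode` are made up for this example, and the decoder assumes the original text contains no digits.

```python
from itertools import groupby

def rle_encode(text: str) -> str:
    """Pack each run as <char><count>, e.g. 'ttttt' -> 't5'."""
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))

def rle_decode(packed: str) -> str:
    """Inverse of rle_encode. Assumes the original text had no digits in it."""
    out = []
    i = 0
    while i < len(packed):
        ch = packed[i]
        i += 1
        j = i
        while j < len(packed) and packed[j].isdigit():
            j += 1                      # the count may have several digits
        out.append(ch * int(packed[i:j]))
        i = j
    return "".join(out)

print(rle_encode("Hello"))  # H1e1l2o1
```

Note that a count is written after every character, even for runs of one, so ordinary text grows instead of shrinking.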
bazik 24 Aug 2005, 08:47
Ya, RLE might be the easiest to start with.

I could also suggest the 'Data Compression Book' if you really want to get into that. I have this book here, although I haven't had the time yet to work out some of the stuff explained there in assembly :( ISBN is 1-55851-434-1, http://www.amazon.com/exec/obidos/tg/detail/-/1558514341/qid%3D1124873090/sr%3D11-1/ref%3Dsr%5F11%5F1/102-8911412-3770567?v=glance
UCM



Joined: 25 Feb 2005
Posts: 285
Location: Canada
UCM 24 Aug 2005, 16:37
Heh, "Hello" would be compressed to "Hel2o", the same length. lol.
In fact, this entire post wouldn't be compressed at all using that algo. I thought of this algo a long time ago, but I realized how pointless it would be (except for executables with lots of 0s). Plus, what if there was a number in the to-be-compressed string?
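
One common way around both problems is to work on bytes and use an escape byte, so that only long runs get encoded and digits stay ordinary data. Here is a minimal sketch in Python. The marker value `ESC = 0x90`, the minimum run length of 4 and the names `pack` and `unpack` are arbitrary choices for this example, not anything from the thread.

```python
ESC = 0x90  # hypothetical marker byte; any value works, rare ones work best

def pack(data: bytes) -> bytes:
    """Runs of 4 or more become ESC, count, byte; everything else is copied."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        n = 1
        while i + n < len(data) and data[i + n] == b and n < 255:
            n += 1                       # measure the run, capped at 255
        if n >= 4 or b == ESC:
            out += bytes((ESC, n, b))    # a literal ESC is always escaped
        else:
            out += bytes((b,)) * n       # short runs stay as plain bytes
        i += n
    return bytes(out)

def unpack(packed: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(packed):
        if packed[i] == ESC:
            out += bytes((packed[i + 2],)) * packed[i + 1]
            i += 3
        else:
            out.append(packed[i])
            i += 1
    return bytes(out)

print(pack(b"Hello"), len(pack(b"\x00" * 100)))
```

With this layout text without runs passes through at its original size, and the only bytes that cost extra are literal occurrences of the marker itself.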
OzzY 24 Aug 2005, 17:15
Yep. I was thinking about that too. Any way to solve this? Is there a way to make a char take less than 1 byte?
El Tangas



Joined: 11 Oct 2003
Posts: 120
Location: Sunset Empire
El Tangas 24 Aug 2005, 17:38
I once thought of an algo to compress files with lots of zeros:
Remove all zeros from the file, while creating a bit string recording where the zeros were (1 means not zero, 0 means zero). The file can then be reconstructed from the zeroless file and the bit string.
The bit string costs one bit per original byte, so to achieve compression more than 1/8 of the file must be zeros (instead of zero it could be any byte value making up more than 1/8 of the file). This is a lousy algo, I know...
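
A minimal sketch of this zero-bitmap idea in Python. The function names are made up, and it assumes the original length is stored somewhere alongside the two outputs so the unpacker knows when to stop.

```python
def pack_zeros(data: bytes) -> tuple[bytes, bytes]:
    """Return (bitmap, nonzero_bytes); a 1 bit means 'non-zero byte here'."""
    bitmap = bytearray((len(data) + 7) // 8)   # one bit per input byte
    rest = bytearray()
    for i, b in enumerate(data):
        if b != 0:
            bitmap[i // 8] |= 0x80 >> (i % 8)  # most significant bit first
            rest.append(b)
    return bytes(bitmap), bytes(rest)

def unpack_zeros(bitmap: bytes, rest: bytes, length: int) -> bytes:
    out = bytearray(length)                    # starts out as all zeros
    it = iter(rest)
    for i in range(length):
        if bitmap[i // 8] & (0x80 >> (i % 8)):
            out[i] = next(it)
    return bytes(out)

data = bytes([0, 5, 0, 0, 7, 0, 0, 0, 0, 9])
bitmap, rest = pack_zeros(data)
print(len(data), "->", len(bitmap) + len(rest), "bytes")
```

Here 10 bytes shrink to 2 bitmap bytes plus 3 data bytes because 7 of the 10 bytes are zero, well above the 1/8 break-even point.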
Matrix



Joined: 04 Sep 2004
Posts: 1166
Location: Overflow
Matrix 24 Aug 2005, 23:02
hi
you can start by making a table of the most commonly used characters and defining new codes with fewer bits for them; this can reduce the size of text files significantly.
(flush the buffer and refill it on demand to get a "fresh" table)
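
One possible reading of this table idea, sketched in Python: the 15 most common byte values get a 4-bit code each, and anything else is written as an escape nibble followed by the raw byte (12 bits). The table size, the escape value 15 and all function names are assumptions made for this example. The table and the original length would have to be stored with the packed data.

```python
from collections import Counter

def build_table(data: bytes) -> bytes:
    """The (up to) 15 most frequent byte values, most common first."""
    return bytes(b for b, _ in Counter(data).most_common(15))

def pack_nibbles(data: bytes, table: bytes) -> bytes:
    index = {b: i for i, b in enumerate(table)}
    nibbles = []
    for b in data:
        if b in index:
            nibbles.append(index[b])           # 4 bits for a common byte
        else:
            nibbles += [15, b >> 4, b & 15]    # escape + raw byte = 12 bits
    if len(nibbles) % 2:
        nibbles.append(15)                     # pad to a whole byte
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))

def unpack_nibbles(packed: bytes, table: bytes, length: int) -> bytes:
    nibbles = [n for byte in packed for n in (byte >> 4, byte & 15)]
    out = bytearray()
    i = 0
    while len(out) < length:                   # stops before any padding
        if nibbles[i] == 15:
            out.append((nibbles[i + 1] << 4) | nibbles[i + 2])
            i += 3
        else:
            out.append(table[nibbles[i]])
            i += 1
    return bytes(out)

text = b"hello world, hello packer"
table = build_table(text)
packed = pack_nibbles(text, table)
print(len(text), "->", len(packed), "bytes plus a", len(table), "byte table")
```

On short inputs the table overhead eats most of the gain, so this only helps once the text is much longer than the table.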
OzzY 25 Aug 2005, 01:29
Hey Matrix! I like your idea! But could you post some code to make things clear?

Thanks
vbVeryBeginner



Joined: 15 Aug 2004
Posts: 884
Location: \\world\asia\malaysia
vbVeryBeginner 25 Aug 2005, 04:47
If you use a dictionary-based approach, you would only support a particular language and would need to deal with uppercase and lowercase. I guess the concept needs to be byte-based instead of character-based.

Introduction / Lossless Data Compression
http://www.vectorsite.net/ttdcmp1.html

free zipper
http://www.7-zip.org/

library files
http://datacompression.info/LZSS.shtml

Data Compression from wikipedia
http://en.wikipedia.org/wiki/Compression_algorithm

-sulaiman
Matrix 25 Aug 2005, 14:49
Sorry, I haven't written any compression code yet.

vbVeryBeginner,
I think you didn't really mean that dictionary-based thing the way it is written :)
but rather commonly repeated character sequences.

OzzY, as you requested,

I made an example like

Code:
'afffffgfffffhfffff' $0
    


A block must actually get smaller when compressed; otherwise it is stored uncompressed so that it fits in the same block size.
Compressed as

Code:
<bit 0 - 1=compressed 0=uncompressed><bits 1-7 - block length>
#block start#
1 0010011
#dictionary#
<record size><redundant data>
$5 fffff
#dictionary-end#
$0 ; eof dictionary $0 instead of size
<compressed data = 00 <index in dictionary(1 = first)> >
'a' $0$1 'g' $0$1 'h' $0$1 $0$0
    


Compressed block: in this example the compressed size is 19 bytes, including the dictionary, and the uncompressed size would also be 19 bytes, so it should be left uncompressed.
Code:
#block start#
1 0010011b 05h 'fffff' 00h
'a' 00h 01h 'g' 00h 01h 'h' 00h 01h
00h 00h
; to overcome the 00 problem (a literal 00 byte in the data), write 00 00 instead of 00 in the compressed block
    


uncompressed block:
Code:
<bit 0 - 1=compressed 0=uncompressed><bits 1-7 - block length>
#block start#
0 0010011b
'afffffgfffffhfffff' $0
    


Enhancement: adding RLE

If a <record> consists of one repeated character, it can be compressed too, using simple RLE.
In this case:

Code:
<dictionary record #1>
<bit 0 - 1 if compressed><bit 1-7 - record size not including repeat count>
1 0000001b
<repeat count><character>
05h 'f'
00h
    


so

Code:
'afffffgfffffhfffff' $0
    


could be compressed as

; in this example the compressed size is 16 bytes, including the dictionary, and the uncompressed size would be 19 bytes, so the compressed block is 16/19 = 84.21 % of the original size
Code:
#block start#
1 0010011b
1 0000001b
05h 'f'
00h
'a' 00h 01h 'g' 00h 01h 'h' 00h 01h
00h 00h
    


another enhancement:
Code:
<compressed data = 00 <index in dictionary(1 = first), if bit 7 set then use rle, no dictionary> >
bit 7 = 1 then
bit 0-6 is repeat count
db record size
db data
    


example:
Code:
data : 'abcffffffffffdef'
    


Code:
; block start
1 0010000b 00h ; no dictionary
'abc' 00h 10001010b 01h 'f' 'def'
    



Not hard to implement, but it takes a long time.
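
To make the block layout above concrete, here is a rough unpacker sketch in Python. It is only one reading of the format, with several assumptions: the compressed/uncompressed flag is taken to be the high bit of the header byte (the one written leftmost in the examples), a dictionary record header uses the same flag-plus-size layout, and the 7-bit length is the unpacked size. The name `unpack_block` is made up.

```python
def unpack_block(block: bytes) -> bytes:
    length = block[0] & 0x7F              # low 7 bits: unpacked length
    if not block[0] & 0x80:               # flag clear: block is stored as-is
        return block[1:1 + length]
    pos, dictionary = 1, []
    while block[pos] != 0:                # 00h ends the dictionary
        size = block[pos] & 0x7F
        if block[pos] & 0x80:             # RLE'd record: <count><data>
            record = block[pos + 2:pos + 2 + size] * block[pos + 1]
            pos += 2 + size
        else:                             # plain record: <data>
            record = block[pos + 1:pos + 1 + size]
            pos += 1 + size
        dictionary.append(record)
    pos += 1                              # skip the dictionary terminator
    out = bytearray()
    while len(out) < length:
        byte = block[pos]
        pos += 1
        if byte != 0:
            out.append(byte)              # ordinary literal
            continue
        code = block[pos]                 # 00h is the escape byte
        pos += 1
        if code == 0:
            out.append(0)                 # 00 00 means a literal zero
        elif code & 0x80:                 # inline RLE: count, size, data
            size = block[pos]
            out += block[pos + 1:pos + 1 + size] * (code & 0x7F)
            pos += 1 + size
        else:
            out += dictionary[code - 1]   # 1-based dictionary index
    return bytes(out)

block = bytes([0x90, 0x00]) + b"abc" + bytes([0x00, 0x8A, 0x01]) + b"fdef"
print(unpack_block(block))
```

The sketch does no bounds checking, so a damaged block raises an IndexError instead of being rejected cleanly.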
Adam Kachwalla



Joined: 01 Apr 2006
Posts: 150
Adam Kachwalla 25 Apr 2006, 11:22
I am looking into writing a packer for low-level programs (such as an OS Kernel). I think it is an excellent idea. Many people may think that a program will be slow just by compressing it with UPX or something like that.

This is the reality (in stages):

    1. CPU loads the compressed program code from the hard drive
    2. Program has instructions that allow it to be decompressed within the memory
    3. CPU executes program code.

This, in my opinion (and personal experience) is faster than normal execution:

    1. CPU loads whole program from HDD
    2. CPU executes program

A-HA! Many people (including corporations such as Microsoft) fall into the trap of leaving their program uncompressed. This is because, according to them, the lower the number of stages to be performed, the less time is taken for execution.

It is the time taken for the stages to be executed, and not necessarily the number of stages (although on certain occasions this can help).

The HDD is slower than the RAM, and so if you have to load less from the HDD into the RAM, you save a lot of time.
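
The claim can be written down as a small timing model, sketched here in Python. The function name and every throughput figure in the example are hypothetical parameters, not measurements, and the model deliberately ignores seek time, OS caching and demand paging.

```python
def load_time(size_on_disk_mb, disk_mb_per_s, unpacked_mb=0.0, unpack_mb_per_s=None):
    """Seconds to read a file from disk and, optionally, unpack it in memory."""
    seconds = size_on_disk_mb / disk_mb_per_s
    if unpack_mb_per_s is not None:
        seconds += unpacked_mb / unpack_mb_per_s
    return seconds

# Hypothetical figures: a 1.0 MB program that packs to 0.3 MB,
# a device that delivers 1 MB/s, an unpacker that produces 10 MB/s.
plain = load_time(1.0, 1.0)
packed = load_time(0.3, 1.0, unpacked_mb=1.0, unpack_mb_per_s=10.0)
print(f"plain: {plain:.2f} s, packed: {packed:.2f} s")
```

In this model packing only wins while the read time saved is larger than the unpack time, so a fast disk or a slow unpacker flips the result.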

The decompression method is treated as part of the execution itself, and is executed directly within the CPU. Then the decompressed program code is loaded into either the RAM or the CPU cache and then executed from there.

NOTE: UPX will not compress flat binaries or COM files without headers.
TDCNL



Joined: 25 Jan 2006
Posts: 56
TDCNL 25 Apr 2006, 16:58
Adam, that's bollocks...

Unpacking consumes more of the PC's memory because of the unpacker code, and it takes longer to unpack in memory than to start up an uncompressed executable.

_________________
:: The Dutch Cracker ::
f0dder 25 Apr 2006, 17:03
Indeed it's bollocks - read http://f0dder.reteam.org/packandstuff.htm .

Furthermore, a modern OS does not load your executable file into memory all at once; it does "demand-load". A compressed executable *will* need to be loaded all at once, unless you write some very sophisticated code.

By the way, you haven't considered that NTFS supports compressed files - making executable compression superfluous while still allowing demand-load, discarding pages, etc etc etc.
Adam Kachwalla 25 Apr 2006, 22:59
There is something I might have left out:

Compressing it also adds protection (to a certain extent) from crackers. Also, if the unpacking code is a gigabyte long, of course it will take more memory (that's the sort of packer I would dump). Packers such as UPX will indeed take a few more bytes, but that is nothing, especially if you are loading a 1MB executable from a slow device. UPX compresses that executable to only 300KB, so at least it will fit on a floppy disk! That is a more obvious example (I hope).