flat assembler
Message board for the users of flat assembler.
DOS > How to write simple packer?
bazik 24 Aug 2005, 00:45
OzzY wrote: Hi! I'm thinking about making a text packer.
In which language spoken on Earth do you have words with the same character more than 2 or 3 times in a row? I posted a *simple* compression routine on this forum some time ago. Just search for 'compression' and my username.
OzzY 24 Aug 2005, 00:45
well... no language, but imagine for example "Hello"... it has a double "l"
but is there any other way to have fewer bytes per char? also I did a search but found nothing really simple...
bazik 24 Aug 2005, 01:19
OzzY wrote: well... no language, but imagine for example a "Hello"... it has double "l"
Sorry, I mixed up the forum where I posted it. Source is here: http://board.win32asmcommunity.net/index.php?topic=2031.0
f0dder 24 Aug 2005, 01:51
Do you want to write/understand the code, or just have some compression code that works?
crc 24 Aug 2005, 01:54
Quote: but is there any other way to have less bytes per char?
You could always use a variant of Huffman encoding, like Chuck Moore does in ColorForth. See http://colorforth.com/chars.html for his page on the encodings (averaging something like 5.2 bits per character, rather than 8)
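For anyone who wants to see where a figure like "about 5 bits per character" can come from, here is a minimal Python sketch of plain Huffman coding. It is not ColorForth's actual encoding (that is a fixed table described on the linked page); the names `huffman_code_lengths` and `average_bits` and the sample text are made up for illustration. It only computes code lengths and the resulting average, not the packed bit stream.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {char: code length in bits} for a Huffman code built from text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, tie_breaker, {char: depth so far}).
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {ch: d + 1 for ch, d in a.items()}
        merged.update({ch: d + 1 for ch, d in b.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def average_bits(text):
    """Average code length in bits per character of text."""
    lengths = huffman_code_lengths(text)
    freq = Counter(text)
    return sum(freq[ch] * lengths[ch] for ch in freq) / len(text)

sample = "this is a simple test of a simple packer"
print(round(average_bits(sample), 2), "bits per character instead of 8")
```

Frequent characters end up with short codes and rare ones with long codes, which is how the average drops below 8 bits. A real packer would also have to store the code table (or use a fixed one, as ColorForth does).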
OzzY 24 Aug 2005, 01:58
I want to write and understand. And it has to be very simple (I don't mind if it compresses really badly or is slow).
The one that bazik posted isn't simple.
f0dder 24 Aug 2005, 02:03
I would suggest you start by looking at RLE, run-length encoding, which is basically the scheme that you're talking about in your first post.
LZ compression, like bazik posted, really should be accompanied by text material and probably some graphical illustrations as well.
bazik 24 Aug 2005, 08:47
Ya, RLE might be the easiest to start with.
I could also suggest the 'Data Compression Book' if you really want to get into that. I have this book here, although I haven't had the time yet to work out some of the stuff explained there in assembly. ISBN is 1-55851-434-1, http://www.amazon.com/exec/obidos/tg/detail/-/1558514341/qid%3D1124873090/sr%3D11-1/ref%3Dsr%5F11%5F1/102-8911412-3770567?v=glance
UCM 24 Aug 2005, 16:37
Heh, "Hello" would be compressed to "Hel2o": same length. lol.
In fact, this entire post wouldn't be compressed at all using that algo. I had thought of this algo a long time ago, but I realized how pointless it would be. (Except for executables with lots of 0s.) Plus, what if there was a number in the to-be-compressed string?
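One common way around both objections (no gain on short runs, and digits in the input being mistaken for counts) is a byte-oriented RLE with an escape byte. The Python sketch below is only an illustration of that idea, not code from anyone in this thread: the marker value `ESC`, the minimum run length of 4, and the function names are all assumptions.

```python
ESC = 0x1B  # hypothetical marker byte; any value works, it is escaped below

def rle_encode(data: bytes) -> bytes:
    """Encode runs of 4+ equal bytes as (ESC, count, byte); copy the rest."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == b and run < 255:
            run += 1
        if run >= 4 or b == ESC:
            # Long run, or a literal ESC that must not be misread as a marker.
            out += bytes([ESC, run, b])
        else:
            out += bytes([b]) * run  # short runs are cheaper left as they are
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Inverse of rle_encode."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC:
            out += bytes([data[i + 2]]) * data[i + 1]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

print(rle_encode(b"Hello"))             # short runs are left alone
print(len(rle_encode(b"\x00" * 200)))   # a long run of zeros shrinks a lot
```

Because the count is a raw byte after the marker rather than an ASCII digit in the text, a "2" in the input is just another literal. Ordinary text still will not shrink, which matches the point made above: the scheme only pays off on data with long runs.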
OzzY 24 Aug 2005, 17:15
Yep. I was thinking about that too. Any way to solve this? Is there a way to make a char take less than 1 byte?
El Tangas 24 Aug 2005, 17:38
I once thought of an algo to compress files with lots of zeros:
Remove all zeros from the file, while creating a bit string representing the places where there were zeros (1 means not zero, 0 means zero). Then the file could be reconstructed from the zeroless file and the bit string. To achieve compression, more than 1/8 of the file must be zeros (instead of zero it could be any byte making up more than 1/8 of the file). This is a lousy algo, I know...
Matrix 24 Aug 2005, 23:02
hi
you can start by making a table of the most commonly used characters and defining new codes with fewer bits for them; this can reduce the size of text files significantly. (write out the buffer and refill it on demand for a "fresh" table)
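One possible reading of this table idea, sketched in Python: the 15 most common bytes get 4-bit codes, and the 16th nibble value is an escape followed by the raw byte as two nibbles. The 15/escape split, the function names, and storing the original length separately are all assumptions made for this sketch; the per-buffer "fresh" table mentioned above is not implemented.

```python
from collections import Counter

def nibble_pack(data: bytes):
    """Return (table, packed): common bytes become 4-bit table indexes."""
    table = bytes(b for b, _ in Counter(data).most_common(15))
    index = {b: i for i, b in enumerate(table)}
    nibbles = []
    for b in data:
        if b in index:
            nibbles.append(index[b])             # one nibble
        else:
            nibbles += [15, b >> 4, b & 0x0F]    # escape + raw byte: 3 nibbles
    if len(nibbles) % 2:
        nibbles.append(15)  # padding; the decoder stops by length, not by this
    packed = bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
    return table, packed

def nibble_unpack(table: bytes, packed: bytes, length: int) -> bytes:
    """Inverse of nibble_pack; length is the original byte count."""
    nibbles = []
    for byte in packed:
        nibbles += [byte >> 4, byte & 0x0F]
    out = bytearray()
    i = 0
    while len(out) < length:
        n = nibbles[i]
        if n < 15:
            out.append(table[n])
            i += 1
        else:
            out.append((nibbles[i + 1] << 4) | nibbles[i + 2])
            i += 3
    return bytes(out)

text = b"hello hello hello"
table, packed = nibble_pack(text)
print(len(text), "->", len(table) + len(packed), "bytes including the table")
```

Common characters cost 4 bits and everything else costs 12, so this only wins when a small set of characters dominates, as in plain text. The table itself has to be stored too, which hurts on very short inputs.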
OzzY 25 Aug 2005, 01:29
Hey Matrix! I like your idea! But could you post some code to make things clear?
Thanks
vbVeryBeginner 25 Aug 2005, 04:47
if you use dictionary based, then you would only support a particular language and need to deal with uppercase and lowercase. I guess the concept needs to be byte based instead of character based.
Introduction / Lossless Data Compression http://www.vectorsite.net/ttdcmp1.html
free zipper http://www.7-zip.org/
library files http://datacompression.info/LZSS.shtml
Data Compression from wikipedia http://en.wikipedia.org/wiki/Compression_algorithm
-sulaiman
Matrix 25 Aug 2005, 14:49
sorry, I haven't written any compression code yet.
vbVeryBeginner, I think you didn't really mean "dictionary based" as it is written, but rather commonly repeated character arrays.
OzzY, for your request I made an example like
Code:
    'afffffgfffffhfffff' $0
it must be able to compress, otherwise store the block uncompressed so it fits in the same block size.
compressed as
Code:
    <bit 0 - 1=compressed 0=uncompressed><bits 1-7 - block length>
    #block start#
    1 0010011
    #dictionary#
    <record size><redundant data>
    05h fffff
    #dictionary-end#
    $0 ; eof dictionary, $0 instead of size
    <compressed data = 00 <index in dictionary(1 = first)> >
    'a' $0$1 'g' $0$1 'h' $0$1 $0$0
compressed block
; in this example the compressed size is 19, including dictionary, uncompressed size would be 19 bytes, so it should be left uncompressed.
Code:
    #block start#
    1 0010011b
    05h 'fffff'
    00h
    'a' 00h 01h 'g' 00h 01h 'h' 00h 01h 00h 00h
; to overcome the 00 problem, a literal 00 has to be written as 00 00 in the compressed block
uncompressed block:
Code:
    <bit 0 - 1=compressed 0=uncompressed><bits 1-7 - block length>
    #block start#
    0 0010011b
    'afffffgfffffhfffff' $0
enhancement: adding rle
if <record> contains the same character repeated, then it can be compressed too using simple rle
in this case:
Code:
    <dictionary record #1>
    <bit 0 - 1 if compressed><bit 1-7 - record size not including repeat count>
    1 0000001b
    <repeat count><character>
    05h 'f'
    00h
so
Code:
    'afffffgfffffhfffff' $0
could be compressed as
; in this example the compressed size is 16, including dictionary, uncompressed size would be 19 bytes, so a compression ratio of 16/19 = 84.21 %
Code:
    #block start#
    1 0010011b
    1 0000001b
    05h 'f'
    00h
    'a' 00h 01h 'g' 00h 01h 'h' 00h 01h 00h 00h
another enhancement:
Code:
    <compressed data = 00 <index in dictionary(1 = first), if bit 7 set then use rle, no dictionary> >
    bit 7 = 1 then bit 0-6 is repeat count
    db record size
    db data
example:
Code:
    data : 'abcffffffffffdef'
Code:
    ; block start
    1 0010000b
    00h ; no dictionary
    'abc' 00h 10001010b 01h 'f' 'def'
not hard to implement, but it takes a long time.
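To make the data part of this format concrete, here is a hedged Python sketch of a decoder for just the compressed stream (the bytes after the block header and dictionary). It follows one reading of the description: `00 00` is a literal zero, `00` followed by a byte with the top bit set is inline RLE (repeat count in the low 7 bits, then record size, then the record), and `00` followed by anything else is a 1-based dictionary index. The function name, the pre-parsed `dictionary` argument, and taking "bit 7" to mean the most significant bit are assumptions; header and dictionary parsing are left out.

```python
def decode_stream(stream: bytes, dictionary=()) -> bytes:
    """Decode the data part of a compressed block (header/dictionary already parsed)."""
    out = bytearray()
    i = 0
    while i < len(stream):
        b = stream[i]
        if b != 0x00:
            out.append(b)                # plain literal byte
            i += 1
            continue
        ctrl = stream[i + 1]
        if ctrl == 0x00:
            out.append(0x00)             # 00 00 stands for a literal 00
            i += 2
        elif ctrl & 0x80:
            count = ctrl & 0x7F          # inline RLE: repeat count in low 7 bits
            size = stream[i + 2]         # record size
            record = stream[i + 3:i + 3 + size]
            out += record * count
            i += 3 + size
        else:
            out += dictionary[ctrl - 1]  # 1-based index into the dictionary
            i += 2
    return bytes(out)

# First example: dictionary holds 'fffff', stream is 'a' 00 01 'g' 00 01 'h' 00 01 00 00
print(decode_stream(b"a\x00\x01g\x00\x01h\x00\x01\x00\x00", [b"fffff"]))
# Last example: 'abc' 00h 10001010b 01h 'f' 'def'
print(decode_stream(b"abc\x00\x8a\x01fdef"))
```

Writing the decoder first is a reasonable way to pin down the format; the encoder then only has to produce something this loop accepts.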
Adam Kachwalla 25 Apr 2006, 11:22
I am looking into writing a packer for low-level programs (such as an OS Kernel). I think it is an excellent idea. Many people may think that a program will be slow just by compressing it with UPX or something like that.
This is the reality (in stages):
2. Program has instructions that allow it to be decompressed within the memory
3. CPU executes program code.
This, in my opinion (and personal experience), is faster than normal execution:
2. CPU executes program
A-HA! Many people (including corporations such as Microsoft) fall into the trap of leaving their program uncompressed. This is because, according to them, the lower the number of stages to be performed, the less time is taken for execution. What matters is the time taken for the stages to be executed, and not necessarily the number of stages (although on certain occasions this can help). The HDD is slower than the RAM, and so if you have to load less from the HDD into the RAM, you save a lot of time. The decompression method is treated as part of the execution itself, and is executed directly within the CPU. Then the decompressed program code is loaded into either the RAM or the CPU cache and then executed from there.
NOTE: UPX will not compress flat binaries or COM files without headers.
TDCNL 25 Apr 2006, 16:58
Adam, that's bollocks....
Unpacking consumes more of the PC's memory because of the unpacker code, and it takes longer to unpack in memory than to start up an uncompressed executable.
_________________
:: The Dutch Cracker ::
f0dder 25 Apr 2006, 17:03
Indeed it's bollocks - read http://f0dder.reteam.org/packandstuff.htm .
Furthermore, a modern OS does not load your executable file into memory all at once; it does "demand-load". A compressed executable *will* need to be loaded all at once, unless you write some very sophisticated code. By the way, you haven't considered that NTFS supports compressed files - making executable compression superfluous while still allowing demand-load, discarding pages, etc etc etc.
Adam Kachwalla 25 Apr 2006, 22:59
There is something I might have left out:
Compressing it also adds protection (to a certain extent) from crackers. Also, if the unpacking code is a gigabyte long, of course it will take more memory (That's the sort of packer I would dump). Packers such as UPX will indeed have to take a few more bytes, but that is nothing: especially if you are loading a 1MB executable from a slow device. UPX compresses that executable to only 300KB, so at least it will fit on a floppy disk! That is a more obvious example (I hope).
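The disagreement in the last few posts comes down to simple arithmetic: read time for the file plus unpack time, versus read time for the bigger file alone. The Python sketch below is a back-of-the-envelope model only. The 1 MB and 300 KB sizes come from the post above; the two throughput figures and the function name `load_seconds` are invented assumptions, and the model ignores demand-loading and extra memory use entirely.

```python
def load_seconds(file_bytes, read_bps, unpacked_bytes=0, unpack_bps=None):
    """Time to read a file, plus time to unpack it if unpack_bps is given."""
    t = file_bytes / read_bps
    if unpack_bps:
        t += unpacked_bytes / unpack_bps
    return t

KB = 1024
READ_BPS = 30 * KB         # assumed: a slow device such as a floppy drive
UNPACK_BPS = 10_000 * KB   # assumed: in-memory decompression throughput

plain = load_seconds(1024 * KB, READ_BPS)
packed = load_seconds(300 * KB, READ_BPS, 1024 * KB, UNPACK_BPS)
print(f"slow device: plain {plain:.1f} s, packed {packed:.1f} s")

# Same comparison with a device that reads faster than the unpacker runs.
FAST_READ_BPS = 100_000 * KB  # assumed
plain_fast = load_seconds(1024 * KB, FAST_READ_BPS)
packed_fast = load_seconds(300 * KB, FAST_READ_BPS, 1024 * KB, UNPACK_BPS)
print("fast device: packed is", "faster" if packed_fast < plain_fast else "slower")
```

Under these made-up numbers, packing wins on the slow device and loses on the fast one, so the answer depends on the ratio of read speed to unpack speed rather than on the number of stages.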
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.