flat assembler
Message board for the users of flat assembler.

carryless rangecoder
donkey7 07 Jul 2005, 21:14
Recently I made an entropy coder: a Subbotin carryless range coder. It's quite similar to an arithmetic coder.
I've attached it. Can anyone tell me how to improve its speed?
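For reference while reading the thread, here is a minimal C sketch of the encoder core of a Subbotin-style carryless range coder. The attachment itself is fasm assembly and may differ in details; TOP, BOTTOM and the encode signature below are the conventional choices, assumed rather than taken from the attachment.

Code:
#include <stdint.h>
#include <stdio.h>

#define TOP    (1u << 24)   /* renormalization threshold */
#define BOTTOM (1u << 16)   /* carryless underflow threshold */

static uint32_t low   = 0;
static uint32_t range = 0xFFFFFFFFu;

/* encode a symbol occupying [cumFreq, cumFreq + freq) out of totFreq */
static void encode(FILE *out, uint32_t cumFreq, uint32_t freq, uint32_t totFreq)
{
    range /= totFreq;
    low   += cumFreq * range;
    range *= freq;
    /* carryless renormalization: instead of propagating a carry,
       the range is clipped when low straddles a byte boundary */
    while ((low ^ (low + range)) < TOP ||
           (range < BOTTOM && ((range = -low & (BOTTOM - 1)), 1)))
    {
        fputc((int)(low >> 24), out);
        low   <<= 8;
        range <<= 8;
    }
}

The speed-relevant parts are the division by totFreq and the byte-at-a-time renormalization loop, which is what the later posts try to optimize.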
_________________
Keep coding!
donkey7 12 Jul 2005, 15:24
Yeah, amazing!
It would be more readable if you changed the value 16843009 to its hexadecimal form, $01010101. This algorithm can also be used in .fourthloop. Do you have any ideas about improving decoding speed (something like trees, etc.; I don't know)? It's more important to me. Anyway, thanks Madis!
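As an aside on that constant: $01010101 is the usual packed-increment value. A hedged C illustration of the trick, assuming the coder packs four byte-wide counters into one dword (a guess at the layout, not confirmed by the thread):

Code:
#include <stdint.h>
#include <string.h>

/* add 1 to each of four byte counters packed into one 32-bit word */
static void bump4(uint8_t counters[4])
{
    uint32_t packed;
    memcpy(&packed, counters, 4);
    packed += 0x01010101u;   /* one add bumps all four lanes; valid only
                                while every counter stays below 255, so
                                no carry spills into a neighbour */
    memcpy(counters, &packed, 4);
}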
THEWizardGenius 12 Jul 2005, 21:02
What is a range coder? What does it do and how does it work?
T&K(r) 12 Jul 2005, 21:22
TheWizardGenius: visit http://www.arturocampos.com to learn what a range coder is and how it works.
Madis731 12 Jul 2005, 22:58
Have a look at these:
It's strange that the results are so symmetrical; for comparison, 7-Zip is roughly packing:unpacking = 1:10 in terms of speed. I don't quite like the per-byte work, all the adding and comparing. I know it can be streamlined, but I have yet to discover how.
donkey7 13 Jul 2005, 13:32
Speeds are nearly identical because this is only a coder. A typical compressor has a model (which models the input, e.g. LZ, PPM or BWT) and a coder (which compresses the output of the previous stage). I'm currently writing an LZ-based compressor; at the moment it has only the model, so the modelled data must be compressed externally, e.g. by this coder.
7-Zip also uses a range coder (a slightly different one); ZIP uses Huffman or Shannon-Fano codes. A range coder seems the better choice because it is faster than Huffman in adaptive implementations (though far slower in semi-static ones). I made a Huffman coder a short time ago; it's available on my website. You can also try shcodec http://webcenter.ru/~xander/ and compare the results. The results of a range coder are symmetrical because the coder and decoder do almost the same work. ZIP decompresses faster than it compresses because the compressor must find repeated patterns, while the decompressor only repeats them based on information supplied by the compressor.
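To make the symmetry concrete, here is the decoder matching the encoder sketch earlier in the thread: it performs nearly the same arithmetic per symbol, which is why range-coder compression and decompression speeds come out so close. Same assumed state and constants as before; this is a conventional Subbotin-style decoder, not code lifted from the attachment.

Code:
/* uses low, range, TOP, BOTTOM from the encoder sketch */
static uint32_t code;   /* bytes read so far from the compressed stream */

/* returns a value inside the coded symbol's [cumFreq, cumFreq + freq) */
static uint32_t decode_get_freq(uint32_t totFreq)
{
    return (code - low) / (range /= totFreq);
}

/* called after the model has mapped that value back to a symbol */
static void decode_update(FILE *in, uint32_t cumFreq, uint32_t freq)
{
    low   += cumFreq * range;
    range *= freq;
    while ((low ^ (low + range)) < TOP ||
           (range < BOTTOM && ((range = -low & (BOTTOM - 1)), 1)))
    {
        code  = (code << 8) | (uint32_t)fgetc(in);
        low   <<= 8;
        range <<= 8;
    }
}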
Madis731 14 Jul 2005, 08:14
I was thinking of an optimization like NOT initializing the symbols at first, because you need to reorder them anyway. After you have put them in the right places (less movement required), you can fill the remaining (256-n) places with the symbols not yet used.
I bumped up the addition speed by making it DQWORD wide, but variable-width additions promise the best results: you can double the width on each successive not-above addition and fall back to DWORD when the sum goes over eax (the compared value); see the sketch below. Divisions should be gotten rid of, and any MUL/DIV should be checked for ZERO/ONE before executing, removing unnecessary chunks of code whose results are known in advance. By the way, is the goal to beat shcodec, or to prove that you can make something comparable to shcodec's speed?
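Since donkey7 asks below what 'variable width additions' means, here is a hedged C sketch of the underlying idea: scan the frequency table in wide blocks first, then fall back to single entries once the running sum would pass the compared value. Four-entry blocks stand in for the DQWORD-wide assembly version; the table name and 256-symbol size are illustrative.

Code:
#include <stdint.h>

/* find the symbol whose cumulative range contains target
   (target must be below the total of all frequencies) */
static int find_symbol(const uint32_t freq[256], uint32_t target,
                       uint32_t *cumFreq)
{
    uint32_t sum = 0;
    int i = 0;
    /* coarse pass: four counters per step */
    for (; i + 4 <= 256; i += 4) {
        uint32_t block = freq[i] + freq[i+1] + freq[i+2] + freq[i+3];
        if (sum + block > target)
            break;               /* target crossed: switch to fine pass */
        sum += block;
    }
    /* fine pass: one counter per step inside the crossing block */
    while (sum + freq[i] <= target)
        sum += freq[i++];
    *cumFreq = sum;
    return i;
}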
donkey7 14 Jul 2005, 16:37
Can you write some code for it? I don't understand these 'variable width additions' (my English is not so good ;-( ).
I think that checking for one before divisions will slow the coder down, because most of the time (over 99.9%) those cases won't appear. To speed things up I should skip frequency sorting on the first chunks and only sort on further chunks (I think, but I haven't tested this). The goal is to make the fastest coder available (I mean fast enough) with the best performance among all adaptive coders. I was thinking about an implementation with binary trees and without frequency sorting; that would be faster on 'random' data. I need the coder as the last step of an LZ-based compressor. It will code bit lengths of values (adaptively), distances and lengths (statically, because they can't be adaptive due to the high range of possible values), and literals (also adaptively). The bit lengths are rather constant (frequency sorting will help a lot there), while literals are rather random (frequency sorting won't help there, or will even make things worse). You can get other coders to compare against on Eugene Shelwien's website.
r22 15 Jul 2005, 02:15
If you want speed, you may want to consider using SSE instructions.
I'm not sure if it's possible, but you may also want to process more than one symbol at a time. If you do use SSE, you'll want to align your variables and tables on 16-byte boundaries so you can use the aligned rather than the unaligned SSE instructions.
donkey7 15 Jul 2005, 14:06
r22: processing more than one symbol at a time seems to be impossible, but some instructions can probably be executed in parallel. Anyway, I don't want to write code for a small group of processors (SSE is a relatively new technology), but to optimize the entire algorithm.
I will think the binary trees over, but it may prove impossible to get them working. Madis731's improvements are quite good with frequency sorting; now there is one thing left to do: improve the speed on 'random' data.
f0dder 16 Jul 2005, 22:46
Hrm, use SSE in a coder? I thought the algorithm in question used integer rather than FP data? And AFAIK, SSE only supports floating-point data types (SSE2 adds double-width MMX-style integer operations, though). Or am I wrong?
Madis731 17 Jul 2005, 17:59
It's taking me a while to settle back in because I have been on holiday. I'll try to figure out some better algorithms and post them if I find any. MMX dates back to the early Pentiums, and I think it is possible to use MMX instructions, but that means we would have to arrange all the data into 8-byte chunks, and right now I can't think of any way to access 8 different memory locations at a time. Like donkey7 said, it seems to be impossible.
donkey7 17 Jul 2005, 18:53
Yes, SIMD instructions won't help. We must find an algorithm with complexity like O(log n) or O(sqrt n), for example binary trees.
I thought a bit about binary trees, and here are my suggestions (a sketch follows the list):
- don't sort by frequencies (a version for 'random' data),
- we can divide the entire set of symbols into two parts and keep a cumulative frequency for the second part; on average this would double the speed of looking up a symbol's cumulative frequency,
- we can divide the symbols into smaller and smaller sets, which would result in something like a binary tree; the main problem is how to manage it (when increasing or decreasing the frequency of a given symbol),
- this would have O(log2 n) complexity, good enough for 256 symbols.
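Carried to its limit, that halving scheme becomes a binary indexed (Fenwick) tree over the symbol frequencies, which gives O(log n) updates and cumulative-frequency lookups. A hedged C sketch of that standard structure follows; it is one way to realize the idea above, not claimed to match any code in this thread.

Code:
#include <stdint.h>

#define NSYM 256
static uint32_t tree[NSYM + 1];   /* 1-based Fenwick tree */

/* add delta to symbol s's frequency: O(log n) */
static void freq_add(int s, uint32_t delta)
{
    for (int i = s + 1; i <= NSYM; i += i & -i)
        tree[i] += delta;
}

/* cumulative frequency of symbols 0..s-1: O(log n) */
static uint32_t cum_freq(int s)
{
    uint32_t sum = 0;
    for (int i = s; i > 0; i -= i & -i)
        sum += tree[i];
    return sum;
}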
donkey7 18 Jul 2005, 21:03
Here are my recent implementations of the adaptive coder:
- shindlet_bt_* - using something like binary search trees,
- shindlet_sl_* - using a simple linear search,
- shindlet_fs_* - using frequency sorting.
I've also included Madis731's improvements. bt has O(log n) complexity but it isn't much faster than sl; could anyone 'fix' it ;-? I've changed the coder from Subbotin to Shindlet (a minor change). Please test them on different types of files.
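For comparison with the bt_* approach: the Fenwick tree sketched earlier also supports the decoder-side lookup in O(log n), finding the symbol whose cumulative range contains a target value by descending the implicit tree. A hedged sketch, assuming the layout from that earlier sketch and nonzero frequencies for all symbols (as in an adaptive model initialized to 1); the actual shindlet_bt_* internals may differ.

Code:
/* uses tree[] and NSYM from the Fenwick sketch; NSYM is a power of two */
static int find_symbol_bt(uint32_t target)   /* target < total frequency */
{
    int pos = 0;
    for (int step = NSYM; step > 0; step >>= 1) {
        if (pos + step <= NSYM && tree[pos + step] <= target) {
            pos    += step;
            target -= tree[pos];
        }
    }
    return pos;   /* symbol index in 0..NSYM-1 */
}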
_________________
Keep coding!
donkey7 19 Jul 2005, 19:10
I've updated my coders again.
bt_* is greatly improved, and sl_* is slightly faster (I had made a mistake in the sources before: I forgot to replace all the search algorithms with Madis731's versions). Test them and tell me what can be changed to make the coder faster. Note that bt_* and sl_* produce the same output (fs_* produces different output, but of the same length as the others).
_________________
Keep coding!