carryless rangecoder

Index > Main > carryless rangecoder

Author

Thread

donkey7

Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska

donkey7 07 Jul 2005, 21:14

recently i made an entropy coder - subbotin carryless rangecoder. it's quite similiar to arithmetic coder.
i sent it in attachment. can anyone tell me how to improve its speed?

Description:	Subbotin CarryLess Order-0 Range Coder	Download
Filename:	subbotin.tgz
Filesize:	5.09 KB
Downloaded:	1590 Time(s)

_________________
Keep coding!

07 Jul 2005, 21:14

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 11 Jul 2005, 21:42

Set .firstloop to:

Code:

.firstloop:
        ; load symbol cumulative probability
        movzx   eax,byte[frntbuf+ebp]
        xor     ebx,ebx
        xor     ecx,ecx

    imul eax,01010101h
    push eax
@@:
    mov  eax,[esp]
    xor  eax,dword[ecx+symbols]
    mov  edx,eax
    not  eax
    sub  edx,01010101h
    and  eax,80808080h
    and  eax,edx
    jne  @f
    add  ebx,[ecx*4+ranges+00]
    add  ebx,[ecx*4+ranges+04]
    add  ebx,[ecx*4+ranges+08]
    add  ebx,[ecx*4+ranges+12]
    add  ecx,4
    jmp  @b
@@:
    shr eax,8
    jc  @f
    add ebx,[ecx*4+ranges]
    inc ecx
    shr eax,8
    jc  @f
    add ebx,[ecx*4+ranges]
    inc ecx
    shr eax,8
    jc  @f
    add ebx,[ecx*4+ranges]
    inc ecx
@@:
    pop eax
;......until encode symbol

Last edited by Madis731 on 12 Jul 2005, 22:07; edited 1 time in total

11 Jul 2005, 21:42

donkey7

Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska

donkey7 12 Jul 2005, 15:24

yeah, amazing!
it will be more readable if you change value 16843009 to hexadecimal form $01010101. also this algo can be used in .fourthloop too.
have you any idea about improving decoding speed (something like trees etc., i don't know)? it's more important to me.
anyway, thanks Madis!

12 Jul 2005, 15:24

THEWizardGenius

Joined: 14 Jan 2005
Posts: 382
Location: California, USA

THEWizardGenius 12 Jul 2005, 21:02

What is a range coder? What does it do and how does it work?

12 Jul 2005, 21:02

T&K(r)

Joined: 10 Jul 2005
Posts: 18
Location: Poland - > Krakow

T&K(r) 12 Jul 2005, 21:22

TheWizardGenius - visit http://www.arturocampos.com to know more how rangecider working ( and what is it )

12 Jul 2005, 21:22

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 12 Jul 2005, 22:58

Have a look at these:
Its strange that the results are so symmetrical - for example 7zip packing:unpacking=1:10 in terms of speed.

I don't quite like the per-byte codec like adding and comparing. I know this can be streamlined but I have yet to discover how Sad

Description:	20% speed gain decompressor	Download
Filename:	subb_adapt_d2.asm
Filesize:	8.94 KB
Downloaded:	1661 Time(s)

Description:	14% speed gain compressor	Download
Filename:	subb_adapt_c2.asm
Filesize:	9.63 KB
Downloaded:	1624 Time(s)

_________________
My updated idol Very Happy

http://www.agner.org/optimize/

12 Jul 2005, 22:58

donkey7

Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska

donkey7 13 Jul 2005, 13:32

Speeds are quite identical because it's only an coder. Typical compressor have a model (wchich modeles the input, eg: LZ, PPM or BWT) and coder (wchich compresses output of previous stage). I'm actually writing an LZ-based compressor. Actually it have only an model - modelled data must be compressed externally, eg: by this coder.
7-zip also uses range coder (slightly different), ZIP uses Huffman or Shannon-Fano codes. Range coder seem to be better solution because it is faster than Huffman in adaptive implementations (and far slower in semi-static implementations). I have made an Huffman coder a short time ago, it's available on my website. You can also use shcodec http://webcenter.ru/~xander/, and comapre results.

Result of rangecoder are symmetrical because coder and decoder does almost the same work. ZIP decompresses faster than compresses because compresses must find repeated patterns and decompressor only repeats them based on informations gived by compressor.

13 Jul 2005, 13:32

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 14 Jul 2005, 08:14

I was thinking of an optimization like NOT initializing symbols at first because you need to organize them anyways. So after you have put them at right places (less movement required) - you can fill in the rest (256-n) places with symbols not yet used.
I bumped up the addition speed by making it DQWORD wide but the best results is promised on variable width additions. You can double the width on any successice NotAbove-addition and fallback to DWORD if addition goes over eax (the compared value).
Divisions should be get rid of and any MUL DIV should be checked for ZERO/ONE before executing. Removing unneccessary chunks of code executed on preknown results.

btw is the goal to beat shcodec or to prove that you are capable of making something comparable to speeds of shcodec?

14 Jul 2005, 08:14

donkey7

Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska

donkey7 14 Jul 2005, 16:37

can you write some code for it. i don't understand this 'variable width additions' (my english is not s good ;-( ).
i think that checking for one defore divisions will slow down the coder because at most of the time (over 99.9%) that cases won't appear.
to speed up i should not sort frequencies on first chunks, but only on further chunks (i think, but i don't tested this).

the goal is making the fastest coder available (i mean fast enough) that has best performance over all adaptive coders.

i was thinking on a implementation with binary trees and without frequency sorting. this will be faster on 'random' data.

i need coder as last step in lz-based compressor. it will code lenghts of values (in bits) (adaptively), distances and lengths(static, because it can't be adaptive due to high range of possibly values), and literals (also adaptively). lenghts of values are rather constant (and there frequency sorting will help very much) and literals are rather random (and there frequency sorting will not help or even make thing worse).

you can get and compare some other coders on a Eugene Shelwien website.

14 Jul 2005, 16:37

r22

Joined: 27 Dec 2004
Posts: 805

r22 15 Jul 2005, 02:15

If you want speed, you may want to consider using SSE instructions.
I'm not sure if it's possible but you may also want to process or than one symbol at a time.
If you do use SSE you'll want to align your variables and tables on 16byte bounds so you can use the Aligned as opposed to the Unaligned SSE instructions.

15 Jul 2005, 02:15

donkey7

Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska

donkey7 15 Jul 2005, 14:06

r22: processing more than one symbol at a time seems to be impossible. but some instructions can be parallelized (i think). anyway, i don't want to make code for small group of processors (SSE is relatively new technology), but optimize the entire algorithm.

i will think over a binary trees, but it may be impossible to get to work Wink

.
Madis731 improvements are quite good with frequency sorting, now there are are one thing to do - improve speed on 'random' data.

15 Jul 2005, 14:06

f0dder

Joined: 19 Feb 2004
Posts: 3174
Location: Denmark

f0dder 16 Jul 2005, 22:46

Hrm, use SSE in a coder? I thought the algorithm in question used integer and not FP data? And afaik, SSE only supports floatingpoint datatypes (SSE2 adds double-width MMX operations, though). Or am I wrong?

16 Jul 2005, 22:46

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 17 Jul 2005, 17:59

Takes time for me to suit in again because I have been on a holiday. I'll try to figure out some better algos and post if any. MMX came from early Pentiums and I think it is possible to use MMX instructions, but this means we have to sort all data to 8byte chunks and I really don't think of any way right now to access 8 different memorylocations at a time. Like donkey7 said - it seems to be impossible Wink

17 Jul 2005, 17:59

donkey7

Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska

donkey7 17 Jul 2005, 18:53

yes, simd instruction won't help. we must find an algo with complexity like o(log n) or o(sqrt n)., for example binary trees.

i thought a bit on binary trees and there are my suggestions:
- don't sort by frequencies (version for 'random' data),
- we can divide entire set of symbols into two parts, compute cumulative frequency for second part and this would on average double the speed of searching symbols cumulative frequencies,
- we can divide sybols for smaller & smaller sets, this would result in something like binary tree, the main problem is how to manage them (when increasing or decreasing frequency of given symbol) ,
- this would have o(log2 n) complexity - good enough for 256 symbols,

17 Jul 2005, 18:53

donkey7

Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska

donkey7 18 Jul 2005, 21:03

here are my recent implementations of adaptive coder.
shindlet_bt_* - using thing like binary search trees,
shindlet_sl_* - using simple linear search
shindlet_fs_* - using frequency sorting,
i've included also Madis731 improvements.

bt has o(log n) complexity but it isn't much faster thal sl. could anyone 'fix' it ;-?

i've changed coder from subbotin to shindlet (minor change).

please test them on different types of files.

Description:	set of shindlet-based coders	Download
Filename:	range.tgz
Filesize:	7.49 KB
Downloaded:	1534 Time(s)

_________________
Keep coding!

18 Jul 2005, 21:03

donkey7

Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska

donkey7 19 Jul 2005, 19:10

i've updated my coders again.
bt_* are greatly improved, and sl_* are slightly faster (because i've made an mistake in sources before - i forgot to replace all searching alogs with Madis731 ones).

test them and tell me what can be changed to make coder faster.

note that bt_* and sl_* produces the same output (fs_* produces different output but it has the same length as previous ones).

Description:	shindlet coders	Download
Filename:	range.tgz
Filesize:	7.81 KB
Downloaded:	1603 Time(s)

_________________
Keep coding!

19 Jul 2005, 19:10

< Last Thread | Next Thread >

Forum Rules:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum