saturation arithmetic without mmx+

randomdude 19 Sep 2014, 11:06

Which would be the fastest way to perform a signed addition/subtraction with saturation?

For the unsigned case I think I have figured it out:

# add

Code:

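        mov     eax,0xFFFFFFFF
        add     eax,0x00000001  ; CF = 1 if the add wrapped around
        sbb     ecx,ecx         ; ecx = 0 - CF: 0xFFFFFFFF on carry, else 0
        or      eax,ecx         ; clamp the result to 0xFFFFFFFF on carry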

# sub

Code:

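        mov     eax,0x00000000
        sub     eax,0x00000001  ; CF = 1 if the subtraction borrowed
        sbb     ecx,ecx         ; ecx = 0xFFFFFFFF on borrow, else 0
        not     ecx             ; ecx = 0 on borrow, else 0xFFFFFFFF
        and     eax,ecx         ; clamp the result to 0 on borrow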

# sub alternative

Code:

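        mov     eax,0x00000000
        sub     eax,0x00000001  ; CF = 1 if the subtraction borrowed
        cmc                     ; invert CF
        sbb     ecx,ecx         ; ecx = 0 on borrow, else 0xFFFFFFFF
        and     eax,ecx         ; clamp the result to 0 on borrow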

The first two lines of each snippet are just an example to force an overflow (or underflow).
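
For comparison, the same carry-to-mask idea can be written in portable C. This is a sketch with hypothetical function names (sat_addu32, sat_subu32): the comparison `sum < a` stands in for the carry flag, and whether a given compiler turns it back into an sbb sequence depends on the compiler and its options.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned saturating add, same idea as
   add / sbb ecx,ecx / or eax,ecx. */
uint32_t sat_addu32(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;                  /* wraps modulo 2^32 */
    uint32_t mask = -(uint32_t)(sum < a);  /* 0xFFFFFFFF if the add wrapped (carry), else 0 */
    return sum | mask;                     /* wrapped results become 0xFFFFFFFF */
}

/* Hypothetical helper: unsigned saturating subtract, same idea as
   sub / sbb ecx,ecx / not ecx / and eax,ecx. */
uint32_t sat_subu32(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;                 /* wraps modulo 2^32 */
    uint32_t mask = -(uint32_t)(a >= b);   /* 0xFFFFFFFF if no borrow, 0 on borrow */
    return diff & mask;                    /* borrowed results become 0 */
}
```

For example, sat_addu32(0xFFFFFFFF, 1) returns 0xFFFFFFFF and sat_subu32(0, 1) returns 0, matching the overflow examples in the snippets above.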

Last edited by randomdude on 19 Sep 2014, 17:09; edited 2 times in total

DOS386 19 Sep 2014, 11:40

interesting, FAQ added

Last edited by DOS386 on 19 Sep 2014, 11:41; edited 1 time in total

revolution 19 Sep 2014, 11:40

randomdude wrote:

which would be the fastest way to perform a signed addition/subtraction with saturation?

Don't know. It would depend upon the machine/CPU in use and the exact requirements.

Matrix 19 Sep 2014, 13:50

randomdude wrote:

Which would be the fastest way to perform a signed addition/subtraction with saturation?

for unsigned ones i think i have figured it out:

# add

Code:

        mov     eax,0xFFFFFFFF
        add     eax,1
        sbb     ecx,ecx
        or      eax,ecx

# sub

Code:

        mov     eax,0
        sub     eax,1
        sbb     ecx,ecx
        not     ecx
        and     eax,ecx

or the alternative

Code:

        mov     eax,0
        sub     eax,1
        cmc
        sbb     ecx,ecx
        and     eax,ecx

Without testing and measuring run times, I'd say that of the two subtraction variants the first one will take the fewest clock cycles on an x86 PC.

shutdownall 19 Sep 2014, 15:32

randomdude wrote:

Which would be the fastest way to perform a signed addition/subtraction with saturation?

# sub
Code:
        mov     eax,0
        sub     eax,1
        sbb     ecx,ecx
        not     ecx
        and     eax,ecx    

I would avoid mov eax,0 and use xor eax,eax instead, which is both shorter (2 bytes instead of 5) and faster.

randomdude 19 Sep 2014, 17:14

That was just an example; the last two or three lines are the important ones. :D

@Matrix

That's what I suspected, thanks for clarifying. Though the other one (with cmc) is 1 byte smaller, I think.

revolution wrote:

Don't know.

I can't believe this is coming from you. :P

Quote:

It would depend upon the machine/CPU in use and the exact requirements.

I would just like to know a good/decent way to do signed addition/subtraction.

Last edited by randomdude on 05 Oct 2014, 17:20; edited 1 time in total

revolution 19 Sep 2014, 23:47

You need to state the requirements more precisely.

Is the input register always EAX? Which other registers can be clobbered (if any)? How is the speed measured? What is the baseline measurement timing for comparison? At what point do we consider an improvement significant? How much Icache pressure needs to be factored in? How wide is the range of CPUs that this needs to work with?

randomdude 05 Oct 2014, 09:01

Found some nice links, if anyone is still interested:

http://blog.regehr.org/archives/278
http://locklessinc.com/articles/sat_arithmetic/
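
As a sketch of the signed case asked about at the top of the thread, here is one branch-free version of signed 32-bit saturating add and subtract in C. The names are hypothetical and the code is not copied from either article. It does the arithmetic on uint32_t so wrap-around is well defined, detects overflow from the sign bits, and blends in the clamp value with an all-ones mask, the same mask trick as the unsigned sbb versions.

```c
#include <stdint.h>

/* Clamp value when an operation on x overflows: INT32_MAX if x >= 0,
   INT32_MIN if x < 0 (0x7FFFFFFF plus the sign bit of x). */
static uint32_t sat_limit(uint32_t ux)
{
    return (ux >> 31) + (uint32_t)INT32_MAX;
}

/* Hypothetical helper: signed saturating add. */
int32_t sat_adds32(int32_t x, int32_t y)
{
    uint32_t ux = (uint32_t)x, uy = (uint32_t)y;
    uint32_t res = ux + uy;                        /* wraps, no undefined behaviour */
    /* Overflow iff x and y share a sign and res has the other sign,
       i.e. both ux ^ res and uy ^ res have the sign bit set. */
    uint32_t mask = -(((ux ^ res) & (uy ^ res)) >> 31);   /* all ones on overflow */
    res = (res & ~mask) | (sat_limit(ux) & mask);
    /* Converting back assumes two's complement wrap, which mainstream
       compilers provide (implementation-defined before C23). */
    return (int32_t)res;
}

/* Hypothetical helper: signed saturating subtract. */
int32_t sat_subs32(int32_t x, int32_t y)
{
    uint32_t ux = (uint32_t)x, uy = (uint32_t)y;
    uint32_t res = ux - uy;
    /* Overflow iff x and y differ in sign and res's sign differs from x. */
    uint32_t mask = -(((ux ^ uy) & (ux ^ res)) >> 31);
    res = (res & ~mask) | (sat_limit(ux) & mask);
    return (int32_t)res;
}
```

In assembly the overflow test is simply the OF flag after add or sub, so a hand-written version could derive the mask from OF (for example with seto) instead of the XOR arithmetic; which is faster would again depend on the CPU.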

comrade 05 Oct 2014, 15:30

I've used this in my flares demo (http://www.pouet.net/prod.php?which=64152), which uses the 16-bit 5-6-5 RGB mode:

esi is the source bitmap
edi is the destination bitmap
ebx is the offset (pointer) to the flare bitmap

[edi] = [esi] + [ebx], saturated
(All bitmaps are in 16-bit 5-6-5 RGB format.) The routine processes two pixels at a time, without MMX.

Code:

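; dwStack, y, dotHeight, dotWidth, scanline and dotLine are defined outside this snippet
drawFlare proc
        push    ebp                     ; ebp is used as a scratch register below
        mov     [dwStack], esp          ; esp is borrowed as a scratch register too
        mov     [y], dotHeight          ; row counter
@@redo: mov     ecx, dotWidth           ; pixel counter for this row
        @@x:    mov     eax, [esi]      ; two source pixels
                mov     edx, [ebx]      ; two flare pixels
                shr     eax, 1          ; halve the source channels...
                and     edx, 0F7DEF7DEh ; clear the low bit of each flare channel
                shr     edx, 1          ; ...then halve the flare channels
                and     eax, 07BEF7BEFh ; clear the bits shifted in from the channel above

                mov     ebp, 084108410h ; top bit of each halved channel
                add     eax, edx        ; per-channel sums, no carry between channels
                and     ebp, eax        ; channels whose sum reached the top bit (overflow)
                mov     esp, ebp
                shr     esp, 4
                sub     ebp, esp        ; turn each overflow bit into a run of 1s below it
                or      eax, ebp        ; saturate the overflowed channels

                add     eax, eax        ; undo the halving (shift left by 1)
                mov     [edi], eax
                add     esi, 4
                add     edi, 4
                add     ebx, 4
                sub     ecx, 2          ; two pixels per iteration
                jnz     @@x
        add     esi, scanline-dotLine   ; advance esi/edi to the start of the next row
        add     edi, scanline-dotLine
        dec     [y]
        jnz     @@redo
        mov     esp, [dwStack]          ; restore the real stack pointer
        pop     ebp
        ret
drawFlare endp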
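
To make the bit tricks easier to follow, here is a C transcription of the inner-loop body. It is a sketch: the function name add565x2_sat is hypothetical, it handles one 32-bit word holding two 5-6-5 pixels, and it follows the instructions step for step, so it inherits the routine's approximations (each channel loses its low bit to the halving, and a saturated channel is not necessarily forced all the way to its maximum).

```c
#include <stdint.h>

/* Hypothetical helper: C transcription of the drawFlare loop body.
   src and flare each hold two 5-6-5 pixels; returns their packed,
   approximately saturated sum. uint32_t arithmetic wraps modulo 2^32,
   like the x86 registers. */
uint32_t add565x2_sat(uint32_t src, uint32_t flare)
{
    /* Halve every channel so the per-channel sums cannot carry into the
       neighbouring channel. */
    uint32_t a = (src >> 1) & 0x7BEF7BEFu;    /* shr eax,1 / and eax,07BEF7BEFh */
    uint32_t b = (flare & 0xF7DEF7DEu) >> 1;  /* and edx,0F7DEF7DEh / shr edx,1 */

    uint32_t sum = a + b;                     /* add eax,edx */

    /* Bits 15, 10 and 4 of each pixel: set when that channel overflowed. */
    uint32_t ovf = sum & 0x84108410u;         /* mov ebp,... / and ebp,eax */

    /* ovf - (ovf >> 4) turns each overflow bit into a run of four 1 bits
       just below it; OR-ing that in saturates the channel. */
    sum |= ovf - (ovf >> 4);                  /* mov esp,ebp / shr esp,4 / sub / or */

    return sum + sum;                         /* add eax,eax: undo the halving */
}
```

Adding black (0) to a word returns it with the low bit of every channel cleared, for example 0xFFFFFFFF comes back as 0xF7DEF7DE, which shows the precision given up for the carry-free add.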
