flat assembler
Message board for the users of flat assembler.

saturation arithmetic without mmx+

randomdude 19 Sep 2014, 11:06
which would be the fastest way to perform a signed addition/subtraction with saturation?

for unsigned ones i think i have figured it out:

# add
Code:
        mov     eax,0xFFFFFFFF
        add     eax,0x00000001
        sbb     ecx,ecx
        or      eax,ecx      


# sub
Code:
        mov     eax,0x00000000
        sub     eax,0x00000001
        sbb     ecx,ecx
        not     ecx
        and     eax,ecx    

# sub alternative
Code:
        mov     eax,0x00000000
        sub     eax,0x00000001
        cmc
        sbb     ecx,ecx
        and     eax,ecx    


the first two lines of each are just an example to cause an overflow
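A rough C model of the three snippets above, for anyone who wants to check the logic outside an assembler. The function names `sat_add_u32` and `sat_sub_u32` are made up for this sketch, and the carry flag is emulated with a comparison, which is an assumption about how to express CF in C rather than something from the post.

```c
#include <stdint.h>

/* Unsigned saturating add: models add / sbb ecx,ecx / or eax,ecx. */
uint32_t sat_add_u32(uint32_t a, uint32_t b)
{
    uint32_t sum   = a + b;        /* add: wraps modulo 2^32              */
    uint32_t carry = sum < a;      /* CF after the add (1 on wrap-around) */
    uint32_t mask  = 0u - carry;   /* sbb ecx,ecx: all ones if CF, else 0 */
    return sum | mask;             /* or: clamps to 0xFFFFFFFF on carry   */
}

/* Unsigned saturating sub: models sub / cmc / sbb ecx,ecx / and eax,ecx. */
uint32_t sat_sub_u32(uint32_t a, uint32_t b)
{
    uint32_t diff   = a - b;       /* sub: wraps modulo 2^32                 */
    uint32_t borrow = a < b;       /* CF after the sub (1 on borrow)         */
    uint32_t mask   = borrow - 1u; /* cmc + sbb: 0 if borrow, else all ones  */
    return diff & mask;            /* and: clamps to 0 on borrow             */
}
```

With these, `sat_add_u32(0xFFFFFFFF, 1)` stays at `0xFFFFFFFF` and `sat_sub_u32(0, 1)` stays at `0`, matching the two overflow examples.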


Last edited by randomdude on 19 Sep 2014, 17:09; edited 2 times in total

DOS386 19 Sep 2014, 11:40
interesting, FAQ added


Last edited by DOS386 on 19 Sep 2014, 11:41; edited 1 time in total

revolution 19 Sep 2014, 11:40
randomdude wrote:
which would be the fastest way to perform a signed addition/subtraction with saturation?
Don't know. It would depend upon the machine/CPU in use and the exact requirements.

Matrix 19 Sep 2014, 13:50
randomdude wrote:
which would be the fastest way to perform a signed addition/subtraction with saturation?

for unsigned ones i think i have figured it out:

# add
Code:
        mov     eax,0xFFFFFFFF
        add     eax,1
        sbb     ecx,ecx
        or      eax,ecx      


# sub
Code:
        mov     eax,0
        sub     eax,1
        sbb     ecx,ecx
        not     ecx
        and     eax,ecx    

or the alternative
Code:
        mov     eax,0
        sub     eax,1
        cmc
        sbb     ecx,ecx
        and     eax,ecx    


Without testing and measuring runtimes, i'd say the first one will take the fewest clock cycles on an x86 pc.

shutdownall 19 Sep 2014, 15:32
randomdude wrote:
which would be the fastest way to perform a signed addition/subtraction with saturation?

# sub
Code:
        mov     eax,0
        sub     eax,1
        sbb     ecx,ecx
        not     ecx
        and     eax,ecx    



I would avoid mov eax,0 and use the shorter method (in bytes and in execution) with xor eax,eax.

randomdude 19 Sep 2014, 17:14
that was just an example, the last 2-3 lines are the important ones :D

@Matrix

that's what i suspected, thanks for clarifying. tho the other one is 1 byte smaller i think

revolution wrote:
Don't know.

i can't believe this coming from you :P

Quote:
It would depend upon the machine/CPU in use and the exact requirements.

i would just like to know a good/decent one for signed addition/subtraction
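For reference, one common way to express the signed case is sketched below in C. It does not come from any post in this thread, and the names `sat_add_s32` and `sat_sub_s32` are hypothetical. It assumes two's complement `int32_t`; the `overflow` test plays the role of the x86 overflow flag (OF) after `add` or `sub`, and the clamp value is picked from the sign of the first operand.

```c
#include <stdint.h>

/* Signed saturating add (sketch; assumes two's complement). */
int32_t sat_add_s32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                    /* wrapping add, no undefined behaviour */
    /* Clamp value: INT32_MAX if a >= 0, INT32_MIN if a < 0. */
    uint32_t limit = (ua >> 31) + 0x7FFFFFFFu;
    /* Overflow iff both operands share a sign and the sum's sign differs. */
    uint32_t overflow = (~(ua ^ ub) & (ua ^ sum)) >> 31;
    /* Conversion back to int32_t assumes the usual two's complement wrap. */
    return (int32_t)(overflow ? limit : sum);
}

/* Signed saturating sub (sketch; same assumptions). */
int32_t sat_sub_s32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;                   /* wrapping subtract */
    uint32_t limit = (ua >> 31) + 0x7FFFFFFFu;
    /* Overflow iff the operands differ in sign and the result's sign differs from a. */
    uint32_t overflow = ((ua ^ ub) & (ua ^ diff)) >> 31;
    return (int32_t)(overflow ? limit : diff);
}
```

Whether a compiler turns the final selection into a branch or a conditional move depends on the target, so this says nothing about which instruction sequence is fastest on a given CPU.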


Last edited by randomdude on 05 Oct 2014, 17:20; edited 1 time in total

revolution 19 Sep 2014, 23:47
You need to state the requirements more precisely.

Is the input register always EAX? Which other registers can be clobbered (if any)? How is the speed measured? What is the baseline measurement timing for comparison? At what point do we consider an improvement significant? How much I-cache pressure needs to be factored in? How wide is the range of CPUs that this needs to work with?

randomdude 05 Oct 2014, 09:01

comrade 05 Oct 2014, 15:30
I've used this in my flares demo ( http://www.pouet.net/prod.php?which=64152 ) which uses the 16-bit 5-6-5 RGB mode:

esi is source bitmap
edi is destination bitmap
ebx is the offset to the flare bitmap

edi = esi + ebx, saturated
(All bitmaps are 16-bit 5-6-5 RGB mode). The routine processes two pixels at once, no MMX.

Code:
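drawFlare proc
        push    ebp
        mov     [dwStack], esp          ; esp is used as a scratch register below, so save it
        mov     [y], dotHeight          ; row counter
@@redo: mov     ecx, dotWidth           ; pixels per row (must be even)
        @@x:    mov     eax, [esi]      ; two source pixels
                mov     edx, [ebx]      ; two flare pixels
                shr     eax, 1
                and     edx, 0F7DEF7DEh ; clear the low bit of every channel
                shr     edx, 1          ; halve each flare channel
                and     eax, 07BEF7BEFh ; same halving for the source, mask applied after the shift

                mov     ebp, 084108410h ; top bit of each channel field (free after halving)
                add     eax, edx        ; per-channel sums; a set top bit means overflow
                and     ebp, eax        ; keep only the overflow bits
                mov     esp, ebp
                shr     esp, 4
                sub     ebp, esp        ; each overflow bit becomes the four bits below it
                or      eax, ebp        ; set those bits in every overflowed channel

                add     eax, eax        ; scale back up by 2
                mov     [edi], eax      ; store two destination pixels
                add     esi, 4
                add     edi, 4
                add     ebx, 4
                sub     ecx, 2          ; two pixels done
                jnz     @@x
        add     esi, scanline-dotLine   ; advance source to the next row
        add     edi, scanline-dotLine   ; advance destination to the next row
        dec     [y]
        jnz     @@redo
        mov     esp, [dwStack]          ; restore the real stack pointer
        pop     ebp
        ret
drawFlare endp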
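A C model of the inner loop above may make the masking easier to follow. This is only a sketch of the arithmetic on one 32-bit word (two 5-6-5 pixels); the function name `add565x2_sat` is invented here. Because every channel is halved before the add, the low bit of each channel is dropped, and a saturated channel can spill a 1 into the lowest bit of its neighbour, so the result is approximate.

```c
#include <stdint.h>

/* Saturated add of two packed pairs of 5-6-5 pixels, modelled on the loop body. */
uint32_t add565x2_sat(uint32_t src, uint32_t flare)
{
    /* Halve every channel so its top bit is free to catch a carry. */
    uint32_t a = (src >> 1) & 0x7BEF7BEFu;
    uint32_t b = (flare & 0xF7DEF7DEu) >> 1;

    uint32_t sum  = a + b;                /* per-channel sums, no cross-channel carry */
    uint32_t ovf  = sum & 0x84108410u;    /* top bit of each field set => overflow    */
    uint32_t fill = ovf - (ovf >> 4);     /* four bits below each overflow bit        */

    sum |= fill;                          /* push overflowed channels to the maximum  */
    return sum + sum;                     /* undo the halving (wraps at 32 bits)      */
}
```

Two full-white pixels (`0xFFFFFFFF` plus `0xFFFFFFFF`) come out as `0xFFBFFFBE`, i.e. close to white but not exactly white.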