flat assembler
Message board for the users of flat assembler.
saturation arithmetic without mmx+
DOS386 19 Sep 2014, 11:40
interesting, FAQ added
Last edited by DOS386 on 19 Sep 2014, 11:41; edited 1 time in total
revolution 19 Sep 2014, 11:40
randomdude wrote: which would be the fastest way to perform a signed addition/subtraction with saturation?
Matrix 19 Sep 2014, 13:50
randomdude wrote: which would be the fastest way to perform a signed addition/subtraction with saturation?
Without testing and measuring runtimes, I'd say the first one will take the fewest clock cycles on an x86 PC.
shutdownall 19 Sep 2014, 15:32
randomdude wrote: which would be the fastest way to perform a signed addition/subtraction with saturation?
I would avoid mov eax,0 and use xor eax,eax instead, which is shorter both in bytes and in execution time.
randomdude 19 Sep 2014, 17:14
that was just an example, the last 2-3 lines are the important ones
@Matrix that's what I suspected, thanks for clarifying. though the other one is 1 byte smaller, I think
revolution wrote: Don't know.
I can't believe this is coming from you
Quote: It would depend upon the machine/CPU in use and the exact requirements.
I would just like to know a good/decent one for signed addition/subtraction
Last edited by randomdude on 05 Oct 2014, 17:20; edited 1 time in total
revolution 19 Sep 2014, 23:47
You need to state the requirements more precisely.
Is the input register always EAX?
Which other registers can be clobbered (if any)?
How is the speed measured?
What is the baseline measurement timing for comparison?
At what point do we consider an improvement significant?
How much Icache pressure needs to be factored in?
How wide is the range of CPUs that this needs to work with?
randomdude 05 Oct 2014, 09:01
found some nice links if someone is still interested
http://blog.regehr.org/archives/278
http://locklessinc.com/articles/sat_arithmetic/
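For readers who want something concrete to start from, here is a minimal C sketch of signed 32-bit saturating add and subtract using the usual sign-bit overflow test. The names sat_add32 and sat_sub32 are made up for illustration. This is not taken from the linked articles or from the code originally posted in this thread, and no claim is made about how fast it is on any particular CPU.

```c
#include <stdint.h>

/* Signed 32-bit add, clamped to [INT32_MIN, INT32_MAX].
 * The arithmetic is done on unsigned values so that wraparound is
 * well defined in C. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t sum = ua + ub;

    /* Overflow happened iff both inputs have the same sign and the
     * wrapped sum has the opposite sign. */
    if (((ua ^ sum) & (ub ^ sum)) >> 31) {
        return a < 0 ? INT32_MIN : INT32_MAX;
    }
    /* Assumption: two's complement conversion, as on mainstream compilers. */
    return (int32_t)sum;
}

/* Signed 32-bit subtract, clamped to [INT32_MIN, INT32_MAX]. */
static int32_t sat_sub32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t diff = ua - ub;

    /* Overflow happened iff the inputs differ in sign and the wrapped
     * difference has a different sign from a. */
    if (((ua ^ ub) & (ua ^ diff)) >> 31) {
        return a < 0 ? INT32_MIN : INT32_MAX;
    }
    return (int32_t)diff;
}
```

On overflow the sign of the first operand decides which bound is returned: a negative a can only overflow downwards, a non-negative a only upwards.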
comrade 05 Oct 2014, 15:30
I've used this in my flares demo ( http://www.pouet.net/prod.php?which=64152 ) which uses the 16-bit 5-6-5 RGB mode:
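esi is source bitmap
edi is destination bitmap
ebx is the offset to the flare bitmap
edi = esi + ebx, saturated
(All bitmaps are 16-bit 5-6-5 RGB mode.) The routine processes two pixels at once, no MMX.
Code:
    drawFlare proc
        push    ebp
        mov     [dwStack], esp      ; esp is used as a scratch register below
        mov     [y], dotHeight
    @@redo:
        mov     ecx, dotWidth
    @@x:
        mov     eax, [esi]          ; two source pixels
        mov     edx, [ebx]          ; two flare pixels
        shr     eax, 1
        and     edx, 0F7DEF7DEh     ; drop the low bit of each channel
        shr     edx, 1
        and     eax, 07BEF7BEFh     ; same for the source, after the shift
        mov     ebp, 084108410h     ; carry-out bit of each channel
        add     eax, edx            ; add all six channels at once
        and     ebp, eax            ; keep the carries of channels that overflowed
        mov     esp, ebp
        shr     esp, 4
        sub     ebp, esp            ; each carry becomes a 4-bit mask just below it
        or      eax, ebp            ; saturate the overflowed channels
        add     eax, eax            ; shift back to 5-6-5 positions
        mov     [edi], eax
        add     esi, 4
        add     edi, 4
        add     ebx, 4
        sub     ecx, 2
        jnz     @@x
        add     esi, scanline-dotLine
        add     edi, scanline-dotLine
        dec     [y]
        jnz     @@redo
        mov     esp, [dwStack]      ; restore the real stack pointer
        pop     ebp
        ret
    drawFlare endp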
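To make the bit manipulation easier to follow, here is a C model of the inner loop body for one 32-bit word (two 5-6-5 pixels). It is a sketch traced step by step from the listing above, not code from the demo, and the function name add565_pair is made up for illustration.

```c
#include <stdint.h>

/* Model of one iteration of the @@x loop: saturating add of two packed
 * pairs of RGB 5-6-5 pixels, done at half precision per channel. */
static uint32_t add565_pair(uint32_t src, uint32_t flare)
{
    /* Halve both inputs and clear the bit that crossed into the
     * neighbouring channel, leaving one spare bit above each channel. */
    uint32_t a = (src >> 1) & 0x7BEF7BEFu;
    uint32_t b = (flare & 0xF7DEF7DEu) >> 1;

    /* One add handles all six channels; a channel that overflows sets
     * its spare bit (bits 4, 10, 15 of each 16-bit half). */
    uint32_t sum = a + b;
    uint32_t carry = sum & 0x84108410u;

    /* carry - (carry >> 4) turns each set spare bit into four 1 bits
     * directly below it, which forces the overflowed channel high. */
    uint32_t fill = carry - (carry >> 4);

    /* Double the result to get back to normal 5-6-5 bit positions. */
    return (sum | fill) << 1;
}
```

Two details fall out of tracing it. The low bit of every input channel is discarded before the add, and the spare carry bits are still set when the result is doubled, so each one lands in the lowest bit of the field above it. For example, adding 0x001F to 0x001F (full blue plus full blue) yields 0x003E rather than 0x001F. That is a small error in the low bits, which is presumably acceptable for a flare effect.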
Copyright © 1999-2024, Tomasz Grysztar.