flat assembler
Message board for the users of flat assembler.
Integer average algorithm
MazeGen 24 May 2005, 07:01
As I see it:
Code:
mov edx,eax
xor edx,ebx ;wait until mov is finished
shr edx,1 ;wait until xor is finished
and eax,ebx
add eax,edx ;wait until and is finished

What about:
Code:
mov edx,eax
and eax,ebx
xor edx,ebx
shr edx,1 ;wait until xor is finished
add eax,edx
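Editor's sketch: both orderings above compute the same value, and a small C model may make the identity behind them easier to check. It assumes unsigned 32-bit inputs; the function names `avg_and_xor` and `avg_reference` are hypothetical and are not part of the thread.

```c
#include <stdint.h>

/* Hypothetical C model of the mov/and/xor/shr/add sequence (unsigned). */
static uint32_t avg_and_xor(uint32_t a, uint32_t b)
{
    uint32_t common = a & b;        /* and eax,ebx: bits set in both inputs   */
    uint32_t half   = (a ^ b) >> 1; /* xor edx,ebx / shr edx,1: half of the
                                       bits set in exactly one input          */
    return common + half;           /* add eax,edx: this sum cannot overflow  */
}

/* Reference: widen to 64 bits so a + b cannot wrap, then halve. */
static uint32_t avg_reference(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + (uint64_t)b) >> 1);
}
```

Under these assumptions `avg_and_xor(6, 3)` gives 4, and `avg_and_xor(0xFFFFFFFF, 0xFFFFFFFD)` gives 0xFFFFFFFE, matching the widened reference in both cases.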
Octavio 24 May 2005, 09:54
Madis731 wrote: Would anyone care to explain to me why this first one is recommended on 'gems' sites and not the latter one. Compatibility? Why would my CPU waste
Perhaps because the second one only works on unsigned numbers. What about using MMX?
MCD 24 May 2005, 10:53
Let me point out a small imprecision in your code suggestion:
Code:
mov edx,eax
and eax,ebx
xor edx,ebx
sar edx,1 ;wait until xor is finished
add eax,edx ;wait until sar is finished; <- forgot this

This one is only slightly better:
Code:
mov edx,eax
xor eax,ebx
and edx,ebx
sar eax,1
add eax,edx ;wait until sar is finished

Quote: What about using mmx?
Sure, this works too:
Code:
;a: mm0 ;<- this should not mean "ammo"
;b: mm1
;tmp: mm2 ;helper mmx register
movq mm2,mm0
pxor mm0,mm1
pand mm2,mm1
psraw/d mm0,1 ;<= this prevents a byte/qword average, because psrab/q are not available in MMX
paddb/w/d/q* mm0,mm2 ;wait until psraw/d is finished

So you can only use word and dword data sizes, because of the restrictions of the paddX and psraX instructions.
* Note: paddq was introduced with SSE2.
Actually, if you allow the MMX-II instructions introduced with SSE, you can calculate the average in one instruction:
Code:
pavgb/w mm0,mm1 ;only available for bytes/words
;There is also something very similar for the xmm registers

_________________
MCD - the inevitable return of the Mad Computer Doggy
-||__/ .|+-~ .|| ||
Last edited by MCD on 25 May 2005, 11:07; edited 4 times in total
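Editor's sketch: one detail worth keeping in mind with pavgb/pavgw is that they treat their operands as unsigned and round halves up, so the result can differ by one from the and/xor/shift sequence, which rounds down. The C model below illustrates this for a single byte lane; `pavgb_lane` and `floor_avg_lane` are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Model of one PAVGB byte lane: unsigned, computed as (a + b + 1) >> 1. */
static uint8_t pavgb_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Model of the and/xor/shift average on one unsigned byte: rounds down. */
static uint8_t floor_avg_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)((a & b) + ((a ^ b) >> 1));
}
```

For inputs 1 and 2, `pavgb_lane` returns 2 while `floor_avg_lane` returns 1; for inputs whose sum is even, such as 2 and 4, both return 3.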
MazeGen 24 May 2005, 11:01
Eh, you're right, MCD.
In the first place I was pointing out that Madis had left the dependencies between the instructions out of consideration.
Madis731 24 May 2005, 15:31
You are right, there are dependencies, but I would have noticed them if I had used a Pentium II or an earlier CPU. The code you suggested behaves the same way, because the "mov edx,eax" can't start before the last "add eax,edx" is finished. So one pass is faster, but when running it 10 times in a row (NO jmp, only 10x the code) the dependency problem occurs in a different place.
I want to argue about signed numbers. It works with POS+POS, POS+NEG, NEG+POS and NEG+NEG:
0FFFFFFFFh+0FFFFFFFDh=Carry+0FFFFFFFCh => -1+(-3)=-4
rcr 0FFFFFFFCh,1 = 0FFFFFFFEh <= -2
What do you mean, it works ONLY on unsigned numbers?
BTW the SSE solution is elegant, but I was speaking about the gems that were introduced many, many years ago, and I just got a confirmation that rcr/rcl did indeed exist in 286 Intel processors, so why weren't they used? You didn't answer my question but were arguing about the optimization of the first algorithm. What I need to know is WHY my version is BAD. I don't want to know how optimized the first version is, thanks!
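Editor's sketch: the add/rcr arithmetic worked through above can be modeled in C by tracking the carry flag explicitly. This is a minimal sketch assuming 32-bit registers and a rotate count of 1; `avg_add_rcr` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical C model of "add eax,edx" followed by "rcr eax,1". */
static uint32_t avg_add_rcr(uint32_t a, uint32_t b)
{
    uint32_t sum   = a + b;               /* add: wraps modulo 2^32           */
    uint32_t carry = (sum < a) ? 1u : 0u; /* CF is set when the add wrapped   */
    return (carry << 31) | (sum >> 1);    /* rcr by 1: CF moves into bit 31   */
}
```

With the values from the post, `avg_add_rcr(0xFFFFFFFF, 0xFFFFFFFD)` yields 0xFFFFFFFE, which is the correct unsigned average and also reads as -2 when the bits are interpreted as signed.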
r22 24 May 2005, 22:08
Code:
mov eax,-4
mov edx,2
add eax,edx
rcr eax,1
push eax
push fmt ; '%li',0
push buffer
call [wsprintf]
add esp,0ch
push 0
push buffer
push buffer
push 0
call [MessageBox]

When finding the average of a negative and a positive number the ADD RCR algorithm fails.
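Editor's sketch: the failure can be reproduced in a C model. Adding -4 and 2 does not set the carry flag, so rcr rotates a 0 into the sign bit. The same model shows that the and/xor variant with an arithmetic shift (sar) handles this case. All function names here are hypothetical, and the helpers avoid implementation-defined signed shifts and conversions on purpose.

```c
#include <stdint.h>

/* Reinterpret 32 bits as a two's-complement signed value, portably. */
static int32_t as_signed(uint32_t u)
{
    return (u & 0x80000000u) ? (int32_t)(u - 0x80000000u) + INT32_MIN
                             : (int32_t)u;
}

/* Model of "add eax,edx / rcr eax,1", result read as a signed number. */
static int32_t avg_add_rcr_signed(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum   = ua + ub;              /* add: wraps modulo 2^32 */
    uint32_t carry = (sum < ua) ? 1u : 0u; /* CF after the add       */
    return as_signed((carry << 31) | (sum >> 1));
}

/* Model of the mov/and/xor/sar/add sequence for signed inputs. */
static int32_t avg_and_xor_sar(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t x    = ua ^ ub;
    uint32_t half = (x >> 1) | (x & 0x80000000u); /* sar by 1 keeps the sign bit */
    return as_signed((ua & ub) + half);
}
```

In this model `avg_add_rcr_signed(-4, 2)` returns 2147483647 instead of -1, while `avg_and_xor_sar(-4, 2)` returns -1 and `avg_and_xor_sar(-1, -3)` returns -2.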
Madis731 25 May 2005, 10:18
Indeed, I didn't notice it before, but the first algorithm also works only with unsigned numbers. I should use "sar" instead of "shr" then.
Thanks for pointing that out.
MCD 25 May 2005, 11:03
Oops, I missed this issue too. I've just corrected my code as well.
Madis731 26 May 2005, 16:45
OK, now that it's clear, I will stick to my version, because I've never had to use signed values. This will do.
Thanks, everybody, for your input! Too bad there isn't a "rotate arithmetic right" instruction, i.e. RAR.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.