flat assembler
Message board for the users of flat assembler.

Optimizing - is it true?

Goto page 1, 2, 3  Next
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 11 Feb 2005, 20:51
I've written some code that checks a register against several values one at a time. Let's say case 1 goes to Label1, case 2 goes to Label2, and so on, and I have three of them.
Now, if I make a jump table near the if/case statements, it will jump to a label according to the value in the register. But does the additional memory read cost enough that this kind of optimization is not worth the effort?
I specifically said THREE because it's obvious that for 2 cases the straightforward method is optimal, and for 10 or more the lookup table would be optimal. But how do I measure this kind of optimization? Task switching in Windows disturbs the timing so much that I can't get correct readings.
Any thoughts, strong arguments, references...?

_________________
My updated idol: http://www.agner.org/optimize/
beppe85



Joined: 23 Oct 2004
Posts: 181
beppe85 11 Feb 2005, 22:06
I guess it really depends on the RAM installed on the user's system; each type can have different access times. The cache size and speed also affect the overall runtime, so it's hard to get a definitive test.

Task switching will not affect the measurement if you test small loops.
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 14 Feb 2005, 10:50
Let's hope it's all in the L1 cache now; then it shouldn't depend much on RAM. The problem is this: if the pipeline is flushed on a misprediction anyway, will there be any further penalties for consecutive jumps or not?

I usually make my test programs large enough to get close to the average: I loop the same code several (~10000) times, and sometimes a task switch happens in the middle of it. I had always hoped that the task switch would be aligned to the program start, but it isn't always so (why?), so I have to run it many times and pick the best average result.
beppe85



Joined: 23 Oct 2004
Posts: 181
beppe85 14 Feb 2005, 20:00
invoke Sleep,0

After it returns, you'll be at the beginning of a time slice.
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 15 Feb 2005, 10:03
Is that documented, or do you just know?
Maybe you also know how many ticks (or milliseconds) I have left to test my code when I start a time slice. I could run a quick test to find that out: when the difference between two RDTSC readings is far outside its normal range (for example, I usually get 40-45 ticks and suddenly it's 1460 ticks), I can be sure there was a blackout some time in between.
beppe85



Joined: 23 Oct 2004
Posts: 181
beppe85 15 Feb 2005, 10:39
Yes, it's not just guessing; see MSDN. Note that Sleep(0) is a special case; Sleep(1) takes 1 ms away from the running thread.

There are many variables that can affect the time slice length. If there are no higher-priority processes, maybe you could get a measurement on a specific platform. But it's trial and error, as you pointed out.
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 15 Feb 2005, 10:55
Interesting, we learn something every day. But I read my first post again, and I can see that I wasn't clear enough.
My question was: "Is it true that optimizing 3 jumps with a table is worth the effort?"
I hate to admit that I didn't know the guts of the Sleep(Ex) function call.
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 15 Feb 2005, 11:46
Madis731 wrote:

I hate to admit that I didn't know the guts of the Sleep(Ex) function call.

Me too.
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
S.T.A.S. 24 Feb 2005, 23:46
Why tables?
Code:
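        sub      eax,1         ; unlike DEC, SUB sets CF (borrow) when EAX was 0
        jc       case_0        ; EAX was 0
        jz       case_1        ; EAX was 1
        ja       case_2        ; EAX was 2 or more (unsigned)
Of course, there's one redundant Jcc in this code: when neither jc nor jz is taken, CF and ZF are both clear, so ja is always taken and could be a plain jmp (or a fall-through into case_2).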

Anyway, I think it's better not to mix tables with code, but to place them somewhere in the data section. The CPU uses separate caches for code and data.
Ralph



Joined: 04 Oct 2003
Posts: 86
Ralph 28 Feb 2005, 23:01
I sometimes use CMOV for similar things when I have around three branches. I'm not sure how fast it is. It's more code, but there are no memory accesses and no conditional branches.

Code:
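        mov     eax,.case1      ; default target (DL is neither 1 nor 2)
        mov     ecx,.case2
        cmp     dl,1
        cmovz   eax,ecx         ; DL == 1 -> .case2
        mov     ecx,.case3      ; MOV does not change the flags
        cmp     dl,2
        cmovz   eax,ecx         ; DL == 2 -> .case3
        call    eax             ; indirect call to the selected handler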
    
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 01 Mar 2005, 10:57
I once read about conditional moves and was fascinated by them. Later I saw that their penalties could be avoided by replacing them with three 1-µop instructions that have no penalties. I can't remember exactly what it was, whether it had something to do with the BTB or stalled the pipeline, but it has stuck in my mind that CMOVcc is bad, bad, bad.
beppe85



Joined: 23 Oct 2004
Posts: 181
beppe85 01 Mar 2005, 13:20
Danny Thorpe (Borland Delphi's Chief Scientist) said in this post (http://blogs.borland.com/dcc/archive/2003/07/18/2374.aspx) that CMOVcc would not be included, and he explained his reasons there. It turns out that D7 has it now... well, he's not an assembly expert, by the way.

Given one or two spare registers, I guess it could be rewritten using SETcc/OR, but I also know it can be done better.

LATER: Just to correct myself: Danny was referring to the compiler output, and I was referring to inline assembly.

_________________
"I assemble, therefore I am"

If you have some spare time, visit my blog: http://www.beppe.theblog.com.br/ and sign my guestmap
S.T.A.S.



Joined: 09 Jan 2004
Posts: 173
Location: Ru#27
S.T.A.S. 02 Mar 2005, 01:19
Well, I'm not sure how many CPU ticks CMOVcc takes on the PIV, because the official docs give no information about it. But CMOVcc instructions are rather fast on P6/K7/K8 cores, and unlike Jcc they are not subject to branch mispredictions and similar penalties.

Ralph's code is really very nice! (I'd just replace 'call' with 'jmp' when possible.)
However, in some cases it's better to use conditional branches. Say the branches are taken with probabilities of 90%, 5%, and 5%: then branch prediction can do a good job.
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 02 Mar 2005, 08:51
Something related to this topic: I was wondering in which CPU family the jump hints with prefix bytes 2Eh and 3Eh were introduced, since I have only seen them mentioned in the Intel docs. I haven't tested them on Athlon (XP) CPUs yet. I'm also very interested in how much they would speed up branching.

The biggest problem with them seems to be FASM: flat assembler doesn't seem to support them yet. So I'm going to test them myself with DB (like in the old TASM days).
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 02 Mar 2005, 09:08
MCD: there is no common syntax for such hints, and you can easily make macros, which is also better, because with a macro you can easily turn the hints on and off.
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 02 Mar 2005, 09:20
vid wrote:
MCD: there is no common syntax for such hints, and you can easily make macros, which is also better, because with a macro you can easily turn the hints on and off.
Good. But does anyone know when they were introduced?

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 02 Mar 2005, 10:49
They were already discussed here:
http://board.flatassembler.net/topic.php?t=716
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 02 Mar 2005, 10:56
Thanks, I forgot about those. Too bad they are PIV+ only, because I guess the Athlon doesn't have them yet.
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 02 Mar 2005, 12:07
...and on the same topic: there are many articles and programs that promise to make perfect jump tables, but when you go to 5 cases and above (when you can't work out the algorithm in your head), they usually make two lookup tables. That carries so much penalty that comparing and jumping is even better, and I'm not going to program for PIV only yet, so ut/lt are not options for me.

Binary search gave me impressive results, so I can't justify compromising the readability of the code by using two tables to look up the jump target. What a dilemma!
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 02 Mar 2005, 12:24
The same goes for me, as I'm definitely not starting to write PIV+-only code. Perhaps 586/686+, but that's it.


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.