flat assembler
Message board for the users of flat assembler.
Optimizing - is it true?
beppe85 11 Feb 2005, 22:06
I guess it really depends on the RAM installed on the user's system; each type can have different access times. Cache size and speed also affect overall runtime, so it's hard to get a definitive test.
Task switching will not affect the measurement if you test small loops.
Madis731 14 Feb 2005, 10:50
Let's hope it's all in L1 cache now, so it shouldn't depend on RAM too much. The question is: if the pipeline is flushed on a misprediction anyway, will there be additional penalties for consecutive jumps or not?
I usually make my test programs large enough to get closer to the average: I loop the same code many (~10000) times, and sometimes a task switch happens in the middle of this. I have always hoped that the task switch is aligned to the program start, but it isn't always so (why?), and then I have to run it many times and pick the best average results.
beppe85 14 Feb 2005, 20:00
invoke Sleep,0
After it returns, you'll be at the beginning of a time slice.
Madis731 15 Feb 2005, 10:03
Is it documented, or do you just know?
Maybe you also know how many ticks (or milliseconds) I have left to test my code when I start a time slice? I could run a quick test to find out: compare two RDTSC readings and check whether the difference is way beyond a 30-clock limit. For example, when I normally get 40-45 ticks and suddenly it's 1460 ticks, I can be sure there was a blackout in between.
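The blackout check described above can be sketched in C. This is a minimal illustration, not code from the thread: it assumes the tick deltas have already been collected (for example from pairs of RDTSC readings) into an array, takes the smallest delta as the undisturbed baseline, and treats any sample more than `limit` ticks above it as interrupted. The names `min_ticks`, `count_blackouts` and `mean_without_blackouts` are hypothetical, and reading the "30-clock limit" as a distance from the minimum is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest delta in the run: the best estimate of the undisturbed cost. */
uint64_t min_ticks(const uint64_t *samples, size_t n)
{
    assert(n > 0);
    uint64_t best = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < best)
            best = samples[i];
    return best;
}

/* Number of samples more than `limit` ticks above the minimum,
   i.e. runs that were probably hit by a task switch or interrupt. */
size_t count_blackouts(const uint64_t *samples, size_t n, uint64_t limit)
{
    uint64_t base = min_ticks(samples, n);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (samples[i] - base > limit)
            hits++;
    return hits;
}

/* Integer mean of the samples that stayed within `limit` of the minimum. */
uint64_t mean_without_blackouts(const uint64_t *samples, size_t n,
                                uint64_t limit)
{
    uint64_t base = min_ticks(samples, n);
    uint64_t sum = 0;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] - base <= limit) {
            sum += samples[i];
            kept++;
        }
    }
    return sum / kept;  /* kept >= 1: the minimum itself always qualifies */
}
```

Fed with deltas such as 42, 40, 45, 1460, 41 and a limit of 30, the 1460-tick sample is the only one counted as a blackout, and the remaining four are averaged.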
beppe85 15 Feb 2005, 10:39
Yeah, it's not just guessing... go to MSDN. Note that Sleep(0) is a special case; Sleep(1) takes 1 ms from the running thread.
There are many variables that can affect time slice length... if there is no higher-priority process, maybe you could get a measurement on a specific platform. But it's trial and error, as you pointed out...
Madis731 15 Feb 2005, 10:55
Interesting, we learn something every day. But I read my first post again and I can see that I wasn't clear enough. My question was: "Is it true that optimizing 3 jumps with a table is worth the effort?" I hate to admit that I didn't know the guts of the Sleep(Ex) function call.
MCD 15 Feb 2005, 11:46
Madis731 wrote:
me too
S.T.A.S. 24 Feb 2005, 23:46
Why tables?
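Code:
    sub eax, 1      ; sets CF if eax was 0 (DEC would leave CF untouched)
    jc  case_0      ; eax was 0
    jz  case_1      ; eax was 1
    ja  case_2      ; eax was 2 or more
Anyway, I think it's better not to mix tables with code, but to place them somewhere in the data section. The CPU uses different caches for code and data.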
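For comparison, here is how the two approaches in question look one level up, in C. This is only a sketch with hypothetical handler names: `dispatch_table` indexes a constant array of function pointers (which a compiler will normally place in a read-only data section, apart from the code), while `dispatch_chain` mirrors the compare-and-branch sequence. Which one is faster depends on the CPU and on how predictable the selector is, so neither is presented as the better choice.

```c
#include <assert.h>

/* Hypothetical handlers; each returns a distinct value. */
static int case_0(void) { return 10; }
static int case_1(void) { return 11; }
static int case_2(void) { return 12; }

typedef int (*handler_t)(void);

/* Jump table: a constant array, stored as data rather than mixed with code. */
static const handler_t handlers[3] = { case_0, case_1, case_2 };

/* Table dispatch: one memory load plus one indirect call. */
int dispatch_table(unsigned sel)
{
    assert(sel < 3);            /* the table has exactly three entries */
    return handlers[sel]();
}

/* Chain dispatch: 0 -> case_0, 1 -> case_1, 2 and above -> case_2. */
int dispatch_chain(unsigned sel)
{
    if (sel < 1)
        return case_0();
    if (sel == 1)
        return case_1();
    return case_2();
}
```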
Ralph 28 Feb 2005, 23:01
I use cmov for similar things sometimes when I have around 3 branches. Not sure how fast it is. More code but no memory hits and no conditional branching.
Code:
    mov   eax,.case1    ; default target
    mov   ecx,.case2
    cmp   dl,1
    cmovz eax,ecx       ; dl = 1: take .case2
    mov   ecx,.case3
    cmp   dl,2
    cmovz eax,ecx       ; dl = 2: take .case3
    call  eax
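The same selection can be written in C. A sketch only: the handler names are hypothetical, and whether the conditional expressions compile to CMOVcc or to branches is up to the compiler and its target settings, so the generated code should be inspected rather than assumed.

```c
#include <assert.h>

/* Hypothetical handlers; each returns a distinct value. */
static int case1(void) { return 1; }
static int case2(void) { return 2; }
static int case3(void) { return 3; }

typedef int (*handler_t)(void);

/* Pick the target by overwriting a default, as the cmov sequence does:
   case1 unless dl == 1 (case2) or dl == 2 (case3). */
handler_t select_target(unsigned char dl)
{
    handler_t target = case1;
    target = (dl == 1) ? case2 : target;
    target = (dl == 2) ? case3 : target;
    return target;
}

/* Single indirect call through the selected pointer. */
int call_selected(unsigned char dl)
{
    handler_t target = select_target(dl);
    assert(target != 0);        /* every path assigns a handler */
    return target();
}
```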
Madis731 01 Mar 2005, 10:57
I once read about conditional moves and was fascinated by them. Later I saw that they carried penalties and could be replaced with three 1-µop instructions with no penalties. I can't remember exactly what it was: did it do something with the BTB, or did it stall the pipeline? ^o) I can't recall right now. It has just been stuck in my mind that CMOVcc is bad, bad, bad.
beppe85 01 Mar 2005, 13:20
Danny Thorpe (Borland Delphi's Chief Scientist) said in this post (http://blogs.borland.com/dcc/archive/2003/07/18/2374.aspx) that CMOVcc 'will' not be included. He gave his reasons there. It turns out that D7 has it now... well, he's not an assembly expert, btw...
If you give me one or two spare registers, I guess I could rewrite it using SETcc/OR, but I also know it can be done better.
LATER: Just to correct myself: Danny was referring to the compiler output, and I to inline assembly.
S.T.A.S. 02 Mar 2005, 01:19
Well, I'm not sure how many CPU ticks it takes to execute CMOVcc on a PIV (because there's no information about it in the official docs). But CMOVcc instructions are rather fast on P6/K7/K8 cores. And they have no "branch mispredict" and other similar stuff, unlike Jcc.
Ralph's code is really very nice! (I'd just replace 'call' with 'jmp' where possible.) However, in some cases it's better to use conditional branches. Say the branches are taken with probabilities like 90%, 5%, 5%; then branch prediction can do a good job.
MCD 02 Mar 2005, 08:51
Something related to this topic: I was wondering in which CPU family the jump hint prefixes 2Eh and 3Eh were introduced, since I have only seen them mentioned in the Intel docs. Not tested on Athlon (XP) CPUs yet. I'm also very interested in how much they would speed up branching.
The biggest problem here seems to be Fasm: flat assembler doesn't seem to support them yet. So I'm going to test them myself with DB (like in the old Tasm days).
vid 02 Mar 2005, 09:08
MCD: there is no common syntax for such hints, and you can easily make macros, which is also better because with a macro you can easily turn the hints on/off.
MCD 02 Mar 2005, 09:20
vid wrote: MCD: there is no common syntax for such hints, and you can easily make macros, which is also better because with a macro you can easily turn the hints on/off.
Tomasz Grysztar 02 Mar 2005, 10:49
They were already discussed here:
http://board.flatassembler.net/topic.php?t=716
MCD 02 Mar 2005, 10:56
Thanks, I forgot about those. Too bad they are PIV+ only, because I guess the Athlon doesn't have them yet.
Madis731 02 Mar 2005, 12:07
...and on the same topic, there are many articles and programs that promise to make perfect jump tables, but when you go to 5 and above (when you can't work out the algorithm in your head), they usually make two lookup tables. That gives so much penalty that comparing and jumping is even better. And I'm not yet going to program for PIV only, so ut/lt are not options for me.
Binary search gave me impressive results, so I can't justify compromising the readability of the code by making two tables to look up the jump target. What a dilemma.
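Compare-and-jump arranged as a binary search can be sketched in C as a small decision tree. This is an illustration under assumptions, not code from the thread: it supposes eight dense cases numbered 0 to 7 with hypothetical handlers `h0` to `h7`, so every selector is resolved in three compares and no lookup table is involved.

```c
#include <assert.h>

/* Hypothetical handlers for eight cases; each returns a distinct value. */
static int h0(void) { return 100; }
static int h1(void) { return 101; }
static int h2(void) { return 102; }
static int h3(void) { return 103; }
static int h4(void) { return 104; }
static int h5(void) { return 105; }
static int h6(void) { return 106; }
static int h7(void) { return 107; }

/* Binary-search dispatch: halve the remaining range at every compare,
   so eight cases cost three compares instead of up to seven in a
   linear chain. */
int dispatch_tree(unsigned sel)
{
    assert(sel < 8);
    if (sel < 4) {
        if (sel < 2)
            return (sel < 1) ? h0() : h1();
        return (sel < 3) ? h2() : h3();
    }
    if (sel < 6)
        return (sel < 5) ? h4() : h5();
    return (sel < 7) ? h6() : h7();
}
```

The tree stays readable as long as the case values are known up front; with sparse values the same shape works by comparing against the median case value at each level.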
MCD 02 Mar 2005, 12:24
Same for me, as I'm definitely not going to start writing PIV+ only code. Perhaps 805/686+, but that's it.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.