flat assembler
Message board for the users of flat assembler.

Mysterious optimization issue - long vs short cond. jumps

ronware



Joined: 08 Jan 2004
Posts: 179
Location: Israel
I recently modified a loop construct in Reva so that it uses a short jump rather than a 'long' one. Specifically, the original had code like:

Code:
        db 0fh, 82h     ; JB/JC, long (near) form: opcode 0Fh 82h
.jmp:   dd 0            ; 32-bit displacement, filled in later
    


And the new code is:

Code:
        db 72h          ; JB/JC, short form: opcode 72h
.jmp:   db 0            ; 8-bit displacement, filled in later
    


The '.jmp' field is filled in at interpretation time with the offset that will be used.
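As an illustration of what that fill-in step involves, here is a minimal Python sketch (not Reva's actual code) that builds both encodings of the jump. The function names and the addresses are hypothetical, and 32-bit code is assumed. The opcodes 72h and 0Fh 82h are the ones in the two templates. In both forms the displacement is counted from the address of the instruction that follows the jump.

```python
import struct

def encode_jb_short(jump_addr, target_addr):
    """Encode JB rel8: opcode 72h plus one signed displacement byte."""
    disp = target_addr - (jump_addr + 2)   # the instruction is 2 bytes long
    if not -128 <= disp <= 127:
        raise ValueError("target out of range for the short form")
    return b"\x72" + struct.pack("<b", disp)

def encode_jb_near(jump_addr, target_addr):
    """Encode JB rel32: opcode 0Fh 82h plus a signed 32-bit displacement."""
    disp = target_addr - (jump_addr + 6)   # the instruction is 6 bytes long
    return b"\x0f\x82" + struct.pack("<i", disp)

# Hypothetical addresses: a backward jump to the top of a loop.
short_form = encode_jb_short(0x1010, 0x1000)
near_form = encode_jb_near(0x1010, 0x1000)
print(short_form.hex(), near_form.hex())
```

Note that the same target gives different displacement values in the two forms, because the instructions differ in length by four bytes.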

To my surprise, the long jump form executes much faster -- almost twice as fast (!) and I cannot for the life of me understand why this should be. Both routines appear to work correctly.

Any ideas?
Post 05 Jul 2005, 20:51
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 7715
Location: Kraków, Poland
Might be some alignment issue.
Post 05 Jul 2005, 20:54
ronware



Joined: 08 Jan 2004
Posts: 179
Location: Israel
I thought of that, but the destination of the jump is the same offset in both cases. I also tried forcing the dest. to be aligned, but that did almost nothing. I got the same results on an Athlon-XP and a Pentium4.
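One way to check the alignment idea is sketched below in Python, as a rough aid rather than a diagnosis. The 16-byte boundary and the example addresses are assumptions, not facts from this thread. Since the long form is four bytes longer than the short form, everything after the jump shifts by four bytes even when the destination offset is unchanged, so it may be worth checking where the jump and the code after it fall relative to such boundaries.

```python
def padding_to_align(addr, alignment=16):
    """Number of padding bytes needed so that addr lands on a boundary."""
    return (-addr) % alignment

def crosses_boundary(addr, length, alignment=16):
    """True if an instruction of `length` bytes at addr straddles a boundary."""
    return addr // alignment != (addr + length - 1) // alignment

# Hypothetical jump address, two bytes before a 16-byte boundary.
jump_addr = 0x100E
print(padding_to_align(jump_addr))        # padding to reach the next boundary
print(crosses_boundary(jump_addr, 2))     # short form, 2 bytes
print(crosses_boundary(jump_addr, 6))     # long form, 6 bytes
```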
Post 05 Jul 2005, 20:55
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
instruction cache / branch prediction thingy perhaps?

Do you modify .jmp often, or right before reaching the code?
Post 05 Jul 2005, 21:43
ronware



Joined: 08 Jan 2004
Posts: 179
Location: Israel
.jmp is modified once, then that code is copied to a different place (where it is executed). The snippet of code is just a template used to generate the final code.
Post 05 Jul 2005, 22:05
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow
Hi ronware,
On newer CPUs, modifying code that you then execute incurs a penalty of many clock ticks, because the code has to be reloaded into the instruction cache.
Post 06 Jul 2005, 12:30
ronware



Joined: 08 Jan 2004
Posts: 179
Location: Israel
Hmm. It's interesting - I tried the same code on my Pentium M laptop and it was faster. Goes to show, it's hard to predict code performance!
Post 06 Jul 2005, 14:46
r22



Joined: 27 Dec 2004
Posts: 805
My 2 cents.

A long jmp is like MOV EIP, JMP_LOCATION
while a short jmp is like ADD EIP, (SIGN EXTENDED) JMP_LOCATION
If it is not sign extended, then the processor checks the sign and then adds or subtracts from EIP.

In any case the shorter instruction does more work than the longer instruction.

Just like how DEC ECX; JNZ LABEL runs faster than LOOP LABEL.

If I'm totally off feel free to enlighten me.
Post 06 Jul 2005, 17:08
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
I would still think it has to do with branch prediction... with the indirect jmp, no branch prediction can be done. With the short jump, branch prediction is being done, and it might be that a mispredicted branch is more expensive than no branch prediction at all?
Post 06 Jul 2005, 17:10
MazeGen



Joined: 06 Oct 2003
Posts: 975
Location: Czechoslovakia
But we are talking about short/near conditional jmps, not an indirect jmp...
Post 06 Jul 2005, 17:39
ronware



Joined: 08 Jan 2004
Posts: 179
Location: Israel
Both JMPs are conditional, relative jumps. The first is a 'long form' that takes a 32-bit offset; the second is the short form, which takes an 8-bit offset and can only reach -128 to +127 bytes.
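To make the point concrete, here is a small Python sketch (the function names and addresses are hypothetical, and 32-bit code is assumed) that decodes either form and computes the branch target. In both cases the target is the address of the next instruction plus a sign-extended displacement; only the width of the displacement differs.

```python
import struct

def jb_target(code, addr):
    """Return the branch target of a JB instruction located at addr."""
    if code[0] == 0x72:                        # short form: rel8
        (disp,) = struct.unpack_from("<b", code, 1)
        length = 2
    elif code[0:2] == b"\x0f\x82":             # long (near) form: rel32
        (disp,) = struct.unpack_from("<i", code, 2)
        length = 6
    else:
        raise ValueError("not a JB instruction")
    return addr + length + disp                # relative to the next instruction

def short_reach(addr):
    """Lowest and highest targets reachable by a short jump at addr."""
    next_ip = addr + 2
    return next_ip - 128, next_ip + 127

# Hypothetical example: both encodings placed at 0x1010 jump back to 0x1000.
print(hex(jb_target(bytes.fromhex("72ee"), 0x1010)))
print(hex(jb_target(bytes.fromhex("0f82eaffffff"), 0x1010)))
print([hex(a) for a in short_reach(0x1010)])
```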
Post 06 Jul 2005, 17:53
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
Hm, I wonder how I got the impression that the first jump was indirect and unconditional. [embarrassed]
Post 06 Jul 2005, 19:23

Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.