flat assembler
Message board for the users of flat assembler.

REP RET

vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
any info on this code from MS?

Code:
        jne RestoreRcx
        db 0f3h                         ; (encode REP for REP RET)
        ret                             ; no overrun, use REP RET to avoid AMD
                                        ; branch prediction flaw after Jcc    
Post 29 Nov 2006, 18:34
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17666
Location: In your JS exploiting you and your system
What do you want to know?
Post 29 Nov 2006, 18:46
vid
some info about the mentioned flaw.
Post 29 Nov 2006, 18:49
MAD_DËMON



Joined: 03 Mar 2006
Posts: 23
The REP prefix is used with the RET instruction so that branch prediction can be applied to returns from procedures. AFAIK, Athlon 64 and Opteron processors aren't able to apply branch prediction to a RET without the REP prefix. It's used when a branch points directly to a RET instruction or when there's a control transfer instruction immediately before a RET.


Last edited by MAD_DËMON on 29 Nov 2006, 20:45; edited 1 time in total
Post 29 Nov 2006, 20:39
revolution
Presumably the RET is mispredicted to go to the wrong place, causing a pipeline flush.

What in particular do you want to know? The number of clock ticks lost? The processors it affects? The precise details about the internal AMD circuitry that cause the flaw? The reason REP "fixes" the problem?
Post 29 Nov 2006, 20:44
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
Software Optimization Guide for AMD64 Processors wrote:
6.2 Two-Byte Near-Return RET Instruction

Optimization
Use of a two-byte near-return can improve performance. The single-byte near-return (opcode C3h) of
the RET instruction should be used carefully. Specifically, avoid the following two situations:
• Any kind of branch (either conditional or unconditional) that has the single-byte near-return RET
instruction as its target. See “Examples.”
• A conditional branch that occurs in the code directly before the single-byte near-return RET
instruction. See “Examples.”

Application
This optimization applies to:
• 32-bit software
• 64-bit software

Rationale

The processor is unable to apply a branch prediction to the single-byte near-return form (opcode C3h)
of the RET instruction.
The easiest way to assure the utilization of the branch prediction mechanism is to use a two-byte RET
instruction. A two-byte RET has a REP instruction inserted before the RET, which produces the
functional equivalent of the single-byte near-return RET instruction, but is not affected by the
prediction limitations outlined above. To use a two-byte RET, define a text macro named REPRET and
use it instead of the RET instruction to force the intended object code.
REPRET TEXTEQU <DB 0F3h, 0C3h>

Examples

Avoid branches in which the target of the branch is a single-byte near-return:
        jmp label       ; Jump to a single-byte near-return RET instruction.
        ...
label:
        ret             ; RET is potentially mispredicted.
Avoid branches that immediately precede a single-byte near-return:
        jz label        ; Conditional branch is not taken.
        ret             ; RET is a fall-through instruction,
                        ; potentially mispredicted.
If possible, move an existing instruction, such as a POP instruction that is part of the function
epilogue, so that it is inserted between the branch and the RET instruction:
        jz label
        pop ebp         ; Pad with at least one non-branch instruction.
        ret
If no existing instruction is available for this purpose, then insert a NOP instruction to provide the
necessary padding or, better still, use the recommended two-byte version of RET.


With this, you have confirmation that it's not an invention of MS :P
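
For fasm, a rough equivalent of the guide's MASM REPRET text macro might look like the sketch below. It assumes fasm 1 macro syntax and 32-bit code; the names repret and is_nonzero are made up for illustration and do not come from the AMD guide or from the MS code.

```asm
use32

; Hypothetical fasm 1 counterpart of the guide's REPRET text macro.
macro repret
{
        db 0F3h, 0C3h           ; F3h = REP prefix, C3h = near RET
}

; Hypothetical example routine: eax = 1 if ecx is nonzero, else eax = 0.
is_nonzero:
        xor     eax, eax
        test    ecx, ecx
        jz      .done           ; this branch targets the return directly
        inc     eax
.done:
        repret                  ; two-byte return, because it is a branch target
```

Emitting the bytes with db, as both the MS code and the guide's macro do, avoids depending on whether a given assembler accepts a REP prefix written in front of RET.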
Post 29 Nov 2006, 20:56
MAD_DËMON
I think I have found a strange flaw in Visual C++ Express: when the return value of a Windows API call is tested for zero in an if() statement, eax is saved in another register before it is tested (test eax,eax). I don't know exactly what the purpose of that is.
Post 29 Nov 2006, 21:11
vid
thanks
Post 29 Nov 2006, 22:37
Copyright © 1999-2020, Tomasz Grysztar.