flat assembler
Message board for the users of flat assembler.

REP RETN

score_under



Joined: 27 Aug 2009
Posts: 27
Why, in the disassembly of one certain MSVC-generated function (in almost all MSVC-generated apps), is the instruction sequence "rep retn" used?

In regmon, for example:
Code:
00409B3A   $  3B0D 307A4300 CMP ECX,DWORD PTR DS:[437A30]
00409B40   .  75 02         JNZ SHORT Regmon.00409B44
00409B42   .  F3:           PREFIX REP:                              ;  Superfluous prefix
00409B43   .  C3            RETN
00409B44   >  E9 8E1E0000   JMP Regmon.0040B9D7    

It's checking to see if the stack is balanced and not corrupted by storing a hash of ESP in ECX, and comparing it to some stored value - though I can't see the use of the REP before the RETN... could someone explain this?
Post 18 Feb 2010, 03:40
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17270
Location: In your JS exploiting you and your system
It is an optimisation for AMD CPUs. The branch predictor gets it wrong if the retn instruction is jumped to directly by a branch instruction, or if the retn directly follows a branch instruction.

You can safely ignore it.
Post 18 Feb 2010, 04:14
asmmsa



Joined: 06 Feb 2010
Posts: 45
F3 is not REP, it's a prefix. It's REP only for stos/movs/lods/ins/outs; for other instructions it has a different meaning.
For ret it's undefined and shouldn't be used. They might change the ret instruction so that with F3 it does something other than just returning.

Branch prediction is only for conditional jumps. F3 + ret = reserved, which means don't use it, because in 10 years your code will crash.


The branch hint prefixes are 2E and 3E btw, not F3.
Post 18 Feb 2010, 09:53
revolution
asmmsa: AMD explicitly say to use 'rep ret' to solve a problem with the branch prediction. It only affects performance slightly, so putting it in or leaving it out makes almost no difference for most programs.

I seriously doubt that in 10 years the code will crash. Once the manufacturers have told people to use a particular encoding, that usually means it will work on all past CPUs and all future CPUs without issue, with both Intel and AMD (and others like VIA also). Don't worry about it; it is minor and causes no problems.
Post 18 Feb 2010, 09:59
asmmsa



If rep uses F3, then I didn't know that.
Perhaps I have an old version of the manuals.


And how does adding it to ret solve anything?
ret is unconditional, it always goes one way!
Post 18 Feb 2010, 10:03
revolution
It would seem that when AMD designed the branch predictors they made a mistake in the case where a branch leads to a 'ret'. To work around the problem, they say to put a 'rep' in front so that the branch predictor works correctly.
Code:
jcc .exit ;In some AMD CPUs, this cannot be predicted correctly
;...
.exit: ret    
Code:
jcc .exit ;Now can be predicted correctly
;...
.exit: rep ret    
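As a slightly fuller illustration of the branch-target case, here is a minimal sketch of a complete routine in fasm syntax. The routine, its register convention and its label names are made up for the example; only the 'rep ret' at the shared exit comes from the discussion above.

```asm
use32
; Hypothetical routine: returns eax = length of the zero-terminated
; string at esi, or 0 straight away if esi is null.
strlen_or_zero:
        xor     eax, eax
        test    esi, esi
        jz      .exit           ; conditional branch directly to the return
.scan:
        cmp     byte [esi+eax], 0
        je      .exit           ; a second branch that targets the return
        inc     eax
        jmp     .scan
.exit:
        rep ret                 ; assembles to F3 C3, behaves like a plain ret
```

Both conditional jumps land on the return, so the single 'rep ret' at .exit covers them both.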
Post 18 Feb 2010, 10:12
Fanael



Joined: 03 Jul 2009
Posts: 168
Does ret imm16 suffer from branch misprediction too?
Post 18 Feb 2010, 11:02
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
Why recommend "rep ret", though? Wouldn't a nop have been just about as good, without the conceptual nastiness of "rep ret"? :)
Post 18 Feb 2010, 12:15
chaoscode



Joined: 21 Nov 2006
Posts: 64
Well, AMD can execute 3 (simple) instructions at the same time.
I think a prefix was chosen because it doesn't use execution resources and a nop does.

But maybe I'm not right and am just talking weird crap.

Edit:
Why not ask AMD support?
Post 18 Feb 2010, 12:54
f0dder



chaoscode wrote:
Well, AMD can execute 3 (simple) instructions at the same time.
I think a prefix was chosen because it doesn't use execution resources and a nop does.
I considered that instruction count was the answer, and it's probably the explanation. Seems pretty lame though: just how often is "branching to RETN" going to be a performance bottleneck in normal code? Seems slightly irresponsible to recommend a "reserved" instruction sequence as a fix...

Post 18 Feb 2010, 12:57
revolution
Well, branch prediction is very important for achieving top performance in critical loops. A modern CPU with malfunctioning branch prediction would run really badly. In normal GUI or I/O type code it will make no difference, of course.
Post 18 Feb 2010, 13:06
f0dder



Of course, revolution... but how often do you have "Jcc <location-of-ret>" inside a critical loop? :) Dunno how serious the flaw is, though. If it's just the ret that gets mispredicted I wouldn't say it's too bad; if the whole branch predictor "gets confused for a while" it would be more serious.
Post 18 Feb 2010, 13:10
revolution
I guess AMD felt a little embarrassed and needed a way to fix it without people complaining about creating another overhead of extra instructions.

BTW: it also affects this:
Code:
.loop:
;...
jcc .loop ;<---- prediction problem
ret    
That construct could be quite common in performance code.
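For the fall-through case, a minimal sketch of what the workaround could look like in a small loop (fasm syntax). The routine and its register convention are invented for the example, and it assumes ecx is non-zero on entry.

```asm
use32
; Hypothetical routine: eax = sum of ecx dwords starting at esi.
; Assumes ecx > 0 on entry.
sum_dwords:
        xor     eax, eax
.loop:
        add     eax, [esi]
        add     esi, 4
        dec     ecx
        jnz     .loop           ; conditional branch immediately before the return
        rep ret                 ; two-byte return (F3 C3) instead of a bare C3
```

The only change from the plain version is the prefix on the final return; the loop body is untouched.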
Post 18 Feb 2010, 15:01
asmmsa



AMD noticed it after they released the CPU on the market?

LOL, I want to replace one of those noobs. My skills are lower for now, but at least I wouldn't do such idiocy. How could they miss that? It's not some small company run by amateurs, it's the 2nd biggest one, they can't make such mistakes!


By the way, fasm doesn't support branch hints? Do I have to add the prefixes with db before each conditional jump?
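For reference, emitting a hint prefix by hand with db would look roughly like the sketch below (fasm syntax). The routine and labels are made up, and the comment about which CPUs honour the hint is an assumption based on the Pentium 4 era documentation, not something established in this thread.

```asm
use32
; Hypothetical routine: returns eax = -1 if eax was zero, else 1.
; 2E = hint "not taken", 3E = hint "taken". These were introduced
; with the Pentium 4; other CPUs are generally expected to ignore them.
check_zero:
        test    eax, eax
        db      0x3E            ; hand-emitted hint prefix: branch likely taken
        jz      .zero
        mov     eax, 1
        ret
.zero:
        mov     eax, -1
        ret
```

The db line simply places the prefix byte in front of whatever encoding fasm picks for the jz.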
Post 18 Feb 2010, 15:34
revolution
asmmsa: Modern CPUs are extremely complex. The occasional mistake is to be expected. But AMD have done worse.

http://en.wikipedia.org/wiki/AMD_K10#TLB_Bug
Post 18 Feb 2010, 15:38


Copyright © 1999-2020, Tomasz Grysztar.
