flat assembler
Message board for the users of flat assembler.

Instruction times

Michael



Joined: 24 Oct 2005
Posts: 4
Michael 24 Oct 2005, 16:21
Hello everyone,

I am relatively new to assembly. I have been reading Art of Assembly by Randall Hyde. From the book I understand that each instruction takes 1 clock if the operands are registers and the instructions are in cache.
Now I have found this site: http://www.online.ee/%7Eandre/i80386/Opcodes/. There the minimum time for any instruction is 2 clocks and the maximum I have found so far is 41. Which values are correct for current processors?

Thanks in advance.
Michael.
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 24 Oct 2005, 16:56
It depends on the processor, and on later processors also on the state of the processor (what is in cache, whether the destination of a jump was predicted...). The values for all instructions on all processors are just a little too much for anyone to remember.

Michael 24 Oct 2005, 19:10
I don't want to know the exact times for each instruction, just an approximation. Also, do similar instructions under the same conditions have the same speed? For example mov eax,ebx; xor eax,eax; mul ebx; etc.
Is there any documentation about instruction times for newer processors?

Michael.

vid 24 Oct 2005, 19:34
Similar ones are mov, add, sub, cmp, test, xor, or, and. "mul" is much more complicated and takes a long time, and "div" is more complicated still. Yes, similar instructions take similar time. Usually the simpler and older instructions (those that existed on the 8086, 286 and so on) are faster, and it's better to use a combination of such instructions than a newer alternative. So don't waste time learning newer instructions like cmov, bt, etc. if you want speed.
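As a rough illustration of replacing a slow instruction with a combination of simple ones, here is a minimal fasm sketch. The constants 10 and 8 are only examples, and whether this is actually faster depends on the processor, so treat it as something to measure and not as a rule.

```asm
use32                               ; 32-bit code; with no format directive fasm emits a flat binary

; Multiply eax by 10 without mul (only the low 32 bits of the product are produced)
        lea     eax, [eax+eax*4]    ; eax = eax * 5
        add     eax, eax            ; eax = eax * 10

; Unsigned divide of eax by 8 without div
        shr     eax, 3              ; eax = eax / 8, remainder discarded
```

Unlike mul and div, these sequences leave edx untouched, which also frees a register.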

Michael 24 Oct 2005, 20:38
Ok, got it now. Thanks for the quick responses.
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
MazeGen 25 Oct 2005, 12:49
vid wrote:
it's better to use a combination of such instructions than a newer alternative. So don't waste time learning newer instructions like cmov, bt, etc. if you want speed.

Well, not always. For instance, look at this rule from the Intel Optimization Manual:
Quote:

Assembly/Compiler Coding Rule 2. (M impact, ML generality) Use the
setcc and cmov instructions to eliminate unpredictable conditional branches
where possible. [...]
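A minimal fasm sketch of what the rule means in practice, computing a signed maximum with and without a jump. The label name is made up for the example, and cmov needs a P6-class processor (Pentium Pro) or later.

```asm
use32                           ; 32-bit code; with no format directive fasm emits a flat binary

; Variant 1, with a branch: eax = max(eax, ebx), signed
        cmp     eax, ebx
        jge     keep            ; costly when the outcome is effectively random
        mov     eax, ebx
keep:

; Variant 2, branchless with cmov: same result, no jump to mispredict
        cmp     eax, ebx
        cmovl   eax, ebx        ; eax = ebx only if eax < ebx (signed)

; setcc turns a condition into 0 or 1 without a jump:
; ecx = 1 if eax > ebx (signed), otherwise 0
        xor     ecx, ecx        ; clear ecx first: setcc writes only the low byte
        cmp     eax, ebx
        setg    cl
```

The xor is placed before the cmp because xor itself changes the flags. If the branch is well predicted, the jump version can still be the faster one, which is why the rule speaks only of unpredictable branches.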

vid 25 Oct 2005, 13:56
Well, you are better informed here...
What is a "predictable" conditional branch?
Octavio



Joined: 21 Jun 2003
Posts: 366
Location: Spain
Octavio 25 Oct 2005, 14:11
vid wrote:
Well, you are better informed here...
What is a "predictable" conditional branch?

The processor stores information about how often the jump is taken or not taken and makes predictions based on these statistics.
If the jump is taken about 50% of the time, then it is unpredictable.

MazeGen 25 Oct 2005, 14:26
The processor uses a cache called the Branch Target Buffer (BTB) to predict whether a jump will be taken or not, according to the branch history.

A jump is unpredictable if it is taken about as often as it is not taken.
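To make the difference concrete, here is a small fasm sketch of a loop that contains one branch of each kind. The register assignments and label names are assumptions chosen for the example.

```asm
use32                           ; 32-bit code; with no format directive fasm emits a flat binary

; Sum the non-negative dwords of an array.
; Assumed inputs: esi = address of the array, ecx = number of elements (at least 1)
        xor     eax, eax        ; running sum
next_element:
        mov     edx, [esi]
        test    edx, edx
        js      skip            ; depends on the data: hard to predict if the signs are random
        add     eax, edx
skip:
        add     esi, 4
        dec     ecx
        jnz     next_element    ; taken on every iteration except the last: easy to predict
```

The loop branch goes the same way almost every time, so its history is a good guide. The js branch follows the data, so with random signs its history tells the processor very little.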
Hayden



Joined: 06 Oct 2005
Posts: 132
Hayden 26 Oct 2005, 05:23
You can optimize instruction speed by pairing instructions together.
The Pentium has two pipelines, the U-pipe and the V-pipe, and can execute
one instruction in each of them in parallel.

Consider the following scenario...
...
mov eax, [esi] ; U-pipe
add eax, edx   ; V-pipe (depends on eax from the mov)

These two cannot pair, since the add cannot be executed until the
mov has completed. The add has to wait for the next clock and
the V-pipe slot goes unused.

Solution...

...
mov eax, [esi] ; U-pipe
nop            ; V-pipe

add eax, edx   ; U-pipe
...            ; V-pipe

Now both the U-pipe and the V-pipe are filled on each clock. It
is much better to use a more meaningful instruction than the nop,
but I hope this illustrates the idea.
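A sketch of the same idea with useful work in place of the nop: two independent calculations are interleaved so that neighbouring instructions do not depend on each other. It assumes esi and edi point to readable dwords, and the clock counts in the comments are rough figures for a Pentium with all data in cache.

```asm
use32                           ; 32-bit code; with no format directive fasm emits a flat binary

; Unscheduled: the second and fourth instructions each need the result
; of the instruction right before them (about 3 clocks)
        mov     eax, [esi]
        add     eax, edx
        mov     ebx, [edi]
        add     ebx, ecx

; Scheduled: same work, same results, reordered into two pairs (about 2 clocks)
        ; pair 1
        mov     eax, [esi]
        mov     ebx, [edi]
        ; pair 2
        add     eax, edx
        add     ebx, ecx
```

Processors with out-of-order execution do much of this reordering themselves, so the gain there is usually smaller.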

Code and data alignment will also gain some extra speed.

MMX/SIMD code should be aligned to 8 bytes, along with 64-bit code,
but 32-bit code should be aligned to 4 bytes, although align(8) is OK.

16-bit data aligned to 4 bytes -- optimal for 32/64-bit systems.
32-bit data aligned to 4 bytes
64-bit data aligned to 8 bytes
etc...
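In fasm these alignments are written with the align directive. The sketch below is only an illustration; the routine and variable names are made up.

```asm
use32                           ; 32-bit code; with no format directive fasm emits a flat binary

        align 4                 ; routine entry on a 4-byte boundary
sum2:                           ; hypothetical routine: eax = value32 + ecx
        mov     eax, [value32]
        add     eax, ecx
        ret

        align 4
value16 dw 1234h                ; 16-bit datum on a 4-byte boundary
        align 4
value32 dd 12345678h            ; 32-bit datum on a 4-byte boundary
        align 8
value64 dq 0                    ; 64-bit datum on an 8-byte boundary
```

The boundary given to align has to be a power of two.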

Some coding ethics...

Make sure that all data is aligned to the appropriate boundary. When
coding, align to the appropriate boundary and pair the instructions that
can be executed at the same time.

e.g.:
align 4
; pair 1
mov eax, [esp+4]  ; U-pipe
mov ebx, [esp+8]  ; V-pipe
; pair 2
mov ecx, [esp+12] ; U-pipe
add eax, ebx      ; V-pipe

Note: some instructions require more than 1 clock cycle, e.g. mul requires about 3 on some processors and more on others.
After coding a 'mul ebx', eax and edx should not be accessed for at least
that many clock cycles, otherwise the instruction that uses them stalls until the result is ready.
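One way to follow this advice is to place instructions that do not touch eax or edx between the mul and the first use of its result. A minimal sketch, assuming esi points to at least two readable dwords and that ecx and edi are free:

```asm
use32                           ; 32-bit code; with no format directive fasm emits a flat binary

        mul     ebx             ; edx:eax = eax * ebx, result ready some clocks later
        mov     ecx, [esi]      ; independent work: uses neither eax nor edx
        add     ecx, [esi+4]
        lea     edi, [esi+8]
        add     eax, ecx        ; first use of the product
```

How many independent instructions are needed depends on the mul latency of the particular processor.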

I hope this generalizes some CPU characteristics for you. I'm not that good
at explaining these things, but more information can be found on the Intel
website under pairing.

_________________
New User.. Hayden McKay.

vid 26 Oct 2005, 05:54
There was a link to a good tutorial on the old FASM site... does anybody have it?
decard



Joined: 11 Sep 2003
Posts: 1092
Location: Poland
decard 26 Oct 2005, 07:25

vid 26 Oct 2005, 08:22
Yes, but I had the .hlp version of pentopt...

Copyright © 1999-2025, Tomasz Grysztar.