flat assembler
Message board for the users of flat assembler.

Index > Windows > nop

eskizo



Joined: 22 Nov 2005
Posts: 59
eskizo 23 Jun 2009, 18:29
Hello!

I was wondering whether nop (no operation) actually spends clock cycles on the processor. Could you test these two snippets for me? I don't know how to build such a "time/clocks spent checker" myself.

Code:
;code1

mov ecx, 0xFFFFFFFF
@@:
loop @b
    


Code:
;code2

mov ecx, 0xFFFFFFFF   ; same iteration count as code1, so the two timings are comparable
@@:
nop
loop @b
    
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 23 Jun 2009, 18:48
On the vast majority of processors the NOP will not add any extra clock latency per iteration. The reason is that the processor is capable of handling both instructions in parallel. If you replace nop with "xor eax, eax" you'll probably still get the same times.

As for whether NOP consumes a clock cycle, it depends, but it certainly uses processor resources, so it may make a loop take longer to complete if instructions that are ready to execute have to wait for the NOP to release an execution unit (some AMD processors can avoid sending NOPs to the execution units, but they still consume processor resources).
eskizo



Joined: 22 Nov 2005
Posts: 59
eskizo 23 Jun 2009, 19:09
Thanks Loco. By the way, what's the best way to build a "time/clocks spent checker"? I mean, how do I check how many nanoseconds a block of code takes?
arigity



Joined: 22 Dec 2008
Posts: 45
arigity 23 Jun 2009, 19:33
The Intel optimization manual has a section on NOPs.
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 23 Jun 2009, 19:49
eskizo wrote:
Thanks Loco. By the way, what's the best way to build a "time/clocks spent checker"? I mean, how do I check how many nanoseconds a block of code takes?
If you don't mind measuring how many CPU cycles it takes instead of how much time has passed, use RDTSC. It is accurate down to a single clock cycle AFAIK (which on modern CPUs is a fraction of a nanosecond).
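A minimal sketch of that idea in fasm, assuming a 32-bit build that you run under a debugger (it stops at int3 with the result in registers). The iteration count and the choice of ESI/EDI as scratch registers are arbitrary picks for illustration, and the CPUID before the first RDTSC is just one common way to keep earlier instructions from leaking into the measurement:

```asm
format pe gui 4.0

; Sketch: time the nop loop from the first post with RDTSC.
; Run under a debugger; at int3, EDX:EAX holds the elapsed counter ticks.

        xor     eax, eax
        cpuid                   ; serializing instruction (clobbers EAX, EBX, ECX, EDX)
        rdtsc                   ; EDX:EAX = time-stamp counter at start
        mov     esi, eax        ; save start, low dword
        mov     edi, edx        ; save start, high dword

        mov     ecx, 1000000    ; iteration count (arbitrary, kept small on purpose)
@@:
        nop                     ; remove this line to time the empty loop instead
        loop    @b

        rdtsc                   ; EDX:EAX = time-stamp counter at end
        sub     eax, esi
        sbb     edx, edi        ; EDX:EAX = end - start as a 64-bit value
        int3                    ; break into the debugger to read the result
```

Dividing the result by the iteration count gives ticks per iteration. Note that the counter ticks are not guaranteed to match core clock cycles on every CPU, so treat the numbers as relative.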
windwakr



Joined: 30 Jun 2004
Posts: 827
windwakr 23 Jun 2009, 19:59
If you do any sort of timing you should set the priority of your program to high or realtime.

EDIT: fixed

OK, if you need a good article on RDTSC timing, read THIS.


Last edited by windwakr on 23 Jun 2009, 20:18; edited 3 times in total
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 23 Jun 2009, 20:04
With any kind of timing, no?
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 27 Jun 2009, 12:35
Timings with RDTSC are usually only accurate up to the point where a context switch occurs. That is the primary reason for realtime priority. As a rule of thumb, measure code that finishes in less than one millisecond, which is about 2,000,000 clocks on a 2 GHz CPU. If you measure with RDTSC, you need to be aware that for any code that takes fewer clocks than the bus multiplier, you will see either 0 or the bus multiplier.

Example: 2GHz T7200 CPU has a 166.3MHz bus and a 12x multiplier. Time difference between 2 RDTSCs can be 0, 12, 24, 36, 48 and you cannot get more accurate than that.
1) If you test your code 10 times and you get the results 0,12,12,0,0,0,12,12,0,12, you can be pretty sure that it took 6 clocks (five readings of 12 out of ten average to 6).
2) If you want to be *really* sure though, you should loop your code, say, 1024 times and then do SHR [result],10 :) When you get 6, it is usually 6±0.5 clocks.
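Point 2 could look roughly like this in fasm. This is only a sketch: the nop stands in for the code under test, which must not touch ECX or ESI, and the loop's own dec/jnz overhead is included in the average:

```asm
format pe gui 4.0

; Sketch: run the code under test 1024 times between one pair of RDTSCs,
; then divide by 1024 with a shift. Run under a debugger and read EAX at int3.

        rdtsc
        mov     esi, eax        ; start count (low dword is enough for short spans)

        mov     ecx, 1024       ; 1024 = 2^10, so the average is a shift by 10
@@:
        nop                     ; code under test goes here (placeholder)
        dec     ecx
        jnz     @b

        rdtsc
        sub     eax, esi        ; total ticks for all 1024 passes
        shr     eax, 10         ; average ticks per pass (rounded down)
        int3
```

Only the low 32 bits of the counter are used, which is fine as long as the whole run takes fewer than 2^32 ticks.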
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 27 Jun 2009, 16:29
Quote:

If you measure with RDTSC, you need to be aware that for any code that takes fewer clocks than the bus multiplier, you will see either 0 or the bus multiplier.

Example: 2GHz T7200 CPU has a 166.3MHz bus and a 12x multiplier. Time difference between 2 RDTSCs can be 0, 12, 24, 36, 48 and you cannot get more accurate than that.

Why? Have you really seen two consecutive RDTSCs give exactly the same count???
windwakr



Joined: 30 Jun 2004
Posts: 827
windwakr 27 Jun 2009, 16:52
Code:
rdtsc
mov ebx,eax
rdtsc
    


The two readings always differ by either 105 or 98, weird.

Madis731 wrote:

Example: 2GHz T7200 CPU has a 166.3MHz bus and a 12x multiplier. Time difference between 2 RDTSCs can be 0, 12, 24, 36, 48 and you cannot get more accurate than that.


OK, my multiplier goes into 98 exactly 7 times, but what about 105? That is 7.5 times, so how do you explain that?


BTW: That processor was only $2! It came with some junk motherboard we bought at a flea market.


[Attached image: my_cpu.png]



LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 27 Jun 2009, 17:17
After trying this code countless times:
Code:
format pe gui 4.0

mov edi, 3

align 16

.loop:
rdtsc
mov ebx, eax
rdtsc
sub eax, ebx

dec edi
jnz .loop
int3     

I always get EAX=5.

CPUID reports this:
Code:
Name: AMD Athlon 64 3200+
Code Name: Venice
Core Speed: 2009.4 MHz
Multiplier: 10.0x
Bus Speed: 200.9 MHz
HT Link: 1004.7 MHz    
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 27 Jun 2009, 21:10
Madis731 wrote:
Timing with RDTSC
Timing with anything, no??? Or are you saying the Windows API functions are somehow more precise and less vulnerable to context switching?


windwakr wrote:
Code:
rdtsc
mov ebx,eax
rdtsc
    
Try putting a bunch of other instructions in there which don't modify ebx or eax, then time it without the mov ebx,eax, and again with it. Subtract one result from the other to find more accurately how many clocks it uses.
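A rough sketch of the same subtraction idea, applied here to the measurement overhead itself: time the bare RDTSC pair first, then the same pair with the code under test inside, and subtract. The "times 16 nop" line is only a hypothetical stand-in for the code being measured, and ESI is an arbitrary scratch register:

```asm
format pe gui 4.0

; Sketch: subtract the harness overhead from a measurement.
; Run under a debugger and read EAX at int3.

        ; pass 1: the harness alone
        rdtsc
        mov     ebx, eax
        rdtsc
        sub     eax, ebx
        mov     esi, eax        ; ESI = baseline ticks

        ; pass 2: the same harness with the code under test inside
        rdtsc
        mov     ebx, eax
        times 16 nop            ; code under test (stand-in), must not modify EBX or ESI
        rdtsc
        sub     eax, ebx        ; ticks for harness plus code

        sub     eax, esi        ; EAX = ticks attributable to the code under test
        int3
```

A single pass is noisy, so the result can come out slightly off or even negative. Repeating both passes several times and comparing the smallest values is a common way to steady it.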

Copyright © 1999-2025, Tomasz Grysztar.