flat assembler
Message board for the users of flat assembler.

What!? Why does this happen....

itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
I was playing around with some speed tests and found something very strange...

This code gets me 4207 milliseconds:
Code:
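include 'win32ax.inc'

.data
    start_time dd ?
    _output rb 6                ; room for up to 5 digits plus the terminating NUL
    counter dd ?
.code
start:
            invoke GetTickCount
            mov [start_time],eax
            xor ecx,ecx             ; ecx = loop counter
            .theloop:
            mov eax,[counter]       ; load counter from memory
            add eax,1
            mov [counter],eax       ; store it back
            add ecx,1
            cmp ecx,1000000000
            jl .theloop             ; 1,000,000,000 iterations
            .exit:
            invoke GetTickCount
            sub eax,[start_time]    ; elapsed milliseconds
            cinvoke wsprintf,_output,"%i",eax   ; wsprintf is cdecl, so use cinvoke
            invoke MessageBox,NULL,_output,"Speed Test",MB_OK
            invoke ExitProcess,0
.end start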
    


But when I add "mov [counter],ecx" to the loop, things SPEED UP to 3370 milliseconds:
Code:
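include 'win32ax.inc'

.data
    start_time dd ?
    _output rb 6                ; room for up to 5 digits plus the terminating NUL
    counter dd ?
.code
start:
            invoke GetTickCount
            mov [start_time],eax
            xor ecx,ecx             ; ecx = loop counter
            .theloop:
            mov eax,[counter]       ; load counter from memory
            add eax,1
            mov [counter],eax       ; store it back
            mov [counter],ecx       ; the added store: overwrite counter with ecx
            add ecx,1
            cmp ecx,1000000000
            jl .theloop             ; 1,000,000,000 iterations
            .exit:
            invoke GetTickCount
            sub eax,[start_time]    ; elapsed milliseconds
            cinvoke wsprintf,_output,"%i",eax   ; wsprintf is cdecl, so use cinvoke
            invoke MessageBox,NULL,_output,"Speed Test",MB_OK
            invoke ExitProcess,0
.end start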
    


What's going on? Why do things speed up when I add the instruction "mov [counter],ecx"? I thought it might have modified ecx somehow, but it also speeds up with "mov [counter],edx" and "mov [counter],ebx".

But what's happening? Why does this make things run faster?
Post 26 Mar 2008, 20:38
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
Instruction pairing? What CPU type and specific OS flavor are you running?
Post 26 Mar 2008, 22:26
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
Just a wild guess, but maybe it was just enough to make the branch prediction work better :). Check the alignment of the loop too. Other than that, no clue.
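A minimal sketch of that alignment check, written against the first program (the second program needs the same two changes, and everything from `.exit:` on stays as it was). It assumes FASM's `align` directive, which pads the skipped bytes with NOP instructions, and a 16-byte code boundary; 16 is only a common guess for the instruction-fetch block size, and other values may matter on a given CPU. It also aligns `counter`: in the original layout it follows the 4-byte `start_time` and the 6-byte `_output`, so it lands at offset 10 of the data section, which is not a multiple of 4.

```asm
.data
    start_time dd ?
    _output rb 6
    align 4                     ; counter would otherwise start at offset 10 (not dword-aligned)
    counter dd ?
.code
start:
            invoke GetTickCount
            mov [start_time],eax
            xor ecx,ecx
            align 16                ; NOP padding, executed once, so the loop starts on a 16-byte boundary
            .theloop:
            mov eax,[counter]
            add eax,1
            mov [counter],eax
            add ecx,1
            cmp ecx,1000000000
            jl .theloop
```

Timing both variants before and after these changes would show whether alignment accounts for any part of the difference.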
Post 26 Mar 2008, 23:33
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17669
Location: In your JS exploiting you and your system
Optimising for speed is always problematic with very tight loops. There are so many things that can affect the loop timing. Alignment in memory, number of instructions in the loop, CPU version, memory usage pattern, etc.

Out-of-order CPUs are very complex, and the only way to be sure you have really optimised a piece of code is to run many different code variants on many different system variants and compare the timings.

But the example above is of course useless and will tell you nothing about what to expect for a genuine function that actually does something useful.
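The advice to compare many variants can be supported within a single program by repeating each measurement and keeping the fastest run, which filters out some interference from other processes. The harness below is a hypothetical sketch built from the first program: the repetition count of 5, the `best_time` variable and the `.run`/`.not_better` labels are illustrative choices, not something from this thread. Its data layout differs from the original (here `counter` is dword-aligned), so its numbers are not directly comparable with the ones in the first post; pasting each loop variant in place of the marked loop and running the result on several machines gives the kind of comparison described above.

```asm
include 'win32ax.inc'

.data
    start_time dd ?
    best_time  dd -1            ; fastest run so far, starts at 0FFFFFFFFh
    counter    dd ?             ; not reset between runs; wrapping around is harmless here
    _output    rb 16
.code
start:
            mov ebx,5               ; number of timed runs (arbitrary)
            .run:
            invoke GetTickCount     ; Win32 API calls preserve ebx
            mov [start_time],eax
            xor ecx,ecx
            .theloop:               ; loop under test: paste each variant here
            mov eax,[counter]
            add eax,1
            mov [counter],eax
            add ecx,1
            cmp ecx,1000000000
            jl .theloop
            invoke GetTickCount
            sub eax,[start_time]    ; elapsed milliseconds for this run
            cmp eax,[best_time]
            jae .not_better         ; unsigned compare against the best so far
            mov [best_time],eax
            .not_better:
            dec ebx
            jnz .run
            cinvoke wsprintf,_output,"%u",[best_time]
            invoke MessageBox,NULL,_output,"Speed Test (best of 5)",MB_OK
            invoke ExitProcess,0
.end start
```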
Post 27 Mar 2008, 01:24
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
Quote:
But the example above is of course useless and will tell you nothing about what to expect for a genuine function that actually does something useful.
Optimist :)
Post 27 Mar 2008, 01:34
r22



Joined: 27 Dec 2004
Posts: 805
Quote:

add eax,1
mov [counter],eax
mov [counter],ecx

To a modern processor, this is effectively just ...
Quote:

add eax,1
mov [counter],ecx

Thanks to dependency checking/optimizations and pipelines etc.

Now,
add eax,1
mov [counter],eax
has a partial stall because ADD modifies eax and MOV then writes it to memory. The processor has to wait for the ADD to finish before it can perform the MOV (this is the general picture; don't start picking it apart on semantics).

This stall doesn't exist in
add eax,1
mov [counter],ecx
The processor can execute these two instructions at the same time, because they are independent of each other.
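The dependency argument can be made concrete by noting, for each instruction in the loop, which earlier result it has to wait for. The annotations below are a sketch of that reading, assuming an out-of-order x86 core that forwards a pending store directly to a later load of the same address (store-to-load forwarding); exact latencies vary by CPU model and are not given here. The labels `.theloop1` and `.theloop2` are hypothetical renames so both loop bodies fit in one listing.

```asm
; Variant 1 (4207 ms in the first post)
.theloop1:
    mov eax,[counter]   ; waits for the previous iteration's "mov [counter],eax"
    add eax,1           ; waits for the load above
    mov [counter],eax   ; waits for the add; the next iteration's load waits for this store
    add ecx,1           ; waits only for the previous "add ecx,1"
    cmp ecx,1000000000
    jl .theloop1
    ; loop-carried chain through memory: load -> add -> store -> load -> ...

; Variant 2 (3370 ms in the first post)
.theloop2:
    mov eax,[counter]   ; waits for the previous iteration's "mov [counter],ecx"
    add eax,1           ; waits for the load above
    mov [counter],eax   ; waits for the add, but is overwritten before anything reads it
    mov [counter],ecx   ; waits only for the current value of ecx
    add ecx,1           ; waits only for the previous "add ecx,1"
    cmp ecx,1000000000
    jl .theloop2
    ; each load now depends on ecx, not on the previous iteration's load
```

In the second variant the load/add/store sequence of one iteration no longer feeds the next one, so consecutive iterations can overlap; the only chain that carries over from iteration to iteration is the single `add ecx,1`.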


Also, it's not a loop alignment issue, because the label
.theloop: is at the same address in both programs (the code before it is identical).
Post 27 Mar 2008, 23:09
daniel.lewis



Joined: 28 Jan 2008
Posts: 92
Hrmm....

Moving/adding/subtracting things into eax has a shortcut form that takes fewer bytes, but this is out of eax, not into it. :p
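The encoding question can be checked by writing out the machine code for the second loop. The byte listing below is a sketch that assumes FASM picks the shortest available encoding for each instruction, which it generally does; `xx xx xx xx` stands for the 32-bit address of `counter`, and the opcodes are the standard IA-32 ones. Under that assumption the short `moffs` forms exist both for loading into eax (A1) and for storing from eax (A3), while the added store from ecx needs the 6-byte ModRM form.

```asm
.theloop:
    mov eax,[counter]    ; A1 xx xx xx xx        5 bytes, short moffs form (into eax)
    add eax,1            ; 83 C0 01              3 bytes, sign-extended imm8
    mov [counter],eax    ; A3 xx xx xx xx        5 bytes, short moffs form (out of eax)
    mov [counter],ecx    ; 89 0D xx xx xx xx     6 bytes, ModRM form (no short form for ecx)
    add ecx,1            ; 83 C1 01              3 bytes
    cmp ecx,1000000000   ; 81 F9 00 CA 9A 3B     6 bytes (1000000000 = 3B9ACA00h)
    jl .theloop          ; 7C rel8               2 bytes
```

Under the same assumptions the first loop is 24 bytes and the second 30 bytes; running a disassembler on the built executable would confirm the actual encodings.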
Post 28 Mar 2008, 08:01
Copyright © 1999-2020, Tomasz Grysztar.