flat assembler
Message board for the users of flat assembler.

XOR EAX,EAX

r22 19 Jul 2007, 00:34
IF YOU HAVE A 64 BIT PROCESSOR please run this benchmark and report your results. (If you want to port it to linux so the weirdos can use it please do so).

Code:
format PE64 console
entry start

include '%fasminc%\win64a.inc'

section '.code' code readable executable

start:
sub rsp,8*9
;; increase priority of the process
        call    [GetCurrentProcess] ;returns -1
        mov     rdx,100h ;realtime
        mov     rcx,rax
        call    [SetPriorityClass]
        call    [GetCurrentThread];;returns -2
        mov     rdx,15
        mov     rcx,rax
        call    [SetThreadPriority]

        mov     rcx,_running
        call    [printf]
;;run benchmarks
        mov     rcx,0x7FFFFFFF
        call    TestXor
        mov     rcx,0x7FFFFFFF
        call    TestMov
;;calc percentage
        mov     rax,qword[XORTIME]
        mov     rdx,qword[MOVTIME]
        cvtsi2sd xmm0,rax  ;;xor time
        cvtsi2sd xmm1,rdx  ;;mov time
        movq     xmm2,xmm1
        subsd    xmm2,xmm0 ;;=mov-xor time
        divsd    xmm2,xmm1
        mulsd    xmm2,qword[HUNDRED]
        movq     qword[PERCENT],xmm2
        mov      rcx,_per
        mov      rdx,qword[PERCENT]
        call     [printf]

;;hang console
        xor     ecx,ecx
        xor     edx,edx
        mov     r8,rcx
        mov     r9,rcx
        call    [MessageBox]
        xor     ecx,ecx
        call    [ExitProcess]

align 16
TestXor:
        pop rbp ;; return address -> RBP (nonvolatile, survives printf); RSP stays 16-byte aligned

        ;;;;RCX = COUNTER
        rdtsc
        mov r14d,eax
        mov r15d,edx
        jmp xorlp
align 16
   xorlp: ;;USELESS LOOP
        xor eax,eax
        xor edx,edx
        sub rcx,1
        add rdx,rax ;; do nothing
        test rcx,rcx
        jnz xorlp
   xordn:
        rdtsc
        sub eax,r14d
        sbb edx,r15d
        mov rcx,_func1
        mov r8d, eax
        mov dword[XORTIME],eax
        mov dword[XORTIME+4],edx
        call [printf]

        jmp rbp

align 16
TestMov:
        pop rbp ;; return address -> RBP (nonvolatile, survives printf); RSP stays 16-byte aligned

        ;;;;RCX = COUNTER
        rdtsc
        mov r14d,eax
        mov r15d,edx
        jmp movlp
align 16
   movlp: ;;USELESS LOOP
        mov eax,0x0
        mov edx,0x0
        sub rcx,1
        add rdx,rax ;; do nothing
        test rcx,rcx
        jnz movlp
   movdn:
        rdtsc
        sub eax,r14d
        sbb edx,r15d
        mov rcx,_func2
        mov r8d, eax
        mov dword[MOVTIME],eax
        mov dword[MOVTIME+4],edx
        call [printf]

        jmp rbp


section '.data' data readable writeable
XORTIME dq 0
MOVTIME dq 0
PERCENT dq 0
HUNDRED dq 100.0

_running db 'XOR VS MOV 64-BIT Benchmark Started... (time in processor ticks)',13,10
         db '   Note: Reg32 is used because the upper half of the 64bit register is cleared',13,10,0
_func1 db 'Function1 time (xor r32,r32): 0x%X%08X',13,10,0
_func2 db 'Function2 time (mov r32,0x0): 0x%X%08X',13,10,0
_per   db 'Percentage speed difference : %f%%',0

section '.idata' import data readable writeable

  library kernel32,'KERNEL32.DLL',\
          msvcrt,'MSVCRT.DLL',\
          user32,'USER32.DLL'
      include  "%fasminc%\apia\kernel32.inc"
      include  "%fasminc%\apia\user32.inc"

  import msvcrt,\
         printf,'printf'

section '.reloc' fixups data discardable
    


Results:
XOV VS MOV 64-BIT Benchmark Started... (time in processor ticks)
Note: Req32 is used because the upper half of the 64bit register is cleared
Function1 time (xor r32,r32): 0x1004AB862
Function2 time (mov r32,0x0): 0x180561AA3
Percentage speed difference : 33.315732%

Without having to resort to (my humble opinion) ...
XOR is faster that's all their is to it. If you want to continue using mov to clear your registers then QUIETLY continue being slow/wrong. Very Happy
FrozenKnight 19 Jul 2007, 08:50
I bet that soon one of these guys is going to post about the advantages of using .NET in your asm coding.
tom tobias 19 Jul 2007, 09:58
r22 wrote:
(If you want to port it to linux so the weirdos can use it please do so).
I will acknowledge being weird, and also being unable to spell calendar correctly, but I don't think it is pertinent to this thread for someone to hurl insults at fellow FASM forumers based upon WHICH VERSION of FASM they employ. It took A LOT OF EFFORT to create FASM for each of the different operating systems, and one ought not, in my opinion, dismiss that effort by disparaging someone's choice of platform. Personally, I think that FASM deserves a lot of kudos, SIMPLY FOR HAVING GIVEN SO MANY PEOPLE a CHOICE. FASM offers users the opportunity to bypass the monopolistic tendencies of M$. In my opinion, if you find Linux irrelevant or useless, that discussion belongs in a different thread, but even there, I hope you would provide some data or argument against Linux rather than simply scoff at it.
r22 wrote:

IF YOU HAVE A 64 BIT PROCESSOR please run this benchmark and report your results.


This is surely a step forward in the discussion. Thank you for this contribution. Well done. There are two problems:
a. We need the "plain vanilla" version, i.e. a standard 32-bit CPU, not the more "exotic" 64-bit CPU architecture.
b. We need a real-world application (hint: FASM itself!!!!) which has had the XORs replaced by MOVs. Then we ALSO need ACTUAL TIMES (using the clock on the motherboard, as Michael Abrash showed, many decades ago), not percentages, because if the real-world application runs one third slower with MOV than with XOR, and still executes, FROM the perspective of the end user, in the SAME time, then the fact that it is one third faster with XOR is MEANINGLESS.
CPU architecture has improved execution speeds more than 1000 times during the past three decades. The same cannot be said for software development times, which drag on seemingly forever. My point is all about IMPROVING SOFTWARE IMPLEMENTATION TIMES, not execution times. I have no doubt that your 64-bit task above executes faster with XOR than with MOV; I made no claim to superiority of speed or memory utilization associated with MOV. My argument, which FASM forumers thus far seem to find a tad difficult to comprehend, is that neither EXECUTION SPEED nor memory reduction remains a current focus of interest in developing commercially viable software. Banks, factories, hospitals, i.e. places which use computers and need reliable software, do not have the slightest interest in whether a program written with one "coding" philosophy executes in 439 microseconds while another program, implementing precisely the same task, requires 439 MILLIseconds, i.e. is 1000 times slower, JUST SO LONG as the end user cannot detect any difference. But if the slower-executing program is FAR MORE READILY understood by the programming staff of the hospital or factory, then the contract will be awarded to the software developer who wrote a PROGRAM, instead of the other developer who wrote (much faster executing) CODE. XOR here is not the MAIN point; it serves only to illustrate what is wrong with the "code" written on the FASM forum, including FASM itself.
r22 wrote:
XOR is faster that's all their is to it. If you want to continue using mov to clear your registers then QUIETLY continue being slow/wrong.
I fear you have misunderstood the MAIN point of my argument. I do not claim that XOR is slower. I do not claim that XOR uses more memory. I claim that execution speed and memory usage, both CRITICAL parameters in estimating the worth of a program thirty-five years ago, ARE NO LONGER RELEVANT in real-world applications. Yes, in the gaming world, and perhaps in military simulator training exercises, there may be a few applications for which every nanosecond counts, BUT FOR MOST human applications in commerce, saving 30 nanoseconds is UTTERLY irrelevant.
Smile
revolution 19 Jul 2007, 10:32
vid wrote:
tom: as I have told you countless times, the XOR instruction on x86 is not the same as boolean XOR. Boolean XOR operates on true/false values, but the XOR instruction operates on a set of 32 true/false values and modifies one of its operands. Do not mistake these two.
I think tom tobias already knows that, vid. He is just seeding everyone here; that is why I have ceased to respond to outbursts like the above. Don't feed the trolls!
FrozenKnight 19 Jul 2007, 13:29
tom tobias, here is a list of real-world applications where that may make a difference; unfortunately I don't have the time to do the research or coding to make the examples.

Encryption of large amounts of data, memory checkers, random number generators (I would show you my example of this, but I managed to find a way to remove all 3 of the XORs from it for better optimization.)


But tom, my point doesn't end at XOR. If we were to remove every optimization just to gain readability, then why are we developing in asm? The only advantage of coding in asm over other languages is that you have control over the execution speed and size by using such optimizations. If I just wanted to make a readable program, then C or C++ would work much better. I code in ASM to show off tricks and skills, not to make things easy for the lazy among us. Yes, this has hit me in the a** a few times and it probably will a few more, but big deal. Even when I'm coding in C I sometimes wonder why a line is there and why I put it there. Asm won't change that, and neither will using unoptimized code.

Note to all newbies: coders like tom tobias are why you now need a 2 GHz or better CPU to run any game you buy at the store.
vador 23 Jul 2007, 08:09
I believe that in the future they will be programming games in C# or VB.NET, just because of readability and because the game would ship earlier. That makes me sad...
calpol2004 23 Jul 2007, 13:46
I can't believe we're arguing about something so petty Laughing.

My stance on the subject is this: if you're going to use complex instructions just to save a few bytes, then go ahead, just make sure you comment it, and maybe even comment in the alternative, easier-to-understand method. A lot of people only use assembly for speed and size and couldn't give a rat's ass about readability; if they did, they'd use an HLL.

Tom does have a point however, one of the main problems with assembly is that to a begginner the code looks like mud Confused.

And vador, unfortunately it seems to be going that way, but communities like this will always exist. I feel your pain, however; the college course I'm doing teaches Visual Basic and .NET and all that crap Sad.
kohlrak 23 Jul 2007, 21:57
vador wrote:
I believe that in the future they will be programming games in C# or VB.NET, just because of readability and because the game would ship earlier. That makes me sad...


This is a sad truth.
vid 23 Jul 2007, 22:25
Code:
XOV VS MOV 64-BIT Benchmark Started... (time in processor ticks)
   Note: Req32 is used because the upper half of the 64bit register is cleared
Function1 time (xor r32,r32): 0x10834D66E
Function2 time (mov r32,0x0): 0x10420351E
Percentage speed difference : -1.568697%


feed Smile
kohlrak 23 Jul 2007, 22:30
What about sub eax,eax, or shl eax,32 / shr eax,32? And there's and eax,0... Then there's also mul 0... I'm sure there are others too.
LocoDelAssembly 23 Jul 2007, 22:54
None of them benefits from register renaming, so the processor needlessly waits for the instruction to finish before storing the obvious zero result.

vid, what processor did you use?
vid 23 Jul 2007, 23:00
Quote:
vid, what processor did you use?

Intel Core 2 Duo T5500
tom tobias 24 Jul 2007, 00:06
r22 wrote:
XOR is faster that's all their [sic] is to it.
hmmm...
Shocked
LocoDelAssembly 24 Jul 2007, 00:48
Since the zeroing makes up a third of the total loop code, I wonder why you get such a small negative difference. You did run the test several times, right?
tom tobias 24 Jul 2007, 02:31
Jeff Reilly and Dave Salvator, Intel Corp wrote:
The ideal benchmark uses the applications and performs the operations you need. If this is not the case, you have to assess how representative the benchmark is of your needs....

http://www.automotivedesignline.com/howto/bodyelectronics/197700787;jsessionid=VZ0N0HAJRIY0CQSNDLQCKIKCJUNN2JVN

I think we need a benchmark that tests XOR and MOV, without loops, (that's correct, BIG program, guess what, you have LOTS of memory, USE it!!), and without any overhead, such as accessing the stack, or using call/return.
I believe it is advantageous to develop such a benchmark, here, on the forum, and thereby refute the argument that assembly language is difficult for "begginners [sic] to understand". No, it is clearly difficult, when written as CODE, for anyone to understand.
The notion that one should perform a test initially with a 64-bit CPU seems to me counterproductive; let's first get a good testing program running ON ANY 32-bit x86 CPU.
With regard to LocoDelAssembly's query re: quantity of iterations, the test ought to produce the same result every time: that means NOTHING else is running on the computer, so the notion of "setting priority" is nonsense. Of course such a test CANNOT BE EXECUTED meaningfully under Windows or Linux. It must run in protected mode, in ring zero, not ring 3, with NO OPERATING SYSTEM overhead.
Proper benchmarking is DIFFICULT. The only way to have meaningful results is to create such a huge program that one can measure the time in minutes/seconds with a wristwatch. The moment one starts accessing hardware on the computer, with interrupts, the fragile times required for performing the actual instruction of interest are disturbed by the interrupts. No, a proper test needs to disable all interrupts during the test. Executing the task must be the sole activity of the CPU. To avoid switching times, there should be two identical versions of the same test, one using XOR and the other MOV, and each must be started separately.
Power on, enter protected mode, prompt to start (the user must record the time), disable interrupts, run the program, enable interrupts, and prompt the user upon completion, approximately 10-20 seconds later.
The test itself ought to assign zero to all four registers, EAX, EBX, ECX, EDX, then assign any arbitrary integer to the same four registers, then set them back to zero again, increment the integer, and repeat, but without loops. A truly slick program would create the benchmark, i.e. a program to write a program on the fly.
Smile
handyman 24 Jul 2007, 03:32
Quote:

sbb > SuBtract with Borrow, do you really need a comment for that?


Plue, you are missing the reason for commenting. Comments should not repeat the instruction; instead they should indicate what is happening in the logic flow and/or why the instruction is being executed, if that clarifies the code in your mind. Commenting is needed for when you come back to your own code a year from now and want to know what is happening and why.
LocoDelAssembly 24 Jul 2007, 03:34
I think you risk losing easily that way, Tom Laughing. The prefetching must be really good to keep the processor fed completely all the time.

Can you provide the code that must be repeated across all the available RAM? (assembly code, not descriptive human language).

Note that I think we must still use a loop to make better measurements, but we can unroll the loop so we don't need to iterate more than 4 times on a 1 GB RAM system. The program could be designed to ensure that the body of the loop will be executed N times and will be unrolled to fill all available memory, reducing the loop count from N to N/unrolling_factor.

About the execution environment, I think we could use one of the OSes developed by fasm members and hack and strip it to our needs (apart from starting our own boot code).
DOS386 24 Jul 2007, 06:27
Forum wrote:

Quote:
Replies: 116


My most successful thread Laughing
vid 24 Jul 2007, 08:28
Loco: I ran it twice, with similar results. I can play with it more later.
vador 24 Jul 2007, 08:59
[joke]
I don't have a 64-bit processor, so I'll run this benchmark inside qemu-x64 running itself inside bochs 2.3 running inside VirtualBox running inside VMware Workstation on a Transmeta processor
[/joke]

Smile Smile Smile
Copyright © 1999-2025, Tomasz Grysztar.