flat assembler
Message board for the users of flat assembler.

slowness of 64 bit code

tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 16 Oct 2009, 05:53
I have been working on optimizing some code, and I've noticed that 64 bit code is SLOWER than the corresponding 32 bit programs. Has anyone else noticed this, or am I just missing something?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20301
Location: In your JS exploiting you and your system
revolution 16 Oct 2009, 06:52
Sure, it is slower in the general case. You have to need the 64bit registers before you will see any benefit from it. Adding qword(2) + qword(2) using 64bit code is a good way to make your code larger and slower for absolutely no beneficial purpose.
sinsi



Joined: 10 Aug 2007
Posts: 789
Location: Adelaide
sinsi 16 Oct 2009, 07:13
You can still use EAX etc. One thing I've noticed in a lot of 64-bit code is the use of e.g. "xor rax,rax" when "xor eax,eax" does the same thing.
The only times you need to use a 64-bit gpr is for memory access, handles or some 64-bit maths. Even rip addressing is only 32-bit isn't it?
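To make the xor point concrete, here is a minimal sketch in fasm syntax (the registers are picked only for illustration). In 64-bit mode a write to a 32-bit register zero-extends into the full 64-bit register, so both xor forms clear all of rax, but the 64-bit form carries a REX.W prefix and is one byte longer.

```asm
use64

        xor     eax, eax        ; 2 bytes; the upper 32 bits of rax are zeroed as well
        xor     rax, rax        ; 3 bytes; same result, the REX.W prefix adds one byte

        mov     ecx, [rsi]      ; 32-bit load through a 64-bit pointer, no REX needed
        mov     rcx, [rsi]      ; 64-bit load, REX.W prefix required
        mov     r9d, [rsi]      ; r8-r15 always need a REX prefix, even as 32-bit registers
```

The 64-bit pointer in rsi costs nothing extra here; only the operand size and the choice of r8-r15 bring in the prefix.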

Can you post some code that shows the difference?
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 27 Oct 2009, 05:12
I noticed it when using the upper xmm registers. It might be that it put pressure on the renaming engine, or it could have just been the extra prefix.
Do you know if there are any speed differences between 64bit mode and 32 bit mode? I would like to use the 16 gpr but I only need to work with 32bit words, and my initial testing indicated that there is no speed difference between moving an aligned qword and an aligned dword to and from memory. Is this correct in your experience?
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
MazeGen 27 Oct 2009, 09:02
sinsi wrote:
Even rip addressing is only 32-bit isn't it?

RIP is a 64-bit register, so it addresses the whole 64 bits. You can also use 32-bit EIP addressing (generally not recommended):
Code:
mov eax, [eip]    ; 32-bit address size, encoded with a 67h prefix in 64-bit mode
r22



Joined: 27 Dec 2004
Posts: 805
r22 27 Oct 2009, 14:38
@tthsqe
For op-code / instruction latencies and throughput refer to the Intel (or AMD) 64bit optimization manual.
www.intel.com/products/processor/manuals/

If you care to share the algorithm / code you are using, I'm sure someone here could offer more specific speed / tuning help.
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 27 Oct 2009, 14:41
The only thing I can think of is that 32-bit code takes less space than 64-bit code. That is, using REX prefixes makes your code size differ from the 32-bit version: for example, 15 bytes versus 17 bytes, and all the alignments and caches get messed up. :P
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 27 Oct 2009, 15:29
Madis, are you sure the 15 bytes limit can be exceeded in 64-bit mode? Do you have an example?
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
MazeGen 27 Oct 2009, 15:48
I'm sure it can't.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20301
Location: In your JS exploiting you and your system
revolution 27 Oct 2009, 16:22
LocoDelAssembly: Two consecutive instructions, each with a REX prefix, could push a 15-byte piece of code to 17 bytes.
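A small sketch of that growth in fasm syntax (the instructions and registers are arbitrary, chosen only to show the byte counts). Each REX prefix costs one byte, so two instructions that need it make the sequence two bytes longer, even though no single instruction comes near the 15-byte limit.

```asm
use64

; 32-bit operands with the legacy registers: no REX prefix
        add     eax, ebx        ; 2 bytes
        imul    ecx, edx        ; 3 bytes

; the same operations on 64-bit operands: one REX.W prefix each
        add     rax, rbx        ; 3 bytes
        imul    rcx, rdx        ; 4 bytes
```

Here 5 bytes become 7, the same two-byte shift as the 15 versus 17 case, which is enough to move whatever follows across an alignment boundary.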
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 27 Oct 2009, 18:20
@revolution: That's exactly what I meant. When "just" switching to 64-bit mode (i.e. without taking care of additional prefixes) you might be hit with a surprise of a few more bytes here and there.
Additionally, there are problems with absolute 32-bit addresses, where one would need to add an extra instruction (lea reg,[addr] or mov reg,addr).
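A sketch of the extra instruction in fasm syntax, assuming a hypothetical label `table` that is not from any real program. In 32-bit code an indexed access can embed the absolute address directly. In 64-bit mode the displacement is still only 32 bits, so code that may be loaded anywhere typically fetches the address into a register first.

```asm
; 32-bit code can index the table directly:
;       mov     eax, [table + ecx*4]

use64

        lea     rbx, [table]            ; fasm generates RIP-relative addressing here
        mov     eax, [rbx + rcx*4]      ; index through the register instead
        ret

table   dd 1, 2, 3, 4                   ; hypothetical data, for illustration only
```

The lea is the one extra instruction; inside a loop it can be hoisted out, so the cost is usually a register rather than time.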
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 28 Oct 2009, 00:04
So do you think 32bit code should run just as fast in 64bit mode as it does in 32bit mode? There seem to be some problems with push's and pop's and inc's and dec's...
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20301
Location: In your JS exploiting you and your system
revolution 28 Oct 2009, 01:00
tthsqe wrote:
So do you think 32bit code should run just as fast in 64bit mode as it does in 32bit mode? There seem to be some problems with push's and pop's and inc's and dec's...
32bit instructions/registers will likely run faster in 64bit mode than 64bit instructions (as long as your data requirement is all 32bit or less, of course). However, compared to 64bit mode, 32bit mode will run the 32bit code fastest. The exception is a system with many tasks and >4GB of memory, where an underlying 64bit OS running the 32bit code in compatibility mode would perhaps be best.

Of course if you are processing 64bit data then you should use 64bit instructions/registers, that is what 64bit is for.
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 03 Dec 2009, 21:57
sry about that last post. I forgot to change ebp to rbp so everything in the floating point section was being referenced by ebp. oops. With this change, the performance is the same.
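For anyone hitting the same thing, a sketch of the difference in fasm syntax (the offset is hypothetical, not taken from my code). In 64-bit mode a 32-bit base register is still accepted, but every such memory operand gets a 67h address-size prefix and the address is computed in 32 bits.

```asm
use64

        fld     qword [ebp + 8]         ; 4 bytes: 67h prefix, address computed in 32 bits
        fld     qword [rbp + 8]         ; 3 bytes: native 64-bit addressing
```

The ebp form only reaches the right memory when the upper half of rbp happens to be zero, so it is easy to miss in testing.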

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.