flat assembler
Message board for the users of flat assembler.
Mov DWORD to DWORD
Vasilev Vjacheslav 19 Jan 2006, 15:05
push [dw1]
pop [dw2]
Kinex 19 Jan 2006, 15:39
Thanks!
Really great solution! I didn't think of that. But I have to tell you that this is much slower, at least on my CPU. Hmm, is there another way?
MazeGen 19 Jan 2006, 16:25
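Moving through a temporary register is the fastest way.
The only other (slow) way:
Code:
    mov esi, dw1 ; esi = address of the source dword
    mov edi, dw2 ; edi = address of the destination dword
    movsd        ; copy the dword at [esi] to [edi]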
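As a rough illustration of what the `movsd` variant does, here is a minimal Python sketch. It assumes a flat memory modelled as a `bytearray`, the direction flag clear, and the same base for source and destination segments; the function name `movsd` and its parameters are only stand-ins for the instruction and its registers.

```python
def movsd(memory, esi, edi):
    """Model one `movsd` with the direction flag clear.

    Copies 4 bytes from memory[esi] to memory[edi] and returns the
    advanced (esi, edi) pair, as the real instruction bumps both by 4.
    """
    memory[edi:edi + 4] = memory[esi:esi + 4]
    return esi + 4, edi + 4
```

Note that, unlike a move through a temporary register, this leaves both pointer registers changed afterwards.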
Kinex 19 Jan 2006, 17:32
Thanks! I'll just go this way.
But anyway, I've got another problem here. Does anyone know how to speed this up?
Code:
    use32
    dword1 dd 0
    dword2 dd 1000000000
    for:
        mov eax,[dword1]
        mov edx,[dword2]
        cmp eax,edx
        jc next
        jge outta
    next:
        add [dword1],00000001h
        jmp for
    outta:
I know I could just use cmp eax,1000000000 but I need to have those as variables. I thought of the test instruction or something, but it didn't work.
Thanks for any help,
Kinex
Tomasz Grysztar 19 Jan 2006, 17:53
Kinex wrote: Thanks!
;) There's a way to do it without modifying any register and without touching the stack:
Code:
    mov [dw2],eax ; park the original eax in dw2
    xor eax,[dw1] ; eax = original eax xor dw1
    xor [dw2],eax ; dw2 = dw1
    xor eax,[dw1] ; eax is back to its original value
On my machine it's sometimes faster than PUSH/POP and sometimes slower, depending on the context.
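To see why the four instructions above leave dw2 equal to dw1 and EAX unchanged, here is a minimal Python sketch of the sequence. It only models the arithmetic on 32-bit values; the function name `copy_dword_xor` and the `MASK` constant are made up for illustration, and flags and timing are not modelled.

```python
MASK = 0xFFFFFFFF  # keep every value within 32 bits, like a DWORD


def copy_dword_xor(dw1, dw2, eax):
    """Model the mov/xor/xor/xor sequence; returns (dw1, dw2, eax) afterwards."""
    dw2 = eax & MASK           # mov [dw2],eax
    eax = (eax ^ dw1) & MASK   # xor eax,[dw1]
    dw2 = (dw2 ^ eax) & MASK   # xor [dw2],eax  -> eax ^ eax ^ dw1 = dw1
    eax = (eax ^ dw1) & MASK   # xor eax,[dw1]  -> back to the original eax
    return dw1, dw2, eax
```

For example, `copy_dword_xor(0x12345678, 0xDEADBEEF, 0xCAFEBABE)` returns dw1 unchanged, dw2 equal to dw1, and the original EAX value.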
RedGhost 20 Jan 2006, 02:09
Quote:
You can do
Code:
    mov eax, [dword1]
    cmp eax, [dword2]
They don't both have to be in registers.
Madis731 20 Jan 2006, 08:41
Hi,
I don't mean to offend you by any means, but here are some tips that you might find useful. What you are doing is not wrong, but there are better ways of solving this:
Code:
    use32
    dword1 dd 0
    dword2 dd 1000000000
    for:
        mov eax,[dword1]
        ;mov edx,[dword2]
        ;cmp eax,edx
        cmp eax,[dword2] ; Like some posts before - not much quicker but still shorter
        ;jc next
        ;jge outta
        ; You don't want two comparisons here. When you test for carry and the
        ; jump is not taken, you know it's not smaller (in the unsigned world)
        ; and you don't have to check again whether it is bigger. What you have
        ; done is an unsigned check of whether dword1 is smaller than dword2,
        ; followed by a signed check (jge) that only gives a different answer
        ; when dword1 has its top bit set and dword2 does not. I'm not sure
        ; if you wanted to do that. So there are three paths through these branches.
        jae outta ; same as jnc and you don't need anything else
    ;next:
        add [dword1],00000001h
        jmp for
    outta:
Of course this is not a real-life situation, but I would do the adding in a register and then, once I'm out of the loop, write it back to memory. I know that CPUs have caches, which let you do an operation on a cached memory location in 3 clocks or less, but the fastest way is to make full use of the registers. Why did they put 16 x 64-bit registers in AMD64/EM64T? Think about it.
Oh, and why is the push/pop method slower? I noticed heavy use of it in some quicksort routines that weren't so quick anymore when compared to my optimizations:
Code:
    push [dword1]
    pop [dword2]
    ; Each of these is really a compound instruction, something like:
    mov tempreg,[dword1] ; tempreg is the CPU's internal register
    sub esp,4
    mov [esp],tempreg    ; This one pushed dword1 onto the stack
    mov tempreg,[esp]
    add esp,4
    mov [dword2],tempreg ; This one popped a value from the stack into dword2
The reason why some CPUs are quicker is that they have special circuitry for that. Some have optimized multiplications, others cache efficiency and longer-term optimizations. This example shows why using your own register is better: you can easily drop the stack-pointer adjustments.
The example seems pointless, since you can't break instructions up into smaller ones at the architectural level. That said, Pentium III and up are able to do 3 micro-operations per clock, and maybe (I don't have any source on that) miracles like optimizing at the microcode level can happen.
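The difference between the original `jc`/`jge` pair and the single `jae` can be checked with a small Python model of the flags that `cmp` sets. This is only a sketch under stated assumptions: 32-bit operands, and the helper names `cmp_flags`, `original_path` and `simplified_path` are invented for illustration rather than taken from anyone's code in this thread.

```python
MASK = 0xFFFFFFFF  # 32-bit operands


def cmp_flags(a, b):
    """Return (CF, SF, OF) as a 32-bit `cmp a,b` would set them."""
    a &= MASK
    b &= MASK
    r = (a - b) & MASK
    cf = a < b                             # unsigned borrow
    sf = bool(r >> 31)                     # sign bit of the result
    of = bool(((a ^ b) & (a ^ r)) >> 31)   # signed overflow of a - b
    return cf, sf, of


def original_path(a, b):
    """Follow `jc next` / `jge outta` / fall through into `next:`."""
    cf, sf, of = cmp_flags(a, b)
    if cf:            # jc next: unsigned a < b
        return "next"
    if sf == of:      # jge outta: signed a >= b
        return "outta"
    return "next"     # neither jump taken, execution falls into next:


def simplified_path(a, b):
    """Follow the single `jae outta`."""
    cf, _, _ = cmp_flags(a, b)
    return "next" if cf else "outta"
```

For values below 2^31, such as a counter running from 0 up to 1000000000, both versions take the same path. They differ only when the first operand has its top bit set and the second does not, for example `original_path(0x80000000, 1)` keeps looping while `simplified_path(0x80000000, 1)` leaves the loop.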
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.