flat assembler
Message board for the users of flat assembler.
How to efficiently exchange register with memory?
Madis731 08 Jan 2008, 09:58
XOR doesn't break dependency??? That's new - I've always thought it did - better get back to the manuals.
EDIT: Oh, I get it: XOR breaks the dependency only on the condition that src=dest. And in this case we don't even have a false dependence, but a real one.
Code:
;First let's see this one:
xor eax,[esi] ; I think it's better to have 2 reads (2R+1W)
xor [esi],eax ; because XOR m32,r32 takes 6 clocks of latency
xor eax,[esi] ; otherwise you'd have a 12-clock theoretical minimum
;and it would end no earlier than 6 clocks after the final instruction.
This version is better because the 6-clock wait on [esi] is finished before the final xor eax,[esi], which then finishes immediately (xor r32,m32 has no additional latency listed). You'll get something like this:
Code:
xor eax,[esi] ; Takes port0 as first vacant
xor [esi],eax ; Starts in the same clock occupying port1,2,3,4
xor eax,[esi] ; on the 7th clock port0 is vacant
; Here you can have instructions operating on eax or [esi] without
; any latency. port1 (and 5 on Core 2), ports 2,3 and 4 are vacant.
Last edited by Madis731 on 08 Jan 2008, 10:24; edited 1 time in total
Vov4ik 08 Jan 2008, 10:14
I think XOR with a memory operand will be split into more than one micro-op: reading, XORing and then writing back. So, in my opinion, MOVing is faster. But it would be better, as Madis731 said, to consult Intel's manuals.
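To make the comparison concrete, here is a minimal C sketch of the two exchange strategies discussed so far: the three-XOR sequence and a plain MOV sequence through a temporary. The function names `exchange_xor`, `exchange_mov` and `check_exchange` are made up for this illustration, and the assembly in the comments (including the use of edx as the temporary) is an assumption about how the MOV variant would look. It only demonstrates that both leave the same state behind; it says nothing about timing.

```c
#include <assert.h>
#include <stdint.h>

/* Exchange a "register" value with a memory cell using three XORs,
   mirroring: xor eax,[esi] / xor [esi],eax / xor eax,[esi]. */
uint32_t exchange_xor(uint32_t reg, uint32_t *mem)
{
    reg  ^= *mem;   /* reg = a ^ b            */
    *mem ^= reg;    /* mem = b ^ (a ^ b) = a  */
    reg  ^= *mem;   /* reg = (a ^ b) ^ a = b  */
    return reg;
}

/* Same exchange using plain moves through a temporary (assumed: edx). */
uint32_t exchange_mov(uint32_t reg, uint32_t *mem)
{
    uint32_t tmp = *mem;  /* mov edx,[esi] */
    *mem = reg;           /* mov [esi],eax */
    return tmp;           /* mov eax,edx   */
}

/* Both variants must end with the register and memory values swapped. */
void check_exchange(void)
{
    uint32_t m1 = 0x11111111u, m2 = 0x11111111u;
    uint32_t r1 = exchange_xor(0xCAFEBABEu, &m1);
    uint32_t r2 = exchange_mov(0xCAFEBABEu, &m2);
    assert(r1 == 0x11111111u && m1 == 0xCAFEBABEu);
    assert(r2 == r1 && m2 == m1);
}
```

The XOR variant touches memory three times (two reads plus one read-modify-write), while the MOV variant needs one load, one store and a spare register, which is the trade-off being argued about here.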
Madis731 08 Jan 2008, 10:28
Heh, and btw from Agner's:
Agner Fog wrote:
EDIT: If you are running out of registers and you absolutely need XCHG r32,m32, then there are other ways around it. Make your application even more memory-accessing and keep your extra "registers" in named memory locations, e.g. r8 dd ?, r9 dd ? etc. When eax needs to be exchanged with one of them, xchg eax,[r8] is not a good option, but a rather fast alternative exists (provided you have at least MMX or even SSE). If you have SSE, prefer it to MMX even if you don't need 128 bits (note that PSHUFD on XMM registers needs SSE2).
Code:
mov [r9],eax
pshufd xmm0,[r8],10110001b ; DCBA will be => CDAB (C,D are ignored in this example)
movq [r8],xmm0 ; Only r8 and r9 are written back (dword+dword=qword)
;movdqa [r8],xmm0 ; Uncomment this if you want to exchange r8<x>r9 | r10<x>r11
mov eax,[r9]
;...
align 16
r8 dd ?
r9 dd ?
r10 dd ?
r11 dd ?
;...
EDIT: Did some calculations...
Code:
                            | uops  | uops each
Instruction                 | fused | 015 | 0   | 1   | 5   | 2   | 3   | 4   | Lat | Recip
==========================================================================================
mov [r_9],eax               | 1     |     |     |     |     |     | 1   | 1   | 3   | 1
pshufd xmm0,[r_8],10110001b | 3     | 2   | x   | x   | 1   | 1   |     |     |     | 1
movq [r_8],xmm0             | 1     |     |     |     |     |     | 1   | 1   |     | 1
(movdqa [r_8],xmm0)         | (1)   | ()  | ()  | ()  | ()  | ()  | (1) | (1) | (3) | (1)
mov eax,[r_9]               | 1     |     |     |     |     | 1   |     |     | 2   | 1
TOTAL:                      | 6     | 2   | 1   | 1   | 1   | 2   | 2   | 2   | 5/8 | 4
6 or 9 clocks (one or two exchanges)
And you are not wasting any registers. Comparable XCHG sequences would take 7 or 14 clocks minimum.
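For anyone who wants to check the shuffle immediate without an assembler, the following is a rough C model of the sequence above, not the SSE code itself. `pshufd_model` and `exchange_via_shuffle` are hypothetical names; the array `cells` stands in for the aligned r8..r11 block, with `cells[0]` as r8 and `cells[1]` as the scratch slot r9. The lane selection follows the documented PSHUFD rule that destination dword i takes the source dword chosen by bits 2i+1:2i of the immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of PSHUFD: destination lane i takes the source lane
   selected by bits [2i+1:2i] of the immediate. */
void pshufd_model(uint32_t dst[4], const uint32_t src[4], uint8_t imm)
{
    uint32_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}

/* Model of the exchange: cells[0] plays r8, cells[1] plays r9 (scratch).
   Returns the new value of eax; cells[0] receives the old eax. */
uint32_t exchange_via_shuffle(uint32_t eax, uint32_t cells[4])
{
    uint32_t xmm0[4];
    cells[1] = eax;                    /* mov [r9],eax                */
    pshufd_model(xmm0, cells, 0xB1);   /* pshufd xmm0,[r8],10110001b  */
    cells[0] = xmm0[0];                /* movq [r8],xmm0 writes only  */
    cells[1] = xmm0[1];                /* the low two dwords          */
    return cells[1];                   /* mov eax,[r9]                */
}
```

With the immediate 10110001b (0xB1) the lanes are swapped pairwise, so the old eax lands in r8 and the old r8 comes back through r9, while r10 and r11 stay untouched as long as only the low qword is stored.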
MCD 09 Jan 2008, 05:46
Madis731 wrote:
The problem with your code is that you assume you can write to r9, so you need additional space beyond r8 (the actual variable), which is not always available.
Madis731 09 Jan 2008, 06:50
The very fact that I write eax to [r9] means that it can be written to. The only assumption is with MOVDQA, where the other 8 bytes are not guaranteed. It's up to the coder (or maybe a macro) to guarantee that r8 and r9 are consecutive.
It's true, though, that alignment is needed because SSE can't read beyond page borders, and there are some other problems: it's a lot of coding and doesn't have much speed benefit, especially over a MOV sequence with a temp register. But using SIMD can do the trick sometimes.
f0dder 09 Jan 2008, 12:15
Keep in mind that XCHG with a memory operand has an implicit bus LOCK... pretty useful with multithreading (keyword: atomic - your solutions aren't).
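As a small illustration of why that atomicity matters, here is a sketch in C11 using `atomic_exchange` from `<stdatomic.h>`. The helper names `exchange_atomic`, `try_lock` and `unlock` are invented for this example. On x86, compilers typically lower `atomic_exchange` on a 32-bit value to an XCHG with a memory operand, but that is a property of the compiler and not something the C standard promises.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomic exchange: storing the new value and fetching the old one happen
   as one indivisible step, so no other thread can slip in between. */
uint32_t exchange_atomic(_Atomic uint32_t *mem, uint32_t reg)
{
    return atomic_exchange(mem, reg);
}

/* Hypothetical lock built on the exchange: whoever swaps in 1 and gets
   0 back owns the lock. A non-atomic swap could hand it to two threads. */
int try_lock(_Atomic uint32_t *lock)
{
    return atomic_exchange(lock, 1u) == 0u;
}

/* Release the lock by storing 0 again. */
void unlock(_Atomic uint32_t *lock)
{
    atomic_store(lock, 0u);
}
```

The XOR, MOV and PSHUFD sequences from earlier in the thread all read and write the memory cell in separate steps, so another thread could modify the cell in between; that is the gap the locked exchange closes.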
Copyright © 1999-2024, Tomasz Grysztar.