I have a loop that writes to a buffer. What is the fastest way to rotate the buffer, that is, to wrap the write pointer back to the start once it reaches the end? Buffer rotation is an exceptional situation and will occur very rarely.
I have thought about three possible solutions:
Given:
rdx - pointer to buffer
rcx - buffer size
First: jump out of the loop body and back. I guess this is the worst one:
mov rdi, rdx
lea rbx, [rdx + rcx]
continue:
cmp rdi, rbx
jae rotate
do_stuff:
. . .
mov [rdi], ax
add rdi, 2
jmp continue
rotate:
mov rdi, rdx
jmp do_stuff
Second: use cmovcc, with the buffer start pushed on the stack.
mov rdi, rdx
lea rbx, [rdx + rcx]
continue:
push rdx
cmp rdi, rbx
cmovae rdi, [rsp]
pop rdx
do_stuff:
. . .
mov [rdi], ax
add rdi, 2
jmp continue
Third: use cmovcc with the buffer start stored in memory before the loop.
mov rdi, rdx
mov [rbp - 8], rdx
lea rbx, [rdx + rcx]
continue:
cmp rdi, rbx
cmovae rdi, [rbp - 8]
do_stuff:
. . .
mov [rdi], ax
add rdi, 2
jmp continue
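For reference, the wrap logic the variants above implement can be written as a minimal C sketch. The names `advance_branch`, `advance_select` and `ring_fill` are hypothetical and exist only for illustration, and the stored value `i` merely stands in for whatever `do_stuff` leaves in `ax`. Whether a compiler turns the select form into a cmovcc is not guaranteed; it depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch form: mirrors the "cmp / jae rotate" variant. */
static uint16_t *advance_branch(uint16_t *p, uint16_t *start, uint16_t *end)
{
    if (p >= end)       /* rarely true in the scenario described */
        p = start;
    return p;
}

/* Select form: mirrors the cmovcc variants. A compiler may or may not
 * emit a conditional move for this expression. */
static uint16_t *advance_select(uint16_t *p, uint16_t *start, uint16_t *end)
{
    return (p >= end) ? start : p;
}

/* Writes `count` 16-bit values into a buffer of `size_bytes` bytes,
 * wrapping to the start when the end is reached.
 * Returns the write pointer after the last store. */
static uint16_t *ring_fill(uint16_t *buf, size_t size_bytes,
                           size_t count, int use_select)
{
    /* Assumption: the size is a non-zero multiple of the element size. */
    assert(size_bytes >= sizeof(uint16_t));
    assert(size_bytes % sizeof(uint16_t) == 0);

    uint16_t *start = buf;                                /* rdx */
    uint16_t *end = buf + size_bytes / sizeof(uint16_t);  /* rbx */
    uint16_t *p = start;                                  /* rdi */

    for (size_t i = 0; i < count; i++) {
        p = use_select ? advance_select(p, start, end)
                       : advance_branch(p, start, end);
        *p = (uint16_t)i;   /* placeholder for the value in ax */
        p++;                /* add rdi, 2 */
    }
    return p;
}
```

Both forms produce the same buffer contents; they differ only in whether the wrap is a control dependency (branch) or a data dependency (select).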
I remember from optimization guides that it is better to avoid conditional branches in loops. But I'm not sure whether cmovcc is better than a conditional jump when branch prediction will predict "not taken" most of the time. The number of iterations may vary, but most of the time I expect a couple of hundred. Thanks in advance!