I have a loop that writes into a buffer. What is the fastest way to wrap the write pointer back to the start of the buffer (rotate the buffer) when it reaches the end? Rotation is an exceptional case and happens very rarely.
I have thought about three possible solutions:
Given:
rdx - pointer to the start of the buffer
rcx - buffer size in bytes
First: branch out to a separate block and jump back. I suspect this one is the worst.
    
mov rdi, rdx
lea rbx, [rdx + rcx]
continue:
  cmp rdi, rbx
  jae rotate
do_stuff:
  . . .
  mov [rdi], ax
  add rdi, 2
  jmp continue
rotate:
  mov rdi, rdx
  jmp do_stuff
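For comparison, here is a rough C model of this first variant. It is only a sketch: the function name `fill_ring_branch`, the `UNLIKELY` macro and the use of `uint16_t` elements (to match the 2-byte `mov [rdi], ax` stores) are my own assumptions. `__builtin_expect` is a GCC/Clang hint that usually keeps the wrap as a rarely taken forward branch, but the compiler is still free to emit a cmov, so the generated assembly has to be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-probability hint (GCC/Clang only); a hint, not a guarantee of a jump. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical helper mirroring the first variant: write `count` 16-bit
   values into a ring buffer of `size_bytes` bytes, wrapping the write
   pointer with a rarely taken branch. Returns the final write pointer. */
uint16_t *fill_ring_branch(uint16_t *buf, size_t size_bytes,
                           const uint16_t *src, size_t count)
{
    assert(size_bytes >= sizeof *buf);
    uint16_t *end = buf + size_bytes / sizeof *buf; /* lea rbx, [rdx + rcx] */
    uint16_t *p = buf;                              /* mov rdi, rdx */
    for (size_t i = 0; i < count; i++) {
        if (UNLIKELY(p >= end))                     /* cmp rdi, rbx ; jae rotate */
            p = buf;                                /* rotate: mov rdi, rdx */
        *p = src[i];                                /* mov [rdi], ax */
        p++;                                        /* add rdi, 2 */
    }
    return p;
}
```

With a 4-element (8-byte) buffer and six input values, the pointer wraps once and the buffer ends up holding `{5, 6, 3, 4}`, with the write pointer at index 2.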
    
 
Second: use cmovcc, with the buffer start pushed on the stack.
    
mov rdi, rdx
lea rbx, [rdx + rcx]
continue:
  push rdx
  cmp rdi, rbx
  cmovae rdi, [rsp]    ; cmovcc needs a register destination
  pop rdx
do_stuff:
  . . .
  mov [rdi], ax
  add rdi, 2
  jmp continue
    
 
Third: use cmovcc, with the buffer start preinitialized in a stack slot.
    
mov rdi, rdx
mov [rbp - 8], rdx
lea rbx, [rdx + rcx]
continue:
  cmp rdi, rbx
  cmovae rdi, [rbp - 8]
do_stuff:
  . . .
  mov [rdi], ax
  add rdi, 2
  jmp continue
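The second and third variants can be modeled the same way. This sketch forces a `cmp`/`cmovae` pair with GCC/Clang extended inline assembly on x86-64 (AT&T syntax, so the operand order is reversed relative to the Intel syntax above) and falls back to a plain conditional select on other targets; `wrap_cmov` and `fill_ring_cmov` are hypothetical names. Because `cmovcc` accepts a register source, the start pointer can simply stay in a register here, so neither the push/pop nor the `[rbp - 8]` slot is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap `p` back to `start` once it reaches `end`, without a jump.
   x86-64 + GCC/Clang: forced CMP + CMOVAE via inline asm (AT&T syntax).
   Other targets: plain conditional select (compiler picks the code). */
static inline uint16_t *wrap_cmov(uint16_t *p, uint16_t *start, uint16_t *end)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__("cmp %2, %0\n\t"     /* flags from p - end */
            "cmovae %1, %0"      /* if p >= end (unsigned): p = start */
            : "+r"(p)
            : "r"(start), "r"(end)
            : "cc");
#else
    p = (p >= end) ? start : p;
#endif
    return p;
}

/* Hypothetical helper mirroring the second and third variants. */
uint16_t *fill_ring_cmov(uint16_t *buf, size_t size_bytes,
                         const uint16_t *src, size_t count)
{
    assert(size_bytes >= sizeof *buf);
    uint16_t *end = buf + size_bytes / sizeof *buf;
    uint16_t *p = buf;
    for (size_t i = 0; i < count; i++) {
        p = wrap_cmov(p, buf, end);  /* cmp rdi, rbx ; cmovae rdi, <start> */
        *p++ = src[i];               /* mov [rdi], ax ; add rdi, 2 */
    }
    return p;
}
```

One design difference worth keeping in mind: with cmov, the next pointer value depends on the compare result, so the wrap check becomes part of the loop-carried dependency chain, whereas a correctly predicted branch is not on that chain.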
    
 
I remember from optimization guides that it's better to avoid conditional branches in loops. But I'm not sure whether cmovcc is actually faster than a conditional jump when the branch predictor will correctly predict "not taken" almost every time. The number of iterations varies, but it is usually a couple of hundred. Thanks in advance!