jmurray 28 Apr 2020, 20:39
I'm trying to implement a method to measure memory latency for my university project. The procedure takes two arguments: rcx (a pointer to two longs that receive the timestamp counts) and rdx (a pointer to the memory location to read).

I need both of these to be available after a CPUID call, to see whether such a call affects memory latency. Previously I solved this with a push/pop of rcx, but I am unable to push/pop both rcx and rdx: I get an "invalid value" error for "push rdx".

The basic method is as follows: rdtsc > CPUID 0 > read from [rdx] > rdtsc > return both rdtsc results

The code is below:

proc MeasureMemoryCPUID uses rcx, rdx ;rcx and rdx need to be push'ed before CPUID
     push rdx
     ;measure cycles
     rdtsc
     mov [rcx],eax
     add rcx,4
     mov [rcx],edx
     add rcx,4
     ;code to measure
     push rcx
     mov eax,0H
     cpuid
     pop rdx
     mov eax, dword [rdx];measuring the leading dword of [rdx] into eax
     pop rcx
     ;measure cycles
     rdtsc
     mov [rcx],eax
     add rcx,4
     mov [rcx],edx
     add rcx,4
     mov rax,rcx
     ret
endp

I am unsure how to implement this correctly.
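For comparison, here is a minimal C sketch of the same rdtsc > CPUID 0 > read > rdtsc sequence. It assumes GCC or Clang on x86-64 (the __cpuid macro from cpuid.h and __rdtsc from x86intrin.h), and the name measure_memory_cpuid is hypothetical. Because __cpuid declares EAX, EBX, ECX and EDX as outputs, the compiler itself saves whatever it needs around the instruction, so no manual push/pop of rcx and rdx is required.

```c
#include <cpuid.h>     /* __cpuid (GCC/Clang, x86 only) */
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc */

/* Hypothetical C counterpart of MeasureMemoryCPUID.
   out[0] and out[1] receive the two TSC readings; the function returns
   the dword read from *p so the load cannot be optimized away. */
static uint32_t measure_memory_cpuid(uint64_t out[2],
                                     const volatile uint32_t *p)
{
    unsigned int eax, ebx, ecx, edx;
    uint32_t value;

    out[0] = __rdtsc();                /* first timestamp */
    __cpuid(0, eax, ebx, ecx, edx);    /* CPUID leaf 0; compiler preserves live registers */
    (void)eax; (void)ebx; (void)ecx; (void)edx;
    value = *p;                        /* the memory read being timed */
    out[1] = __rdtsc();                /* second timestamp */
    return value;
}
```

Usage: after uint64_t t[2]; measure_memory_cpuid(t, &some_dword);, the difference t[1] - t[0] is the elapsed TSC count, which includes the CPUID itself. The sketch assumes the compiler keeps the volatile read between the two intrinsic calls, which GCC and Clang do in practice but the C standard does not promise.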
revolution 28 Apr 2020, 22:09
Remove the comma after rcx.
proc ... uses rcx rdx    
Also, if you want your code section to display correctly, don't tick "Disable BBCode in this post" when posting. I have edited your post to fix it.
fpissarra 29 Apr 2020, 14:24
If you want to "measure the latency", you should follow Intel's recommendations. For example, they recommend serializing the processor BEFORE reading the TSC. You could also use MFENCE before the serialization:

  push rbx  ; The MS x64 ABI requires RBX, RSI and RDI to be preserved (CPUID clobbers RBX).
  push rsi
  push rdi

  mfence    ; memory sync.

  xor eax,eax  ; serialize processor.
  cpuid
  rdtsc
  ; store EDX:EAX somewhere... (probably 1 additional cycle).
  mov  esi,eax
  mov  edi,edx

  ; ... instructions to measure here ... (RSI and RDI must be preserved!)

  ; OBS: You should serialize the processor again here to avoid
  ; reordering effects.
  xor  eax,eax
  cpuid         ; 20+ cycles, maybe?
  rdtsc
  ; subtract EDX:EAX from previously obtained EDX:EAX.
  sub  eax,esi
  sbb  edx,edi
  shl  rdx,32
  or   rax,rdx

  ; OBS: You could subtract the additional cycles from RAX here.
  ;   sub  rax,21    ; 21 cycles, maybe?
  ;   jns  .not_negative
  ;   xor  rax,rax   ; clamp to zero.
  ; .not_negative:

  pop rdi
  pop rsi
  pop rbx
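A rough C equivalent of this recipe, assuming GCC or Clang on x86-64: _mm_mfence and __rdtsc come from x86intrin.h and __cpuid from cpuid.h. The helper names (serialized_rdtsc, measure_cycles, empty_region, estimate_overhead) are hypothetical. Instead of hard-coding the guessed 21 cycles, estimate_overhead times an empty region several times and keeps the smallest result; that calibration step is an assumption of this sketch, not part of the post.

```c
#include <cpuid.h>     /* __cpuid (GCC/Clang, x86 only) */
#include <stddef.h>    /* NULL */
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, _mm_mfence */

/* CPUID leaf 0 as a serializing instruction, then read the TSC
   (the "xor eax,eax / cpuid / rdtsc" lines of the assembly). */
static inline uint64_t serialized_rdtsc(void)
{
    unsigned int eax, ebx, ecx, edx;
    __cpuid(0, eax, ebx, ecx, edx);
    (void)eax; (void)ebx; (void)ecx; (void)edx;
    return __rdtsc();
}

/* Hypothetical helper: TSC ticks spent running fn(arg), minus overhead,
   clamped at zero like the optional "sub rax,21 / jns / xor rax,rax" tail. */
static uint64_t measure_cycles(void (*fn)(void *), void *arg, uint64_t overhead)
{
    _mm_mfence();                          /* memory sync */
    uint64_t start = serialized_rdtsc();   /* serialize, then first timestamp */
    fn(arg);                               /* instructions to measure */
    uint64_t end = serialized_rdtsc();     /* serialize again, then second timestamp */
    uint64_t elapsed = end - start;        /* unsigned 64-bit difference */
    return elapsed > overhead ? elapsed - overhead : 0;
}

/* Hypothetical empty region used only for calibration. */
static void empty_region(void *arg)
{
    (void)arg;
}

/* Hypothetical calibration: smallest tick count observed for the empty
   region over runs attempts (runs must be >= 1). */
static uint64_t estimate_overhead(int runs)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t t = measure_cycles(empty_region, NULL, 0);
        if (t < best)
            best = t;
    }
    return best;
}
```

Usage: uint64_t ov = estimate_overhead(1000); followed by measure_cycles(my_region, my_arg, ov);, where my_region and my_arg stand for whatever code is being timed. The calibration also absorbs the cost of the indirect call, since both the empty region and the measured region go through the same function pointer.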

But be aware that using TSC to measure latency isn't a precise method...
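The sub eax,esi / sbb edx,edi / shl rdx,32 / or rax,rdx lines in the code above perform a 64-bit subtraction in 32-bit halves. The small C model below (the name tsc_diff_halves is hypothetical) reproduces that step with an explicit borrow, which makes it easy to check that the result matches a plain 64-bit subtraction even when the low half wraps.

```c
#include <stdint.h>

/* Model of the sub/sbb/shl/or tail: subtract the start pair hi0:lo0 from
   the end pair hi1:lo1 one 32-bit half at a time, then pack the result. */
static uint64_t tsc_diff_halves(uint32_t hi1, uint32_t lo1,
                                uint32_t hi0, uint32_t lo0)
{
    uint32_t lo = lo1 - lo0;              /* sub eax,esi (wraps modulo 2^32) */
    uint32_t borrow = lo1 < lo0;          /* carry flag left by sub */
    uint32_t hi = hi1 - hi0 - borrow;     /* sbb edx,edi */
    return ((uint64_t)hi << 32) | lo;     /* shl rdx,32 / or rax,rdx */
}
```

For example, an end count of 0x100000005 and a start count of 0xFFFFFFFE give 7: the low half wraps to 7 and the borrow cancels the 1 in the high half. In C the split is unnecessary, since __rdtsc() already returns the full 64-bit value and end - start gives the same result.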
