flat assembler
Message board for the users of flat assembler.

Accurate multi-threaded 64-bit counters on a 32-bit machine (Examples and Tutorials)

hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
And one should be content with the timing of the oldsafe proc, once it is compared with Intel's recommended solution from Example 8-4 of the optimization manual:
http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf
On my Yorkfield, Intel's version times at least 3x slower!
Using PAUSE to signal entry into the wait loop gives the new safe proc a small improvement, but only an unstable 10%. It may work better on older processors, though.
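For readers without the manual at hand, here is a minimal C sketch of the general lock-based pattern: a test-and-test-and-set spin lock that issues PAUSE while waiting and guards a plain 64-bit add. This is an assumption about the shape of such a solution, not a transcription of Example 8-4; the names `locked_counter`, `locked_add`, `locked_counter_selftest` and `cpu_relax` are hypothetical. It uses C11 `<stdatomic.h>` and the `_mm_pause` intrinsic, with `cpu_relax` falling back to a no-op on non-x86 targets.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()      /* PAUSE: spin-wait hint */
#else
#define cpu_relax() ((void)0)        /* hypothetical fallback for non-x86 */
#endif

/* Hypothetical type: a 64-bit counter guarded by a spin lock. */
typedef struct {
    atomic_int lock;   /* 0 = free, 1 = held */
    uint64_t value;    /* only touched while the lock is held */
} locked_counter;

static void locked_add(locked_counter *c, uint64_t delta)
{
    for (;;) {
        /* Try to take the lock with a single atomic exchange. */
        if (atomic_exchange_explicit(&c->lock, 1, memory_order_acquire) == 0)
            break;
        /* Lock is busy: wait with read-only loads plus PAUSE until it
           looks free, then retry the exchange. */
        while (atomic_load_explicit(&c->lock, memory_order_relaxed) != 0)
            cpu_relax();
    }
    c->value += delta;   /* the compiler emits the add/adc pair for the carry */
    atomic_store_explicit(&c->lock, 0, memory_order_release);
}

/* Usage: single-threaded check that the carry reaches the high dword. */
static void locked_counter_selftest(void)
{
    locked_counter c = {0};
    locked_add(&c, 0xFFFFFFFFu);
    locked_add(&c, 1);
    assert(c.value == 0x100000000ULL);
    assert(atomic_load(&c.lock) == 0);
}
```

Waiting on a plain load instead of repeating the exchange keeps the spinning threads from issuing locked operations on the cache line while another thread holds the lock.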

_________________
⠓⠕⠏⠉⠕⠙⠑
Post 04 May 2013, 19:29
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
While checking for updates on azillionmonkeys today, I had an idea while reading the code there.
It may achieve two benefits:
1) save some power while spinning
2) soften the aggressiveness of the thread, giving other threads a better chance to complete their update
Code:
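 mov esi,ADDER_VALUE and 0xffffffff   ; low dword of the addend
 mov edi,ADDER_VALUE shr 32           ; high dword of the addend

.loop:
  pause                               ; spin-wait hint (executes as NOP on older CPUs)
  times 4 nop                         ; short delay to desynchronize competing threads
  mov eax, dword counter              ; EDX:EAX = expected (current) value
  mov edx, dword counter+4
  mov ebx, eax                        ; ECX:EBX = expected + addend
  mov ecx, edx
  add ebx, esi
  adc ecx, edi
  lock cmpxchg8b  qword counter       ; store ECX:EBX only if counter still equals EDX:EAX
  jnz .loop                           ; another thread got in first: retry

  dec count                           ; one addition done, repeat until count reaches zero
  jnz .loop
  ret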
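For comparison, the same retry loop can be sketched in C with the GCC/Clang `__atomic` builtins. When compiling for 32-bit x86 (i586 or later), an 8-byte `__atomic_compare_exchange_n` is normally emitted inline as `lock cmpxchg8b`; on x86-64 it becomes `lock cmpxchg`. The names `cas_add64`, `cas_add64_selftest` and `cpu_relax` are hypothetical, and unlike the fasm loop this sketch only pauses after a failed attempt and leaves out the NOP padding.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()      /* PAUSE: spin-wait hint */
#else
#define cpu_relax() ((void)0)        /* hypothetical fallback for non-x86 */
#endif

/* Add delta to a shared 64-bit counter with a compare-exchange retry loop.
   Keep the counter 8-byte aligned, so neither the load nor the locked
   compare-exchange straddles a cache line. */
static void cas_add64(uint64_t *counter, uint64_t delta)
{
    uint64_t expected = __atomic_load_n(counter, __ATOMIC_RELAXED);
    /* On failure the builtin stores the value it found into expected,
       just as CMPXCHG8B leaves it in EDX:EAX, so no extra re-read is needed. */
    while (!__atomic_compare_exchange_n(counter, &expected, expected + delta,
                                        0 /* strong CAS */,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        cpu_relax();   /* back off before the next attempt */
}

/* Usage: single-threaded check that the carry reaches the high dword. */
static void cas_add64_selftest(void)
{
    _Alignas(8) uint64_t c = 0xFFFFFFFFu;
    cas_add64(&c, 1);
    assert(c == 0x100000000ULL);
}
```

Pausing only after a failed compare-exchange keeps the uncontended path down to a single locked instruction; whether pausing before every attempt, as the fasm loop does, pays off is something only measurement on the target CPU can settle.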
    
The PAUSE works as a hint to newer processors that they are entering a spin-wait loop;
on older processors it executes as a plain NOP. The 4 NOPs were calculated from the latency/reciprocal throughput of the following instructions.
Their goal is to desynchronize the loop by about half of its own latency, so that the thread "yields" access to the resource to other threads.
Here are the tests on Yorkfield (Quad, 45 nm):

revolution_atom64.exe
Code:
unsafe  Count156602052344132608 Time125
unsafe2 Count200000000000000000 Time1451  <---
oldsafe Count200000000000000000 Time2074
safe    Count200000000000000000 Time1592

unsafe  Count140552233700888576 Time110
unsafe2 Count200000000000000000 Time1482   <---
oldsafe Count200000000000000000 Time1918
safe    Count200000000000000000 Time1358

unsafe  Count140235594251510784 Time109
unsafe2 Count200000000000000000 Time1545   <---
oldsafe Count200000000000000000 Time2012
safe    Count200000000000000000 Time1295
    
More stable results. Then, running 8 threads together, all of them using the proc above and accessing 4 counters:

atom64.exe
Code:
Counter2000000000001000 Time1294
Counter2000000000001000 Time1294
Counter2000000000001000 Time1294
Counter2000000000001000 Time1294

Counter2000000000001000 Time1279
Counter2000000000001000 Time1279
Counter2000000000001000 Time1279
Counter2000000000001000 Time1279

Counter2000000000001000 Time1279
Counter2000000000001000 Time1279
Counter2000000000001000 Time1279
Counter2000000000001000 Time1279    
Very stable, and fast. To confirm this, I managed to run the same tests on
my older P4 650 (Prescott, 90 nm, single core):

revolution_atom64.exe
Code:
unsafe  Count111203394649338880 Time406
unsafe2 Count200000000000000000 Time1454 <---
oldsafe Count200000000000000000 Time2265
safe    Count200000000000000000 Time1578

unsafe  Count109429369441373184 Time422
unsafe2 Count200000000000000000 Time1469  <---
oldsafe Count200000000000000000 Time2265
safe    Count200000000000000000 Time1578

unsafe  Count113975042205012992 Time407
unsafe2 Count200000000000000000 Time1578  <---
oldsafe Count200000000000000000 Time2250
safe    Count200000000000000000 Time1593

unsafe  Count111315688249406464 Time422
unsafe2 Count200000000000000000 Time1484  <---
oldsafe Count200000000000000000 Time2250
safe    Count200000000000000000 Time1594
    
And again, 8 threads accessing 4 counters:

atom64.exe
Code:
Counter2000000000001000 Time593
Counter2000000000001000 Time593
Counter2000000000001000 Time593
Counter2000000000001000 Time593

Counter2000000000001000 Time312
Counter2000000000001000 Time312
Counter2000000000001000 Time312
Counter2000000000001000 Time312

Counter2000000000001000 Time704
Counter2000000000001000 Time704
Counter2000000000001000 Time704
Counter2000000000001000 Time704

Counter2000000000001000 Time453
Counter2000000000001000 Time453
Counter2000000000001000 Time453
Counter2000000000001000 Time453
    
And this is pretty satisfying, IMO.
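Since accuracy is the point of the thread, here is a minimal C sketch of an accuracy check in the spirit of the 8-thread, 4-counter runs above. The harness behind atom64.exe is not shown in this post, so the thread-to-counter mapping, iteration count, addend and all function names here are hypothetical, and the sketch only verifies the final totals; it does not reproduce the timings. It uses POSIX threads and the same compare-exchange loop as before.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)        /* hypothetical fallback for non-x86 */
#endif

/* Hypothetical parameters: 8 threads sharing 4 counters. */
#define NTHREADS   8
#define NCOUNTERS  4
#define ITERATIONS 100000L
#define ADDEND     0x100000001ULL   /* touches both dwords, so a torn update would show */

static_assert(NTHREADS % NCOUNTERS == 0, "each counter needs the same number of writers");

static _Alignas(8) uint64_t counters[NCOUNTERS];

/* Compare-exchange retry loop (lock cmpxchg8b on 32-bit x86). */
static void cas_add64(uint64_t *counter, uint64_t delta)
{
    uint64_t expected = __atomic_load_n(counter, __ATOMIC_RELAXED);
    while (!__atomic_compare_exchange_n(counter, &expected, expected + delta, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        cpu_relax();
}

static void *worker(void *arg)
{
    uint64_t *counter = arg;
    for (long i = 0; i < ITERATIONS; i++)
        cas_add64(counter, ADDEND);
    return NULL;
}

/* Returns 1 if every counter holds the exact expected total, else 0. */
static int run_stress_test(void)
{
    pthread_t tid[NTHREADS];
    const uint64_t expected =
        (uint64_t)(NTHREADS / NCOUNTERS) * (uint64_t)ITERATIONS * ADDEND;
    int started = 0;

    for (int c = 0; c < NCOUNTERS; c++)
        counters[c] = 0;

    /* Thread t works on counter t % NCOUNTERS, so each counter has 2 writers. */
    for (; started < NTHREADS; started++)
        if (pthread_create(&tid[started], NULL, worker,
                           &counters[started % NCOUNTERS]) != 0)
            break;
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    if (started < NTHREADS)
        return 0;

    for (int c = 0; c < NCOUNTERS; c++)
        if (counters[c] != expected)
            return 0;
    return 1;
}
```

Build with `-pthread`; adding `-m32` (where a 32-bit toolchain is installed) exercises the CMPXCHG8B path discussed in this thread.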
Cheers,

Post 06 May 2013, 11:25
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16702
Location: In your JS exploiting you and your system
hopcode: Thanks for the update. But note that your optimisations are probably only sensible for this particular test code. In a real program I doubt that such things would be necessary, and they might even be harmful to performance. Only proper testing would show which. Still, it is good to have alternatives available for people to try.

But perhaps this thread is starting to drift a little away from the original purpose of this topic? The only important thing is accuracy. Having the timings optimised for particular CPU/mobo combinations is not important or interesting unless there is an improvement that works on all CPUs and gives at least double the performance. If not, then all this extra time spent messing about is probably wasted in the long-term scheme of things.
Post 06 May 2013, 12:11
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
Yep, agreed. From my side, I can say those tests were merely a confirmation after some theoretical acquaintance with the CPU specs.
I have sometimes seen professionals doing quick calculations on their fingers without needing to write or test a single line of code,
with results not far from reality. That is what I would like to learn, because there is a lot of
different new and newer hardware out there. Having several manuals and specs doesn't help when all the time is wasted on testing.

Post 06 May 2013, 12:59

Copyright © 1999-2019, Tomasz Grysztar.

Powered by rwasa.