flat assembler
Message board for the users of flat assembler.
![]() Goto page 1, 2, 3 Next |
Author |
|
bitRAKE 07 Mar 2013, 14:53
Code: proc unsafe_thread count mov eax,1 .loop: lock add dword [counter+0],eax jnc @F add dword [counter+4],eax @@: dec [count] jnz .loop ret endp Doesn't mean anything though. Last edited by bitRAKE on 07 Mar 2013, 15:05; edited 1 time in total |
|||
![]() |
|
revolution 07 Mar 2013, 15:02
bitRAKE wrote:
|
|||
![]() |
|
revolution 07 Mar 2013, 15:04
using xadd fails:
Code: Count:1982888850292736 Time:78 Count:2000000000000000 Time:187 Edit: Test code fragment: Code: ADDER_VALUE = 1000000000 proc safe_thread count mov esi,ADDER_VALUE and 0xffffffff mov edi,ADDER_VALUE shr 32 .loop: atomic_64_bit_add [counter] dec [count] jnz .loop ret endp proc unsafe_thread_bitRAKE count .loop: mov eax,ADDER_VALUE lock xadd dword[counter+0],eax jnc .skip add dword[counter+4],1 .skip: dec [count] jnz .loop ret endp Last edited by revolution on 07 Mar 2013, 15:09; edited 1 time in total |
|||
![]() |
|
bitRAKE 07 Mar 2013, 15:07
No, I only post INC thread.
![]() (You change it to make it fail.) |
|||
![]() |
|
revolution 07 Mar 2013, 15:17
bitRAKE wrote: (You change it to make it fail.) |
|||
![]() |
|
bitRAKE 07 Mar 2013, 15:20
It's impossible to design against any failures. Each problem should be assessed for the level of tolerance desired in that situation. We have not designed a dependent thread which gets held up, and we lack the processing power to overflow a 32-bit number in sufficient time. (with INC)
|
|||
![]() |
|
revolution 07 Mar 2013, 15:28
I'm designing against program failures. The code is functionally correct. We can't ask for more than that from code. If the surrounding CPU system has hardware problems then no amount of code can hope to correct for such things. Intel/AMD guarantee the functional correctness of "lock cmpxchg8b" by design. But they can't vouch for someone's flaky PSU altering the memory read values.
|
|||
![]() |
|
bitRAKE 07 Mar 2013, 15:49
The lock cmpxchg8b method is considerably faster than using xchg:
Code: _counter dd counter proc safe_thread count .loop: xor ecx,ecx @@: xchg ecx,[_counter] jecxz @B add dword [ecx+0],1 adc dword [ecx+4],0 mov [_counter],ecx dec [count] jnz .loop ret endp ; cmpx: 4302467295 Time:842 ; xchg: 4302467295 Time:2090 |
|||
![]() |
|
revolution 08 Mar 2013, 00:18
bitRAKE wrote: The lock cmpxchg8b method is considerably faster than using xchg: |
|||
![]() |
|
randall 31 Mar 2013, 02:58
Very fast and simple atomic increment:
Code: mov eax, 1 lock xadd [mem], eax Adds one to [mem] and returns previous value stored in [mem]. |
|||
![]() |
|
revolution 31 Mar 2013, 03:03
randall wrote: Very fast and simple atomic increment: |
|||
![]() |
|
bitRAKE 26 Apr 2013, 12:55
This lockless method is consistently faster on my system. Not as complex as it looks - bit lock on pieces and partitioned the counter into 16 pieces. Each thread operates on the opposite bit state. This is done so that each thread isn't just counting by itself - that would be too easy. So, there is actual forced contention for resources.
Edit: I should also say that it's guaranteed to work because of memory ordering rules on the bus.
|
|||||||||||
![]() |
|
revolution 26 Apr 2013, 13:16
bitRAKE wrote: This lockless method is consistently faster on my system. Not as complex as it looks - bit lock on pieces and partitioned the counter into 16 pieces. Each thread operates on the opposite bit state. This is done so that each thread isn't just counting by itself - that would be too easy. So, there is actual forced contention for resources. Aside: I usually find that unless a new method is provably correct (or at least equal to the previous method it replaces) then the small increase in speed we might temporarily get is easily more than outweighed by the enormous loss of productivity we encounter when the new method produces incorrect results. |
|||
![]() |
|
bitRAKE 26 Apr 2013, 13:22
Sorry, I made an edit. It is guaranteed to work. Vol3, Ch8 in the manual explains the memory ordering rules. It's lockless in that a LOCK# signal is not sent across the bus. Which can have a huge impact in performance. It's not a small gain - it can be over twice faster.
|
|||
![]() |
|
revolution 26 Apr 2013, 13:24
Oh, I just saw this.
bitRAKE wrote: Edit: I should also say that it's guaranteed to work because of memory ordering rules on the bus. |
|||
![]() |
|
revolution 26 Apr 2013, 13:27
bitRAKE wrote: Sorry, I made an edit. It is guaranteed to work. Vol3, Ch8 in the manual explains the memory ordering rules. It's lockless in that a LOCK# signal is not sent across the bus. Which can have a huge impact in performance. It's not a small gain - it can be over twice faster. I don't care about the performance gain if it can only be guaranteed to work on a subset of systems. |
|||
![]() |
|
bitRAKE 26 Apr 2013, 13:45
Unless Intel is lying in the manual then it should work in general (Pentium+). I've been using Dual socket mobos since my overclocked BP6.
|
|||
![]() |
|
revolution 26 Apr 2013, 14:14
bitRAKE wrote: Unless Intel is lying in the manual then it should work in general (Pentium+). I've been using Dual socket mobos since my overclocked BP6. Anyhow, I also think you misread the manual with regard to bus ordering. I can't find anywhere that is stated that separated reads and writes are guaranteed not to be interleaved between threads. Also, the split 32-bit reads and writes you use have no guarantee that another thread or CPU won't take the bus in between operations. Aligning the data only guarantees the basic operation is atomic. There is no guarantee that multiple operations of smaller size are implicitly locked between accesses. The fact that you couldn't fault it with your test does not mean that it is working the way that you think it is. |
|||
![]() |
|
bitRAKE 26 Apr 2013, 15:07
Using the notation from the manual with a slight modification:
Code: READ A.1.1 READ A.1.0 READ B.1.0 READ C.1.0 WRITE B.1.0 WRITE C.1.0 READ A.1.0 WRITE A.1.0 READ A.2.0 READ A.2.1 READ B.2.1 READ C.2.1 WRITE B.2.1 WRITE C.2.1 READ A.2.1 WRITE A.2.1 Last edited by bitRAKE on 26 Apr 2013, 15:21; edited 1 time in total |
|||
![]() |
|
Goto page 1, 2, 3 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.