flat assembler
Message board for the users of flat assembler.
> Main > 64bit RingBuffered Array QUEUE !!!!
r22 16 Oct 2008, 01:36
CMPXCHG is 3 clocks for (mreg, reg) and 5 for (mem, reg). I don't know if the LOCK adds more latency even if there's no contention.
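For reference, a minimal fasm sketch of the kind of LOCK CMPXCHG retry loop these timings apply to (queue_tail and the labels are invented for illustration, not taken from the queue code in this thread):

Code:
; atomically advance a shared index with a LOCK CMPXCHG retry loop
advance_tail:
        mov     rax, [queue_tail]        ; expected old value
.retry:
        lea     rdx, [rax+1]             ; desired new value
        lock cmpxchg [queue_tail], rdx   ; store rdx only if [queue_tail] = rax
        jnz     .retry                   ; on failure rax holds the fresh value
        ret

queue_tail dq 0                          ; shared 64-bit index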
LocoDelAssembly 16 Oct 2008, 02:04
XCHG takes 16 cycles according to the AMD optimization manual (Issue Date: September 2005). I guess the implicit LOCK prefix adds the extra latency, but still, is CMPXCHG actually slower than XCHG on a failed attempt to set?
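A side-by-side fasm sketch of the two primitives being compared (lock_var is an invented dword, 0 = free and 1 = taken; this is only an illustration, not code from the thread):

Code:
acquire_with_xchg:
        mov     eax, 1
        xchg    [lock_var], eax          ; implicit LOCK, unconditional locked write
        test    eax, eax
        jnz     acquire_with_xchg        ; old value was 1, lock already taken
        ret

acquire_with_cmpxchg:
        xor     eax, eax                 ; expect 0 (free)
        mov     edx, 1
        lock cmpxchg [lock_var], edx     ; store 1 only if the lock is still free
        jnz     acquire_with_cmpxchg     ; note: this is a locked operation too
        ret

lock_var dd 0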
bitRAKE 02 Dec 2008, 00:56
What Every Programmer Should Know About Memory is a very good modern overview. It even covers the problem of using CAS for every atomic operation on x86(-64) processors - CMPXCHG is hardly the answer given the flexibility available.
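A small fasm illustration of the kind of flexibility alluded to: x86(-64) has atomic read-modify-write instructions that avoid a CAS retry loop entirely when all you need is a counter or a fetch-and-add (counter is an invented variable, and this is only a sketch):

Code:
        lock inc qword [counter]         ; atomic increment, old value not needed
        mov     rax, 1
        lock xadd [counter], rax         ; atomic fetch-and-add, old value -> rax

counter dq 0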
Azu 03 Nov 2009, 22:22
LocoDelAssembly wrote: Sorry, there was an error in a comment.
LocoDelAssembly 03 Nov 2009, 23:08
When using the OS-provided synchronization features some actions can be taken, but in this case the OS would have to be smart enough to know what the program is doing. We humans know the program is wasting a lot of CPU time because it is waiting for a low-priority thread to finish executing a single instruction, but from the OS perspective it can't know much more than "this process consumes a lot of time doing an unknown (but user-requested) task".
LocoDelAssembly wrote: I'm planning to port the paper's pseudocode to fasm too, so we can study the pros and cons of the three versions and run some benchmarks (probably done by someone else).
Azu 03 Nov 2009, 23:18
LocoDelAssembly wrote: When using the OS-provided synchronization features some actions can be taken, but in this case the OS would have to be smart enough to know what the program is doing. We humans know the program is wasting a lot of CPU time because it is waiting for a low-priority thread to finish executing a single instruction, but from the OS perspective it can't know much more than "this process consumes a lot of time doing an unknown (but user-requested) task".

Then, if it is the same, have it take a little look at the code to see if it's a spinlock, and if it is, let other threads have priority until the data it's spinlocking on is changed, then give it its priority back. What is hard about this?
rugxulo 04 Nov 2009, 17:56
Spinlocks? Isn't this what SSE2's PAUSE (aka, REP NOP) is for?
Azu 04 Nov 2009, 18:42
Code:
@@:     pause
        cmp     dword [$], 12345678
        jne     @b
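A slightly fuller sketch along the same lines: a test-and-test-and-set spinlock that spins with PAUSE on plain reads and only retries the locked exchange when the lock looks free (lock_var and the labels are invented for illustration):

Code:
acquire:
        mov     eax, 1
        xchg    [lock_var], eax          ; try to take the lock (implicit LOCK)
        test    eax, eax
        jz      .done
.spin:
        pause                            ; spin-wait hint to the CPU
        cmp     dword [lock_var], 0      ; plain read, no locked bus/cache access
        jne     .spin
        jmp     acquire                  ; looks free, retry the locked exchange
.done:
        ret

release:
        mov     dword [lock_var], 0      ; a plain store releases the lock on x86
        ret

lock_var dd 0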
comrade 05 Nov 2009, 00:14
Azu wrote: Every few million clock cycles or so, couldn't it just check EIP and see if it's changed or not? That should only add like 20 cycles overhead at the most, every million cycles or so.

1. There is no instruction to "check the EIP" of another thread. Typically the thread has to be interrupted to retrieve its context information. That is way more than 20 cycles of overhead. Orthogonal to all of this is the question of sample rate: one would have to be sampling at the Nyquist rate to accurately infer whether the EIP is changing or not. This would require one thread to run at double the frequency of another - which is impractical in how SMPs are designed.

2. How do you propose the OS "takes a look at the code", and then infers whether "the data it's spinlocking on is changed"? Spinlocks are backed by memory, and while they are being spun on, the code typically makes no accesses to the data that the spinlock is protecting.

The idea you have is right, and it is implemented in OSes as higher-level synchronization constructs such as events, condition variables, semaphores, and critical sections. For example, the Windows critical section has a built-in spinlock for short critical regions, which after a few failed acquisition attempts puts the thread into a wait mode, freeing the CPU to run any other ready thread.

Spinlocks are a very low-level synchronization primitive, typically used by the OS kernel itself to synchronize in areas where higher-level primitives cannot be used - for example, to synchronize the thread scheduler itself! Besides that, spinlocks are also useful for protecting very short critical regions - for example, queued lists. In these cases it's cheaper to spin on a variable than it is to put a thread to sleep. Like I mentioned before, the Windows critical section is aware of that phenomenon, which is why it has a built-in spinlock as its first form of synchronization.

You should pick up a college book on operating systems; even something like Windows Internals would be very useful.
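To make the spin-then-block idea concrete, here is a rough fasm sketch of a lock that spins briefly with PAUSE and then yields the rest of its time slice. It is not Windows' actual critical-section code; SPIN_LIMIT and lock_var are invented, and the usual fasm Win64 includes are assumed so that invoke and Sleep are available:

Code:
SPIN_LIMIT = 4000

acquire_lock:
        mov     ecx, SPIN_LIMIT
.try:
        mov     eax, 1
        xchg    [lock_var], eax          ; attempt to take the lock
        test    eax, eax
        jz      .done
        pause
        dec     ecx
        jnz     .try                     ; spin only for a short while
        invoke  Sleep, 0                 ; then give up the time slice and retry
        jmp     acquire_lock
.done:
        ret

lock_var dd 0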
Azu 05 Nov 2009, 00:32
comrade wrote: There is no instruction to "check the EIP" of another thread.
How does the scheduler pass control between threads if it doesn't know their EIP? I don't understand. If it can't, then how the hell do threads work?????

comrade wrote: Typically the thread has to be interrupted to retrieve its context information. That is way more than 20 cycles of overhead.
It interrupts them anyway, to see if there's a higher-priority thread waiting to run. Why would it be expensive to check how much the EIP has changed during those breaks, when the thread is already interrupted?

comrade wrote: Orthogonal to all of this is the question of sample rate: one would have to be sampling at the Nyquist rate to accurately infer whether the EIP is changing or not. This would require one thread to run at double the frequency of another - which is impractical in how SMPs are designed.

comrade wrote: 2. How do you propose the OS "takes a look at the code", and then infers whether "the data it's spinlocking on is changed"?