flat assembler
Message board for the users of flat assembler.
Index
> Main > Reading array on multi-core/-processor system Goto page 1, 2 Next |
Author |
|
b1528932 25 Oct 2010, 21:06
itemsPool + (ecx - 1)*8 + 4
its not correct SIB syntax? regarding your question - yes. |
|||
25 Oct 2010, 21:06 |
|
ouadji 25 Oct 2010, 22:19
only a reading (or compare) causes no problem with multi-tasking. Problems arise when one writes and changes data in memory. cmpxch8b seems totally unnecessary here. and "lock" in addition ? my god ! useless here ! |
|||
25 Oct 2010, 22:19 |
|
LocoDelAssembly 25 Oct 2010, 22:52
Perhaps I've should have explained what I was fearing of
My concern is, that considering we have processors that are capable of executing instructions out of order, and that there is no dependence between the MOV and CMP, perhaps the CPU could execute the CMP first at times, then, another core initiates the LOCK CMPXCHG8b, and then the MOV is executed, if this can happen, then I'm screwed. But perhaps the CPUs arbitrate this somehow, to avoid out of order execution issues? Another way I though of was doing this instead: Code: findItem: ; EAX = ItemID ; Returns ItemEntry in EDX:EAX mov ecx, ITEMS_POOL_SIZE .search: cmp eax, dword [itemsPool + (ecx - 1)*8 + 0] je .found loop .search .notFound: mov eax, -1 ret .found: mov edx, dword [itemsPool + (ecx - 1)*8 + 4] mov ecx, dword [itemsPool + (ecx - 1)*8 + 0] ; I hope no processor in the foreseeable future will optimize out the two instructions below xor ecx, edx xor ecx, edx cmp eax, ecx jne .notFound ret b1528932, your right, but fasm takes care of converting it to [ecx*8 + (itemsPool - 4)] (the value inside the parentheses is resolved at compile-time) which it is what the processor can handle. |
|||
25 Oct 2010, 22:52 |
|
ouadji 25 Oct 2010, 23:22
Quote: perhaps the CPU could execute the CMP first at times, there is no risk of conflicts of sequences between cores. If this was not the case, no software would work! that said, you talk about " CMPXCHG8b", but I see no " CMPXCHG8b" in your code. |
|||
25 Oct 2010, 23:22 |
|
LocoDelAssembly 25 Oct 2010, 23:47
Code: addItem: push ebx edi mov edi, 8 lock xadd [poolPointer], edi and edi, 8*ITEMS_POOL_SIZE - 1 mov eax, dword [itemsPool + edi + 0] mov edx, dword [itemsPool + edi + 4] mov ebx, someId ; No two items in the pool have the same ID with the exception of -1 (invalid ID) mov ecx, someData ; Uniqueness not guaranteed. lock cmpxchg8b qword [itemsPool + edi] pop edi ebx ret After reading the following in AMD manuals: Quote: Locked Instructions—Before completing a locked instruction Then, perhaps this is safe (but maybe overkill): Code: findItem: ; EAX = ItemID ; Returns ItemEntry in EDX:EAX mov ecx, ITEMS_POOL_SIZE .search: cmp eax, dword [itemsPool + (ecx - 1)*8 + 0] je .found loop .search .notFound: mov eax, -1 ret .found: mov edx, dword [itemsPool + (ecx - 1)*8 + 4] lock cmpxchg dword [itemsPool + (ecx - 1)*8 + 0], eax jnz .notFound ret |
|||
25 Oct 2010, 23:47 |
|
revolution 25 Oct 2010, 23:51
You have no choice here, you have to lock the memory that you are reading. The lock must be either guaranteed atomic, or guaranteed to exclude all writers while you read.
Why are you not locking? AFIAK no one has discovered a magic code sequence that can avoid using locks. And the lock is almost cost free anyway in most circumstances. If speed is an issue and you found that avoiding the small lock penalty gave you a massive speed boost then it is time to redesign the whole app data flow. |
|||
25 Oct 2010, 23:51 |
|
LocoDelAssembly 26 Oct 2010, 00:29
AFAIK, all processors do perform atomic read/write without the need of locking, LOCK is required (and valid only when used) on read-modify-write instructions. The problem here is that I have two reads, and I must make sure that when the item-id comparison is about to be made, the MOV EDX, ... was already executed and won't hold the value of a different item (with different ID). If the CPU were following program order strictly, this wouldn't be an issue, but seems there are no guarantees of order preservation.
Looks that my latest code is the way to go (besides LFENCE maybe, but I don't want SSE2 dependence). Thanks to all for following this thread |
|||
26 Oct 2010, 00:29 |
|
revolution 26 Oct 2010, 00:43
LOCK will not give any guarantee of order. And LOCK also does not guarantee order among multiple CPUs. LOCK only provides read-modify-write guarantees, it won't give you ordering guarantees.
Your problem is to read two consecutive 32-bit words and make sure that nothing else modifies any one of those values while you are reading. Is that correct? If so then you have no choice but to protect both those reads somehow to ensure that no other CPU will get in there and make a change. So that means you either have to find an atomic 64-bit instruction (like cmpxchg8b) or use a higher level solution like a mutex or critical section. One thing I can be sure of is that using a lock on a single dword will not solve your problem. Note also that there are specific domain solutions that CAN read 64-bit values without a lock but they do not apply to generic situations. Reading a guaranteed monotonically increasing 64-bit counter can be done without a lock, but it is a very specific situation that relies upon the property of the counter. |
|||
26 Oct 2010, 00:43 |
|
LocoDelAssembly 26 Oct 2010, 05:27
Quote:
Partially, writes are always with LOCK CMPXCHG8B, but when reading, since I don't have a 64-bit load instruction, I read the higher half first, and then check if the lower half has the value I was looking for. If the processor performs these operations in program order, it should work, but this is not guaranteed. Therefore, because of the quote I posted above says and which also Intel seems to agree, it looks that my latest code should be working. Do you see a case in which I could still have EDX loaded with higher half of a different item? (and EAX not set to -1 on return, i.e. situation undetected by the code) PS: Besides the uniqueness property I've told about the lower part, I forgot to mention that no entry will be overwritten with an item having the same id (lower half). |
|||
26 Oct 2010, 05:27 |
|
revolution 26 Oct 2010, 05:44
Can you make your code MMX dependant?
Code: movq mm0,[location] |
|||
26 Oct 2010, 05:44 |
|
b1528932 26 Oct 2010, 11:04
Quote: If the processor performs these operations in program order, it should work I dont think so. all access must be atomic. You read high half, then core 2 change it, and you read low half. You have 2 parts of not the same data. use a spinlock to protect it, or use lock cmpxchg8b. There is no other way. |
|||
26 Oct 2010, 11:04 |
|
LocoDelAssembly 26 Oct 2010, 16:06
revolution, no.
b1528932, in the case of happening what you describe, my code would detect the problem because the lower half would be no longer equal to EAX (writes never store the same ID the items pool entry had previously, and writes are 64-bit atomic). All I need to make sure this will work is to ensure that the higher half is read first and the lower half last or else have both halves read at the same time (but without MMX, SSE or FPU). lock cmpxchg8b would ensure 64-bit atomic read, but it will actually write to memory as well and requires many registers too... If this is really needed then I will have to go for it, but isn't the "lock cmpxchg dword [... + 0], eax" (I wish I had a read-only way) ensuring that the "mov edx, dword [... + 4] " above completes (or at least the data is fetched) first? To both, the implementation must be wait-free. |
|||
26 Oct 2010, 16:06 |
|
revolution 26 Oct 2010, 16:16
LocoDelAssembly wrote: (writes never store the same ID the items pool entry had previously, and writes are 64-bit atomic) Code: do_it_again: mov eax,[lower] mov edx,[upper] mov ecx,[lower] cmp ecx,eax jnz .do_it_again ; do your comparing below here |
|||
26 Oct 2010, 16:16 |
|
LocoDelAssembly 26 Oct 2010, 16:27
Why the processor wouldn't reorder that like the code below?
Code: do_it_again: mov eax,[lower] mov ecx,[lower] mov edx,[upper] ; LOCK CMPXCHG8B from another code happening here cmp ecx,eax jnz .do_it_again |
|||
26 Oct 2010, 16:27 |
|
revolution 26 Oct 2010, 16:33
I forgot to show the lock, sorry.
Code: do_it_again: mov eax,[lower] lock mov edx,[upper] mov ecx,[lower] cmp ecx,eax jnz .do_it_again ; do your comparing below here |
|||
26 Oct 2010, 16:33 |
|
revolution 26 Oct 2010, 16:40
Oh, I just realised why there was no lock in the code I copied: it is from a lock free solution. If you can modify the writer code ...
Code: writer: ;... ;edx:eax has a new value to store mov [lower_copy],eax ;put lock or some other synchro code here mov [upper],edx mov [lower],eax ;... ;... reader: ;... .read_it: mov eax,[lower] mov edx,[upper] mov ecx,[lower_copy] cmp ecx,eax jnz .read_it |
|||
26 Oct 2010, 16:40 |
|
LocoDelAssembly 26 Oct 2010, 17:13
revolution, you gave me an extremely happiness moment with your code above but some time after I realized it might not work. I think your code assumes single writer, multiple readers.
|
|||
26 Oct 2010, 17:13 |
|
revolution 26 Oct 2010, 17:48
Hmm, you are right. I didn't realise you have multiple writers.
My brain has been so addled this last month, so many mistakes and forgotten things. I gotta get back to my home base, but still got two more weeks here till done ... gotta keep working revolution ... don't stop now. |
|||
26 Oct 2010, 17:48 |
|
bitRAKE 28 Oct 2010, 07:03
LOCKs occurring within a cacheline shared by multiple cores must quiet the bus (very slow). LOCKs on isolated cachelines use bus in normal fashion.
So, don't worry about the LOCK - just make ITEMS_POOL_SIZE large enough for all readers/writers to move around on unique cachelines! (And of course, align all accesses.) I like to make all writers also readers to minimize cache thrashing across the bus during LOCK. |
|||
28 Oct 2010, 07:03 |
|
Goto page 1, 2 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.