flat assembler
Message board for the users of flat assembler.
64bit Thread Safe Array Stack
Madis731 21 Sep 2006, 13:05
I find it hard to use. Don't you need a database in each thread to mark all the stack parts it owns? If you push or pop, you will never know which memory location it was or what has been changed by other threads in the meantime. Maybe it works if all the threads agree to play on a constant number of stack slots.
r22 21 Sep 2006, 16:03
This is a user-defined stack, an ADT (abstract data type). By using LOCK XADD, many threads can access it without having concurrency issues.
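A rough C11 analogue of the LOCK XADD idea, as a sketch only and not the code posted at the top of the thread: `atomic_fetch_add` returns the old value and adds to memory in one step, which x86-64 compilers typically emit as `lock xadd`. The names `ts_stack`, `ts_push` and `TS_CAPACITY` are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TS_CAPACITY 8                 /* hypothetical fixed size */

typedef struct {
    atomic_long top;                  /* index of the next free slot */
    uint64_t    slot[TS_CAPACITY];
} ts_stack;

/* Reserve a slot with one atomic add, then store into it.
   Returns the index used, or -1 if the stack is full. */
static long ts_push(ts_stack *s, uint64_t value)
{
    assert(s != 0);
    long idx = atomic_fetch_add(&s->top, 1);   /* idx = old value of top */
    if (idx >= TS_CAPACITY) {
        atomic_fetch_sub(&s->top, 1);          /* undo the reservation */
        return -1;
    }
    s->slot[idx] = value;             /* NOT atomic with the reservation above */
    return idx;
}
```

Each concurrent caller gets a distinct index, but the reservation and the store are still two separate steps.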
Madis731 21 Sep 2006, 18:04
Yeah, I understand how it works, but consider:
1) Thread ONE pushes "13" on the stack
2) Thread TWO pushes "4" and "71" on the stack
3) Thread THREE pushes "5" on the stack
4) Thread TWO pops ?? "5" and "71" from the stack
5) Thread ONE pops "4" from the stack
6) Thread THREE pops "13" from the stack
Isn't it a bit confusing? This way you will never know where the valuable data is. If ALL threads agree that the stack has only one position, then you can get IPC-like behaviour, but what else can it be used for? You said something about testing. Would you tell me how that test program works? Maybe I'll get some ideas from that!
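The six steps can be replayed on a single shared stack to see who ends up with what. This is a single-threaded C sketch that only fixes the order of operations as listed; `replay` and the array layout are assumptions for illustration.

```c
#include <assert.h>

#define DEPTH 8

static int stack_mem[DEPTH];
static int top = 0;                       /* number of items stored */

static void push(int v) { assert(top < DEPTH); stack_mem[top++] = v; }
static int  pop(void)   { assert(top > 0);     return stack_mem[--top]; }

/* Replays the steps in order and records what each thread pops.
   out[0..1] = thread TWO, out[2] = thread ONE, out[3] = thread THREE. */
static void replay(int out[4])
{
    push(13);              /* 1) ONE pushes 13          */
    push(4); push(71);     /* 2) TWO pushes 4 and 71    */
    push(5);               /* 3) THREE pushes 5         */
    out[0] = pop();        /* 4) TWO pops 5 ...         */
    out[1] = pop();        /*    ... and 71             */
    out[2] = pop();        /* 5) ONE pops 4             */
    out[3] = pop();        /* 6) THREE pops 13          */
}
```

No thread gets back what it pushed, which is the ownership problem described above: a shared LIFO only makes sense when any thread may consume any item.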
okasvi 21 Sep 2006, 18:56
I think it is meant for several threads pushing and one thread popping?
LocoDelAssembly 21 Sep 2006, 19:01
I think it can be used to process requests in LIFO order. For example, one thread (or more) pushes jobs to do and many other threads pop jobs to work on. It's just an example, but I think a shared stack can have many uses (and the same goes for shared queues).
r22 21 Sep 2006, 19:05
Good point about usage.
I guess the only proper use for this setup would be as a sort of time-irrelevant queue. I'm using it as storage for worker threads to get jobs off of. Since I just need ALL the jobs to get processed concurrently (in ANY order), it suits my purposes. So technically the PUSH operation (for my needs) doesn't even need to be thread safe, as the stack will be filled up before the worker threads are initialized. So it's a 'Concurrent Single Use Work-Request List'. Specifically, I'm pushing structures with a file handle and encryption key, and the worker threads are popping off the structures and encrypting the files.
The reason I created it was that the version I had that used a Mutex seemed to fail with WinXP64. The WaitForSingleObject API caused an exception after the CreateMutex API, so rather than waste time with it I used LOCK and XADD. I thought it was an interesting snippet of code (since you don't see LOCK or XADD used much), but its ability to be useful in a range of activities is definitely suspect.
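The "fill first, then let workers pop concurrently" pattern might look roughly like this in C11. It is a sketch under assumptions: the `job` fields are placeholders for the file handle and key mentioned above, and `job_list_add` / `job_list_claim` are hypothetical names, not r22's routines.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_JOBS 16

typedef struct {
    int  file_handle;      /* placeholder for a real OS handle */
    long key;              /* placeholder for an encryption key */
} job;

typedef struct {
    job         items[MAX_JOBS];
    atomic_long count;     /* jobs not yet claimed */
} job_list;

/* Single-threaded fill, done before any worker starts. */
static int job_list_add(job_list *l, job j)
{
    assert(l != 0);
    long n = atomic_load(&l->count);
    if (n >= MAX_JOBS) return 0;
    l->items[n] = j;
    atomic_store(&l->count, n + 1);
    return 1;
}

/* May be called by many workers at once once filling has finished:
   each successful call returns a different job. Returns 0 when empty. */
static const job *job_list_claim(job_list *l)
{
    long old = atomic_fetch_sub(&l->count, 1);   /* old = count before the claim */
    if (old <= 0) {
        atomic_fetch_add(&l->count, 1);          /* list was empty: undo */
        return 0;
    }
    return &l->items[old - 1];
}
```

Because nothing is pushed while workers are running, a claimed slot is never overwritten, which is what makes the pop side safe in this restricted use.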
f0dder 21 Sep 2006, 21:51
Seems like a pretty nice idea for work queues - a bus lock certainly beats WaitFor* and similar
UCM 21 Sep 2006, 22:37
Why is it always "lock xadd", and nothing else?
r22 22 Sep 2006, 00:12
UCM, XADD is the only lockable opcode that lets you acquire a value in memory and update that value in memory in the same instruction. Well, other than CMPXCHG (but you have to use a branch loop to make sure it worked correctly).
In high-level code (Java-like pseudocode) it would look like:
    // lock xadd mem64, reg64
    synchronized {
        temp = reg64;
        reg64 = mem64;
        mem64 += temp;
    }
It would be great if someone could come up with a way of making a queue rather than a stack (FIFO as opposed to FILO) without using OS locking mechanisms. It wouldn't be too much of a headache to implement using CMPXCHG, but the looping required makes me think that it might be TOO CPU intensive and that it would probably be better off using a Semaphore or Mutex. If anyone knows how spin locks work with events and such, feel free to enlighten me.
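The "branch loop" around CMPXCHG can be sketched in C11 with `atomic_compare_exchange_weak`. The example below is an assumption-laden illustration (the name `try_take` is made up): it decrements a counter only while it is positive, something a bare fetch-and-add cannot do in one step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Decrement *counter only if it is positive. Retries when another
   thread changed the value between the load and the compare-exchange. */
static bool try_take(atomic_long *counter, long *taken_index)
{
    assert(counter != 0 && taken_index != 0);
    long old = atomic_load(counter);
    while (old > 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(counter, &old, old - 1)) {
            *taken_index = old - 1;
            return true;
        }
    }
    return false;                     /* empty: nothing was modified */
}
```

The loop only repeats when another thread actually won the race, so under light contention it usually runs once.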
f0dder 22 Sep 2006, 07:44
r22: whether it's too CPU intensive depends on how much time will be spent in the loop. Critical Sections on Windows actually try to "spinlock" for a bit before entering a wait state, simply because a short spinlock can be faster than a full r3->r0->r3 transition plus waiting.
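The spin-for-a-bit-then-wait idea can be sketched portably with a C11 `atomic_flag`. This is not how Windows implements Critical Sections (it does expose a spin count through `InitializeCriticalSectionAndSpinCount`); the names `spin_lock`, `spin_lock_try` and `spin_lock_release` are assumptions for illustration, and the blocking fallback is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag held; } spin_lock;

/* Try to take the lock, retrying at most max_spins extra times.
   Returns false if it is still contended, so the caller can fall
   back to a blocking OS wait instead of burning CPU. */
static bool spin_lock_try(spin_lock *l, unsigned max_spins)
{
    assert(l != 0);
    for (unsigned i = 0; i <= max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
            return true;              /* flag was clear: we own the lock */
    }
    return false;
}

static void spin_lock_release(spin_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

How many spins are worthwhile depends on how long the lock is typically held compared with the cost of a kernel transition.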
vid 23 Sep 2006, 08:05
r22: nice, could you please make some template with lock() / release() functions so we can see locking part separated?
r22 23 Sep 2006, 19:56
Not quite sure what you mean, vid.
    ;;; LOCK XADD
    LOCK{
        XCHG qword[stackPtr],rax
        ADD qword[stackPtr],rax
    }RELEASE{}
is the only part of the functions that is synchronized. The stack implemented at the top of this thread is a poor example for such things. It was coded to suit my purpose of only needing concurrent POPping of data off of it.
If you want a riddle: what case(s) would cause the above stack to lose a piece of data...
...Answer:
Thread1: PUSHes DATA1
Thread2: POPs but loses context before it acquires DATA1 but after its LOCKed XADD decrements the stackPtr
Thread1: PUSHes DATA2
Thread2: Returns with DATA2 and DATA1 is lost
The 64bit Concurrent Circular Queue is a better template/example of OS-independent x86 locking because of its NULL checks and lack of context-related data integrity issues.
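The riddle can be replayed deterministically by splitting POP into its two halves. This single-threaded C sketch only imitates the interleaving; `pop_reserve`, `pop_read` and the numeric stand-ins for DATA1 and DATA2 are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH 4
enum { DATA1 = 111, DATA2 = 222 };    /* arbitrary stand-in values */

static uint64_t slot[DEPTH];
static long     top = 0;              /* next free index, stands in for stackPtr */

static void     push(uint64_t v)   { assert(top < DEPTH); slot[top++] = v; }
static long     pop_reserve(void)  { assert(top > 0); return --top; }  /* the LOCK XADD half */
static uint64_t pop_read(long idx) { return slot[idx]; }               /* the later load */

/* Replays the riddle; returns what Thread2 finally receives. */
static uint64_t replay_lost_data(void)
{
    push(DATA1);                      /* Thread1 */
    long idx = pop_reserve();         /* Thread2: pointer moved, then preempted */
    push(DATA2);                      /* Thread1: reuses the slot Thread2 reserved */
    return pop_read(idx);             /* Thread2 resumes and reads DATA2 */
}
```

In this replay DATA1 is overwritten before anyone reads it, and DATA2 is both returned to Thread2 and still counted as being on the stack.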
LocoDelAssembly 26 Dec 2007, 02:41
What if the PUSHing thread is preempted just after "lock xadd" and then another thread fully executes POP from the beginning to the end? Wouldn't that thread get garbage?
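That interleaving can be imitated in a single-threaded C sketch by splitting PUSH into its reservation and its store. `push_reserve`, `push_store` and the `STALE` marker are assumptions for illustration; `STALE` stands in for whatever happened to be in the slot.

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH 4
#define STALE 0xDEADBEEFu             /* stands in for leftover memory contents */

static uint64_t slot[DEPTH] = { STALE, STALE, STALE, STALE };
static long     top = 0;

static long     push_reserve(void)              { assert(top < DEPTH); return top++; } /* LOCK XADD half */
static void     push_store(long i, uint64_t v)  { slot[i] = v; }                       /* later store */
static uint64_t pop(void)                       { assert(top > 0); return slot[--top]; }

static uint64_t replay_garbage_pop(void)
{
    long idx = push_reserve();        /* pusher preempted right after the reservation */
    uint64_t seen = pop();            /* another thread pops from start to finish */
    push_store(idx, 42);              /* pusher resumes; its value lands in a slot
                                         the stack now considers free */
    return seen;
}
```

The popping thread receives the stale contents, and the value stored afterwards is never handed to anyone.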
r22 26 Dec 2007, 04:26
You're right, the stack had issues, which was why I made the Ring Buffered Queue. I think the queue is much more stable.
http://board.flatassembler.net/topic.php?t=5887
bitRAKE 26 Dec 2007, 06:22
I like the bit instructions for this kind of stuff.
Code:
        ; store RDX address in empty slot
        bts     rdx,0                   ; mark the value itself as "occupied"
  .x:   add     rax,8
        lock bts qword [rax],0          ; CF = old bit 0 of the slot
        jc      .x                      ; already taken, try the next slot
        mov     [rax],rdx
        ; careful, RAX could run past end of buffer
(Doesn't the partial implementation of 64-bit address space by the CPU allow some upper bits to be used as well? Not a good idea, really.)
Code:
        mov     ecx,buff_items
  .0:   lock btr dword [eax+ecx*4-4],0  ; CF = old bit 0 of the slot
        dec     ecx                     ; DEC leaves CF untouched
        ja      .0                      ; loop while CF=0 and ECX<>0
        jnc     .error
        mov     [eax+ecx*4],edx
        retn
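In C11 the role of LOCK BTS on bit 0 can be played by `atomic_fetch_or`, which also returns the previous value. The sketch below is a bounded variant of the first snippet, not a translation of it; `buff`, `BUFF_ITEMS` and `store_in_empty_slot` are names assumed for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define BUFF_ITEMS 4

/* Bit 0 of each slot is the "occupied" flag, so stored pointers must
   be at least 2-byte aligned. */
static _Atomic uintptr_t buff[BUFF_ITEMS];

/* Returns the slot index used, or -1 when every slot is taken. */
static int store_in_empty_slot(void *p)
{
    uintptr_t v = (uintptr_t)p;
    assert((v & 1u) == 0);                        /* low bit must be free */
    for (int i = 0; i < BUFF_ITEMS; i++) {
        /* set bit 0 and get the old value, like LOCK BTS */
        uintptr_t old = atomic_fetch_or(&buff[i], (uintptr_t)1);
        if ((old & 1u) == 0) {                    /* we won this slot */
            atomic_store(&buff[i], v | 1u);
            return i;
        }
    }
    return -1;                                    /* stops at the end of the buffer */
}
```

A reader of a slot would mask off bit 0 before using the stored pointer.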
f0dder 26 Dec 2007, 14:16
bitRAKE wrote:
(Doesn't the partial implementation of 64-bit address space by the CPU allow some upper bits to be used as well? Not a good idea, really.)
Indeed not a good idea; it would work "for now", but while we're not going to see 64 bits of physical memory anytime soon, does the "partial implementation" affect virtual addresses as well, or only physical memory?
- carpe noctem
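On x86-64 the restriction applies to virtual addresses too: they must be in canonical form, with the unimplemented upper bits all equal to the highest implemented bit. The C sketch below assumes 48 implemented virtual-address bits (`VA_BITS` is an assumption, as are the function names) and shows why a tag stored in the upper bits has to be stripped before the pointer is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VA_BITS 48   /* assumption: 48 implemented virtual-address bits */

/* Canonical form: bits 63..47 are all copies of bit 47. */
static bool is_canonical(uint64_t addr)
{
    uint64_t upper = addr >> (VA_BITS - 1);      /* bits 63..47, 17 bits */
    return upper == 0 || upper == 0x1FFFF;
}

/* Drop whatever was stashed in bits 63..48 and sign-extend bit 47,
   giving back an address the CPU will accept. */
static uint64_t strip_tag(uint64_t tagged)
{
    const uint64_t low_mask = (UINT64_C(1) << VA_BITS) - 1;
    uint64_t low = tagged & low_mask;
    uint64_t out = (low >> (VA_BITS - 1)) ? (low | ~low_mask) : low;
    assert(is_canonical(out));
    return out;
}
```

The sketch breaks as soon as a CPU implements more address bits than `VA_BITS`, which is the kind of assumption being warned against here.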
bitRAKE 26 Dec 2007, 16:54
f0dder, I was thinking in light of masking the value prior to use, as would need to be done if lower bits are used. Ensuring the upper X bits are zero for the virtual address space used shouldn't be too difficult in most situations.
f0dder 27 Dec 2007, 02:07
bitRAKE wrote:
f0dder, I was thinking in light of masking the value prior to use, as would need to be done if lower bits are used. Ensuring the upper X bits are zero for the virtual address space used shouldn't be too difficult in most situations.
Yes, of course you'd mask, but what if Windows suddenly decided it wants to allocate very high virtual addresses? b00m, you're dead. Iirc there's a tool for having Windows allocate your stuff very high, at least for drivers... and a lot of other interesting things, like randomly failing memory requests etc., to test your code's robustness.
Handling alignment properly and using (AND-masking) lower bits is okay, but please don't make too many assumptions about virtual memory space... that apps need a special PE header flag set to take advantage of the 3/1 user/kernel mapping on 32-bit Windows shows why.
- carpe noctem
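Using the low bits relies only on alignment, which the program controls itself. A minimal C sketch, assuming 8-byte-aligned objects so that three low bits are free (`TAG_MASK`, `tag_ptr`, `untag_ptr` and `ptr_tag` are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x7)     /* low 3 bits, free when 8-byte aligned */

/* Pack a small tag into the unused low bits of an aligned pointer. */
static uintptr_t tag_ptr(void *p, unsigned tag)
{
    uintptr_t v = (uintptr_t)p;
    assert((v & TAG_MASK) == 0);      /* depends on alignment only */
    assert(tag <= TAG_MASK);
    return v | tag;
}

/* AND-mask the tag away before the pointer is dereferenced. */
static void    *untag_ptr(uintptr_t t) { return (void *)(t & ~TAG_MASK); }
static unsigned ptr_tag(uintptr_t t)   { return (unsigned)(t & TAG_MASK); }
```

Nothing here depends on where the OS places the allocation, only on how it is aligned.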
Copyright © 1999-2025, Tomasz Grysztar.