flat assembler
Message board for the users of flat assembler.
  
memory fence, lockless
vivik 12 Mar 2018, 21:19
I'm reading this: Lockless Programming Considerations for Xbox 360 and Microsoft Windows.

I only care about Windows right now. When you specify MemoryBarrier();, do you still need _ReadWriteBarrier();, _ReadBarrier(); or _WriteBarrier();? I can't compile any of the latter three; which header are they in?

This is a really scary technique. It has the potential for bugs that appear once per 100000 runs, or bugs that appear only on some CPUs (I made the latter up, there is no basis for it). I need to get it right.
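A minimal sketch of the flag-publishing pattern the article describes, written with portable C11 atomics instead of the Win32 macros so that it builds with any C11 compiler. Some assumptions for illustration: `atomic_thread_fence(memory_order_seq_cst)` is the closest portable counterpart of `MemoryBarrier()` (it constrains both the compiler and the CPU), while `atomic_signal_fence` is only a compiler barrier, roughly like `_ReadWriteBarrier()`. The sketch uses the weaker release/acquire fences, which should be enough for this particular pattern. On MSVC the `_ReadBarrier`, `_WriteBarrier` and `_ReadWriteBarrier` intrinsics are declared in `<intrin.h>`. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;        /* ordinary data, written before the flag */
static atomic_int ready;   /* flag that tells the reader the data is valid */

/* Writer: store the data, then the fence, then the flag. */
static void publish(int value)
{
    payload = value;
    /* Earlier writes may not be reordered after the flag store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: check the flag, then the fence, then read the data.
   Returns 1 and fills *out if data was published, 0 otherwise. */
static int try_consume(int *out)
{
    assert(out);
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return 0;                      /* nothing published yet */
    /* Later reads may not be reordered before the flag load above. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

The fence sits between the data access and the flag access on both sides; a compiler-only barrier in the same place would stop the compiler from reordering but would not constrain the CPU.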
donn 13 Mar 2018, 00:27
I'm curious what the answer to your question is too. Just some thoughts of mine, though:

"These instructions also ensure that the compiler disables any optimizations that could reorder memory operations across the barriers." - source.

MemoryBarrier prevents both CPU-level and compiler-level reordering, so it is stronger than the others. This was also discussed here.

I agree, it's scary stuff, especially since there are many multiprocessing platforms in use today. I'm newer to asm-level CPU multithreading (more experienced with GPU-only atomics), so you may want a second or third opinion here. Reading about the SSE fence instructions (mfence and friends), xchg, xadd, and so on at the asm level may clear things up too.

If you narrow down the scope of what you're trying to achieve, it may be even safer. Xbox reordering rules, for example, are different from x86, and if you can manage the performance hit, locks are far safer. Distinguishing between atomic and composite operations may also help, but the article you referenced shows that reordering is much worse on Xbox: "Even though x86 and x64 CPUs do reorder instructions, they generally do not reorder write operations relative to other writes."
revolution 13 Mar 2018, 01:29
Moved to HLL forum.
revolution 14 Mar 2018, 10:41
If you want to pause a thread, then use any of the Wait* functions, or just Sleep. It is usually best if threads handle their own runtime.
vivik 14 Mar 2018, 14:02
You mean "WaitForSingleObject" and such?
revolution 14 Mar 2018, 14:15
Yes.
vivik 14 Mar 2018, 15:20
Hm, SwitchToThread isn't just "Sleep"; more on that here, in the answer by Maxim Masiutin:
https://stackoverflow.com/questions/1383943/switchtothread-vs-sleep1

I probably wouldn't use either Sleep or SwitchToThread. What I was planning to do is called polling, and as these links say, it's like "checking your watch every minute to see if it's 3 o'clock yet instead of just setting an alarm". I guess it's useful if you are sure the second thread will be running about 75% of the time anyway and a quick response is necessary.
https://blogs.msdn.microsoft.com/oldnewthing/20090727-00/?p=17353
https://blogs.msdn.microsoft.com/oldnewthing/20060124-17/?p=32553

A nice overview of what I can possibly do on Windows:
>If you just want to make the thread wait for a fixed amount of time, use Sleep() for that. But if you want the wait to be interruptable by some external operation, you have to wait on something, whether that be a timer (see SetTimer() and Get/PeekMessage()), an event (see CreateEvent() and WaitForSingleObject()), a message in the thread's queue (see PostThreadMessage() and MsgWaitForMultipleObjects()), an I/O Completion Port callback (see PostQueuedCompletionStatus() and GetQueuedCompletionStatus()), etc.
https://stackoverflow.com/questions/12489234/wait-for-a-specific-time-in-thread-use-waitforsingleobject

So it's either CreateEvent + WaitForSingleObject, or PostThreadMessage + MsgWaitForMultipleObjects. I'm not sure what they do. Is it possible for each thread to have its own message loop or something?

By the way, there is an odd warning against WaitForSingleObject, something about deadlocks if you use COM and multiple threads. I'm not sure if this applies to me.
https://marc.durdin.net/2012/08/waitforsingleobject-why-you-should-never-use-it/
DimonSoft 14 Mar 2018, 20:54
vivik wrote: So, it's either CreateEvent WaitForSingleObject, or PostThreadMessage MsgWaitForMultipleObjects. I'm not sure what they do, is it possible for each thread to have its own message loop or something?

Yes, each thread gets its own message queue as soon as it calls one of a set of USER32 functions. In your case CreateEvent + (Re)SetEvent + WaitForMultipleObjects seems to be the best way, since I doubt you really want to put a message loop into the worker thread. Note also that you might need two events: one for controlling the work being done, the other for thread termination.

vivik wrote: By the way, there is some odd warning against WaitForSingleObject, something about deadlock if you use COM and multiple threads and something, I'm not sure if this applies to me.

The problems described there are related to threads which need to handle window/thread messages and thus must have a message loop. If your worker thread does work that doesn’t use COM, you’re on the safe side, since you’re then the one who controls the thread.
vivik 15 Mar 2018, 17:03
According to this http://www.bogotobogo.com/cplusplus/multithreading_win32A.php , there are 3 different ways to create a thread: CreateThread, _beginthread and _beginthreadex. The first one doesn't create thread-local storage for you, so this is the one I will use.

Thread-local storage is, um, something like a global variable whose value is different for each thread. It's often used by some memory allocators, and when there are many threads. I will use only 2 threads for now, so I'll live without it. I'll just use a local variable instead of a thread-local global.

Also, it looks like Windows XP doesn't support something about TLS, and I want my programs to work even there.
vivik 15 Mar 2018, 17:34
There is [main thread], and there is [worker thread].

The simple case is this:

[main thread] --decompress this please--> [worker thread]
[main thread] <--ok, done-- [worker thread]

But most likely it will be something like this:

[main thread] --decompress this please--> [worker thread]
[main thread] --decompress this please--> [worker thread]
[main thread] --decompress this please--> [worker thread]
[main thread] <--ok, done-- [worker thread]
[main thread] --decompress this please--> [worker thread]
[main thread] <--ok, done-- [worker thread]
[main thread] <--ok, done-- [worker thread]
[main thread] <--ok, done-- [worker thread]
[main thread] --decompress this please--> [worker thread]
[main thread] <--ok, done-- [worker thread]

I guess I will make the message exchange through a ring of, let's say, 0x20 dwords. The first dword of each pair is the "alive" flag, the second dword is the meaningful data, and it goes in a circle.

The --decompress this please--> will look like this:

Code:
    i=0
    li=0
    dword ring[0x20]={0}

Code:
    ring[i+1]="C:/loadmepls.jpg"
    MemoryFence
    ring[i]=1
    i+=2

The <--ok, done-- will look like this:

Code:
    while true:
        if ring[i]==0:
            sleep or some other winapi function that will pause the thread
        else:
            path = ring[i+1]
            decompressed_data = decompress(path)
            ring[i+1] = decompressed_data
            MemoryFence
            ring[i]=0
            i+=2

The waiting for "ok, done":

Code:
    if ring[li]==0:
        decompressed_data = ring[li+1]
        load_to_gpu(decompressed_data)
        li+=2

I guess I will use CreateEvent + WaitForSingleObject for --decompress this please--> (so that it doesn't waste CPU when there is nothing to do), and that ring for <--ok, done--.

Using the per-thread message loop would simplify things and my life, but I'm not sure if it's the right thing to do. This ring looks more lightweight than the usual message loop.

Also, if I malloc'd some memory in [main thread], I shouldn't free it in the [worker thread], and vice versa? That "C:/loadmepls.jpg" is being overwritten without getting freed. Should I use 2 or 3 rings instead, and add a --uploaded to gpu, now free memory--> message?

The OS-provided heaps are probably thread safe, but I'm planning on replacing them with my own custom memory allocator, more tuned for this program. Something like in golang, but dumber. Eh, just make it work first, make it good afterwards. This looks like a good opportunity to learn.
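The ring from the pseudocode above could be sketched in portable C11 roughly as follows. This is an illustration under stated assumptions, not a tested production queue: it assumes exactly one main thread and one worker thread, uses C11 atomics where the post says MemoryFence, and all names (`struct ring`, `ring_submit`, `ring_next_request`, `ring_finish`, `ring_collect`) are hypothetical. One detail differs from the pseudocode: each slot has three states instead of two, because with only 0 and 1 the main thread cannot tell a finished slot from one that was never submitted. The indices also wrap explicitly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SLOT_COUNT 16   /* 0x20 dwords in the post = 16 (flag, data) pairs */

enum { SLOT_FREE, SLOT_REQUEST, SLOT_DONE };

struct slot {
    atomic_int state;   /* SLOT_FREE -> SLOT_REQUEST -> SLOT_DONE -> SLOT_FREE */
    void *data;         /* request on the way in, result on the way out */
};

struct ring {
    struct slot slots[SLOT_COUNT];
    size_t submit_i;    /* touched by the main thread only */
    size_t collect_i;   /* touched by the main thread only */
    size_t work_i;      /* touched by the worker thread only */
};

static void ring_init(struct ring *r)
{
    assert(r != NULL);
    for (size_t n = 0; n < SLOT_COUNT; n++) {
        atomic_init(&r->slots[n].state, SLOT_FREE);
        r->slots[n].data = NULL;
    }
    r->submit_i = r->collect_i = r->work_i = 0;
}

/* Main thread: "decompress this please". Returns 0 if the ring is full. */
static int ring_submit(struct ring *r, void *request)
{
    struct slot *s = &r->slots[r->submit_i % SLOT_COUNT];
    if (atomic_load_explicit(&s->state, memory_order_acquire) != SLOT_FREE)
        return 0;
    s->data = request;   /* write the data first, then publish the flag */
    atomic_store_explicit(&s->state, SLOT_REQUEST, memory_order_release);
    r->submit_i++;
    return 1;
}

/* Worker: look at the next request. Returns 0 if there is none yet. */
static int ring_next_request(struct ring *r, void **request)
{
    struct slot *s = &r->slots[r->work_i % SLOT_COUNT];
    if (atomic_load_explicit(&s->state, memory_order_acquire) != SLOT_REQUEST)
        return 0;
    *request = s->data;
    return 1;
}

/* Worker: "ok, done". Call only after ring_next_request returned 1. */
static void ring_finish(struct ring *r, void *result)
{
    struct slot *s = &r->slots[r->work_i % SLOT_COUNT];
    s->data = result;
    atomic_store_explicit(&s->state, SLOT_DONE, memory_order_release);
    r->work_i++;
}

/* Main thread: pick up the oldest finished result. Returns 0 if not ready. */
static int ring_collect(struct ring *r, void **result)
{
    struct slot *s = &r->slots[r->collect_i % SLOT_COUNT];
    if (atomic_load_explicit(&s->state, memory_order_acquire) != SLOT_DONE)
        return 0;
    *result = s->data;
    atomic_store_explicit(&s->state, SLOT_FREE, memory_order_release);
    r->collect_i++;
    return 1;
}
```

When `ring_next_request` returns 0, the worker would block on whatever wait primitive is chosen (an event, for example) instead of spinning; that part is left out because it is platform-specific.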
DimonSoft 15 Mar 2018, 17:46
vivik wrote: According to this http://www.bogotobogo.com/cplusplus/multithreading_win32A.php , there are 3 different ways to create a thread: CreateThread, _beginthread and _beginthreadex. The first one doesn't create thread-local storage for you, so this is the one I will use.

It’s not about TLS. TLS is implemented as a separate set of functions in the Windows API. It’s not that CreateThread doesn’t create TLS; it’s that _beginthread[ex] (might) do so. If memory serves me (I haven’t used C/C++ for ages), _beginthreadex is the way to go if you expect the CRT to work correctly on the thread. Since this includes memory management, which is done automatically in some places in C++, you’ll almost always want to use _beginthreadex.

Also, a recent article by Raymond Chen about the cons of _beginthread: If I call GetExitCodeThread for a thread that I know for sure has exited, why does it still say STILL_ACTIVE?
vivik 15 Mar 2018, 20:07
>If memory serves me (I haven’t used C/C++ for ages), _beginthreadex is the way to go if you expect the CRT to work correctly on the thread. Since this includes memory management which is done automatically in some places in C++, you’ll almost always want to use _beginthreadex.

Hm, yes, I will use CreateThread. I'm trying to get rid of the CRT anyway, I'm using C instead of C++ (kind of), and I'll use a custom memory allocator. Don't ask.
revolution 15 Mar 2018, 22:44
vivik wrote: Also, if I malloc'd some memory in [main thread], I shouldn't free it in the [worker thread], and vice versa?

Have a clear policy on memory ownership. You can send pointers to allocated memory to another thread and also transfer ownership to that thread. Or, if you expect to get results returned to you in that memory, you can pass a pointer to it and retain ownership. The last owner of each memory allocation is responsible for either freeing the memory or transferring ownership to another thread.
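One way such an ownership policy might look in C is sketched below. It is only an illustration: `struct job` and the `job_*` functions are hypothetical names, and `job_attach_result` is a stand-in for real decompression. The assumed rule is that a job owns everything it points to, and whichever thread currently holds the job pointer owns the job.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A job owns everything it points to; whoever holds the job owns the job. */
struct job {
    char *path;             /* private copy made on creation */
    unsigned char *pixels;  /* result buffer, allocated by the worker */
    size_t pixel_bytes;
};

/* Main thread: create a job. Ownership passes to the worker at the moment
   the pointer is handed over (through a ring or queue, for example). */
static struct job *job_create(const char *path)
{
    assert(path != NULL);
    struct job *j = calloc(1, sizeof *j);
    if (j == NULL)
        return NULL;
    size_t n = strlen(path) + 1;
    j->path = malloc(n);
    if (j->path == NULL) {
        free(j);
        return NULL;
    }
    memcpy(j->path, path, n);
    return j;
}

/* Worker: attach a result buffer (placeholder for real decompression).
   Ownership of the job, including the result, then goes back to main. */
static int job_attach_result(struct job *j, size_t bytes)
{
    j->pixels = calloc(bytes, 1);
    if (j->pixels == NULL)
        return 0;
    j->pixel_bytes = bytes;
    return 1;
}

/* Last owner (for instance main, after the GPU upload): free everything
   in one place, so nothing is overwritten without being released. */
static void job_destroy(struct job *j)
{
    if (j == NULL)
        return;
    free(j->pixels);
    free(j->path);
    free(j);
}
```

With this shape the request string and the decompressed data travel together, so a separate "now free memory" message is not needed as long as the job always ends up back with a single final owner.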
vivik 16 Mar 2018, 16:12
Found more info on it; it's the simplest case of lock-free.
https://en.wikipedia.org/wiki/Non-blocking_algorithm
Quote: a single-reader single-writer ring buffer FIFO, with a size which evenly divides the overflow of one of the available unsigned integer types, can unconditionally be implemented safely using only a memory barrier
https://en.wikipedia.org/wiki/Producer%E2%80%93consumer_problem#Without_semaphores_or_monitors

One thing I don't get yet: should I waste an entire cache line (64 bytes, probably) for each counter? This is to avoid "false sharing", which will probably happen no matter what.

Some quotes:
https://stackoverflow.com/questions/16699247/what-is-cache-friendly-code
Quote: false sharing
https://stackoverflow.com/questions/14707803/line-size-of-l1-and-l2-caches
Quote: Cache-Lines size is (typically) 64 bytes.

Um, for now I will place 6 such counters into the same 64-byte cache line, just in case.
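For comparison, here is a sketch of the single-reader single-writer FIFO from the Wikipedia quote, with each counter placed on its own cache line. The assumptions: a 64-byte line size (typical, as quoted, but not guaranteed), one producer thread and one consumer thread, and C11 release/acquire atomics standing in for the memory barrier. `struct fifo`, `fifo_push` and `fifo_pop` are hypothetical names. The size is a power of two, so it evenly divides the range of `unsigned` and the free-running counters can simply wrap around.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE 64    /* assumption: typical x86 cache line size */
#define FIFO_SIZE  16u   /* power of two, so it divides UINT_MAX + 1 */

struct fifo {
    /* Each counter gets its own cache line, so the producer writing tail
       does not invalidate the line the consumer keeps writing head to. */
    alignas(CACHE_LINE) atomic_uint head;  /* written by the consumer only */
    alignas(CACHE_LINE) atomic_uint tail;  /* written by the producer only */
    alignas(CACHE_LINE) void *items[FIFO_SIZE];
};

/* Producer thread. Returns 0 if the FIFO is full. */
static int fifo_push(struct fifo *f, void *item)
{
    assert(f != NULL);
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&f->head, memory_order_acquire);
    if (tail - head == FIFO_SIZE)      /* unsigned wrap-around is intended */
        return 0;
    f->items[tail % FIFO_SIZE] = item;
    /* Publish the item: the slot write must be visible before the counter. */
    atomic_store_explicit(&f->tail, tail + 1, memory_order_release);
    return 1;
}

/* Consumer thread. Returns 0 if the FIFO is empty. */
static int fifo_pop(struct fifo *f, void **item)
{
    assert(f != NULL && item != NULL);
    unsigned head = atomic_load_explicit(&f->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_acquire);
    if (head == tail)
        return 0;
    *item = f->items[head % FIFO_SIZE];
    /* Release the slot only after the item has been read out. */
    atomic_store_explicit(&f->head, head + 1, memory_order_release);
    return 1;
}
```

The padding costs about 120 bytes per queue here. Packing both counters into one line would still be correct, only potentially slower when both threads are busy at the same time.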
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.