flat assembler
Message board for the users of flat assembler.

Index > Main > 64bit RingBuffered Array QUEUE !!!!

LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 10 Oct 2008, 01:48
bitRAKE, that one seems to be the concurrent queue from the article I posted above, implemented in x86 - cool Very Happy

Thanks again bitRAKE!!
bitRAKE



Joined: 21 Jul 2003
Posts: 4173
Location: vpcmpistri
bitRAKE 10 Oct 2008, 03:29
The discussion is very interesting - I guess much research was done at IBM on lock-free algorithms for big iron. At least now I can classify my own algorithm as locking non-blocking, lol. Which isn't a problem, because my threads (normally) exit gracefully (leaving no locks hung) and the granularity is very fine (bit locks). It's frowned upon because it isn't fault tolerant - which wait-free algorithms are.

Multi-threaded stuff really gets anyone's head spinning. There are situations where a qword store can be broken into two operations, with other memory operations happening between them. No doubt lots of testing is needed. Your error conditions above have provided much food for thought. You're welcome.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
r22



Joined: 27 Dec 2004
Posts: 805
r22 10 Oct 2008, 20:29
It took two years, but this thread is turning out well.
(First Post = 23 Sep 2006, 00:20)

When you need to consider concurrent Removes and Adds at the same time, it certainly heightens the difficulty. Of course a mutex is the generally anointed solution, and it seems to work well. But because this is an ASM forum, some of us search for the fastest/most optimized solution.

But when you work towards optimization you always end up at the 'usage cases'. If you know how something will be used, you can improve its efficiency. So will a mutex be faster in some cases (or all cases), or will a lock-free implementation be superior?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20524
Location: In your JS exploiting you and your system
revolution 10 Oct 2008, 21:36
I prefer to use semaphores when I implement queues.
LocoDelAssembly 10 Oct 2008, 22:10
Quote:

I prefer to use semaphores when I implement queues.

Why not mutexes or critical sections?
revolution 10 Oct 2008, 22:19
A semaphore has a built-in counter, so after the wait returns, the code uses a critical section to lock the queue and then inserts the data. I found the mutex to require an extra step: once we get the mutex, we have to check whether the queue has a free slot before inserting the data.
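For reference, this scheme can be sketched in C with POSIX primitives - sem_wait standing in for the Windows wait functions, a pthread mutex for the critical section (the names and queue size here are my own, not from the thread):

```c
#include <pthread.h>
#include <semaphore.h>

#define QUEUE_SIZE 8

static int queue[QUEUE_SIZE];
static int head, tail;
static sem_t free_slots;   /* counts empty slots: blocks producers when full   */
static sem_t used_slots;   /* counts queued items: blocks consumers when empty */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void queue_init(void) {
    sem_init(&free_slots, 0, QUEUE_SIZE);
    sem_init(&used_slots, 0, 0);
}

void enqueue(int value) {
    sem_wait(&free_slots);          /* the built-in counter: waits for a slot */
    pthread_mutex_lock(&lock);      /* short critical section for the insert  */
    queue[tail] = value;
    tail = (tail + 1) % QUEUE_SIZE;
    pthread_mutex_unlock(&lock);
    sem_post(&used_slots);
}

int dequeue(void) {
    sem_wait(&used_slots);
    pthread_mutex_lock(&lock);
    int value = queue[head];
    head = (head + 1) % QUEUE_SIZE;
    pthread_mutex_unlock(&lock);
    sem_post(&free_slots);
    return value;
}
```

Because the semaphore counts slots, no free-slot check is needed after the lock is taken - which is exactly the extra step the mutex-only version requires.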
LocoDelAssembly 10 Oct 2008, 22:46
Ah, it wasn't a semaphore-only solution then. The extra step is sometimes preferred anyway, revolution, because if the semaphore reaches zero, the next enqueuer will block for an indefinite time, which is not desirable as a general solution. If there is too much work enqueued, perhaps it is time to report the problem to the user rather than hang until someone else dequeues something Razz

It is a matter of problem domain I suppose, but on average I think it is preferable to be notified of a full-queue situation instead of blocking while waiting for space. The handling of this situation could be either reporting the error in whatever form, or reallocating the queue to a bigger size, for example.
revolution 10 Oct 2008, 22:59
The wait functions have an optional timeout, so you can simply wait for 0ms and get an instant success/failure.
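In POSIX terms, that zero-timeout wait is sem_trywait, which returns immediately instead of blocking; a minimal sketch (the function name is hypothetical):

```c
#include <semaphore.h>
#include <stdbool.h>

/* Try to reserve a queue slot without blocking: the 0 ms timeout case.
   Returns true if a slot was reserved, false right away if the queue is full. */
bool try_reserve_slot(sem_t *free_slots) {
    return sem_trywait(free_slots) == 0;
}
```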
LocoDelAssembly 10 Oct 2008, 23:12
Is that the way you use it? Before using a semaphore that way, isn't it more appropriate to do the count-checking yourself? (Where "appropriate" means "more efficient in both CPU time and system resource usage".)
revolution 10 Oct 2008, 23:15
I doubt you can make it more efficient. After getting a mutex you have to check the queue status and subsequently release the mutex if the queue is full and either wait again or report a failure. A semaphore is just one call and, if you wait more than 0ms, the OS will automatically put the thread in an efficient wait state until there is an available slot.
LocoDelAssembly 10 Oct 2008, 23:31
Quote:

I doubt you can make it more efficient.

We are talking about the 0 ms timeout case.

The semaphore is not wrong if it is suitable in your domain anyway (in case saying that it is not a general-purpose solution was taken as "it can't be applied to anything" Razz)
revolution 10 Oct 2008, 23:39
LocoDelAssembly wrote:
It is not wrong the semaphore if it is suitable in your domain anyway (in case that saying that it is not a general purpose solution was considered as "it can't be applied to nothing" Razz)
I don't get your meaning there Question
LocoDelAssembly 10 Oct 2008, 23:51
Just that I'm not saying it is wrong to pass through a semaphore first, BUT that it is not always a good idea to use it if the thread could spend its time doing something other than just waiting.
bitRAKE 11 Oct 2008, 00:51
Keeping threads busy is easier if the tasks can be completed in any order:
Code:
GetTask:
        mov edx,[tasks]
        mov ecx,[task_count] ; task_count > 0
        xor eax,eax
.scan:  xchg [edx+ecx*4-4],eax
        test eax,eax
        jne .okay
        dec ecx
        jne .scan
        ; fail with EAX/ECX zero, ZF=1
.okay:  retn    
...a non-zero pointer represents a task. Threads can do something else if no tasks are available. The downside is the O(n) cost of scanning an n-task array, but LOCK XADD can be used to update the count. As r22 said, it comes down to use cases. For example, dependent tasks could be linked together and only the head pointer put in the task list - using a data structure to guide the creation & execution of tasks. Guess I've avoided the complexity of the general solution thus far in my work.
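For comparison, a C11 sketch of the same scan with atomic_exchange standing in for XCHG (the array name and size are assumptions; the idea - swap NULL into each slot and keep whatever non-NULL pointer comes out - is bitRAKE's):

```c
#include <stdatomic.h>
#include <stddef.h>

#define TASK_SLOTS 8

/* Each non-NULL entry is a pending task pointer. */
static _Atomic(void *) tasks[TASK_SLOTS];

void *get_task(void) {
    for (size_t i = TASK_SLOTS; i > 0; i--) {
        void *task = atomic_exchange(&tasks[i - 1], NULL); /* XCHG */
        if (task != NULL)
            return task;    /* this thread now owns the task */
    }
    return NULL;            /* no task available */
}
```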



Last edited by bitRAKE on 12 Oct 2008, 03:28; edited 1 time in total
r22 12 Oct 2008, 02:43
@bitRAKE
That's a novel concurrent stack POP snippet.
But if task_count is originally 0 it'll fail.
bitRAKE 12 Oct 2008, 07:15
r22 wrote:
@bitRAKE
That's a novel concurrent stack POP snippet.
But if task_count is originally 0 it'll fail.
Not only should task_count be initialized to >0, but an array of null pointers is needed for [tasks]. The name is misleading too - maybe it should be called task_scan, since it isn't a count of the tasks.

Anyone tried to code the complementing PUSH? It's a little tricky.

LocoDelAssembly 12 Oct 2008, 15:41
Here's my (untested) attempt:
Code:
PutTask: ; >EDX = task ; <EAX = 0 on FAIL
        push    ebx

        mov     ecx, [task_count] ; Which means array size I suppose
        mov     ebx, [tasks]
        lea     ebx, [ebx+ecx*4]
        neg     ecx

.scan:
        xor     eax, eax
   lock cmpxchg [ebx+ecx*4], edx
        je      .exit

        inc     ecx
        jnz     .scan

.exit:
        mov     eax, ecx
        pop     ebx
        ret    
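The same attempt in C11, with atomic_compare_exchange_strong standing in for LOCK CMPXCHG (the slot array and names are mine; like the assembly above, it claims the first free slot or fails when the array is full):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TASK_SLOTS 8
static _Atomic(void *) tasks[TASK_SLOTS];   /* NULL = free slot */

bool put_task(void *task) {
    for (size_t i = 0; i < TASK_SLOTS; i++) {
        void *expected = NULL;
        /* CMPXCHG: store only if the slot still holds NULL */
        if (atomic_compare_exchange_strong(&tasks[i], &expected, task))
            return true;
    }
    return false;           /* every slot occupied */
}
```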
r22 12 Oct 2008, 17:02
Concurrent Array Stack v0.5
If you PUSH 0 to this stack you'll be sorry when you try to POP it Very Happy
Code:
.data
_STACK_PTR dd 0
_STACK_MAX = 32
_STACK rd 32

.code
STACK_PUSH: ;; esp+4 = item to add ;; eax = 0 FAIL ;; eax != 0 SUCCESS
        mov     eax,[_STACK_PTR]
        mov     edx,[esp+4]        ;; new item
.update:                           ;; reserve a slot by advancing the pointer
        lea     ecx,[eax+4]
        cmp     ecx,_STACK_MAX*4   ;; pointer counts bytes, _STACK_MAX counts dwords
        ja      .overflow
   lock cmpxchg [_STACK_PTR],ecx
        jnz     .update            ;; another thread moved the pointer, retry
        mov     [_STACK+eax],edx   ;; publish the new element
        inc     eax                ;; success (non-zero)
        ret     4
.overflow:
        xor     eax,eax            ;; overflow
        ret     4

STACK_POP: ;; eax = item, or 0 when the stack is empty
        mov     eax,[_STACK_PTR]
.update:
        lea     ecx,[eax-4]
        cmp     ecx,-4             ;; test underflow
        je      .underflow
   lock cmpxchg [_STACK_PTR],ecx
        jnz     .update
        xor     ecx,ecx
.again:                            ;; wait for the matching PUSH to publish
        xchg    [_STACK+eax-4],ecx
        test    ecx,ecx
        jz      .again
        mov     eax,ecx
        ret     0
.underflow:
        xor     eax,eax
        ret     0
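For comparison, the same reserve-then-publish design in C11: a CAS loop on the top index mirrors the LOCKed CMPXCHG, and POP spins with an exchange until the matching PUSH has published its item (names are mine; as above, 0 is reserved as the 'empty' marker):

```c
#include <stdatomic.h>
#include <stdint.h>

#define STACK_MAX 32

static _Atomic int top;                     /* index of next free slot   */
static _Atomic(intptr_t) stack[STACK_MAX];  /* 0 = slot not yet published */

int stack_push(intptr_t item) {             /* item must be non-zero */
    int t = atomic_load(&top);
    do {
        if (t >= STACK_MAX)
            return 0;                       /* overflow */
    } while (!atomic_compare_exchange_weak(&top, &t, t + 1));
    atomic_store(&stack[t], item);          /* publish after reserving */
    return 1;
}

int stack_pop(intptr_t *out) {
    int t = atomic_load(&top);
    do {
        if (t <= 0)
            return 0;                       /* underflow */
    } while (!atomic_compare_exchange_weak(&top, &t, t - 1));
    intptr_t item;
    while ((item = atomic_exchange(&stack[t - 1], 0)) == 0)
        ;                                   /* wait for PUSH to publish */
    *out = item;
    return 1;
}
```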
    
bitRAKE 14 Oct 2008, 05:47
Code:
SetTask:
        mov edx,[tasks] 
        mov ecx,[edx] ; array length > 0
.scan:  xchg [edx+ecx*4],eax
        test eax,eax 
        je .okay
        dec ecx 
        jne .scan 
        ; fail with EAX non-zero, some task unqueued!
.okay:  retn


GetTask:
        mov edx,[tasks] 
        mov ecx,[edx] ; array length > 0
.scan:  xchg [edx+ecx*4],eax
        test eax,eax 
        jne .okay
        dec ecx 
        jne .scan 
        ; fail with EAX zero
        ; success for EAX non-zero
.okay:  retn    
See? It's tricky. Laughing (no task priority)

r22 15 Oct 2008, 01:25
Concurrency with just XCHG - I like bitRAKE's task list. Because it's O(n) on both the Set and the Get, it makes me wonder at what point (what task-list size) the task list becomes slower than the concurrent stack 'v0.5'.

Because of the latency of the LOCKed CMPXCHG opcode in the concurrent stack, I think the task list would probably outperform it until around 16+ entries. Which means as long as you have more GETters than SETters, you could probably get away with a small, fast implementation like the task list.

But again it comes full circle back to use cases.

As an aside, WE should probably develop a library (in FASM) of concurrent ADTs (i.e. list, stack, queue, tree, hash map). I suspect concurrency and threading will only become more popular with Intel's new architecture and 6x and more cores coming down the line. And although it's fun to reinvent the wheel, reinventing the MULTI-threaded wheel is an order of magnitude more difficult (as this thread and others have highlighted).


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.