flat assembler
Message board for the users of flat assembler.
![]() |
Author |
|
Madis731 01 Sep 2008, 20:49
This time I will make a short introduction and then straight to the code.
I wanted to take on a challenge to port GPR bubble-sort code to SSE-style code, but got into trouble when I needed to *fully* sort it. current code is only semi-sorting it: Code: ; If your CPU is Penryn or newer then go ahead and try the SSE4.1 ; Very speedy The previous code would iterate through the array by increments of 16 sorting every 4x4B part of it. You could replace the moves with unaligned ones, but then you'd lose the point. One thought that I came up with it to cache the next 16-byte space and shuffle its 4 values with current ones. This way you will always have 8 values to compare and when you run it through N times its guaranteed to sort. The most unfortunate fact is that we need dummy entries and remember them or agree on a MAX_VAL = 2^32-1 so that the dummies would end up at the same place they started from. This value could not be used then anymore. I think this might be helpful in Quicksorting, but I think I need to be a chess-wizard to implement that ![]() Just to prove I'm not making things up: http://www.trl.ibm.com/people/inouehrs/pdf/PACT2007-SIMDsort.pdf http://www.cs.ualberta.ca/~furtak/pub/TR07-02.pdf There are more... Network sort is probably my best bet for SIMD approach: http://www.cs.brandeis.edu/~hugues/sorting_networks.html Compare Bubble-sort on a 16-wide array (up to 120 exchanges) with Network sort (10 parallel moves in ideal, but 60 exchanges). |
|||
![]() |
|
tripledot 23 Apr 2012, 16:11
Yo Madis, did you get any further with your network sort idea?
|
|||
![]() |
|
Madis731 23 Apr 2012, 18:51
Unfortunately not. I have seen your topic (http://board.flatassembler.net/topic.php?p=143190#143190) where you ended up with network sort, but I didn't find anything valuable to add. I know its not a dead subject, but it needs a lot of investigation. Maybe if we get some wizards on board like MHajduk, r22 etc who would give a second look at the topic. I don't think I have any new ideas regarding network sort. Even taking into account the capabilities of AVX and AVX2.
|
|||
![]() |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.