flat assembler
Message board for the users of flat assembler.
> Macroinstructions > Efficient generic sorting algorithms 

l_inc 10 Jan 2015, 01:35
Hello, everyone. I've been working on basic sorting algorithms in the fasm macrolanguage for the last two weeks. Now I'd like to share some results.
The following macros are generic sorting algorithms. They can be used to sort arbitrary values at compile time by providing a custom comparison macro. E.g., if you need to sort an array of pointers to strings, you'd do something like this:

Code:
include 'shsort.inc'

irp s, 'we','are','gonna','sort','this','sentence'
{
    forward
        local p
        p db s,0
    common
        label pStrings dword
    forward
        dd p
    common
        countof.pStrings = ($-pStrings)/4
}

macro ifless s1*,s2*
{
    local c1,c2
    while 1
        load c1 from s1+%-1
        load c2 from s2+%-1
        if c1 = 0 | c1 <> c2
            break
        end if
    end while
    if c1 < c2
}

mShellSort pStrings,countof.pStrings,ifless,4

I.e., the first argument must be a pointer to an array (with an optional specification of an addressing space), the second argument is the number of elements in the array, the third argument is the name of a custom comparison macro that ends with an if-statement header, and the fourth argument is the size of an individual array element (1, 2, 4 or 8). I guess the most commonly encountered example of such sorting is the standard export building macro in fasm, which, by the way, uses a somewhat inferior implementation of a binary Shell Sort.

The following macros are implemented:

1) mInsertionSort. The only algorithm here with quadratic complexity. Its advantages, however, are its simplicity together with its stability.
2) mBinShellSort. A non-algorithmic improvement of the sorting used by the fasm standard export building macros. This implementation is 10 to 40 percent faster on average.
3) mCombSort. Somewhat faster than the previous algorithm.
4) mShellSort. Presumably the fastest Shell Sort known so far. In contrast to the descending powers of two used as gaps by mBinShellSort, this implementation uses the Tokuda gap sequence.
5) mHeapSort. A heavily optimized Heap Sort, well known for its guaranteed n*log(n) complexity.
6) mQuickSort. A heavily optimized iterative Quick Sort.
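For readers unfamiliar with the Tokuda sequence mentioned under mShellSort, here is a minimal high-level sketch in Python (illustrative only, not the fasm macro implementation; all names are mine) of a Shell Sort driven by Tokuda gaps:

```python
import math

def tokuda_gaps(n):
    """Tokuda gap sequence h_k = ceil((9*(9/4)**k - 4) / 5),
    ascending, keeping only gaps smaller than the array length n."""
    gaps, k = [], 0
    while True:
        h = math.ceil((9 * (9 / 4) ** k - 4) / 5)
        if h >= n:
            break
        gaps.append(h)
        k += 1
    return gaps  # starts 1, 4, 9, 20, 46, 103, ...

def shell_sort(a):
    """Shell Sort: gapped insertion sort, largest Tokuda gap first."""
    for gap in reversed(tokuda_gaps(len(a))):
        for i in range(gap, len(a)):
            v, j = a[i], i
            while j >= gap and a[j - gap] > v:
                a[j] = a[j - gap]
                j -= gap
            a[j] = v
    return a
```

The final pass always runs with gap 1, i.e. a plain insertion sort, which guarantees correctness; the earlier large-gap passes are what bring the average complexity well below quadratic.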
Besides the fact that a recursive implementation isn't really possible in the macrolanguage, iterative implementations are also more efficient. This implementation uses 0x20 pairs of dwords to simulate the stack. It also uses the split [x<=pivot; pivot<=x]. This split, together with choosing the pivot in the middle, avoids degradation on trivial inputs such as sorted arrays (with the split [x<pivot; x==pivot; pivot<x] it becomes 50 to 100 percent worse) or arrays with many equal elements (with the split [x<=pivot; pivot<x] it becomes quadratic). However, I'd still generally avoid Quick Sort, as there's always a killer sequence that makes it quadratic and hence unacceptably slow (see the graph below).
7) mMergeSort. A heavily optimized Merge Sort. Very fast, and the second stable (!) algorithm in this set of macros. Its only disadvantage is its linear auxiliary space complexity, which for this implementation halves the size of the largest array that can be sorted during compilation. The half-array approach makes it somewhat adaptive.

I can hardly imagine the macros being improved much further with respect to performance, except for mInsertionSort, which could use binary search but would remain quadratic, and maybe mQuickSort with respect to the choice of the pivot, which is IMHO pointless for the reasons mentioned above.
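To make the iterative scheme concrete, here is a Python sketch (my illustration, not the fasm macro itself) of a Quick Sort with an explicit stack of (lo, hi) pairs, a middle pivot, and the symmetric [x<=pivot; pivot<=x] split. One common way to keep such a stack within a fixed number of pairs, as the 0x20-pair stack above suggests, is to push the larger partition and keep looping on the smaller one, which bounds the depth logarithmically:

```python
def quick_sort(a):
    """Iterative Quick Sort with an explicit stack of (lo, hi) pairs."""
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        while lo < hi:
            pivot = a[(lo + hi) // 2]  # middle pivot avoids sorted-input degradation
            i, j = lo, hi
            # symmetric split: equal keys may end up on either side,
            # so many-duplicates inputs do not degrade to quadratic time
            while i <= j:
                while a[i] < pivot:
                    i += 1
                while a[j] > pivot:
                    j -= 1
                if i <= j:
                    a[i], a[j] = a[j], a[i]
                    i += 1
                    j -= 1
            # push the larger half for later, loop on the smaller half now:
            # this caps the stack at about log2(n) pairs
            if j - lo < hi - i:
                if i < hi:
                    stack.append((i, hi))
                hi = j
            else:
                if lo < j:
                    stack.append((lo, j))
                lo = i
    return a
```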
The performance of all the macros was tested with the following inputs:

Code:
macro caseBigRandom { file 'rdrand_8M_0.bin':0, ARRAY_COUNT*4 }
macro caseBigSorted { file 'rdrand_8M_1.bin':0, ARRAY_COUNT*4 }
macro caseBigReversed { file 'rdrand_8M_2.bin':0, ARRAY_COUNT*4 }
macro caseBigSplitSorted
{
    file 'rdrand_8M_3.bin':0, ARRAY_COUNT*2
    file 'rdrand_8M_3.bin':$400000, ARRAY_COUNT*2
}
macro caseSplitCopy { times ARRAY_COUNT dd % mod (ARRAY_COUNT/2) }
macro caseManyCopy { times ARRAY_COUNT dd % mod $FF }
macro caseAllCopy { times ARRAY_COUNT dd $01234567 }

where "rdrand_8M_0.bin" contains rdrand output from my Haswell, "rdrand_8M_1.bin" the same output presorted, "rdrand_8M_2.bin" the same output presorted in reverse, and "rdrand_8M_3.bin" rdrand output that was split into two chunks, each sorted separately, and then concatenated back together. Each file is 8 MB. The last one comes somewhat close to the Quick Sort killer sequence, and caseSplitCopy comes even closer. Additionally, all macros have been tested for correctness using random inputs of all sizes from 2 to 1024. Some neat pictures are attached (more pictures in the archive*). The binary input files (random data) are not attached, mainly because of their size, and because attaching them would be pointless anyway. To summarize briefly: Merge Sort is the winner.

* The 7z archive inside is actually an *.ods file, which I had to repack because of the board's limitation on attachment size. Hence, you'll need to reverse that to see the performance data and more graphs.
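A rough Python analogue of these test inputs and of the size-2-to-1024 correctness sweep (names and shapes are my illustration; the originals are 8 MB rdrand dumps and fasm "times" directives) might look like this:

```python
import random

def make_cases(n):
    """Rough analogues of the thread's performance inputs (illustrative only)."""
    rnd = [random.getrandbits(32) for _ in range(n)]
    return {
        'big_random':       rnd[:],                                       # caseBigRandom
        'big_sorted':       sorted(rnd),                                  # caseBigSorted
        'big_reversed':     sorted(rnd, reverse=True),                    # caseBigReversed
        'big_split_sorted': sorted(rnd[:n // 2]) + sorted(rnd[n // 2:]),  # caseBigSplitSorted
        'split_copy':       [i % max(n // 2, 1) for i in range(n)],       # caseSplitCopy
        'many_copy':        [i % 0xFF for i in range(n)],                 # caseManyCopy
        'all_copy':         [0x01234567] * n,                             # caseAllCopy
    }

def check_correctness(sort_fn, max_n=1024):
    """Correctness sweep mirroring the post: random inputs of every size 2..max_n."""
    for n in range(2, max_n + 1):
        a = [random.getrandbits(32) for _ in range(n)]
        assert sort_fn(a[:]) == sorted(a), f"failed at size {n}"
```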
_________________ Faith is a superposition of knowledge and fallacy 


revolution 10 Jan 2015, 02:23
How about a radix sort? That would be good to see as a preprocessor macro.



JohnFound 10 Jan 2015, 07:39
l_inc, this is really great! I always knew you were a FASM macro genius!



l_inc 10 Jan 2015, 14:56
JohnFound
Implementing existing algorithms is mechanical coding work rather than a sign of genius, but thank you nonetheless. I appreciate appreciation.



Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.