flat assembler
Message board for the users of flat assembler.
![]() Goto page Previous 1, 2 |
Author |
|
Madis731 23 Apr 2006, 21:32
I should really see what is wrong with my sort
![]() P!!! 700MHz 384MB RAM Code: sorting test - sorting 10485760 signed numbers, using seed 0xBEEFF00D *** testing for correctness testing routine libc qsort() testing routine hutch asm qsort() failed at index 7 (953 > 479) testing routine C trimedian qsort() testing routine STL std::sort() testing routine r22 FASM RADIX sort() testing routine David Garcia C++ radix() testing routine Madis731 Quick&Insert() failed at index 633 (2 > 1) *** testing for SPEED libc qsort() took 8743 ticks hutch asm qsort() took 5649 ticks C trimedian qsort() took 4466 ticks STL std::sort() took 5227 ticks r22 FASM RADIX sort() took 3084 ticks David Garcia C++ radix() took 3686 ticks Madis731 Quick&Insert() took 4557 ticks EDIT: OK, I know where the problem is ![]() You forgot to call insertion sort, after the quicksort Code: mov eax,[l] mov edx,[h] call quicksort mov eax,[l] mov edx,[h] call insertsort I guess its my fault ![]() ![]() |
|||
![]() |
|
shism2 23 Apr 2006, 22:53
NICE!
|
|||
![]() |
|
f0dder 24 Apr 2006, 05:39
Thanks Madis, that fixed it - perhaps Hutch's version is faster, but at least your version works
![]() shism2: the crash happens inside the @@Next2 loop it seems. Any idea if there's something that prevents your code from working with ~10 million numbers, or have I fscked up something? I've modified the test a bit so it actually genereates full DWORD randoms, instead of only 16bit randomness, and I fixed the typo so it now says UNsigned. The testbed really should be updated so seed, amount to sort etc. can be specified at runtime... New numbers: Quote:
_________________ ![]() |
|||||||||||
![]() |
|
r22 24 Apr 2006, 05:45
When I have some time, I'll make a SIGNED version, and OPTIMIZE for speed instead of memory usage.
-The SIGNED radix sort will lose a little speed since I'll have to check the MSB. -The OPTIMIZATION should gain me (prediction) 40% speed improvement. Since I'll be sorting by WORDs instead of BYTEs. But it'll be a hit to the amount of memory the function uses. And I'll probably use SSE to clear the "count" memory, so it won't run on the old boxes. MEMORY USAGE /// TIME Current: n*4+256*4 bytes /// 3*256 + 4*n After Opt: n*4+65536*4 bytes /// 3*65536 + 2*n The extra 200KB is really nothing on modern pcs anyways, so it should be interesting to see the speed improvement. The extra constant time will hurt small list sorts, but if your sorting a small list you can use a bubblesort and it wouldn't matter. |
|||
![]() |
|
f0dder 24 Apr 2006, 06:09
Quote:
Speed is mainly interesting when sorting large amounts of stuff, and 200kb extra overhead is just about nothing when doing large sorts - so go ahead ![]() SSE for clearing count memory, well, go ahead - should be easy enough to substitute MMX, plain ALU, etc. Will be interesting to see the bench results ![]() |
|||
![]() |
|
Madis731 24 Apr 2006, 11:55
Okey, boys&girls
![]() Here we have a "branchlesser" version ![]() Code: format ms coff public _QISort@8 section '.text' code readable executable _QISort@8: push ebx push ebp push esi push edi mov eax, [esp + 16 + 4] ;# IN eax=low address mov edx, [esp + 16 + 8] ;# IN edx=high address of the array call quicksort mov eax, [esp + 16 + 4] ;# IN eax=low address mov edx, [esp + 16 + 8] ;# IN edx=high address of the array call insertsort pop edi pop esi pop ebp pop ebx ret 8 ;# IN eax=low address ;# IN edx=high address of the array ;# OUT sorted array M=4*4 ;4*DWORD quicksort: ;mov eax,[esp+8];low ;mov edx,[esp+4];high mov ecx,edx sub ecx,eax cmp ecx,M jc .exit lea ecx,[eax+edx] shr ecx,3 lea ecx,[ecx*4]; i mov ebx,[eax]; a[l] mov esi,[ecx]; a[i] mov edi,[edx]; a[h] ;;#version1 ; Use this version if your CPU can handle CMOV ; mov ebp,ebx ; extremely well. Maybe you can use it on some ; cmp esi,ebx ; PMMX or Mobilitys etc. ; cmovna ebx,ebp ; cmovna esi,ebp ;; mov ebp,ebx ; cmp edi,ebx ; cmovna ebx,ebp ; cmovna edi,ebp ; mov ebp,esi ; cmp edi,esi ; cmovna esi,ebp ; cmovna edi,ebp ;;#version1end ;#version2 ; Use this version if your CPU beats the CMOV push edx ; with XOR by far. This is P4, some PIIIs etc. mov edx,ebx xor ebp,ebp xor edx,esi cmp ebx,esi cmovna edx,ebp xor ebx,edx xor esi,edx mov edx,ebx xor ebp,ebp xor edx,edi cmp ebx,edi cmovna edx,ebp xor ebx,edx xor edi,edx mov edx,esi xor ebp,ebp xor edx,edi cmp esi,edi cmovna edx,ebp xor esi,edx xor edi,edx pop edx ;#version2end mov [eax],ebx mov [ecx],esi mov [edx],edi mov ebp,edx sub ebp,4 ; j mov esi,[ebp] mov edi,[ecx] mov [ebp],edi mov [ecx],esi mov ecx,eax ; i mov ebx,[ebp]; a[j] .loop: add ecx,4 cmp [ecx],ebx jc .loop @@: sub ebp,4 cmp [ebp],ebx ja @b cmp ebp,ecx jc @f mov esi,[ecx] mov edi,[ebp] mov [ebp],esi mov [ecx],edi jmp .loop @@: mov esi,[ecx] mov ebx,edx sub ebx,4 mov edi,[ebx] mov [ebx],esi mov [ecx],edi add ecx,4 push ecx edx mov edx,ebp call quicksort pop edx eax call quicksort .exit: ret insertsort: mov esi,eax .start: add esi,4 cmp esi,edx ja .exit mov ebp,[esi] mov edi,esi @@: cmp [edi-4],ebp jna @f cmp edi,eax jna @f mov ebx,[edi-4] mov [edi],ebx sub edi,4 jmp @b @@: mov [edi],ebp jmp .start .exit: ret |
|||
![]() |
|
r22 25 Apr 2006, 16:52
EDIT:
I fixed the bug, BUT instead of a 25% speed INCREASE it was a DECREASE. I can only assume that the extra memory for the counts (262KB) can't be held in processor cache easily and as a result causes bad performance. I've also messed around with unrolling the loops a bit in the original, but the speed increase so far has been nothing worth noting. Last edited by r22 on 26 Apr 2006, 00:52; edited 1 time in total |
|||
![]() |
|
f0dder 25 Apr 2006, 17:21
r22, your new code crashes here :/
|
|||
![]() |
|
Madis731 26 Apr 2006, 07:34
Mine should be INcrease if I did my tests right - but I don't have the knowledge and tools to build it into that testsuite of f0dder's myself :$
|
|||
![]() |
|
Goto page Previous 1, 2 < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.