flat assembler
Message board for the users of flat assembler.

Index > Main > Sorting Algorithm, RadixSort

Goto page Previous  1, 2
Author
Thread Post new topic Reply to topic
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 23 Apr 2006, 20:40
shism2 wrote:
That's funny fodder because it doesn't produce a GPF for me


I probably did something wrong when I integrated your code in my test bed... but of course there's also a difference between sorting a couple of numbers (your original) and sorting ~10 million Smile. I didn't have time to test properly and track down the problem Sad

_________________
Image - carpe noctem
Post 23 Apr 2006, 20:40
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 23 Apr 2006, 21:32
I should really see what is wrong with my sort Smile

P!!! 700MHz 384MB RAM
Code:
sorting test - sorting 10485760 signed numbers, using seed 0xBEEFF00D

*** testing for correctness
testing routine libc qsort()
testing routine hutch asm qsort()               failed at index 7 (953 > 479)
testing routine C trimedian qsort()
testing routine STL std::sort()
testing routine r22 FASM RADIX sort()
testing routine David Garcia C++ radix()
testing routine Madis731 Quick&Insert()         failed at index 633 (2 > 1)

*** testing for SPEED
libc qsort() took 8743 ticks
hutch asm qsort() took 5649 ticks
C trimedian qsort() took 4466 ticks
STL std::sort() took 5227 ticks
r22 FASM RADIX sort() took 3084 ticks
David Garcia C++ radix() took 3686 ticks
Madis731 Quick&Insert() took 4557 ticks
    


EDIT: OK, I know where the problem is Smile but that is not a good thing.
You forgot to call insertion sort, after the quicksort
Code:
    mov  eax,[l]
    mov  edx,[h]
    call quicksort
    mov  eax,[l]
    mov  edx,[h]
    call insertsort
    

I guess its my fault Sad because the function should be called only once, not like this and the other thing that happens is my time will go UP Razz - somewhere where hutch is now
Post 23 Apr 2006, 21:32
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
shism2



Joined: 14 Sep 2005
Posts: 248
shism2 23 Apr 2006, 22:53
NICE!
Post 23 Apr 2006, 22:53
View user's profile Send private message Reply with quote
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 24 Apr 2006, 05:39
Thanks Madis, that fixed it - perhaps Hutch's version is faster, but at least your version works Smile

shism2: the crash happens inside the @@Next2 loop it seems. Any idea if there's something that prevents your code from working with ~10 million numbers, or have I fscked up something?

I've modified the test a bit so it actually genereates full DWORD randoms, instead of only 16bit randomness, and I fixed the typo so it now says UNsigned. The testbed really should be updated so seed, amount to sort etc. can be specified at runtime...

New numbers:
Quote:

D:\src\test\sort>driver
sorting test - sorting 10485760 UNsigned numbers, using seed 0xBEEFF00D

*** testing for correctness
testing routine libc qsort()
testing routine hutch asm qsort() failed at index 4 (143277874 > 18307664)
testing routine C trimedian qsort()
testing routine STL std::sort()
testing routine r22 FASM RADIX sort()
testing routine David Garcia C++ radix()
testing routine Madis731 Quick&Insert()

*** testing for SPEED
libc qsort() took 2781 ticks
hutch asm qsort() took 1000 ticks
C trimedian qsort() took 1125 ticks
STL std::sort() took 2047 ticks
r22 FASM RADIX sort() took 546 ticks
David Garcia C++ radix() took 750 ticks
Madis731 Quick&Insert() took 1250 ticks


Description:
Download
Filename: sort_test_1.1.zip
Filesize: 29.37 KB
Downloaded: 581 Time(s)


_________________
Image - carpe noctem
Post 24 Apr 2006, 05:39
View user's profile Send private message Visit poster's website Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 24 Apr 2006, 05:45
When I have some time, I'll make a SIGNED version, and OPTIMIZE for speed instead of memory usage.

-The SIGNED radix sort will lose a little speed since I'll have to check the MSB.
-The OPTIMIZATION should gain me (prediction) 40% speed improvement. Since I'll be sorting by WORDs instead of BYTEs. But it'll be a hit to the amount of memory the function uses. And I'll probably use SSE to clear the "count" memory, so it won't run on the old boxes.

MEMORY USAGE /// TIME
Current: n*4+256*4 bytes /// 3*256 + 4*n
After Opt: n*4+65536*4 bytes /// 3*65536 + 2*n
The extra 200KB is really nothing on modern pcs anyways, so it should be interesting to see the speed improvement. The extra constant time will hurt small list sorts, but if your sorting a small list you can use a bubblesort and it wouldn't matter.
Post 24 Apr 2006, 05:45
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 24 Apr 2006, 06:09
Quote:

The extra 200KB is really nothing on modern pcs anyways, so it should be interesting to see the speed improvement. The extra constant time will hurt small list sorts, but if your sorting a small list you can use a bubblesort and it wouldn't matter.


Speed is mainly interesting when sorting large amounts of stuff, and 200kb extra overhead is just about nothing when doing large sorts - so go ahead Smile

SSE for clearing count memory, well, go ahead - should be easy enough to substitute MMX, plain ALU, etc. Will be interesting to see the bench results Smile
Post 24 Apr 2006, 06:09
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 24 Apr 2006, 11:55
Okey, boys&girls Very Happy
Here we have a "branchlesser" version Wink (or versions)

Code:
format ms coff

public _QISort@8

section '.text' code readable executable 

_QISort@8:
        push    ebx
        push    ebp
        push    esi
        push    edi

        mov             eax, [esp + 16 + 4]             ;# IN eax=low address
        mov             edx, [esp + 16 + 8]             ;# IN edx=high address of the array
        call    quicksort

        mov             eax, [esp + 16 + 4]             ;# IN eax=low address
        mov             edx, [esp + 16 + 8]             ;# IN edx=high address of the array
        call    insertsort

        pop             edi
        pop             esi
        pop             ebp
        pop             ebx

        ret             8
        
        
        
;# IN eax=low address
;# IN edx=high address of the array
;# OUT sorted array
        M=4*4 ;4*DWORD
    quicksort:
       ;mov  eax,[esp+8];low
       ;mov  edx,[esp+4];high
        mov  ecx,edx
        sub  ecx,eax
        cmp  ecx,M
        jc   .exit
        lea  ecx,[eax+edx]
        shr  ecx,3
        lea  ecx,[ecx*4];   i
        mov  ebx,[eax]; a[l]
        mov  esi,[ecx]; a[i]
        mov  edi,[edx]; a[h]

;;#version1                    ; Use this version if your CPU can handle CMOV
;        mov  ebp,ebx          ; extremely well. Maybe you can use it on some
;        cmp  esi,ebx          ; PMMX or Mobilitys etc.
;        cmovna ebx,ebp
;        cmovna esi,ebp
;;        mov  ebp,ebx
;        cmp  edi,ebx
;        cmovna ebx,ebp
;        cmovna edi,ebp
;        mov  ebp,esi
;        cmp  edi,esi
;        cmovna esi,ebp
;        cmovna edi,ebp
;;#version1end

;#version2                     ; Use this version if your CPU beats the CMOV
        push edx               ; with XOR by far. This is P4, some PIIIs etc.

        mov  edx,ebx
        xor  ebp,ebp
        xor  edx,esi
        cmp  ebx,esi
        cmovna edx,ebp
        xor  ebx,edx
        xor  esi,edx

        mov  edx,ebx
        xor  ebp,ebp
        xor  edx,edi
        cmp  ebx,edi
        cmovna edx,ebp
        xor  ebx,edx
        xor  edi,edx

        mov  edx,esi
        xor  ebp,ebp
        xor  edx,edi
        cmp  esi,edi
        cmovna edx,ebp
        xor  esi,edx
        xor  edi,edx

        pop edx
;#version2end

        mov [eax],ebx
        mov [ecx],esi
        mov [edx],edi

        mov  ebp,edx
        sub  ebp,4      ;   j
        mov  esi,[ebp]
        mov  edi,[ecx]
        mov  [ebp],edi
        mov  [ecx],esi
        mov  ecx,eax    ;   i
        mov  ebx,[ebp]; a[j]
      .loop:
        add  ecx,4
        cmp  [ecx],ebx
        jc   .loop
      @@:
        sub  ebp,4
        cmp  [ebp],ebx
        ja   @b
        cmp  ebp,ecx
        jc   @f
        mov  esi,[ecx]
        mov  edi,[ebp]
        mov  [ebp],esi
        mov  [ecx],edi
        jmp  .loop
      @@:
        mov  esi,[ecx]
        mov  ebx,edx
        sub  ebx,4
        mov  edi,[ebx]
        mov  [ebx],esi
        mov  [ecx],edi
        add  ecx,4
        push ecx edx
        mov  edx,ebp
        call quicksort
        pop  edx eax
        call quicksort
      .exit:
    ret
    insertsort:
        mov  esi,eax
      .start:
        add  esi,4
        cmp  esi,edx
        ja   .exit
        mov  ebp,[esi]
        mov  edi,esi
      @@:
        cmp  [edi-4],ebp
        jna  @f
        cmp  edi,eax
        jna  @f
        mov  ebx,[edi-4]
        mov  [edi],ebx
        sub  edi,4
        jmp  @b
      @@:
        mov  [edi],ebp
        jmp  .start
      .exit:
    ret
    

_________________
My updated idol Very Happy http://www.agner.org/optimize/
Post 24 Apr 2006, 11:55
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 25 Apr 2006, 16:52
EDIT:
I fixed the bug, BUT instead of a 25% speed INCREASE it was a DECREASE.
I can only assume that the extra memory for the counts (262KB) can't be held in processor cache easily and as a result causes bad performance.

I've also messed around with unrolling the loops a bit in the original, but the speed increase so far has been nothing worth noting.


Last edited by r22 on 26 Apr 2006, 00:52; edited 1 time in total
Post 25 Apr 2006, 16:52
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 25 Apr 2006, 17:21
r22, your new code crashes here :/
Post 25 Apr 2006, 17:21
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 26 Apr 2006, 07:34
Mine should be INcrease if I did my tests right - but I don't have the knowledge and tools to build it into that testsuite of f0dder's myself :$
Post 26 Apr 2006, 07:34
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page Previous  1, 2

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.