flat assembler
Message board for the users of flat assembler.
Windows > Mandelbrot Benchmark FPU/SSE2 released
Madis731 16 Dec 2007, 18:18
I can see the same patterns here as in Tom's and Anand's articles:
http://www.tomshardware.com/2007/11/19/the_spider_weaves_its_web/page22.html
http://www.anandtech.com/showdoc.aspx?i=3153&p=6
Like 'come again!?!?' The AMD Phenom 9900 (@2.6GHz) is beaten by the Intel Core 2 Quad Q6600 (@2.4GHz). Wait a minute. Even in power consumption? ... Yes, even in that: http://www.anandtech.com/showdoc.aspx?i=3153&p=10
revolution 16 Dec 2007, 18:23
I think the AMD execs will be crying over Phenom. It was supposed to be something special; instead it is something below average.
Madis731 16 Dec 2007, 18:30
I wouldn't say average right away. It's actually the next step from them and it's rather good; Intel just did something even more remarkable.
revolution 16 Dec 2007, 18:36
There are only two players, AMD and Intel: one is below average and the other above average. AMD needs to lower their price to match the performance, or else no one will buy!
Kuemmel 19 Dec 2007, 18:42
@Xorpd!
I recently looked again at the Quickman code, and it seems to me that he counts iterations in a different way. We only add iterations while a point has not diverged, but I think the Quickman code still adds iteration counts when one of the two points in an SSE register has already diverged. That can make quite a difference: in the end he will get more iteration counts than us and therefore a higher speed value, which would make the results not comparable. Do you have some time to check his code as well? I'm not 100% sure, though.
Okay, in the end it's also a matter of definition. In non-SSE code you would stop iterating; in SSE code you wouldn't, for speed reasons, though it can still be seen as a computed but useless iteration...
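To make the two counting conventions concrete, here is a minimal sketch in Python rather than the benchmark's assembly. It assumes a packed SSE2 loop that keeps both lanes running until the slower of the two points has diverged. The names `escape_iterations` and `count_pair` are hypothetical, and whether Quickman really counts this way is not confirmed in this thread.

```python
def escape_iterations(cx, cy, max_iter):
    """Number of Mandelbrot iterations performed before |z|^2 > 4, capped at max_iter."""
    zx = zy = 0.0
    for n in range(max_iter):
        if zx * zx + zy * zy > 4.0:
            return n
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return max_iter


def count_pair(p0, p1, max_iter):
    """Return (useful, lockstep) iteration counts for two points sharing one SSE register.

    useful   : each point is counted only while it has not diverged
    lockstep : both lanes are counted on every pass until the slower point finishes
               (assumed behaviour of a packed loop, for illustration only)
    """
    n0 = escape_iterations(p0[0], p0[1], max_iter)
    n1 = escape_iterations(p1[0], p1[1], max_iter)
    useful = n0 + n1
    lockstep = 2 * max(n0, n1)
    return useful, lockstep


useful, lockstep = count_pair((2.0, 2.0), (0.0, 0.0), 100)
print(useful, lockstep)
```

For a pair where one point escapes after a single iteration and the other never escapes within 100, the first convention reports 101 iterations and the second 200, so speed figures taken under the two conventions are not directly comparable.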
revolution 19 Dec 2007, 19:53
Here are some nice high-precision Mandelbrot images. If you haven't seen them before, you will be amazed.
Deepzooming with Fractint
Fractint Fractal Gallery
Xorpd! 19 Dec 2007, 22:00
Oh well, I composed a big long reply and the forum software seems to have timed me out. What a waste of time.
f0dder 20 Dec 2007, 14:47
Xorpd!: use Firefox and the It's All Text! extension (possibly with a bbcode-aware editor) to avoid being bitten by timeouts and crashes. It's worth it.
bitRAKE 21 Dec 2007, 05:34
The DDRAW.INC file in the current version of FASM is different: (Un)LockSurface has changed to (Un)Lock. A major change is the removal of the RET macro, which was just plain crazy in the first place, iyam.
(Sorry, I have not read the whole thread; maybe this has been stated already. Having multiple include paths for the same files is not my idea of fun; it's a recipe for error, actually.)
I don't believe it is fair to compare a single core running 16 threads to multiple cores running 16 threads, unless the goal is to see how badly single cores perform on software designed for multiple cores. :-/
bitRAKE 21 Dec 2007, 07:44
I changed the frame_counter_loop:
Code:
NUMBER_OF_THREADS = 2

frame_counter_loop:
        stdcall Frame_Loop, NUMBER_OF_THREADS
        ...

Frame_Loop:
        label .threads dword at esp+4
        xor     ebx,ebx
        mov     esi,[.threads]
        lea     edi,[threadhandles]
.0:
if ALGO=0
        invoke  CreateThread,NULL, 0, thread_draw_fpu, ebx, \
                REALTIME_PRIORITY_CLASS or CREATE_SUSPENDED, tId
else if ALGO=1
        invoke  CreateThread,NULL, 0, thread_draw_sse2, ebx, \
                REALTIME_PRIORITY_CLASS or CREATE_SUSPENDED, tId
else if ALGO=2
        invoke  CreateThread,NULL, 0, thread_draw_sse3, ebx, \
                REALTIME_PRIORITY_CLASS or CREATE_SUSPENDED, tId
else if ALGO=3
        invoke  CreateThread,NULL, 0, thread_draw_sse3_vodnaya, ebx, \
                REALTIME_PRIORITY_CLASS or CREATE_SUSPENDED, tId
end if
        mov     [edi],eax
        add     edi,4
        add     ebx,LINE_INTERLEAVE
        dec     esi
        jne     .0
        mov     [current_line],ebx      ; start drawing here next

        mov     ebx,[.threads]
        inc     esi                     ; affinity mask
        lea     edi,[threadhandles]
.1:
        invoke  SetThreadAffinityMask,dword [edi],esi
        shl     esi,1
        add     edi,4
        dec     ebx
        jne     .1

        mov     ebx,[.threads]
        lea     edi,[threadhandles]
.2:
        invoke  ResumeThread,dword [edi]
        add     edi,4
        dec     ebx
        jne     .2

        invoke  WaitForMultipleObjectsEx,[.threads],threadhandles,1,-1,0

        mov     ebx,[.threads]
        lea     edi,[threadhandles]
.3:
        invoke  CloseHandle, dword [edi]
        add     edi,4
        dec     ebx
        jne     .3
        retn    4

Is this correct in the original code?
Code:
        mov     [current_line], LINE_INTERLEAVE*15      ; start drawing here next
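As a reading aid, the per-thread bookkeeping in Frame_Loop can be sketched in Python. This is only a model of loops .0 and .1 as posted (merged into one loop, which gives the same values), not a replacement for the assembly. `thread_setup` is a hypothetical name.

```python
def thread_setup(num_threads, line_interleave):
    """Model loops .0 and .1: one start line and one single-bit affinity mask per thread.

    Returns (plan, next_line), where plan is a list of (start_line, affinity_mask)
    and next_line is the value the posted code stores in current_line.
    """
    plan = []
    start_line = 0            # ebx after "xor ebx,ebx"
    mask = 1                  # esi after "inc esi" (esi is 0 when loop .0 ends)
    for _ in range(num_threads):
        plan.append((start_line, mask))
        start_line += line_interleave   # add ebx,LINE_INTERLEAVE
        mask <<= 1                      # shl esi,1
    return plan, start_line


plan, next_line = thread_setup(2, 1)
print(plan, next_line)
```

Thread i is pinned to CPU i through the mask `1 << i`. One consequence worth keeping in mind: if the thread count exceeds the number of logical processors, the extra masks name CPUs that do not exist, and as far as I know SetThreadAffinityMask then fails and leaves that thread with its default affinity.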
Kuemmel 21 Dec 2007, 10:45
bitRAKE wrote: Virtually no change in performance, but the program is smaller. Seems there is a slight improvement (+0.25%) when the number of threads is one greater than the number of cores.
Oops, maybe, maybe not; I haven't looked at the code for more than a year. Actually, the 0.53 version from me can't be seen as a masterpiece of coding. It was my first x86 assembly programming and involved lots of help from others here and lots of copy and paste from examples, so it's quite messy...
Thanks a lot for your help. So it seems you modified it and compiled it, maybe with the latest FASM version? Could you attach the file here? That would be nice; it could be a new starting version for me to enhance!
bitRAKE 21 Dec 2007, 17:03
There was another bug in the stack frame I suggested. This file has ALGO=1 changed to the new frame (that was good for more than +1%). A few other changes, too.
Lots of room for improvement, imho.
Xorpd! 24 Dec 2007, 06:23
A couple of random notes:
Saving this in notepad before posting...
asmfan 25 Dec 2007, 09:38
2 bitRAKE - algo 4 fails with an error at this point:
Code:
        ; get color word
        shl     ecx, 2
        mov     eax, [edx+ecx]
        mov     [ebx], eax      ; (!!!) exactly this
WinXP 32bit sp2+
_________________
Any offers?
madmatt 25 Dec 2007, 10:15
asmfan wrote: 2 bitRAKE - algo 4 fails with error at this point:
The same thing happens to me too. Here is a crash dump from DrWatson (look for the word 'FAULT' to see where the crash occurred, around the center of the listing):
Code:
Application exception occurred:
App: C:\Documents and Settings\Matt Childress\My Documents\Matts Stuff\Assembly Code Examples\KMB_V0.53_MT(bitRAKE).exe (pid=4016)
When: 12/25/2007 @ 05:10:48.046
Exception number: c0000005 (access violation)

*----> State Dump for Thread Id 0x920 <----*

eax=00d4d484 ebx=89f2a890 ecx=00000000 edx=004036ab esi=000003a0 edi=00000020
eip=0040157f esp=007eff00 ebp=007effa4 iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246

function: KMB_V0.53_MT(bitRAKE)
00401464 c9 leave
00401465 c21000 ret 0x10
00401468 90 nop
00401469 90 nop
0040146a 90 nop
0040146b 90 nop
0040146c 90 nop
0040146d 90 nop
0040146e 90 nop
0040146f 90 nop
00401470 90 nop
00401471 90 nop
00401472 90 nop
00401473 90 nop
00401474 90 nop
00401475 90 nop
00401476 90 nop
00401477 90 nop
00401478 90 nop
00401479 90 nop
0040147a 90 nop
0040147b 90 nop
0040147c 90 nop
0040147d 90 nop
0040147e 90 nop
0040147f 90 nop
00401480 8d6c24ec lea ebp,[esp-0x14]
00401484 81ec93000000 sub esp,0x93
0040148a 896d0c mov [ebp+0xc],ebp
0040148d 897d08 mov [ebp+0x8],edi
00401490 897504 mov [ebp+0x4],esi
00401493 895d00 mov [ebp],ebx
00401496 83e4c0 and esp,0xffffffc0
00401499 a19e364000 mov eax,[KMB_V0.53_MT(bitRAKE)+0x369e (0040369e)]
0040149e 6bc018 imul eax,eax,0x18
004014a1 dd80c0314000 fld qword ptr [eax+0x4031c0]
004014a7 dd80c8314000 fld qword ptr [eax+0x4031c8]
004014ad dd80d0314000 fld qword ptr [eax+0x4031d0]
004014b3 dd5c2410 fstp qword ptr [esp+0x10]
004014b7 dd5c2408 fstp qword ptr [esp+0x8]
004014bb dd1c24 fstp qword ptr [esp]
004014be dd442408 fld qword ptr [esp+0x8]
004014c2 dc2424 fsub qword ptr [esp]
004014c5 d9e1 fabs
004014c7 da3530314000 fidiv dword ptr [KMB_V0.53_MT(bitRAKE)+0x3130 (00403130)]
004014cd dd5c2418 fstp qword ptr [esp+0x18]
004014d1 c744243800000000 mov dword ptr [esp+0x38],0x0
004014d9 baab364000 mov edx,0x4036ab
004014de 8b1db8304000 mov ebx,[KMB_V0.53_MT(bitRAKE)+0x30b8 (004030b8)]
004014e4 81c390010000 add ebx,0x190
004014ea 8b35a4304000 mov esi,[KMB_V0.53_MT(bitRAKE)+0x30a4 (004030a4)]
004014f0 8b4514 mov eax,[ebp+0x14]
004014f3 0fafc6 imul eax,esi
004014f6 01c3 add ebx,eax
004014f8 81ee60090000 sub esi,0x960
004014fe 8b4514 mov eax,[ebp+0x14]
00401501 83c001 add eax,0x1
00401504 89442434 mov [esp+0x34],eax
00401508 dd442418 fld qword ptr [esp+0x18]
0040150c da4d14 fimul dword ptr [ebp+0x14]
0040150f dc442410 fadd qword ptr [esp+0x10]
00401513 dd5c2420 fstp qword ptr [esp+0x20]
00401517 c744243000000000 mov dword ptr [esp+0x30],0x0
0040151f ded9 fcompp
00401521 ded9 fcompp
00401523 ded9 fcompp
00401525 dd0520314000 fld qword ptr [KMB_V0.53_MT(bitRAKE)+0x3120 (00403120)]
0040152b dd442418 fld qword ptr [esp+0x18]
0040152f b860090000 mov eax,0x960
00401534 da4c2430 fimul dword ptr [esp+0x30]
00401538 dc0424 fadd qword ptr [esp]
0040153b dd442420 fld qword ptr [esp+0x20]
0040153f d9c1 fld st(1)
00401541 d9c1 fld st(1)
00401543 d9c1 fld st(1)
00401545 d8ca fmul st,st(2)
00401547 d9c1 fld st(1)
00401549 d8ca fmul st,st(2)
0040154b d9ca fxch st(2)
0040154d decb fmulp st(3),st
0040154f d9c0 fld st(0)
00401551 d8c2 fadd st,st(2)
00401553 d9cb fxch st(3)
00401555 d8c0 fadd st,st(0)
00401557 d9ca fxch st(2)
00401559 dee9 fsubrp st(1),st
0040155b d9ca fxch st(2)
0040155d dff5 fcomip st,st(5)
0040155f d8c2 fadd st,st(2)
00401561 d9c9 fxch st(1)
00401563 d8c3 fadd st,st(3)
00401565 d9c9 fxch st(1)
00401567 7705 ja KMB_V0.53_MT(bitRAKE)+0x156e (0040156e)
00401569 83e801 sub eax,0x1
0040156c 75d5 jnz KMB_V0.53_MT(bitRAKE)+0x1543 (00401543)
0040156e b960090000 mov ecx,0x960
00401573 29c1 sub ecx,eax
00401575 014c2438 add [esp+0x38],ecx
00401579 c1e102 shl ecx,0x2
0040157c 8b040a mov eax,[edx+ecx]
FAULT ->0040157f 8903 mov [ebx],eax ds:0023:89f2a890=????????
00401581 8344243001 add dword ptr [esp+0x30],0x1
00401586 83c304 add ebx,0x4
00401589 817c243058020000 cmp dword ptr [esp+0x30],0x258
00401591 758c jnz KMB_V0.53_MT(bitRAKE)+0x151f (0040151f)
00401593 83451401 add dword ptr [ebp+0x14],0x1
00401597 01f3 add ebx,esi
00401599 8b442434 mov eax,[esp+0x34]
0040159d 394514 cmp [ebp+0x14],eax
004015a0 0f8562ffffff jne KMB_V0.53_MT(bitRAKE)+0x1508 (00401508)
004015a6 b801000000 mov eax,0x1
004015ab f00fc105a6364000 lock xadd [KMB_V0.53_MT(bitRAKE)+0x36a6 (004036a6)],eax
004015b3 83c001 add eax,0x1
004015b6 894514 mov [ebp+0x14],eax
004015b9 3d58020000 cmp eax,0x258
004015be 0f8c1affffff jl KMB_V0.53_MT(bitRAKE)+0x14de (004014de)
004015c4 8b442438 mov eax,[esp+0x38]
004015c8 f0010528314000 lock add [KMB_V0.53_MT(bitRAKE)+0x3128 (00403128)],eax
004015cf 8b6d0c mov ebp,[ebp+0xc]
004015d2 8b7d08 mov edi,[ebp+0x8]
004015d5 8b7504 mov esi,[ebp+0x4]
004015d8 8b5d00 mov ebx,[ebp]
004015db 8d6514 lea esp,[ebp+0x14]
004015de c20400 ret 0x4
004015e1 0000 add [eax],al
004015e3 0000 add [eax],al
004015e5 0000 add [eax],al
004015e7 0000 add [eax],al
004015e9 0000 add [eax],al
004015eb 0000 add [eax],al
004015ed 0000 add [eax],al
004015ef 0000 add [eax],al
004015f1 0000 add [eax],al
004015f3 0000 add [eax],al
004015f5 0000 add [eax],al
004015f7 0000 add [eax],al
004015f9 0000 add [eax],al
004015fb 0000 add [eax],al
004015fd 0000 add [eax],al
004015ff 0000 add [eax],al
00401601 0000 add [eax],al
00401603 0000 add [eax],al
00401605 0000 add [eax],al
00401607 0000 add [eax],al
00401609 0000 add [eax],al
0040160b 0000 add [eax],al
0040160d 0000 add [eax],al
0040160f 0000 add [eax],al
00401611 0000 add [eax],al
00401613 0000 add [eax],al
00401615 0000 add [eax],al
00401617 0000 add [eax],al
00401619 0000 add [eax],al
0040161b 0000 add [eax],al
0040161d 0000 add [eax],al
0040161f 0000 add [eax],al
00401621 0000 add [eax],al
00401623 0000 add [eax],al
00401625 0000 add [eax],al
00401627 0000 add [eax],al
00401629 0000 add [eax],al
0040162b 0000 add [eax],al
0040162d 0000 add [eax],al
0040162f 0000 add [eax],al
00401631 0000 add [eax],al
00401633 0000 add [eax],al
00401635 0000 add [eax],al
00401637 0000 add [eax],al
00401639 0000 add [eax],al
0040163b 0000 add [eax],al
0040163d 0000 add [eax],al
0040163f 0000 add [eax],al
00401641 0000 add [eax],al
00401643 0000 add [eax],al
00401645 0000 add [eax],al
00401647 0000 add [eax],al
00401649 0000 add [eax],al
0040164b 0000 add [eax],al
0040164d 0000 add [eax],al
0040164f 0000 add [eax],al
00401651 0000 add [eax],al
00401653 0000 add [eax],al
00401655 0000 add [eax],al
00401657 0000 add [eax],al
00401659 0000 add [eax],al
0040165b 0000 add [eax],al
0040165d 0000 add [eax],al
0040165f 0000 add [eax],al
00401661 0000 add [eax],al
00401663 0000 add [eax],al
00401665 0000 add [eax],al
00401667 0000 add [eax],al
00401669 0000 add [eax],al
0040166b 0000 add [eax],al
0040166d 0000 add [eax],al
0040166f 0000 add [eax],al
00401671 0000 add [eax],al
00401673 0000 add [eax],al
00401675 0000 add [eax],al
00401677 0000 add [eax],al
00401679 0000 add [eax],al
bitRAKE 25 Dec 2007, 16:07
Need to add a line to the thread setup: dec ebx.
Code:
        mov     [edi],eax
        add     edi,4
        add     ebx,LINE_INTERLEAVE
        dec     esi
        jne     .0
        dec     ebx             ;<--------
        mov     [current_line],ebx
ALGO=1 is the only one I made changes to directly.
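A small Python model may help show why the initial value of current_line matters. It rests on two assumptions that are mine, not stated in the thread: that the counter updated by `lock xadd` near the end of the disassembly in the DrWatson dump is current_line, and that LINE_INTERLEAVE is 1. Under that reading, each thread claims its next line as the old counter value plus one and stops once that reaches the line count. `missed_lines` is a hypothetical name, and the model only tracks which lines get handed out; it does not try to reproduce the access violation itself.

```python
def missed_lines(num_threads, total_lines, initial_counter):
    """Return the lines never handed to any thread.

    Threads start on lines 0 .. num_threads-1 (LINE_INTERLEAVE assumed to be 1).
    Afterwards each claim is modelled as: old = counter; counter += 1; line = old + 1,
    mirroring "lock xadd [counter],eax" followed by "add eax,1" in the dump.
    """
    drawn = set(range(num_threads))
    counter = initial_counter
    while True:
        line = counter + 1      # old counter value + 1
        counter += 1            # effect of the xadd on the shared counter
        if line >= total_lines: # "cmp eax,0x258 / jl" style exit test
            break
        drawn.add(line)
    return sorted(set(range(total_lines)) - drawn)


print(missed_lines(2, 600, 2))  # counter stored without the extra "dec ebx"
print(missed_lines(2, 600, 1))  # counter stored after "dec ebx"
```

With two threads and 600 lines, starting the counter at 2 leaves line 2 unclaimed in this model, while starting it at 1 covers every line.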
Kuemmel 26 Dec 2007, 09:15
Here we go,
I made an optimized version that tries to implement some ideas from Quickman and Xorpd for SSE2. My basic idea was to compute 4 points in one iteration loop with independent instruction sequences, like Quickman, while minimizing local variables for intermediates (thanks to bitRAKE for the help). I found that, at least on the Core 2 Duo, interleaving instruction by instruction pays off less than interleaving bigger blocks; the CPU seems to do that job by itself, I guess. Lots of trial and error, at least for me...
There is still a way to go to catch up with Xorpd's 64-bit X4 version, and I still have to see whether the lack of registers in the 32-bit version can be compensated for or not. At the moment it looks quite awful regarding memory access, but it pays off. Of course, I'm sure that's not the end of optimization. Here's the result (Core 2 Duo, 1.87 GHz):
KMB_0.53_SSE2: 520.206 million iterations/s
KMB_0.53B_SSE2: 817.714 million iterations/s
...so a gain of about 57%. The code is attached (along with the include files, in case of problems compiling). Any comments welcome!
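For anyone comparing their own runs, the quoted gain is just the ratio of the two rates. A minimal Python check, with `gain_percent` as a hypothetical helper name:

```python
def gain_percent(old_rate, new_rate):
    """Relative speedup of new_rate over old_rate, in percent."""
    return (new_rate / old_rate - 1.0) * 100.0


# Rates in million iterations/s, taken from the post above.
print(f"{gain_percent(520.206, 817.714):.1f}%")
```

The same helper works for any pair of results, as long as both were measured with the same iteration-counting convention.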
Madis731 26 Dec 2007, 21:59
846.045 MiPS on a T7200, SSE2
(the lower-case "i" stands for iterations, not instructions)
PS. The "back" button seems to work most of the time: the browser retains the value of the input box. When I suspect an error, I go back and copy the text, then check the board to see whether it *really* went through, and then either:
1) forget about the text in the clipboard, or
2) try to post again, if it failed
bitRAKE 26 Dec 2007, 23:10
Here are some interesting locations in .log file format for Quickman.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.