flat assembler
Message board for the users of flat assembler.

Mandelbrot Benchmark FPU/SSE2 released

Kuemmel



Joined: 30 Jan 2006
Posts: 198
Location: Stuttgart, Germany
Kuemmel
...strange...hm, does the former version 0.53E SSE2 work? I also changed it so that a small line buffer is set up in memory and a line is only plotted on the screen once it is finished...but why does it work on a Core 2 and not on a Quad...?
Post 30 Mar 2008, 22:09
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
Damn - I meant to try that at home to see if it crashed on my quad, but I forgot :( I'll do it today!
Post 31 Mar 2008, 06:22
asmfan



Joined: 11 Aug 2006
Posts: 392
Location: Russian
asmfan
The SSE2 version crashes at 4025b0h too:
Code:
.finished_end_of_line:
;Crash in there:
        mov ebp, [ebx+edx*4]                    ; get colour word
    

Did you test it?
Post 31 Mar 2008, 07:18
Kuemmel



Joined: 30 Jan 2006
Posts: 198
Location: Stuttgart, Germany
Kuemmel
@asmfan: Thanks for the hint...I discovered that in very few cases the iteration count seems to come out negative.

So for the moment I just catch the error, and I uploaded the modified version to my homepage again...does anybody still get crashes?
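
A minimal sketch of what "capturing the error" before the colour lookup could look like, written in C rather than the benchmark's assembly. The table size, `MAX_ITER` and the function name are assumptions for illustration only; the real program's data layout is not shown in this thread.

```c
#include <stdint.h>

#define MAX_ITER 256  /* assumed iteration limit, not taken from the benchmark */

/* Hypothetical colour lookup: clamp the iteration count into the valid
   table range so a negative or oversized count cannot index outside it. */
static uint32_t colour_for(const uint32_t table[MAX_ITER + 1], int32_t iter)
{
    if (iter < 0)
        iter = 0;
    else if (iter > MAX_ITER)
        iter = MAX_ITER;
    return table[iter];
}
```

This only hides the symptom (the out-of-range read at `mov ebp, [ebx+edx*4]`); the cause of the bad count still has to be found.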

I'll try to find the error in my logic that causes this, though I don't expect the error to affect the basic performance much.

To find out when this happens: is there a nice routine in DirectDraw that saves a screen dump to the hard drive?

@edit: I found the problem...in about 9 cases out of the millions of iterations done, it happens that in a pixel pair sharing one xmm register one pixel reaches maximum iterations while the other diverges at the same time...fixed that by checking for maximum iterations first. I guess it's working now...though I still have to check the reverse case as well...at least no more crashes should happen. The whole logic just needs some corrections.
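
The ordering described in the edit can be sketched in scalar C. This is an illustration under assumptions, not the benchmark's SSE2 code: two pixels are iterated in lockstep (standing in for the two lanes of one xmm register), the maximum-iteration test comes before the divergence test, and `MAX_ITER`, `PairResult` and `iterate_pair` are hypothetical names.

```c
#include <stdbool.h>

#define MAX_ITER 64  /* assumed limit for the example */

typedef struct { int iter[2]; } PairResult;

/* Iterate two points in lockstep and return one count per point. */
static PairResult iterate_pair(const double cr[2], const double ci[2])
{
    double zr[2] = {0.0, 0.0}, zi[2] = {0.0, 0.0};
    bool done[2] = {false, false};
    PairResult res = {{0, 0}};
    int n = 0;

    while (!(done[0] && done[1])) {
        /* advance every lane that is still active: z = z*z + c */
        for (int k = 0; k < 2; k++) {
            if (done[k]) continue;
            double t = zr[k] * zr[k] - zi[k] * zi[k] + cr[k];
            zi[k] = 2.0 * zr[k] * zi[k] + ci[k];
            zr[k] = t;
        }
        n++;

        /* 1) maximum iterations first: every lane still running gets
              MAX_ITER, whether or not it also diverged on this step */
        if (n == MAX_ITER) {
            for (int k = 0; k < 2; k++)
                if (!done[k]) res.iter[k] = MAX_ITER;
            break;
        }

        /* 2) only then look at divergence, lane by lane */
        for (int k = 0; k < 2; k++) {
            if (!done[k] && zr[k] * zr[k] + zi[k] * zi[k] > 4.0) {
                res.iter[k] = n;
                done[k] = true;
            }
        }
    }
    return res;
}
```

With the limit handled first, both lanes always leave the loop with a count in the range 1..MAX_ITER, so the "one lane at maximum, the other diverging on the same step" case needs no special path.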
Post 31 Mar 2008, 17:05
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
SSE2 crashes - even when affinity is set to any ONE CPU, or any two or three :P
FPU runs fine with a result of nearly 600.
Q6600, W2K3s, x64, SP2
Post 31 Mar 2008, 22:13
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
The FPU version also works here, but there are some rendering artifacts - perhaps there are a few bugs in the graphics code? It gives a result of 756 here; I'm running my Q6600@3.0GHz.
Post 31 Mar 2008, 23:04
Kuemmel



Joined: 30 Jan 2006
Posts: 198
Location: Stuttgart, Germany
Kuemmel
@Madis731 and f0dder,

I'm still searching for a bugfix. I think I didn't catch all overflows with my split 16-bit counters in one 32-bit register...for the moment I try to capture all the overflows. I attached this version here for direct testing...please test again if you've got time. Thanks for the help! I still can't reproduce any crashes on a Core 2 Duo 6300.
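
To illustrate the kind of overflow meant here, a sketch in C. It assumes (the post does not give the layout) two 16-bit iteration counters packed into the low and high halves of one 32-bit word; all function names are hypothetical. Incrementing the packed word directly lets a carry out of the low counter corrupt the high one, while masking each half keeps them independent.

```c
#include <stdint.h>

/* Assumed layout: low 16 bits = counter A, high 16 bits = counter B. */
static uint16_t counter_a(uint32_t packed) { return (uint16_t)(packed & 0xFFFFu); }
static uint16_t counter_b(uint32_t packed) { return (uint16_t)(packed >> 16); }

/* Naive increment of A: a carry out of bit 15 spills into counter B. */
static uint32_t inc_a_naive(uint32_t packed)
{
    return packed + 1u;
}

/* Masked increment of A: wraps within the low half, B stays untouched. */
static uint32_t inc_a_masked(uint32_t packed)
{
    return (packed & 0xFFFF0000u) | ((packed + 1u) & 0x0000FFFFu);
}
```

For example, starting from A = 0xFFFF and B = 5, the naive version ends with B = 6, while the masked version leaves B at 5.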


Attachment: KMB_V0.53F-32b-MT_SSE2.zip (3.53 KB)

Post 01 Apr 2008, 20:37
edfed



Joined: 20 Feb 2006
Posts: 4240
Location: 2018
edfed
I've tested it. It crashes my machine.
I had to shut it down by brute force: no keyboard, only the mouse pointer.

My machine doesn't have SSE2, but the program should at least tell me that it requires SSE2. Please try to correct this problem. :)
Post 01 Apr 2008, 20:47
Kuemmel



Joined: 30 Jan 2006
Posts: 198
Location: Stuttgart, Germany
Kuemmel
@edfed: Yep, that's on my list...CPU/SSE2 detection and a warning message...

In the meantime I got positive results from a Core 2 Duo 6550 with no more crashes (before, it crashed about every second time). The latest version is now on my webpage for download. Hope it now also works on the quad cores.

http://www.mikusite.de/x86/KMB_V0.53F-32b-MT.zip
Post 02 Apr 2008, 19:12
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Works - Q6600@3GHz scored 3976 with the SSE2 version :)
Post 02 Apr 2008, 22:00
bitRAKE



Joined: 21 Jul 2003
Posts: 2937
Location: vpcmipstrm
bitRAKE
1.6Ghz Dothan / 179.06 SSE2 / 106.72 FPU

About +10% for SSE2.

Post 02 Apr 2008, 22:58
asmfan



Joined: 11 Aug 2006
Posts: 392
Location: Russian
asmfan
Kuemmel, you are using CloseWindow at the end of the program; is that on purpose? I believe you need to use DestroyWindow instead, because CloseWindow just minimizes the window. I ran into a situation where it wasn't minimized and stayed as a top-level black box above the final message box with the results. I had to Alt+Tab to the message box blindly.
Post 03 Apr 2008, 06:49
madmatt



Joined: 07 Oct 2003
Posts: 1045
Location: Michigan, USA
madmatt
No crashes here. My results for 'KMB_V0.53F-32b-MT'
CPU: Intel Celeron D 352 3.2Ghz (1.5GB ram)
SSE2: 472.975 <---> FPU: 101.665
Post 03 Apr 2008, 10:13
Kuemmel



Joined: 30 Jan 2006
Posts: 198
Location: Stuttgart, Germany
Kuemmel
@asmfan: I tried 'DestroyWindow', but then my result message window wasn't shown...hm...I don't have much of a clue about OS coding; that part was done by plenty of guys from the forum, and I'm still learning about it...so if somebody else has a solution...?

@edfed: I have now implemented SSE2 detection and uploaded it (still available at the same location on my webpage; I didn't bump the version for that)...hopefully it works and doesn't harm anything ;) Tested on my old Athlon Thunderbird and Sempron, where it works well. I don't check for the FPU, as it should have been there in every CPU since the 486SX days...
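
A sketch of what such a check looks at, in C: CPUID leaf 1 reports SSE2 in bit 26 of EDX and an on-chip x87 FPU in bit 0. The functions take the EDX value as a parameter so the example stays portable; obtaining it (executing `cpuid` with EAX = 1) is left out, and the macro and function names are hypothetical, not taken from the benchmark.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in EDX after CPUID with EAX = 1. */
#define CPUID1_EDX_FPU  (1u << 0)   /* x87 FPU on chip */
#define CPUID1_EDX_SSE2 (1u << 26)  /* SSE2 supported  */

static bool has_fpu(uint32_t edx)
{
    return (edx & CPUID1_EDX_FPU) != 0;
}

static bool has_sse2(uint32_t edx)
{
    return (edx & CPUID1_EDX_SSE2) != 0;
}
```

A program would call `has_sse2` once at startup and show a message box instead of entering the SSE2 loop when it returns false.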

If there's anybody with a P4 who can test again with and without Hyperthreading...that would be nice...it should give a clue about how the upcoming Nehalem, with its Hyperthreading, might behave...
Post 05 Apr 2008, 14:18
edfed



Joined: 20 Feb 2006
Posts: 4240
Location: 2018
edfed
Look at the three videos at this link; you'll see some very interesting Mandelbrot zooms.
http://perso.numericable.fr/haasjn/haasjn/
;)
Post 10 Apr 2008, 20:20
Kuemmel



Joined: 30 Jan 2006
Posts: 198
Location: Stuttgart, Germany
Kuemmel
...nice movies @edfed...!

Finally I also gave the FPU version a big push, encouraged by Xorpd!'s suggestion to apply the same techniques to that shitty stack design...

So I tried the following:
1) 2-point iteration in the inner loop, separate exits from the loop
2) 2-point iteration in the inner loop, loop unrolled once, separate exits from the loop
3) 3-point iteration in the inner loop, separate exits from the loop
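
Variant 1) can be sketched in C as follows. This is an illustration of the structure only, not the benchmark's FPU code: two independent points share one inner loop, each has its own exit, and whichever point is left over is finished alone. `MAX_ITER` and all function names are assumptions.

```c
#define MAX_ITER 64  /* assumed limit for the example */

/* One Mandelbrot step z = z*z + c for a single point. */
static void step(double *zr, double *zi, double cr, double ci)
{
    double t = (*zr) * (*zr) - (*zi) * (*zi) + cr;
    *zi = 2.0 * (*zr) * (*zi) + ci;
    *zr = t;
}

/* Finish a single point alone, continuing from iteration n. */
static int finish_single(double zr, double zi, double cr, double ci, int n)
{
    while (n < MAX_ITER && zr * zr + zi * zi <= 4.0) {
        step(&zr, &zi, cr, ci);
        n++;
    }
    return n;
}

/* Two points per inner loop, with a separate exit for each point. */
static void iterate_two(double cr0, double ci0, double cr1, double ci1,
                        int *n0, int *n1)
{
    double zr0 = 0.0, zi0 = 0.0, zr1 = 0.0, zi1 = 0.0;
    int n = 0;

    for (;;) {
        if (n >= MAX_ITER) {                       /* both hit the limit */
            *n0 = *n1 = MAX_ITER;
            return;
        }
        if (zr0 * zr0 + zi0 * zi0 > 4.0) {         /* exit for point 0 */
            *n0 = n;
            *n1 = finish_single(zr1, zi1, cr1, ci1, n);
            return;
        }
        if (zr1 * zr1 + zi1 * zi1 > 4.0) {         /* exit for point 1 */
            *n1 = n;
            *n0 = finish_single(zr0, zi0, cr0, ci0, n);
            return;
        }
        /* the two steps are independent, so their latencies can overlap */
        step(&zr0, &zi0, cr0, ci0);
        step(&zr1, &zi1, cr1, ci1);
        n++;
    }
}
```

The gain comes from the two dependency chains being independent: while one point's multiply is still in flight, the other point's instructions can issue. Variant 2) would repeat the loop body twice per pass; variant 3) adds a third chain, which on the x87 stack leaves too few free registers.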

I tried all of them on an AMD Sempron, a Core 2 Duo and a P4 (no HT), with interesting overall results. Gains are given in percent relative to the old 1-point iteration loop:

AMD Sempron 1800 MHz
1) + 39 % 2) + 45 % 3) + 32 %

Core 2 Duo, 1867 MHz
1) + 48 % 2) + 67 % 3) + 25 %

P4, no HT, 2667 MHz
1) + 13 % 2) + 90 % 3) + 69 %

So the clear winner is version 2. Compared with version 1, three points only helped the P4, due to its shitty long pipeline design, which gives another hint as to why they implemented HT at that time. Still looking for P4 HT results. Anyone?

The problem with the 3-point loop is also that I couldn't use the fast FCOMIP instruction directly any more (lack of registers), as it cannot take a memory operand, so I had to do an FLD first...

Anyway, I'm quite happy that some lessons learned from SSE2 finally gave the FPU version a boost. Version 2) is attached here. I will soon update my website with it as well. Any comments welcome, as always.


Attachment: KMB_V0.53G-32b-MT.zip (19.78 KB)

Post 12 Apr 2008, 15:38
AlexP



Joined: 14 Nov 2007
Posts: 561
Location: Out the window. Yes, that one.
AlexP
:) That FPU program is pretty cool; unfortunately under Vista 32-bit it screwed up my computer :(. The taskbar is totally missing, windows are off the screen :) I'll reboot and be right back...

[edit]: I got 969.453 for the SSE2, 304.922 for the FPU
Post 12 Apr 2008, 16:31
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Q6600@3.0GHz: FPU: 1261, SSE2: 3976.
Post 12 Apr 2008, 16:44
Kuemmel



Joined: 30 Jan 2006
Posts: 198
Location: Stuttgart, Germany
Kuemmel
AlexP wrote:
:) That FPU program is pretty cool; unfortunately under Vista 32-bit it screwed up my computer :(. The taskbar is totally missing, windows are off the screen :) I'll reboot and be right back...

[edit]: I got 969.453 for the SSE2, 304.922 for the FPU

Oops...but it's working now!? Hopefully no other harm done...what is your CPU and its MHz? Anyone here with a Pentium-M or other CPUs?
Post 12 Apr 2008, 18:57
edfed



Joined: 20 Feb 2006
Posts: 4240
Location: 2018
edfed
Pentium III-M. Is that good?
Post 12 Apr 2008, 19:16
Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.