flat assembler
Message board for the users of flat assembler.

Index > Windows > Mandelbrot Benchmark FPU/SSE2 released

Alphonso



Joined: 16 Jan 2007
Posts: 295
Alphonso 24 Jan 2011, 12:14
AFAIK the Linpack benchmark with AVX can show better than 80% improvement over SSE. It would be interesting to see what it could do for Mandelbrot.
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 27 Mar 2011, 08:17
Has anyone updated this CPU burn to AVX support yet?
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 28 Mar 2011, 09:19
yeah, that would be sweet!
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 01 Apr 2011, 13:09
Ok! I'll see what I can do to modify the previously posted code.
I'll start a new thread in the projects section within the next week.
kalambong



Joined: 08 Nov 2008
Posts: 165
kalambong 14 Mar 2012, 08:57
tthsqe wrote:
Ok! I'll see what I can do to modify the previously posted code.
I'll start a new thread in the projects section within the next week.



Can someone point me to the new thread that tthsqe mentioned above, please?

Thank you !

Since the last discussion, Win7 has been out of beta for quite some time, and Sandy Bridge is about to be replaced by Ivy Bridge in a few months.

I wonder what has happened to the Mandelbrot benchmark. Has it been updated with AVX?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20301
Location: In your JS exploiting you and your system
revolution 14 Mar 2012, 09:05
kalambong wrote:
Can someone point me to the new thread that tthsqe mentioned above, please?
It appears to be this:

http://board.flatassembler.net/topic.php?p=127809#127809

It is the very next post by tthsqe after the one above.
kalambong



Joined: 08 Nov 2008
Posts: 165
kalambong 01 Jun 2012, 06:33
Thanks !!
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 01 Jun 2012, 16:20
You need at least Win7 SP1 to use AVX, right? <sarcasm> Anyways, AVX is obsolete, AVX2 is teh real dealz!!! </sarcasm>
Vitor_boss



Joined: 04 Jun 2012
Posts: 3
Location: Brazil
Vitor_boss 04 Jun 2012, 01:16
Intel Core i3 M330 2130MHz
Cores: 2
Threads: 4
FPU: 528.867
SSE2: 1493.606
SSE4.1: 1511.689

EDIT: Using KMB V 0.53I-32b-MT
Bernhard Schornak



Joined: 19 Dec 2009
Posts: 5
Location: Augsburg, Germany
Bernhard Schornak 21 Aug 2012, 09:07
AMD FX-8150 (3600 MHz)

FPU : 1318.538
SSE2 : 4641.840
SSE4.1: 4711.906

Processors with one execution pipe per core benefit greatly from your code, because it performs multiple operations in a row on one and the same register. Hence, the results do not show how 'performant' a CPU/FPU combination really is.

Your current code simply puts additional execution pipes to sleep. In the case of the FX-8xxx, three out of four pipes are not fed with appropriate food most of the time, rendering them quite useless. Reordering instructions properly can yield much better results on processors with multiple execution pipes.
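
To illustrate the dependency argument, here is a minimal Python sketch, not the benchmark's actual code. It assumes the inner loop is the standard z -> z*z + c iteration; the function names `escape_single` and `escape_pair` are made up for this example. Python will not show any speed difference, the sketch only shows which operations depend on each other: in `escape_single` every step needs the previous result, while in `escape_pair` the A and B operations never read each other, so a CPU with several pipes could run them side by side.

```python
def escape_single(cx, cy, max_iter=256):
    """One point: every step needs the previous x and y (a serial chain)."""
    x = y = 0.0
    for n in range(max_iter):
        xx, yy = x * x, y * y
        if xx + yy > 4.0:
            return n
        xy = x * y
        x, y = xx - yy + cx, 2.0 * xy + cy
    return max_iter


def escape_pair(ca, cb, max_iter=256):
    """Two points in one loop: stream A and stream B are independent chains."""
    ax = ay = bx = by = 0.0
    na = nb = None
    for n in range(max_iter):
        # Operations for A and B are interleaved; neither reads the other's values.
        axx, bxx = ax * ax, bx * bx
        ayy, byy = ay * ay, by * by
        if na is None and axx + ayy > 4.0:
            na = n
        if nb is None and bxx + byy > 4.0:
            nb = n
        if na is not None and nb is not None:
            break
        axy, bxy = ax * ay, bx * by
        ax, bx = axx - ayy + ca[0], bxx - byy + cb[0]
        ay, by = 2.0 * axy + ca[1], 2.0 * bxy + cb[1]
    # A stream that already escaped keeps iterating (and may overflow to inf/nan),
    # but its count was recorded at the first escape, so the result is unaffected.
    return (max_iter if na is None else na, max_iter if nb is None else nb)
```

For example, `escape_pair((0.0, 0.0), (1.0, 0.0))` gives the same counts as two separate `escape_single` calls; only the order of the operations differs.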
Xorpd!



Joined: 21 Dec 2006
Posts: 161
Xorpd! 21 Aug 2012, 21:28
Serial operations on a single register are the path to using a big chunk of the physical register file, given limited architected registers and out-of-order execution. In fact, the numbers attained in this thread are not all that far off the maximum floating point throughput for pre-AVX processors.

The problem with Bulldozer is that an 8-core CPU only has 4 FPUs and you need to do as much work as possible using FMACs to attain maximum throughput. I am not sure that the code in the companion thread http://board.flatassembler.net/topic.php?p=127809#127809 does this; I think it's all written for Intel processors which would mean that it does not.

You just can't schedule 3 or 4 instruction streams, especially in 32-bit mode, without counting on the out-of-order properties of the processor to interleave instruction streams for you. Maybe with 128 FP registers, but that is an expensive processor and doesn't provide SIMD as far as I know.
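
As a rough illustration of what "doing the work with FMACs" could look like, here is a small Python sketch of the z -> z*z + c step written as multiply-add operations. It is an assumption about how the iteration might be regrouped, not code from either benchmark. The helper `fma` is a hypothetical stand-in: it computes a*b + c with two roundings, whereas a real fused multiply-add instruction rounds once.

```python
def fma(a, b, c):
    # Hypothetical stand-in for a hardware fused multiply-add.
    # Unlike the real instruction, this rounds twice (after * and after +).
    return a * b + c


def escape_fma(cx, cy, max_iter=256):
    """Mandelbrot escape count with the update grouped into multiply-adds."""
    x = y = 0.0
    for n in range(max_iter):
        # |z|^2 = x*x + y*y: one multiply plus one multiply-add
        if fma(x, x, y * y) > 4.0:
            return n
        t = fma(-y, y, cx)      # cx - y*y
        y = fma(x + x, y, cy)   # 2*x*y + cy, still using the old x
        x = fma(x, x, t)        # x*x - y*y + cx
    return max_iter
```

Written this way, the update itself is three multiply-adds and one addition per iteration, instead of three multiplies and several separate adds and subtracts.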
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 25 Aug 2012, 02:30
Hey, it looks like someone has a Bulldozer chip!
Could you tell me how Bulldozer performs on the program at http://board.flatassembler.net/topic.php?p=127809#127809?
You have to hit "R" to change the calculation path used.
Just explore a little bit and tell me the max GFLOPS (upper left) you encountered on each calculation path.
Kuemmel



Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel 27 Aug 2012, 16:57
I would also be curious whether tthsqe's benchmark shows an improvement regarding the "Bulldozer" issue. My benchmark is, let's say, kind of obsolete in times of AVX, but of course not everyone has an AVX chip yet, so it's still relevant for those...

I can only state that my benchmark was never developed to favour either Intel or AMD. I tested reordering of instructions with old Semprons and Phenoms, with almost no difference.

I remember that when I got help from Xorpd regarding the multiple instruction streams, the Intels with Core and Core 2 architecture were just so much faster out of the box, while the AMDs didn't gain much. Phenom only picked up speed because of (I think) doubling the path at that time.

Either the out-of-order design or the overall number of instruction units of the AMDs is just not as good as Intel's.

As Xorpd stated, with AVX and fused multiply-add instructions it might be a different story. My benchmark maybe reflects more the past and current non-AVX software, where I would say Intel's FPU/SSE is just superior.

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.