flat assembler
Message board for the users of flat assembler.

Procbench - Multiplatform CPU benchmark in FASM

Octavio
Joined: 21 Jun 2003
Posts: 366
Location: Spain
> back5:
> mov dword [buffer], ecx
> mov eax, dword [buffer]
> loop back5

Note that the 'loop' instruction takes many CPU clocks, and modern
computers can execute many instructions at a time, so this measuring method is not good.

> get_cpu_speed:
> mov esi, 100
> call sleep2
> dw 310Fh ;rdtsc
Which kind of assembler are you using?
fasm supports rdtsc.
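For comparison, the sleep-and-count idea behind get_cpu_speed can be sketched in C. This is an illustration under assumptions, not Procbench's code. It assumes GCC or Clang on x86 (`__rdtsc()` from `<x86intrin.h>`) and POSIX `nanosleep`/`clock_gettime`. The 100 ms interval is a guess at what `mov esi, 100` / `call sleep2` means. `tsc_to_mhz` and `estimate_mhz` are hypothetical names. On CPUs whose time-stamp counter ticks at a fixed rate, the result is the TSC rate rather than the current core clock.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a time-stamp-counter delta measured over `seconds` into MHz. */
double tsc_to_mhz(uint64_t tsc_ticks, double seconds)
{
    assert(seconds > 0.0);
    return (double)tsc_ticks / seconds / 1e6;
}

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/* Hypothetical helper: read the TSC, sleep about 100 ms (assumed interval),
   read the TSC again. Scheduler latency makes this an estimate only. */
double estimate_mhz(void)
{
    struct timespec req = { .tv_sec = 0, .tv_nsec = 100000000L };
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    uint64_t c0 = __rdtsc();
    nanosleep(&req, NULL);
    uint64_t c1 = __rdtsc();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Divide by the time that actually elapsed, not the requested sleep. */
    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return tsc_to_mhz(c1 - c0, elapsed);
}
#endif
```

On x86, `estimate_mhz()` should return a value comparable to the "Frequency [MHz]" line in Procbench's output. A longer sleep reduces the relative error from timer and scheduler overhead.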
Post 04 Jun 2006, 09:10

donkey7
Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska
Quote:

Note that the 'loop' instruction requires many cpu clocks

Only on Intel processors. AMD recommends loop label over dec ecx, jzn label, because it's faster and shorter.
Quote:

modern computers can execute many instruccions at a time

Only if they are independent and there are free execution units (e.g. there can be only one div at a time).
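To illustrate the point about independent instructions, here is a C sketch. It is not Procbench code, and `sum_one_chain` and `sum_two_chains` are hypothetical names. In the first function every addition waits for the previous one. The second function splits the work into two independent accumulators, which an out-of-order CPU can advance in parallel. Whether the second version is actually faster depends on the CPU and on what the compiler does with the loops, so treat this as an illustration rather than a benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One dependency chain: each add needs the result of the previous add. */
uint32_t sum_one_chain(const uint32_t *v, size_t n)
{
    assert(n == 0 || v != NULL);
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Two independent chains (even and odd elements), combined at the end.
   The two adds inside one iteration do not depend on each other. */
uint32_t sum_two_chains(const uint32_t *v, size_t n)
{
    assert(n == 0 || v != NULL);
    uint32_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)              /* odd element count: one element left over */
        s0 += v[i];
    return s0 + s1;
}
```

Both functions return the same value, because unsigned addition wraps modulo 2^32 and regrouping the terms does not change the result.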
Post 04 Jun 2006, 09:33

kuscsikp
Joined: 07 May 2006
Posts: 19
Hi all!
New version has been added!
Version 0.32
https://developer.berlios.de/project/showfiles.php?group_id=6505

In the newest version I am using this code:

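mov ecx, XXX          ; outer iteration count
back:
call add32_start      ; 2000 dependent additions per call
loop back             ; ecx = ecx - 1, jump to back while ecx <> 0
ret                   ; assumed: leave here so execution does not fall into add32_start

add32_start:
repeat 1000           ; fasm repeats the two lines below 1000 times at assembly time
add eax, ebx
add ebx, eax
end repeat
ret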

So each iteration is 1 call,
2000 additions
and 1 loop.
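This sketch assumes the [million/sec] figures are derived from exactly this loop. The thread does not show Procbench's formula, and `adds_per_second_millions` is a hypothetical name. Under that assumption, the rate is the iteration count times 2000 additions, divided by the elapsed time:

```c
#include <assert.h>
#include <stdint.h>

#define ADDS_PER_CALL 2000   /* repeat 1000 x (add eax, ebx / add ebx, eax) */

/* Million additions per second for `iterations` calls of add32_start.
   The call, ret and loop of each iteration are not counted; with 2000
   adds per call their cost is small, but not zero. */
double adds_per_second_millions(uint64_t iterations, double seconds)
{
    assert(seconds > 0.0);
    return (double)iterations * ADDS_PER_CALL / seconds / 1e6;
}
```

For example, 1,000,000 iterations finishing in 1.0 s would correspond to 2000 million additions per second.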


The measuring method is not precise, but it is good (I think):
+/- 1 percent.

In this case the computer cannot execute many instructions at a time,
because each value depends on the one computed just before it:
eax=eax+ebx
ebx=ebx+eax
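The dependency is easy to see in a C version of the same recurrence. This is a sketch: `add_chain` is a hypothetical name, and unsigned 32-bit wrap-around is assumed to match the 32-bit `add`. Each addition consumes the result of the one before it, so no two of them can execute at the same time. Starting from eax = ebx = 1, the pair walks through the Fibonacci numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Run `pairs` iterations of
       eax = eax + ebx
       ebx = ebx + eax
   with unsigned 32-bit wrap-around, like the add instructions. */
void add_chain(uint32_t *eax, uint32_t *ebx, unsigned pairs)
{
    assert(eax != NULL && ebx != NULL);
    uint32_t a = *eax, b = *ebx;
    for (unsigned i = 0; i < pairs; i++) {
        a += b;   /* needs b from the previous iteration */
        b += a;   /* needs the a computed on the line above */
    }
    *eax = a;
    *ebx = b;
}
```

After one iteration from (1, 1) the pair is (2, 3). After two more iterations it is (13, 21).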

If someone knows a better method, please tell me!

> get_cpu_speed:
> mov esi, 100
> call sleep2
> dw 310Fh ;rdtsc

I am using FASM; dw 310Fh is a bad habit :)


Last edited by kuscsikp on 04 Jun 2006, 12:18; edited 1 time in total
Post 04 Jun 2006, 12:05

WiESi
Joined: 15 May 2006
Posts: 14
Location: Austria
On a P4 at 2.8 GHz, these were my old results (0.31):
16 bit addition [million/sec] : 5586
32 bit addition [million/sec] : 5571
16 bit multiply [million/sec] : 185
32 bit multiply [million/sec] : 199
RAM read test [mill DW/sec] : 2789
RAM write test [mill DW/sec] : 1581
Stack [mill of push&pop/sec] : 1883
FPU Additions [100 000/sec] : 27
FPU Multiply [100 000/sec] : 27
FPU Square root [10 000/sec] : 160
FPU Sinus [10 000/sec] : 160

And these are the results with the new version (0.32):
16 bit addition [million/sec] : 5571
32 bit addition [million/sec] : 5586
16 bit multiply [million/sec] : 185
32 bit multiply [million/sec] : 199
RAM read test [mill DW/sec] : 2785
RAM write test [mill DW/sec] : 1582
Stack [mill of push&pop/sec] : 1883
FPU Additions [100 000/sec] : 27
FPU Multiply [100 000/sec] : 26
FPU Square root [10 000/sec] : 160
FPU Sinus [10 000/sec] : 160

So in fact nothing has changed.
Post 04 Jun 2006, 12:49

kuscsikp
Joined: 07 May 2006
Posts: 19
In the benchmarks, nothing has changed.
I am waiting for good ideas :D
Post 04 Jun 2006, 14:19

WiESi
Joined: 15 May 2006
Posts: 14
Location: Austria
What about SSE?
Post 04 Jun 2006, 18:02

kuscsikp
Joined: 07 May 2006
Posts: 19
In July the benchmarks will be completely rewritten.
Some SSE/SSE2 tests will also be added :)
Post 08 Jun 2006, 11:24

WytRaven
Joined: 08 Sep 2004
Posts: 45
Location: Canberra, Australia
Basic CPUID info:
~~~~~~~~~~~~~~~~~
Vendor : AuthenticAMD
Family : 6
Model : 10
Revision : 0
Name : AMD Athlon(TM) MP 2800+
Features : fpu vme de pse tsc msr pae mce cxchg8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse


Data TLB (2 MB and 4 MB pages): 4-way set associative, 8 entries
Instruction TLB (2 MB and 4 MB pages): Fully associative, 8 entries
Data TLB (4 KB pages): Fully associative, 32 entries
Instruction TLB (4 KB pages): Fully associative, 16 entries
1st-level instr cache: 64 KBytes, 2-way set associative, 64 byte line size
1st-level data cache: 64 KBytes, 2-way set associative, 64 byte line size
2nd-level cache: 512 KBytes, 8-way set associative, 64 byte line size

Please wait!!!

Frequency [MHz]: 2133
16 bit addition [million/sec] : 2134
32 bit addition [million/sec] : 2136
16 bit multiply [million/sec] : 711
32 bit multiply [million/sec] : 533
RAM read test [mill DW/sec] : 4008
RAM write test [mill DW/sec] : 2136
Stack [mill of push&pop/sec] : 2134
FPU Additions [100 000/sec] : 4282
FPU Multiply [100 000/sec] : 4008
FPU Square root [10 000/sec] : 2288
FPU Sinus [10 000/sec] : 2328

_________________
All your opcodes are belong to us
Post 12 Jun 2006, 12:00

kuscsikp
Joined: 07 May 2006
Posts: 19
Hi all!
I have rewritten some parts of this program.
Please post some results!
(or send them to: kuscsikp (a) g m a i l . cóm)
Thanks!


Attachment: procb-full-v0.5alpha.zip (50.9 KB, downloaded 341 times)

Post 10 Jul 2006, 09:05

vid
Verbosity in development
Joined: 05 Sep 2003
Posts: 7106
Location: Slovakia
I had to restart my machine after running it for about 3 minutes, so save your data before running it ;)
Post 10 Jul 2006, 20:17

kuscsikp
Joined: 07 May 2006
Posts: 19
The average test takes nearly 1 minute.
If your CPU is too old, it can slow down your computer for minutes.
Yes, it is still in the alpha phase, so be patient. And if you get a run-time error,
report it!

Donkey7 wrote:
..."Only on intel processors. amd recommends loop label over dec ecx, jzn label, because it's faster and shorter. "...

dec ecx, jzn label is always faster!!!
Now, i have dec ecx, jzn...
Post 10 Jul 2006, 20:32

vid
Verbosity in development
Joined: 05 Sep 2003
Posts: 7106
Location: Slovakia
(I believe it's "jnz") :D ;)
Post 11 Jul 2006, 06:45

kuscsikp
Joined: 07 May 2006
Posts: 19
Thanks! It was important! ;)
Post 11 Jul 2006, 07:25

vid
Verbosity in development
Joined: 05 Sep 2003
Posts: 7106
Location: Slovakia
you're welcome, just as important as "dec ecx, jzn label is always faster!!! Now, i have dec ecx, jzn..."
Post 11 Jul 2006, 16:51

kuscsikp
Joined: 07 May 2006
Posts: 19
Octavio wrote:
" 'loop' instruction requires many cpu clocks, so the measuring method is not good."

Donkey7 wrote:
"Amd recommends loop label over dec ecx, jzn label, because it's faster and shorter"

I have tested it on AMD CPUs, and loop is slower than "dec ecx, jnz", so I have replaced loop with jnz in my code.
Yes, it is important: for Octavio, for Donkey7 and for me!
Post 11 Jul 2006, 17:45

rugxulo
Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
Okay, well, sorry, I'm not able to close all processes on this computer right now (and the results probably aren't too useful; it's just a plain old, common P4 2.52 GHz). It did seem to freeze the computer (no response to keys, no mouse movement) when it got to the 32-bit SSE part, but I went to the other room (played with my old DOS computer, heh) to see whether it would come to its senses after a few minutes ... and it did. I wouldn't consider this computer "too old", though. :P

<EDIT> My P4's results below: :) </EDIT>

Procbench V0.5 Alpha, Peter Kuscsik, 2006-07-

Basic CPUID info:
~~~~~~~~~~~~~~~~~
Vendor : GenuineIntel
Family : 15
Model : 2
Revision : 4
Name : Intel(R) Pentium(R) 4 CPU 2.53GHz
Features : fpu vme de pse tsc msr pae mce cxchg8 apic sep mtrr pge mca cmov pat pse36 clfl dtes acpi mmx fxsr sse sse2 ss htt tm1

Instruction TLB: 4 KByte and 2-MByte or 4-MByte pages, 64 entries
Data TLB: 4 KByte and 4 MByte pages, 64 entries
1st-level data cache: 8 KByte, 4-way set associative, 64 byte line size
2nd-level cache: 512 KByte, 8-way set associative, 64 byte line size, 2 lines per sector
No 2nd-level cache or, if processor contains a valid 2nd-level cache, no 3rd-level cache
Trace cache: 12 K-µop, 8-way set associative

Benchmarks
~~~~~~~~~~
Frequency [MHz]: 2519.135

Speed of registers measured by add instructions via 1, 2, 3 and 4 registers.
Speeds adding to          1 Register 2 Registers 3 Registers 4 Registers
16 bit Integer MIPS             5454        6451        6382        7692
32 bit Integer MIPS             4800        7692        6382        7692
32 bit MMX Integer MIPS         1237        2564        2400        2564
64 bit FPU MFLOPS               ----        ----        ----        ----
32 bit 3DNow MFLOPS             ----        ----        ----        ----
32 bit SSE MFLOPS                  1           1           1           2
64 bit SSE2 MFLOPS               629        1239        1237        1282

Memory read performance test:
Read buffer  Speed [MB/s]   Read buffer  Speed [MB/s]
4 KBytes            18181   8 KBytes            18348
16 KBytes            9852   32 KBytes            9852
64 KBytes            9803   128 KBytes           9852
256 KBytes           9852   512 KBytes           8000
1 MByte              1939   2 MBytes             1910
4 MBytes             1910
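A read test of this kind can be sketched in C as follows. It is an illustration, not Procbench's implementation. It assumes POSIX `clock_gettime` and takes MB as 10^6 bytes (Procbench may use 2^20). `read_buffer`, `mb_per_s` and `measure_read` are hypothetical names. Buffers that fit in a cache are re-read from that cache. This is consistent with the results above, which drop sharply between 512 KBytes and 1 MByte on a CPU with a 512 KByte L2 cache.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Sum the buffer `passes` times; the caller keeps the sum so the
   compiler cannot discard the reads. */
uint32_t read_buffer(const uint32_t *buf, size_t words, unsigned passes)
{
    uint32_t sum = 0;
    for (unsigned p = 0; p < passes; p++)
        for (size_t i = 0; i < words; i++)
            sum += buf[i];
    return sum;
}

/* Megabytes (10^6 bytes, an assumption) per second. */
double mb_per_s(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds / 1e6;
}

/* Time `passes` reads over `words` 32-bit words and return MB/s.
   `*checksum` receives the sum of all words read. */
double measure_read(const uint32_t *buf, size_t words, unsigned passes,
                    uint32_t *checksum)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *checksum = read_buffer(buf, words, passes);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double s = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (s <= 0.0)
        return 0.0;             /* timer too coarse for such a short run */
    return mb_per_s((double)words * sizeof *buf * passes, s);
}
```

Calling `measure_read` with buffers from 4 KBytes to 4 MBytes should reproduce the general shape of such a table. Each buffer needs enough passes to run for a fraction of a second. The exact numbers depend on the compiler and the CPU.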

Generating:
1. Random numbers [200mills] : 0.547 seconds
2. Fibonicci numbers [200mills] : 0.172 seconds
4. Cycle with Loop [500mill times] : 0.390 seconds
5. Cycle with Jump [500mill times] : 0.297 seconds
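Converting the two loop timings into clock cycles per iteration makes the loop vs. dec/jnz difference easier to compare across machines. Below is a small C helper (a hypothetical name) for cycles = seconds × frequency / iterations. It assumes the reported 2519.135 MHz stayed constant during the run. With that frequency, 0.390 s for 500 million iterations is about 1.96 cycles per iteration with loop. The jump version, at 0.297 s, is about 1.50 cycles per iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Average clock cycles per loop iteration, given the wall-clock time of
   the whole run and the CPU frequency in MHz. */
double cycles_per_iteration(double seconds, double mhz, uint64_t iterations)
{
    assert(iterations > 0);
    return seconds * mhz * 1e6 / (double)iterations;
}
```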


Last edited by rugxulo on 19 Jul 2006, 20:39; edited 1 time in total
Post 11 Jul 2006, 18:57

kuscsikp
Joined: 07 May 2006
Posts: 19
I know what the problem is. On some machines
SSE and/or SSE2 is too slow.
Here are two examples:
http://www.ocforums.com/attachment.php?attachmentid=51567&d=1152892208
http://www.ocforums.com/attachment.php?attachmentid=51566&d=1152892200
So in the latest version (0.51) I have removed the SSE benchmarks.
I will rewrite them.
+ Added a prime generator. Thanks to Garthower :)
Post 12 Jul 2006, 09:44

kuscsikp
Joined: 07 May 2006
Posts: 19
Post 09 Aug 2006, 08:57

penang
Joined: 01 Oct 2004
Posts: 59
Here are the results from my old Pentium D:

Attachment: Result.txt (2.08 KB, downloaded 254 times)

Post 05 May 2008, 11:14

edfed
Joined: 20 Feb 2006
Posts: 4215
Location: 2018
Page Fault:
Code:
PROCB caused a page fault in
 module PROCB.EXE at 015f:004012ca.
Registers:
EAX=00000003 CS=015f EIP=004012ca EFLGS=00010246
EBX=756e6547 SS=0167 ESP=0096fe24 EBP=0096ff78
ECX=6c65746e DS=0167 ESI=00000064 FS=2997
EDX=7ffe0000 ES=0167 EDI=00000000 GS=0000
Bytes at CS:EIP:
8b 02 f7 62 04 0f ac d0 18 5a c3 53 51 68 f0 5e
Stack dump:
49656e69 00401333 00000003 00402774 0040107d 00401005 bff8b537 00000000 819d0d78 00950000 636f7250 58450062 819d0045 00950000 636f7270 65780062

:(
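The register dump contains a small clue when read as little-endian ASCII. EBX=756e6547 is "Genu", ECX=6c65746e is "ntel", and the first value of the stack dump, 49656e69, is "ineI". Together these appear to be the CPUID vendor string, which CPUID returns in EBX, EDX, ECX order. That suggests the fault came some time after the vendor query, though it does not by itself show what faulted. A C sketch of the decoding (`vendor_from_regs` is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the 12-character CPUID vendor string from EBX, EDX and ECX
   (in that order); each register holds four ASCII bytes, lowest first. */
void vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    assert(out != NULL);
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}
```

Called with 0x756e6547, 0x49656e69 and 0x6c65746e, it fills the buffer with "GenuineIntel".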
Post 05 May 2008, 11:31


Copyright © 1999-2019, Tomasz Grysztar.

Powered by rwasa.