flat assembler
Message board for the users of flat assembler.
Procbench - Multiplatform CPU benchmark in FASM
donkey7 04 Jun 2006, 09:33
Quote:
only on intel processors. amd recommends loop label over dec ecx, jzn label, because it's faster and shorter.
Quote:
only if they are independent and there are free computation units (e.g. there can be only one div at a time).
kuscsikp 04 Jun 2006, 12:05
Hi all!
A new version has been added: version 0.32
https://developer.berlios.de/project/showfiles.php?group_id=6505
In the newest version I am using this code:
Code:
mov ecx, XXX
back:
call add32_start
loop back

add32_start:
repeat 1000
add eax, ebx
add ebx, eax
end repeat
ret
So that is 1 call, 2000 adds and 1 loop per iteration. The measuring method is not precise, but I think it is good enough: +/- 1 percent. In this case the CPU cannot execute many instructions at a time, because each value is computed from the previous one:
eax=eax+ebx
ebx=ebx+eax
If someone knows a better method, please tell me!
> get_cpu_speed:
> mov esi, 100
> call sleep2
> dw 310Fh ;rdtsc
I am using FASM, so dw 310Fh is a bad habit.
Last edited by kuscsikp on 04 Jun 2006, 12:18; edited 1 time in total
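A minimal sketch of how the loop above could be timed with the rdtsc mnemonic instead of the hand-encoded dw 310Fh. The procedure name measure_add32, the iteration count, and the use of esi/edi to hold the start value are assumptions for illustration. It is a fragment for a 32-bit code section, not the code Procbench actually uses:

```
; Hypothetical helper: returns elapsed time-stamp ticks in EDX:EAX.
; Clobbers EBX, ECX, ESI and EDI.
measure_add32:
        rdtsc                       ; EDX:EAX = time-stamp counter at start
        mov     esi, eax            ; save low dword of the start value
        mov     edi, edx            ; save high dword of the start value

        mov     ecx, 100000         ; assumed iteration count
back:
        call    add32_start         ; 2000 dependent adds per call
        loop    back

        rdtsc                       ; EDX:EAX = time-stamp counter at end
        sub     eax, esi            ; 64-bit subtraction: low dword first,
        sbb     edx, edi            ; then high dword with borrow
        ret

add32_start:
        repeat 1000
            add eax, ebx            ; eax depends on the previous ebx
            add ebx, eax            ; ebx depends on the new eax
        end repeat
        ret
```

With these assumed numbers the run performs 2000 x 100000 adds, so dividing that total by the returned tick count gives adds per tick. Note that rdtsc is not a serializing instruction, so very short measurements can be blurred by out-of-order execution.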
WiESi 04 Jun 2006, 12:49
On a P4 at 2.8 GHz, these were my old results (0.31) next to the results from the new version (0.32):
Test                            0.31   0.32
16 bit addition [million/sec]   5586   5571
32 bit addition [million/sec]   5571   5586
16 bit multiply [million/sec]    185    185
32 bit multiply [million/sec]    199    199
RAM read test [mill DW/sec]     2789   2785
RAM write test [mill DW/sec]    1581   1582
Stack [mill of push&pop/sec]    1883   1883
FPU Additions [100 000/sec]       27     27
FPU Multiply [100 000/sec]        27     26
FPU Square root [10 000/sec]     160    160
FPU Sinus [10 000/sec]           160    160
So in fact nothing has changed.
kuscsikp 04 Jun 2006, 14:19
In the benchmarks, nothing.
I am waiting for good ideas.
WiESi 04 Jun 2006, 18:02
What about SSE?
kuscsikp 08 Jun 2006, 11:24
In July the benchmarks will be completely rewritten.
Some SSE(2) tests will also be added :)
WytRaven 12 Jun 2006, 12:00
Basic CPUID info:
~~~~~~~~~~~~~~~~~
Vendor : AuthenticAMD
Family : 6
Model : 10
Revision : 0
Name : AMD Athlon(TM) MP 2800+
Features : fpu vme de pse tsc msr pae mce cxchg8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
Data TLB (2 MB and 4 MB pages): 4-way set associative, 8 entries
Instruction TLB (2 MB and 4 MB pages): Fully associative, 8 entries
Data TLB (4 KB pages): Fully associative, 32 entries
Instruction TLB (4 KB pages): Fully associative, 16 entries
1st-level instr cache: 64 KBytes, 2-way set associative, 64 byte line size
1st-level data cache: 64 KBytes, 2-way set associative, 64 byte line size
2nd-level cache: 512 KBytes, 8-way set associative, 64 byte line size
Please wait!!!
Frequency [MHz]: 2133
16 bit addition [million/sec] : 2134
32 bit addition [million/sec] : 2136
16 bit multiply [million/sec] : 711
32 bit multiply [million/sec] : 533
RAM read test [mill DW/sec] : 4008
RAM write test [mill DW/sec] : 2136
Stack [mill of push&pop/sec] : 2134
FPU Additions [100 000/sec] : 4282
FPU Multiply [100 000/sec] : 4008
FPU Square root [10 000/sec] : 2288
FPU Sinus [10 000/sec] : 2328
_________________
All your opcodes are belong to us
kuscsikp 10 Jul 2006, 09:05
Hi all!
I have rewritten some parts of this program. Please post some results (or send them to: kuscsikp (a) g m a i l . cóm ). Thanks!
vid 10 Jul 2006, 20:17
i had to restart my machine after running it for about 3 mins - so rather save your data before running
kuscsikp 10 Jul 2006, 20:32
Average test length is nearly 1 minute.
If your CPU is too old, it can slow down your computer for minutes. Yes, it is still in the alpha phase, so be patient. And if you get a run-time error, report it!
Donkey7 wrote:
..."Only on intel processors. amd recommends loop label over dec ecx, jzn label, because it's faster and shorter. "...
dec ecx, jzn label is always faster!!! Now, i have dec ecx, jzn...
vid 11 Jul 2006, 06:45
(i believe it's "jnz")
kuscsikp 11 Jul 2006, 07:25
Thanks! It was important!
vid 11 Jul 2006, 16:51
you're welcome, just as important as "dec ecx, jzn label is always faster!!! Now, i have dec ecx, jzn..."
kuscsikp 11 Jul 2006, 17:45
Octavio wrote:
"'loop' instruction requires many cpu clocks, so the measuring method is not good."
Donkey7 wrote:
"Amd recommends loop label over dec ecx, jzn label, because it's faster and shorter"
I have tested it on AMD CPUs, and loop is slower than "dec ecx, jnz", so I have replaced loop with dec/jnz in my code. Yes, it is important: for Octavio, for Donkey7, for me!
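The two loop forms being compared can be sketched as follows. The label names are made up, and the count of 500 million is an assumption that mirrors the "500mill times" figure in the results posted later in this thread. Each fragment is meant to be timed separately:

```
; Variant 1: LOOP decrements ECX and jumps while ECX is not zero
        mov     ecx, 500000000      ; assumed iteration count
with_loop:
        loop    with_loop

; Variant 2: the same trip count with DEC + JNZ
        mov     ecx, 500000000      ; assumed iteration count
with_jnz:
        dec     ecx                 ; sets ZF when ECX reaches zero
        jnz     with_jnz            ; jump back while ECX is not zero
```

One difference to keep in mind when swapping one for the other: loop leaves the flags untouched, while dec changes every arithmetic flag except CF, so the two are only interchangeable when the surrounding code does not rely on those flags.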
rugxulo 11 Jul 2006, 18:57
Okay, well, sorry, I'm not exactly able to close all processes on this computer at this time (and results probably aren't too useful, just a plain ol' common P4 2.52 GHz). Of course, it did seem to crash the computer (no response to keys, no movement of mouse) when it got to the 32-bit SSE part, but I ran away to the other room (played with old DOS computer, heh) to see if it'd finally come to its senses after a few minutes ... and it did. I wouldn't consider this computer "too old" though.
<EDIT> My P4's results below: </EDIT>
Procbench V0.5 Alpha, Peter Kuscsik, 2006-07-
Basic CPUID info:
~~~~~~~~~~~~~~~~~
Vendor : GenuineIntel
Family : 15
Model : 2
Revision : 4
Name : Intel(R) Pentium(R) 4 CPU 2.53GHz
Features : fpu vme de pse tsc msr pae mce cxchg8 apic sep mtrr pge mca cmov pat pse36 clfl dtes acpi mmx fxsr sse sse2 ss htt tm1
Instruction TLB: 4 KByte and 2-MByte or 4-MByte pages, 64 entries
Data TLB: 4 KByte and 4 MByte pages, 64 entries
1st-level data cache: 8 KByte, 4-way set associative, 64 byte line size
2nd-level cache: 512 KByte, 8-way set associative, 64 byte line size, 2 lines per sector
No 2nd-level cache or, if processor contains a valid 2nd-level cache, no 3rd-level cache
Trace cache: 12 K-µop, 8-way set associative
Benchmarks
~~~~~~~~~~
Frequency [MHz]: 2519.135
Speed of registers measured by add instructions via 1, 2, 3 and 4 registers.
Speeds adding to          1 Register  2 Registers  3 Registers  4 Registers
16 bit Integer MIPS             5454         6451         6382         7692
32 bit Integer MIPS             4800         7692         6382         7692
32 bit MMX Integer MIPS         1237         2564         2400         2564
64 bit FPU MFLOPS               ----         ----         ----         ----
32 bit 3DNow MFLOPS             ----         ----         ----         ----
32 bit SSE MFLOPS                  1            1            1            2
64 bit SSE2 MFLOPS               629         1239         1237         1282
Memory read performance test :
Read Buffer   Speed[MB/s]
4 KBytes      18181
8 KBytes      18348
16 KBytes     9852
32 KBytes     9852
64 KBytes     9803
128 KBytes    9852
256 KBytes    9852
512 KBytes    8000
1 MByte       1939
2 MBytes      1910
4 MBytes      1910
Generating:
1. Random numbers [200mills] : 0.547 seconds
2. Fibonicci numbers [200mills] : 0.172 seconds
4. Cycle with Loop [500mill times] : 0.390 seconds
5. Cycle with Jump [500mill times] : 0.297 seconds
Last edited by rugxulo on 19 Jul 2006, 20:39; edited 1 time in total
kuscsikp 12 Jul 2006, 09:44
I know what the problem is. On some machines
SSE and/or SSE2 is too slow. Here are two examples:
http://www.ocforums.com/attachment.php?attachmentid=51567&d=1152892208
http://www.ocforums.com/attachment.php?attachmentid=51566&d=1152892200
So in the latest version, 0.51, I have deleted the SSE benchmarks. I will rewrite them.
+Added a prime generator. Thanks to Garthower :)
kuscsikp 09 Aug 2006, 08:57
A new version is available at
http://prdownload.berlios.de/procbench/procb-v0.61-full.zip
You can upload your results at
http://158.197.33.91/~kuscsikp/php/upload.php
Thanks!
penang 05 May 2008, 11:14
Here's the result of my old Pentium-D
edfed 05 May 2008, 11:31
Page Fault:
Code:
PROCB caused a page fault in module PROCB.EXE at 015f:004012ca.
Registers:
EAX=00000003 CS=015f EIP=004012ca EFLGS=00010246
EBX=756e6547 SS=0167 ESP=0096fe24 EBP=0096ff78
ECX=6c65746e DS=0167 ESI=00000064 FS=2997
EDX=7ffe0000 ES=0167 EDI=00000000 GS=0000
Bytes at CS:EIP:
8b 02 f7 62 04 0f ac d0 18 5a c3 53 51 68 f0 5e
Stack dump:
49656e69 00401333 00000003 00402774 0040107d 00401005 bff8b537 00000000 819d0d78 00950000 636f7250 58450062 819d0045 00950000 636f7270 65780062
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.