flat assembler
Message board for the users of flat assembler.
Speed test
revolution 07 Nov 2013, 16:41
Your float timings for single/double/extended are all comparing the same operation. You will need to change the control word (FLDCW) to set the precision. And your memory read tests are only testing the cache timing for L1.
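A minimal sketch of what setting the precision could look like, assuming the standard x87 control word layout (precision control in bits 8-9: 00b = single, 10b = double, 11b = extended). The labels `time_fdiv`, `cw`, `dividend`, `divisor` and `fmt` are made up for this example and do not come from the original program. The field is cleared and then set explicitly, so the result does not depend on what the control word held before.

```asm
; Sketch only: time FDIV under each x87 precision setting.
format PE console 4.0
entry start
include 'win32a.inc'

section '.text' code readable executable
start:
        mov     esi, 0000h              ; PC = 00b, single (24-bit mantissa)
        call    time_fdiv
        mov     esi, 0200h              ; PC = 10b, double (53-bit mantissa)
        call    time_fdiv
        mov     esi, 0300h              ; PC = 11b, extended (64-bit mantissa)
        call    time_fdiv
        invoke  ExitProcess, 0

; in: esi = value for the precision-control field (bits 8-9)
time_fdiv:
        finit                           ; control word = 037Fh, register stack empty
        fstcw   word [cw]
        and     word [cw], 0FCFFh       ; clear bits 8-9
        or      word [cw], si           ; insert the requested precision
        fldcw   word [cw]
        fld     qword [divisor]         ; ends up in st1
        fld     qword [dividend]        ; st0
        invoke  GetTickCount            ; may change eax, ecx and edx
        mov     ebx, eax                ; start time
        mov     edi, 10000000           ; counter in a register the API preserves
.next:  fld     st0                     ; fresh copy of the dividend
        fdiv    st0, st2                ; st0 = dividend / divisor
        fstp    st0                     ; drop the quotient
        dec     edi
        jnz     .next
        invoke  GetTickCount
        sub     eax, ebx                ; elapsed milliseconds
        cinvoke printf, fmt, esi, eax
        ret

section '.data' data readable writeable
cw       dw ?
dividend dq 1.0
divisor  dq 3.0
fmt      db 'PC bits %04Xh: 10 million FDIV in %u ms',13,10,0

section '.idata' import data readable writeable
        library kernel32,'KERNEL32.DLL',\
                crtdll,'CRTDLL.DLL'
        import  kernel32,\
                GetTickCount,'GetTickCount',\
                ExitProcess,'ExitProcess'
        import  crtdll,\
                printf,'printf'
```

Whether the three timings differ depends on the CPU. On many processors only FDIV and FSQRT are sensitive to the precision setting, so a similar loop around FADD or FMUL may show no change.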
HaHaAnonymous 07 Nov 2013, 17:48
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 19:10; edited 1 time in total
A$M 07 Nov 2013, 23:52
revolution wrote:
Your float timings for single/double/extended are all comparing the same operation. You will need to change the control word (FLDCW) to set the precision. And your memory read tests are only testing the cache timing for L1.

The new code:
Code:
    format PE CONSOLE 4.0
    include 'win32a.inc'

    cinvoke system, _title
    cinvoke system, _pause
    cinvoke printf, _nl

    ; integer tests, 1 billion iterations each
    cinvoke printf, _test1
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    @@:
    add ecx, edx
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax

    cinvoke printf, _test2
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    @@:
    sub ecx, edx
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax

    cinvoke printf, _test3
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    @@:
    imul ebx
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax

    cinvoke printf, _test4
    mov edx, 0
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    mov eax, 0
    @@:
    idiv ebx
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax
    cinvoke printf, _nl

    ; memory read tests, 1 billion iterations each
    cinvoke printf, _test5
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    @@:
    mov al, [_test1]
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax

    cinvoke printf, _test6
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    @@:
    mov ax, word[_test1]
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax

    cinvoke printf, _test7
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    @@:
    mov eax, dword[_test1]
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax
    cinvoke printf, _nl

    ; FPU add: single, double, extended precision, 1 billion iterations each
    cinvoke printf, _test8
    finit
    fstcw [_cword]
    xor [_cword], 1100000000b
    fldcw [_cword]
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    fld [_float]
    fld [_float]
    @@:
    fadd st0, st1
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax

    cinvoke printf, _test9
    finit
    fstcw [_cword]
    xor [_cword], 100000000b
    fldcw [_cword]
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    fld [_double]
    fld [_double]
    @@:
    fadd st0, st1
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax

    cinvoke printf, _test10
    finit
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    fld [_longd]
    fld [_longd]
    @@:
    fadd st0, st1
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax
    cinvoke printf, _nl

    ; FPU subtract: single, double, extended precision, 1 billion iterations each
    cinvoke printf, _test11
    finit
    fstcw [_cword]
    xor [_cword], 1100000000b
    fldcw [_cword]
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    fld [_float]
    fld [_float]
    @@:
    fsub st0, st1
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax

    cinvoke printf, _test12
    finit
    fstcw [_cword]
    xor [_cword], 100000000b
    fldcw [_cword]
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    fld [_double]
    fld [_double]
    @@:
    fsub st0, st1
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax

    cinvoke printf, _test13
    finit
    mov ecx, 1000000000
    invoke GetTickCount
    mov ebx, eax
    fld [_longd]
    fld [_longd]
    @@:
    fsub st0, st1
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done, eax
    cinvoke printf, _nl

    ; FPU multiply: single, double, extended precision, 10 million iterations each
    cinvoke printf, _test14
    finit
    fstcw [_cword]
    xor [_cword], 1100000000b
    fldcw [_cword]
    mov ecx, 10000000
    invoke GetTickCount
    mov ebx, eax
    fld [_float]
    fld [_float]
    @@:
    fmul st0, st1
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done2, eax

    cinvoke printf, _test15
    finit
    fstcw [_cword]
    xor [_cword], 100000000b
    fldcw [_cword]
    mov ecx, 10000000
    invoke GetTickCount
    mov ebx, eax
    fld [_double]
    fld [_double]
    @@:
    fmul st0, st1
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done2, eax

    cinvoke printf, _test16
    finit
    mov ecx, 10000000
    invoke GetTickCount
    mov ebx, eax
    fld [_longd]
    fld [_longd]
    @@:
    fmul st0, st1
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done2, eax
    cinvoke printf, _nl

    ; FPU divide: single, double, extended precision, 10 million iterations each
    cinvoke printf, _test17
    finit
    fstcw [_cword]
    xor [_cword], 1100000000b
    fldcw [_cword]
    mov ecx, 10000000
    invoke GetTickCount
    mov ebx, eax
    fld [_float]
    fld [_float]
    @@:
    fdiv st0, st1
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done2, eax

    cinvoke printf, _test18
    finit
    fstcw [_cword]
    xor [_cword], 100000000b
    fldcw [_cword]
    mov ecx, 10000000
    invoke GetTickCount
    mov ebx, eax
    fld [_double]
    fld [_double]
    @@:
    fdiv st0, st1
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done2, eax

    cinvoke printf, _test19
    finit
    mov ecx, 10000000
    invoke GetTickCount
    mov ebx, eax
    fld [_longd]
    fld [_longd]
    @@:
    fdiv st0, st1
    loop @b
    invoke GetTickCount
    sub eax, ebx
    cinvoke printf, _done2, eax
    cinvoke printf, _nl

    cinvoke system, _pause
    invoke ExitProcess, 0

    ; strings and test operands
    _title db "TITLE Speed Test",0
    _pause db "PAUSE",0
    _test1 db "LONG INT ADD... ",0
    _test2 db "LONG INT SUBTRACTION... ",0
    _test3 db "LONG INT MULTIPLICATION... ",0
    _test4 db "LONG INT DIVISION... ",0
    _test5 db "MOVE BYTE/CHAR IN MEMORY... ",0
    _test6 db "MOVE WORD/INT IN MEMORY... ",0
    _test7 db "MOVE DOUBLE WORD/LONG INT IN MEMORY... ",0
    _test8 db "FLOAT ADD... ",0
    _test9 db "DOUBLE ADD... ",0
    _test10 db "LONG DOUBLE ADD... ",0
    _test11 db "FLOAT SUBTRACTION... ",0
    _test12 db "DOUBLE SUBTRACTION... ",0
    _test13 db "LONG DOUBLE SUBTRACTION... ",0
    _test14 db "FLOAT MULTIPLICATION... ",0
    _test15 db "DOUBLE MULTIPLICATION... ",0
    _test16 db "LONG DOUBLE MULTIPLICATION... ",0
    _test17 db "FLOAT DIVISION... ",0
    _test18 db "DOUBLE DIVISION... ",0
    _test19 db "LONG DOUBLE DIVISION... ",0
    _done db "1 billion operations made in %i milliseconds.",10,13,0
    _done2 db "10 million operations made in %i milliseconds.",10,13,0
    _failed db "Failed!"
    _nl db 10,13,0
    _cword dw ?
    _float dd 1.2345
    _double dq 1.2345
    _longd dt 1.2345

    data import
    library kernel32,'KERNEL32.DLL',\
            crtdll,'CRTDLL.DLL'
    import kernel32,\
           GetTickCount,'GetTickCount',\
           ExitProcess,'ExitProcess'
    import crtdll,\
           printf,'printf',\
           system,'system'
    end data

New results, but only division really changes:
Quote:
Press any key to continue. . .

The MOVE (...) IN MEMORY tests are not important (for me).
PS: Please post your results.
revolution 08 Nov 2013, 00:24
When you divide zero (or small numbers) by something there is special circuitry that will complete the instruction quickly. You might want to ensure that the numerator is always non-zero to get a better idea of the long execution nature of division.
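One way to keep the numerator non-zero, as a rough sketch: reload EDX:EAX with a large dividend before every IDIV and divide by a fixed divisor chosen so that the quotient still fits in 32 bits. The constants, the `fmt` label and the use of ESI/EDI are choices made for this example, not taken from the original program.

```asm
; Sketch only: IDIV timing with a large, non-zero 64-bit dividend.
format PE console 4.0
entry start
include 'win32a.inc'

section '.text' code readable executable
start:
        invoke  GetTickCount
        mov     ebx, eax                ; start time (ebx survives API calls)
        mov     esi, 7FFFFFFFh          ; fixed non-zero divisor
        mov     edi, 10000000           ; counter in a register IDIV does not touch
@@:     mov     edx, 12345678h          ; high half of the dividend
        mov     eax, 9ABCDEF0h          ; low half of the dividend
        idiv    esi                     ; quotient is about 2468ACF1h, so no overflow
        dec     edi
        jnz     @b
        invoke  GetTickCount
        sub     eax, ebx                ; elapsed milliseconds
        cinvoke printf, fmt, eax
        invoke  ExitProcess, 0

section '.data' data readable writeable
fmt     db '10 million IDIV in %u ms',13,10,0

section '.idata' import data readable writeable
        library kernel32,'KERNEL32.DLL',\
                crtdll,'CRTDLL.DLL'
        import  kernel32,\
                GetTickCount,'GetTickCount',\
                ExitProcess,'ExitProcess'
        import  crtdll,\
                printf,'printf'
```

Because the dividend is reloaded each pass, successive divisions do not depend on each other, so this measures something closer to throughput than latency.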
HaHaAnonymous 08 Nov 2013, 12:46
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 19:09; edited 1 time in total
revolution 08 Nov 2013, 12:49
HaHaAnonymous, what is your CPU?
HaHaAnonymous 08 Nov 2013, 13:36
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 19:09; edited 1 time in total
revolution 08 Nov 2013, 13:48
The bug is here:
Code:
    mov ecx, 10000000
    invoke GetTickCount
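To spell the problem out: under the 32-bit Windows calling conventions an API function may overwrite EAX, ECX and EDX, so the loop counter loaded before `GetTickCount` is not guaranteed to survive the call. A minimal sketch of one possible fix is to load ECX only after the call returns. The labels `one` and `fmt` are made up for this example, and the operand is 1.0 (rather than the original 1.2345) so the product never overflows.

```asm
; Sketch only: set the LOOP counter after GetTickCount, not before it.
format PE console 4.0
entry start
include 'win32a.inc'

section '.text' code readable executable
start:
        finit
        fld     qword [one]
        fld     qword [one]
        invoke  GetTickCount            ; may overwrite eax, ecx and edx
        mov     ebx, eax                ; start time (ebx survives API calls)
        mov     ecx, 10000000           ; counter is loaded after the call
@@:     fmul    st0, st1
        loop    @b
        invoke  GetTickCount
        sub     eax, ebx                ; elapsed milliseconds
        cinvoke printf, fmt, eax
        invoke  ExitProcess, 0

section '.data' data readable writeable
one     dq 1.0
fmt     db '10 million FMUL in %u ms',13,10,0

section '.idata' import data readable writeable
        library kernel32,'KERNEL32.DLL',\
                crtdll,'CRTDLL.DLL'
        import  kernel32,\
                GetTickCount,'GetTickCount',\
                ExitProcess,'ExitProcess'
        import  crtdll,\
                printf,'printf'
```

An alternative would be to keep the counter in EBX, ESI or EDI and use `dec`/`jnz`, since those registers are preserved across API calls.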
cod3b453 08 Nov 2013, 18:25
The FPU only differentiates between single/double/extended for load/stores; arithmetic has the same internal precision so there should be no difference.
Also, it is likely that the tight loop is the dominating factor, preventing full pipeline utilisation. I believe Intel CPUs should be able to perform two non-interlocked (independent) adds in the same cycle time as one mul, for example (I may be wrong).
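A rough way to probe the dependency effect, offered as a sketch rather than a definitive benchmark: time one loop in which every add waits on the previous result, then a second loop that spreads the same number of adds over four independent registers. The constant `ITER` and the `fmt_dep`/`fmt_ind` labels are made up for this example.

```asm
; Sketch only: dependent versus independent integer adds.
format PE console 4.0
entry start
include 'win32a.inc'

ITER = 250000000                        ; 4 adds per pass = 1 billion adds

section '.text' code readable executable
start:
        mov     esi, 1                  ; value added on every step

        ; one chain: each add needs the result of the one before it
        invoke  GetTickCount
        mov     ebx, eax                ; start time
        mov     edi, ITER
@@:     add     eax, esi
        add     eax, esi
        add     eax, esi
        add     eax, esi
        dec     edi
        jnz     @b
        invoke  GetTickCount
        sub     eax, ebx
        cinvoke printf, fmt_dep, eax

        ; four chains: the adds do not depend on each other
        invoke  GetTickCount
        mov     ebx, eax                ; start time
        mov     edi, ITER
@@:     add     eax, esi
        add     ecx, esi
        add     edx, esi
        add     ebp, esi                ; ebp is free here: no stack frame is used
        dec     edi
        jnz     @b
        invoke  GetTickCount
        sub     eax, ebx
        cinvoke printf, fmt_ind, eax
        invoke  ExitProcess, 0

section '.data' data readable writeable
fmt_dep db 'dependent adds:   %u ms',13,10,0
fmt_ind db 'independent adds: %u ms',13,10,0

section '.idata' import data readable writeable
        library kernel32,'KERNEL32.DLL',\
                crtdll,'CRTDLL.DLL'
        import  kernel32,\
                GetTickCount,'GetTickCount',\
                ExitProcess,'ExitProcess'
        import  crtdll,\
                printf,'printf'
```

On a superscalar CPU the second loop would be expected to finish sooner, but the actual ratio depends on the processor, and `GetTickCount` is coarse, so small differences should not be over-interpreted.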
Xorpd! 10 Nov 2013, 09:05
Your processor is capable of something like 38.4 GFLOPS at double precision. As revolution points out above, you have to stop clobbering ecx (among other things) to get any information at all out of your test.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.