flat assembler
Message board for the users of flat assembler.
Fanael
As usual, it depends on which CPU you are optimizing for (and on several other things as well).
|
|||
|
edfed
You can also do this:
Code:
xor eax,eax
mov ecx,Y/2
A:
stosd
stosd
loop A
or
Code:
xor eax,eax
mov ecx,Y
rep stosd
I believe the fastest way is to do it with a single instruction, using the rep prefix. I will compare them right now and give feedback.
|||
|
ouadji
Quote: i will compare it right now and give feedback |
|||
|
edfed
mov qword[edi],mm1 doesn't compile.
|
|||
|
ouadji
not "mov" ... but "movq" |
|||
|
edfed
I have results.
The fastest on my PIII is test1 (~1180 clocks). test3 is a little slower (~1260 clocks), and test2 is about twice as slow (~2200 clocks).
Code:
X:
Y=10h
rd Y*2
@@: db 'test1',0
align 4
dd test1.size
dd @b
test1: ; stosd inside a loop
xor eax,eax
mov edi,X
mov ecx,Y
@@: stosd
loop @b
.size=$-test1
mov eax,0
ret
;------------------------
@@: db 'test2',0
align 4
dd test2.size
dd @b
test2: ; one MMX qword store per iteration
xor eax,eax
movd mm1,eax
mov edi,X
mov ecx,Y/2
@@: movq [edi],mm1
add edi,8
loop @b
.size= $-test2
mov eax,0
ret
;------------------------
@@: db 'test3',0
align 4
dd test3.size
dd @b
Y=10h
test3: ; single rep stosd
xor eax,eax
mov edi,X
mov ecx,Y
rep stosd
.size=$-test3 ; was $-test1, which measured from the wrong label
mov eax,0
ret
;------------------------
|||
|
ouadji
Surprising result (thank you, edfed): "loop" is faster than "rep" ... and as for "movq/mm1", we can forget that!
|||
|
edfed
There are many things to take into consideration:
the fact that we write to ES, and ES is the video memory in my test application; the fact that the buffer is only 10 dwords; and many other things. But I am really sure when I say test1 is relatively the fastest.
[edit] But I just tested with 100 dwords, and test3 becomes the fastest.
[edit1] But I tested with the CPU overloaded, and MMX is faster.
As a conclusion, a short table should use test1 (loop), a medium table should use test3 (rep), and a long table should use test2 (MMX).
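The conclusion above could be sketched as a single routine that picks one of the three variants by size. This is only a sketch under assumptions: the name clear_dwords and the thresholds SMALL_MAX and MEDIUM_MAX are hypothetical and do not come from the measurements in this thread, so they would have to be tuned on the target machine. It also assumes 32-bit code and an even dword count on the MMX path.

```asm
; Hypothetical thresholds (in dwords), NOT taken from the thread: tune by measuring.
SMALL_MAX  = 20h                ; at or below this: stosd/loop (test1 style)
MEDIUM_MAX = 1000h              ; at or below this: rep stosd (test3 style)

; in:  es:edi = destination, ecx = number of dwords to clear
;      (ecx is assumed to be even when it is above MEDIUM_MAX)
; out: eax, ecx, edi and mm1 are clobbered
clear_dwords:
        xor     eax,eax
        test    ecx,ecx
        jz      .done           ; nothing to do; loop with ecx=0 would wrap around
        cmp     ecx,SMALL_MAX
        jbe     .small
        cmp     ecx,MEDIUM_MAX
        jbe     .medium
.large:                         ; test2 style: one qword store per iteration
        movd    mm1,eax
        shr     ecx,1           ; two dwords per qword
@@:
        movq    [es:edi],mm1
        add     edi,8
        loop    @b
        emms                    ; leave MMX state so later x87 code still works
        ret
.medium:                        ; test3 style
        cld
        rep     stosd
        ret
.small:                         ; test1 style
        cld
@@:
        stosd
        loop    @b
.done:
        ret
```

Since the later posts show the ranking changing with machine load, fixed thresholds like these are at best a starting point.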
|||
|
revolution
Testing for speed on a single CPU with a single test and one test set over a short run is rather pointless.
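One part of this objection, the short run, can be reduced by timing the same routine many times and keeping the smallest count. The following is a minimal sketch, assuming 32-bit code and a CPU with RDTSC; the name time_min and the constant RUNS are hypothetical. It does nothing about the other part of the objection: the result is still valid only for the one CPU it was measured on.

```asm
RUNS = 1000                     ; hypothetical repeat count

; in:  esi = address of the routine to time (for example test1)
; out: eax = smallest clock count seen over RUNS calls (low 32 bits)
; The timed routine may clobber eax, ecx, edx and edi,
; but it must preserve ebx, esi and ebp.
time_min:
        push    ebx
        push    ebp
        or      ebp,-1          ; best so far = 0FFFFFFFFh
        mov     ebx,RUNS
.again:
        rdtsc                   ; edx:eax = time stamp counter
        push    eax             ; keep the low dword of the start value
        call    esi
        rdtsc
        pop     edx             ; edx = start value (low dword)
        sub     eax,edx         ; elapsed clocks, includes the rdtsc overhead
        cmp     eax,ebp
        jae     .keep
        mov     ebp,eax         ; new minimum
.keep:
        dec     ebx
        jnz     .again
        mov     eax,ebp
        pop     ebp
        pop     ebx
        ret
```

The minimum filters out runs disturbed by interrupts or other load, which is the opposite of what the plotted captures in this thread show, so both views can be useful.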
|
|||
|
edfed
But very interesting.
Many things are revealed by capturing the timings and scrolling through the plotted results. It shows many interactions with the real state of the machine.
I stand corrected: test3 is the fastest, and the difference is huge. test3 = 500 clocks, test1 = 2500 clocks, test2 = 2700 clocks. It really depends on the load of the machine; it is interesting to see that an algorithm can be influenced by the rest of the system.
|||
|
ouadji
These final results seem rather logical. test3 is the only one in which the loop is performed inside the processor. Thank you, edfed, for all the tests and results.
|||
|
edfed
Then, a capture of the results.
[screenshot: plot alternating test1, test2, test3 and test4]
test3 = lower plots, test1 = higher plots. So even if it doesn't give absolute results, it gives a good idea of the relative speed of several methods. I am currently trying to simplify the process, because for the moment the tests have to be built into the application at compile time, like this:
Code:
align 4
.list: List \
test1,\
test2,\
test3,\
test4
Y=100h
X: rd Y*2
@@: db 'test1',0
align 4
dd test1.size
dd @b
test1: ; stosd inside a loop
xor eax,eax
mov edi,X
mov ecx,Y
@@: stosd
loop @b
.size=$-test1
mov eax,0
ret
;------------------------
@@: db 'test2',0
align 4
dd test2.size
dd @b
test2: ; one MMX qword store per iteration
xor eax,eax
movd mm1,eax
mov edi,X
mov ecx,Y/2
@@: movq [es:edi],mm1
add edi,8
loop @b
.size= $-test2
mov eax,0
ret
;------------------------
@@: db 'test3',0
align 4
dd test3.size
dd @b
test3: ; single rep stosd
xor eax,eax
mov edi,X
mov ecx,Y
rep stosd
.size=$-test3
mov eax,0
ret
;------------------------
@@: db 'test4',0
align 4
dd test4.size
dd @b
test4: ; plain mov store inside a loop
xor eax,eax
mov edi,X
mov ecx,Y
@@: mov [es:edi],eax
add edi,4
loop @b
.size=$-test4
mov eax,0
ret
;------------------------
Maybe including a .bin at load time would be better: compile each snippet to a .bin with org 0, then just reference them in a list of paths, either at compile time or at execution time from the command line, and then begin the test?
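For the compile-time variant of that idea, fasm's file directive can pull an already assembled binary into the output. The following is only a sketch of how one entry might look: the file name test1.bin is hypothetical, and it assumes the snippet was assembled separately with org 0 and is position independent. A snippet built that way cannot refer to X by name, so the buffer address would have to be passed in by the caller, for example in edi.

```asm
; Hypothetical file name; the snippet is assumed to be assembled separately
; with org 0 and to take its buffer address from the caller.
@@:     db 'test1',0            ; name shown in the test list
        align 4
        dd test1.size           ; size of the included snippet in bytes
        dd @b                   ; pointer back to the name string
test1:
        file 'test1.bin'        ; raw bytes of the pre-assembled snippet
.size = $-test1
```

The run-time variant (paths given on the command line) would need file loading code instead and is not shown here.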
|
||||||||||
|
ouadji
great result edfed, good work. |
|||
|
bitRAKE
Code:
test8:
pxor mm0,mm0
mov edi,X
mov ecx,Y/2
@@: movntq [es:edi+ecx*8-8],mm0
loop @b
ret
test9:
pxor xmm0,xmm0
mov edi,X
mov ecx,Y/4
@@: movntdq [es:edi],xmm0
add edi,16
loop @b
ret
|||
|
ouadji
I'd be interested to know the test9 result. Thank you, edfed.
|||
|
edfed
From what I see, I can say that test9 is not working properly, but as I am not sure my PIII has SSE2, I am not sure of the exact result.
You can test it yourself on your machine if you want. I renamed test9; it is now test5. I did not include test8 because it looks like the MMX solution in test2.
Last edited by edfed on 24 Sep 2010, 19:38; edited 1 time in total
|||
|
ouadji
Sorry for this reply in French; it is too complex for me in English. Edfed, I tested on my PC. It is a quad-core Q6600 running XP Pro. I can select the tests with the up and down arrow keys, but when I go past "test4" the program closes by itself. So I could not get to your "test5", a pity!
|||
|
edfed
OK, it means that you don't support SSE2, lol. But I have doubts. We should check whether it is the system that blocks the use of this instruction set in DOS mode.
I think you will need to test it with another type of code, for example in a PE console program instead of a .com. I think the problem comes from the use of CPUID in test9. I will correct it and post a new version. But I suggest you compile my source directly after adding your own tests.
Last edited by edfed on 28 Feb 2011, 14:06; edited 1 time in total |
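For reference, CPUID reports SSE2 in bit 26 of EDX for leaf 1. How test9 actually does its check is not shown in this thread, so the following is just a minimal sketch of such a check; the name has_sse2 is hypothetical, and it assumes the CPUID instruction itself is available (true for a PIII or a Q6600).

```asm
; out: eax = 1 if CPUID reports SSE2, 0 otherwise
; clobbers ecx and edx (cpuid overwrites eax, ebx, ecx and edx)
has_sse2:
        push    ebx             ; preserve ebx for the caller
        mov     eax,1           ; leaf 1: feature flags
        cpuid
        xor     eax,eax
        bt      edx,26          ; EDX bit 26 = SSE2
        setc    al
        pop     ebx
        ret
```

This only says what the CPU supports. Whether the environment allows SSE instructions to execute is a separate question, which is the doubt about DOS mode raised above.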
|||||||||||
|
ouadji
The Q6600 does not support SSE2? AVX, fine, but SSE2? I certainly hope it does ...! (I have a hammer right next to the PC.) PS: no source, only a "file.com".
|||
|
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.