Message board for the users of flat assembler.
Benchmarking 64-bit and SSE 128-bit?
Inspired by something Madis731 mentioned in the "OpenGL - Inverse alpha?" topic, and out of curiosity, I made two Linux x64 programs to compare the speed of memory copying with 64-bit stores (general-purpose registers) and 128-bit stores (SSE, XMM registers).
(My CPU does not support AVX, so I would not be able to run an AVX test locally.)
And this is the result! The outputs are verified to be identical: 8192 KB (8,192,000 bytes) of data transferred by each program.
boo@debian:~/fasm$ diff -s x64.txt sse.txt
Files x64.txt and sse.txt are identical
This is how I ran the test on Linux:
boo@debian:~/fasm$ time ./x64 > x64.txt

real 0m0.060s
user 0m0.004s
sys 0m0.035s
boo@debian:~/fasm$ time ./sse > sse.txt

real 0m0.044s
user 0m0.001s
sys 0m0.041s
SSE.asm is faster than x64.asm!!!
This is not the first time I have run a benchmark, but it is definitely the first time I have used a CPU extension set!
My code might not be optimal, so I would appreciate your advice on benchmarking correctly, please.
; SSE Benchmark 1.0
;
; Requires Pentium III series of CPU and above
format ELF64 executable 3
segment readable executable
entry $
        lea rbx, [_msg]
        lea rdx, [_sse]
        movups xmm0,[rbx] ;copy unaligned double quad word from 128-bit memory location to register
        mov rcx, 8192000
.redo:
        movups [rdx+rcx-16],xmm0 ;copy unaligned double quad word from register to 128-bit memory location
        sub rcx, 16
        jne .redo
        mov edx,8192000
        lea rsi,[_sse]
        mov edi,1 ; STDOUT
        mov eax,1
        syscall
        xor edi,edi
        mov eax,60
        syscall
segment readable writeable
_sse rb 8192000
_msg db '1234567812345678'

format ELF64 executable 3
segment readable executable
entry $
        lea rbx, [_msg]
        lea rdx, [_x64]
        mov r8,[rbx] ;copy quad word from 64-bit memory location to register
        mov rcx, 8192000
.redo:
        mov [rdx+rcx-8],r8 ;copy quad word from register to 64-bit memory location
        sub rcx, 8
        jne .redo
        mov edx,8192000
        lea rsi,[_x64]
        mov edi,1 ; STDOUT
        mov eax,1
        syscall
        xor edi,edi
        mov eax,60
        syscall
segment readable writeable
_x64 rb 8192000
_msg db '12345678'
Thank you for reading!
30 Dec 2021, 17:04
Each system is different.
Run the benchmarks on each system you have to see the results for that system.
Also: Sadly, benchmarks give only a very slight indication of how code will run in an entire program. So be wary of being too confident about what will happen when benchmarked code is included in a whole program; the outcome can be different.
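As a rough illustration of why such numbers need care: in the runs above, most of the measured time is process start-up, first-touch page faults and the write syscall rather than the store loop itself. A minimal sketch of one way to reduce that, assuming it is acceptable to drop the output and repeat the loop, is shown below. PASSES, BUF_SIZE and the label names are illustrative assumptions, not part of the original programs.

```asm
; Sketch only: the 128-bit store loop from the SSE test, repeated
; PASSES times with no write syscall, so that `time` mostly measures
; the loop. PASSES, BUF_SIZE and the labels are hypothetical names.
format ELF64 executable 3

PASSES   = 100                  ; assumed repeat count; tune per machine
BUF_SIZE = 8192000              ; same buffer size as the original test

segment readable executable
entry start
start:
        lea     rdx, [_buf]
        movups  xmm0, [_msg]    ; load the 16-byte pattern once
        mov     r9d, PASSES
.pass:
        mov     rcx, BUF_SIZE
.redo:
        movups  [rdx+rcx-16], xmm0 ; store 16 bytes, walking down the buffer
        sub     rcx, 16
        jne     .redo
        dec     r9d
        jnz     .pass           ; only the first pass pays the page faults

        xor     edi, edi        ; exit status 0
        mov     eax, 60         ; sys_exit
        syscall

segment readable writeable
_msg db '1234567812345678'
_buf rb BUF_SIZE                ; reserved last, so it adds nothing to the file
```

The 64-bit variant would use the same wrapper with the store and step swapped for `mov [rdx+rcx-8],r8` and `sub rcx, 8`. Both can then be run under `time` as before. Even so, the result only describes this loop on this machine, not how the code behaves inside a larger program.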
30 Dec 2021, 17:48
Copyright © 1999-2020, Tomasz Grysztar.