flat assembler
Message board for the users of flat assembler.

Unsure about performance test results in string printing
Anthony S. 12 Apr 2024, 15:25
Hi everyone

I've been working on a personal project where I'm crafting a library with macros and defines. One of the functions I've been implementing is a 'write' function that can print a string without specifying length, using a '\0' character to mark the end of the string. I've come up with two possible implementations:

Code:
macro ef_sys_write_it buf* {
    xor esp, esp                        ; Initialize string index counter

    @@:
        mov al, byte[buf + esp]         ; Move next character into 'al'

        cmp al, 0                       ; Compare character to '\0'
        je @f                           ; End loop when reaches '\0'

        mov eax, sys_write              ; Set 'write' syscall
        mov edi, sys_stdout             ; Set output file
        lea esi, [buf + esp]            ; Load character address to print
        mov edx, 1                      ; Set to print one character
        syscall

        inc esp                         ; Increment character index
        jmp @b                          ; Restarts loop
    @@:
}

macro ef_sys_write_ct buf* {
    xor edx, edx                        ; Initialize string index counter

    @@:
        mov al, byte[buf + edx]         ; Move next character into 'al'

        cmp al, 0                       ; Compare character to '\0'
        je @f                           ; End loop when reaches '\0'

        inc edx                         ; Increment character index
        jmp @b                          ; Restarts loop
    @@:
        mov esi, buf                    ; Load character address to print
        mov eax, sys_write              ; Set 'write' syscall
        mov edi, sys_stdout             ; Set output file
        syscall
}
    


The ef_sys_write_it implementation prints one character at a time, utilizing multiple syscalls to print the whole string while iterating through the array. Conversely, the ef_sys_write_ct implementation calculates the length of the array by iterating through it and then uses a single syscall to print the entire array.

Initially, I believed that the ef_sys_write_ct implementation would perform better due to its single syscall usage. However, to challenge this assumption, I conducted a "performance test" as follows:

I executed both implementations separately, each printing a simple "Hello, World!" 20000 times. Here's the bash command I used on my Linux system:

Code:
#!/bin/bash

echo "[INFO] Syscall at each iteration implementation"
time for x in {0..20000}; do ./implementation_one; done

echo "[INFO] Single syscall implementation"
time for x in {0..20000}; do ./implementation_two; done
    


Surprisingly, I got the following results:

Code:
[INFO] Syscall at each iteration implementation

real 0m8.545s
user 0m5.741s
sys 0m3.537s

[INFO] Single syscall implementation

real 0m9.885s
user 0m6.430s
sys 0m4.257s
    


Surprisingly, the supposedly slower implementation appears to perform better. Can someone explain why this is happening? I've always thought that syscalls were resource-intensive operations, so it seems counterintuitive that the second implementation, which makes only one syscall per string, still runs slower.

_________________
In the beginning you always want the results. In the end all you want is control.
revolution 12 Apr 2024, 16:55
The timings will be meaningless because of the overhead of many other things in the system that dwarf the thing you want to measure.

Better to start the process only once and run the loop in the process itself.
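
To get a feel for how large that overhead is, one option is to time the same kind of shell loop around a program that does nothing at all. The following is only a minimal sketch, assuming bash and an external `true` binary on the PATH; `launch_baseline` is a made-up helper name, not something from the original script.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): launch a do-nothing
# external program N times and print the elapsed wall-clock seconds.
launch_baseline() {
    local n=$1 bin i
    bin=$(type -P true) || return 1   # path of the external binary, not the builtin
    local TIMEFORMAT='%R'             # have `time` report only the real time
    { time for ((i = 0; i < n; i++)); do "$bin"; done; } 2>&1
}

# Small count so the sketch finishes quickly; raise it to match the real test.
launch_baseline 200
```

Whatever this reports is spent on creating and tearing down processes before either macro does any work, so differences of a similar or smaller size are hard to attribute to the write loop.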
macomics 12 Apr 2024, 18:41
Quote:
~$ fasm test_print.asm test_print1 -d TEST_NAME=print1 -d TEST_COUNT=20000000
flat assembler version 1.73.32 (16384 kilobytes memory)
2 passes, 223 bytes.

~$ fasm test_print.asm test_print2 -d TEST_NAME=print2 -d TEST_COUNT=20000000
flat assembler version 1.73.32 (16384 kilobytes memory)
2 passes, 223 bytes.

~$ time ./test_print1 > /dev/null

real 0m23.263s
user 0m7.888s
sys 0m15.335s
~$ time ./test_print2 > /dev/null

real 0m1.874s
user 0m0.675s
sys 0m1.194s

I don't know about you, but I see a significant difference. I did not test your code, but an algorithm that matches its description.
Code:
format ELF64 executable 3
segment executable
entry   $
match =TEST_COUNT, TEST_COUNT { define TEST_COUNT 10000 }
match =TEST_NAME, TEST_NAME { define TEST_NAME print0 }
        mov     rcx, TEST_COUNT
  @@:
        push    rcx
        lea     rax, [hello_world]
match name, TEST_NAME { call    name }
        pop     rcx
        loop    @b
        mov     rax, 60
        xor     dil, dil
        syscall

  print0:
        retn

  print1:
        push    1
        pop     rdi
        mov     rsi, rax
        mov     rax, rdi
  @@:
        cmp     byte [rsi], 0
        jz      @f
        mov     rax, rdi
        mov     rdx, rdi
        syscall
        inc     rsi
        jmp     @b
  @@:
        retn

  print2:
        push    1
        or      rdx, -1
        mov     rsi, rax
        pop     rdi
  @@:
        inc     rdx
        cmp     byte [rax + rdx], 0
        jnz     @b
        mov     rax, rdi
        syscall
        retn

  hello_world           db 'Hello world!', 10, 0
    


Last edited by macomics on 12 Apr 2024, 18:52; edited 1 time in total
revolution 12 Apr 2024, 18:48
macomics wrote:
I don't know about you, but I see a significant difference.

OP used the shell to launch the task repeatedly, and the task was very short, so the overhead of launching and terminating will be the major bottleneck there. Add to that the normal randomness of other processes interrupting everything and you get useless variable results.
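
A rough way to put numbers on this is to divide the "real" times posted in this thread by the repetition counts. The sketch below assumes the OP's loop made about 20000 launches and that TEST_COUNT=20000000 is the number of calls in the in-process test; `per_unit_us` is a hypothetical helper name used only here.

```shell
#!/bin/bash
# Hypothetical helper: microseconds per unit of work, given a wall-clock
# total in seconds and the number of repetitions.
per_unit_us() {
    awk -v secs="$1" -v count="$2" 'BEGIN { printf "%.3f\n", secs / count * 1e6 }'
}

# "real" figures quoted from the posts above.
echo "shell loop, one write per character: $(per_unit_us 8.545 20000) us per launch"
echo "shell loop, single write:            $(per_unit_us 9.885 20000) us per launch"
echo "in-process print1 (per character):   $(per_unit_us 23.263 20000000) us per call"
echo "in-process print2 (single write):    $(per_unit_us 1.874 20000000) us per call"
```

Under those assumptions each launch costs several hundred microseconds, while printing the string inside a running process costs on the order of one microsecond or less, so the shell-loop figures mostly reflect process startup and exit.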
macomics 12 Apr 2024, 18:55
revolution wrote:
OP used the shell to launch the task repeatedly, and the task was very short, so the overhead of launching and terminating will be the major bottleneck there. Add to that the normal randomness of other processes interrupting everything and you get useless variable results.

That's why I wrote my own version for testing.

In his version, I am confused by the xor esp, esp instruction, for a start.
revolution 12 Apr 2024, 19:04
macomics wrote:
In his version, I am confused by the xor esp, esp instruction, for a start.

ESP is used as a normal register. If you don't need the stack then it is perfectly fine to do that (as long as you use an OS in protected mode, or you disable interrupts in real mode).

There is another thread I posted about using ESP that way. And nothing goes wrong, it all works just fine.
macomics 12 Apr 2024, 20:02
revolution wrote:
ESP is used as a normal register. If you don't need the stack then it is perfectly fine to do that (as long as you use an OS in protected mode, or you disable interrupts in real mode).

There is another thread I posted about using ESP that way. And nothing goes wrong, it all works just fine.

That's not the point. The issue is compatibility with the surrounding program. Macros are good, but they must account for the many contexts in which the code may be used. Besides xor esp, esp, I am also confused by the use of @@ labels instead of named local labels.
Anthony S. 12 Apr 2024, 20:23
revolution wrote:
The timings will be meaningless because of the overhead of many other things in the system that dwarf the thing you want to measure.

Better to start the process only once and run the loop in the process itself.


Ah, I overlooked the shell acting as a bottleneck because of the loop I implemented. Thank you for pointing that out!

(btw, I'm still learning assembly fundamentals, so yeah, the code is a mess)

Anthony S. 12 Apr 2024, 20:31
macomics wrote:
revolution wrote:
ESP is used as a normal register. If you don't need the stack then it is perfectly fine to do that (as long as you use an OS in protected mode, or you disable interrupts in real mode).

There is another thread I posted about using ESP that way. And nothing goes wrong, it all works just fine.

That's not the point. The issue is compatibility with the surrounding program. Macros are good, but they must account for the many contexts in which the code may be used. Besides xor esp, esp, I am also confused by the use of @@ labels instead of named local labels.


Thanks for the feedback on my design choices. I mainly just messed around and tested stuff out.

I checked out your test code and picked up a few tricks I hadn’t thought of before. Really helpful, thanks!
