flat assembler
Message board for the users of flat assembler.

lodsb or mov?

thecf 03 Oct 2007, 14:03
I don't understand why lodsb exists when you can do the following:

Code:
mov  esi,_string
mov  al,[esi+0]    


Or is this way faster, and that's why it exists?

Code:
mov  esi,_string
lodsb    


Let me know. Cheers! :?

MazeGen 03 Oct 2007, 14:12
It is because of the CISC nature of the early x86 instruction set. In short, there are two groups of instructions, general and specialized: MOV versus LODS, JNZ versus LOOP, MOVSX versus CWDE.

BTW, those two blocks of code don't do the same thing. LODS additionally increments or decrements ESI, depending on the direction flag.
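
To make the difference concrete, here is a minimal Python model of the two snippets. It is a sketch, not fasm code: memory is modelled as a bytes object, ESI as an index into it, and the function names `mov_al_from_esi` and `lodsb` are made up for this illustration.

```python
# Toy model of the two snippets; memory is a bytes object and ESI an index.
# Function names are made up for this illustration.

def mov_al_from_esi(memory, esi):
    """mov al,[esi]: load one byte, ESI is left unchanged."""
    return memory[esi], esi

def lodsb(memory, esi, df=0):
    """lodsb: load one byte, then step ESI by +1 (DF=0) or -1 (DF=1)."""
    al = memory[esi]
    esi = (esi + (1 if df == 0 else -1)) & 0xFFFFFFFF  # 32-bit wraparound
    return al, esi

string = b"Hi!"
print(mov_al_from_esi(string, 0))   # AL = ord('H'), ESI still 0
print(lodsb(string, 0))             # AL = ord('H'), ESI advanced to 1
print(lodsb(string, 2, df=1))       # AL = ord('!'), ESI stepped back to 1
```

So after the mov version a second load reads the same byte again, while after lodsb the next load reads the following (or preceding) byte.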

xspeed 03 Oct 2007, 17:29
lodsb, especially if you run it with a repXX prefix, will tend to run 5-10 times faster than mov.

MazeGen 03 Oct 2007, 17:31
REP LODSB? Interesting instruction. Very useful! :lol:

Feryno 04 Oct 2007, 08:42
A little off topic, but I see a use for REP LODSB, e.g. in self-modifying protected code:
1. The code sets a hardware breakpoint at the beginning of the memory to be accessed by rep lodsb.
2. The debug exception handler decrypts the byte that caused the exception and increments the debug register in use (DR0-DR3), so the debug exception keeps occurring until rep lodsb ends.

code skeleton:
; code section is readable + WRITEABLE

- set exception handler
- set debug registers in the thread context so DR0 or DR1 or DR2 or DR3 points to encrypted_start and DR7 is set to trigger on memory read/write
lea esi,[encrypted_start]
cld
mov ecx,encrypted_size
rep lodsb

encrypted_start:
; some encrypted code here
; end of encrypted code
encrypted_size = $ - encrypted_start


exception01_handler:
- get DR0 or DR1 or DR2 or DR3 set before (from ThreadContext)
- decrypt byte at that address
- increment DR0 or DR1 or DR2 or DR3
- write incremented debug register back to the thread context
; end of exception handler


Don't think I'm crazy... I'm just now thinking about this kind of code protection. Thank you for the tip. I used a similar rep scasd in my recent demo, but rep lodsd looks even crazier!
A thing that looks useless may give great pleasure to someone else...
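
The skeleton above can be modelled in a few lines of Python. This is only a toy simulation, not fasm code and not Feryno's implementation: the XOR "cipher" and its `KEY`, the `ctx` dictionary standing in for the thread context, and the function names are all assumptions made for the illustration. The handler runs after each read, so the value left in AL does not matter; the point is that the memory ends up decrypted in place.

```python
# Toy simulation of the breakpoint-driven decryption idea.
# Assumptions (not from the post): XOR with a fixed KEY as the "cipher",
# and a single debug register modelled as a plain integer in ctx.

KEY = 0x5A  # hypothetical key

def exception_handler(memory, ctx):
    """Decrypt the byte the breakpoint points at, then move the breakpoint on."""
    memory[ctx["dr0"]] ^= KEY
    ctx["dr0"] += 1

def rep_lodsb(memory, esi, ecx, ctx):
    """Model rep lodsb with DF=0; a read at the DR0 address raises the 'exception'."""
    al = 0
    while ecx:
        al = memory[esi]            # the read that trips the breakpoint
        if esi == ctx["dr0"]:
            exception_handler(memory, ctx)
        esi += 1
        ecx -= 1
    return al, esi

plain = bytes([0x90, 0x90, 0xC3])              # stand-in for real code bytes
memory = bytearray(b ^ KEY for b in plain)     # "encrypted" region
ctx = {"dr0": 0}                               # breakpoint at encrypted_start
rep_lodsb(memory, 0, len(memory), ctx)
print(bytes(memory) == plain)                  # region is now decrypted in place
```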

16bitPM 29 Mar 2012, 09:58
It's useful for loading code into the cache when timing is crucial, EVEN on cached 286 systems ;)

LostCoder 29 Mar 2012, 11:38
Probably because of size. In the good old 16-bit days storage media were small, and processors did not have advanced instruction caches, etc.
Programs were therefore only as fast as they were short, so a lot of attention was paid to code size. Check for yourself:
Code:
; code with lodsb (32-bit code, use32)
mov esi,_string  ; 5 bytes
cld              ; 1 byte
lodsb            ; 1 byte
                 ; 7 bytes total

; same thing for old-school 16-bit
mov si,_string   ; 3 bytes
cld              ; 1 byte
lodsb            ; 1 byte
                 ; 5 bytes total

; emulation (32-bit code, use32)
mov esi,_string  ; 5 bytes
; emulate cld
xor edx,edx      ; 2 bytes ; use edx as "direction flag", use 0 or 1
; emulate lodsb
shl edx,1        ; 2 bytes ; convert direction flag 0,1 to -1,1
dec edx          ; 1 byte
mov al,[esi]     ; 2 bytes
sub esi,edx      ; 2 bytes ; flag 0 -> esi+1, flag 1 -> esi-1, like DF
                 ; 14 bytes total

; same thing for old-school 16-bit
mov si,_string   ; 3 bytes
; emulate cld
xor dx,dx        ; 2 bytes ; use dx as "direction flag"
; emulate lodsb
shl dx,1         ; 2 bytes ; convert direction flag to usable delta
dec dx           ; 1 byte
mov al,[si]      ; 2 bytes
sub si,dx        ; 2 bytes ; flag 0 -> si+1, flag 1 -> si-1
                 ; 12 bytes total
Also, the code with lodsb has only 3 instructions compared with the others, so I assume it should also run a little faster today.
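
A quick Python check of the flag-to-delta arithmetic used in the emulation (the helper names are illustrative only, and this models the arithmetic, not the machine code): `shl`/`dec` turns a flag of 0 into -1 and a flag of 1 into +1, so the delta has to be subtracted from the pointer for flag 0 to behave like DF = 0 after cld.

```python
# Check of the flag-to-delta arithmetic; helper names are illustrative only.

def delta_from_flag(flag):
    """shl reg,1 / dec reg: maps 0 -> -1 and 1 -> +1."""
    return (flag << 1) - 1

def lodsb_step(esi, df):
    """Real lodsb: ESI + 1 when DF = 0, ESI - 1 when DF = 1."""
    return esi + 1 if df == 0 else esi - 1

def emulated_step(esi, flag):
    """Subtract the delta so that flag 0 behaves like DF = 0 (after cld)."""
    return esi - delta_from_flag(flag)

for flag in (0, 1):
    # prints the flag, its delta, and whether the emulated step matches lodsb
    print(flag, delta_from_flag(flag), emulated_step(100, flag) == lodsb_step(100, flag))
```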

Copyright © 1999-2025, Tomasz Grysztar.