flat assembler
Message board for the users of flat assembler.

a more basic ELF64 example?

DrenThales 12 Sep 2006, 21:16
Here's the ELF64 example that comes with the Linux version of the FASM compiler:
Code:
format ELF64 executable

segment readable executable
  entry $
    mov edx,msg_size       ; CPU zero extends 32-bit operation to 64-bit
                           ; we can use less bytes than in case mov rdx,...
    lea rsi,[msg]
    mov edi,1              ; STDOUT
    mov eax,1              ; sys_write
    syscall

    xor edi,edi            ; exit code 0
    mov eax,60             ; sys_exit
    syscall

segment readable writeable
  msg db 'Hello 64-bit world!',0xA
  msg_size = $-msg
    


However, I noticed that, when the following addition/modification is made to the code:
Code:
  ...
  msg db 'Hello 64-bit world!',0xA
  db '1','2','3'
  msg_size = $-msg
  ...
    


"123" is output, albeit on the same line as the next console/terminal prompt.

So...my questions are:

- does anyone have a more minimal example of ELF64? (I have no idea why "msg_size" and the associated code are even being used, if doing output without them seems to work)

- why are "eax" and similar 32-bit registers being used if this is supposed to be a 64-bit application?

- as specified by the ELF format (from what I've gathered from searching the internet), the program is split into separate areas for executable code, data, and some sort of reserved memory mechanism (.bss?), right? So how many segments can we declare?

- also, how do "sections" and "segments" compare, and, likewise, how do "ELF objects" and "ELF executables" compare? (from a noob's perspective, i.e. mine, the ELF object examples and the ELF executable examples appear, from a source code perspective, to serve the same basic function, just with different syntax, though the ELF object files, from what I can discern, do not appear readily executable)

In short, how do I work with this environment in such a way as to best code with the following program flow as it were:
Code:
  ...
  some code
  text outputting info about code execution and results
  some code
  text outputting info about code execution and results
  ; and so forth
  ...
    


so that I can actually learn FASM to begin with.

Any help would be appreciated.
Feryno 13 Sep 2006, 05:44
An AMD64 CPU in long mode (64-bit) zero-extends the result of an operation on a 32-bit GPR (general purpose register) into the whole 64-bit register. You can use this feature to reduce code size, because using a 64-bit register requires 1 extra byte for the REX opcode prefix. It is also easy to make a mistake with this feature: any instruction that writes a 32-bit register (for example add eax,...) zeroes the upper dword of RAX, while cmp eax,... only reads the register and compares just its low 32 bits.
If you want to put a value like 1 into rax, you can use mov eax,1
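
To see the zero extension in action, here is a minimal sketch (not from the fasm package; it assumes the same Linux x86-64 syscall convention as the example above, and the label name done is made up). It fills rax with all ones, writes only eax, and then checks the full 64-bit register. The exit code, which you can inspect with echo $? in the shell, should be 0 if the upper dword was cleared.
Code:
format ELF64 executable

segment readable executable
  entry $
    mov rax,-1             ; rax = 0xFFFFFFFFFFFFFFFF
    mov eax,1              ; writes the low dword; the upper dword is zeroed
    xor edi,edi            ; assume success: exit code 0
    cmp rax,1              ; compare all 64 bits
    je done
    mov edi,1              ; reached only if the upper dword had survived
  done:
    mov eax,60             ; sys_exit
    syscall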

xor edi,edi is an old assembler trick to put 0 into a register. ASM coders like it because the opcode is only 2 bytes instead of the 5 bytes of mov edi,0. Most asm coders like asm because they can reduce program size using their art and mind...
Another example of reducing code size is
or rax,rax ; other choices here: and rax,rax... test rax,rax
jz...
instead of cmp rax,0 / jz

Please use this to improve your code:
...
db '1','2','3', 0Ah
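
Put together with the original example, the whole program could look like the sketch below. msg_size matters because sys_write takes the byte count in rdx; since msg_size = $-msg is evaluated after the extra db line, it covers those bytes too, which is why "123" gets printed at all. The trailing 0Ah simply ends the second line so that the shell prompt starts on a fresh one.
Code:
format ELF64 executable

segment readable executable
  entry $
    mov edx,msg_size       ; number of bytes to write
    lea rsi,[msg]          ; address of the first byte
    mov edi,1              ; STDOUT
    mov eax,1              ; sys_write
    syscall

    xor edi,edi            ; exit code 0
    mov eax,60             ; sys_exit
    syscall

segment readable writeable
  msg db 'Hello 64-bit world!',0xA
  db '1','2','3',0Ah       ; still counted, because msg_size is defined below
  msg_size = $-msg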

To find more ELF64 examples, please try searching the flatassembler.net pages more carefully.
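
As for the "some code / text output / some code / text output" flow asked about above, one possible way to structure it is a small output routine that is called between the steps. This is only a sketch under the same Linux x86-64 assumptions as the example from the package; the names print, start, msg1 and msg2 are hypothetical and not part of fasm.
Code:
format ELF64 executable

segment readable executable
  entry start

  ; print: write rdx bytes starting at rsi to STDOUT
  ; changes rax and rdi; rcx and r11 are overwritten by syscall itself
  print:
    mov edi,1              ; STDOUT
    mov eax,1              ; sys_write
    syscall
    ret

  start:
    ; ... some code ...
    lea rsi,[msg1]
    mov edx,msg1_size
    call print

    ; ... some more code ...
    lea rsi,[msg2]
    mov edx,msg2_size
    call print

    xor edi,edi            ; exit code 0
    mov eax,60             ; sys_exit
    syscall

segment readable writeable
  msg1 db 'step 1 done',0xA
  msg1_size = $-msg1
  msg2 db 'step 2 done',0xA
  msg2_size = $-msg2

Each message keeps its own size constant, so the routine never has to scan for a terminator.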


Last edited by Feryno on 14 Sep 2006, 04:59; edited 1 time in total
DrenThales 13 Sep 2006, 18:23
ok, that makes sense, machine-code wise

Do you know of any opcode size listings for AMD64? Or how to retrieve the opcode size of a particular instruction?

Also, as to the size growth caused by the REX prefix, is that something inherent in the use of 64-bit registers, or is it just the price paid for legacy 32-bit support? (if they made a pure 64-bit processor, i.e. no 32-bit legacy, would it even need the REX prefix?)

In any case though, as to the ELF file format, I think that, instead, I'll just use flat binary, for the time being [ edit: ...or not..., how would I test execution with flat binary? ]

I assume the reason you used an uppercase 'A' instead of 'a' in your '0Ah' hex number is ease of reading? Is it typical for assembler programmers to prefer uppercase letters when reading hex numbers? (I suppose it helps keep the digits separate from the base suffix at the end, i.e. the 'h' and such, for one thing)
vid 13 Sep 2006, 19:51
Quote:
Do you know of any opcode size listings for AMD64?

see the AMD manuals... it depends on many things.. for example a simple address can take 1 to 6 bytes.. there can be several optional prefixes... don't try to find a system in it; the processor was designed to be efficient (and, unfortunately, backward-compatible), not to be "clean"

Quote:
Or how to retrieve the opcode size of a particular instruction?
again, not so simple. you have to disassemble every existing "class" of instruction

Quote:
Also, as to the size growth caused by the REX prefix, is that something inherent in the use of 64-bit registers, or is it just the price paid for legacy 32-bit support? (if they made a pure 64-bit processor, i.e. no 32-bit legacy, would it even need the REX prefix?)

if they made a pure 64-bit processor, they would lose backward compatibility and save the one byte for the REX prefix, but lose 4 bytes with every constant, because it would have to be a 64-bit constant instead of a 32-bit one.

Quote:
I assume the reason you used an uppercase 'A' instead of 'a' in your '0Ah' hex number is ease of reading? Is it typical for assembler programmers to prefer uppercase letters when reading hex numbers? (I suppose it helps keep the digits separate from the base suffix at the end, i.e. the 'h' and such, for one thing)

yes, i would say it's because of ease of reading, and it is a good standard among asmers.

for example, the space looks more evenly filled with uppercase letters in a dump:
0A 4B 4C FF 13 20 2D

than with lowercase, where a-f visually appear to be something other than 0-9, although they are all digits:
0a 4b 4c ff 13 20 2d

... imho
Feryno 14 Sep 2006, 05:12
For opcode size, you can avoid wasting time on the manuals: use biew or another disassembler, or a debugger.
Or use fdbg for Linux AMD64 (maybe the smallest toy to do it at the present time; look into the projects and ides section of the forum), which shows it to you after the command:
c
c means 'code'. Without any parameter it dumps code from RIP or from the end of its previous output; with parameters you can do something like
c rip+2A
c 40104c
The numbers there are hexadecimal; I didn't have time to do it better (with h at the end).

REX prefix is 1 extra byte only
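
Another option, assuming nothing more than fasm itself, is to let the assembler compute the size as the difference between two labels and print it at assembly time with the display directive. A minimal sketch follows; the label and constant names are made up for illustration, and the single-digit trick only works for sizes below 10.
Code:
format binary
use64

xor_start:
    xor edi,edi
xor_end:

mov_start:
    mov edi,0
mov_end:

; turn each size into one ASCII digit (valid only for sizes 0..9)
xor_digit = '0' + (xor_end - xor_start)
mov_digit = '0' + (mov_end - mov_start)

display 'xor edi,edi size in bytes: ', xor_digit, 10
display 'mov edi,0 size in bytes: ', mov_digit, 10

Assemble it with fasm and read the messages it prints; the output file itself can be discarded. For the two instructions here, the messages should report the 2 and 5 bytes mentioned earlier in the thread.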

Yes, every programmer uses their own style
0xA = 0xa = 0x0A = 0x0a = 0Ah = 0ah = 0AH = 0aH