flat assembler
Message board for the users of flat assembler.

Writing a disassembler?
FlierMate11



Joined: 13 Oct 2022
Posts: 94
FlierMate11 17 Feb 2023, 20:18
Is Tomasz, as the author of FASM, interested in making a disassembler? (PS: not a debugger)

Can we extract certain files from the FASM source code as a "common library" for a disassembler project?
I mean, decoding is the reverse of encoding.

Referring to the Intel manuals will be a lot of work, but I don't mind if it is worth it.

Sorry if this sounds like a naive idea.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8349
Location: Kraków, Poland
Tomasz Grysztar 17 Feb 2023, 20:24
There have been a couple of disassemblers posted on this board; one is part of the FDBG project: https://board.flatassembler.net/topic.php?t=9689
FlierMate11 17 Feb 2023, 21:24
Thanks. FDBG seems like a powerful project; the author (Feryno) knows every bit of assembly language.

But as @revolution discovered the other day, FDBG (for Windows), or rather its disassembler FDISASM, supports 64-bit code only. Even after I modified it to process a 32-bit EXE, the disassembled code is not correct. Here is the original source, followed by what FDISASM produces:

Code:
mov     eax, 0x80000002
cpuid
mov     dword [_name], eax
mov     dword [_name + 4], ebx
mov     dword [_name + 8], ecx
mov     dword [_name + 12], edx
mov     eax, 0x80000003
cpuid
mov     dword [_name + 16], eax
mov     dword [_name + 20], ebx
mov     dword [_name + 24], ecx
mov     dword [_name + 28], edx
mov     eax, 0x80000004
cpuid
mov     dword [_name + 32], eax
mov     dword [_name + 36], ebx
mov     dword [_name + 40], ecx
mov     dword [_name + 44], edx   
    


Code:
mov eax,80000002
cpuid
mov [20111D890040200D],eax
add [rcx+4020150D],cl
add [rcx+40201915],cl
add [rax-7FFFFFFD],bh
cpuid
mov [20211D890040201D],eax
add [rcx+4020250D],cl
add [rcx+40202915],cl
add [rax-7FFFFFFC],bh
cpuid
mov [20311D890040202D],eax
add [rcx+4020350D],cl
add [rcx+40203915],cl
add [rdx+00],ch
    


It is a pity that it is not correct, and it is not a trivial issue to fix given the complexity of the code.
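One way to see why a 64-bit-only decoder derails on this code: opcode A3 (MOV moffs, eAX) is followed by a 4-byte address in 32-bit mode but an 8-byte address in 64-bit mode, so the decoder swallows the start of the next instruction. The Python sketch below is only an illustration; the function name is hypothetical, and the byte string is reconstructed from the FDISASM output above rather than taken from the actual EXE.

```python
def decode_mov_moffs_eax(code: bytes, bits: int):
    """Decode opcode A3 (MOV moffs, eAX); return (instruction length, address).

    Assumes no address-size prefix, so the offset field is 4 bytes wide
    in 32-bit mode and 8 bytes wide in 64-bit mode.
    """
    if code[0] != 0xA3:
        raise ValueError("not opcode A3")
    width = {32: 4, 64: 8}[bits]
    address = int.from_bytes(code[1:1 + width], "little")
    return 1 + width, address


# Assumption: bytes reconstructed from the FDISASM listing above.
# A3 0D 20 40 00 is the first store; 89 1D 11 20 begins the next one.
code = bytes.fromhex("a3 0d 20 40 00 89 1d 11 20")

for bits in (32, 64):
    length, address = decode_mov_moffs_eax(code, bits)
    print(f"{bits}-bit: length {length}, mov [{address:X}],eax")
```

In 64-bit mode the address comes out as 20111D890040200D, matching the wrong output above, and every following instruction is then decoded from a shifted position.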
Ali.Z



Joined: 08 Jan 2018
Posts: 712
Ali.Z 18 Feb 2023, 03:28
Years ago I made a disassembler and an assembler for the 8086 (it never got completed, though). The first approach was to hard-code everything, and it was very easy, especially for a small number of opcodes.

The second one was more dynamic, like writing a VM, but once it got more complex it was discarded. (Don't quote me on this; it was my first attempt and I never attempted it again.)

But the point here is that if you have ever written a VM, it might be easier for you than hardcoding things like I did.

P.S. You may need to refer to the Intel SDM. I am assuming you cannot encode and decode instructions by hand, but once you are able to, it will be a lot easier to wrap your head around the encoding and decoding of the different modes (16, 32, 64).

_________________
Asm For Wise Humans
I



Joined: 19 May 2022
Posts: 58
I 18 Feb 2023, 06:29
Try Cutter/Rizin, which are works in progress and not perfect, but pretty good.
https://github.com/rizinorg/cutter

Multi-OS support, multiple architectures.



32-bit disassembly
Code:
  ;-- section..code:
entry0 ();
0x00401000 b8 02 00 00 80                          mov eax, 0x80000002                    ; [00] -r-x section size 4096 named .code
0x00401005 0f a2                                   cpuid
0x00401007 a3 00 20 40 00                          mov dword [section..data], eax         ; [0x402000:4]=0
0x0040100c 89 1d 04 20 40 00                       mov dword [0x402004], ebx              ; [0x402004:4]=0
0x00401012 89 0d 08 20 40 00                       mov dword [0x402008], ecx              ; [0x402008:4]=0
0x00401018 89 15 0c 20 40 00                       mov dword [0x40200c], edx              ; [0x40200c:4]=0
0x0040101e b8 03 00 00 80                          mov eax, 0x80000003
0x00401023 0f a2                                   cpuid
0x00401025 a3 10 20 40 00                          mov dword [0x402010], eax              ; [0x402010:4]=0
0x0040102a 89 1d 14 20 40 00                       mov dword [0x402014], ebx              ; [0x402014:4]=0
0x00401030 89 0d 18 20 40 00                       mov dword [0x402018], ecx              ; [0x402018:4]=0
0x00401036 89 15 1c 20 40 00                       mov dword [0x40201c], edx              ; [0x40201c:4]=0
0x0040103c b8 04 00 00 80                          mov eax, 0x80000004
0x00401041 0f a2                                   cpuid
0x00401043 a3 20 20 40 00                          mov dword [0x402020], eax              ; [0x402020:4]=0
0x00401048 89 1d 24 20 40 00                       mov dword [0x402024], ebx              ; [0x402024:4]=0
0x0040104e 89 0d 28 20 40 00                       mov dword [0x402028], ecx              ; [0x402028:4]=0
0x00401054 89 15 2c 20 40 00                       mov dword [0x40202c], edx              ; [0x40202c:4]=0
0x0040105a c3                                      ret    



64-bit disassembly
Code:
  ;-- section..code:
entry0 (int64_t arg1, int64_t arg2);
; arg int64_t arg1 @ rcx
; arg int64_t arg2 @ rdx
0x00401000 b8 02 00 00 80                          mov eax, 0x80000002                    ; [00] -r-x section size 4096 named .code
0x00401005 0f a2                                   cpuid
0x00401007 89 05 f3 0f 00 00                       mov dword [rip + 0xff3], eax           ; section..data
                                                                                          ; [0x402000:4]=0
0x0040100d 89 1d f1 0f 00 00                       mov dword [rip + 0xff1], ebx           ; [0x402004:4]=0
0x00401013 89 0d ef 0f 00 00                       mov dword [rip + 0xfef], ecx           ; [0x402008:4]=0 ; arg1
0x00401019 89 15 ed 0f 00 00                       mov dword [rip + 0xfed], edx           ; [0x40200c:4]=0 ; arg2
0x0040101f b8 03 00 00 80                          mov eax, 0x80000003
0x00401024 0f a2                                   cpuid
0x00401026 89 05 e4 0f 00 00                       mov dword [rip + 0xfe4], eax           ; [0x402010:4]=0
0x0040102c 89 1d e2 0f 00 00                       mov dword [rip + 0xfe2], ebx           ; [0x402014:4]=0
0x00401032 89 0d e0 0f 00 00                       mov dword [rip + 0xfe0], ecx           ; [0x402018:4]=0 ; arg1
0x00401038 89 15 de 0f 00 00                       mov dword [rip + 0xfde], edx           ; [0x40201c:4]=0 ; arg2
0x0040103e b8 04 00 00 80                          mov eax, 0x80000004
0x00401043 0f a2                                   cpuid
0x00401045 89 05 d5 0f 00 00                       mov dword [rip + 0xfd5], eax           ; [0x402020:4]=0
0x0040104b 89 1d d3 0f 00 00                       mov dword [rip + 0xfd3], ebx           ; [0x402024:4]=0
0x00401051 89 0d d1 0f 00 00                       mov dword [rip + 0xfd1], ecx           ; [0x402028:4]=0 ; arg1
0x00401057 89 15 cf 0f 00 00                       mov dword [rip + 0xfcf], edx           ; [0x40202c:4]=0 ; arg2
0x0040105d c3                                      ret    
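The `[rip + 0x...]` operands in the 64-bit listing resolve to the same data addresses as the absolute operands in the 32-bit one. As a small sketch (the function name is hypothetical), the target is the address of the next instruction plus the signed 32-bit displacement:

```python
def rip_relative_target(insn_addr: int, insn_len: int, disp32: int) -> int:
    """Effective address of a RIP-relative operand in 64-bit mode.

    disp32 is the raw 32-bit displacement field; it is sign-extended here.
    RIP already points at the next instruction when the address is formed.
    """
    if disp32 >= 0x80000000:
        disp32 -= 1 << 32
    return (insn_addr + insn_len + disp32) & 0xFFFFFFFFFFFFFFFF


# First store in the 64-bit listing: 0x00401007, 6 bytes long, disp 0xff3.
print(hex(rip_relative_target(0x00401007, 6, 0xFF3)))
```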
FlierMate11 18 Feb 2023, 11:01
Thank you for all of your replies.

@Ali.Z

I admire someone like you who wrote a disassembler from scratch. I too plan to start with the simple i8086 instruction set, then move on to the i386 instruction set.
Do we need a separate disassembler for each of 16-, 32- and 64-bit?
When you say VM, I think you mean an emulator? The FDBG project already has fdisasm64_data.inc, with hex values corresponding to the respective CPU opcodes, so maybe I will start from there.
Covering instruction set extensions (like 3DNow!, MMX, AVX) is another challenge.

@I

The 32-bit and 64-bit disassembly look good. You may not remember, but you introduced Cutter to me before, in the Linux GUI "Hello World" window thread. A cyberpal of mine said "cutter is actually GUI version of Rizin, which is a fork from Radare (forked due to some disagreements between the developers)". Is that true? But using a disassembler is one thing; writing one will be a totally different experience.
I 22 Feb 2023, 01:58
Yes, I didn't remember, nor did I after your reminder. Such are the joys of age :(

Cutter is the official GUI for Rizin. I don't know about disagreements, or maybe I don't remember?

Good luck with the project.
FlierMate11 22 Feb 2023, 10:20
Thanks @I, and don't worry, it is not a problem with your memory.

I found https://www.sandpile.org/x86/opc_2.htm is wrong in reference to
Code:
JO JNO JB JNB JZ JNZ JBE JNBE    JS JNS JP JNP JL JNL JLE JNLE    


It should be 0x70 to 0x7F, not starting from 0x80 as sandpile.org documents.

There is no way to contact the sandpile.org webmaster to report the error. Looks like I must be careful when consulting opcode references. The Intel SDM, yes, but FDISASM by Feryno is already a good reference, so I will make the x86 version with reference to his/her fdisasm64_data.inc file.


Attachment: Screenshot 2023-02-22 173055.png (Correct opcode reference in FDISASM by Feryno)
Attachment: Screenshot 2023-02-22 173028.png (Incorrect opcode reference by sandpile.org)

Tomasz Grysztar 22 Feb 2023, 10:47
FlierMate11 wrote:
I found https://www.sandpile.org/x86/opc_2.htm is wrong in reference to
Code:
JO JNO JB JNB JZ JNZ JBE JNBE    JS JNS JP JNP JL JNL JLE JNLE

It should be 0x70 to 0x7F, not starting from 0x80 as sandpile.org documents.

Pay attention to the captions: the sheet you linked is for two-byte opcodes, not single-byte ones. The first byte is always 0Fh, and the second one is as listed in the table. Starting with the 386 there are additional opcodes for conditional jumps, 0F 8x, which allow them to be encoded as near jumps (as opposed to the short ones, which have a single-byte opcode 7x).
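To make the two encodings concrete, here is a rough Python sketch that decodes both the short form (7x with an 8-bit displacement) and the near form (0F 8x with a 32-bit displacement). The function name is hypothetical, and the sketch assumes 32-bit mode with no prefixes.

```python
# Condition suffixes in opcode order, as in the list quoted above.
CONDITIONS = ["o", "no", "b", "nb", "z", "nz", "be", "nbe",
              "s", "ns", "p", "np", "l", "nl", "le", "nle"]


def decode_jcc(code: bytes, addr: int):
    """Return (mnemonic, target, length) for a conditional jump, else None.

    Assumes 32-bit mode without prefixes: rel8 for 7x, rel32 for 0F 8x.
    """
    if 0x70 <= code[0] <= 0x7F:                      # short jump
        cc, length = code[0] & 0x0F, 2
        rel = int.from_bytes(code[1:2], "little", signed=True)
    elif code[0] == 0x0F and len(code) > 1 and 0x80 <= code[1] <= 0x8F:
        cc, length = code[1] & 0x0F, 6               # near jump
        rel = int.from_bytes(code[2:6], "little", signed=True)
    else:
        return None
    # The displacement is relative to the end of the instruction.
    target = (addr + length + rel) & 0xFFFFFFFF
    return "j" + CONDITIONS[cc], target, length


print(decode_jcc(bytes.fromhex("74 05"), 0x401000))
print(decode_jcc(bytes.fromhex("0f 84 10 00 00 00"), 0x401000))
```

The low four bits select the same condition in both forms, which is why the two tables list the mnemonics in the same order.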
FlierMate11 22 Feb 2023, 10:58
Oops, silly me. Thank you for the correction; I learned from my mistake.

Can I consult you or others during the development of my x86 disassembler? Thank you in advance.
Ali.Z 22 Feb 2023, 11:04
FlierMate11 wrote:
Looks like I must be careful when consulting opcode references.

You should be, even when reading Intel's SDM. I have read dozens of SDM revisions and hunted down a lot of misleading or incorrect documentation, especially in the older ones; a disassembler can never be perfect.
Just consider this: when did Intel or AMD ever mention that the address-size prefix can be used to alter jumps ...
They either leave things undocumented, or fix them with ugly, misleading wording, or even fix them in hard-to-get papers...

No single disassembler is known to be complete.

When I said VM, I meant writing an engine that processes bytecode, to make it more dynamic, but I think hardcoding things is safer, as VMs are error-prone to a very high degree.
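One possible reading of the "more dynamic" approach is a table-driven decoder, where the opcodes live in data and a single loop interprets the table. The Python sketch below is an assumption about what such an engine could look like, not a description of the program mentioned above; the table layout and names are made up for illustration, and it covers only a handful of one-byte 32-bit opcodes.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Hypothetical table: opcode byte -> (mnemonic, operand kind, register index).
OPCODES = {0x90: ("nop", None, None), 0xC3: ("ret", None, None)}
for r in range(8):
    OPCODES[0x50 + r] = ("push", "reg", r)
    OPCODES[0x58 + r] = ("pop", "reg", r)
    OPCODES[0xB8 + r] = ("mov", "reg_imm32", r)


def disassemble(code: bytes):
    """Decode the few opcodes in the table; unknown bytes are emitted as db."""
    pos, lines = 0, []
    while pos < len(code):
        entry = OPCODES.get(code[pos])
        if entry is None:
            lines.append(f"db {code[pos]:#04x}")
            pos += 1
            continue
        mnemonic, kind, reg = entry
        pos += 1
        if kind is None:
            lines.append(mnemonic)
        elif kind == "reg":
            lines.append(f"{mnemonic} {REG32[reg]}")
        else:  # reg_imm32: a 4-byte little-endian immediate follows
            imm = int.from_bytes(code[pos:pos + 4], "little")
            pos += 4
            lines.append(f"{mnemonic} {REG32[reg]},{imm:#x}")
    return lines


print(disassemble(bytes.fromhex("b8 02 00 00 80 c3")))
```

Adding an instruction then means adding a table row rather than new code, at the cost of a more abstract engine that is harder to get right, which is the trade-off described above.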

Tomasz Grysztar 22 Feb 2023, 11:48
FlierMate11 wrote:
Can I consult you or others during the development of my x86 disassembler? Thank you in advance.
Yes, of course. Also: have you seen my old live sessions about manually decoding x86 instructions? They could be helpful in the beginning.
FlierMate11 28 Feb 2023, 15:00
Thanks, Tomasz, for the YouTube link. I have been watching it on an ad-hoc basis (haven't finished it yet).
It needs high resolution (1080p?), or else I cannot clearly see the tiny text on screen.
I am slowly understanding why octal digits are useful for analyzing the operands: the ModR/M byte is Mod (aa) + Register (bbb) + R/M (ccc) = aa bbb ccc, so each field lines up with one octal digit.
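A quick Python sketch of that split (the helper name is hypothetical): written in octal, the three digits of a ModR/M byte are exactly the Mod, Reg and R/M fields.

```python
def split_modrm(byte: int):
    """Split a ModR/M byte into its (mod, reg, rm) fields: aa bbb ccc."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


for value in (0xD8, 0xC3):
    mod, reg, rm = split_modrm(value)
    # oct() shows the same three fields as three octal digits.
    print(f"{value:#04x} = {oct(value)} -> mod={mod} reg={reg} rm={rm}")
```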

Today I started researching, beginning with the ADD opcode, but funnily the one-byte ADD opcode table doesn't seem to have a register-to-register form, or ADD Gv, Gv(?). When I assemble it, it yields
Code:
66 01 d8                add    ax,bx    


It has a 0x66 prefix, so I guessed it is a 2-byte opcode, but looking at the 2-byte opcode page, I can't find ADD:
https://www.sandpile.org/x86/opc_2.htm

Maybe this is another silly question, but where can I find the ADD register-to-register opcode reference?

Thank you, Tomasz.


Attachment: add.png (ADD opcode reference)

revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20298
Location: In your JS exploiting you and your system
revolution 28 Feb 2023, 15:10
0x66 is the opsiz prefix to give you 16-bit.

0x01 is the opcode for ADD

0xd8 is the mod-reg-r/m

mod=11 --> reg-to-reg
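The same breakdown as a small Python sketch, assuming 32-bit mode and handling only the register-to-register case (mod=11); the function name is hypothetical:

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def decode_add_ev_gv(code: bytes) -> str:
    """Decode ADD Ev, Gv (opcode 01), optionally preceded by the 66 prefix."""
    pos = 0
    regs = REG32                  # default operand size in 32-bit mode
    if code[pos] == 0x66:         # operand-size prefix selects 16-bit registers
        regs = REG16
        pos += 1
    if code[pos] != 0x01:
        raise ValueError("expected opcode 01 (ADD Ev, Gv)")
    modrm = code[pos + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise NotImplementedError("memory operands are left out of this sketch")
    # Ev (the r/m field) is the destination, Gv (the reg field) is the source.
    return f"add {regs[rm]},{regs[reg]}"


print(decode_add_ev_gv(bytes.fromhex("66 01 d8")))
print(decode_add_ev_gv(bytes.fromhex("01 d8")))
```

The prefix only changes which register table is used; the opcode and ModR/M byte are decoded the same way with or without it.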
Tomasz Grysztar 28 Feb 2023, 15:15
Yes, the prefixes are not considered part of the opcode. Perhaps my article in the second issue (see page 7) of the Paged Out! magazine might also help organize concepts.
FlierMate11 28 Feb 2023, 15:20
revolution wrote:
0x66 is the opsiz prefix to give you 16-bit.

0x01 is the opcode for ADD

0xd8 is the mod-reg-r/m

mod=11 --> reg-to-reg


Thanks, I didn't really know the meaning of "ADD Ev, Gv", as I thought "Ev" must be a memory address, until I saw your reply.

I can see 0x66 is documented as "OPSIZE: (80386+)", and there are registers on both sides if Mod = 11.


Attachment: mod11.png

FlierMate11 28 Feb 2023, 15:21
Tomasz Grysztar wrote:
Yes, the prefixes are not considered part of the opcode. Perhaps my article in the second issue (see page 7) of the Paged Out! magazine might also help organize concepts.


Thanks, I read it before but did not really understand it. Hopefully I can grasp its meaning now.


Quote:
As an epitome of CISC architectures, x86 has instructions of variable length, often complex to decode. Moreover, its instruction set has grown tremendously and nowadays it is infeasible to get by without a good disassembler. However, this quick guide aims to give at least a general idea of what is what.


True... I will just stick to a 32-bit disassembler without support for instruction set extensions. :'(


Last edited by FlierMate11 on 28 Feb 2023, 15:49; edited 2 times in total
FlierMate11 28 Feb 2023, 15:42
Also, I find it confusing that there is another set of opcodes for adding an immediate value to a register:

Code:
83 c3 1e                add    ebx,0x1e    


It is not in the 0x00 to 0x05 range.

Looking it up in the docs, it says:
Quote:
group #1*I64
Eb,Ib

...and see the attached screenshot.
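For what it is worth, the "group #1" entries work differently from the 0x00 to 0x05 opcodes: the opcode byte alone does not name the operation. The Reg field of the ModR/M byte selects it instead, and the R/M field names the destination. Below is a rough Python sketch for opcode 83 (a sign-extended 8-bit immediate applied to a 32-bit register), assuming 32-bit mode and register operands only; the function name is hypothetical.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Group 1 operations, indexed by the Reg field (/0 .. /7) of the ModR/M byte.
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def decode_group1_83(code: bytes) -> str:
    """Decode opcode 83 /r ib with a register destination (mod=11)."""
    if code[0] != 0x83:
        raise ValueError("expected opcode 83")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise NotImplementedError("memory operands are left out of this sketch")
    imm = int.from_bytes(code[2:3], "little", signed=True)  # sign-extended imm8
    return f"{GROUP1[reg]} {REG32[rm]},{imm:#x}"


print(decode_group1_83(bytes.fromhex("83 c3 1e")))
```

Here 0xC3 splits into mod=11, reg=000, rm=011, so the operation is ADD (/0) and the destination is EBX.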


Attachment: add8x.png (Additional ADD opcode reference)

Ali.Z 28 Feb 2023, 16:09
Look here:
https://board.flatassembler.net/topic.php?p=207477#207477&sid=38818506f18d025e9182e66c40899ea6

Do not confuse yourself by looking here and there. The only thing you need is the ability to manually encode and decode instructions; then you will be able to read most of Intel's SDM. (You must be a smart reader, as Intel tends to scatter every important detail across the SDM.)

FlierMate11 12 Mar 2023, 10:58
Thanks @Ali.Z, nice infographic, but I notice that it is mainly for the i8086.

Another question.

I found that the following reg-to-reg additions can also be done....
Code:
01 d8                   add    eax,ebx
03 d8                   add    ebx,eax
    


...as below:
Code:
01 d8                   add    eax,ebx
01 c3                   add    ebx,eax
    


So how do I decide between the 01 and 03 opcodes? With 01 alone, I just change the Mod+Reg+R/M value; or, using 01 and 03 interchangeably, I can stick to 0xd8.
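For register-to-register operands both forms are valid encodings of the same instruction. Bit 1 of the opcode is a direction bit: 01 is ADD r/m, reg (the R/M field is the destination) and 03 is ADD reg, r/m (the Reg field is the destination). An assembler simply picks one of them, while a disassembler has to accept both. A minimal Python sketch, assuming 32-bit mode and mod=11 only, with a hypothetical function name:

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_add_reg_reg(code: bytes) -> str:
    """Decode opcodes 01 and 03 with two register operands (mod=11)."""
    opcode, modrm = code[0], code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise NotImplementedError("memory operands are left out of this sketch")
    if opcode == 0x01:        # ADD r/m32, r32: destination is the R/M field
        dst, src = rm, reg
    elif opcode == 0x03:      # ADD r32, r/m32: destination is the Reg field
        dst, src = reg, rm
    else:
        raise ValueError("expected opcode 01 or 03")
    return f"add {REG32[dst]},{REG32[src]}"


for encoding in ("01 d8", "03 d8", "01 c3"):
    print(encoding, "->", decode_add_reg_reg(bytes.fromhex(encoding)))
```

The distinction only matters once a memory operand is involved: 01 then adds a register to memory, and 03 adds memory to a register.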

Copyright © 1999-2024, Tomasz Grysztar.