flat assembler
Message board for the users of flat assembler.
Projects and Ideas > Writing a disassembler?
Tomasz Grysztar 17 Feb 2023, 20:24
There have been a couple of disassemblers posted on this board, one is a part of FDBG project: https://board.flatassembler.net/topic.php?t=9689
FlierMate11 17 Feb 2023, 21:24
Thanks, FDBG seems like a powerful project; the author (Feryno) knows every bit of assembly language.
But as @revolution discovered the other day, FDBG (for Windows), or rather its FDISASM disassembler, supports 64-bit code only. Even after I modified it to process a 32-bit EXE, the disassembled code is not correct:
Code:
mov eax, 0x80000002
cpuid
mov dword [_name], eax
mov dword [_name + 4], ebx
mov dword [_name + 8], ecx
mov dword [_name + 12], edx
mov eax, 0x80000003
cpuid
mov dword [_name + 16], eax
mov dword [_name + 20], ebx
mov dword [_name + 24], ecx
mov dword [_name + 28], edx
mov eax, 0x80000004
cpuid
mov dword [_name + 32], eax
mov dword [_name + 36], ebx
mov dword [_name + 40], ecx
mov dword [_name + 44], edx
Code:
mov eax,80000002
cpuid
mov [20111D890040200D],eax
add [rcx+4020150D],cl
add [rcx+40201915],cl
add [rax-7FFFFFFD],bh
cpuid
mov [20211D890040201D],eax
add [rcx+4020250D],cl
add [rcx+40202915],cl
add [rax-7FFFFFFC],bh
cpuid
mov [20311D890040202D],eax
add [rcx+4020350D],cl
add [rcx+40203915],cl
add [rdx+00],ch
It is a pity that it is not correct, and it is not trivial to fix given the complexity of the code.
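One likely reason the second listing goes wrong can be sketched in a few lines. Opcode A3 (mov [moffs], eax) is followed by a 4-byte absolute address in 32-bit mode but an 8-byte one in 64-bit mode, so a 64-bit decoder swallows the start of the next instruction. The byte stream below is an assumption, reconstructed from the FDISASM output above with _name taken to be at 0x40200D; the function name is hypothetical, and Python is used only for brevity.

```python
# Sketch: opcode A3 is "mov [moffs], eax". The absolute address that follows
# is 4 bytes wide in 32-bit mode and 8 bytes wide in 64-bit mode
# (the 0x67 address-size prefix is ignored here).

def decode_a3(code: bytes, bits: int) -> tuple[int, str]:
    """Return (instruction length, text) for an A3 instruction."""
    if code[0] != 0xA3:
        raise ValueError("not an A3 instruction")
    width = 8 if bits == 64 else 4
    addr = int.from_bytes(code[1:1 + width], "little")
    return 1 + width, f"mov [{addr:X}],eax"

# Assumed bytes: mov [_name],eax followed by mov [_name+4],ebx
stream = bytes.fromhex("a30d204000" "891d11204000")

print(decode_a3(stream, 32))  # (5, 'mov [40200D],eax')
print(decode_a3(stream, 64))  # (9, 'mov [20111D890040200D],eax')
```

After the 64-bit decode, the decoder resumes in the middle of the following instruction, which would explain the stray add lines in the output.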
Ali.Z 18 Feb 2023, 03:28
Years ago I made a disassembler and an assembler for the 8086 (it never got completed, though). The first approach was to hard-code everything, which was very easy, especially for a small number of opcodes.
The second one was more dynamic, like writing a VM, but once it got more complex it was discarded. (Don't quote me on this; it was my first attempt and I never attempted it again.) But the point here is that if you have ever written a VM, it might be easier for you than hard-coding things like I did.
P.S. You may need to refer to the Intel SDM. I am assuming you cannot encode and decode instructions by hand yet, but once you are able to, it will be a lot easier to wrap your head around the encoding and decoding of the different modes (16, 32, 64).
_________________
Asm For Wise Humans
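As an illustration of the hard-coded approach for a small opcode set, here is a minimal sketch: a fixed lookup table for a handful of operand-less one-byte 8086 opcodes. The table contents are standard encodings, but the names NO_OPERAND and disassemble are hypothetical, and this is not the code the post refers to.

```python
# Hard-coded lookup for a few operand-less one-byte 8086 opcodes.
NO_OPERAND = {
    0x90: "nop",
    0xC3: "ret",
    0xF4: "hlt",
    0xF8: "clc",
    0xF9: "stc",
    0xFA: "cli",
    0xFB: "sti",
    0xFC: "cld",
    0xFD: "std",
}

def disassemble(code: bytes) -> list[str]:
    """Translate each byte; unknown bytes are emitted as raw data."""
    out = []
    for byte in code:
        out.append(NO_OPERAND.get(byte, f"db 0x{byte:02x}"))
    return out

print(disassemble(bytes([0xFA, 0x90, 0xFB, 0xC3])))
```

This stops being enough as soon as an opcode takes operands, which is where prefixes and the ModR/M byte come in.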
I 18 Feb 2023, 06:29
Try Cutter/rizin; they are still a work in progress and not perfect, but pretty good.
https://github.com/rizinorg/cutter
Multi-OS support, multiple architectures.
32-bit disassembly
Code:
;-- section..code:
entry0 ();
0x00401000 b8 02 00 00 80 mov eax, 0x80000002 ; [00] -r-x section size 4096 named .code
0x00401005 0f a2 cpuid
0x00401007 a3 00 20 40 00 mov dword [section..data], eax ; [0x402000:4]=0
0x0040100c 89 1d 04 20 40 00 mov dword [0x402004], ebx ; [0x402004:4]=0
0x00401012 89 0d 08 20 40 00 mov dword [0x402008], ecx ; [0x402008:4]=0
0x00401018 89 15 0c 20 40 00 mov dword [0x40200c], edx ; [0x40200c:4]=0
0x0040101e b8 03 00 00 80 mov eax, 0x80000003
0x00401023 0f a2 cpuid
0x00401025 a3 10 20 40 00 mov dword [0x402010], eax ; [0x402010:4]=0
0x0040102a 89 1d 14 20 40 00 mov dword [0x402014], ebx ; [0x402014:4]=0
0x00401030 89 0d 18 20 40 00 mov dword [0x402018], ecx ; [0x402018:4]=0
0x00401036 89 15 1c 20 40 00 mov dword [0x40201c], edx ; [0x40201c:4]=0
0x0040103c b8 04 00 00 80 mov eax, 0x80000004
0x00401041 0f a2 cpuid
0x00401043 a3 20 20 40 00 mov dword [0x402020], eax ; [0x402020:4]=0
0x00401048 89 1d 24 20 40 00 mov dword [0x402024], ebx ; [0x402024:4]=0
0x0040104e 89 0d 28 20 40 00 mov dword [0x402028], ecx ; [0x402028:4]=0
0x00401054 89 15 2c 20 40 00 mov dword [0x40202c], edx ; [0x40202c:4]=0
0x0040105a c3 ret
64-bit disassembly
Code:
;-- section..code:
entry0 (int64_t arg1, int64_t arg2);
; arg int64_t arg1 @ rcx
; arg int64_t arg2 @ rdx
0x00401000 b8 02 00 00 80 mov eax, 0x80000002 ; [00] -r-x section size 4096 named .code
0x00401005 0f a2 cpuid
0x00401007 89 05 f3 0f 00 00 mov dword [rip + 0xff3], eax ; section..data ; [0x402000:4]=0
0x0040100d 89 1d f1 0f 00 00 mov dword [rip + 0xff1], ebx ; [0x402004:4]=0
0x00401013 89 0d ef 0f 00 00 mov dword [rip + 0xfef], ecx ; [0x402008:4]=0 ; arg1
0x00401019 89 15 ed 0f 00 00 mov dword [rip + 0xfed], edx ; [0x40200c:4]=0 ; arg2
0x0040101f b8 03 00 00 80 mov eax, 0x80000003
0x00401024 0f a2 cpuid
0x00401026 89 05 e4 0f 00 00 mov dword [rip + 0xfe4], eax ; [0x402010:4]=0
0x0040102c 89 1d e2 0f 00 00 mov dword [rip + 0xfe2], ebx ; [0x402014:4]=0
0x00401032 89 0d e0 0f 00 00 mov dword [rip + 0xfe0], ecx ; [0x402018:4]=0 ; arg1
0x00401038 89 15 de 0f 00 00 mov dword [rip + 0xfde], edx ; [0x40201c:4]=0 ; arg2
0x0040103e b8 04 00 00 80 mov eax, 0x80000004
0x00401043 0f a2 cpuid
0x00401045 89 05 d5 0f 00 00 mov dword [rip + 0xfd5], eax ; [0x402020:4]=0
0x0040104b 89 1d d3 0f 00 00 mov dword [rip + 0xfd3], ebx ; [0x402024:4]=0
0x00401051 89 0d d1 0f 00 00 mov dword [rip + 0xfd1], ecx ; [0x402028:4]=0 ; arg1
0x00401057 89 15 cf 0f 00 00 mov dword [rip + 0xfcf], edx ; [0x40202c:4]=0 ; arg2
0x0040105d c3 ret
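The [rip + disp] operands in the 64-bit listing resolve to the bracketed addresses in the comments because RIP points at the next instruction, not the current one. A minimal sketch of that arithmetic follows; the function name rip_target is hypothetical, while the addresses and displacements are taken from the listing above.

```python
def rip_target(address: int, length: int, disp32: int) -> int:
    """Absolute target of a RIP-relative operand.

    RIP holds the address of the *next* instruction, so the instruction
    length is added before the signed 32-bit displacement.
    """
    if disp32 & 0x80000000:          # sign-extend the 32-bit displacement
        disp32 -= 1 << 32
    return (address + length + disp32) & 0xFFFFFFFFFFFFFFFF

# 0x00401007: 89 05 f3 0f 00 00  (6 bytes, disp32 = 0xff3)
print(hex(rip_target(0x00401007, 6, 0xFF3)))  # 0x402000
# 0x0040100d: 89 1d f1 0f 00 00  (6 bytes, disp32 = 0xff1)
print(hex(rip_target(0x0040100D, 6, 0xFF1)))  # 0x402004
```

This is why the displacement shrinks by 2 from one store to the next in the listing: each target is 4 bytes further on, but each instruction is 6 bytes long.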
FlierMate11 18 Feb 2023, 11:01
Thank you for all of your replies.
@Ali.Z I admire someone like you who writes a disassembler from scratch. I too plan to start from the simple i8086 instruction set, then move on to the i386 instruction set. Do we need a different disassembler for each of 16, 32 and 64 bit? When you say VM, I think you mean an emulator? The FDBG project already has fdisasm64_data.inc, with hex values corresponding to the respective CPU opcodes, so maybe I will start from there. Covering instruction set extensions (like 3DNow!, MMX, AVX) is another challenge.
@I The 32-bit and 64-bit disassembly look good. You may not remember, but you introduced Cutter to me before, in the Linux GUI "Hello World" window thread. A cyberpal of mine said "cutter is actually GUI version of Rizin, which is a fork from Radare (forked due to some disagreements between the developers)". Is it true? But using a disassembler is one thing; writing a disassembler will be a totally different experience.
I 22 Feb 2023, 01:58
Yes, I didn't remember, nor did I after your reminder; such are the joys of age.
Cutter is the official GUI for rizin. I don't know about the disagreements, or maybe I don't remember? Good luck with the project.
FlierMate11 22 Feb 2023, 10:20
Thanks @I, and don't worry, it is not a problem with your memory.
I found https://www.sandpile.org/x86/opc_2.htm is wrong in reference to
Code:
JO JNO JB JNB JZ JNZ JBE JNBE JS JNS JP JNP JL JNL JLE JNLE
It should be 0x70 to 0x7F, not starting from 0x80 as sandpile.org documents. There is no way to contact the sandpile.org webmaster to report the error. Looks like I must be careful when making opcode reference. The Intel SDM, yes, but FDISASM by Feryno is already a good reference, so I will make the x86 version with reference to his/her fdisasm64_data.inc file.
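For orientation, both ranges exist in the x86 encoding. The short conditional jumps (rel8) are 70 to 7F in the one-byte opcode map, and the near forms (rel16/32) are 0F 80 to 0F 8F in the two-byte map, which is what an opc_2 table covers. In both, the low four bits select the condition. A minimal sketch, with a hypothetical function name and Python used only for brevity:

```python
# Condition suffixes in encoding order (low nibble of the opcode byte).
CONDITIONS = ["o", "no", "b", "nb", "z", "nz", "be", "nbe",
              "s", "ns", "p", "np", "l", "nl", "le", "nle"]

def jcc_mnemonic(code: bytes):
    """Return (mnemonic, form) for a conditional jump, or None otherwise."""
    if 0x70 <= code[0] <= 0x7F:
        return "j" + CONDITIONS[code[0] & 0x0F], "short (rel8)"
    if len(code) >= 2 and code[0] == 0x0F and 0x80 <= code[1] <= 0x8F:
        return "j" + CONDITIONS[code[1] & 0x0F], "near (rel16/32)"
    return None

print(jcc_mnemonic(bytes([0x74, 0x05])))                    # ('jz', 'short (rel8)')
print(jcc_mnemonic(bytes([0x0F, 0x84, 0x00, 0x00, 0x00, 0x00])))  # ('jz', 'near (rel16/32)')
```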
Tomasz Grysztar 22 Feb 2023, 10:47
FlierMate11 wrote: I found https://www.sandpile.org/x86/opc_2.htm is wrong in reference to
FlierMate11 22 Feb 2023, 10:58
Tomasz Grysztar wrote:
Oops, silly me. Thank you for the correction, I learned from my mistake. Can I consult you or others during the development of my x86 disassembler? Thank you in advance.
Ali.Z 22 Feb 2023, 11:04
FlierMate11 wrote: Looks like I must be careful when making opcode reference.
You should, even when reading Intel's SDM. I have read tens of SDMs and hunted down a lot of misleading or incorrect documentation, especially in the older ones; a disassembler can never be perfect. Just consider this: when did Intel or AMD ever mention that the address-size prefix can be used to alter jumps? They either leave things undocumented, or fix them in ugly, misleading wording, or even fix them in hard-to-get papers. No single disassembler is known to be complete.
When I said VM, I meant writing an engine that processes bytecode and makes things more dynamic, but I see hard-coding things as safer, since VMs are error-prone to a very high degree.
Tomasz Grysztar 22 Feb 2023, 11:48
FlierMate11 wrote: Can I consult you or others during the development of my x86 disassembler? Thank you in advance.
FlierMate11 28 Feb 2023, 15:00
Thanks Tomasz for the YouTube link. I have been watching it on an ad-hoc basis (haven't finished it yet).
It needs 1080p(?) resolution, or else I cannot clearly see the tiny text on screen. I am slowly understanding why octal digits are needed to analyze the operands: Mod (aa) + Register (bbb) + R/M (ccc) = aa bbb ccc. Today I started researching, beginning with the ADD opcode, but funnily the one-byte ADD opcode doesn't seem to have a register-to-register form, or ADD Gv, Gv(?). When I assemble it, it yields
Code:
66 01 d8 add ax,bx
It has a prefix 0x66, so I guess it is a 2-byte opcode, but looking at the 2-byte opcode page, I can't find ADD: https://www.sandpile.org/x86/opc_2.htm
Maybe this is another silly question, but where can I find the ADD register-to-register opcode reference? Thank you, Tomasz.
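The aa bbb ccc layout lines up with octal digits because the ModR/M byte splits 2 + 3 + 3 bits. A minimal sketch using the 0xd8 byte from the example above (the function name split_modrm is hypothetical):

```python
def split_modrm(modrm: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (modrm >> 6) & 0b11    # aa
    reg = (modrm >> 3) & 0b111   # bbb
    rm = modrm & 0b111           # ccc
    return mod, reg, rm

# In octal, each digit of the byte is exactly one field.
print(oct(0xD8), split_modrm(0xD8))  # 0o330 (3, 3, 0)
```

Here mod=3 means both operands are registers, reg=3 is bx and rm=0 is ax in the 16-bit register numbering.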
revolution 28 Feb 2023, 15:10
0x66 is the opsiz prefix to give you 16-bit.
0x01 is the opcode for ADD
0xd8 is the mod-reg-r/m
mod=11 --> reg-to-reg
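Putting those three pieces together, a register-to-register decoder for this one opcode might look like the sketch below. It assumes 32-bit mode (so 0x66 selects 16-bit operands), handles only opcode 01 with mod=11, and uses hypothetical names throughout.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def decode_add_ev_gv(code: bytes) -> str:
    """Decode `[66] 01 /r` with mod=11, as seen in 32-bit mode."""
    pos = 0
    regs = REG32
    if code[pos] == 0x66:        # operand-size prefix: 16-bit operands here
        regs = REG16
        pos += 1
    if code[pos] != 0x01:
        raise ValueError("only opcode 01 (ADD Ev,Gv) is handled")
    modrm = code[pos + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    # For opcode 01 the r/m field is the destination, reg is the source.
    return f"add {regs[rm]},{regs[reg]}"

print(decode_add_ev_gv(bytes.fromhex("6601d8")))  # add ax,bx
print(decode_add_ev_gv(bytes.fromhex("01d8")))    # add eax,ebx
```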
Tomasz Grysztar 28 Feb 2023, 15:15
Yes, the prefixes are not considered part of the opcode. Perhaps my article in the second issue (see page 7) of the Paged Out! magazine might also help organize concepts.
FlierMate11 28 Feb 2023, 15:20
revolution wrote: 0x66 is the opsiz prefix to give you 16-bit.
Thanks, I didn't really know the meaning of "ADD Ev, Gv", as I thought "Ev" must be a memory address, until I saw your reply. I can see 0x66 is documented as "OPSIZE: (80386+)", and that there are registers on both sides if Mod = 11.
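What the OPSIZE prefix selects depends on the mode the code runs in, which is one reason a single decoder can serve 16-, 32- and 64-bit code if the mode is passed in. A minimal sketch of the rule for most general-purpose instructions follows; the function and parameter names are hypothetical, and instructions with special defaults in 64-bit mode are not covered.

```python
def operand_size(mode: int, opsize_prefix: bool = False, rex_w: bool = False) -> int:
    """Effective operand size in bits for a typical 'v'-sized operand.

    mode is 16, 32 or 64. REX.W only exists in 64-bit mode and takes
    precedence over the 0x66 prefix.
    """
    if mode == 64 and rex_w:
        return 64
    default = 16 if mode == 16 else 32
    if opsize_prefix:            # 0x66 toggles between 16 and 32
        return 32 if default == 16 else 16
    return default

print(operand_size(32, opsize_prefix=True))  # 16
print(operand_size(16, opsize_prefix=True))  # 32
```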
FlierMate11 28 Feb 2023, 15:21
Tomasz Grysztar wrote: Yes, the prefixes are not considered part of the opcode. Perhaps my article in the second issue (see page 7) of the Paged Out! magazine might also help organize concepts.
Thanks, I read it before but did not really understand it. Hopefully I can grasp its meaning now.
As an epitome of CISC architectures, x86 has instructions of variable length, often complex to decode. Moreover, its instruction set has grown tremendously and nowadays it is infeasible to get by without a good disassembler. However, this quick guide aims to give at least a general idea of what is what.
True..... I will just stick to a 32-bit disassembler without support for instruction set extensions.
Last edited by FlierMate11 on 28 Feb 2023, 15:49; edited 2 times in total
FlierMate11 28 Feb 2023, 15:42
Also, I find it confusing that there is another set of opcodes for adding an immediate value to a register:
Code:
83 c3 1e add ebx,0x1e
It is not in 0x00 to 0x05. Looking up the docs, it says:
Quote: group #1*I64
...and the screenshot attached.
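In the group 1 opcodes (80, 81 and 83), the opcode byte alone does not name the operation; the reg field of the ModR/M byte does, and the r/m field names the operand. A minimal sketch for opcode 83 with a register operand in 32-bit mode, using hypothetical names:

```python
# Group 1: the ModR/M reg field (/0../7) selects the operation.
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_83(code: bytes) -> str:
    """Decode `83 /n ib` with mod=11 (register operand, 32-bit mode)."""
    if code[0] != 0x83:
        raise ValueError("only opcode 83 is handled")
    modrm = code[1]
    mod, ext, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    # The immediate is one byte, sign-extended to the operand size.
    imm = int.from_bytes(code[2:3], "little", signed=True)
    return f"{GROUP1[ext]} {REG32[rm]},{imm:#x}"

print(decode_83(bytes.fromhex("83c31e")))  # add ebx,0x1e
```

With 0xc3 = 11 000 011, the /0 selects ADD and r/m=3 selects ebx, which matches the assembled bytes above.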
Ali.Z 28 Feb 2023, 16:09
look here:
https://board.flatassembler.net/topic.php?p=207477#207477&sid=38818506f18d025e9182e66c40899ea6
Do not confuse yourself by looking here and there. The only thing you need is the ability to manually encode and decode instructions; then you will be able to read most of Intel's SDM. (You must be a smart reader, as Intel tends to scatter every important detail across the SDM.)
FlierMate11 12 Mar 2023, 10:58
Thanks @Ali.Z, nice infographics, but I notice they are mainly for the i8086.
Another question: I found that the following reg-to-reg encodings...
Code:
01 d8 add eax,ebx
03 d8 add ebx,eax
...can also be done as below:
Code:
01 d8 add eax,ebx
01 c3 add ebx,eax
So how do I decide between the 01 and 03 opcodes? If I use 01, I just change the Mod+Reg+R/M value; or if I use it interchangeably with 03, I just stick to 0xd8.
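The two opcodes differ only in bit 1, the direction bit: 01 is ADD Ev,Gv (r/m is the destination) and 03 is ADD Gv,Ev (reg is the destination). With mod=11 both operands are registers, so the same instruction has two valid encodings. A minimal sketch for 32-bit registers, with hypothetical names:

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_add_rr(code: bytes) -> str:
    """Decode `01 /r` or `03 /r` with mod=11 (32-bit registers)."""
    opcode, modrm = code[0], code[1]
    if opcode not in (0x01, 0x03):
        raise ValueError("only opcodes 01 and 03 are handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    if opcode & 0x02:            # 03: ADD Gv,Ev, reg is the destination
        dst, src = reg, rm
    else:                        # 01: ADD Ev,Gv, r/m is the destination
        dst, src = rm, reg
    return f"add {REG32[dst]},{REG32[src]}"

for encoding in ("01d8", "03d8", "01c3", "03c3"):
    print(encoding, decode_add_rr(bytes.fromhex(encoding)))
```

An assembler is free to emit either form for a register-to-register ADD, so a disassembler has to accept both.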
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.