flat assembler
Message board for the users of flat assembler.

8086 disassembler

Ali.Z



Joined: 08 Jan 2018
Posts: 365
Let's say someone wants to build an 8086-only disassembler, assembler, or both,
meaning it outputs pure 8086 assembled or disassembled instructions, independent of any operating system.

What is an efficient way to build this, in terms of logic design or of interpreting instructions effectively? And should it be single-pass or multi-pass, and why?

Assuming we have all the requirements:
- 8086 opcode map
- knowledge of how to encode and decode instructions

If there is a missing requirement, please point it out.
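On the single- vs multi-pass question, one concrete reason an assembler may need more than one pass is jump sizing: whether a JMP fits the 2-byte short form (EB rel8) or needs the 3-byte near form (E9 rel16) depends on label addresses that may lie further ahead. Below is a minimal Python sketch, assuming a hypothetical toy program model (the ("label", name), ("jmp", label) and ("bytes", n) items are invented for illustration). It starts every JMP as short and repeats passes, widening jumps that do not fit, until nothing changes:

```python
# Toy model: items are ("label", name), ("jmp", label) or ("bytes", n).
SHORT, NEAR = 2, 3   # EB rel8 is 2 bytes, E9 rel16 is 3 bytes

def resolve_jump_sizes(items):
    """Return ({item_index: jmp_size}, {label: offset}) once sizes are stable."""
    sizes = {i: SHORT for i, it in enumerate(items) if it[0] == "jmp"}  # optimistic start
    while True:
        # Compute label offsets from the current size guesses.
        labels, offset = {}, 0
        for i, it in enumerate(items):
            if it[0] == "label":
                labels[it[1]] = offset
            else:
                offset += sizes[i] if it[0] == "jmp" else it[1]
        # Widen any short jump whose displacement does not fit in rel8.
        # Offsets later in this pass may be stale; the next pass recomputes them.
        changed, offset = False, 0
        for i, it in enumerate(items):
            if it[0] == "label":
                continue
            size = sizes[i] if it[0] == "jmp" else it[1]
            if it[0] == "jmp" and sizes[i] == SHORT:
                rel = labels[it[1]] - (offset + size)   # relative to the next instruction
                if not -128 <= rel <= 127:
                    sizes[i] = NEAR                     # sizes only grow, so this ends
                    changed = True
            offset += size
        if not changed:
            return sizes, labels
```

Because sizes only ever grow, the loop always terminates. A disassembler, by contrast, can usually work in one linear pass, since on the 8086 each instruction's length is determined by its own bytes.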

_________________
Asm For Wise Humans
Post 18 Apr 2020, 00:09
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17639
Location: In your JS exploiting you and your system
In many cases a disassembler will also need to be able to decode various file formats.

For 16-bit code the MZ .exe format is common. There is also the simpler .com format.

Automatic detection of data sections can also be useful.
Post 18 Apr 2020, 22:39
Ali.Z



Joined: 08 Jan 2018
Posts: 365
I don't want to output executables, just pure 8086 instructions (like fasm's use16 with no format directive, i.e. raw output).


Treat them by length in bytes:
- 1-byte instructions
- 2-byte instructions
- and so on for 3, 4, 5 and 6 bytes.
An instruction like or word [0FFAAh],9900h is 6 bytes long (81 0E AA FF 00 99), so 6-byte instructions must be treated differently.
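To make the 6-byte example concrete, here is a minimal Python sketch (the helper name is hypothetical) that splits 81 0E AA FF 00 99 into the ModR/M fields, displacement and immediate. It assumes opcode 81h (group 1, word operand, 16-bit immediate) and handles only the displacement forms selected by mod and r/m:

```python
def split_81h(code):
    """Split an 81h (group 1, r/m16, imm16) instruction into its fields."""
    assert code[0] == 0x81
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # Displacement size: mod=00 with r/m=110 is a direct disp16 address,
    # mod=01 -> disp8, mod=10 -> disp16, otherwise (mod=00 or 11) none.
    if mod == 0 and rm == 6:
        disp_size = 2
    elif mod in (1, 2):
        disp_size = mod
    else:
        disp_size = 0
    pos = 2 + disp_size
    return {
        "mod": mod, "reg": reg, "rm": rm,      # reg=001 selects OR within group 1
        "disp": int.from_bytes(code[2:pos], "little"),
        "imm": int.from_bytes(code[pos:pos + 2], "little"),  # imm16, little-endian
        "length": pos + 2,
    }
```

For the example, reg=1 is what turns the group 1 opcode into OR; the mnemonic is not in the opcode byte itself.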

So I believe this is one efficient design, but to find the length I must interpret the instruction.

And I haven't found an effective way to interpret instructions at the encoding level. Off the top of my head, I would tokenize registers and mnemonics, but I don't think that's a very good way,

because we have groups, control transfer, data movement, arithmetic, etc.

Many opcodes have a 6-bit field for the operation type (e.g. and, add, cmp...) and two more bits: D = direction and W = word or byte.

Some have 7 bits for the operation type, and some have none.
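The D/W layout described above can be seen in the 00h-3Fh arithmetic block, where bits 5-3 select the operation and bits 1 and 0 are D and W. A minimal Python sketch covering only the reg, r/m forms of that block (low three bits 0-3); the function name is hypothetical:

```python
ALU_OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]

def decode_alu_opcode(op):
    """Decode reg, r/m forms in 00h-3Fh: bits 5-3 = operation, bit 1 = D, bit 0 = W."""
    if op > 0x3F or (op & 7) > 3:
        return None          # e.g. 04h/05h (AL/AX, imm) or 06h/07h (PUSH/POP ES)
    return {
        "op": ALU_OPS[(op >> 3) & 7],
        "d": (op >> 1) & 1,  # D=1: the reg field is the destination
        "w": op & 1,         # W=1: 16-bit operands, W=0: 8-bit
    }
```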

We also have S = sign extension for immediate operands, which is used together with W: when S=1 and W=1, an 8-bit immediate is sign-extended to 16 bits; otherwise W alone decides whether the immediate is 8 or 16 bits.
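The S/W rule for the group 1 immediates (opcodes 80h-83h) can be sketched in Python as follows. It returns the immediate value (masked to 16 bits for word operands) and the number of bytes it occupied; the function name is hypothetical, and treating 82h like 80h is an assumption about how to handle that alias:

```python
def group1_immediate(opcode, imm_bytes):
    """Return (value, size) of the immediate in group 1 opcodes 80h-83h."""
    s, w = (opcode >> 1) & 1, opcode & 1      # S = bit 1, W = bit 0
    if w == 0:
        # 80h (and, by assumption here, 82h): byte operand, imm8
        return imm_bytes[0], 1
    if s == 1:
        # 83h: imm8 sign-extended to 16 bits
        return int.from_bytes(imm_bytes[:1], "little", signed=True) & 0xFFFF, 1
    # 81h: full imm16, little-endian
    return int.from_bytes(imm_bytes[:2], "little"), 2
```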

We also have V = variable shift/rotate: when the bit is set, the shift/rotate count is taken from CL; when it is clear, the count is 1 (there is no immediate count on the 8086).
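A minimal Python sketch of the V and W bits for the 8086 shift/rotate opcodes D0h-D3h, where the ModR/M reg field picks the operation; the names are hypothetical. Immediate counts (C0h/C1h) only appeared with the 80186, so the sketch rejects them:

```python
# reg field -> operation; 110 is left as None because it is undocumented for the 8086
SHIFT_OPS = ["rol", "ror", "rcl", "rcr", "shl", "shr", None, "sar"]

def decode_shift(opcode, modrm):
    """Decode D0h-D3h: bit 1 is V (count in CL), bit 0 is W (word operand)."""
    if (opcode & 0xFC) != 0xD0:
        return None
    v, w = (opcode >> 1) & 1, opcode & 1
    return {
        "op": SHIFT_OPS[(modrm >> 3) & 7],
        "count": "cl" if v else 1,   # V=0: shift/rotate by exactly one
        "w": w,
    }
```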

And Z is used for loop/repeat: when set, it loops while ZF=1; otherwise, while ZF=0.
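The Z bit is bit 0 of the REP prefixes (F2h/F3h) and of LOOPNZ/LOOPZ (E0h/E1h). A minimal Python sketch with a hypothetical helper name; note that with string instructions that do not compare (MOVS, STOS, LODS), F3h simply repeats while CX is nonzero and ZF is not tested:

```python
def repeat_condition(opcode):
    """Return (kind, condition) for opcodes whose bit 0 is the Z bit, else None."""
    if opcode in (0xF2, 0xF3):       # REPNE/REPNZ = F2h, REP/REPE/REPZ = F3h
        kind = "rep"
    elif opcode in (0xE0, 0xE1):     # LOOPNZ/LOOPNE = E0h, LOOPZ/LOOPE = E1h
        kind = "loop"
    else:
        return None
    # Z=1 continues while ZF=1, Z=0 while ZF=0 (in addition to CX being nonzero;
    # for REP this only matters with CMPS/SCAS).
    return kind, ("while ZF=1" if opcode & 1 else "while ZF=0")
```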



Then there's the ModR/M byte. Mod and r/m are fairly easy to handle, but depending on the opcode, the reg field can be either a register or an opcode extension that selects the instruction within a group (e.g. group 1 or group 2).
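One way to handle the dual meaning of the reg field is to let the opcode decide how it is read. A minimal Python sketch covering only a few opcode ranges (group 1 at 80h-83h, the 00h-3Fh reg, r/m forms and MOV 88h-8Bh); the table and function names are hypothetical. The group 1 table uses the same order as the 00h-3Fh arithmetic block:

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]

def describe_reg_field(opcode, modrm):
    """Say what the ModR/M reg field means for a few 8086 opcode ranges."""
    reg = (modrm >> 3) & 7                     # bits 5-3 of the ModR/M byte
    if 0x80 <= opcode <= 0x83:
        # group 1: reg is an opcode extension that selects the operation
        return ("operation", GROUP1[reg])
    if (opcode <= 0x3F and (opcode & 7) <= 3) or 0x88 <= opcode <= 0x8B:
        # reg, r/m forms: reg is a general register, sized by the W bit (bit 0)
        return ("register", (REG16 if opcode & 1 else REG8)[reg])
    return None   # not covered here (e.g. 8Ch/8Eh, where reg names a segment register)
```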

So I think I have to classify instructions either by type or by encoding bits?
I can't find a good way.

Post 19 Apr 2020, 01:44
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7783
Location: Kraków, Poland
I once gave some hints on length disassembly in another thread.

I have also published an article in Paged Out! #2 (on page 7) where I demonstrate the concept with some examples.

For the original 8086 this is all relatively simple: the opcode is always just a single byte, and there is no SIB.
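Following that idea, a length decoder for the 8086 only has to look at prefixes, the single opcode byte, the ModR/M byte (when present) and the immediate. A partial Python sketch, covering just a subset of opcodes and returning None for everything else; it is not the code from the article, and the names are hypothetical:

```python
def modrm_length(modrm):
    """Bytes taken by ModR/M plus displacement (the 8086 has no SIB byte)."""
    mod, rm = modrm >> 6, modrm & 7
    if mod == 0:
        return 3 if rm == 6 else 1      # mod=00, r/m=110 is a direct disp16 address
    if mod == 1:
        return 2                        # disp8
    if mod == 2:
        return 3                        # disp16
    return 1                            # mod=11: register operand, no displacement

PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0xF0, 0xF2, 0xF3}   # segment, LOCK, REP

def insn_length(code):
    """Length of the 8086 instruction at code[0], or None for opcodes not covered."""
    i = 0
    while code[i] in PREFIXES:          # each prefix is a single byte
        i += 1
    op = code[i]
    i += 1                              # the opcode itself is always one byte
    if (op <= 0x3F and (op & 7) <= 3) or 0x88 <= op <= 0x8B or 0xD0 <= op <= 0xD3:
        return i + modrm_length(code[i])
    if op <= 0x3F and (op & 7) in (4, 5):
        return i + 1 + (op & 1)         # AL,imm8 or AX,imm16
    if 0x80 <= op <= 0x83:
        imm = 2 if op == 0x81 else 1    # only 81h carries imm16 (S=0, W=1)
        return i + modrm_length(code[i]) + imm
    if 0x40 <= op <= 0x5F or 0x90 <= op <= 0x97 or op == 0xC3:
        return i                        # inc/dec/push/pop r16, xchg ax,r16, ret
    if 0x70 <= op <= 0x7F or op in (0xEB, 0xCD):
        return i + 1                    # Jcc rel8, jmp short, int imm8
    if 0xB0 <= op <= 0xBF:
        return i + (2 if op >= 0xB8 else 1)   # mov r16,imm16 / mov r8,imm8
    if op in (0xE8, 0xE9):
        return i + 2                    # call/jmp near rel16
    return None
```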
Post 20 Apr 2020, 12:17
Ali.Z



Joined: 08 Jan 2018
Posts: 365
Disassembly is a lot easier to implement, both programmatically and theoretically.

As for an assembler, I can encode any instruction just by looking at the opcode map (from an old 8086 manual); however, coding an assembler is kind of difficult for me, since I have never written one, so I don't know how to categorize things.

Anyhow, I have already started separating some instruction types, but I'm still not sure whether it's good or not, as I'm guessing while typing.

I expect failure, but I hope for success.

Post 20 Apr 2020, 14:41
Ali.Z



Joined: 08 Jan 2018
Posts: 365
edit invalid post



Last edited by Ali.Z on 24 Apr 2020, 05:36; edited 1 time in total
Post 21 Apr 2020, 10:45
Ali.Z



Joined: 08 Jan 2018
Posts: 365
I'm a little bit confused: the 8086 Family User's Manual clearly says all Jcc instructions use a short relative offset.

That means, for example:
Code:
 jno short 70h    ; correct, according to the manual
 jno near 0070h   ; incorrect, according to the manual


However, fasm assembles the near-relative form in 16-bit mode with no issues, and I trust fasm more than Intel.
@tomasz help, who is right, you or Intel?

Post 22 Apr 2020, 06:27
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17639
Location: In your JS exploiting you and your system
The long (near) forms of Jcc are valid in 16-bit mode.

But be aware that a genuine 8086 won't be able to execute them. Any CPU with 32-bit instructions (80386 or later) will run them fine.
Post 22 Apr 2020, 06:31
Ali.Z



Joined: 08 Jan 2018
Posts: 365
Oh okay, so the NEAR form of Jcc is only valid on later x86 CPUs (80386 and up), in real or protected mode.

Hmm, I will stick with the manual, since I want real 8086 instructions. Thanks.
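If only real 8086 instructions are allowed, a common way to reach a target beyond the short range is to invert the condition and jump over a near JMP (E9h rel16), instead of using the 80386 near form (0Fh, 80h+cc, rel16). A minimal Python sketch, assuming both addresses are already known; the table and function names are hypothetical:

```python
JCC = {"jo": 0x70, "jno": 0x71, "jb": 0x72, "jnb": 0x73, "je": 0x74, "jne": 0x75,
       "jbe": 0x76, "ja": 0x77, "js": 0x78, "jns": 0x79, "jp": 0x7A, "jnp": 0x7B,
       "jl": 0x7C, "jge": 0x7D, "jle": 0x7E, "jg": 0x7F}

def encode_jcc_8086(mnemonic, addr, target):
    """Encode a Jcc placed at addr using only 8086 instructions."""
    op = JCC[mnemonic]
    rel = target - (addr + 2)                 # short Jcc is 2 bytes: 7xh rel8
    if -128 <= rel <= 127:
        return bytes([op, rel & 0xFF])
    # Out of range: inverted short Jcc skipping a 3-byte JMP near (E9h rel16).
    rel16 = (target - (addr + 5)) & 0xFFFF    # the JMP ends 5 bytes after addr
    return bytes([op ^ 1, 3, 0xE9]) + rel16.to_bytes(2, "little")
```

Inverting with op ^ 1 works because the 70h-7Fh opcodes come in complementary pairs that differ only in bit 0 (e.g. 74h JE and 75h JNE).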

Post 22 Apr 2020, 06:37
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7783
Location: Kraków, Poland
Ali.Z wrote:
hmm, i will stick with manual as i want real 8086 instructions. thanks.
You can also try 8086.INC for fasmg, which implements the original 8086 instruction set with no additions of later CPUs.
Post 22 Apr 2020, 14:53
Roman



Joined: 21 Apr 2012
Posts: 814
Hmm, 8086.inc is very interesting.
Tomasz, have you decided to try writing in a high-level language?
Or something close to a high-level language, changing the direction and style of writing in fasm?
Post 22 Apr 2020, 16:59
Ali.Z



Joined: 08 Jan 2018
Posts: 365
OK, as I'm making progress (very slowly), I want to ask you guys if any of you have an 8087 FPU manual.

I searched many times, using many different keywords and search engines, but didn't find anything. I want to implement 8087 FPU encoding as well, probably after completing the 8086 instruction set.

To be honest, whenever I look at the 8086 opcode map I get mad because of the ESC codes for the coprocessor, which was a separate chip but had its opcodes in another CPU's manual, wtf...

So either I encode everything or nothing, or maybe I move up to the 80386 and 80387, which will take even more time.
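On the 8086 side, an ESC instruction (opcodes D8h-DFh, i.e. 11011xxx) carries a 6-bit code made of the three low opcode bits and the ModR/M reg field; roughly speaking, the 8086 only handles the addressing part and the 8087 decodes the rest. A minimal Python sketch (hypothetical name) that extracts those fields; mapping the 6-bit code to an actual 8087 instruction still needs the 8087 opcode tables:

```python
def decode_esc(opcode, modrm):
    """Split an ESC instruction (D8h-DFh) into its 6-bit code and the ModR/M operand."""
    if (opcode & 0xF8) != 0xD8:
        return None
    return {
        # opcode bits 2-0 form the high half, ModR/M reg bits the low half
        "esc": ((opcode & 7) << 3) | ((modrm >> 3) & 7),
        "mod": modrm >> 6,     # mod=11 selects a register stack operand ST(i)
        "rm": modrm & 7,
    }
```

For example, DD 07 (FLD qword [bx]) gives the code 101000b = 28h with a [bx] memory operand. Since the 8086 does not wait for the 8087 by itself, 8086/8087 code typically places a WAIT (9Bh) before coprocessor instructions.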

Post 23 Apr 2020, 12:00
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7783
Location: Kraków, Poland
Ali.Z wrote:
ok as im making progress (very slow) i want to ask you guys if any of you have an 8087 fpu manual.
I only ever had one for 80387. See here.

Now I looked through Bitsavers' site and I found one that seems to have a chapter about 8087:
http://bitsavers.trailing-edge.com/components/intel/80186/210911-001_iAPX86_88_186_188_Programmers_Reference_1983.pdf
But I think it is missing the opcode table.
Post 23 Apr 2020, 12:23
Ali.Z



Joined: 08 Jan 2018
Posts: 365
Interesting. I have the 80C187 80-bit math coprocessor manual, as well as the 80287 and 80387 manuals; I can share them if you want. (IIRC I have the 80487 one as well.)

I asked a friend of mine (I call him Mr. Google because he is god-level when it comes to searching). He is not a programmer, but I asked him to search for the 8087 manual, and he did it.

After 15-20 minutes he found the document I wanted and passed me the PDF.
When I asked him how he found it, he answered "using an advanced multi-search engine". Wow.

Although I don't need it right now, it's good to have it, so that when I'm done with the 8086 I can move on to the 8087.

BTW Tomasz, the 80186 PDF you shared is a little bit different from the one I have,
so it's good to have it in my collection. Yes, I collect manuals (I have three 8086 manuals, all genuine Intel).

Post 23 Apr 2020, 13:36
Ali.Z



Joined: 08 Jan 2018
Posts: 365
Hmm, I have to redesign and rewrite a couple of things from scratch.

I wanted to use fewer hardcoded values, but it appears that requires much more math than I'm capable of.

So I guess I will use as many hardcoded values as possible. It's a poor approach, but a little bit simpler to handle.

Post 24 Apr 2020, 05:49
donn



Joined: 05 Mar 2010
Posts: 194
Haven't looked in the Intel manuals recently, but aside from the opcode maps in the appendix, the AMD docs have an Instruction Encoding Syntax flowchart (Figure 1.1, page 2 of my General and System Instructions doc).

I used this, the first couple of pages, and the individual instruction pages (the bulk of the manual, which describes each instruction and lists its arguments and parameters) when building Self Modified Instructions, and it worked. I only used a subset of x86 instructions and allocated space so parameters could be encoded into them, and so forth, so assembling every encoding would be different. But from the flowchart you can see which step requires what when assembling. REX prefixes could be the first step, for example. Or you could go instruction by instruction and call these prefix-building steps from there; I think that's how fasmg does it, but I could be wrong. Paged Out! #2 (link above) has a breakdown similar to the flowchart.

Even with a flowchart, an opcode map and instruction documentation, there are challenges just in building the instructions, as you must be aware from the example you posted above, but it's very rewarding. When I first tried encoding some, I wasn't using a debugger, so I couldn't really see examples of what I wanted; a debugger could be another useful piece.
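Going the other way (assembling), the 6-byte example from earlier in the thread can be built from its fields. A minimal Python sketch for one form only, "group1 word [address], imm" with a direct address; it assumes the assembler prefers the shorter 83h encoding (S=1, W=1) whenever the immediate fits in a sign-extended byte. The names are hypothetical:

```python
GROUP1 = {"add": 0, "or": 1, "adc": 2, "sbb": 3, "and": 4, "sub": 5, "xor": 6, "cmp": 7}

def encode_group1_mem16_imm(mnemonic, address, imm):
    """Encode e.g. 'or word [address], imm' (direct address, 16-bit operand)."""
    modrm = (0b00 << 6) | (GROUP1[mnemonic] << 3) | 0b110   # mod=00, r/m=110: disp16
    disp = (address & 0xFFFF).to_bytes(2, "little")
    imm &= 0xFFFF
    signed = imm - 0x10000 if imm & 0x8000 else imm          # imm as a signed 16-bit value
    if -128 <= signed <= 127:
        # S=1, W=1 (83h): the sign-extended imm8 yields the same 16-bit value, one byte shorter
        return bytes([0x83, modrm]) + disp + bytes([signed & 0xFF])
    return bytes([0x81, modrm]) + disp + imm.to_bytes(2, "little")   # S=0, W=1: imm16
```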
Post 29 Apr 2020, 20:22
Ali.Z



Joined: 08 Jan 2018
Posts: 365
donn wrote:
Even with a flow chart, opcode map, instruction documentation, there are challenges just in building the instructions


To me, the challenge is not in how to encode instructions programmatically; it's more in the design and flexibility.

Also, I'm working on both assembling and disassembling, so I switch between them every now and then, depending on my mood.

I'm still making very slow progress, but I believe I will be able to finish either the assembler or the disassembler within a month.

It's most likely going to be the disassembler, since it's easier and faster to write code for it.

Post 30 Apr 2020, 19:08
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.