flat assembler
Message board for the users of flat assembler.

Table of Bytecode???

1800askgeek



Joined: 04 Apr 2004
Posts: 10
Location: Hawaii
Does anyone have a table that maps FASM statements to their binary encodings?

Example:
Quote:

==MOV Commands==
B8 XX YY = MOV AX, YYXXh
BB XX YY = MOV BX, YYXXh
B9 XX YY = MOV CX, YYXXh
BA XX YY = MOV DX, YYXXh
B0 XX = MOV AL, XXh
B4 XX = MOV AH, XXh
B3 XX = MOV BL, XXh
B7 XX = MOV BH, XXh
B1 XX = MOV CL, XXh
B5 XX = MOV CH, XXh
B2 XX = MOV DL, XXh
B6 XX = MOV DH, XXh

==Interrupt Command==
CD XX = INT XXh


...except completed? (This is what I've been able to work out so far on my own, so I'm not 100% sure it's all correct.)

Please note that these are the hex equivalents of the binary.
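
The quoted table follows a regular pattern: the opcode is B0h plus a register number for 8-bit registers, or B8h plus a register number for 16-bit registers, followed by the immediate with its low byte first. A minimal Python sketch of that pattern, assuming 16-bit code; encode_mov_imm is a hypothetical helper name, not something FASM provides:

```python
# Register numbers used by the B0+r / B8+r opcode families (16-bit code).
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def encode_mov_imm(reg, value):
    """Encode MOV reg, imm for an 8-bit or 16-bit register (illustrative only)."""
    reg = reg.lower()
    if reg in REG8:
        return bytes([0xB0 + REG8[reg], value & 0xFF])
    if reg in REG16:
        # 16-bit immediates are stored low byte first (little-endian).
        return bytes([0xB8 + REG16[reg]]) + (value & 0xFFFF).to_bytes(2, "little")
    raise ValueError("unsupported register: " + reg)

print(encode_mov_imm("ax", 0x1234).hex(" "))  # b8 34 12
print(encode_mov_imm("bh", 0x7F).hex(" "))    # b7 7f
```

This is why the table lists B8 XX YY as MOV AX, YYXXh: the two immediate bytes appear in the file in reverse order.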
Post 04 Apr 2004, 06:01
roticv



Joined: 19 Jun 2003
Posts: 374
Location: Singapore
You are attacking it in the wrong manner, if you are coding a disassembler or assembler. Just look into the Intel manual or sandpile and it will show you how the opcodes are encoded.
Post 04 Apr 2004, 07:26
1800askgeek



Joined: 04 Apr 2004
Posts: 10
Location: Hawaii
... perhaps you could tell me where I might find such a manual? (Preferably the Intel one)
Post 04 Apr 2004, 20:09
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia


Last edited by Madis731 on 08 May 2004, 07:21; edited 1 time in total
Post 04 Apr 2004, 21:42
1800askgeek



Joined: 04 Apr 2004
Posts: 10
Location: Hawaii
... Well, judging by the fact that there have been 100+ views to this topic, I assume it must be interesting to some people. Maybe I'm not the only one who'd prefer a simple straight out table. Anyway for those of you who want it, I've worked out the XCHG commands. (They're the weirdest yet...)

90=XCHG AX,AX
93=XCHG AX,BX
91=XCHG AX,CX
92=XCHG AX,DX

93=XCHG BX,AX
87DB=XCHG BX,BX
87D9=XCHG BX,CX
87DA=XCHG BX,DX

91=XCHG CX,AX
87D9=XCHG CX,BX
87C9=XCHG CX,CX
87CA=XCHG CX,DX

92=XCHG DX,AX
87DA=XCHG DX,BX
87CA=XCHG DX,CX
87D2=XCHG DX,DX

Again, these are hex equivalents.
edit: I really don't get the purpose of using one byte for "xchg ax,ax" and 2 bytes for "bx,bx", "cx,cx" etc... why on earth would anyone design a processor to do that?! It seems... useless. Someone want to take a stab at it?
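
The XCHG list above fits two encodings: a one-byte short form, 90h plus the register number, whenever one operand is AX, and the general two-byte form 87h followed by a ModRM byte otherwise. Byte 90h (XCHG AX,AX) is also the encoding of NOP. A rough Python sketch of that rule for 16-bit registers; encode_xchg16 is a made-up name, and putting the first operand in the reg field is an arbitrary choice here, so a real assembler may emit the other, equally valid ModRM byte:

```python
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def encode_xchg16(a, b):
    """Encode XCHG between two 16-bit registers (illustrative only)."""
    ra, rb = REG16[a.lower()], REG16[b.lower()]
    if ra == 0 or rb == 0:
        # Short form 90+r: r is the register that is not AX (or AX itself).
        other = rb if ra == 0 else ra
        return bytes([0x90 + other])
    # General form 87 /r: ModRM with mod=11b, reg=first operand, r/m=second.
    return bytes([0x87, 0xC0 | (ra << 3) | rb])

print(encode_xchg16("ax", "bx").hex(" "))  # 93
print(encode_xchg16("bx", "cx").hex(" "))  # 87 d9
print(encode_xchg16("dx", "dx").hex(" "))  # 87 d2
```

Since XCHG is symmetric, 87 D9 and 87 CB both swap BX and CX; only the placement of the two registers inside the ModRM byte differs.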
Post 08 May 2004, 07:12
roticv



Joined: 19 Jun 2003
Posts: 374
Location: Singapore
I can post the code of my disassembler engine, but I must tell you that even if such a table exists it would be gigantic. Anyway, xchg has a shorter encoding when one of the operands is eax.

Search for The Svin's opcode tutorial. Too bad win32asmcommunity is down.
Post 08 May 2004, 07:34
ShortCoder



Joined: 07 May 2004
Posts: 105
Using a disassembler, and hand-assembling hex codes into files, then disassembling them, I was able to find out most of the hexcodes (only ones missing might be some SSE or SSE2 ..whatever that disassembler couldn't handle)

I saved that list on CD somewhere (oh gosh I HOPE I did because that took me like a WEEK to do) but it is fairly complete---I started with first opcode of 00 and worked up to FF, sequentially, and so on with the second byte, third byte, etc, until I worked it out.

I think I ended up just writing out patterns once I found them because, otherwise, the list would be too HUGE (is big enough already).

It should be on a CD somewhere that I have, so I'll look when I've got the time which might be a while especially considering I've got a lot of CDs and because I'm not exactly sure what I saved on what.

If you can find it somewhere else first, be my guest, or try the disassembler thing like I did. It took me about a week doing it that way, so gauge from that how much it is worth to you.
Post 08 May 2004, 08:32
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
Machine code isn't that complicated (just bloated by IBM and Intel, thank you guys). 1-byte instructions with a register-only operand usually have 3 bits that specify the register:
000:ax
001:cx
010:dx
011:bx
100:sp
101:bp
110:si
111:di
(I hope). Other instructions with one operand, and instructions with 2 operands, have the first byte followed by another byte, whose format is almost always the same:
lower 3 bits: operand 1 (or the operand that accesses memory)
next 3 bits: operand 2; on instructions with only one operand it is unused / chooses an 8-bit operand / chooses another instruction / chooses the types of operands
top 2 bits: addressing mode, 0 = memory, 1 = memory + byte, 2 = memory + word, 3 = register
The operand can be a register (table above) or memory (optionally + a constant called "displacement"). In 16-bit code, there are 8 addressing types (3 bits for the operand => 2^3 = 8). I don't remember the values of these addressing types, sorry.
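
As a rough illustration of the second byte (usually called ModRM), here is a small Python sketch that splits it into its three fields and names the operand chosen by the low 3 bits in 16-bit code. In the 8086 encoding the top two bits equal to 3 select a register, while 0, 1 and 2 select memory with no, byte or word displacement. The function names are invented for this example; the RM16 list gives the 8 addressing types of 16-bit code, with the one exception that mod = 0 and r/m = 6 means a bare 16-bit address instead of [bp]:

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
# The 8 addressing types of 16-bit code, indexed by the low 3 bits (r/m).
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]

def split_modrm(byte):
    """Return (mod, reg, rm): top 2 bits, middle 3 bits, low 3 bits."""
    return byte >> 6, (byte >> 3) & 7, byte & 7

def describe_rm16(byte):
    """Name the r/m operand of a ModRM byte in 16-bit code (illustrative only)."""
    mod, _reg, rm = split_modrm(byte)
    if mod == 3:                      # register operand
        return REG16[rm]
    if mod == 0:                      # memory, no displacement
        return "[disp16]" if rm == 6 else "[" + RM16[rm] + "]"
    disp = "disp8" if mod == 1 else "disp16"
    return "[" + RM16[rm] + "+" + disp + "]"

print(split_modrm(0xD9))    # (3, 3, 1)
print(describe_rm16(0xD9))  # cx
print(describe_rm16(0x00))  # [bx+si]
print(describe_rm16(0x44))  # [si+disp8]
```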

In 32-bit code, if the memory operand field (its 3 bits) contains a special value, then the next byte of the instruction specifies an extended addressing mode, allowing addressing by all combinations of registers, with *2, *4 and *8 scaling. Then follows the optional displacement and then the optional constant argument to the instruction.
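
The extra byte described here is usually called the SIB byte (scale, index, base). In 32-bit addressing it follows ModRM when the low 3 bits are 100b and the operand is memory. A minimal Python sketch, with invented helper names, that decodes it; it deliberately ignores the special case where base = 101b with mod = 00b means a 32-bit displacement and no base register:

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def needs_sib(modrm):
    """True when a ModRM byte is followed by a SIB byte (32-bit addressing)."""
    mod, rm = modrm >> 6, modrm & 7
    return mod != 3 and rm == 4

def describe_sib(sib):
    """Describe base + index*scale from a SIB byte (illustrative only).

    Simplification: the base=101b / mod=00b special case is not handled.
    """
    scale = 1 << (sib >> 6)          # top 2 bits: 1, 2, 4 or 8
    index = (sib >> 3) & 7           # middle 3 bits
    base = sib & 7                   # low 3 bits
    parts = [REG32[base]]
    if index != 4:                   # index 100b means "no index register"
        parts.append(REG32[index] + "*" + str(scale))
    return "[" + "+".join(parts) + "]"

# mov eax,[ebx+ecx*4] assembles to 8B 04 8B: ModRM 04h, SIB 8Bh.
print(needs_sib(0x04))      # True
print(describe_sib(0x8B))   # [ebx+ecx*4]
print(describe_sib(0x24))   # [esp]
```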

I am not sure all of this information is correct, but the main idea is. In the future I will probably add something on this to my tutorial, but that is the far future.

Maybe I could write another article (like the article on the preprocessor) about this, if enough people encourage me to (i.e. if it is wanted).
Post 08 May 2004, 17:55
ShortCoder



Joined: 07 May 2004
Posts: 105
Here it is---found it! It is fairly complete, I think but, like I said, probably is missing some SSE2 or SSE functions (whatever my disassembler I used didn't know of)

I am pleased to be able to contribute to the community in this way

You will see lots of ... ---that is where it follows the pattern of the previous section (or a previous section)(should be pretty intuitive) and cuts out a LOT on size of the file that way.

Took about a week to do so, I hope it helps you:)


Attachment: hexcodes.txt (39.73 KB)
Description: my own personal gathering of hex codes using a disassembler

Post 09 May 2004, 00:29
ShortCoder



Joined: 07 May 2004
Posts: 105
Obviously, if anyone knows opcodes for any of the omissions there (ones that have <undefined>), please post here what the opcode is and the omitted instruction from there so I'll know as well:)

Also, it is a "best-faith" effort and, like the TODO says, I didn't test for signed/unsigned numbers, so I'm not sure which of those are for signed and which for unsigned. For a single byte, for example, as long as the number is 7F or below it'll be the same regardless, but from 80 up you'll have problems if you use the wrong one.

If anyone wants to comment as to which are signed and which are unsigned, that would aid tremendously (and make it so I don't have to go through and do that;))
Post 09 May 2004, 00:53
roticv



Joined: 19 Jun 2003
Posts: 374
Location: Singapore
Huh? What do you mean by testing for signed and unsigned numbers? It does not matter.

Anyway, there's quite a number of opcodes you missed out, like cmovx, bswap, the dword conditional jumps.

I do not know if Tomasz wants me to place this link, but http://aod.anticrack.de/ might explain how opcodes are encoded.

The encoding of opcodes can be found in the Intel manual, A1 to B52.
Post 09 May 2004, 02:12
ShortCoder



Joined: 07 May 2004
Posts: 105
roticv wrote:
Huh? What do you mean by testing for signed and unsigned numbers? It does not matter.

Anyway, there's quite a number of opcodes you missed out, like cmovx, bswap, the dword conditional jumps.

I do not know if Tomasz wants me to place this link, but http://aod.anticrack.de/ might explain how opcodes are encoded.

The encoding of opcodes can be found in the Intel manual, A1 to B52.


Actually, it makes a big difference whether an operation is signed or unsigned version.

For example, suppose we wish to add 175 and 20.
suppose the size we are working with is size of a byte.
Choose unsigned, and this works--175 in hex becomes AF and 20 becomes 14
added and you get C3 which is 195 in decimal, what you'd expect

Choose signed, though, and you'll have MUCH different!
Adding AF and 14 in signed will give you BD (NOT C3!!!) which, is then equal to -67

Reason? Because AF in signed is equal to -81 decimal and 20 - 81 = -67, or BD in hex.
Post 09 May 2004, 06:02
roticv



Joined: 19 Jun 2003
Posts: 374
Location: Singapore
That I know, but how does it affect opcodes? I'm not very sure what you want.
Post 09 May 2004, 06:45
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
ShortCoder: how can you add 175 and 20 with signed bytes? 175 can't fit into a signed byte.

Anyway, you are wrong: AF+14 is C3 regardless of whether it is signed or not. You made a mistake when adding AF + 14 signed: you added decimal 14, which is 0E, and that gives you BD (AF+0E = BD). When adding and subtracting, it doesn't matter whether the numbers are signed or not.
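
This is easy to check for yourself. The short Python sketch below adds the two bytes once, then reads the same result byte as unsigned and as signed (two's complement); to_signed8 is just a helper written for this example:

```python
def to_signed8(x):
    """Interpret the low 8 bits of x as a two's complement signed byte."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x

a, b = 0xAF, 0x14
total = (a + b) & 0xFF          # what an 8-bit ADD leaves in the register

print(hex(total))               # 0xc3
print(a, b, total)              # 175 20 195   (unsigned reading)
print(to_signed8(a), to_signed8(b), to_signed8(total))  # -81 20 -61 (signed reading)

# Adding decimal 14 (0Eh) by mistake instead of 14h gives the BD from the earlier post.
print(hex((0xAF + 0x0E) & 0xFF))  # 0xbd
```

The bit pattern C3 is the same in both readings: 175 + 20 = 195 unsigned, and -81 + 20 = -61 signed. Only the flags (carry versus overflow) tell the two interpretations apart, not the opcode.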
Post 09 May 2004, 11:17
ShortCoder



Joined: 07 May 2004
Posts: 105
vid wrote:
ShortCoder: how can you add 175 and 20 with signed bytes? 175 can't fit into a signed byte.

Anyway, you are wrong: AF+14 is C3 regardless of whether it is signed or not. You made a mistake when adding AF + 14 signed: you added decimal 14, which is 0E, and that gives you BD (AF+0E = BD). When adding and subtracting, it doesn't matter whether the numbers are signed or not.


You're right. Hmmm....I'm not really quite sure what I was thinking then.

Well, if there is no difference between signed and unsigned, from a binary perspective, then what now puzzles me is why there are multiple opcodes for, perceivably, the exact same instruction (not counting ones where one set of opcodes reverses left and right operand and both operands happen to be the same, hence the same as well)
Post 09 May 2004, 13:12
roticv



Joined: 19 Jun 2003
Posts: 374
Location: Singapore
First, there are some opcodes that are optimised to be smaller in size if certain registers are used. For instance, xchg eax, edx can be encoded as 1 byte, while xchg ebx, ecx is encoded as 2 bytes. The same applies to conditional jumps and jumps: jmp can be encoded in its byte-displacement and dword-displacement variants. So why are there 2 different methods of encoding? I am not sure, but personally I think it is to optimise for size.

For the other question,
Taking adc reg,reg as example

adc is supposed to be encoded as

0001 00dw : mod/r/m

w and d are bits.

w tells the processor whether you are dealing with dword/word (w=1) or byte(w=0) in 32bit and word (w=1) or byte(w=0) in 16bit mode.

d tells the processor whether it is adc rrr,mmm or adc mmm,rrr (refer below)

modrm is encoded as xx:rrr:mmm
where x,r,m are all bits.

xx is the mod field, telling the processor how to interpret rrr and mmm. rrr is always a register, while mmm can be a register or can refer to memory. If mod = 11b, both rrr and mmm are registers. If xx = 01b, it would be adc rrr,[reg+byte]. If xx = 10b, it would be adc rrr,[reg+dword] for 32bit. The encoding for 16bit is different, but I will not explain it because I am not very sure of it and it is hard to explain. If xx = 00b, it would be adc reg,[reg]. Encoding adc reg,[memory] is a special case of xx = 00b, but I will not go deeper into it.

Because of the d flag and mod 11 it is possible for 2 opcodes to mean the same.
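
To make that last point concrete, here is a small Python sketch that builds both encodings of a register-to-register ADC in 16-bit code (w = 1). encode_adc_rr16 is a hypothetical helper written for this post, not anything an assembler exposes:

```python
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def encode_adc_rr16(dst, src):
    """Return both valid encodings of ADC dst,src for 16-bit registers."""
    d, s = REG16[dst.lower()], REG16[src.lower()]
    # d=0 -> opcode 0001 0001b (11h): ADC mmm,rrr, so dst sits in mmm, src in rrr.
    form_d0 = bytes([0x11, 0xC0 | (s << 3) | d])
    # d=1 -> opcode 0001 0011b (13h): ADC rrr,mmm, so dst sits in rrr, src in mmm.
    form_d1 = bytes([0x13, 0xC0 | (d << 3) | s])
    return form_d0, form_d1

for enc in encode_adc_rr16("ax", "bx"):
    print(enc.hex(" "))
# 11 d8
# 13 c3
```

Both byte pairs execute as adc ax,bx. An assembler simply picks one of them, which is why a table built by disassembling every byte combination shows the same instruction twice.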
Post 09 May 2004, 14:08
ShortCoder



Joined: 07 May 2004
Posts: 105
Well, for example, what is the difference, then, between starting hex code of 82 and starting hex code of 83?

82 00 00 add word [bx+si],0000
83 00 00 add word [bx+si],0000

As far as I can tell, 82 hex and 83 hex behave the same but I can't believe they would be the same. Why? The purpose?

And, yes I realize there are quite a few which have quicker variants which rely on accumulator register being one of the registers in the instruction, but I mean even besides those.

There are more like this---this was just an example.
Post 09 May 2004, 14:57
roticv



Joined: 19 Jun 2003
Posts: 374
Location: Singapore
It should be

82 00 00 add byte [bx+si], 0
83 00 00 add word [bx+si], 0
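
The difference comes from the low bit of the opcode (the same w bit as in other instructions): 82h has w = 0, so the destination is a byte, while 83h has w = 1, so the destination is a word and the single immediate byte is sign-extended to 16 bits. The operation itself (add, or, adc, ...) is chosen by the middle 3 bits of the ModRM byte. A rough Python sketch for 16-bit code, limited to plain [base] memory operands; decode_group1 is an invented name:

```python
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]

def decode_group1(code):
    """Decode opcodes 80h..83h with a [base] memory operand (illustrative only)."""
    opcode, modrm = code[0], code[1]
    if not 0x80 <= opcode <= 0x83:
        raise ValueError("not an 80h..83h opcode")
    mod, op, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm == 6:
        raise NotImplementedError("sketch handles only [base] operands")
    size = "word" if opcode & 1 else "byte"      # w bit
    if opcode == 0x81:                           # full 16-bit immediate
        imm = int.from_bytes(code[2:4], "little")
    else:
        imm = code[2]
        if opcode == 0x83 and imm & 0x80:        # sign-extend imm8 to 16 bits
            imm |= 0xFF00
    return "{} {} [{}], {:#x}".format(GROUP1[op], size, RM16[rm], imm)

print(decode_group1(bytes.fromhex("820000")))  # add byte [bx+si], 0x0
print(decode_group1(bytes.fromhex("830000")))  # add word [bx+si], 0x0
print(decode_group1(bytes.fromhex("8300ff")))  # add word [bx+si], 0xffff
```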
Post 09 May 2004, 15:38
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
ShortCoder:
one possibility is an instruction with two register operands. Look at my brief description of the encoding: I said that in a 2-operand instruction there is one "main" operand (that can be a register or memory, called "rm") and another operand (register only, called "r"). But nothing there tells which operand comes first, for example
mov [bx],ax ;rm,r
mov ax,[si+5] ;r,rm
No bit in the second byte (where the rm and r values are stored) tells you which is which. For that reason, there is a separate opcode for "mov rm,r" and another opcode for "mov r,rm".
And for instructions of type "mov r,r" both these opcodes can be used.

This doesn't apply only to mov, but also to "add", "sub", "xchg", "and" etc.; there is a basic set of instructions (8086 instructions) which all follow the same rules.
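
Seen from the disassembler's side, this means two different byte pairs decode to the same text. A minimal Python sketch for 16-bit register-to-register MOV, where 89h is the "mov rm,r" opcode and 8Bh is the "mov r,rm" opcode; decode_mov_rr16 is a name made up for this example:

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def decode_mov_rr16(code):
    """Decode a 2-byte register-to-register MOV (illustrative only)."""
    opcode, modrm = code
    if opcode not in (0x89, 0x8B) or modrm >> 6 != 3:
        raise ValueError("sketch handles only register-to-register MOV")
    r, rm = (modrm >> 3) & 7, modrm & 7
    # 89h: mov rm,r (rm is the destination); 8Bh: mov r,rm (r is the destination).
    dst, src = (rm, r) if opcode == 0x89 else (r, rm)
    return "mov {},{}".format(REG16[dst], REG16[src])

print(decode_mov_rr16(bytes.fromhex("89d8")))  # mov ax,bx
print(decode_mov_rr16(bytes.fromhex("8bc3")))  # mov ax,bx
```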
Post 09 May 2004, 19:44
Jaques



Joined: 07 Jun 2004
Posts: 79
Location: Everywhere
Where can I find a GOOD raw binary editor?

-- sorry if I'm an idiot
Post 16 Jun 2004, 23:49

Copyright © 1999-2020, Tomasz Grysztar.