flat assembler
Message board for the users of flat assembler.
Table of Bytecode???
roticv 04 Apr 2004, 07:26
You are attacking it in the wrong manner if you are coding a disassembler or an assembler. Just look in the Intel manual or at sandpile; they show how the opcodes are encoded.
1800askgeek 04 Apr 2004, 20:09
... perhaps you could tell me where I might find such a manual? (Preferably the Intel one)
Madis731 04 Apr 2004, 21:42
http://www.intel.com/design/Pentium4/documentation.htm
and you can look under manuals - one of them leads you to http://www.intel.com/design/pentium4/manuals/253665.htm ...
Last edited by Madis731 on 08 May 2004, 07:21; edited 1 time in total
1800askgeek 08 May 2004, 07:12
... Well, judging by the fact that there have been 100+ views of this topic, I assume it must be interesting to some people. Maybe I'm not the only one who'd prefer a simple, straight-out table. Anyway, for those of you who want it, I've worked out the XCHG commands. (They're the weirdest yet...)
90=XCHG AX,AX    93=XCHG AX,BX    91=XCHG AX,CX    92=XCHG AX,DX
93=XCHG BX,AX    87DB=XCHG BX,BX  87D9=XCHG BX,CX  87DA=XCHG BX,DX
91=XCHG CX,AX    87D9=XCHG CX,BX  87C9=XCHG CX,CX  87CA=XCHG CX,DX
92=XCHG DX,AX    87DA=XCHG DX,BX  87CA=XCHG DX,CX  87D2=XCHG DX,DX
Again, these are the hex equivalents.
Edit: I really don't get the purpose of using one byte for "xchg ax,ax" and two bytes for "bx,bx", "cx,cx", etc. Why on earth would anyone design a processor to do that?! It seems... useless. Does someone want to take a stab at it?
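The pattern in that table can be sketched in a few lines of Python. This is only an illustration, assuming 16-bit mode with no prefixes; the function name `encode_xchg16` and the choice to put the first operand in the ModRM reg field are assumptions made here, not something stated in the thread.

```python
# Register numbers used in x86 encodings (16-bit names).
REGS = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def encode_xchg16(a: str, b: str) -> bytes:
    """Encode XCHG a,b for two 16-bit registers (16-bit mode, no prefixes)."""
    ra, rb = REGS[a.lower()], REGS[b.lower()]
    # Short form: opcode 90+r exchanges AX with register r.
    # It only exists when AX is one of the two operands.
    if ra == 0 or rb == 0:
        return bytes([0x90 + (ra | rb)])
    # General form: opcode 87 followed by a ModRM byte with mod=11
    # (register-register). Assumption: first operand goes in the reg field.
    modrm = 0xC0 | (ra << 3) | rb
    return bytes([0x87, modrm])


if __name__ == "__main__":
    for pair in [("ax", "ax"), ("ax", "bx"), ("bx", "bx"), ("bx", "cx"), ("cx", "dx")]:
        print(pair, encode_xchg16(*pair).hex().upper())
```

Because XCHG is symmetric, swapping the reg and r/m fields gives a different but equivalent byte pair, which is why an assembler may emit the same bytes for "XCHG BX,CX" and "XCHG CX,BX".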
roticv 08 May 2004, 07:34
I can post the code of my disassembler engine, but I must tell you that even if such a table existed, it would be gigantic. Anyway, xchg is optimised for eax: it has a shorter encoding when eax is one of the operands.
Search for The Svin's opcode tutorial. Too bad win32asmcommunity is down.
ShortCoder 08 May 2004, 08:32
Using a disassembler, and hand-assembling hex codes into files, then disassembling them, I was able to find out most of the hex codes (the only ones missing might be some SSE or SSE2, whatever that disassembler couldn't handle).
I saved that list on a CD somewhere (oh gosh, I HOPE I did, because that took me like a WEEK to do), but it is fairly complete. I started with a first opcode of 00 and worked up to FF sequentially, and so on with the second byte, third byte, etc., until I had worked it out. I think I ended up just writing out patterns once I found them because, otherwise, the list would be too HUGE (it is big enough already). It should be on a CD somewhere that I have, so I'll look when I've got the time, which might be a while, especially considering I've got a lot of CDs and I'm not exactly sure what I saved on which. If you can find it somewhere else first, be my guest, or try the disassembler approach like I did. It took me about a week doing it that way, so gauge from that how worthwhile it is to you.
vid 08 May 2004, 17:55
Machine code isn't that complicated (just bloated by IBM and Intel, thank you guys). 1-byte instructions with a register-only operand usually have 3 bits that specify the register:
000:ax 001:cx 010:dx 011:bx 100:sp 101:bp 110:si 111:di (I hope).
Other instructions with one operand, and instructions with 2 operands, have the first byte followed by another byte, whose format is almost always the same:
lower 3 bits: operand 1 (or the operand that accesses memory)
next 3 bits: operand 2; on instructions with only one operand it is unused / chooses 8-bit operand / chooses another instruction / chooses types of operands
top 2 bits: addressing mode, 0 = memory, 1 = memory + byte, 2 = memory + word, 3 = use register
An operand can be a register (table above) or memory (optionally + a constant called "displacement"). In 16-bit code, there are 8 addressing types (3 bits for the operand => 2^3 = 8). I don't remember the values of these addressing types, sorry. In 32-bit code, if the memory operand (its 3 bits) contains a special value, then the next byte of the instruction specifies an extended addressing mode, allowing addressing by any combination of registers, scaled by *2, *4 or *8. Then follows an optional displacement and then an optional constant argument to the instruction.
I am not sure all of this information is correct, but the main idea is. In the future I will probably add something on this to my tutorial, but that is the far future. Maybe I could write another article (like the article on the preprocessor) about this, if enough people encourage me to (e.g. if it is wanted).
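As a rough sketch of the second-byte layout described above, the following Python splits a ModRM byte into its three fields and names the memory or register operand under 16-bit addressing. The function names are made up for illustration, and the eight 16-bit addressing forms in `RM16` come from the Intel manual rather than from this post.

```python
# 16-bit register names indexed by their 3-bit code (the table above).
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

# 16-bit memory forms indexed by the r/m field. Assumption: these values
# are taken from the Intel manual, not from the thread.
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]


def split_modrm(byte):
    """Split a ModRM byte into (mod, reg, rm): top 2, middle 3, low 3 bits."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def describe_rm16(byte):
    """Describe the r/m operand of a ModRM byte under 16-bit addressing."""
    mod, _reg, rm = split_modrm(byte)
    if mod == 0b11:
        return REG16[rm]              # mod=11: the operand is a register
    if mod == 0b00 and rm == 0b110:
        return "[disp16]"             # special case: bare 16-bit address
    suffix = {0b00: "", 0b01: "+disp8", 0b10: "+disp16"}[mod]
    return "[" + RM16[rm] + suffix + "]"


if __name__ == "__main__":
    for b in (0x00, 0x44, 0x06, 0xD9):
        print(hex(b), split_modrm(b), describe_rm16(b))
```

The middle (reg) field is returned but not interpreted here, since its meaning depends on the opcode: a second register for two-operand instructions, or an opcode extension for one-operand ones.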
ShortCoder 09 May 2004, 00:29
Here it is, found it! It is fairly complete, I think, but, like I said, it is probably missing some SSE2 or SSE functions (whatever the disassembler I used didn't know of).
I am pleased to be able to contribute to the community in this way. You will see lots of "..." entries; that is where it follows the pattern of the previous section (or a previous section). It should be pretty intuitive, and it cuts the size of the file a LOT. It took about a week to do, so I hope it helps you :)
ShortCoder 09 May 2004, 00:53
Obviously, if anyone knows the opcodes for any of the omissions there (the ones marked <undefined>), please post the opcode and the omitted instruction here so I'll know as well :)
Also, it is a "best-faith" effort and, like the TODO says, I didn't test for signed/unsigned numbers, so I'm not sure which of those are for signed and which for unsigned. For a single byte, as long as the number is 7F or below it will be the same regardless, but from 80 and up you'll have problems if you use the wrong one. If anyone wants to comment on which are signed and which are unsigned, that would help tremendously (and make it so I don't have to go through and do that ;))
roticv 09 May 2004, 02:12
Huh? What do you mean by testing for signed and unsigned numbers? It does not matter.
Anyway, there are quite a number of opcodes you missed, like cmovx, bswap, and the dword conditional jumps. I do not know if Tomasz wants me to post this link, but http://aod.anticrack.de/ might explain how opcodes are encoded. The encoding of opcodes can be found in the Intel manual, A1 to B52.
ShortCoder 09 May 2004, 06:02
roticv wrote: Huh? What do you mean by test for sign and unsigned numbers? It does not matter.
Actually, it makes a big difference whether an operation is the signed or the unsigned version. For example, suppose we wish to add 175 and 20, and suppose the size we are working with is a byte.
Choose unsigned, and this works: 175 in hex becomes AF and 20 becomes 14; added, you get C3, which is 195 in decimal, what you'd expect.
Choose signed, though, and you'll have a MUCH different result! Adding AF and 14 in signed will give you BD (NOT C3!!!), which is then equal to -67. Reason? Because AF in signed is equal to -81 decimal and 20 - 81 = -67, or BD in hex.
roticv 09 May 2004, 06:45
That I know but how does it affect opcodes? Not very sure what you want.
vid 09 May 2004, 11:17
ShortCoder: how can you add 175 and 20 with signed bytes? 175 can't fit into a signed byte.
Anyway, you are wrong: AF+14 is C3 regardless of whether the numbers are signed or not. You made a mistake when adding AF + 14 signed: you added decimal 14, which is 0E in hex, and that gives you BD (AF+0E = BD). When adding and subtracting, it doesn't matter whether the numbers are signed or not.
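The arithmetic in this exchange can be checked with a short Python sketch. The helper names `add8` and `as_signed8` are invented for this illustration; the only assumption is ordinary 8-bit two's-complement wraparound.

```python
def add8(a, b):
    """Add two bytes the way an 8-bit adder does: keep only the low 8 bits."""
    return (a + b) & 0xFF


def as_signed8(x):
    """Reinterpret a byte value 0..255 as a two's-complement number -128..127."""
    return x - 0x100 if x & 0x80 else x


if __name__ == "__main__":
    result = add8(0xAF, 0x14)
    # One bit pattern, two readings of it.
    print(hex(result))           # the raw byte
    print(result)                # read as unsigned
    print(as_signed8(result))    # read as signed
    # The slip discussed above: adding decimal 14 (hex 0E) instead of hex 14.
    print(hex(add8(0xAF, 0x0E)))
```

The adder produces one result byte; signedness only affects how the operands and the result are read afterwards (and which flags a program later tests), so ADD and SUB need no separate signed and unsigned opcodes.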
ShortCoder 09 May 2004, 13:12
vid wrote: ShortCoder: how can you add 175 and 20 with signed bytes. 175 can't fit into signed byte.
You're right. Hmmm... I'm not really quite sure what I was thinking then. Well, if there is no difference between signed and unsigned from a binary perspective, then what now puzzles me is why there are multiple opcodes for what is, apparently, the exact same instruction (not counting the cases where one set of opcodes reverses the left and right operands and both operands happen to be the same, so the result is the same as well).
roticv 09 May 2004, 14:08
First, there are some opcodes that are optimised to be smaller when certain registers are used. For instance, xchg eax, edx can be encoded as 1 byte, while xchg ebx, ecx is encoded as 2 bytes. The same applies to jumps and conditional jumps: jmp can be encoded in its byte-displacement and dword-displacement variants. So why are there 2 different methods of encoding? I am not sure, but personally I think it is to optimise for size.
For the other question, take adc reg,reg as an example. adc is supposed to be encoded as 0001 00dw : mod/r/m, where w and d are bits.
w tells the processor whether you are dealing with dword/word (w=1) or byte (w=0) in 32-bit mode, and word (w=1) or byte (w=0) in 16-bit mode.
d tells the processor whether it is adc rrr,mmm (d=1) or adc mmm,rrr (d=0) (refer below).
modrm is encoded as xx:rrr:mmm, where x, r and m are all bits. xx is the mod, telling the processor how to treat the values in rrr and mmm. rrr is always a register, while mmm can be a register or can refer to memory.
if xx = 11b, both rrr and mmm are registers.
if xx = 10b, it would be adc rrr,[reg+dword] for 32-bit. The encoding for 16-bit is different, but I will not explain it because I am not very sure of it and it is hard to explain.
if xx = 01b, it would be adc rrr,[reg+byte].
if xx = 00b, it would be adc rrr,[reg]. The encoding of adc reg,[memory] is a special case of xx = 00b, but I will not go deeper into it.
Because of the d flag and mod 11, it is possible for 2 opcodes to mean the same thing.
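To make the last point concrete, here is a small Python sketch that builds both encodings of a register-to-register adc from the 0001 00dw pattern. It assumes 16-bit mode and word-sized registers (w=1); the function name `encode_adc_rr16` is hypothetical.

```python
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def encode_adc_rr16(dst, src, d):
    """Encode 'adc dst,src' for 16-bit registers using the chosen d bit."""
    w = 1                                  # word-sized operands
    opcode = 0b00010000 | (d << 1) | w     # 0001 00dw
    if d == 0:
        # adc r/m,reg: destination sits in the r/m field, source in reg.
        reg, rm = REG16[src], REG16[dst]
    else:
        # adc reg,r/m: destination sits in the reg field, source in r/m.
        reg, rm = REG16[dst], REG16[src]
    modrm = (0b11 << 6) | (reg << 3) | rm  # mod=11: both are registers
    return bytes([opcode, modrm])


if __name__ == "__main__":
    print(encode_adc_rr16("ax", "bx", d=0).hex().upper())
    print(encode_adc_rr16("ax", "bx", d=1).hex().upper())
```

Both byte pairs describe the same operation; an assembler simply picks one of them, which is one source of the duplicate entries seen when enumerating opcodes byte by byte.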
ShortCoder 09 May 2004, 14:57
Well, for example, what is the difference, then, between a starting hex code of 82 and a starting hex code of 83?
82 00 00 add word [bx+si],0000
83 00 00 add word [bx+si],0000
As far as I can tell, 82 hex and 83 hex behave the same, but I can't believe they would be the same. Why? What is the purpose? And yes, I realize there are quite a few which have quicker variants that rely on the accumulator register being one of the registers in the instruction, but I mean even besides those. There are more like this; this was just an example.
roticv 09 May 2004, 15:38
It should be
82 00 00 add byte [bx+si], 0
83 00 00 add word [bx+si], 0
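A hedged Python sketch of how the low two bits of opcodes 80 to 83 select the operand and immediate sizes, assuming 16-bit mode. The bit names s and w follow Intel's 1000 00sw notation; the function names are invented for this example.

```python
def group1_imm_form(opcode):
    """Return (operand_size, immediate_bytes, sign_extended) for opcodes 80..83."""
    if not 0x80 <= opcode <= 0x83:
        raise ValueError("not one of the opcodes 80..83")
    w = opcode & 1           # 0 = byte operand, 1 = word operand
    s = (opcode >> 1) & 1    # 1 = immediate is a byte that gets sign-extended
    if w == 0:
        # Byte operand with a byte immediate: s changes nothing,
        # so 82 decodes the same way as 80.
        return ("byte", 1, False)
    if s:
        return ("word", 1, True)   # 83: word operand, sign-extended imm8
    return ("word", 2, False)      # 81: word operand, full imm16


def sign_extend8_to_16(imm8):
    """Widen a byte immediate to 16 bits by copying its top bit upwards."""
    return imm8 | 0xFF00 if imm8 & 0x80 else imm8


if __name__ == "__main__":
    for op in range(0x80, 0x84):
        print(hex(op), group1_imm_form(op))
    print(hex(sign_extend8_to_16(0x7F)), hex(sign_extend8_to_16(0x80)))
```

This is the one place in the thread where signedness does touch the encoding: with 83, an immediate byte of 80 or above is widened to FF80 and up, so it acts as a negative word.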
vid 09 May 2004, 19:44
ShortCoder:
One possibility is instructions with two register operands. Look at my brief description of the encoding: I said that in a 2-operand instruction there is one "main" operand (which can be a register or memory, called "rm") and another operand (register only, called "r"). But nothing there tells which operand is the main one and which is the other, for example
mov [bx],ax ;rm,r
mov ax,[si+5] ;r,rm
No bit in the second byte (where the rm and r values are stored) tells you which is which. For that reason, there is a separate opcode for "mov rm,r" and another opcode for "mov r,rm". And for an instruction of type "mov r,r", both of these opcodes can be used. This doesn't apply only to mov, but also to "add", "sub", "xchg", "and", etc.; there is a basic set of instructions (the 8086 instructions) which all follow the same rules.
Jaques 16 Jun 2004, 23:49
Where can I find a GOOD raw binary editor?
-- sorry if I'm an idiot
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.