flat assembler
Message board for the users of flat assembler.

To build a new processor from scratch

hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode
While studying opcode encoding, I found that the Intel docs read like a stratification of years of unresolved problems. They say nothing about where to begin, or how to follow the opcodes, in order to decode them. They are literally chaos per se.

OK, the processor runs fine! But is there no other way to run the same instructions at a lower cost in resources/performance/logic?

I am reading several Length Disassembler Engines (LDEs) to learn how to decode instructions (mainly to disassemble them).

Assume this:
I have nothing against prefixing 16-bit code on a 64-bit machine. Backward compatibility is not the problem.

But feel free to correct me if I am mistaken (please give evidence for what you say):

Is it not weird/stupid that after checking the prefix
0Fh (for 2-byte opcodes), with no match at all,
one must check the prefixes for 3-byte opcodes 38-3A-7A-7B?

...and then check again for group 3 (Eb/Ev), F6/F7, which
is essentially a 1-byte opcode?

I do not know how to build a processor. If I could, I would let all the software written until now run in an interpreted (read: almost "debugged") "legacy" processor shell, and new software natively, where the logic takes advantage of current knowledge and of years of experience.

As an example of bad design, take the Y2K bug.
Do we have any experience with this bug? If yes,
think about the concept of the REX/DREX byte...
on a 64-bit machine...

Intel docs wrote:
REX prefixes are instruction-prefix bytes used in 64-bit mode. They do the following:
• Specify GPRs and SSE registers.
• Specify 64-bit operand size.
• Specify extended control registers.
Not all instructions require a REX prefix in 64-bit mode...

If we... run on a 64-bit machine, we need backward compatibility, not a way
to specify native 64-bit operand size...
Is it for emulation purposes on a 32-bit machine?
Are you laughing out loud?
Are you thinking?
Post 19 Jun 2009, 05:06
bitRAKE



Joined: 21 Jul 2003
Posts: 2915
Location: [RSP+8*5]
bitRAKE
The whole advancement of x86 has been kludge on kludge.

It's not so bad really,
Code:
QWORD.atouiW:
        xor eax,eax             ; 33 C0
        jmp .0                  ; EB 0D
@@:     cmp edx,9               ; 83 FA 09
        ja @F                   ; 77 14
        lea rax,[rax+rax*4]     ; 48 8D 04 80
        lea rax,[rdx+rax*2]     ; 48 8D 04 42
.0:     movzx edx,word [rcx]    ; 0F B7 11
        add rcx,2               ; 48 83 C1 02
        sub edx,'0'             ; 83 EA 30
        jae @B                  ; 73 E7
@@:     retn                    ; C3
(Three more bytes than the 32-bit version.)
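
For readers who do not follow the listing at a glance, here is a rough Python model of what the routine appears to do: it reads 16-bit characters from the pointer in rcx and accumulates decimal digits in rax until the first non-digit. The names `atoui_w` and `utf16_units` are made up for this sketch, and the 64-bit wrap-around is my reading of the two lea instructions, not something stated in the post.

```python
MASK64 = (1 << 64) - 1  # rax is a 64-bit register, so the sum wraps modulo 2**64


def utf16_units(text):
    """Split a string into 16-bit code units, as `word [rcx]` would read them."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def atoui_w(units):
    """Model of QWORD.atouiW: parse the leading decimal digits of a wide string."""
    acc = 0                                  # xor eax,eax (also clears the upper half of rax)
    for unit in units:                       # movzx edx,word [rcx] / add rcx,2
        if unit < ord("0"):                  # sub edx,'0' borrows, jae falls through to retn
            break
        digit = unit - ord("0")
        if digit > 9:                        # cmp edx,9 / ja @F
            break
        acc = (acc * 10 + digit) & MASK64    # lea rax,[rax+rax*4] then lea rax,[rdx+rax*2]
    return acc


if __name__ == "__main__":
    print(atoui_w(utf16_units("12345\0")))
```

The three 48h bytes in the listing are the REX.W prefixes on the two lea instructions and on add rcx,2; they account for the three extra bytes.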

The current design is very good at predicting code fetch, so the optimization needed in the decode stage has been minimal. IIRC, Intel already has patents for caching macro/micro OPs (instead of the current instruction cache) to reduce decode pressure, but doesn't use them.
Post 19 Jun 2009, 06:18
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode
bitRAKE wrote:
(Three more bytes than the 32-bit version.)

OK, that's not so bad.
And... I will take the positive side of the whole thing.
Thank you for your answer.
Post 19 Jun 2009, 12:21
bitRAKE



Joined: 21 Jul 2003
Posts: 2915
Location: [RSP+8*5]
bitRAKE
I had more time to think about your post...
Quote:
Is it not weird/stupid that after checking the prefix
0Fh (for 2-byte opcodes), with no match at all,
one must check the prefixes for 3-byte opcodes 38-3A-7A-7B?

...and then check again for group 3 (Eb/Ev), F6/F7, which
is essentially a 1-byte opcode?
Shouldn't prefix bytes just be consumed until a non-prefix byte is present - regardless of the instruction? Then raise an error or backtrack when the prefixes invalidate the instruction. The processor responds to invalid prefixes either by ignoring them (REP) or by raising an exception (LOCK).

I've always imagined the processor is as you describe:
Quote:
If I could, I would let all the software written until now run in an interpreted (read: almost "debugged") "legacy" processor shell, and new software natively, where the logic takes advantage of current knowledge and of years of experience.
...after fetch/decode the processor translates instructions to run on an efficient 64-bit core. Many legacy instructions are 'interpreted' this way. IIRC, the Pentium was the first x86 processor to execute instructions on parallel pipelines (not counting the FPU, which was a separate chip until its integration in the i486) - Wikipedia confirms it, and calls it superscalar. The translation of x86 instructions into internal micro-ops came with the Pentium Pro, and they continue to evolve this design feature.

Quote:
If we... run on a 64-bit machine, we need backward compatibility, not a way
to specify native 64-bit operand size...
Are you suggesting 64-bit data size should be the default? AMD decided smaller accesses were more common (might be historical bias). It does seem awkward next to the 64-bit address default and the limited use of the upper dword (which gets cleared all the time). Larger data accesses requiring longer instructions seems okay - MMX/SSE is that way as well.
Post 21 Jun 2009, 18:22
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode
The matter is complex. I will print the thread to think it over calmly. For example,
Quote:
Shouldn't prefix bytes just be consumed until a non-prefix byte is present - regardless of the instruction?

Yes.
And the question is how to do that check within Intel's prefix mechanism.

They developed the prefixes 66h/67h/F2h/F3h. Then, for new SIMD instructions like
CVTDQ2PD, which is considered a 2-byte opcode instruction, they have this:

Chapter 2.1.2 (Instruction Set Reference, A-M)
Quote:

A mandatory prefix (66H, F2H, or F3H), an escape opcode byte, and a second
opcode byte (same as previous bullet)
For example, CVTDQ2PD consists of the following sequence: F3 0F E6. The first byte
is a mandatory prefix (it is not considered as a repeat prefix).


They forgot (thank God!) about 67h this time!
67h is IMHO still a good prefix because
1) it is there for backward compatibility
2) they have not touched it again yet.

And now, what goes against common sense:

F3h was REP (a prefix!) on old machines; now it can be a mandatory prefix, so
one has to read the following bytes before knowing whether the previous byte really is a REP prefix.

On old machines several things could follow F3, but F3 itself could be discarded.
On new machines several things can follow F3, but F3 cannot simply be neglected/discarded as an invalid/superfluous REP!

This invalidates the position you state:
Quote:
Shouldn't prefix bytes just be consumed until a non-prefix byte is present - regardless of the instruction?

It cannot be "regardless", because the mandatory prefix of CVTDQ2PD is by design part of the instruction.

I am not against their inventiveness; they can invent what they like.
But mine is a common-sense concept:

If I call the USA, I dial
PREFIX (USA) 001
PREFIX (STATE) 213 Hollywood
NUMBER YYYYYYY

I dial 001-213-YYYYYYY to call Hollywood.
But what if one day someone decides that 001-415 remains Palo Alto when calling from Germany,
with 001 discarded when I call from Palo Alto, while 001-213 becomes a mandatory prefix for a hotline?

Quote:
Shouldn't prefix bytes just be consumed until a non-prefix byte is present - regardless of the instruction?

Should I dial Hollywood by its prefixes regardless of the number 001-213?
What if I call Hollywood from Palo Alto? Which prefix should I dial? 213?

Regards,
hopcode
Post 22 Jun 2009, 02:47
bitRAKE



Joined: 21 Jul 2003
Posts: 2915
Location: [RSP+8*5]
bitRAKE
By "consume" I do not imply that the prefix should be discarded or ignored. Once a non-prefix byte is reached, the analysis of the prefixes is performed. I do understand, though, how this makes disassembly more difficult than if prefixes were a simple binary switch.

F3 66 F3 47 0F E6 3C F7 cvtdq2pd xmm15,[r15+r14*8]

...executes just fine! The prefixes prior to E6 select the instruction: first 0F (special); REX bytes are prefixes in long mode; and then F3 (of the group 66 F2 F3). IMHO, handling 0F as a special type of prefix works better because E6 3C is a valid instruction.
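
The "consume first, analyse afterwards" order can be sketched in Python as below. This is only an illustration under stated assumptions: the function names are invented, only the mod=00 + SIB addressing form of this one example is handled, and the rule that the last F2/F3 seen wins over 66 is my understanding of how the mandatory prefix is selected, not something the post spells out.

```python
# Legacy prefix bytes that may precede an opcode.
LEGACY_PREFIXES = {
    0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65,   # segment overrides
    0x66, 0x67,                           # operand-size, address-size
    0xF0, 0xF2, 0xF3,                     # LOCK, REPNE, REP
}


def scan_prefixes(code):
    """Consume prefix bytes first; interpret them once the opcode is known."""
    pos, legacy = 0, []
    while pos < len(code) and code[pos] in LEGACY_PREFIXES:
        legacy.append(code[pos])
        pos += 1
    rex = 0
    if pos < len(code) and 0x40 <= code[pos] <= 0x4F:   # REX (long mode only)
        rex = code[pos]
        pos += 1
    escape = pos < len(code) and code[pos] == 0x0F      # two-byte opcode map
    if escape:
        pos += 1
    if pos >= len(code):
        raise ValueError("ran out of bytes before reaching an opcode")
    # Assumption: on the 0F map the last F2/F3 seen takes priority over 66.
    mandatory = None
    if escape:
        reps = [p for p in legacy if p in (0xF2, 0xF3)]
        mandatory = reps[-1] if reps else (0x66 if 0x66 in legacy else None)
    return {"legacy": legacy, "rex": rex, "escape": escape,
            "mandatory": mandatory, "opcode": code[pos], "modrm_at": pos + 1}


def sib_operands(code, info):
    """Decode reg and [base+index*scale] for the mod=00, rm=100 (SIB) form only.

    The special cases (no base when base=101, no index when index=100) are
    deliberately left out of this sketch.
    """
    rex = info["rex"]
    modrm, sib = code[info["modrm_at"]], code[info["modrm_at"] + 1]
    if modrm >> 6 != 0 or modrm & 7 != 4:
        raise ValueError("this sketch only handles mod=00 with a SIB byte")
    reg = ((modrm >> 3) & 7) | ((rex >> 2) & 1) << 3    # REX.R extends reg
    index = ((sib >> 3) & 7) | ((rex >> 1) & 1) << 3    # REX.X extends index
    base = (sib & 7) | (rex & 1) << 3                   # REX.B extends base
    return reg, base, index, 1 << (sib >> 6)


if __name__ == "__main__":
    raw = bytes.fromhex("F366F3470FE63CF7")
    info = scan_prefixes(raw)
    print(info, sib_operands(raw, info))
```

For the byte sequence above the sketch selects F3 as the mandatory prefix of 0F E6 and yields register 15 with base 15, index 14 and scale 8, which matches cvtdq2pd xmm15,[r15+r14*8].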

Hopefully, no one got the bright idea to align instructions with bogus prefixes (AMD) - that would kill forward compatibility with the introduction of the more advanced decode semantics.
Post 22 Jun 2009, 13:22
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode
Quote:

F3 66 F3 47 0F E6 3C F7 cvtdq2pd xmm15,[r15+r14*8]
...executes just fine!
Olly doesn't understand it, but PEBrowse (32-bit) ...

0x403000: F366 REPZ
0x403002: F347 REPZ INC EDI

and...surprise!!

0x403004: 0FE63CF7 CVTTPD2DQ XMM7,XMMWORD PTR [ESI*8+EDI]

It understands 99% of the 64-bit code.
This shows that the 4 bytes F3 66 F3 47 are only a waste of resources.

Quote:
no one got the bright idea to align instructions with bogus prefixes
I will tell you who had this idea.

Think of POP CS (valid on the 8086/8088): it is 0Fh!
It would later become the famous prefix 0Fh. I imagine
that they eliminated it because they needed a simple nibble.
OK, it was no longer supported, and that is good. But they
left 40h->5Fh untouched: INC/DEC/PUSH/POP for all r32 registers.

5Fh - 40h + 1 = 32 lost opcodes in the main table
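
To make the count concrete, here is a small Python sketch of that block of the one-byte opcode map, plus the REX reading that 40h-4Fh gets in 64-bit mode. The function names are invented for illustration; the register order is the usual x86 encoding order.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
OPERATIONS = ["inc", "dec", "push", "pop"]   # 40h-47h, 48h-4Fh, 50h-57h, 58h-5Fh


def decode_short_form(opcode):
    """Decode a one-byte INC/DEC/PUSH/POP r32 opcode (legacy/32-bit reading)."""
    if not 0x40 <= opcode <= 0x5F:
        raise ValueError("not a short-form opcode")
    return f"{OPERATIONS[(opcode - 0x40) >> 3]} {REG32[opcode & 7]}"


def rex_bits(byte):
    """In 64-bit mode 40h-4Fh is read as a REX prefix with layout 0100WRXB."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("not a REX prefix")
    return {"W": (byte >> 3) & 1, "R": (byte >> 2) & 1,
            "X": (byte >> 1) & 1, "B": byte & 1}


# Every opcode in the block, in table order.
SHORT_FORMS = [decode_short_form(op) for op in range(0x40, 0x60)]

if __name__ == "__main__":
    print(len(SHORT_FORMS), SHORT_FORMS[0], SHORT_FORMS[-1])
    print(rex_bits(0x48))
```

So half of the block (the INC/DEC forms) is exactly what AMD64 reclaimed for REX, while 50h-5Fh stay PUSH/POP.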

They did not implement them with a MOD/RM byte, for example
with a single opcode 40h + MOD for the instructions and r32 regs! So
for the instruction cvtdq2pd, all the prefixes come before the opcode.
All bogus! Why? They could have implemented it with
postfixes, because

1) it doesn't need backward compatibility (PEBrowse shows it 99% correct)
2) F3 (REP) was not eliminated from the table, unlike the POP CS instruction.

And consider that in 64-bit mode they could have implemented the same instruction
just by reusing the retired ones, BOUND/AAM etc.

It executes fine, but it doesn't follow the economy of technology.
(Please take this last statement of mine as not necessarily negative.)

Regards,
hopcode
Post 23 Jun 2009, 11:35
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode
I have had an idea (perhaps one that already exists).

All processors I know have these peculiarities,
and they always show the same difficulties:

1) they do not distinguish data from code
2) they try to cache data and/or code

but they have to

1a) work with different instruction lengths/data sizes
2a) execute branching algos etc. to improve speed/caching

I am wondering whether there exists a processor,
a "structured" processor I would say, whose opcodes
have the following feature, for example for 32 bits:

a prefix of 2 bytes (it could be 4 too, why not?)

Code:
|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|  bits in the 2-byte prefix
|___________________________________|___________|
  i) instruction part (12 bits)       d) description part (4 bits)
    


In 4 bits we could describe the type of i)
and of the following b) body of the instruction (4 bytes).
So the instruction ALWAYS has a length of 4+2=6 bytes.

As an example,
the d) part can take 16 values (0->15) carrying information/flags:
Code:
0000 -> data follows in the b) part
0001 -> code follows in the b) part
0010 -> mix of data+code in the b) part

0100 -> RAW POINTER TO INSTRUCTION follows in the b) part,
        that is to say, for example, one of reg/reg type (mov eax,ebx)
        they are NOT OPCODED! they are POINTERS TO ALREADY stored
        opcodes/operations, "called" from a list in an
        INSTRUCTION DESCRIPTOR TABLE (similar to interrupts)
        In this case, 4 bytes could describe a lot of registers
        and information/packed instructions

1000 -> OPCODED instruction follows in the b) part,
        as we know them today on processors.
....
    

If, for example, I find a word with the 4 LSBits set to
0000
I will know that in the b) part I will find only data
from a presumably data section; if it is reached anyway as code
to be executed, the result is an exception.

This will be important for these reasons:

1r) strong structuring of the program
2r) more security
3r) no mind-bending algos to perform caching
4r) PROGRAMMERS optimize their code
5r) ...all the advantages of a "structured" execution
instead of the slow parsing of an unstructured/error-prone
sequence of opcodes

Does such a "structured" processor exist?

Regards,
hopcode
Post 02 Sep 2009, 01:55
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17287
Location: In your JS exploiting you and your system
revolution
The Itanium has a 5 bit prefix before each bundle to describe the contents of the following 123 bits (128 bits total).
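
As a rough illustration of that layout, the following Python sketch splits a 128-bit bundle into its template and three 41-bit instruction slots (5 + 3*41 = 128). It assumes the template sits in the lowest 5 bits with the slots following in ascending order, which is how I understand the Itanium bundle format; the function names are invented here.

```python
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def split_bundle(bundle):
    """Split a 128-bit bundle (as an int) into (template, [slot0, slot1, slot2])."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("a bundle is exactly 128 bits")
    template = bundle & 0x1F                       # 5-bit template describes the slots
    slots = [(bundle >> (5 + SLOT_BITS * n)) & SLOT_MASK for n in range(3)]
    return template, slots


def make_bundle(template, slots):
    """Inverse of split_bundle, used here only to build a test value."""
    value = template & 0x1F
    for n, slot in enumerate(slots):
        value |= (slot & SLOT_MASK) << (5 + SLOT_BITS * n)
    return value


if __name__ == "__main__":
    packed = make_bundle(0x10, [1, 2, 3])
    print(split_bundle(packed))
```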
Post 02 Sep 2009, 03:00
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode
revolution wrote:
The Itanium has a 5 bit prefix before each bundle to describe the contents of the following 123 bits (128 bits total).

Itanium is recent (but I don't know anything about it).
...So Intel itself has already realized a sort of fixed prefixing...
I cannot believe it!

(btw, thanks for the interesting links "From Sand to Silicon: the Making of a Chip")

Regards,
hopcode
Post 02 Sep 2009, 03:41
booter



Joined: 08 Dec 2006
Posts: 67
booter
hopcode wrote:
I have had an idea (perhaps one that already exists).
I am wondering whether there exists a processor,
a "structured" processor I would say, whose opcodes
have the following feature, for example for 32 bits:

IBM mainframe (360,370,etc.) BTW, its assembler is 1000 times easier!
Post 28 Dec 2009, 11:24
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode
I have browsed the net, and here is what I have found:

for IBM S/390 Assembler tutorial
http://csc.colstate.edu/woolbright/WOOLBRIG.htm

for IBM S/370 Assembler tutorial
http://cbttape.org/~jmorrison/s370asm/html/tut-contents.html

That is very interesting.

booter wrote:
its assembler is 1000 times easier

At first glance: yes and no (no, because I am not used to it).
For example, from this table:
Code:
BC B"1000",LAB  or  BC 8,LAB   Branch if condition code is zero
BC B"0100",LAB  or  BC 4,LAB   Branch if condition code is one
BC B"0010",LAB  or  BC 2,LAB   Branch if condition code is two
BC B"0001",LAB  or  BC 1,LAB   Branch if condition code is three
BC B"1001",LAB  or  BC 9,LAB   Branch if condition code is three or zero
BC B"1011",LAB  or  BC 11,LAB  Branch if condition code is three, two or zero
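
The rule behind the table is that each of the four mask bits stands for one condition code, with the leftmost bit meaning condition code zero. A short Python sketch (function names invented for illustration) reproduces the rows above:

```python
def bc_taken(mask, condition_code):
    """True if BC with this 4-bit mask branches for the given condition code (0-3)."""
    if not 0 <= mask <= 15 or not 0 <= condition_code <= 3:
        raise ValueError("mask is 4 bits, condition code is 0..3")
    return bool(mask & (0b1000 >> condition_code))   # leftmost mask bit <-> CC 0


def branch_conditions(mask):
    """List every condition code for which BC mask,LAB branches."""
    return [cc for cc in range(4) if bc_taken(mask, cc)]


if __name__ == "__main__":
    for mask in (8, 4, 2, 1, 9, 11):
        print(f"BC {mask:2d},LAB branches on condition codes {branch_conditions(mask)}")
```

A mask of 15 therefore branches always and a mask of 0 never does.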
    

NO: it is not so intuitive
Code:
BC B"1000",LAB  or  BC 8,LAB   Branch if condition code is zero    

but YES: 2 branch conditions on one line, adding
Code:
 BC B"1001",LAB  or  BC 9,LAB   ;Branch if condition code is three or zero
 A R5,20(R7,R8)  ;the word 20 bytes past the sum of R7 and R8 is added to R5
As for the generated opcodes, it is just what I asked for. Reading them is simpler
than reading the well-known Intel ones, as we can see from here

http://csc.colstate.edu/woolbright/INSTORG.HTM
and
IBM S/360 "an imaginary journey through the evolution of the S/370 hardware design project"
http://cbttape.org/~jmorrison/s370asm/html/tut-S370-design-001.html
The instruction format is highly structured.
One could learn to read opcodes by heart in a few days!!
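
A small Python sketch shows how regular it is. The instruction length follows from the top two bits of the first byte, and an RX-format instruction such as A R5,20(R7,R8) splits into fixed nibbles. The opcode value 5Ah for A (Add) and the byte string used below are my own encoding of that example, not taken from the linked tutorials, and the function names are invented.

```python
def s370_length(first_byte):
    """Instruction length in bytes, taken from the top two bits of the opcode."""
    top = first_byte >> 6
    if top == 0b00:
        return 2          # RR format
    if top == 0b11:
        return 6          # SS format
    return 4              # RX, RS, SI formats


def decode_rx(insn):
    """Split a 4-byte RX instruction into (opcode, R1, X2, B2, D2)."""
    if len(insn) != 4:
        raise ValueError("RX instructions are 4 bytes long")
    opcode, r1_x2, b2_dhi, d_lo = insn
    r1, x2 = r1_x2 >> 4, r1_x2 & 0xF          # register and index register
    b2 = b2_dhi >> 4                          # base register
    d2 = ((b2_dhi & 0xF) << 8) | d_lo         # 12-bit displacement
    return opcode, r1, x2, b2, d2


if __name__ == "__main__":
    add = bytes.fromhex("5A578014")           # assumed encoding of A R5,20(R7,R8)
    print(s370_length(add[0]), decode_rx(add))
```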

The whole thing sounds very inspiring to me. As I will have
a couple of free hours, perhaps at the end of January, I will install the Hercules emulator
to test something written in ASM for the IBM S/360/370.

Thank you for your precious reference,

Regards,
hopcode
Post 29 Dec 2009, 05:50
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode
Something else about the weirdness of Intel opcodes...

At an early x86 stage they put different instructions into the group F6,
which do not follow the MOD byte rules; example A:
Code:
 F6C1 10  TEST CL,10
 F6D1     NOT CL
    


But at almost the same time, the developers of the math coprocessor
(more INTEL-ligent persons, I think), starting from the fact that the MOD byte
is a good way to re-purpose already-functional opcodes,
used it, at that same early stage, to encode FPU instructions
as follows; example B:
Code:
 D805 88204000 FADD DWORD PTR DS:[402088]
 D8C1          FADD ST,ST(1)
    

where the MOD byte is fully functional as a MOD byte.
Needless to say, this was a really forward-compatible
feature.

As I said: they had their own reasons to invent such a chaotic patchwork
as shown in example A, in which one must check the MOD byte not in order
to find information about what follows, but to separate the TEST instruction
from the other ones of the group. This is what I call "ballast", to get rid of.
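
For a length disassembler the consequence looks roughly like the Python sketch below: the reg field of the MOD byte has to be inspected just to learn whether an immediate follows. Only the register forms of opcode F6 from example A are handled, slot /1 is left undefined because the documented opcode map does not name it, and the function names are made up for this illustration.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

# Group 3 (opcode F6): the reg field of the MOD byte selects the operation.
GROUP3 = ["test", None, "not", "neg", "mul", "imul", "div", "idiv"]


def split_modrm(byte):
    """Return the (mod, reg, rm) fields of a MOD byte."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


def decode_f6_register_form(code):
    """Return (mnemonic, register, length) for F6 with a register operand."""
    if code[0] != 0xF6:
        raise ValueError("not an F6 group instruction")
    mod, reg, rm = split_modrm(code[1])
    if mod != 3:
        raise ValueError("this sketch only handles register operands (mod=11)")
    name = GROUP3[reg]
    if name is None:
        raise ValueError("reg field /1 is not defined in this sketch")
    # The 'ballast': only TEST carries an immediate byte, so the length of the
    # instruction depends on the reg field, not just on the opcode.
    length = 2 + (1 if name == "test" else 0)
    return name, REG8[rm], length


if __name__ == "__main__":
    print(decode_f6_register_form(bytes.fromhex("F6C110")))
    print(decode_f6_register_form(bytes.fromhex("F6D1")))
    print(split_modrm(0x05), split_modrm(0xC1))   # the two FADD forms of example B
```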

As you know, assembler is not an option.
It is the Way to talk to machines.
And as you know, bad fundamentals lead to squared impossibility!

Paradoxically, by following their patchwork layout, I have found a new way
to read the length of instructions in my rebuilt LDE. I am not publishing it
at the moment because it is in the test phase and because I am too busy in userland.
I think it could be the smallest LDE in the world, but I will only claim that when I
publish it in my di-fasm thread.

Anyway, I remain curious to know their (good/bad) reasons for that patchwork, whatever
they are, because they are (needless to say) part of their success.

Greetings,
hopcode
Post 06 Jan 2010, 05:34
bitRAKE



Joined: 21 Jul 2003
Posts: 2915
Location: [RSP+8*5]
bitRAKE
Similar patchwork can be seen in 680x0 encoding.

http://aggregate.org/ has some research on assembler / compiler creation - looking at several perspectives of the problem.

If we were engineering extensions to x86 from an existing design, we would have detailed knowledge of how each encoding flows through. Every design will have delays to synchronize resources. In some places we would see complex or seldom-used instructions which would not be impacted by neighboring encodings.

Just like managing a group of people to a coordinated goal: some have special talents, others just get the job done, and a few are just plain slow (despite incentive). I wish social scheduling were as simple as one of the computer scheduling algorithms, but they are historically related.

Complexity results for scheduling problems.

Maybe peel off the decoder to reveal the macro-/micro-OPs used by the core - emulating the x86 decoder in software (a la Transmeta (owned one) or Itanium). This could be done every few years as cores abstract away previous encodings. The common abstraction does open the door for other players - despite Intel's success.
Post 06 Jan 2010, 08:13
Copyright © 1999-2020, Tomasz Grysztar.