Hypervisors are an interesting and exciting area of software development that is enjoying quite a bit of attention lately. There are many technical challenges in building hypervisors, the most demanding being 17 problematic instructions in the Intel ISA. However, with the new processors from Intel (VMX) and AMD (SVM), the problems presented by these 17 instructions are solved, paving the way for new kinds of effective and efficient virtualization software. It would be great for the FASM community to build some software using these new processors to show off the power of FASM. I'm currently writing my own hypervisor with FASM (mostly C, with some FASM thrown in at the right places). Once I get further along, I hope to publish some articles in US software developer magazines about my hypervisor and the benefits of using FASM for such work.
vid asked a question about the problematic instructions, and the following was my response. Because of its length, vid asked me to consolidate it into a new thread. So here it is ...
There are 17 instructions in the Intel ISA that need to be addressed in some fashion in order to achieve true virtualization. Until hardware support for virtualization (VMX and SVM) recently became a reality, the preferred method of dealing with these 17 sensitive instructions was to modify the guest OS (a method known as "para-virtualization"). Naturally, to make these modifications you need the OS source code, and you need to recompile the OS. This is not a problem for Linux, but it is a huge problem for Windows, since we don't have access to its source code. So the only practical method left to us for virtualizing Windows is dynamic binary translation of those 17 sensitive instructions: scan the guest code for these instructions and patch them at run time. This causes a very significant performance loss and is part of the reason for the "lag" you see when running Windows in VMware. (Bochs is slower still, because it is a full emulator that interprets every instruction rather than translating only the sensitive ones.)
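To make the "scan and patch" step a little more concrete, here is a minimal C sketch of the matching part only. It assumes the scanner already knows where each instruction starts and that the instruction has no prefixes; a real binary translator has to decode every instruction (prefixes, ModRM, SIB, displacement, immediate) to find those boundaries, because a raw byte search would also match operand bytes. The enum and the function `classify_sensitive` are hypothetical names. It only recognizes the sensitive instructions that can be identified from opcode bytes alone; PUSH/POP and MOV of segment registers, far CALL/JMP/RET and INT n would need extra operand checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sensitive_op {
    OP_NONE, OP_SGDT, OP_SIDT, OP_SLDT, OP_SMSW, OP_STR,
    OP_LAR, OP_LSL, OP_VERR, OP_VERW, OP_PUSHF, OP_POPF
};

/* Hypothetical helper: classify the instruction that starts at code[0].
   Assumes code[0] is a known instruction boundary and there are no prefixes. */
static enum sensitive_op classify_sensitive(const uint8_t *code, size_t len)
{
    assert(code != NULL || len == 0);
    if (len == 0)
        return OP_NONE;
    if (code[0] == 0x9C) return OP_PUSHF;        /* PUSHF/PUSHFD */
    if (code[0] == 0x9D) return OP_POPF;         /* POPF/POPFD */
    if (code[0] != 0x0F || len < 3)
        return OP_NONE;                          /* need 0F, opcode, ModRM */

    uint8_t mod = code[2] >> 6;                  /* 3 = register operand */
    uint8_t reg = (code[2] >> 3) & 7;            /* opcode extension (/digit) */

    switch (code[1]) {
    case 0x00:                                   /* group 6: 0F 00 /digit */
        if (reg == 0) return OP_SLDT;
        if (reg == 1) return OP_STR;
        if (reg == 4) return OP_VERR;
        if (reg == 5) return OP_VERW;
        return OP_NONE;                          /* LLDT/LTR are privileged */
    case 0x01:                                   /* group 7: 0F 01 /digit */
        if (reg == 0 && mod != 3) return OP_SGDT; /* mod 3 encodes other insns */
        if (reg == 1 && mod != 3) return OP_SIDT;
        if (reg == 4) return OP_SMSW;            /* memory or register form */
        return OP_NONE;
    case 0x02: return OP_LAR;                    /* 0F 02 /r */
    case 0x03: return OP_LSL;                    /* 0F 03 /r */
    default:   return OP_NONE;
    }
}
```

For example, the bytes `0F 01 00` (SGDT [eax] in 32-bit code) classify as `OP_SGDT`, while `0F 01 C1` (VMCALL, same opcode but mod = 3) does not.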
In order to support a Type I VMM (hypervisor), a processor must meet three virtualization requirements:
1. The method of executing non-privileged instructions must be roughly equivalent in both privileged and user mode. A processor must not use an additional bit in an instruction word or in the address portion of an instruction when in privileged mode.
2. There must be a method such as a protection system or an address translation system to protect the real system and any other VMs from the active VM.
3. There must be a way to automatically signal the VMM (hypervisor) when a VM attempts to execute a sensitive instruction. It must also be possible for the hypervisor to simulate the effect of the instruction. Sensitive instructions include:
3A. Instructions that attempt to change or reference the mode of the VM or the state of the machine.
3B. Instructions that read or change sensitive registers and/or memory locations, such as the clock register and interrupt registers.
3C. Instructions that reference the storage protection system, memory system, or address relocation system. This class includes instructions that would allow the VM to access any location not in its virtual memory.
3D. All I/O instructions.
Each of the 17 sensitive instructions I mentioned violates requirement 3: it is sensitive in one of the ways listed in 3A - 3D, yet it runs at CPL > 0 without reliably trapping to the hypervisor.
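As a quick reference, the grouping used in the paragraphs that follow can be written down as a lookup table. This is only a sketch: the type and function names are hypothetical, VERR/VERW is counted as one entry (as the text does), and SMSW, PUSHF and POPF are placed under 3B because they are discussed before the text switches to the 3C instructions.

```c
#include <assert.h>
#include <stddef.h>

enum requirement { REQ_3B_SENSITIVE_REGISTER, REQ_3C_PROTECTION_SYSTEM };

struct sensitive_insn {
    const char *mnemonic;
    enum requirement violates;   /* which part of requirement 3 it breaks */
};

static const struct sensitive_insn sensitive_insns[] = {
    { "SGDT", REQ_3B_SENSITIVE_REGISTER },  { "SIDT", REQ_3B_SENSITIVE_REGISTER },
    { "SLDT", REQ_3B_SENSITIVE_REGISTER },  { "SMSW", REQ_3B_SENSITIVE_REGISTER },
    { "PUSHF", REQ_3B_SENSITIVE_REGISTER }, { "POPF", REQ_3B_SENSITIVE_REGISTER },
    { "LAR", REQ_3C_PROTECTION_SYSTEM },    { "LSL", REQ_3C_PROTECTION_SYSTEM },
    { "VERR/VERW", REQ_3C_PROTECTION_SYSTEM }, { "POP", REQ_3C_PROTECTION_SYSTEM },
    { "PUSH", REQ_3C_PROTECTION_SYSTEM },   { "CALL", REQ_3C_PROTECTION_SYSTEM },
    { "JMP", REQ_3C_PROTECTION_SYSTEM },    { "INT n", REQ_3C_PROTECTION_SYSTEM },
    { "RET", REQ_3C_PROTECTION_SYSTEM },    { "STR", REQ_3C_PROTECTION_SYSTEM },
    { "MOV", REQ_3C_PROTECTION_SYSTEM },
};

#define N_SENSITIVE (sizeof sensitive_insns / sizeof sensitive_insns[0])

/* Count how many table entries fall under one part of requirement 3. */
static size_t count_violating(enum requirement r)
{
    size_t n = 0;
    for (size_t i = 0; i < N_SENSITIVE; i++)
        if (sensitive_insns[i].violates == r)
            n++;
    return n;
}
```

With this grouping, 6 entries fall under 3B and 11 under 3C, for 17 in total.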
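Several of the 17 violate requirement 3B (sensitive register instructions), namely SGDT (Store Global Descriptor Table register), SIDT (Store Interrupt Descriptor Table register), and SLDT (Store Local Descriptor Table register). These instructions are normally used only by the OS, but they are NOT privileged in the Intel architecture. Since an Intel processor has only one LDTR, IDTR and GDTR, a problem arises when multiple operating systems try to use the same registers: the hypervisor has to keep its own values in these registers, and because the store instructions do not trap, a guest that executes them reads the hypervisor's values instead of its own.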
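SGDT and SIDT write a "pseudo-descriptor" to memory: in 32-bit code, a 16-bit table limit followed by a 32-bit linear base address, both little-endian (in 64-bit mode the base is 8 bytes). The C sketch below decodes that 6-byte image portably; the struct and function names are hypothetical and the example bytes are made up. A guest that compared the decoded base with the value it passed to LGDT (which is privileged, so a software hypervisor intercepts and emulates it) would see the hypervisor's table instead of its own.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded form of the 6 bytes SGDT/SIDT store in 32-bit code. */
struct pseudo_descriptor32 {
    uint16_t limit;   /* bytes 0-1: table limit */
    uint32_t base;    /* bytes 2-5: linear base address */
};

static struct pseudo_descriptor32 decode_pseudo_descriptor32(const uint8_t buf[6])
{
    struct pseudo_descriptor32 d;
    d.limit = (uint16_t)(buf[0] | (buf[1] << 8));
    d.base = (uint32_t)buf[2] | ((uint32_t)buf[3] << 8) |
             ((uint32_t)buf[4] << 16) | ((uint32_t)buf[5] << 24);
    return d;
}
```

For the bytes `FF 00 00 10 00 C0`, this yields a limit of 0x00FF and a base of 0xC0001000.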
The next sensitive instruction is SMSW (Store Machine Status Word). SMSW stores the machine status word (bits 0 - 15 of CR0) into a general purpose register or memory location. Bits 6 - 15 of CR0 are reserved and are not to be modified, but bits 0 - 5 contain system flags that control the operating mode and state of the processor. Although SMSW only reads the machine status word, it is both sensitive and unprivileged, so it runs at any CPL without trapping. You can see the problem if a guest OS (VM) is running in real mode under a hypervisor (VMM) running in protected mode. If the VM checked the MSW to see if it was in real mode, it would incorrectly see that it was in protected mode (PE bit set) and could halt or shut down instead of running successfully.
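The real-mode scenario above comes down to a single bit test. Here is a small C sketch of the check a real-mode guest might perform on the value SMSW gives it; the bit names follow the architectural CR0 flags, while the function names and the example MSW values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* System flags in bits 0-5 of CR0 (the low part of the MSW). */
#define MSW_PE (1u << 0)   /* Protection Enable */
#define MSW_MP (1u << 1)   /* Monitor Coprocessor */
#define MSW_EM (1u << 2)   /* Emulation */
#define MSW_TS (1u << 3)   /* Task Switched */
#define MSW_ET (1u << 4)   /* Extension Type */
#define MSW_NE (1u << 5)   /* Numeric Error */

/* A guest's "am I still in real mode?" test on the value SMSW returned. */
static bool msw_reports_real_mode(uint16_t msw)
{
    return (msw & MSW_PE) == 0;
}

/* True when what the guest believes and what SMSW reports disagree. */
static bool smsw_misleads_guest(bool guest_in_real_mode, uint16_t msw)
{
    return guest_in_real_mode != msw_reports_real_mode(msw);
}
```

Under a protected-mode hypervisor, the real CR0 has PE set, so `smsw_misleads_guest(true, msw)` is true for any such `msw`.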
The next two sensitive instructions are PUSHF and POPF (and their 32-bit versions, PUSHFD and POPFD). The issue with these instructions is similar to SMSW: pushing the EFLAGS register onto the stack allows examination of the operating mode and state. POPF allows some of the EFLAGS bits to be changed, and which bits can be changed depends on the processor's current operating mode. In real mode, or when operating at CPL 0, all non-reserved flags in the EFLAGS register can be modified except for the VM, VIP, and VIF flags. In virtual-8086 mode, the IOPL must equal 3 to use the POPF instructions (the IOPL field lets an OS set the privilege level needed to perform I/O), and even then POPF cannot change the IOPL field or the VM, VIP, and VIF flags. In protected mode, there are several conditions based on privilege levels. For example, if CPL is greater than 0 and <= IOPL, all flags can be modified except IOPL (and VM, VIP, and VIF). If CPL is greater than IOPL, the IF flag cannot be modified either. In neither case is an exception generated: POPF silently leaves the protected bits unchanged, so a deprivileged guest kernel that uses POPF to re-enable interrupts never learns that its write was ignored, and the hypervisor never sees the attempt.
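The protected-mode POPF rules can be captured as a "writable mask" applied to the popped value. This is a simplified C model, not a full emulation: it covers 32-bit protected mode only (no virtual-8086 mode, no VME/PVI extensions), leaves RF alone, and the function name `popf_model` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FL_IF      (1u << 9)
#define FL_IOPL    (3u << 12)
#define FL_RF      (1u << 16)
#define FL_VM      (1u << 17)
#define FL_VIF     (1u << 19)
#define FL_VIP     (1u << 20)
#define FL_DEFINED 0x003F7FD5u   /* CF..ID, reserved bits excluded */

/* New EFLAGS after POPFD in protected mode, given the old EFLAGS,
   the value popped from the stack and the current CPL. */
static uint32_t popf_model(uint32_t old_eflags, uint32_t popped, unsigned cpl)
{
    assert(cpl <= 3);
    unsigned iopl = (old_eflags & FL_IOPL) >> 12;

    /* VM, VIF and VIP are never changed by POPF; RF is simply kept here. */
    uint32_t writable = FL_DEFINED & ~(FL_RF | FL_VM | FL_VIF | FL_VIP);
    if (cpl > 0)
        writable &= ~FL_IOPL;    /* only CPL 0 may change IOPL */
    if (cpl > iopl)
        writable &= ~FL_IF;      /* silently ignored: no exception is raised */

    return (old_eflags & ~writable) | (popped & writable);
}
```

For instance, a guest kernel really running at CPL 3 with IOPL 0 that pops 0x202 (IF set) over an EFLAGS of 0x2 ends up with 0x2: interrupts stay disabled, and nothing traps.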
The next set of the 17 sensitive instructions violate requirement 3C above (protection system references), namely LAR (Load Access Rights byte), LSL (Load Segment Limit), and VERR/VERW (Verify a segment for reading or writing). The problem with these instructions is that, for data and non-conforming code segments, they all perform the check (CPL <= DPL) AND (RPL <= DPL) during their execution. This condition ensures that the descriptor privilege level (the privilege level of the segment) is numerically greater than or equal to both the current privilege level (located in bits 0 and 1 of the CS and SS registers) and the requested privilege level (bits 0 and 1 of the segment selector being examined). This is a problem because, prior to VMX and SVM, guest kernels don't normally execute at the highest privilege level (CPL 0). For example, 32-bit paravirtualized Xen guests run their kernels at CPL 1 (ring 1); they only "think" they are running at CPL 0. Therefore, if a guest kernel running at CPL 1 executes LAR, LSL, VERR or VERW to examine a segment descriptor with DPL 0, the check fails. The instruction does not fault; it simply clears ZF instead of returning the access rights or limit the guest expects, so the hypervisor never gets a chance to intervene.
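The privilege part of the LAR/LSL/VERR/VERW check reduces to "DPL >= max(CPL, RPL)". Below is a minimal C sketch of just that comparison; it ignores descriptor type and present-bit checks, assumes a data or non-conforming code segment (conforming code segments skip the privilege check), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Would LAR/LSL/VERR/VERW pass the privilege check (ZF = 1)?
   cpl: real current privilege level; sel: selector being examined;
   dpl: DPL of the descriptor that selector refers to. */
static bool lar_privilege_ok(unsigned cpl, uint16_t sel, unsigned dpl)
{
    assert(cpl <= 3 && dpl <= 3);
    unsigned rpl = sel & 3u;                 /* bits 0-1 of the selector */
    unsigned effective = cpl > rpl ? cpl : rpl;
    return dpl >= effective;                 /* on failure ZF is cleared, no fault */
}
```

The same selector (say 0x08, RPL 0, for a DPL 0 segment) passes when the kernel truly runs at CPL 0 but fails when a deprivileged guest kernel at CPL 1 checks it.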
POP and PUSH are also included in this category of problematic instructions for similar reasons. POP cannot be used to load the CS register, since CS contains the CPL. Any value loaded into a segment register must be a valid segment selector. POP is one of the problematic 17 instructions because its checks depend on the value of CPL. If the SS register is being loaded and either the segment selector's RPL or the segment descriptor's DPL is not equal to the CPL, a general protection exception is raised. Furthermore, if the DS, ES, FS, or GS register is being loaded, the segment being pointed to is a data or nonconforming code segment, and either the RPL or the CPL is greater than the DPL, a general protection exception is raised. Therefore, as with LAR, LSL, VERR and VERW, if the VM is actually at CPL 3 (ring 3), these privilege level checks will not behave the way it expects, because it thinks it is running at CPL 0. PUSH has the converse problem: if a process that thinks it's running at CPL 0 pushes CS onto the stack and checks its CPL, it will see that it's running at CPL 3 and may crash.
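The two POP rules above can be written as simple predicates. This C sketch models only the privilege comparisons; it ignores null selectors, descriptor type, present-bit and limit checks, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* POP SS (and MOV SS): #GP unless the selector's RPL and the
   descriptor's DPL both equal the CPL. */
static bool pop_ss_allowed(unsigned cpl, uint16_t sel, unsigned dpl)
{
    assert(cpl <= 3 && dpl <= 3);
    return (sel & 3u) == cpl && dpl == cpl;
}

/* POP DS/ES/FS/GS of a data or non-conforming code segment:
   #GP if RPL > DPL or CPL > DPL. */
static bool pop_data_seg_allowed(unsigned cpl, uint16_t sel, unsigned dpl)
{
    assert(cpl <= 3 && dpl <= 3);
    return (sel & 3u) <= dpl && cpl <= dpl;
}
```

A guest kernel that expects `pop_ss_allowed(0, 0x10, 0)` to succeed gets a fault instead when it is really at CPL 3, because `pop_ss_allowed(3, 0x10, 0)` is false.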
The next set of problematic instructions are CALL, JMP, INT n, and RET. CALL saves procedure linking information on the stack and branches to the procedure given in its destination argument. There are four types of calls: near calls, far calls to the same privilege level, far calls to a different privilege level, and task switches. Task switches and far calls to different privilege levels are a problem for virtualization because they involve CPL, DPL, and RPL. If a far call is executed to a different privilege level, the code segment for the procedure being called has to be accessed through a call gate. A task uses a different stack for every privilege level, so when a far call is made to another privilege level, the processor switches to the stack corresponding to the new privilege level of the called procedure. A task switch operates in a similar manner to a call gate (the main difference being that the target operand of the CALL instruction specifies the segment selector of a task gate instead of a call gate). Both call gates and task gates involve many privilege level checks that compare the CPL and RPL to DPLs. Since the guest kernel is really running at a CPL greater than 0, these checks won't work properly when the guest OS tries to use call gates or task gates that are meant to be reached from CPL 0. The JMP and INT n instructions have similar problems for virtualization (the INT n instruction references the protection system many times during its execution). The RET instruction has the opposite effect of CALL: it transfers control to a return address placed on the stack (normally by CALL). RET can be used for three different types of returns: near, far, and inter-privilege-level returns. Much like CALL, the inter-privilege-level far return examines the privilege levels and access rights of the code and stack segments being returned to, in order to determine whether the operation should be allowed.
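Here is a simplified C sketch of the privilege checks for a far CALL or JMP through a call gate, to show how the outcome depends on the real CPL. It covers call gates only (no task gates, no 64-bit gates), skips type, present-bit and limit checks, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum gate_outcome { GATE_FAULT, GATE_SAME_LEVEL, GATE_MORE_PRIVILEGED };

/* gate_sel: selector in the instruction; gate_dpl: DPL of the call-gate
   descriptor; target_dpl: DPL of the destination code segment. */
static enum gate_outcome call_gate_check(unsigned cpl, uint16_t gate_sel,
                                         unsigned gate_dpl, unsigned target_dpl,
                                         bool target_conforming, bool is_call)
{
    unsigned rpl = gate_sel & 3u;
    assert(cpl <= 3 && gate_dpl <= 3 && target_dpl <= 3);

    /* The gate must be reachable: CPL <= gate DPL and RPL <= gate DPL. */
    if (cpl > gate_dpl || rpl > gate_dpl)
        return GATE_FAULT;
    /* The destination may never be less privileged than the caller. */
    if (target_dpl > cpl)
        return GATE_FAULT;
    /* Conforming targets, or an equal DPL, keep running at the current CPL. */
    if (target_conforming || target_dpl == cpl)
        return GATE_SAME_LEVEL;
    /* Non-conforming target with DPL < CPL: CALL moves to the inner level
       (and to that level's stack from the TSS); JMP requires DPL == CPL. */
    return is_call ? GATE_MORE_PRIVILEGED : GATE_FAULT;
}
```

A native kernel at CPL 0 calling through a DPL 0 gate gets a same-level call, while a guest kernel really at CPL 1 using the same gate gets a general protection fault.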
The DS, ES, FS and GS segment registers are cleared by an inter-privilege-level RET if they refer to segments that cannot be accessed at the new privilege level. This makes RET problematic for virtualization: if a guest kernel runs at CPL 3 alongside its own applications, a return from the guest kernel to guest user code is a same-level return, so DS, ES, FS and GS are not cleared when they should be.
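The clearing rule can be modeled in a few lines of C. In this sketch (hypothetical struct and function names), a far RET to an outer privilege level nulls any of DS, ES, FS or GS that refers to a data or non-conforming code segment whose DPL is lower than the new CPL; a same-level return skips the step entirely, which is exactly what happens when guest kernel and guest user code share ring 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NULL_SELECTOR 0u

/* Hypothetical view of one data segment register (DS, ES, FS or GS). */
struct seg_reg {
    uint16_t sel;          /* visible selector */
    unsigned dpl;          /* DPL from the cached descriptor */
    bool data_or_nonconf;  /* data or non-conforming code segment */
};

/* Simplified model of the segment-register scrub done by a far RET. */
static void far_ret_scrub(struct seg_reg regs[4], unsigned old_cpl, unsigned new_cpl)
{
    assert(new_cpl >= old_cpl && new_cpl <= 3);  /* RET cannot raise privilege */
    if (new_cpl == old_cpl)
        return;                                  /* same-level RET: nothing cleared */
    for (int i = 0; i < 4; i++)
        if (regs[i].data_or_nonconf && new_cpl > regs[i].dpl)
            regs[i].sel = NULL_SELECTOR;         /* new CPL may not use it */
}
```

Returning from CPL 0 to CPL 3 with two DPL 0 data selectors loaded clears those two; the same registers survive a CPL 3 to CPL 3 return.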
The next problematic instruction of the 17 is STR (Store Task Register), because it references the protection system. STR stores the segment selector from the task register into a general purpose register or memory location. The stored selector points to the task state segment of the currently executing task. This instruction is problematic for virtualization because it allows a task to examine its requested privilege level (RPL).
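STR (like MOV from a segment register) hands the guest a raw 16-bit selector, and the selector layout is fixed by the architecture: bits 1-0 are the RPL, bit 2 is the table indicator (0 = GDT, 1 = LDT), and bits 15-3 are the descriptor index. A minimal C decoder, with hypothetical names and made-up example values:

```c
#include <assert.h>
#include <stdint.h>

struct selector_fields {
    unsigned index;   /* bits 15-3: descriptor table index */
    unsigned ti;      /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;     /* bits 1-0: requested privilege level */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1u;
    f.rpl = sel & 3u;
    return f;
}
```

For example, 0x28 decodes to GDT entry 5 with RPL 0, and 0x2B to the same entry with RPL 3.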
The last problematic instruction for virtualization is (believe it or not) MOV. The MOV opcode that stores segment registers allows all six segment registers to be stored to either a general purpose register or a memory location. This is a problem because the CS and SS registers both contain the CPL in bits 0 and 1. Thus, a task could store CS or SS in a general purpose register and find that it's not running at the expected CPL. The MOV opcode that loads segment registers does offer some protection, because it won't allow the CS register to be loaded at all. However, if a task tries to load the SS register, the same privilege level checks as for POP SS occur, and they become problematic for the reasons already explained.
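Here is a short C sketch of the check a guest kernel could make on the values it gets from MOV r/m16, CS and MOV r/m16, SS. In protected mode the low two bits of CS are the CPL, and SS.RPL matches it. The function names and selector values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPL as seen through stored CS and SS selectors (protected mode). */
static unsigned observed_cpl(uint16_t cs, uint16_t ss)
{
    unsigned cpl = cs & 3u;           /* CS bits 0-1 hold the CPL */
    assert((ss & 3u) == cpl);         /* SS.RPL equals CPL */
    return cpl;
}

/* A kernel that believes it owns ring 0 expects to read CPL 0. */
static bool guest_detects_deprivileging(uint16_t cs, uint16_t ss)
{
    return observed_cpl(cs, ss) != 0;
}
```

Because these MOVs never trap, nothing stops a deprivileged guest from running this check, and a hypervisor relying on ring deprivileging cannot hide the result.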
So ..... those are the 17 sensitive, or problematic, instructions that make virtualization of Windows difficult. The current versions of Xen (and Parallels, for that matter) handle this situation by taking advantage of the new virtualization features in the latest Intel and AMD processors (VMX and SVM, respectively). These hardware-based virtualization features effectively permit multiple true CPL 0 levels: the hypervisor runs in VMX root mode (or as the SVM host), while each guest runs at its own CPL 0 in VMX non-root mode (or SVM guest mode), with its own copy of CR0, the descriptor-table registers and the segment registers kept in the VMCS (Intel) or VMCB (AMD). Therefore, guest operating systems don't need to be "tricked" into "thinking" they are running at ring 0 when they are actually running in ring 1, 2 or 3. With these new processors, they really are running at ring 0.
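Before using either feature, a hypervisor first checks whether the CPU advertises it: VMX is reported in CPUID leaf 1, ECX bit 5, and SVM in CPUID leaf 0x80000001, ECX bit 2. The C sketch below keeps the bit tests portable and only queries the CPU when built with GCC or Clang on x86, using `__get_cpuid` from `<cpuid.h>`. The helper names are hypothetical, and a set bit does not guarantee the feature is usable, since firmware can still disable VMX or SVM through model-specific registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_VMX  (1u << 5)   /* Intel VT-x: CPUID.01H:ECX bit 5 */
#define CPUID81_ECX_SVM (1u << 2)   /* AMD-V: CPUID.80000001H:ECX bit 2 */

static bool ecx_reports_vmx(uint32_t leaf1_ecx)  { return (leaf1_ecx & CPUID1_ECX_VMX) != 0; }
static bool ecx_reports_svm(uint32_t ext1_ecx)   { return (ext1_ecx & CPUID81_ECX_SVM) != 0; }

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
/* True if the CPU advertises VMX or SVM (firmware may still disable it). */
static inline bool cpu_advertises_hw_virt(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && ecx_reports_vmx(ecx))
        return true;
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx) && ecx_reports_svm(ecx))
        return true;
    return false;
}
#endif
```

Keeping the bit tests separate from the CPUID query makes them easy to check against known register values without running on the target CPU.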
This is one reason I'm very excited about VMX and SVM. They open up many new possibilities for effective and efficient virtualization software. Hardware-supported virtualization (VMX and SVM) is a relatively new software development space, and I think FASM can show its power here.