http://www.intel.com/opensource
2009-11-21 Software and Service Group, 2009 National University Faculty Workshop on Virtualization Technology
CPU Virtualization
Yu Ke
Jiang Yunhong
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
Intel may make changes to specifications and product descriptions at any time, without notice.
All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2009 Intel Corporation.
Agenda
• CPU Virtualization Overview
• CPU Virtualization Hardware Support (VT-x)
• CPU Virtualization Software Implementation
• Summary
Summary of Last Session
• What is Virtualization
• Virtualization Challenge
• Virtualization Technologies
Agenda
• CPU Virtualization Overview
  - CPU Virtualization Goal
  - What is CPU: Software’s View
  - CPU Virtualization Approach
• CPU Virtualization Hardware Support (VT-x)
• CPU Virtualization Software Implementation
• Summary
CPU Virtualization Goal
• Goal: provide guest software with the same ISA (Instruction Set Architecture) as the physical CPU
• The ISA defines
  - the state visible to the software
    • Registers and memory
  - the instructions that operate on that state
CPU Software’s View: Resource
View 1: Set of Execution Resource
[Figure: execution resources, grouped into Basic Execution Resource and System Execution Resource]
• General Purpose Registers, RFLAGS, RIP
• FPU Registers, MMX Registers, XMM Registers
• Segment Registers, CRs
• I/O Ports
• Memory Control Registers: GDTR, IDTR, LDTR…
• Debug Registers, MTRRs, MSRs
• Performance Monitoring Counters
• Address Space
CPU Software’s View: Mode
View 2: Predefined Mode - Ring and Mode of Operation
• Ring: ring0, ring1, ring2, ring3
• Mode of Operation: Real Mode, Protected Mode, Virtual-8086 Mode, IA32e Mode, System Management Mode
CPU Software’s View: Instruction
View 3: Execute Instruction Set with Predefined Semantic
• Privileged instructions: those that trap if the processor is in user mode and do not trap if it is in system mode. All other instructions are non-privileged.
• Sensitive instructions
  - Control-sensitive instructions: those that attempt to change the configuration of resources in the system.
  - Behavior-sensitive instructions: those whose behavior or result depends on the configuration of resources (the content of the relocation register or the processor's mode).
[Figure: the instruction set divided into non-privileged instructions and sensitive instructions]
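The two definitions above matter because trap-and-emulate only catches sensitive instructions that are also privileged. A minimal sketch of that check follows; the instruction lists are illustrative and incomplete, and the function names are hypothetical, not taken from the slides.

```python
# Illustrative, incomplete instruction sets (assumptions for this sketch).
# PRIVILEGED: instructions that trap when executed outside ring 0.
PRIVILEGED = {"MOV CR0", "LGDT", "LIDT", "HLT"}
# SENSITIVE: instructions that change or depend on system configuration.
SENSITIVE = {"MOV CR0", "LGDT", "LIDT", "HLT", "SGDT", "SIDT", "SMSW", "POPF"}


def virtualization_holes(sensitive, privileged):
    """Sensitive instructions that would NOT trap in a de-privileged guest."""
    return sensitive - privileged


def trap_and_emulate_is_sufficient(sensitive, privileged):
    """True only if every sensitive instruction is also privileged."""
    return sensitive <= privileged


holes = virtualization_holes(SENSITIVE, PRIVILEGED)
print(sorted(holes))
print(trap_and_emulate_is_sufficient(SENSITIVE, PRIVILEGED))
```

With these example sets the hole set is non-empty, so a VMM that relies purely on traps would miss some sensitive instructions.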
CPU Software’s View: Interruption
View 4: Response to Interruption
• Synchronous: exception
• Asynchronous: interrupt
Resource Virtualization
• Basic Execution Resource: context switch approach
  - Context save/restore during virtual CPU switch
• System Execution Resource: “Physical / Virtual / Shadow” approach
  - Host mode
    • Physical context
  - Guest mode
    • Virtual context
    • Shadow context
[Figure: “Physical / Virtual / Shadow” Approach. Labeled elements: the guest's virtual context, the VMM transform producing the shadow context, and the context applied to the physical context of the CPU.]
Ring Virtualization
[Figure: VM0, VM1, and VM2 each run Guest Apps in Ring3 and a Guest Kernel in Ring1; the Virtual Machine Monitor (VMM) runs in Ring0.]
• Traditional ring de-privileging: virtualization hole issue!
• Hardware-assisted approach can cleanly solve this issue
Instruction & Interrupt Virtualization
• Run instructions without intervention
  - Applies to basic instructions
• Trap and emulate
  - Applies to sensitive instructions
  - Applies to interrupts
Agenda
• CPU Virtualization Overview
• CPU Virtualization Hardware Support (VT-x)
• CPU Virtualization Software Implementation
• Summary
Why H/W Support Is Needed
• S/W approach has virtualization holes
  - Ring aliasing
  - Non-trapping instructions
  - Excessive faulting
  - Interrupt virtualization issues
  - CPU state context switching
  - Address space compression
• S/W approach leads to complex workarounds under the traditional IA architecture
  - Guest OS source-level modification (para-virtualization)
  - Guest OS binary-level patching
• H/W support can eliminate virtualization holes and simplify the VMM
VT-x: Key Features
• New mode of operation for guest
  - Allows VMM control of guest operation
  - Need not use segmentation to control guest
  - Guest can run at its intended ring
• New structure controls CPU operation
  - VMCS: virtual-machine control structure
  - Resides in physical-address space
  - Need not be in guest’s linear-address space
VMX Operation
• New mode of operation for VMM and VMs
• Entered with new VMXON instruction
• VMX root operation:
  - Fully privileged, intended for VM monitor
  - Entered on exits from guest software
• VMX non-root operation:
  - Not fully privileged, intended for guest
  - Explicitly entered by VMM (new instructions)
VMX Transitions: Overview
• VM entry
  - Transition from VMM to guest
  - Enters VMX non-root operation
  - VMLAUNCH used for initial entry
  - VMRESUME used subsequently
• VM exit
  - Guest-to-VMM transition
  - Enters VMX root operation
  - Caused by external events, exceptions, some instructions
[Figure: Virtual Machines (VMs), each running Apps on an OS, above the VMM. VM Exit transfers control from a VM to the VMM; VM Entry transfers control from the VMM to a VM.]
VMX Operation
[Figure: VMX Root Operation (Ring 0 to Ring 3) alongside VMX Non-Root Operation, which contains VM 1, VM 2, … VM n, each with its own Ring 0 to Ring 3 and its own VMCS (VMCS1, VMCS2, … VMCSn). Labeled transitions: VMXON, VMLAUNCH, VMRESUME, and VM Exit.]
Virtual Machine Control Structure (VMCS)
• Control structure representing a virtual processor
  - Each virtual CPU should have its own VMCS
  - Only one VMCS is current at a time on a CPU
• A memory block used by the VMM and VT-x hardware
  - VMM: allocates and initializes the memory block, uses “VMPTRLD” to make it active and current, and uses “VMREAD/VMWRITE” to access it at runtime
  - VT-x hardware: uses the VMCS to control virtual processor behavior, and updates the VMCS according to guest behavior
VMCS Content
The VMCS contains:
• Guest State Area
• Host State Area
• VM-execution control fields
• VM-exit control fields
• VM-entry control fields
• VM-exit information fields
VM Entry
[Figure: VM entry. VMLAUNCH / VMRESUME moves the CPU from VMX Root Operation to VMX Non-Root Operation. The VMCS is shown with its six areas; the elements labeled on the entry path are the Guest State Area, Event, and MSR.]
VM Exit
[Figure: VM exit. An interrupt, exception, or sensitive instruction moves the CPU from VMX Non-Root Operation to VMX Root Operation. The VMCS is shown with its six areas; the elements labeled on the exit path are the Guest State Area, Guest MSR, Host State Area, and Host MSR.]
Agenda
• CPU Virtualization Overview
• CPU Virtualization Hardware Support (VT-x)
• CPU Virtualization Software Implementation
• Summary
VCPU Concept
• The VCPU is the central concept in the software implementation of CPU virtualization
• VCPU data
  - A software entity
    • representing a virtual processor
    • containing all the context info for CPU virtualization (similar to a “process descriptor” in an OS)
  - A scheduler entity
• VCPU operations
  - VCPU creation
  - VCPU running
  - VCPU migration
  - VCPU destruction
VCPU Structure
• A VCPU consists of
  - VCPU id
  - Virtual registers
    • VMCS
    • Registers not in the VMCS
  - VCPU bookkeeping info for the scheduler, e.g. running/sleeping state
  - Misc.
[Figure: VCPU Structure. The VMCS part is used by hardware; the non-VMCS part (id, state) is used by software.]
VCPU Operation and Life-cycle
[Figure: VCPU state diagram with states Ready, Running, and Blocked, and transitions labeled Create, Run, Migrate, Block (e.g. wait for I/O), Unblock, and Exit.]
VCPU Creation
• Task: create and initialize the VCPU structure
• Allocate and initialize the VMCS
  - The VMCS is a 4KB-aligned memory block
  - Guest State Area: set to a state similar to the physical CPU's POST state, e.g. RIP can be the entry point of the guest BIOS
  - Host State Area: set to the VMM's host CPU state
  - Other areas: set according to VMM policy
• Allocate and initialize non-VMCS content
  - VCPU id
  - VCPU bookkeeping info
  - Misc.
VCPU Run – Context Switch for VM entry
• VMM
  - Save VMM context
  - Restore VCPU context
  - vmlaunch / vmresume
• Hardware (VM Entry)
  - Restore VMCS guest state area
[Figure: Context switch during VM Entry. The VCPU context (VMCS Guest State Area plus other context) and the VMM context (VMCS Host State Area plus other context) are exchanged with the physical CPU in four numbered steps. Steps 1 and 2 are done by software; steps 3 and 4 are done by hardware.]
VCPU Run – Lazy Save/Restore
[Figure: Context Switch Optimization: Lazy Save/Restore (labeled step: 1: FP)]
VCPU Exit
• VCPU exit is the core of CPU virtualization
• Exit causes:
  - Privileged resource access
  - Exception
  - Interrupt
VCPU Exit – Privileged Resource Access
Privileged Resource Virtualization, e.g. CR0 access
• Before the instruction: virtual CR0 = 0x80000001 (paging enabled, protected mode)
• Guest instructions:
    MOV EAX, 0x1
    MOV CR0, EAX    # disable paging
• VMM handler:
  - Set virtual CR0 to 0x1
  - Shadow CR0 remains 0x80000001
  - Notify the memory virtualization component of the change
• Virtual CR0: VMCS -> VM-execution control fields -> CR0 read shadow
• Shadow CR0: VMCS -> Guest State Area -> CR0
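The CR0 example above can be modeled in a few lines. This is a simplified sketch, not a real VMM handler: the class and function names are hypothetical, and it assumes the VMM owns only the PE and PG bits through the CR0 guest/host mask.

```python
CR0_PE = 1 << 0    # protection enable
CR0_PG = 1 << 31   # paging enable
MASK32 = 0xFFFFFFFF


class Cr0State:
    """Hypothetical per-VCPU CR0 state, modeled on the VMCS fields."""

    def __init__(self, value):
        self.guest_cr0 = value            # "shadow CR0": Guest State Area CR0, used by hardware
        self.read_shadow = value          # "virtual CR0": CR0 read shadow, what the guest sees
        self.host_mask = CR0_PE | CR0_PG  # bits owned by the VMM (assumption)

    def guest_read(self):
        # Masked bits come from the read shadow; other bits come from the real guest CR0.
        owned = self.read_shadow & self.host_mask
        passthrough = self.guest_cr0 & ~self.host_mask & MASK32
        return owned | passthrough


def handle_cr0_write(state, new_value, notify_memory_virt):
    """Emulate a guest MOV to CR0 that caused a VM exit."""
    old = state.read_shadow
    state.read_shadow = new_value
    # Hardware keeps running with PE and PG set, whatever the guest asked for.
    state.guest_cr0 = (new_value | CR0_PE | CR0_PG) & MASK32
    if (old ^ new_value) & CR0_PG:
        # Paging was toggled: tell the memory virtualization component.
        notify_memory_virt(bool(new_value & CR0_PG))


events = []
state = Cr0State(0x80000001)
handle_cr0_write(state, 0x1, events.append)
print(hex(state.read_shadow), hex(state.guest_cr0), hex(state.guest_read()), events)
```

After the write, the guest reads back 0x1 while the hardware still runs with 0x80000001, which is the split between virtual and shadow CR0 described above.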
VCPU Exit – Exception
• Some exceptions do not need to cause a VM exit, e.g. “Div by 0”: the VMM can set the VMCS exception bitmap so that they are delivered to the guest directly
• For the remaining exceptions, the VMM handles them case by case, e.g. for a page fault:
[Figure: Page fault handling flowchart. On a page fault, “Need VM exit?”: if N, the fault goes through the IDT to the guest exception handler. If Y, the VMM's page fault handling checks whether it is an MMIO access (handled by the I/O virtualization handler) or a shadow page fault (handled by the memory virtualization handler); otherwise it is a normal PF and is delivered to the guest.]
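The page fault flowchart can be written as a small dispatch function. This is a sketch only: the function name, its boolean inputs, and the returned labels are hypothetical, and the order of the MMIO and shadow checks is an assumption.

```python
def dispatch_page_fault(needs_vm_exit, is_mmio, is_shadow_fault):
    """Return which component handles a guest page fault (illustrative labels)."""
    if not needs_vm_exit:
        # No VM exit: hardware vectors the fault through the guest IDT.
        return "guest_exception_handler"
    if is_mmio:
        # Access to an emulated device region.
        return "io_virt_handler"
    if is_shadow_fault:
        # Fault caused by the VMM's own shadow page tables.
        return "memory_virt_handler"
    # A normal guest page fault: inject it back into the guest.
    return "deliver_to_guest"


print(dispatch_page_fault(True, True, False))
print(dispatch_page_fault(True, False, False))
```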
VCPU Exit - Interrupt
• Case 1: exit due to physical device external interrupt
  - Get vector from VMCS
  - Dispatch to interrupt handler
• Case 2: exit due to virtual device interrupt
  1. Send IPI
  2. IPI exit
  3. Inject virtual interrupt
  4. Guest interrupt handler
[Figure: labeled elements are External interrupt, VCPU, IPI handler, and I/O virt.]
VCPU Optimization
• VT-x provides hardware optimizations to reduce VM exits and context switches:
• Bitmaps to control VM exits
  - I/O port bitmap
  - Exception bitmap
• Return shadow value to avoid VM exit
  - CR0/CR2/CR4 reading: return shadow value
  - TSC reading: return TSC + offset
• SYSENTER/SYSEXIT
  - SYSENTER/SYSEXIT go directly to the system call routine, with no VM exit
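A rough model of how the bitmaps and the TSC offset decide whether a guest action exits is sketched below. The function names are hypothetical, and two simplifications are assumed: the I/O bitmap is treated as one 8 KB array with one bit per port, and the extra page-fault error-code filtering that hardware applies is ignored.

```python
DIVIDE_ERROR = 0   # exception vector 0
PAGE_FAULT = 14    # exception vector 14
MASK64 = 0xFFFFFFFFFFFFFFFF


def exception_causes_exit(exception_bitmap, vector):
    """Bit n of the 32-bit exception bitmap selects a VM exit for vector n."""
    return bool((exception_bitmap >> vector) & 1)


def io_access_causes_exit(io_bitmap, port):
    """One bit per I/O port; a set bit means the access exits to the VMM."""
    return bool(io_bitmap[port >> 3] & (1 << (port & 7)))


def guest_rdtsc(host_tsc, tsc_offset):
    """With TSC offsetting, the guest reads host TSC plus a signed offset, without exiting."""
    return (host_tsc + tsc_offset) & MASK64


exception_bitmap = 1 << PAGE_FAULT             # intercept page faults only
io_bitmap = bytearray(8192)                    # 65536 ports, all passed through
io_bitmap[0x3F8 >> 3] |= 1 << (0x3F8 & 7)      # intercept port 0x3F8

print(exception_causes_exit(exception_bitmap, DIVIDE_ERROR))
print(io_access_causes_exit(io_bitmap, 0x3F8))
print(guest_rdtsc(1000, -400))
```

In this configuration a divide error goes straight to the guest, while page faults and accesses to port 0x3F8 reach the VMM.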
Lock-Holder Preemption
[Figure: Timeline for VCPU0 and VCPU1. One VCPU is preempted while holding a lock (“preempted (with lock acquired)”); the other keeps trying to acquire the lock, wasting time, until the lock is released.]
• Solution for the LHP issue
  - Detect that a VCPU has been busy acquiring a lock for a long time and schedule that VCPU out
VT-x Pause Loop Exit
• VT-x provides “Pause Loop Exit” (PLE) to detect a CPU that is busy acquiring a lock
  - “PLE_Gap” and “PLE_Window” fields in the VMCS
[Figure: Executions of PAUSE along the instruction stream (time). Likely lock-holder preemption: PAUSEs separated by short gaps continue for longer than the window, causing a VM exit. Likely normal locking behavior: the run of closely spaced PAUSEs ends within the window, so there is no VM exit.]
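The gap/window rule can be simulated in software to see when an exit would fire. This is a sketch of the detection logic only: the class name is hypothetical and the time units and threshold values are arbitrary, not recommended settings.

```python
class PauseLoopDetector:
    """Software model of pause-loop detection (hypothetical helper)."""

    def __init__(self, ple_gap, ple_window):
        self.ple_gap = ple_gap        # max time between PAUSEs of the same loop
        self.ple_window = ple_window  # max time a guest may spin in one loop
        self.last_pause = None
        self.loop_start = None

    def on_pause(self, now):
        """Return True if this PAUSE would cause a VM exit."""
        new_loop = self.last_pause is None or now - self.last_pause > self.ple_gap
        self.last_pause = now
        if new_loop:
            # A long gap since the previous PAUSE starts a new spin loop.
            self.loop_start = now
            return False
        # Same loop: exit once the loop has lasted longer than the window.
        return now - self.loop_start > self.ple_window


spinning = PauseLoopDetector(ple_gap=10, ple_window=100)
first_exit = next(t for t in range(0, 200, 5) if spinning.on_pause(t))

normal = PauseLoopDetector(ple_gap=10, ple_window=100)
normal_exits = [t for t in (0, 5, 10, 50, 55, 60, 200, 205) if normal.on_pause(t)]

print(first_exit, normal_exits)
```

A VCPU that keeps pausing every 5 units exits once the loop outlasts the window, while short bursts separated by long gaps never do.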
Summary
• CPU virtualization goal: provide the guest with the same ISA as the physical CPU
• VT-x provides hardware extensions for IA32 CPU virtualization and simplifies the VMM implementation
• The VCPU is the central concept in the software implementation of CPU virtualization
Backup
Case Study: Mode of Operation Virtualization
Why Mode of Operation Virtualization
• Mode of operation
  - Real mode
  - Protected mode
  - VM86 mode
  - IA32e mode
• VT-x requires CR0.PE = CR0.PG = 1 (i.e. protected mode with paging) in non-root operation, so if the guest requires CR0.PG = 0 or CR0.PE = 0, that mode needs to be virtualized.
• Two guest modes need virtualization:
  - CR0.PG = 0, CR0.PE = 1: guest protected mode with paging disabled
  - CR0.PG = 0, CR0.PE = 0: guest real mode
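The two cases above follow directly from the PE and PG bits of the guest's virtual CR0. A minimal sketch of the classification, with a hypothetical function name and return labels:

```python
CR0_PE = 1 << 0    # protection enable
CR0_PG = 1 << 31   # paging enable


def classify_guest_mode(virtual_cr0):
    """Map the guest's virtual CR0 to the handling it needs (illustrative labels)."""
    pe = bool(virtual_cr0 & CR0_PE)
    pg = bool(virtual_cr0 & CR0_PG)
    if pe and pg:
        return "native"              # matches what VT-x requires in non-root operation
    if pe:
        return "unpaged_protected"   # needs virtualization: paging disabled
    if not pg:
        return "real"                # needs virtualization: real mode
    # PG = 1 with PE = 0 is not a valid x86 configuration.
    raise ValueError("invalid CR0: PG set without PE")


print(classify_guest_mode(0x80000001))
print(classify_guest_mode(0x00000001))
print(classify_guest_mode(0x00000000))
```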
Guest Protected Mode with Paging Disabled
• Gap:
  - The virtual CPU should see flat memory
  - The physical CPU is in protected mode with paging enabled
• Approach to address the gap:
  - The VMM uses an identity-mapped page table to provide flat memory
Guest Real Mode
• Gap: the VCPU is in real mode, which differs from protected mode in opcode length, address space, and semantics
• Approach to address the gap:
  - The VMM uses VM86 mode (under protected mode) to execute the real-mode instructions
  - For instructions that cannot be handled in VM86 mode, the VMM traps and emulates them
Hardware Support
• In recent VT-x, a new feature called “Unrestricted guest” was introduced.
  - The feature can be detected via IA32_VMX_MISC MSR bit 5
• With the VMCS “Unrestricted guest” control set, guest software can run in unpaged protected mode or in real-address mode
• Software involvement is not needed for mode of operation virtualization in the “Unrestricted guest” case
VCPU Migration (Backup)
• The scheduler can migrate a VCPU from one physical CPU to another (e.g. for load balancing)
  1. De-schedule the VCPU from the source CPU
  2. Migrate the VMCS
     - VMCLEAR on the source CPU
     - VMPTRLD on the target CPU
  3. Schedule the VCPU on the target CPU with VMLAUNCH (not VMRESUME)
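Step 3 uses VMLAUNCH because VMCLEAR resets the VMCS launch state to "clear", and VMRESUME is only valid on a VMCS whose launch state is "launched". The sketch below models just that state machine; the class and function names are hypothetical, and raising an error for a missing VMCLEAR is a modeling choice to flag misuse, not hardware behavior.

```python
class VmxError(Exception):
    """Raised when the model detects an invalid instruction sequence."""


class Vmcs:
    def __init__(self):
        self.launch_state = "clear"   # state of a freshly VMCLEARed VMCS
        self.cpu = None               # physical CPU the VMCS is loaded on


def vmclear(vmcs):
    # Flushes cached VMCS data to memory and resets the launch state.
    vmcs.launch_state = "clear"
    vmcs.cpu = None


def vmptrld(vmcs, cpu):
    if vmcs.cpu not in (None, cpu):
        raise VmxError("VMCS still loaded on another CPU; VMCLEAR it first")
    vmcs.cpu = cpu


def vmlaunch(vmcs):
    if vmcs.launch_state != "clear":
        raise VmxError("VMLAUNCH requires launch state 'clear'")
    vmcs.launch_state = "launched"


def vmresume(vmcs):
    if vmcs.launch_state != "launched":
        raise VmxError("VMRESUME requires launch state 'launched'")


def migrate(vmcs, target_cpu):
    vmclear(vmcs)               # on the source CPU
    vmptrld(vmcs, target_cpu)   # on the target CPU
    vmlaunch(vmcs)              # first entry after VMCLEAR must be VMLAUNCH


vmcs = Vmcs()
vmptrld(vmcs, 0)
vmlaunch(vmcs)    # initial entry on CPU 0
vmresume(vmcs)    # later entries on the same CPU
migrate(vmcs, 1)
print(vmcs.cpu, vmcs.launch_state)
```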
PLE in Software
• Enable the VMCS “Pause-loop exiting” bit
• Configure the VMCS “PLE_Gap” and “PLE_Window”
  - Reasonable values are estimated by performance measurement
• How to wake up VCPU0
  - The key is how to find the lock holder
  - Option 1: para-virtualization
  - Option 2: randomly select one sleeping VCPU