

### Cloud Operating Systems

Daniel Gruss (+ credits to Peter Lipp, Sina Karvandi (@Intel80x86), and Fabian Rauscher) 2021-03-15









### Moore's Law – The number of transistors on integrated circuit chips (1971-2018)



Moore's law describes the empirical regularity that the number of transistors on integrated circuits doubles approximately every two years. This advancement is important as other aspects of technological progress – such as processing speed or the price of electronic products – are linked to Moore's law.







from outages blamed on cryptocurrency mining activities. 9

Despite the fact that, in both examples, policymakers did not decide to take action because of environmental concerns, the examples illustrate how policymakers might have multiple options in putting a halt to cryptocurrency mining. Although Bitcoin might be a decentralized currency, many aspects of the ecosystem surrounding it are not. The competitive Bitcoin market drives miners to take advantage of economies of scale in lowering costs, which also makes it harder for them to operate under the radar. Large-scale miners can easily be targeted with higher electricity rates, moratoria, or, in the most extreme case, confiscation of the equipment used. Moreover, the supply chain of specialized Bitcoin mining de-

#### CONCLUSION

As the price of Bitcoin rises, the negative externalities associated with Bitcoin mining increase in kind. This Commentary has shown how a simple economic model might be used to estimate the potential environmental impact of Bitcoin mining for a given Bitcoin price. These estimates reveal that the record-breaking surge in Bitcoin price at the start of 2021 could result in the network consuming as much energy as all data centers globally, with an associated carbon footprint matching London's footprint size. Beyond these environmental impacts, the production of specialized mining devices might exacerbate the global shortage of chips, which could affect the ability to work from home, the economic recovery after the COVID-19 crisis, and

- Joule 3, 893–898.
- Arab News. (2021). Crypto-miners take down Iran electric grids, prompting crackdown. https://www.arabnews.com/node/1794836/middle-east.
- Blandin, A., Pieters, G.C., Wu, Y., Eisermann, T., Dek, A., Taylor, S., and Njoki, D. (2020). 3rd global cryptoasset benchmarking study. https://www.jbs.cam.ac.uk/faculty-research/centres/alternative-finance/publications/3rd-global-cryptoasset-benchmarkingstudy/.
- Stoll, C., Klaaßen, L., and Gallersdörfer, U. (2019). The Carbon Footprint of Bitcoin. Joule 3, 1647–1661.
- Gallersdörfer, U., Klaaßen, L., and Stoll, C. (2020). Energy Consumption of Cryptocurrencies Beyond Bitcoin. Joule 4, 1843–1846.
- Jin, H., Busvine, D., and Kirton, D. (2020). Analysis: Global chip shortage threatens production of laptops, smartphones and more (Reuters).
- de Vries, A. (2020). Bitcoin's energy consumption is underestimated: A market dynamics approach. Energy Res. Soc. Sci. 70, 101721






| 1999                                          | 2019                                 | 2029                                           |
|-----------------------------------------------|--------------------------------------|------------------------------------------------|
| I DEVELOPED THE ENTIRE  SOFTWARE IN 120 LINES | I WROTE 1 COMPONENT IN 10,000 LINES! | I DEVELOPED THE ENTIRE  SOFTWARE IN 120 LINES! |



## Moving to the cloud can save up to 87% of IT energy



# Cloud means Efficiency

- Processes used to have access to all physical memory $\rightarrow$ that's not efficient!
- $\rightarrow$ Virtualize memory $\rightarrow$ processes can share the resources of one machine and utilize it better
- Processes all need the same pages $\rightarrow$ that's not efficient!
- $\rightarrow$ Let them share memory, using COW, page deduplication, etc.
- Processes often cannot do anything but wait $\rightarrow$ that's not efficient!
- $\rightarrow$ Let other processes run in between
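
Page deduplication can be illustrated with a minimal sketch. It is a toy model, not how any particular kernel implements it: pages are plain `bytes` objects, `PAGE_SIZE` and the function name `deduplicate` are assumptions made for this example, and identical pages are found by hashing their contents and then comparing them byte for byte.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical for this sketch)


def deduplicate(pages):
    """Map every virtual page to a frame; pages with identical contents share a frame.

    pages: list of bytes objects, one per virtual page.
    Returns (frames, mapping), where mapping[i] is the frame index backing page i.
    """
    frames = []            # unique page contents that really occupy memory
    frames_by_hash = {}    # content hash -> indices of frames with that hash
    mapping = []
    for page in pages:
        digest = hashlib.sha256(page).digest()
        frame = None
        for candidate in frames_by_hash.get(digest, []):
            if frames[candidate] == page:      # confirm equality, not only the hash
                frame = candidate
                break
        if frame is None:                      # first time we see this content
            frame = len(frames)
            frames.append(page)
            frames_by_hash.setdefault(digest, []).append(frame)
        mapping.append(frame)
    return frames, mapping


zero_page = bytes(PAGE_SIZE)
code_page = b"\x90" * PAGE_SIZE
frames, mapping = deduplicate([zero_page, code_page, zero_page, code_page, zero_page])
print(len(frames), mapping)  # prints: 2 [0, 1, 0, 1, 0]
```

Five virtual pages end up backed by two frames. A real system would additionally have to mark shared frames read-only and copy them on the first write (COW), which this sketch leaves out.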





- Efficiency
- Isolation of tenants (security, reliability, availability)
- Abstraction of hardware

What is Virtualization?

**Virtualization** represents the resources of a computer in a way that lets them be used easily, without the need to know the details of their properties

- Decouple operating system from hardware
  - "computer in computer" implemented in software
  - includes devices (network, keyboard, sound...)
- OS in VM "sees" its hardware, irrespective of the actual hardware in use
- OS does not know if HW is concurrently used by other VMs

Why virtualization



- Cheaper hardware: one server for one task was common
- most of these servers were idle 90% of the time
- cost issue:
  - support, maintenance
  - power consumption (operation, cooling)
  - space
- Virtualization allows consolidation
  - multiple servers on one box
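
The consolidation argument can be made concrete with a small back-of-the-envelope calculation. The numbers are assumptions for illustration only: 100 single-task servers, each busy 10% of the time (matching the 90% idle time above), packed onto hosts that should not exceed 50% average load. The function name `hosts_needed` is hypothetical, and the model assumes identical hardware and loads that simply add up, ignoring peaks and virtualization overhead.

```python
def hosts_needed(num_servers, busy_percent, target_percent):
    """Number of hosts required when the combined load is packed up to target_percent per host."""
    total_load = num_servers * busy_percent    # load in "percent of one server"
    return -(-total_load // target_percent)    # ceiling division on integers


print(hosts_needed(100, 10, 50))  # prints: 20
```

Under these assumptions, 100 mostly idle machines shrink to 20 hosts, which is where the savings in maintenance, power, and space come from.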







- Better hardware utilization
- Lower administration cost
- Long-term operation of older applications
- Less downtime
- Simple migration to more powerful hardware







- Performance cost: slower I/O operation
- single point of failure: requires better hardware reliability
- security gets more complex

- Virtualization used to play no significant role in internet hosting
- often PaaS
- Web hoster (FTP access, HTTP website)
- Isolation on the OS level (tenants as users)
- no hardware support $\rightarrow$ expensive + many problems

**Modern Virtualization** 

- OS-level Virtualization
- Para-Virtualization
- Full Virtualization
- Hardware-Assisted Virtualization

**OS-level Virtualization** 

- integrated into the kernel
- all application software intended to run in a virtual environment gets a strictly separated virtual runtime environment (container, jail)
- no separate kernels, only process-level virtualization
- cannot run other OSes; only for applications
- examples: OpenVZ, Docker, (s)chroot

Para-Virtualization











- Cooperation with OS: OS is aware of virtualization
- needs to modify guest
- not usable for closed source OSes

www.tugraz.at

- OS not aware of being virtualized
- no need to adapt guest
- reduced performance
  - by up to 25%
- full virtualization of HW required (e.g., emulation via QEMU)
  - virtual machines are not allowed to access physical components
  - every physical component has to be virtualized and requires drivers in the OS

- Guest no longer runs in kernel mode (Ring 0)
  - parts that require kernel privileges won't run
- hypervisor (VMM) changes binaries of guest-OS on the fly
- allows supporting any OS
  - no need to change source
- high performance penalty

- First full x86 virtualization
- hypervisor continuously reads program code before it is executed (prescan)
- looking for relevant instructions
  - instructions that change system state
  - instructions that depend on CPU state
- sets a breakpoint and lets the OS run
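
The prescan idea can be sketched as a toy model. This is not how a real hypervisor is written: real scanners work on machine code, whereas the sketch uses mnemonic strings, and the set `SENSITIVE` as well as the names `prescan` and `run` are assumptions chosen for illustration.

```python
# Illustrative subset of instructions that change or depend on system state.
SENSITIVE = {"CLI", "STI", "POPF", "SGDT", "MOV_CR"}


def prescan(block):
    """Scan a block before it runs and replace sensitive instructions with breakpoints.

    Returns the patched block and a table of the original instructions by address.
    """
    patched, saved = [], {}
    for addr, insn in enumerate(block):
        mnemonic = insn.split()[0]
        if mnemonic in SENSITIVE:
            saved[addr] = insn            # remember what was there
            patched.append("BREAKPOINT")  # guest will trap here
        else:
            patched.append(insn)
    return patched, saved


def run(patched, saved):
    """Walk the patched block and record which instructions the hypervisor emulates."""
    emulated = []
    for addr, insn in enumerate(patched):
        if insn == "BREAKPOINT":
            emulated.append(saved[addr])  # breakpoint hit: hypervisor emulates the original
        # every other instruction would run directly on the CPU
    return emulated


guest_code = ["MOV eax, 1", "CLI", "ADD eax, 2", "POPF"]
patched, saved = prescan(guest_code)
print(patched)              # ['MOV eax, 1', 'BREAKPOINT', 'ADD eax, 2', 'BREAKPOINT']
print(run(patched, saved))  # ['CLI', 'POPF']
```

Only the two sensitive instructions reach the hypervisor; everything else runs unmodified, which is what keeps the overhead of this approach bounded.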







- Several problems had to be solved when virtualizing IA-32:
  - Ring Problems
  - Address Space Compression
  - Non-Faulting Access to Privileged State
  - SYSENTER / SYSEXIT
  - Interrupt Virtualization
  - Hidden States

- usually: applications run in ring 3, kernel in ring 0
- guest may not run in ring 0
- ring deprivileging needed: guest must run in a ring > 0
  - most often 1 or 3

- guest has to run in a ring it was not developed for
- the result of certain instructions contains the privilege level (e.g., PUSH CS)
- guest OS can find out which ring it is running in
- may cause various problems
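
Why PUSH CS gives the ring away: on x86, the two lowest bits of the CS selector hold the current privilege level. A minimal sketch of what a guest kernel could compute from the pushed value follows; the selector values are hypothetical examples, not taken from any particular OS.

```python
def privilege_level(cs_selector):
    """Return the ring encoded in the two lowest bits of a CS selector value."""
    return cs_selector & 0b11


# Hypothetical selector values: same descriptor index, different privilege bits.
print(privilege_level(0x08))  # prints: 0  (kernel running natively in ring 0)
print(privilege_level(0x09))  # prints: 1  (same kernel deprivileged to ring 1)
print(privilege_level(0x1B))  # prints: 3  (user code)
```

A guest kernel that checks this value and expects 0 will notice that it has been deprivileged.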

## **Address Space Compression**

- Guest expects to have full address space available
- hypervisor requires part of address space
  - control structures for switching between guest and hypervisor
- guest access to these areas is not allowed: it triggers a switch to the hypervisor, which has to emulate the access

- unprivileged software may not access certain elements of the CPU state
- access by guest results in fault: hypervisor can emulate instructions
- IA-32 possesses instructions that do not induce a fault:
  - registers GDTR, IDTR, LDTR and TR can only be modified in ring 0
  - the instructions that read them (SGDT, SIDT, SLDT, STR) can be executed in any ring without a fault, so the hypervisor gets no chance to emulate them
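
The difference between faulting and non-faulting instructions can be shown in a toy model of trap-and-emulate. The instruction sets below reflect classic IA-32 behaviour without hardware virtualization support; the function names `handled_by` and `leaks_host_state` are assumptions for this sketch.

```python
LOADS = {"LGDT", "LIDT", "LLDT", "LTR"}    # write the registers: fault outside ring 0
STORES = {"SGDT", "SIDT", "SLDT", "STR"}   # read the registers: run in any ring


def handled_by(insn, ring):
    """Return who ends up handling the instruction: 'hypervisor' or 'cpu'."""
    if insn in LOADS and ring > 0:
        return "hypervisor"   # fault -> trap -> hypervisor emulates
    return "cpu"              # executes directly, hypervisor never notices


def leaks_host_state(insn, ring):
    """True if a deprivileged guest silently reads the real (host) register value."""
    return insn in STORES and ring > 0


print(handled_by("LGDT", ring=1))        # prints: hypervisor
print(handled_by("SGDT", ring=1))        # prints: cpu
print(leaks_host_state("SGDT", ring=1))  # prints: True
```

The second and third lines are the problem case: the guest gets an answer, but it is the host's value rather than the one the guest set up.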

**SYSENTER / SYSEXIT** 

- special instructions for fast syscalls
- SYSENTER always switches to ring 0
- SYSEXIT can only be executed in ring 0
- ring 1 is thus problematic
  - SYSENTER switches to hypervisor $\rightarrow$ has to emulate
  - SYSEXIT switches to hypervisor $\rightarrow$ has to emulate

## **Interrupt Virtualization**

- interrupts can be masked (so they do not occur when unwelcome)
- controlled by the IF flag in the EFLAGS register
- interrupts are managed by the hypervisor (VMM), though
- change of IF $\rightarrow$ fault to hypervisor
- OSes do this quite often $\rightarrow$ performance problem
- forwarding of virtual interrupts must consider IF
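
One way to picture the last point is a hypervisor that keeps a virtual IF per guest and only forwards virtual interrupts while that flag is set. The sketch below is a simplified model under that assumption; the class `VirtualCPU` and its methods are hypothetical names, and details such as interrupt priorities or handlers clearing IF on entry are ignored.

```python
from collections import deque


class VirtualCPU:
    """Hypervisor-side view of one guest CPU's interrupt state."""

    def __init__(self):
        self.virtual_if = True   # the guest's view of the IF flag
        self.pending = deque()   # virtual interrupts waiting for delivery
        self.delivered = []      # vectors already forwarded to the guest

    def set_if(self, enabled):
        """Called when the guest's attempt to change IF faults to the hypervisor."""
        self.virtual_if = enabled
        self._deliver()

    def raise_interrupt(self, vector):
        """Called when the hypervisor wants to forward an interrupt to the guest."""
        self.pending.append(vector)
        self._deliver()

    def _deliver(self):
        # Forward queued interrupts only while the guest has them unmasked.
        while self.virtual_if and self.pending:
            self.delivered.append(self.pending.popleft())


cpu = VirtualCPU()
cpu.set_if(False)        # guest masks interrupts
cpu.raise_interrupt(32)
cpu.raise_interrupt(33)
print(cpu.delivered)     # prints: []
cpu.set_if(True)         # guest unmasks interrupts
print(cpu.delivered)     # prints: [32, 33]
```

Every call to `set_if` stands for a fault into the hypervisor, which is why frequent masking and unmasking becomes a performance problem.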

## hidden state information

- Not all state information is accessible via registers
- it cannot be saved and restored when switching between VMs

- Two new operating modes:
  - VMX root operation
    - for hypervisor
  - VMX non-root operation
    - controlled by hypervisor
    - supports VMs
- Both modes have ring 0-3
- guest can run in ring 0
- hypervisor said to be running in "ring -1"

Rings on Intel



**VMM Operation** 



**VMM Transitions** 



- VM entry: root operation $\rightarrow$ non-root operation
- VM exit: non-root operation $\rightarrow$ root operation
- VMCS: Virtual Machine Control Structure
  - Guest-state area
  - Host-state area
- Entry/exit loads/saves information using the proper area
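
The interplay of VM entry, VM exit, and the two VMCS areas can be sketched as a small state machine. This is a conceptual model only: registers are a plain dictionary, the class names `VMCS` and `CPU` and all register values are assumptions for illustration, and the many checks real hardware performs on entry are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class VMCS:
    guest_state: dict = field(default_factory=dict)  # loaded on entry, saved on exit
    host_state: dict = field(default_factory=dict)   # loaded on exit


class CPU:
    def __init__(self, registers):
        self.registers = dict(registers)
        self.mode = "root"

    def vm_entry(self, vmcs):
        """Root -> non-root: load the guest's registers from the guest-state area."""
        self.registers = dict(vmcs.guest_state)
        self.mode = "non-root"

    def vm_exit(self, vmcs):
        """Non-root -> root: save guest registers, then load the host-state area."""
        vmcs.guest_state = dict(self.registers)
        self.registers = dict(vmcs.host_state)
        self.mode = "root"


vmcs = VMCS(
    guest_state={"RIP": 0x1000, "CR3": 0xA000},
    host_state={"RIP": 0xFFFF0000, "CR3": 0xB000},  # set up by the hypervisor beforehand
)
cpu = CPU(vmcs.host_state)
cpu.vm_entry(vmcs)
cpu.registers["RIP"] += 4            # the guest executes something
cpu.vm_exit(vmcs)
print(cpu.mode)                      # prints: root
print(hex(vmcs.guest_state["RIP"]))  # prints: 0x1004
print(hex(cpu.registers["CR3"]))     # prints: 0xb000
```

After the exit, the guest's progress is preserved in the guest-state area and the hypervisor continues with its own registers, including its own CR3.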

- The guest-state area (GSA) of a VMCS contains the elements that make up the state of the virtual CPU
- VM exit requires loading certain registers (like segment registers, CR3, IDTR...)
- GSA contains fields for these registers
- GSA contains fields for other information not readable via registers
  - e.g., interruptibility state

## **GUEST STATE AREA**

Figure: VMCS layout, CopyLeft 2017, @Noteworthy (Intel Manual of July 2017). Legend: natural-width fields, 16-bit fields, 32-bit fields, 64-bit fields.

| Group | Fields |
|-------|--------|
| Control and debug registers | CR0, CR3, CR4, DR7 |
| Stack pointer, instruction pointer, flags | RSP, RIP, RFLAGS |
| CS, SS, DS, ES, FS, GS, LDTR, TR | Selector, Base Address, Segment Limit, Access Right (for each register) |
| GDTR, IDTR | Base Address, Limit (for each register) |
| MSRs | IA32_DEBUGCTL, IA32_SYSENTER_CS, IA32_SYSENTER_ESP, IA32_SYSENTER_EIP, IA32_PERF_GLOBAL_CTRL, IA32_PAT, IA32_EFER, IA32_BNDCFGS |
| Other register state | SMBASE |
| Non-register state | Activity state, Interruptibility state, Pending debug exceptions, VMCS link pointer, VMX-preemption timer value, Page-directory-pointer-table entries (PDPTE0 to PDPTE3), Guest interrupt status, PML index |

## **HOST STATE AREA**

| Group | Fields |
|-------|--------|
| Control registers | CR0, CR3, CR4 |
| Stack pointer, instruction pointer | RSP, RIP |
| Segment selectors | CS, SS, DS, ES, FS, GS, TR |
| Base addresses | FS, GS, TR, GDTR, IDTR |
| MSRs | IA32_SYSENTER_CS, IA32_SYSENTER_ESP, IA32_SYSENTER_EIP, IA32_PERF_GLOBAL_CTRL, IA32_PAT, IA32_EFER |

- The VMCS is addressed using physical addresses
- not part of guest address space
- hypervisor may run in a different address space than the guest (CR3 is part of the state)
- VM exits leave detailed information on the reason for the exit in the VMCS
  - exit reason
  - exit qualification

## **VM-EXIT CONTROL FIELDS**

| Group | Fields |
|-------|--------|
| VM-Exit Controls | Save debug controls, Host address space size, Load IA32_PERF_GLOBAL_CTRL, Acknowledge interrupt on exit, Save IA32_PAT, Load IA32_PAT, Save IA32_EFER, Load IA32_EFER, Save VMX preemption timer value, Clear IA32_BNDCFGS, Conceal VM exits from Intel PT |
| VM-Exit Controls for MSRs | VM-exit MSR-store count, VM-exit MSR-store address, VM-exit MSR-load count, VM-exit MSR-load address |

## **VM-EXIT INFORMATION FIELDS**

| Group | Fields |
|-------|--------|
| Basic VM-Exit Information | Exit reason, Exit qualification, Guest-linear address, Guest-physical address |
| VM Exits Due to Vectored Events | VM-exit interruption information, VM-exit interruption error code |
| VM Exits That Occur During Event Delivery | IDT-vectoring information, IDT-vectoring error code |
| VM Exits Due to Instruction Execution | VM-exit instruction length, VM-exit instruction information, I/O RCX, I/O RSI, I/O RDI, I/O RIP |
| VM-Instruction Error Field | VM-instruction error field |

- Example: MOV CR
- Exit reason: "control register access"
- Exit qualification:
  - which CR
  - direction (Rx $\rightarrow$ CR or CR $\rightarrow$ Rx)
  - register used
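
The three pieces of information listed above are packed into the exit qualification as bit fields. The following is a minimal sketch, assuming the layout given in the Intel SDM for control-register accesses (bits 3:0 select the CR, bits 5:4 the access type, bits 11:8 the general-purpose register). The names `CrAccess` and `decode_cr_access` are made up for this illustration and are not part of any hypervisor API.

```python
from dataclasses import dataclass

# Basic exit reason 28 is "control-register accesses" in the Intel SDM.
EXIT_REASON_CR_ACCESS = 28

ACCESS_TYPES = ("MOV to CR", "MOV from CR", "CLTS", "LMSW")
GPR_NAMES = ("RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI",
             "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15")


@dataclass
class CrAccess:
    """Hypothetical container for the decoded fields."""
    cr: int            # which control register (e.g. 0, 3, 4, 8)
    access_type: str   # direction / instruction kind
    gpr: str           # general-purpose register used by MOV CR


def decode_cr_access(exit_qualification: int) -> CrAccess:
    """Split the exit qualification of a control-register-access VM-exit."""
    cr = exit_qualification & 0xF               # bits 3:0
    access = (exit_qualification >> 4) & 0x3    # bits 5:4
    gpr = (exit_qualification >> 8) & 0xF       # bits 11:8
    return CrAccess(cr, ACCESS_TYPES[access], GPR_NAMES[gpr])
```

For example, an exit qualification of `0x0103` decodes to a `MOV` from `RCX` into `CR3`, which is what a guest does when it switches its address space.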

**VM-execution control fields**

- Pin-based VM-execution controls: External-interrupt exiting, NMI exiting, Virtual NMIs, Activate VMX-preemption timer, Process posted interrupts
- Primary processor-based VM-execution controls: Interrupt-window exiting, Use TSC offsetting, HLT exiting, INVLPG exiting, MWAIT exiting, RDPMC exiting, RDTSC exiting, CR3-load exiting, CR3-store exiting, CR8-load exiting, CR8-store exiting, Use TPR shadow, NMI-window exiting, MOV-DR exiting, Unconditional I/O exiting, Use I/O bitmaps, Monitor trap flag, Use MSR bitmaps, MONITOR exiting, PAUSE exiting, Activate secondary controls
- Secondary processor-based VM-execution controls: Virtualize APIC accesses, Enable EPT, Descriptor-table exiting, Enable RDTSCP, Virtualize x2APIC mode, Enable VPID, WBINVD exiting, Unrestricted guest, APIC-register virtualization, Virtual-interrupt delivery, PAUSE-loop exiting, RDRAND exiting, Enable INVPCID, Enable VM functions, VMCS shadowing, Enable ENCLS exiting, RDSEED exiting, Enable PML, EPT-violation #VE, Conceal VMX non-root operation from Intel PT, Enable XSAVES/XRSTORS, Mode-based execute control for EPT, Use TSC scaling
- Exception bitmap, I/O-bitmap addresses, TSC-offset
- Guest/host masks for CR0, Guest/host masks for CR4, Read shadows for CR0, Read shadows for CR4
- CR3-target value 0, CR3-target value 1, CR3-target value 2, CR3-target value 3, CR3-target count
- APIC virtualization: APIC-access address, Virtual-APIC address, TPR threshold, EOI-exit bitmap 0, EOI-exit bitmap 1, EOI-exit bitmap 2, EOI-exit bitmap 3, Posted-interrupt notification vector, Posted-interrupt descriptor address
- MSR bitmaps: Read bitmap for low MSRs, Read bitmap for high MSRs, Write bitmap for low MSRs, Write bitmap for high MSRs
- Executive-VMCS pointer, Extended-page-table pointer, Virtual-processor identifier
- PLE_Gap, PLE_Window, VM-function controls, VMREAD bitmap, VMWRITE bitmap
- ENCLS-exiting bitmap, PML address
- Virtualization-exception information address, EPTP index, XSS-exiting bitmap

## The next step ($\approx 2005$):

- Virtualization Hardware Extensions for Intel and AMD
- $\rightarrow$ substantially lower overheads for VMs
- $\rightarrow$ better isolation
- $\rightarrow$ IaaS VMs become widely used

- Support for interrupt-virtualization
  - VM-exit with every external interrupt (cannot be masked by guest)
  - VM-exit when guest-OS ready to accept interrupts (EFLAGS.IF==1)
- Support for CR0 and CR4-virtualization
  - VM-exit with any change of these registers
  - hypervisor can configure for which bits this shall happen
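
The per-bit configuration for CR0 and CR4 can be sketched as follows. This assumes the guest/host mask and read shadow semantics described in the Intel SDM: bits set in the mask are owned by the host, guest reads of those bits return the read shadow, and a guest `MOV` to the register exits only if it would change a host-owned bit relative to the shadow. The function names are hypothetical and only serve this illustration.

```python
CR0_PE = 1 << 0    # protection enable
CR0_TS = 1 << 3    # task switched
CR0_PG = 1 << 31   # paging


def guest_visible_cr(real_value: int, mask: int, read_shadow: int) -> int:
    """Value the guest observes when it reads CR0/CR4.

    Host-owned bits (set in mask) come from the read shadow,
    all other bits come from the real register.
    """
    return (real_value & ~mask) | (read_shadow & mask)


def mov_to_cr_exits(new_value: int, mask: int, read_shadow: int) -> bool:
    """True if a guest MOV to CR0/CR4 causes a VM-exit.

    The write exits if any host-owned bit differs from the read shadow.
    """
    return ((new_value ^ read_shadow) & mask) != 0
```

With `mask = CR0_PE | CR0_PG`, a guest that only toggles `CR0_TS` keeps running without a VM-exit, while an attempt to clear `CR0_PG` is intercepted by the hypervisor.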

- Address Space Compression
  - change of address space with any switch guest/hypervisor
  - guest owns full virtual address space
- Ring Problems, SYSENTER/SYSEXIT
  - Guest can now run in ring 0

- Nonfaulting Access to Privileged State
  - accesses raise a fault into the hypervisor
- Hidden State
  - Saved into VMCS

## **Hypervisor and Virtual Memory**

- Hypervisor uses virtual memory
- guest OS uses virtual memory
- hardware supports page tables
- how does this work?
  - shadow page tables
  - hardware support

**Virtual Memory** 



All problems in computer science can be solved by another level of indirection.

But that usually will create another problem.

David Wheeler

**Paging** 





and in 64 bit...







Page Tables





## **Shadow Page Table**

- merges both page tables into one that the HW uses
- when guest changes own page table
  - Hypervisor has to catch access
  - update shadow page table

- when HW changes shadow page table
  - update guest PT
- expensive!
  - page faults caught by hypervisor
  - must run through guest PTs
  - must emulate accessed and modified bits for guest
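
The bookkeeping behind a shadow page table can be illustrated with a toy model. This is only a sketch under strong simplifications: page tables are flat dictionaries from page number to page number, there are no permissions or accessed/modified bits, and the class `ShadowPaging` with its methods is invented for this example.

```python
class ShadowPaging:
    """Toy model of shadow paging with flat, single-level 'page tables'."""

    def __init__(self, host_map):
        self.host_map = dict(host_map)  # guest-physical -> host-physical
        self.guest_pt = {}              # guest-virtual  -> guest-physical
        self.shadow_pt = {}             # guest-virtual  -> host-physical
        self.trapped_writes = 0         # how often the hypervisor had to step in

    def guest_write_pte(self, gva_page, gpa_page):
        """Guest changes its own page table.

        The hypervisor catches the access and updates the shadow table
        so that it again holds the merged translation.
        """
        self.trapped_writes += 1
        self.guest_pt[gva_page] = gpa_page
        if gpa_page in self.host_map:
            self.shadow_pt[gva_page] = self.host_map[gpa_page]
        else:
            # No backing host page: leave the entry absent so that an
            # access would fault into the hypervisor.
            self.shadow_pt.pop(gva_page, None)

    def translate(self, gva_page):
        """What the hardware does: one lookup in the shadow table only."""
        return self.shadow_pt[gva_page]
```

Every guest page-table update costs one trap in this model, which is the overhead the slides call expensive. The hardware itself only ever sees the merged table.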





"guest page walk"

- lots of memory accesses....
- but how many exactly?













And Combined




... and combined ...

max. number of memory accesses per address translation

- 5 on guest level
- each induces 5 on host level
- makes 25!
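
The count above can be reproduced with a small calculation. The sketch assumes 4-level paging in both the guest and the host, no TLB hits and no paging-structure caches. The function name is made up for this illustration.

```python
def nested_walk_accesses(guest_levels: int = 4, host_levels: int = 4) -> int:
    """Worst-case memory accesses for one guest access under nested paging.

    On the guest level there are guest_levels page-table reads plus the
    final data access. Each of these uses a guest-physical address, which
    needs host_levels reads of the host tables plus the access itself.
    """
    guest_steps = guest_levels + 1   # 5 for 4-level paging
    host_steps = host_levels + 1     # 5 for 4-level paging
    return guest_steps * host_steps


# 25 accesses in total, of which 24 are pure translation overhead
total = nested_walk_accesses()
translation_only = total - 1
```

Under the same assumptions, 5-level paging on both sides would raise the worst case to 36 accesses.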



**Performance** 

• depending on application: 3.9-4.6 times slower

• but: TLB

- EPT only used if VM active
- Translations tagged in TLB with EPT-basepointer
  - differentiate TLB-entries of different VMs
  - TLB-flush per guest possible
- VPID: virtual processor ID
  - unique value for each VM
  - translations tagged in TLB using VPID
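
The effect of tagging can be shown with a toy TLB. This is a simplified model and not how the hardware is organized: entries are keyed by the pair of tag and virtual page, and the class `TaggedTlb` is a hypothetical name for this sketch.

```python
class TaggedTlb:
    """Toy TLB whose entries carry a VPID tag."""

    def __init__(self):
        self.entries = {}  # (vpid, virtual page) -> physical page

    def insert(self, vpid, vpage, ppage):
        self.entries[(vpid, vpage)] = ppage

    def lookup(self, vpid, vpage):
        """Return the cached translation, or None on a TLB miss."""
        return self.entries.get((vpid, vpage))

    def flush_vpid(self, vpid):
        """Per-guest flush: drop only the entries of one VM."""
        self.entries = {key: ppage for key, ppage in self.entries.items()
                        if key[0] != vpid}
```

Two VMs can map the same virtual page to different physical pages without interfering, and switching between them does not require flushing the whole TLB.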

| Fields (bit 63 down to bit 0) | Entry |
|---|---|
| Reserved, Address of EPT PML4 table, Rsvd., SSS, A/D, EPT PWL-1, EPT PS MT | EPTP |
| Ignored, Rsvd., Address of EPT page-directory-pointer table, … | PML4E: present |
| S V Ignored Q Q Q                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | PML4E<br>not<br>presen    |
| S V Ign S Ignored Rsvd. Physical address of 1 GB page Reserved Ig X D A 1 P EPT X W R                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | PDPTE<br>1GB<br>page      |
| Ignored Rsvd. Address of EPT page directory $\begin{bmatrix}  g  \ X \  g  \ n, \ U \ n, \ A \end{bmatrix}$ Rsvd. $\begin{bmatrix} X \ W \ R \end{bmatrix}$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | PDPTE<br>page<br>director |
| S<br>V<br>Ignored<br>E                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | PDTPE<br>not<br>presen    |
| S   S   S   Ignored   Rsvd.   Physical address   Reserved   Ig X   D   A   1   P   EPT   X   W   R   S   S   S   S   S   S   S   S   S                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | PDE:<br>2MB<br>page       |
| Ignored Rsvd. Address of EPT page table $\begin{vmatrix}  g  & X &  g  \\ n, U & n, A & \underline{0} \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. 
$\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w  \\ x &  w  \end{vmatrix}$ Rsvd. $\begin{vmatrix} X &  w$ | PDE:<br>page<br>table     |
| S V Ignored $\Omega$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | PDE:<br>not<br>presen     |
| S   g  P  S   Ignored Rsvd. Physical address of 4KB page   Ig X D A   I P EPT   KW R                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | PTE:<br>4KB<br>page       |
| S<br>V Ignored <b>Q Q Q</b>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | PTE:<br>not<br>presen     |

Figure 28-1. Formats of EPTP and EPT Paging-Structure Entries

- 1. Enable VMX by setting CR4.VMXE
- 2. Allocate a VMXON region and execute the VMXON instruction
- 3. Allocate an MSR bitmap region (we don't want a trap for all MSRs)
- 4. Allocate a VMCS region and execute VMCLEAR on it
- 5. Execute VMPTRLD to make that VMCS the "current VMCS"
- 6. Set up the VMCS (using VMWRITEs)
- 7. Execute VMLAUNCH
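
The first steps of this sequence mostly come down to a few bit-level details. The following is a minimal Python sketch of those details only, not a hypervisor: the real instructions (VMXON, VMCLEAR, VMPTRLD, VMLAUNCH) need ring 0 and cannot be issued from Python. All function names are hypothetical, and using a full 4 KiB page as the region size is an assumption (the actual size is reported by the IA32_VMX_BASIC MSR).

```python
"""Bit-level helpers for the first VMX setup steps (illustrative only)."""

CPUID_1_ECX_VMX = 1 << 5   # CPUID.1:ECX bit 5 reports VMX support
CR4_VMXE = 1 << 13         # CR4 bit 13 (VMXE) enables VMX operation
PAGE_SIZE = 4096           # assumption: use one full 4 KiB page per region


def cpu_supports_vmx(cpuid_1_ecx: int) -> bool:
    """Check the VMX feature bit in the ECX value returned by CPUID leaf 1."""
    return bool(cpuid_1_ecx & CPUID_1_ECX_VMX)


def enable_vmxe(cr4: int) -> int:
    """Step 1: return the CR4 value with the VMXE bit set."""
    return cr4 | CR4_VMXE


def vmcs_revision_id(ia32_vmx_basic: int) -> int:
    """Bits 30:0 of the IA32_VMX_BASIC MSR hold the VMCS revision identifier."""
    return ia32_vmx_basic & 0x7FFF_FFFF


def init_vmx_region(ia32_vmx_basic: int) -> bytearray:
    """Steps 2 and 4: a zeroed region whose first 4 bytes hold the revision id.

    Both the VMXON region and a VMCS region start with this identifier.
    In a real kernel the region must also be 4 KiB aligned, and the
    instructions take its physical address as operand.
    """
    region = bytearray(PAGE_SIZE)
    region[0:4] = vmcs_revision_id(ia32_vmx_basic).to_bytes(4, "little")
    return region
```

In a kernel such as SWEB the same logic would be written in C++ with inline assembly; the sketch only shows which bits and bytes matter before VMXON and VMPTRLD can succeed.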

- 1. A user process needs help for some operations (e.g., hardware interaction)
- $\rightarrow$  can use a syscall!
- 2. What about VMs?
- 3. Same concept, different level:
- $\rightarrow$  Hypercalls! via the  ${\tt vmcall}$  instruction
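
The analogy to a syscall table can be sketched as follows. This is a hedged Python model of what a hypervisor's VM-exit handler might do for a hypercall, not real hypervisor code. The register convention (hypercall number in `rax`, arguments in `rbx` and `rcx`, result back in `rax`), the hypercall numbers, and all function names are assumptions made for illustration. The basic exit reason 18 for VMCALL and the 3-byte instruction length are taken from the Intel SDM.

```python
"""Toy model of hypercall dispatch on a VMCALL VM exit (illustrative only)."""

EXIT_REASON_VMCALL = 18   # basic VM-exit reason for the vmcall instruction
VMCALL_LENGTH = 3         # vmcall is encoded as 0F 01 C1
HC_ERROR = -1             # hypothetical error code for unknown hypercalls


def hc_ping(a: int, b: int) -> int:
    """Hypothetical hypercall 0: does nothing and reports success."""
    return 0


def hc_add(a: int, b: int) -> int:
    """Hypothetical hypercall 1: returns the sum of its two arguments."""
    return a + b


# Hypercall table, analogous to a syscall table in an OS kernel.
HYPERCALLS = {0: hc_ping, 1: hc_add}


def handle_vm_exit(exit_reason: int, regs: dict) -> dict:
    """Dispatch a hypercall and resume the guest after the vmcall."""
    if exit_reason != EXIT_REASON_VMCALL:
        raise NotImplementedError(f"unhandled exit reason {exit_reason}")
    handler = HYPERCALLS.get(regs["rax"])
    if handler is None:
        regs["rax"] = HC_ERROR
    else:
        regs["rax"] = handler(regs["rbx"], regs["rcx"])
    # The guest RIP still points at vmcall, so skip it before re-entering
    # the guest. A real handler would read the instruction length from the VMCS.
    regs["rip"] += VMCALL_LENGTH
    return regs
```

For example, a guest that sets `rax = 1`, `rbx = 2`, `rcx = 3` and executes `vmcall` would, under these assumptions, resume with `rax = 5` and its instruction pointer advanced past the `vmcall`.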

- Full virtualization is often not needed
- Serverless / Edge Computing (it's still a form of cloud computing)
- Virtualization is not free  $\rightarrow$  why not skip it and just use OS-level isolation?
- Context switches between processes are expensive  $\rightarrow$  why not skip process isolation and just use language-level isolation?

Cloud Operating Systems  $\rightarrow$  Hardware-assisted virtualization





# Talk to your kids about hypervisors...before someone else does










CloudOS: the first time

www.tugraz.at

- Seminar-style
- You code
- You plan
- You present


Team



Daniel Gruss



Fabian Rauscher

- 28 participants  $\rightarrow$  7 teams of 4 participants each
- $\rightarrow$  send me your registration by Friday, March 19
- 10 points: Basic SWEB Hypervisor
- 10 points: Advanced Hypervisor Feature of your choice
- 10 points: Talk on Special Topic
- Points are awarded based on the exercise interview

**Grading** 

- 26 of 30 points  $\rightarrow$  1
- 22 of 30 points  $\rightarrow$  2
- 18 of 30 points  $\rightarrow$  3
- 15 of 30 points  $\rightarrow$  4

- Implementation Deadline 11.6.
- VMX works
- EPT works
- a virtualized SWEB boots and is usable

- Propose an advanced hypervisor feature you will support
- Description of at most 280 characters
- Feature Plan Deadline 7.5.
- What? Anything you like
  - running multiple VMs
  - page deduplication across VMs
  - EPT hooking
  - shadow page tables
  - virtualize the running system to hook (and alter) instructions
  - virtualization of any hardware devices (many options)
  - nested virtualization
- Implementation Deadline 11.6.

- 12.4., 26.4., 10.5.
- 2 talks each
- 20-40 minutes (=5-10 minutes per participant) + Q&A
- Register by 29.3. with talk topic and date

- 15.03. Introduction Lecture
- 19.03. **Deadline**: Group Registration
- 22.03. Hypervisor Implementation Basics
- 29.03. **Deadline**: Talk Registration
- 12.04. Student Presentations
- 26.04. Student Presentations
- 07.05. **Deadline**: Feature Plan
- 10.05. Student Presentations
- 11.06. **Deadline**: Implementation
- 14.06. Exercise Interviews