Microprocessor system architectures – IA64
Jakub Yaghob
Application architecture
Application architecture features – I
  Instruction set architecture
    Load-execute-store architecture, no stack, no division
    Explicit parallelism
    Massive resources (128 integer and 128 FP registers, 64 predicate registers, 8 branch registers)
  Enhancements
    Speculation, predication, software pipelining, branch prediction, multimedia instructions
  Instruction-level parallelism
    Independent instructions in bundles
    Multiple bundles per clock
Application architecture features – II
  Explicit parallelism
    Instruction group
      Defined by a compiler
      Parallel execution of instructions
      Strict requirements on dependencies
        Forbidden register RAW, WAW dependencies
  Memory model
    Relatively weak
    Only restriction is RAW, WAW, WAR dependencies on one memory location
    Explicit memory access synchronization
  Speculation
    Early memory load
    Control speculation
      Load advanced above the condition that guards it
      Sometimes the load is executed “uselessly”, when the condition is not met
    Data speculation
      Load advanced before a store that may alias it
      Checking using ALAT
    Speculation check
      A speculative load is not performed if it would cause an exception
      Data speculation is invalid if there is a write to the memory location
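The data speculation check can be illustrated with a toy model. This is a minimal sketch, not the hardware algorithm: the class `ToyALAT` and its methods `advanced_load`, `store`, and `check` are hypothetical names, and keying the table by target register is a simplifying assumption.

```python
class ToyALAT:
    """Toy model of the ALAT used for data speculation (hypothetical names)."""

    def __init__(self):
        self.entries = {}  # target register -> address loaded in advance

    def advanced_load(self, reg, addr, memory):
        # Load early, before possibly aliasing stores, and remember the address.
        self.entries[reg] = addr
        return memory.get(addr, 0)

    def store(self, addr, value, memory):
        # A write to the same location invalidates the matching entries.
        memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check(self, reg):
        # True: the advanced value is still valid. False: it must be reloaded.
        return reg in self.entries


memory = {0x100: 7, 0x200: 9}
alat = ToyALAT()
r1 = alat.advanced_load("r1", 0x100, memory)
r2 = alat.advanced_load("r2", 0x200, memory)
alat.store(0x100, 42, memory)   # this store aliases the first load
if not alat.check("r1"):
    r1 = memory[0x100]          # recovery: redo the load
```

In this run only the first load has to be repeated; the second one keeps its entry because no store touched its address.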
Predication
  Predicate registers
    64 1-bit predicate registers PR0-PR63
    PR0 hardwired to 1, write is ignored
  No specialized arithmetic/logic flags
  Set by compare instructions
    Pair of PR (one for the comparison, one for the complementary comparison)
    Modes of setting (some of them are exempt from the WAW restriction inside an instruction group)
  Nearly all instructions are conditioned by a PR
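A small simulation may make the predicate pair clearer. This is only a sketch under assumptions: `PredicateFile` and `predicated_max` are hypothetical names, and the Python `if` statements stand in for instructions guarded by a predicate register.

```python
class PredicateFile:
    """Toy model of PR0-PR63 (hypothetical names)."""

    def __init__(self):
        self.pr = [False] * 64
        self.pr[0] = True            # PR0 is hardwired to 1

    def write(self, index, value):
        if index != 0:               # a write to PR0 is ignored
            self.pr[index] = bool(value)

    def compare(self, p_true, p_false, condition):
        # A compare sets a pair: the result and its complement.
        self.write(p_true, condition)
        self.write(p_false, not condition)


def predicated_max(a, b):
    """max(a, b) without a branch: both moves are issued, one takes effect."""
    prs = PredicateFile()
    prs.compare(1, 2, a > b)         # p1 = (a > b), p2 = not (a > b)
    result = None
    if prs.pr[1]:                    # (p1) result = a
        result = a
    if prs.pr[2]:                    # (p2) result = b
        result = b
    return result
```

Because the two predicates are complementary, exactly one of the guarded moves writes `result`, which is what lets a compiler replace a short branch with straight-line code.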
Register stack
  Support for function calls
  GR0-GR31 are global registers
  GR32-GR127 create a register stack
  Each procedure has a register frame
    2 variable-sized areas: local and output
  Register renaming using alloc instruction
    First output register of the caller becomes GR32 of the callee
  If the register stack overflows, the CPU frees some registers by saving them into memory
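The renaming of the output area can be sketched as follows. This is a toy model under assumptions: `RegisterStack` and its methods are hypothetical names, the frame base is a plain index into a list, and spilling to memory on overflow is deliberately left out.

```python
class RegisterStack:
    """Toy model of the GR32-GR127 register stack (hypothetical names)."""

    def __init__(self, physical_size=96):
        self.phys = [0] * physical_size
        self.base = 0         # physical index that GR32 currently maps to
        self.n_locals = 0     # size of the local area
        self.n_outputs = 0    # size of the output area
        self.saved = []       # caller frames, restored on return

    def _index(self, gr):
        offset = gr - 32
        if not 0 <= offset < self.n_locals + self.n_outputs:
            raise IndexError(f"GR{gr} is outside the current frame")
        return self.base + offset

    def read(self, gr):
        return self.phys[self._index(gr)]

    def write(self, gr, value):
        self.phys[self._index(gr)] = value

    def alloc(self, n_locals, n_outputs):
        if self.base + n_locals + n_outputs > len(self.phys):
            raise OverflowError("toy model: spilling to memory is not modelled")
        self.n_locals, self.n_outputs = n_locals, n_outputs

    def call(self):
        # The caller's output area becomes the start of the callee's frame.
        self.saved.append((self.base, self.n_locals, self.n_outputs))
        self.base += self.n_locals
        self.n_locals = 0

    def ret(self):
        self.base, self.n_locals, self.n_outputs = self.saved.pop()


rs = RegisterStack()
rs.alloc(2, 1)        # caller: GR32-GR33 local, GR34 output
rs.write(34, 123)     # pass an argument in the first output register
rs.call()
rs.alloc(1, 0)        # callee: its GR32 is the caller's GR34
arg = rs.read(32)
rs.ret()
```

No value is copied on the call; only the mapping of GR32 moves, which is the point of the register frame.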
Privilege levels and serialization
  Privilege levels
    Like IA-32, levels 0-3
    System instructions and registers accessible only with CPL=0
  Serialization
    Data dependency
      All application and system resources excluding control registers
      Values written to a register are observed by instructions in subsequent instruction groups
    Instruction serialization
      Modifications are observed before subsequent instruction group fetches are re-initiated
    Data serialization
      Modifications affecting both execution and data memory access are observed
    In-flight
      Non-serialized resources have “some” value for reads
System registers
  Processor Status Register (PSR)
    Current execution environment
    Divided into four overlapped sections
    Special instructions
  Control registers
    128 control registers
    Large number of reserved, only 26 used
    Groups
      Global control registers
        CR0 (DCR=Default Control Register)
        CR2 (IVA=Interruption Vector Address)
        CR8 (PTA=Page Table Address)
      Global interrupt control registers
        Control of an active interrupt
    Writes are not serialized
Banked general registers
  Fast switching of GR16-GR31 for interrupt handlers
  Current bank in PSR.bn
  Bank switching
    Interrupt selects bank 0
    rfi sets the bank from IPSR.bn
    bsw switches to the specified bank
  The NaT bits of the banked registers are switched as well
Virtual memory model
  Virtual regions
    Supports OS with Multiple Address Spaces
  Protection domain mechanism
    Supports OS with Single Address Space
  TLB
    Algorithms for paging deferred to OS
  VHPT (Virtual Hash Page Table)
    Augmenting TLB performance
    Inverted page tables
  Other mechanisms
    Various page sizes, fixed translations, …
Address translation
TLB
  Separated for code and data
    Data TLB also translates accesses to VHPT or RSE
  Each TLB divided into two parts
    Translation registers (TR)
      Fully associative array
      OS can explicitly set the translation
      No automatic replacement
    Translation cache (TC)
      Entries can be inserted by an instruction
      Automatic replacement (from VHPT)
Access rights on pages
  Defined by TLB.ar and TLB.pl
  Using TLB.ar
    Read only
    Read, execute
    Read, write
    Read, write, execute
    Read only / read, write
    Read, execute / read, write, execute
    Read, write, execute / read, write
    Exec, promote / read, execute
Virtual addressing – other – I
  Page sizes
    4k, 8k, 16k, 64k, 256k, 1M, 4M, 16M, 64M, 256M, 4G
  Region registers (RR)
    Highest 3 bits of VA create an index into RR
    rid – region identification
    ps – preferred page size
    ve – VHPT enabling
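Selecting a region register from a virtual address is plain bit arithmetic. The sketch below assumes a 64-bit virtual address; `split_virtual_address` and the dictionary layout of `region_registers` (including storing `ps` as a log2 value) are hypothetical choices made for illustration.

```python
REGION_BITS = 3
OFFSET_BITS = 64 - REGION_BITS   # assumes a 64-bit virtual address


def split_virtual_address(va):
    """Return (region register index, offset within the region)."""
    index = (va >> OFFSET_BITS) & ((1 << REGION_BITS) - 1)
    offset = va & ((1 << OFFSET_BITS) - 1)
    return index, offset


# Hypothetical contents: rid, preferred page size (as log2), VHPT enable.
region_registers = [{"rid": 0x100 + i, "ps": 14, "ve": True} for i in range(8)]

index, offset = split_virtual_address(0xE000_0000_0000_1234)
rr = region_registers[index]
```

Three index bits give eight region registers, so an OS with multiple address spaces can switch a region by rewriting one register's `rid`.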
Virtual addressing – other – II
  Protection keys
    At least 16 keys
    The key in a TLB entry is compared with the protection keys; on a mismatch, exception “key miss fault”
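The key comparison can be sketched as a lookup. This is an illustration under assumptions: `KeyMissFault`, `check_protection_key`, and the `(key, valid)` pair layout are hypothetical, and per-key access restrictions are not modelled.

```python
class KeyMissFault(Exception):
    """Raised when no valid protection key matches the TLB entry's key."""


def check_protection_key(tlb_key, key_registers):
    # key_registers: list of (key, valid) pairs, a simplified register layout.
    for key, valid in key_registers:
        if valid and key == tlb_key:
            return True
    raise KeyMissFault(f"no valid protection key matches {tlb_key:#x}")


# 16 registers, of which only the first holds a valid key.
key_registers = [(0x10, True), (0x20, False)] + [(0, False)] * 14
allowed = check_protection_key(0x10, key_registers)
```

In a single-address-space OS, switching protection domains then amounts to loading a different set of keys rather than flushing translations.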
VHPT – I
VHPT – II
  Properties
    The CPU never writes to the VHPT
    The CPU does not maintain coherence between the TLB and the VHPT
  Two formats
    Short – one per region, 8 B entry
    Long – a single large table for the system, 32 B entry
  Various sizes, powers of 2
  Searched when the TLB misses
  If found in the VHPT, automatically inserted into the TC
  Fixed hash functions
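The lookup order (TLB first, then the VHPT, with a hit inserted into the TC) can be sketched as a toy model. Everything named here is an assumption for illustration: `ToyMMU`, `toy_hash` (a stand-in, not one of the fixed hardware hash functions), the 16K page size, and the dictionary-based TR and TC.

```python
PAGE_SHIFT = 14  # assumes 16K pages for the example


def toy_hash(vpn, table_size):
    # Stand-in hash; the real hash functions are fixed by the architecture.
    return vpn % table_size


class ToyMMU:
    def __init__(self, vhpt_size=8):
        self.tr = {}                      # translation registers, set by the OS
        self.tc = {}                      # translation cache
        self.vhpt = [None] * vhpt_size    # power-of-2 size, filled by the OS

    def os_install(self, vpn, ppn):
        # Only the OS writes the VHPT; the CPU never does.
        self.vhpt[toy_hash(vpn, len(self.vhpt))] = (vpn, ppn)

    def translate(self, va):
        vpn = va >> PAGE_SHIFT
        offset = va & ((1 << PAGE_SHIFT) - 1)
        for tlb in (self.tr, self.tc):
            if vpn in tlb:
                return (tlb[vpn] << PAGE_SHIFT) | offset
        entry = self.vhpt[toy_hash(vpn, len(self.vhpt))]  # searched on a TLB miss
        if entry is not None and entry[0] == vpn:
            self.tc[vpn] = entry[1]       # a VHPT hit is inserted into the TC
            return (entry[1] << PAGE_SHIFT) | offset
        raise LookupError("translation not found: the OS must supply it")


mmu = ToyMMU()
mmu.os_install(5, 0x40)
pa = mmu.translate((5 << PAGE_SHIFT) | 0x123)
```

A second access to the same page is served from the TC without touching the VHPT. Because the CPU keeps no coherence between the two, an OS that changes a VHPT entry also has to purge the stale TC entry itself.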
Physical addressing and memory attributes
  Only 63 bits
    Current architecture and implementation only 50 bits
  Memory attributes
    Virtual – like IA-32 (WB, WC, …)
    Physical – using bit 63 of the physical address
      0 – WB, speculative
      1 – UC, nonspeculative
  Nontrivial rules for memory ordering
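Decoding the attribute bit in physical addressing is a one-line test. The helper name `physical_attribute` is hypothetical, and the sketch simply returns the attribute text from the list above together with the remaining 63 address bits.

```python
def physical_attribute(address):
    """Split a 64-bit physical-mode address into (attribute, 63-bit address)."""
    uncacheable = (address >> 63) & 1
    attribute = "UC, nonspeculative" if uncacheable else "WB, speculative"
    return attribute, address & ((1 << 63) - 1)


cached = physical_attribute(0x0000_0000_0000_1000)
uncached = physical_attribute(0x8000_0000_0000_1000)
```

Both calls name the same 63-bit location; only the attribute differs, which is why 63 bits remain for the address itself.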
Interrupts – I
  Kinds depending on handlers
    IVA
      Handled by OS, a vector defined by CR2
    PAL
      Handled by PAL or by system firmware, possibly by OS
  Kinds depending on behavior
    Abort
    Interrupt
      External, asynchronous
    Fault
    Trap
  Interrupts are disabled during interrupt handling
Interrupts – II
  Currently defined 81 exceptions
    5 for “hard” exceptions
      RESET, INIT, INT, MCA, PMI
    23 for IA-32 emulation
  IVA interrupts
    Vectors have fixed addresses
    Exception groups on one vector
  External interrupts
    256 vectors
    Priority division using vector number
    Current vector in CR65 (IVR=Interrupt Vector Register)
    Current priority in CR66 (TPR=Task Priority Register)
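How a vector number could translate into a priority is sketched below. The grouping is an assumption made for this example, not taken from the slides: 16 vectors per priority class, a higher class meaning a higher priority, and a single masking threshold standing in for the current priority. The names `priority_class` and `next_interrupt` are hypothetical.

```python
VECTORS_PER_CLASS = 16   # assumption for this sketch


def priority_class(vector):
    if not 0 <= vector < 256:
        raise ValueError("external interrupt vectors are 0-255")
    return vector // VECTORS_PER_CLASS


def next_interrupt(pending, masked_class):
    """Pick the highest pending vector whose class is above masked_class."""
    eligible = [v for v in pending if priority_class(v) > masked_class]
    return max(eligible, default=None)


chosen = next_interrupt({0x31, 0x42, 0x51}, masked_class=4)
blocked = next_interrupt({0x31, 0x42}, masked_class=4)
```

With the threshold at class 4, only vector 0x51 is deliverable; the other two stay pending until the threshold is lowered.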
RSE – I
  Register Stack Engine (RSE)
    Transfers the register stack from/to memory
    Without software intervention, in the background
    Different activity modes (lazy, store intensive, load intensive, eager)
  Physical register stack must have size at least 96 registers
    More in multiples of 16
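The size rule for the physical register stack can be written as a small predicate. The function name `is_valid_physical_stack_size` is hypothetical; the rule itself (at least 96, then steps of 16) is the one stated above.

```python
def is_valid_physical_stack_size(n_registers):
    """At least 96 registers, with any extra registers in multiples of 16."""
    return n_registers >= 96 and (n_registers - 96) % 16 == 0


valid_sizes = [n for n in range(90, 150) if is_valid_physical_stack_size(n)]
```

The minimum of 96 matches the number of stacked registers GR32-GR127, so a single full frame always fits.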
RSE – II
Firmware
  Processor Abstraction Layer (PAL)
    Unified interface to the CPU firmware
  System Abstraction Layer (SAL)
    Separates OS from implementation variation of platforms
  Extensible Firmware Interface (EFI)
    OS booting
  Each FW layer (including OS) has a defined entry point
  PAL and SAL placed in 16M memory exactly below 4G
    Fixed structure
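The placement of PAL and SAL works out to a concrete address range. This is simple arithmetic on the two figures above; the names `firmware_start`, `firmware_end`, and `in_firmware_region` are hypothetical.

```python
MIB = 1 << 20
GIB = 1 << 30

firmware_end = 4 * GIB                     # exclusive upper bound: 4G
firmware_start = firmware_end - 16 * MIB   # 16M region directly below it


def in_firmware_region(address):
    return firmware_start <= address < firmware_end
```

The region therefore starts at 0xFF000000 and ends just below 0x100000000.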
Firmware model