computer architecture and assembly language · 2020-04-11 · introductionproblemdefinition...
TRANSCRIPT
Computer Architecture and Assembly Language
Gabriel Laskar
EPITA
2015
License
I Copyright c© 2004-2005, ACU, Benoit PerrotI Copyright c© 2004-2008, Alexandre BecouletI Copyright c© 2009-2013, Nicolas PouillonI Copyright c© 2014, Joël PorquetI Copyright c© 2015, Gabriel Laskar
Permission is granted to copy, distributeand/or modify this document under the termsof the GNU Free Documentation License, Version1.2 or any later version published by the FreeSoftware Foundation; with the Invariant Sectionsbeing just ‘‘Copying this document’’, noFront-Cover Texts, and no Back-Cover Texts.
Introduction
Part I
Introduction
Gabriel Laskar (EPITA) CAAL 2015 3 / 378
Introduction Problem definition
1: Introduction
Problem definition
Outline
Gabriel Laskar (EPITA) CAAL 2015 4 / 378
Introduction Problem definition
What are we trying to learn?Computer Architecture
What is in the hardware?I A bit of history of computers, current machinesI Concepts and conventions: processing, memory, communication,
optimization
How does a machine run code?I Program execution modelI Memory mapping, OS support
Gabriel Laskar (EPITA) CAAL 2015 5 / 378
Introduction Problem definition
What are we trying to learn?Assembly Language
How to “talk” with the machine directly?I Mechanisms involvedI Assembly language structure and usageI Low-level assembly language featuresI C inline assembly
Gabriel Laskar (EPITA) CAAL 2015 6 / 378
Introduction Problem definition
Who do I talk to?
I System gurusI Low-level enthusiasts
I ProgrammersI Wise managers
Gabriel Laskar (EPITA) CAAL 2015 7 / 378
Introduction Problem definition
Who do I talk to?
I System gurusI Low-level enthusiastsI Programmers
I Wise managers
Gabriel Laskar (EPITA) CAAL 2015 7 / 378
Introduction Problem definition
Who do I talk to?
I System gurusI Low-level enthusiastsI Programmers
C/C++
I Wise managers
Gabriel Laskar (EPITA) CAAL 2015 7 / 378
Introduction Problem definition
Who do I talk to?
I System gurusI Low-level enthusiastsI Programmers
C/C++, Objective-C
I Wise managers
Gabriel Laskar (EPITA) CAAL 2015 7 / 378
Introduction Problem definition
Who do I talk to?
I System gurusI Low-level enthusiastsI Programmers
C/C++, Objective-C, C#
I Wise managers
Gabriel Laskar (EPITA) CAAL 2015 7 / 378
Introduction Problem definition
Who do I talk to?
I System gurusI Low-level enthusiastsI Programmers
C/C++, Objective-C, C#, Java
I Wise managers
Gabriel Laskar (EPITA) CAAL 2015 7 / 378
Introduction Problem definition
Who do I talk to?
I System gurusI Low-level enthusiastsI Programmers
C/C++, Objective-C, C#, Java, JS/AS
I Wise managers
Gabriel Laskar (EPITA) CAAL 2015 7 / 378
Introduction Problem definition
Who do I talk to?
I System gurusI Low-level enthusiastsI Programmers at any level:
(C/C++, Objective-C, C#, Java, JS/AS, etc.)
I Wise managers
Gabriel Laskar (EPITA) CAAL 2015 7 / 378
Introduction Problem definition
Who do I talk to?
I System gurusI Low-level enthusiastsI Programmers at any level:
(C/C++, Objective-C, C#, Java, JS/AS, etc.)I Wise managers
Gabriel Laskar (EPITA) CAAL 2015 7 / 378
Introduction Outline
1: Introduction
Problem definition
Outline
Gabriel Laskar (EPITA) CAAL 2015 8 / 378
Introduction Outline
Course outline
I Processor architectureI MemoryI Memory mappingI Execution flowI Object file formatsI Assembly programmingI Focus on x86I Focus on RISC processorsI CPU-aware optimizationsI Multi-/Many-core, heterogeneous systems
Gabriel Laskar (EPITA) CAAL 2015 9 / 378
Processor architecture
Part II
Processor architecture
Gabriel Laskar (EPITA) CAAL 2015 10 / 378
Processor architecture Overview
2: Processor architecture
Overview
Inside the processor
Processor units
Instructions
Instruction flow
Pipeline processor
CISC & RISC architectures
Gabriel Laskar (EPITA) CAAL 2015 11 / 378
Processor architecture Overview
What a processor is...
A processor must be able to perform the following basic tasks:I Execute instructionsI Read operandsI Store results
It needs several basic units to perform those tasks:I A control unitI An arithmetic and logical unit (ALU)I A register bank
Let’s design it!
Gabriel Laskar (EPITA) CAAL 2015 12 / 378
Processor architecture Overview
What a processor is...
A processor must be able to perform the following basic tasks:I Execute instructionsI Read operandsI Store results
It needs several basic units to perform those tasks:I A control unitI An arithmetic and logical unit (ALU)I A register bank
Let’s design it!
Gabriel Laskar (EPITA) CAAL 2015 12 / 378
Processor architecture Overview
Basic architecture
Control Unit
ALU
R1
R0
= 0123
= 0100
= 0023
R5
R4
R3
R2
Reg
iste
rs
Gabriel Laskar (EPITA) CAAL 2015 13 / 378
Processor architecture Overview
Basic architecture (2)
In this model, the system state is entirely contained in the processor.I This might be sufficient for a very basic processorI More features could be leveraged by adding registers or program steps
Unfortunately,I Internal memory is expensive and hard to designI There is no communicationI Updating the program may not be easy
We need an access to memory, external devices, etc.
Gabriel Laskar (EPITA) CAAL 2015 14 / 378
Processor architecture Overview
Basic architecture (2)
In this model, the system state is entirely contained in the processor.I This might be sufficient for a very basic processorI More features could be leveraged by adding registers or program steps
Unfortunately,I Internal memory is expensive and hard to designI There is no communicationI Updating the program may not be easy
We need an access to memory, external devices, etc.
Gabriel Laskar (EPITA) CAAL 2015 14 / 378
Processor architecture Overview
Basic architecture (2)
In this model, the system state is entirely contained in the processor.I This might be sufficient for a very basic processorI More features could be leveraged by adding registers or program steps
Unfortunately,I Internal memory is expensive and hard to designI There is no communicationI Updating the program may not be easy
We need an access to memory, external devices, etc.
Gabriel Laskar (EPITA) CAAL 2015 14 / 378
Processor architecture Overview
Revised processor model
A processor must be able to perform the following basic tasks:I Fetch instructions from an external entity and understand them (fetch
and decode)I Execute instructionsI Store results to registers or external memory
It needs several basic units to perform those tasks:I A control unitI An arithmetic and logical unit (ALU)I A register bankI A program memory accessI A data memory access
Gabriel Laskar (EPITA) CAAL 2015 15 / 378
Processor architecture Overview
Revised processor model
A processor must be able to perform the following basic tasks:I Fetch instructions from an external entity and understand them (fetch
and decode)I Execute instructionsI Store results to registers or external memory
It needs several basic units to perform those tasks:I A control unitI An arithmetic and logical unit (ALU)I A register bankI A program memory accessI A data memory access
Gabriel Laskar (EPITA) CAAL 2015 15 / 378
Processor architecture Overview
Revised processor model (2)
Inst
ruct
ion
@
Co
de
Co
mm
and
R D
ata
Dat
a @
W D
ata
Control Unit
ALU
R1
R0
= 0123
= 0100
= 0023
R5
R4
R3
R2
Reg
iste
rs
Gabriel Laskar (EPITA) CAAL 2015 16 / 378
Processor architecture Overview
Processor physical layout
A processor is composed of many different units:I Caches, MMUI Integer unit, Control unit, Floating-point unit
Each unit is:I implemented as an hardware componentI made of switchable parts (transistors)
In old processors:I Units used to be independent chipsI Some were even optional “coprocessors”
Today, processors are embedded on a single chip.
Gabriel Laskar (EPITA) CAAL 2015 17 / 378
Processor architecture Inside the processor
2: Processor architecture
Overview
Inside the processor
Processor units
Instructions
Instruction flow
Pipeline processor
CISC & RISC architectures
Gabriel Laskar (EPITA) CAAL 2015 18 / 378
Processor architecture Inside the processor
Processor package
Gabriel Laskar (EPITA) CAAL 2015 19 / 378
Processor architecture Inside the processor
Processor physical layout
Gabriel Laskar (EPITA) CAAL 2015 20 / 378
Processor architecture Inside the processor
Processor physical layout
Gabriel Laskar (EPITA) CAAL 2015 21 / 378
Processor architecture Inside the processor
Processor physical layoutTransistor details
Gabriel Laskar (EPITA) CAAL 2015 22 / 378
Processor architecture Processor units
2: Processor architecture
Overview
Inside the processor
Processor units
Instructions
Instruction flow
Pipeline processor
CISC & RISC architectures
Gabriel Laskar (EPITA) CAAL 2015 23 / 378
Processor architecture Processor units
Units
I Control unit fetches anddecodes the instruction
I Registers gives the dataI ALU implements the
operationI Some instructions access
external data
Inst
ruct
ion
@
Co
de
Co
mm
and
R D
ata
Dat
a @
W D
ata
Control Unit
ALU
R1
R0
= 0123
= 0100
= 0023
R5
R4
R3
R2
Reg
iste
rs
Gabriel Laskar (EPITA) CAAL 2015 24 / 378
Processor architecture Processor units
RegistersMay be seen as variables located inside the processor
I 8, 16, 32, 64, 128, ... -bits largeI General-purpose registers:
I integer (int)I floating point (float, double)
I Specialized registers:I flags
I Zero,I Negative,I Carry,I Overflow,I etc.
I systemI Mode,I IRQ masking,I etc.
Gabriel Laskar (EPITA) CAAL 2015 25 / 378
Processor architecture Processor units
ALU: Arithmetic and Logical UnitAn unit without registers!
I Logical operationsI AND, OR, XOR, NOT, NOR
I Arithmetic operationsI addition, subtraction, multiplication, division
I ShiftsI Compares
Division is not possible without registers!
Gabriel Laskar (EPITA) CAAL 2015 26 / 378
Processor architecture Instructions
2: Processor architecture
Overview
Inside the processor
Processor units
Instructions
Instruction flow
Pipeline processor
CISC & RISC architectures
Gabriel Laskar (EPITA) CAAL 2015 27 / 378
Processor architecture Instructions
Instruction types
There are different instruction types:I Arithmetic and logical operationsI Control instructionsI Memory access instructions
Gabriel Laskar (EPITA) CAAL 2015 28 / 378
Processor architecture Instructions
Instructions classification
Flynn’s taxonomy:I SISD: Single Instruction, Single Data
Classical Von Neumann architectureI SIMD: Single Instruction, Multiple Data
Vectorial computersI MIMD: Multiple Instruction, Multiple Data
Multiprocessor computers
Gabriel Laskar (EPITA) CAAL 2015 29 / 378
Processor architecture Instruction flow
2: Processor architecture
Overview
Inside the processor
Processor units
Instructions
Instruction flow
Pipeline processor
CISC & RISC architectures
Gabriel Laskar (EPITA) CAAL 2015 30 / 378
Processor architecture Instruction flow
Microprogrammed processorInstruction flow
Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)
A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.
Fetch
PC
ad
dre
ss
sub
jmp
ld
Gabriel Laskar (EPITA) CAAL 2015 31 / 378
Processor architecture Instruction flow
Microprogrammed processorInstruction flow
Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)
A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.
DecodFetch
PC
ad
dre
ss
sub
jmp
ld
Gabriel Laskar (EPITA) CAAL 2015 31 / 378
Processor architecture Instruction flow
Microprogrammed processorInstruction flow
Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)
A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.
ExecDecodFetch
PC
ad
dre
ss
sub
jmp
ld
Gabriel Laskar (EPITA) CAAL 2015 31 / 378
Processor architecture Instruction flow
Microprogrammed processorInstruction flow
Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)
A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.
WriteExecDecodFetch
PC
ad
dre
ss
sub
jmp
ld
Gabriel Laskar (EPITA) CAAL 2015 31 / 378
Processor architecture Instruction flow
Microprogrammed processorInstruction flow
Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)
A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.
Fetch
WriteExecDecodFetch
PC
ad
dre
ss
sub
jmp
ld
Gabriel Laskar (EPITA) CAAL 2015 31 / 378
Processor architecture Instruction flow
Microprogrammed processorInstruction flow
Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)
A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.
DecodFetch
WriteExecDecodFetch
PC
ad
dre
ss
sub
jmp
ld
Gabriel Laskar (EPITA) CAAL 2015 31 / 378
Processor architecture Instruction flow
Microprogrammed processorInstruction flow
Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)
A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.
ExecDecodFetch
WriteExecDecodFetch
PC
ad
dre
ss
sub
jmp
ld
Gabriel Laskar (EPITA) CAAL 2015 31 / 378
Processor architecture Instruction flow
Microprogrammed processorInstruction flow
Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)
A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.
Fetch
ExecDecodFetch
WriteExecDecodFetch
PC
ad
dre
ss
sub
jmp
ld
Gabriel Laskar (EPITA) CAAL 2015 31 / 378
Processor architecture Instruction flow
Microprogrammed processorInstruction flow
Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)
A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.
DecodFetch
ExecDecodFetch
WriteExecDecodFetch
PC
ad
dre
ss
sub
jmp
ld
Gabriel Laskar (EPITA) CAAL 2015 31 / 378
Processor architecture Instruction flow
Microprogrammed processorInstruction flow
Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)
A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.
ExecDecodFetch
ExecDecodFetch
WriteExecDecodFetch
PC
ad
dre
ss
sub
jmp
ld
Gabriel Laskar (EPITA) CAAL 2015 31 / 378
Processor architecture Instruction flow
Microprogrammed processorInstruction flow
Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)
A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.
MemExecDecodFetch
ExecDecodFetch
WriteExecDecodFetch
PC
ad
dre
ss
sub
jmp
ld
Gabriel Laskar (EPITA) CAAL 2015 31 / 378
Processor architecture Instruction flow
Microprogrammed processorInstruction flow
Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)
A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.
WriteMemExecDecodFetch
ExecDecodFetch
WriteExecDecodFetch
PC
ad
dre
ss
sub
jmp
ld
Gabriel Laskar (EPITA) CAAL 2015 31 / 378
Processor architecture Instruction flow
Microprogrammed processorPro/cons
Cons:I SlowI The more complex the instructions are, the longer they take to get
processedI Most of the hardware is used only once for each instructionI Most of the hardware is unused most of the time
Pros:I Easy to implementI SmallI New instructions can be added just by adding new steps
Gabriel Laskar (EPITA) CAAL 2015 32 / 378
Processor architecture Instruction flow
Microprogrammed processorPro/cons
Cons:I SlowI The more complex the instructions are, the longer they take to get
processedI Most of the hardware is used only once for each instructionI Most of the hardware is unused most of the time
Pros:I Easy to implementI SmallI New instructions can be added just by adding new steps
Gabriel Laskar (EPITA) CAAL 2015 32 / 378
Processor architecture Pipeline processor
2: Processor architecture
Overview
Inside the processor
Processor units
Instructions
Instruction flow
Pipeline processor
CISC & RISC architectures
Gabriel Laskar (EPITA) CAAL 2015 33 / 378
Processor architecture Pipeline processor
Pipelined processorInstruction flowA pipeline architecture enables the parallel execution of severalinstructions:
I Split the execution of each instruction in several stepsI Each step performs an elementary operationI Each step is associated to a specific part of the hardwareI All parts of the hardware work in parallel
Fetch Decode Exec Memory
Instruction
memory
access
Data
memory
access
Writeback
Gabriel Laskar (EPITA) CAAL 2015 34 / 378
Processor architecture Pipeline processor
PipelineFlow
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Fetch?
Gabriel Laskar (EPITA) CAAL 2015 35 / 378
Processor architecture Pipeline processor
PipelineFlow
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Decod
Fetch?
Fetch
Gabriel Laskar (EPITA) CAAL 2015 35 / 378
Processor architecture Pipeline processor
PipelineFlow
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Fetch
Decod
Exec
?
Decod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 35 / 378
Processor architecture Pipeline processor
PipelineFlow
add
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
Mem
?
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 35 / 378
Processor architecture Pipeline processor
PipelineFlow
Exec
Decod
Fetch
Mem
Write
?
jmp
add
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 35 / 378
Processor architecture Pipeline processor
PipelineFlow
sub
Exec
Decod
Fetch
Mem
Write
?
Exec
Decod
Fetch
Mem
Write
jmp
add
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 35 / 378
Processor architecture Pipeline processor
PipelineFlow
sub
Fetch
Decod
Exec
Write
Mem
?
Exec
Decod
Fetch
Mem
Write
Exec
Decod
Fetch
Mem
Write
add
jmp
add
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 35 / 378
Processor architecture Pipeline processor
Speeding up the pipeline
Current processors extensively use the pipeline architecture to acceleratethe execution of instructions.
Because a pipeline architecture works in parallel, the slowest step delaydetermines the pipeline global cycle delay (and working frequency).
8 ns18 ns10 ns
Step 1 Step 2 Step 3
Gabriel Laskar (EPITA) CAAL 2015 36 / 378
Processor architecture Pipeline processor
Speeding up the pipelineSplitting operations in shorter steps enables the processor frequency toincrease.
8 ns18 ns10 ns
Step 1 Step 2 Step 3
Step 1
10 ns 9 ns
Step 2.1 Step 2.2 Step 3
8 ns9 ns
Gabriel Laskar (EPITA) CAAL 2015 37 / 378
Processor architecture CISC & RISC architectures
2: Processor architecture
Overview
Inside the processor
Processor units
Instructions
Instruction flow
Pipeline processor
CISC & RISC architectures
Gabriel Laskar (EPITA) CAAL 2015 38 / 378
Processor architecture CISC & RISC architectures
Instruction-set classification
Based on internal architecture and instructions formats, processorarchitectures may be classified in two groups:
I Complex Instruction Set Computer (CISC)I Reduced Instruction Set Computer (RISC)
Early processor architectures were mostly CISC-based: z80, Intel x86,Motorola 68000, etc.
More recent designs are rather RISC-based: MIPS, Sparc, Alpha,PowerPC, ARM, etc.
Gabriel Laskar (EPITA) CAAL 2015 39 / 378
Processor architecture CISC & RISC architectures
RISCCharacteristics
Pros:I Simple instructionsI Fixed-length instructionsI Decoding instructions requires simple hardware
Cons:I Programs are longer as they need more instructionsI Optimization is harder, compilers need to be smarter
Sometimes said as “Reject Important Stuff into Compiler”
Gabriel Laskar (EPITA) CAAL 2015 40 / 378
Processor architecture CISC & RISC architectures
RISCInstruction example
sub %g1, %g2, %g3
0x200x86 0x40 0x02
00010000000000000010001000001110
%g3 sub %g1 %g2
source unused sourceformat destination opcode
Gabriel Laskar (EPITA) CAAL 2015 41 / 378
Processor architecture CISC & RISC architectures
CISCCharacteristics
Pros:I Lots of instructions and opcodesI A single instruction can perform complex operationsI Assembly programs are easier and shorter to writeI Code compression ratio is good
Cons:I Binary instruction format has variable lengthI It requires more complex hardware and high frequencies are harder to
achieve
Modern processors often internally translate the CISC code to RISCmicrocode
Gabriel Laskar (EPITA) CAAL 2015 42 / 378
Processor architecture CISC & RISC architectures
CISCInstruction example
opcode opc2 immediate memorydisplacement
modr/m
Gabriel Laskar (EPITA) CAAL 2015 43 / 378
Memory
Part III
Memory
Gabriel Laskar (EPITA) CAAL 2015 44 / 378
Memory Memories
3: Memory
MemoriesMemory typesAccess examples
Memory accessing modes
Alignment
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 45 / 378
Memory Memories Memory types
3: Memory
MemoriesMemory typesAccess examples
Memory accessing modes
Alignment
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 46 / 378
Memory Memories Memory types
Reasons to access memory
How does the memory work with the processor?I Memory is used to fetch instructions,I Memory is used to access data.I There may be one unique memory, or two.
I If there is one memory, instruction / data accesses must besequencialized
I If there are two, code cannot be accessed as dataConventional names:1 Von Neuman architecture2 Harvard architecture
Gabriel Laskar (EPITA) CAAL 2015 47 / 378
Memory Memories Access examples
3: Memory
MemoriesMemory typesAccess examples
Memory accessing modes
Alignment
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 48 / 378
Memory Memories Access examples
Instruction fetchThe processor needs an instruction to process
Processor
Memory
Gabriel Laskar (EPITA) CAAL 2015 49 / 378
Memory Memories Access examples
Instruction fetchThe processor needs an instruction to process
Inst
ruct
ion @
Processor
Memory
Gabriel Laskar (EPITA) CAAL 2015 49 / 378
Memory Memories Access examples
Instruction fetchThe processor needs an instruction to process
Code
Inst
ruct
ion @
Processor
Memory
Gabriel Laskar (EPITA) CAAL 2015 49 / 378
Memory Memories Access examples
Data accessloadLoad the content of the memory cells pointed to by %g1 into %g2 register.
Processor
Memory
ld [%g1], %g2
Gabriel Laskar (EPITA) CAAL 2015 50 / 378
Memory Memories Access examples
Data accessloadLoad the content of the memory cells pointed to by %g1 into %g2 register.
Com
man
d
Dat
a @
Processor
Memory
ld [%g1], %g2
Gabriel Laskar (EPITA) CAAL 2015 50 / 378
Memory Memories Access examples
Data accessloadLoad the content of the memory cells pointed to by %g1 into %g2 register.
R D
ata
Com
man
d
Dat
a @
Processor
Memory
ld [%g1], %g2
Gabriel Laskar (EPITA) CAAL 2015 50 / 378
Memory Memories Access examples
Data accessstoreStores the content of %g2 register into the memory cells pointed to by %g1.
Processor
Memory
st %g2, [%g1]
Gabriel Laskar (EPITA) CAAL 2015 51 / 378
Memory Memories Access examples
Data accessstoreStores the content of %g2 register into the memory cells pointed to by %g1.
W D
ata
Com
man
d
Dat
a @
Processor
Memory
st %g2, [%g1]
Gabriel Laskar (EPITA) CAAL 2015 51 / 378
Memory Memory accessing modes
3: Memory
Memories
Memory accessing modesImmediate addressingAbsolute addressingRegister indirect addressingComplex addressing
Alignment
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 52 / 378
Memory Memory accessing modes Immediate addressing
3: Memory
Memories
Memory accessing modesImmediate addressingAbsolute addressingRegister indirect addressingComplex addressing
Alignment
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 53 / 378
Memory Memory accessing modes Immediate addressing
Immediate addressing
I The value of data is directly stored in the instructionI No memory access needed to get the value
In C language:int a, b = ...;
a = b + 0x831;
In assembly language:add %g1, 0x831, %g2
Gabriel Laskar (EPITA) CAAL 2015 54 / 378
Memory Memory accessing modes Immediate addressing
Immediate addressing
I The value of data is directly stored in the instructionI No memory access needed to get the value
In C language:int a, b = ...;
a = b + 0x831;
In assembly language:add %g1, 0x831, %g2
Gabriel Laskar (EPITA) CAAL 2015 54 / 378
Memory Memory accessing modes Immediate addressing
Immediate addressing
I The value of data is directly stored in the instructionI No memory access needed to get the value
In C language:int a, b = ...;
a = b + 0x831;
In assembly language:add %g1, 0x831, %g2
Gabriel Laskar (EPITA) CAAL 2015 54 / 378
Memory Memory accessing modes Immediate addressing
Immediate addressingSparc instruction details
0x200x84
000010001000001010
%g2 sub %g1
sourceformat destination opcode
sub %g1, 0x831, %g2
0x68 0x31
0x831
1 0 1000 0011 0001
immediate
Gabriel Laskar (EPITA) CAAL 2015 55 / 378
Memory Memory accessing modes Absolute addressing
3: Memory
Memories
Memory accessing modesImmediate addressingAbsolute addressingRegister indirect addressingComplex addressing
Alignment
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 56 / 378
Memory Memory accessing modes Absolute addressing
Absolute addressing
I The address of the data is stored in the instructionI A memory access is needed to get the value
In C language:int a = *(int*)0x830;
In assembly language:ld [0x830], %g1
Gabriel Laskar (EPITA) CAAL 2015 57 / 378
Memory Memory accessing modes Absolute addressing
Absolute addressing
I The address of the data is stored in the instructionI A memory access is needed to get the value
In C language:int a = *(int*)0x830;
In assembly language:ld [0x830], %g1
Gabriel Laskar (EPITA) CAAL 2015 57 / 378
Memory Memory accessing modes Absolute addressing
Absolute addressing
I The address of the data is stored in the instructionI A memory access is needed to get the value
In C language:int a = *(int*)0x830;
In assembly language:ld [0x830], %g1
Gabriel Laskar (EPITA) CAAL 2015 57 / 378
Memory Memory accessing modes Register indirect addressing
3: Memory
Memories
Memory accessing modesImmediate addressingAbsolute addressingRegister indirect addressingComplex addressing
Alignment
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 58 / 378
Memory Memory accessing modes Register indirect addressing
Register indirect addressing
I The address of the data is stored in a registerI A memory access is needed to get the value
In C language:int a, *b = ...;
a = *b;
In assembly language:ld [%g2], %g1
Gabriel Laskar (EPITA) CAAL 2015 59 / 378
Memory Memory accessing modes Register indirect addressing
Register indirect addressing
I The address of the data is stored in a registerI A memory access is needed to get the value
In C language:int a, *b = ...;
a = *b;
In assembly language:ld [%g2], %g1
Gabriel Laskar (EPITA) CAAL 2015 59 / 378
Memory Memory accessing modes Register indirect addressing
Register indirect addressing
I The address of the data is stored in a registerI A memory access is needed to get the value
In C language:int a, *b = ...;
a = *b;
In assembly language:ld [%g2], %g1
Gabriel Laskar (EPITA) CAAL 2015 59 / 378
Memory Memory accessing modes Complex addressing
3: Memory
Memories
Memory accessing modesImmediate addressingAbsolute addressingRegister indirect addressingComplex addressing
Alignment
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 60 / 378
Memory Memory accessing modes Complex addressing
Complex addressing
I Register indirect with base registerI Register indirect with offsetI And many others...
Assembly example:ld [%g2 + 0x124], %g1ld [%g2 + %g3], %g1
Gabriel Laskar (EPITA) CAAL 2015 61 / 378
Memory Alignment
3: Memory
Memories
Memory accessing modes
AlignmentMemory access alignmentStructure alignmentpacked structures
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 62 / 378
Memory Alignment Memory access alignment
3: Memory
Memories
Memory accessing modes
AlignmentMemory access alignmentStructure alignmentpacked structures
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 63 / 378
Memory Alignment Memory access alignment
Definition
Data access alignment is solely about considered data type width.I 32-bit integer access is aligned for addresses multiple of 4I 16-bit integer access is aligned for even addressesI 8-bit integer (char) accesses are always aligned!
Think about address % sizeof(type)
Gabriel Laskar (EPITA) CAAL 2015 64 / 378
Memory Alignment Memory access alignment
Access alignment
0x00
0x10
0x04
0x08
0x0c
Bus Width
Data read
0x6b0x5a
0xef0xcd 0x9f 0x8c
0xab0x89
0x56 0x780x340x01Address
Gabriel Laskar (EPITA) CAAL 2015 65 / 378
Memory Alignment Structure alignment
3: Memory
Memories
Memory accessing modes
AlignmentMemory access alignmentStructure alignmentpacked structures
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 66 / 378
Memory Alignment Structure alignment
Structure alignment
In a structure:I Fields must be in declaration orderI Fields must all be aligned
Data alignment does not depend on architecture bus width
15 160 32
struct bit_packed_s
{
int a;
short b;
short c;};
c
a
b
Gabriel Laskar (EPITA) CAAL 2015 67 / 378
Memory Alignment Structure alignment
PaddingSometimes, consecutive fields in declaration cannot be consecutive andaligned in memory
I Compilers put structure fields at aligned offsetsI Alignment may add unused padding bytes between fields
15 160 32
struct example_aligned_s
{
int b;
short c;};
char a;
a
b
c
padding bytes
Gabriel Laskar (EPITA) CAAL 2015 68 / 378
Memory Alignment Structure alignment
PaddingSometimes, consecutive fields in declaration cannot be consecutive andaligned in memory
I Compilers put structure fields at aligned offsetsI Alignment may add unused padding bytes between fields
15 160 32
struct example_aligned_s
{
int b;
short c;};
char a;
a
b
c
padding bytesGabriel Laskar (EPITA) CAAL 2015 68 / 378
Memory Alignment packed structures
3: Memory
Memories
Memory accessing modes
AlignmentMemory access alignmentStructure alignmentpacked structures
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 69 / 378
Memory Alignment packed structures
Basic packing
I Fields alignment can be ignored by compiler, on requestI Few architectures are able to access non aligned fields directlyI If non-native, unaligned access is emulated with multiple memory
accesses, shifts, ORs, etc.
struct example_packed_s
{
int b;
short c;
char a;
}
__attribute__((packed));
15 160 32
a b
c
Gabriel Laskar (EPITA) CAAL 2015 70 / 378
Memory Alignment packed structures
Low-level packingI Packing can even be done at bit level!I Compiler will handle shifts and masksI Can be mixed with unionI Powerful for matching existing protocols
struct bit_packed_s
{
int a:17;
int b:5;
int c:12;
int d:16;
int e:14;
}
__attribute__((packed));
a
d
cb
e
15 160 32
Beware of endiannessGabriel Laskar (EPITA) CAAL 2015 71 / 378
Memory Endianness
3: Memory
Memories
Memory accessing modes
Alignment
EndiannessDifferent designs, different paradigmsExplainationDemo
Caches
Gabriel Laskar (EPITA) CAAL 2015 72 / 378
Memory Endianness Different designs, different paradigms
3: Memory
Memories
Memory accessing modes
Alignment
EndiannessDifferent designs, different paradigmsExplainationDemo
Caches
Gabriel Laskar (EPITA) CAAL 2015 73 / 378
Memory Endianness Different designs, different paradigms
Endianness
A data string represented with multiple bytes must be stored in memory.Similarly to written language, these bytes may be written “left-to-right” or“right-to-left”.
I Big-EndianI Little-EndianI Other endian modes
Gabriel Laskar (EPITA) CAAL 2015 74 / 378
Memory Endianness Explaination
3: Memory
Memories
Memory accessing modes
Alignment
EndiannessDifferent designs, different paradigmsExplainationDemo
Caches
Gabriel Laskar (EPITA) CAAL 2015 75 / 378
Memory Endianness Explaination
EndiannessMathematical reference
With a base b, a natural N may be decomposed in digits dk .If we naturally write it:N = dndn−1...dk ...d1d0N = dn · bn + dn−1 · bn−1 + · · ·+ dk · bk + · · ·+ d1 · b1 + d0 · b0
N = 4810310 = bbe716I With b = 10: N = 4 · 104 + 8 · 103 + 1 · 102 + 0 · 101 + 3 · 100I With b = 16: N = 11 · 163 + 11 · 162 + 14 · 161 + 7 · 160
So logically we tend to count digits from LSB: right to left
Digit no 4 3 2 1 0Base 10 value 4 8 1 0 3Base 16 value b b e 7
Gabriel Laskar (EPITA) CAAL 2015 76 / 378
Memory Endianness Explaination
EndiannessMathematical reference
With a base b, a natural N may be decomposed in digits dk .If we naturally write it:N = dndn−1...dk ...d1d0N = dn · bn + dn−1 · bn−1 + · · ·+ dk · bk + · · ·+ d1 · b1 + d0 · b0
N = 4810310 = bbe716I With b = 10: N = 4 · 104 + 8 · 103 + 1 · 102 + 0 · 101 + 3 · 100I With b = 16: N = 11 · 163 + 11 · 162 + 14 · 161 + 7 · 160
So logically we tend to count digits from LSB: right to left
Digit no 4 3 2 1 0Base 10 value 4 8 1 0 3Base 16 value b b e 7
Gabriel Laskar (EPITA) CAAL 2015 76 / 378
Memory Endianness Explaination
EndiannessMathematical reference
With a base b, a natural N may be decomposed in digits dk .If we naturally write it:N = dndn−1...dk ...d1d0N = dn · bn + dn−1 · bn−1 + · · ·+ dk · bk + · · ·+ d1 · b1 + d0 · b0
N = 4810310 = bbe716I With b = 10: N = 4 · 104 + 8 · 103 + 1 · 102 + 0 · 101 + 3 · 100I With b = 16: N = 11 · 163 + 11 · 162 + 14 · 161 + 7 · 160
So logically we tend to count digits from LSB: right to left
Digit no 4 3 2 1 0Base 10 value 4 8 1 0 3Base 16 value b b e 7
Gabriel Laskar (EPITA) CAAL 2015 76 / 378
Memory Endianness Explaination
Memory representation
Usually, we like to represent memory in written order, the same way wewrite words on a paper sheet: left to right
0 1 2 30x00 ’A’ ’ ’ ’s’ ’i’0x04 ’m’ ’p’ ’l’ ’e’0x08 ’ ’ ’m’ ’e’ ’s’0x0c ’s’ ’a’ ’g’ ’e’
Gabriel Laskar (EPITA) CAAL 2015 77 / 378
Memory Endianness Explaination
Integer memory representationThe “endianness” problem is whether to write integers
I in text order: the natural way (for western languages)I in index order: with digit 0 at address 0
3 2 1 0
0x00
0x10
0x04
0x08
0x0c
20 21 22 23
00 02 03
10 12
40
01
31 32
41 42
13
33
43
11
30
Gabriel Laskar (EPITA) CAAL 2015 78 / 378
Memory Endianness Explaination
Integer memory representationThe “endianness” problem is whether to write integers
I in text order: the natural way (for western languages)I in index order: with digit 0 at address 0
3 2 1 0
20 21 22 23Big endian
0x00
0x10
0x04
0x08
0x0c
20 21 22 23
00 02 03
10 12
40
01
31 32
41 42
13
33
43
11
30
Gabriel Laskar (EPITA) CAAL 2015 78 / 378
Memory Endianness Explaination
Integer memory representationThe “endianness” problem is whether to write integers
I in text order: the natural way (for western languages)I in index order: with digit 0 at address 0
3 2 1 0
2023 22 21Little endian
0x00
0x10
0x04
0x08
0x0c
20 21 22 23
00 02 03
10 12
40
01
31 32
41 42
13
33
43
11
30
Gabriel Laskar (EPITA) CAAL 2015 78 / 378
Memory Endianness Explaination
Integer memory representationThe “endianness” problem is whether to write integers
I in text order: the natural way (for western languages)I in index order: with digit 0 at address 0
3 2 1 0
2023 22 21Little endian
0x00
0x10
0x04
0x08
0x0c
2023 22 21
00
10
40
30
03
13
33
43
02
12
32
42
01
31
41
11
0x00
0x10
0x04
0x08
0x0c
20 21 22 23
00 02 03
10 12
40
01
31 32
41 42
13
33
43
11
30
Gabriel Laskar (EPITA) CAAL 2015 78 / 378
Memory Endianness Explaination
Integer memory representationThe “endianness” problem is whether to write integers
I in text order: the natural way (for western languages)I in index order: with digit 0 at address 0
3 2 1 0
2023 22 21Little endian
0x00
0x10
0x04
0x08
0x0c
2023 22 21
00
10
40
30
03
13
33
43
02
12
32
42
01
31
41
11
0x00
0x10
0x04
0x08
0x0c
20 21 22 23
00 02 03
10 12
40
01
31 32
41 42
13
33
43
11
30
Gabriel Laskar (EPITA) CAAL 2015 78 / 378
Memory Endianness Explaination
EndiannessDo not mix everything!
I Data must be stored and fetched using the same convention.I Don’t worry about byte order in registers, it does not make sense.
Gabriel Laskar (EPITA) CAAL 2015 79 / 378
Memory Endianness Demo
3: Memory
Memories
Memory accessing modes
Alignment
EndiannessDifferent designs, different paradigmsExplainationDemo
Caches
Gabriel Laskar (EPITA) CAAL 2015 80 / 378
Memory Endianness Demo
Endianness Demo
int main(void){
unsigned int a = 0x12345678;hexdump(&a);
}
Gabriel Laskar (EPITA) CAAL 2015 81 / 378
Memory Caches
3: Memory
Memories
Memory accessing modes
Alignment
Endianness
Caches
Gabriel Laskar (EPITA) CAAL 2015 82 / 378
Memory Caches
Cache memory: reason
I Once upon a time, CPU and memory speed were the same.I Of course, they evolved:
I Moore’s law: CPU power doubles every 2 years,I Memory speed: +7% every year.
Gabriel Laskar (EPITA) CAAL 2015 83 / 378
Memory Caches
Cache memory: definition
Cache memory is:I a local copy of central memory,I transparent, on-demand,I volatile (may be flushed anytime),I faster, closer to CPU than central memory,I expensive!
Gabriel Laskar (EPITA) CAAL 2015 84 / 378
Memory Caches
Cache memoryHierarchy
L1
Data
CPU
L1
Data
CPU
Die−Shared L3
Ins + Data + TLB
CPU−Shared L2
Ins + Data
Instr.
L1L1
Instr.
CPU−Shared L2
Ins + Data
I There are multiple caches “levels”I with different size,I with different speed,I with different latency.
I Caches may be shared between code and data.Gabriel Laskar (EPITA) CAAL 2015 85 / 378
Memory Caches
Cache memories side-effects
Caches may also involve stangeness:I having copies of memory introduce a coherence problem,I an access to a given memory location may vary,
There are various memory operators:Programmer that you handle in C code,Assembler that the compiler generated,
CPU that the CPU does,In-cache that the cache does in the system.
Gabriel Laskar (EPITA) CAAL 2015 86 / 378
Memory mapping
Part IV
Memory mapping
Gabriel Laskar (EPITA) CAAL 2015 87 / 378
Memory mapping Address space
4: Memory mapping
Address spaceDefinitionTranslation
Computer address spaces
Computer address space translation
MMU patterns
Gabriel Laskar (EPITA) CAAL 2015 88 / 378
Memory mapping Address space Definition
4: Memory mapping
Address spaceDefinitionTranslation
Computer address spaces
Computer address space translation
MMU patterns
Gabriel Laskar (EPITA) CAAL 2015 89 / 378
Memory mapping Address space Definition
Address spaceDefinition
An address space is a set of discrete values targetting a set of objects.I Each address points to one objectI One given object may be pointed by more than one address
1
2
3
foo
bar
baz
Gabriel Laskar (EPITA) CAAL 2015 90 / 378
Memory mapping Address space Translation
4: Memory mapping
Address spaceDefinitionTranslation
Computer address spaces
Computer address space translation
MMU patterns
Gabriel Laskar (EPITA) CAAL 2015 91 / 378
Memory mapping Address space Translation
Address space translation
Address space translation creates a new address space where each sourceaddress is mapped to a destination address.A given object can then be targetted by either addresses.
a
b
c
1
2
3
foo
bar
baz
Gabriel Laskar (EPITA) CAAL 2015 92 / 378
Memory mapping Computer address spaces
4: Memory mapping
Address space
Computer address spacesDefinitionUsual address spaces
Computer address space translation
MMU patterns
Gabriel Laskar (EPITA) CAAL 2015 93 / 378
Memory mapping Computer address spaces Definition
4: Memory mapping
Address space
Computer address spacesDefinitionUsual address spaces
Computer address space translation
MMU patterns
Gabriel Laskar (EPITA) CAAL 2015 94 / 378
Memory mapping Computer address spaces Definition
Memory address spaceDefinition
A memory address spaceI is contiguousI can be mapped to another target address space (through address
space translation)All memory accesses are done with respect to an address space.
Gabriel Laskar (EPITA) CAAL 2015 95 / 378
Memory mapping Computer address spaces Usual address spaces
4: Memory mapping
Address space
Computer address spacesDefinitionUsual address spaces
Computer address space translation
MMU patterns
Gabriel Laskar (EPITA) CAAL 2015 96 / 378
Memory mapping Computer address spaces Usual address spaces
Computer address spaces
CPU Address space available through a CPU register. Current machineshave 32 or 64 bit pointer registers
Physical Address space actually wired between hardware components. Currentmachines have physical address spaces around 40 bits.
Virtual Address space reachable by a process. Generally 32 or 64 bits.
Gabriel Laskar (EPITA) CAAL 2015 97 / 378
Memory mapping Computer address spaces Usual address spaces
Physical memory
Physical memory provides the lowest accessible address space in computer.Physical RAM is usually accessible as a small memory subset of thephysical address space.Physical memory can be mapped in different ways depending on physicaladdress bus implementation:
I Accessible at a given location,I Scattered at multiple locations,I Accessible (many times) at multiple locations.
Gabriel Laskar (EPITA) CAAL 2015 98 / 378
Memory mapping Computer address space translation
4: Memory mapping
Address space
Computer address spaces
Computer address space translationSegmentationPagination
MMU patterns
Gabriel Laskar (EPITA) CAAL 2015 99 / 378
Memory mapping Computer address space translation Segmentation
4: Memory mapping
Address space
Computer address spaces
Computer address space translationSegmentationPagination
MMU patterns
Gabriel Laskar (EPITA) CAAL 2015 100 / 378
Memory mapping Computer address space translation Segmentation
Memory segments
Memory segments define an address space as a sub-region of anotheraddress space. It is mostly defined by the following attributes
I Segment base address in target address space,I Segment size,I Segment type and access rights.
Gabriel Laskar (EPITA) CAAL 2015 101 / 378
Memory mapping Computer address space translation Segmentation
Simple segment
Segmentation keeps memory locations contiguous.
0
0
Source address space
Destination address space
Gabriel Laskar (EPITA) CAAL 2015 102 / 378
Memory mapping Computer address space translation Segmentation
Physical address space
Physical address space
Physical memory
0
0
Gabriel Laskar (EPITA) CAAL 2015 103 / 378
Memory mapping Computer address space translation Segmentation
Single mapped RAM
addr[x<n] = mem[x]
Physical address space
Physical memory
0
0
Gabriel Laskar (EPITA) CAAL 2015 104 / 378
Memory mapping Computer address space translation Segmentation
Single mapped RAM
addr[x>=n] = ?
n
?
Physical address space
Physical memory
0
0
Gabriel Laskar (EPITA) CAAL 2015 104 / 378
Memory mapping Computer address space translation Segmentation
Multiple/loop mapped RAM
addr[x] = mem[x%n]
Physical address space
Physical memory
0
0
Gabriel Laskar (EPITA) CAAL 2015 105 / 378
Memory mapping Computer address space translation Segmentation
Memory access using segments
To access memory through a defined segment, the CPU performs thefollowing tasks:
I Check requested address against the segment size,I Add the segment base address to the requested memory address,I Check access rights.
Gabriel Laskar (EPITA) CAAL 2015 106 / 378
Memory mapping Computer address space translation Segmentation
Memory access using segmentsMemory access
0
0
Segment baseaddress
Accessoffset
Effective address
Target addressspace
Process addressspace
Gabriel Laskar (EPITA) CAAL 2015 107 / 378
Memory mapping Computer address space translation Segmentation
Segment descriptor table
0
0
addressspace
Target
Segment 2
Segment 3
Segment 0
sizebase
CPU
Register
1
0
0
6
9
3
3
− −
0 12
Gabriel Laskar (EPITA) CAAL 2015 108 / 378
Memory mapping Computer address space translation Segmentation
Limitations of segmentation
Segmentation is a great thing, but is has a few limitations:I Address-space must be mapped in contiguous blocksI Thus segments are difficult to grow on-demandI A whole segment must be present in the target address space
Gabriel Laskar (EPITA) CAAL 2015 109 / 378
Memory mapping Computer address space translation Pagination
4: Memory mapping
Address space
Computer address spaces
Computer address space translationSegmentationPagination
MMU patterns
Gabriel Laskar (EPITA) CAAL 2015 110 / 378
Memory mapping Computer address space translation Pagination
Memory pagesModern operating systems need more than segmentation.Basic idea is to split the source address space in pages, and map eachsource page to a target page.
addressspace
Target
addressspace
Source
0
0
Gabriel Laskar (EPITA) CAAL 2015 111 / 378
Memory mapping Computer address space translation Pagination
Pages mapping
Memory pages need to be mapped using a specific descriptor tablerecognized by the CPU. Usual attributes for a page are
I address in target address space,I type and access rights,I page size,I other attributes (cacheability, coherence, ...)
Gabriel Laskar (EPITA) CAAL 2015 112 / 378
Memory mapping Computer address space translation Pagination
Pages descriptor table
addressspace
Target
addressspace
Source
0
0
CPU
Register
−
5
3
6
10
−
−
−
4
1
Gabriel Laskar (EPITA) CAAL 2015 113 / 378
Memory mapping Computer address space translation Pagination
Pages mappingRationale
Splitting memory in pages allows more powerful memory management:I Address spaces may be mapped to uncontiguous target pages,
allowing memory fragmentation.I Many interesting operations can be performed on pages (sharing,
swapping, ...)
Gabriel Laskar (EPITA) CAAL 2015 114 / 378
Memory mapping MMU patterns
4: Memory mapping
Address space
Computer address spaces
Computer address space translation
MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()
Gabriel Laskar (EPITA) CAAL 2015 115 / 378
Memory mapping MMU patterns Memory protection
4: Memory mapping
Address space
Computer address spaces
Computer address space translation
MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()
Gabriel Laskar (EPITA) CAAL 2015 116 / 378
Memory mapping MMU patterns Memory protection
Memory protectionMotivations
Why do we need memory protection?
Gabriel Laskar (EPITA) CAAL 2015 117 / 378
Memory mapping MMU patterns Memory protection
Memory protection
In order to execute secure operating system, hardware has to providememory protection systems. This protects
I the system from the hosted processesI hosted processes from each other
It may even protect the system from its own components in aMicro-Kernel approach.
Gabriel Laskar (EPITA) CAAL 2015 119 / 378
Memory mapping MMU patterns Memory protection
Memory protectionHow?
Several memory access checks are performed by the CPU, for each access,transparently:
I address bounds validity,I privilege level,I operation type.
Gabriel Laskar (EPITA) CAAL 2015 120 / 378
Memory mapping MMU patterns Memory protection
Memory protectionWhere?
I In a segmentation-based system, priviliges are per segment.I In a pagination-based system, privileges are in page descriptor table,
with a page granularity.
x86 mixes both with segmentation on top of pagination.
Gabriel Laskar (EPITA) CAAL 2015 121 / 378
Memory mapping MMU patterns Privileges
4: Memory mapping
Address space
Computer address spaces
Computer address space translation
MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()
Gabriel Laskar (EPITA) CAAL 2015 122 / 378
Memory mapping MMU patterns Privileges
Privilege levelsPrivilege level keeps memory from being accessed by non-authorized code.Different CPUs define different privilege levels.
Datasegment/page
level 0
Datasegment/page
level 1
Codesegment/page
level 1
Allowed
memory
access
memory
Illegal
access
Instruction
Target addressspace
Gabriel Laskar (EPITA) CAAL 2015 123 / 378
Memory mapping MMU patterns Privileges
Operation typeOperation types checking keeps code from doing unwanted memoryoperations.
segment/pageData
(read)segment/page
Data
(read/write)segment/page
Code
(exec)
Target addressspace
Instruction
ok
ReadWrite
okok
Read
writeread
Illegal
writeok
Execution Illegal Illegal
Gabriel Laskar (EPITA) CAAL 2015 124 / 378
Memory mapping MMU patterns Process switching
4: Memory mapping
Address space
Computer address spaces
Computer address space translation
MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()
Gabriel Laskar (EPITA) CAAL 2015 125 / 378
Memory mapping MMU patterns Process switching
Process switching
Pagination systems are easier to use with a separate memory address spacefor each process running on a computer:
I Switching process space implies changing the page descriptor table.I Each process has its own page descriptor table ready to be used by
the CPU.I Only the CPU register pointing to this table has to be changed to
setup a new address space.
Gabriel Laskar (EPITA) CAAL 2015 126 / 378
Memory mapping MMU patterns Process switching
Process switching
Process Baddress space
Process Aaddress space
Target addressspace
Process Bpage table
Process Apage table
CPU register
Gabriel Laskar (EPITA) CAAL 2015 127 / 378
Memory mapping MMU patterns Memory sharing
4: Memory mapping
Address space
Computer address spaces
Computer address space translation
MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()
Gabriel Laskar (EPITA) CAAL 2015 128 / 378
Memory mapping MMU patterns Memory sharing
Memory sharing
The memory pagination can be used to share pages between processes.A single page can be mapped in several process address spaces to permitdifferent behaviors:
I Save physical memory by not duplicating shared code and read-onlydata memory (used for shared libraries),
I Use shared memory for inter-process communication,
Gabriel Laskar (EPITA) CAAL 2015 129 / 378
Memory mapping MMU patterns Memory sharing
Memory sharing
Process Apage table
Process Aaddress space
Process Baddress space
Target addressspace
Process Bpage table
0 1 2 3 0 1 2
0 1 2 3 4 18........
Gabriel Laskar (EPITA) CAAL 2015 130 / 378
Memory mapping MMU patterns Copy On Write
4: Memory mapping
Address space
Computer address spaces
Computer address space translation
MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()
Gabriel Laskar (EPITA) CAAL 2015 131 / 378
Memory mapping MMU patterns Copy On Write
Copy On Write
Copy On Write (COW) is a powerful trick used in many situations.
One of the most usual ones is fork() where the whole address space of aprocess has to be cloned.
Basic idea is to make the whole memory read-only and actually copy onlywhen necessary, as late as possible.
Gabriel Laskar (EPITA) CAAL 2015 132 / 378
Memory mapping MMU patterns Copy On Write
Copy On WriteStep by step
0
0
4
−
10
−
−
−
−
3
6
Gabriel Laskar (EPITA) CAAL 2015 133 / 378
Memory mapping MMU patterns Copy On Write
Copy On WriteStep by step
0
0
4
−
10
−
−
−
−
3
6
Gabriel Laskar (EPITA) CAAL 2015 133 / 378
Memory mapping MMU patterns Copy On Write
Copy On WriteStep by step
0
0
4
−
10
−
−
−
−
3
6
4
−
10
−
−
−
−
3
6
Gabriel Laskar (EPITA) CAAL 2015 133 / 378
Memory mapping MMU patterns Copy On Write
Copy On WriteStep by step
0
0 0
4
−
10
−
−
−
−
3
6
4
−
10
−
−
−
−
3
6
Gabriel Laskar (EPITA) CAAL 2015 133 / 378
Memory mapping MMU patterns Copy On Write
Copy On WriteStep by step
0
0 0
4
−
10
−
−
−
−
3
6
4
−
10
−
−
−
−
3
6
Gabriel Laskar (EPITA) CAAL 2015 133 / 378
Memory mapping MMU patterns Copy On Write
Copy On WriteStep by step
0
0 0
4
−
10
−
−
−
−
3
6
4
−
10
−
−
−
−
3
6
Gabriel Laskar (EPITA) CAAL 2015 133 / 378
Memory mapping MMU patterns Copy On Write
Copy On WriteStep by step
0
0 0
2
−
10
−
−
−
−
3
6
4
−
10
−
−
−
−
3
6
Gabriel Laskar (EPITA) CAAL 2015 133 / 378
Memory mapping MMU patterns Copy On Write
Copy On WriteStep by step
0
0 0
2
−
10
−
−
−
−
3
6
4
−
10
−
−
−
−
3
6
Gabriel Laskar (EPITA) CAAL 2015 133 / 378
Memory mapping MMU patterns Page swapping
4: Memory mapping
Address space
Computer address spaces
Computer address space translation
MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()
Gabriel Laskar (EPITA) CAAL 2015 134 / 378
Memory mapping MMU patterns Page swapping
Page swapping
Page swapping is a mechanism artificially enlarging Physical memory witha part of the hard-disk.
Gabriel Laskar (EPITA) CAAL 2015 135 / 378
Memory mapping MMU patterns Page swapping
Page swappingStep by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
6
−
5
3
10
−
−
−
4
Gabriel Laskar (EPITA) CAAL 2015 136 / 378
Memory mapping MMU patterns Page swapping
Page swappingStep by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
6
−
5
3
10
−
−
−
4
Gabriel Laskar (EPITA) CAAL 2015 136 / 378
Memory mapping MMU patterns Page swapping
Page swappingStep by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
6
−
5
3
10
−
−
−
4
Gabriel Laskar (EPITA) CAAL 2015 136 / 378
Memory mapping MMU patterns Page swapping
Page swappingStep by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
S: 3
6
−
5
3
10
−
−
−
4
Gabriel Laskar (EPITA) CAAL 2015 136 / 378
Memory mapping MMU patterns Page swapping
Page swappingStep by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
S: 3
6
−
5
3
10
−
−
−
4
Gabriel Laskar (EPITA) CAAL 2015 136 / 378
Memory mapping MMU patterns Page swapping
Page swappingStep by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
S: 3
disk
−
5
3
10
−
−
−
4
Gabriel Laskar (EPITA) CAAL 2015 136 / 378
Memory mapping MMU patterns Page swapping
Page swappingStep by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
S: 3
disk
−
5
3
10
−
−
−
4
Gabriel Laskar (EPITA) CAAL 2015 136 / 378
Memory mapping MMU patterns Page swapping
Page swappingStep by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
S: 3
disk
−
5
3
10
−
−
−
4
Gabriel Laskar (EPITA) CAAL 2015 136 / 378
Memory mapping MMU patterns Page swapping
Page swappingStep by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
S: 3
−
5
3
10
−
−
−
4
7
Gabriel Laskar (EPITA) CAAL 2015 136 / 378
Memory mapping MMU patterns Page swapping
Page swappingStep by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
S: 3
−
5
3
10
−
−
−
4
7
Gabriel Laskar (EPITA) CAAL 2015 136 / 378
Memory mapping MMU patterns mmap()
4: Memory mapping
Address space
Computer address spaces
Computer address space translation
MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()
Gabriel Laskar (EPITA) CAAL 2015 137 / 378
Memory mapping MMU patterns mmap()
mmap()
Make part of the memory exactly match the contents of a file.I Reflect changes immediatly between processes having file openedI Permit different protections on different parts of the fileI Lazily load parts of fileI Lazily write parts back
Gabriel Laskar (EPITA) CAAL 2015 138 / 378
Memory mapping MMU patterns mmap()
mmap()Step by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
−
−
−
−
−
5
−
−
3
Gabriel Laskar (EPITA) CAAL 2015 139 / 378
Memory mapping MMU patterns mmap()
mmap()Step by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
foo.bin
f:0
f:1
f:2
f:3
−
5
−
−
3
Gabriel Laskar (EPITA) CAAL 2015 139 / 378
Memory mapping MMU patterns mmap()
mmap()Step by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
foo.bin
f:0
f:1
f:2
f:3
−
5
−
−
3
Gabriel Laskar (EPITA) CAAL 2015 139 / 378
Memory mapping MMU patterns mmap()
mmap()Step by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
foo.bin
f:0
f:1
f:2
f:3
foo
.bin
:3
−
5
−
−
3
Gabriel Laskar (EPITA) CAAL 2015 139 / 378
Memory mapping MMU patterns mmap()
mmap()Step by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
foo.bin
f:0
f:1
f:2
8
−
5
−
−
3
Gabriel Laskar (EPITA) CAAL 2015 139 / 378
Memory mapping MMU patterns mmap()
mmap()Step by step
addressspace
Source
addressspace
Target
Disk
0
0
CPU
Register
1
foo.bin
f:0
f:1
f:2
8
−
5
−
−
3
Gabriel Laskar (EPITA) CAAL 2015 139 / 378
Execution flow
Part V
Execution flow
Gabriel Laskar (EPITA) CAAL 2015 140 / 378
Execution flow Branch, function calls
5: Execution flow
Branch, function callsBranch principlePipeline considerations
Function calls
Handling events
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 141 / 378
Execution flow Branch, function calls Branch principle
5: Execution flow
Branch, function callsBranch principlePipeline considerations
Function calls
Handling events
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 142 / 378
Execution flow Branch, function calls Branch principle
Branch principle
Branching is breaking the normal incremental execution flow to go executecode somewhere else.A branch has
I a destination,I optionally a condition,I optionally a “link” feature, saving return point.
Gabriel Laskar (EPITA) CAAL 2015 143 / 378
Execution flow Branch, function calls Branch principle
Branch offset
Instructions are fetched from memory to CPU by dereferencing theprogram counter (%pc register)
I in normal execution flow, the %pc is auto-incremented
%pc
80450020
8045001C
80450018
80450014
80450010
Addresses Instructions
add %g1, %g2, %g3
mov %g0, %g1
xor %g2, %g3, %g2
sub %g3, %g2, %g1
...
Next instruction
Executed instruction
I when a branch occurs, the %pc is affected in two possible ways:relative branch: a constant offset is added to the %pcabsolute branch: an absolute address is loaded into the %pc
Gabriel Laskar (EPITA) CAAL 2015 144 / 378
Execution flow Branch, function calls Branch principle
Unconditional branch
The %pc register is always modified.I Explicit jump: gotoI Explicit infinite loop, optimized by the compiler
%pc
80450120
8045011C
80450118
80450014
80450010
Addresses Instructions
b 8045011C
mov %g0, %g1
xor %g2, %g3, %g2
sub %g3, %g2, %g1
...
Next instruction
Executed instruction
Gabriel Laskar (EPITA) CAAL 2015 145 / 378
Execution flow Branch, function calls Branch principle
Unconditional branchC listing
#include <stdio.h>
void main(void){
do {puts("hello");
} while (1);}
Gabriel Laskar (EPITA) CAAL 2015 146 / 378
Execution flow Branch, function calls Branch principle
Unconditional branchControl flow graph
return
main
puts("hello");
Gabriel Laskar (EPITA) CAAL 2015 147 / 378
Execution flow Branch, function calls Branch principle
Conditional branch
The %pc register is modified only if the condition is verifiedI if statementsI loop (for, while) statements
80450010
Addresses
80450014
80450118
8045011C
bz 8045011C
mov %g0, %g1
xor %g2, %g3, %g2
sub %g3, %g2, %g1
Instructions
Executed instruction
Potential next instruction
80450120 ...
%pc
Potential next instruction
80450018 ...
Gabriel Laskar (EPITA) CAAL 2015 148 / 378
Execution flow Branch, function calls Branch principle
Conditional branchC listing
#include <stdio.h>
void main(void){
int i = 10;
do {printf("%i\n", --i);
} while (i != 0);}
Gabriel Laskar (EPITA) CAAL 2015 149 / 378
Execution flow Branch, function calls Branch principle
Conditional branchControl flow graph
yes
no
main
i = 10
printf("%i\n");
i != 0
−−i
returnGabriel Laskar (EPITA) CAAL 2015 150 / 378
Execution flow Branch, function calls Branch principle
Conditions
The decision to take the branch is based on register contents:I conditional branch may occur if a specific bit is set in a registerI conditional branch may occur if a specific bit is clear in a registerI conditional branch may occur if a register equals a specific value
(usually zero)
Gabriel Laskar (EPITA) CAAL 2015 151 / 378
Execution flow Branch, function calls Branch principle
Complete examples
1. strlen2. pgcd
Gabriel Laskar (EPITA) CAAL 2015 152 / 378
Execution flow Branch, function calls Pipeline considerations
5: Execution flow
Branch, function callsBranch principlePipeline considerations
Function calls
Handling events
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 153 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerations
A branch may break the pipeline, slowing the execution1. when things goes wrong, a bubble is created2. sometimes, the processor succeeds in predicting the branch target3. when a misprediction occurs, a bubble is created
Gabriel Laskar (EPITA) CAAL 2015 154 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsBubble
When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Fetch?
Gabriel Laskar (EPITA) CAAL 2015 155 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsBubble
When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Decod
Fetch?
Fetch
Gabriel Laskar (EPITA) CAAL 2015 155 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsBubble
When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Fetch
Decod
Exec
?
Decod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 155 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsBubble
When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:
add
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
Mem
?
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 155 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsBubble
When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:
jmp
add
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
Mem
Write
?
!
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 155 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsBubble
When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:
−−
jmp
add
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
−−
Fetch
Mem
Write
?
Exec
Decod
Fetch
Mem
Write
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 155 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsBubble
When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:
add
−−
jmp
add
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Fetch
Decod
−−
Write
Mem
?
Exec
−−
Fetch
Mem
Write
Exec
Decod
Fetch
Mem
Write
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 155 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsDelay slot
A delay slot is a processor feature. It’s a convention saying the instructionfollowing branches is always executed.Branch is delayed.
Exec
Decod
Fetch
Mem
Write
?
jmp
add
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 156 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsDelay slot
A delay slot is a processor feature. It’s a convention saying the instructionfollowing branches is always executed.Branch is delayed.
Exec
Decod
Fetch
Mem
Write
?
jmp
add
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
jmp is executedinstruction following
execution flowis modified one cycle later
Gabriel Laskar (EPITA) CAAL 2015 156 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsDelay slot
A delay slot is a processor feature. It’s a convention saying the instructionfollowing branches is always executed.Branch is delayed.
sub
Exec
Decod
Fetch
Mem
Write
?
Exec
Decod
Fetch
Mem
Write
jmp
add
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
jmp is executedinstruction following
execution flowis modified one cycle later
Gabriel Laskar (EPITA) CAAL 2015 156 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsDelay slot
A delay slot is a processor feature. It’s a convention saying the instructionfollowing branches is always executed.Branch is delayed.
sub
Fetch
Decod
Exec
Write
Mem
?
Exec
Decod
Fetch
Mem
Write
Exec
Decod
Fetch
Mem
Write
add
jmp
add
not
sub
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
jmp is executedinstruction following
execution flowis modified one cycle later
Gabriel Laskar (EPITA) CAAL 2015 156 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsLoads (digression)
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Fetch?
Gabriel Laskar (EPITA) CAAL 2015 157 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsLoads (digression)
sub g1, g2, g3
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Decod
Fetch?
Fetch
Gabriel Laskar (EPITA) CAAL 2015 157 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsLoads (digression)
ld g5, [g2]
sub g1, g2, g3
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Fetch
Decod
Exec
?
Decod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 157 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsLoads (digression)
add g6, 0x2a, g5
ld g5, [g2]
sub g1, g2, g3
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
Mem
!
?
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 157 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsLoads (digression)
add g6, 0x2a, g5
ld g5, [g2]
sub g1, g2, g3
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Decod
Fetch
Mem
Write
Exec!
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 157 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsLoads (digression)
add g6, 0x2a, g5
ld g5, [g2]
sub g1, g2, g3
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Decod
Fetch
Mem
Write
Exec!
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 157 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsLoads (digression)
add g6, 0x2a, g5
ld g5, [g2]
sub g1, g2, g3
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Decod
Fetch
Mem
Write
Exec!
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
add g6, 0x2a, g5
Gabriel Laskar (EPITA) CAAL 2015 157 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsLoads (digression)
ld i3, [g6 + 0x8]
add g6, 0x2a, g5
ld g5, [g2]
sub g1, g2, g3
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
−−
Write
?
(Exec)
(Decod)
(Fetch)
Mem
Write
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
add g6, 0x2a, g5
Gabriel Laskar (EPITA) CAAL 2015 157 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsLoads (digression)
ld i3, [g6 + 0x8]
add g6, 0x2a, g5
ld g5, [g2]
sub g1, g2, g3
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
−−
Write
?
(Exec)
(Decod)
(Fetch)
Mem
Write
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
add g6, 0x2a, g5
Gabriel Laskar (EPITA) CAAL 2015 157 / 378
Execution flow Branch, function calls Pipeline considerations
Pipeline considerationsLoads (digression)
xor i4, 0x2a, i3
ld i3, [g6 + 0x8]
add g6, 0x2a, g5
ld g5, [g2]
sub g1, g2, g3
Pipeline timeline
Inst
ruct
ion
seq
uen
ce
Exec
Decod
Fetch
Mem
−−
?
Exec
Decod
Fetch
−−
Write
(Exec)
(Decod)
(Fetch)
Mem
Write
Exec
Decod
Fetch
Mem
Fetch
Decod
ExecDecod
Fetch
Fetch
Gabriel Laskar (EPITA) CAAL 2015 157 / 378
Execution flow Function calls
5: Execution flow
Branch, function calls
Function callsPrinciplesArgument passingCall conventions
Handling events
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 158 / 378
Execution flow Function calls Principles
5: Execution flow
Branch, function calls
Function callsPrinciplesArgument passingCall conventions
Handling events
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 159 / 378
Execution flow Function calls Principles
Function calls principleA function is a piece of code that returns a value depending on theparameters the user specifies as inputs, if any.
8004500
8004504
8004508
800450B
8002000
8002004
8002008
Addresses AddressesInstructions Instructions
Gabriel Laskar (EPITA) CAAL 2015 160 / 378
Execution flow Function calls Principles
“call” instruction
I Saves next instruction address (the return path),I Jumps to function
80002000
80002004
80450018
80450014
8045001080450018
%i7
Return address
%pc add %g1, %g2, %g3
...
xor %g2, %g1, %g3
mov %g0, %g1
call 80002000
Addresses Instructions
Delay slot instruction
Executed instruction
Next instruction
Gabriel Laskar (EPITA) CAAL 2015 161 / 378
Execution flow Function calls Principles
“ret” instruction
I Stores the result in a dedicated registerI Restores the previously-saved PC address
Return address
%i7
80450018
%pc 80450018
80450014
80450010
80002100
800020FC
800020F8
Addresses Instructions
ret
not %g1, %g2
...
Executed instruction
Delay slot instruction
call 80002000
mov %g0, %g1
xor %g2, %g1, %g3 Next instruction
Gabriel Laskar (EPITA) CAAL 2015 162 / 378
Execution flow Function calls Argument passing
5: Execution flow
Branch, function calls
Function callsPrinciplesArgument passingCall conventions
Handling events
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 163 / 378
Execution flow Function calls Argument passing
How to pass arguments and return values?
We need a space of shared data between the caller and the callee.I From caller to callee, for argumentsI From callee to caller, for return values
Some arguments are purely for execution purposes:I context pointers (stack, globals, ...)I return address
Gabriel Laskar (EPITA) CAAL 2015 164 / 378
Execution flow Function calls Argument passing
Simplest case
I Non-nested function callI Less arguments than available machine registers
Gabriel Laskar (EPITA) CAAL 2015 165 / 378
Execution flow Function calls Argument passing
Through registers
On most RISC architectures, there are registers dedicated to argumentstorage:
I Each argument is stored and preserved directly in a registerI Local variables may be held in another set of registers
Example: on SPARC architecture, the CPU has:I 8 registers (%g0, ... %g7) dedicated to global variablesI 8 registers (%i0, ... %i7) dedicated to input argumentsI 8 registers (%l0, ... %l7) dedicated to local variablesI 8 registers (%o0, ... %o7) dedicated to output arguments
A callee’s “input” registers are shared with caller’s “output” registers.
Gabriel Laskar (EPITA) CAAL 2015 166 / 378
Execution flow Function calls Argument passing
Through global memory
Principle:I Each argument is stored and preserved directly in memoryI Each local variable is held in memory
Gabriel Laskar (EPITA) CAAL 2015 167 / 378
Execution flow Function calls Call conventions
5: Execution flow
Branch, function calls
Function callsPrinciplesArgument passingCall conventions
Handling events
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 168 / 378
Execution flow Function calls Call conventions
Call conventions
I How to deal with huge amount of variables and arguments?I How to deal with recursive calls and nested function calls?I How to deal with variables bigger than registers?I How to deal with “...” (like printf)?I How to deal with dedicated registers (floats)?
Gabriel Laskar (EPITA) CAAL 2015 169 / 378
Execution flow Function calls Call conventions
Problem
Consider a recursive function that needs local variables:I Each time the function is called, a new context must be allocated to
preserve each local variableI Each time the function returns, the previous context must be restoredI The function may call itself an unpredictable number of timesI These local variables cannot be reserved in a static memory space (as
global variables are).
Gabriel Laskar (EPITA) CAAL 2015 170 / 378
Execution flow Function calls Call conventions
Abstract context stack
Local
context
Local
context
Local
context
calls
Function
returns
Function
void foo(int a)
{
return foo(a + 1);
}
Gabriel Laskar (EPITA) CAAL 2015 171 / 378
Execution flow Function calls Call conventions
Register window
Hardware context stack implementation:I Uses a large amount of registers (example: 512 on Sparc)I Uses a logical limitation to define multiple contextsI Does context change by sliding a window on each function call
Gabriel Laskar (EPITA) CAAL 2015 172 / 378
Execution flow Function calls Call conventions
Principle
%i0 − %i7
%l0 − %l7
%o0 − %o7
Sliding register windowHardware registers
%R95
%R96
%R98
%R97
%R99
%R100
%R101
%R102
%R115
%R116
%R117
%R118
%R119
%R120
Previous
Current
Next
Abstract stack layout
Local
context
Local
context
Local
context
calls
Function
returns
Function
Gabriel Laskar (EPITA) CAAL 2015 173 / 378
Execution flow Function calls Call conventions
Sparc overlapping register window
registerwindow
Input registers
%i0 ... %i7
Local registers
%l0 ... %l7
Output registers
%o0 ... %o7
registerwindow
Calling fonctions context
Called fonctions context
Registeroverlap
Function call
%l0 ... %l7
Local registers
Input registers
%i0 ... %i7
Output registers
%o0 ... %o7
Gabriel Laskar (EPITA) CAAL 2015 174 / 378
Execution flow Function calls Call conventions
Function prologue and epilogue
prologue: slide register windowExample:
save %sp, -96, %sp
epilogue: slide back register window (i.e. restore previous context).Example:
restore
Gabriel Laskar (EPITA) CAAL 2015 175 / 378
Execution flow Function calls Call conventions
Memory stack
Register window has hard limitations:I The call depth is limited to the total amount of registersI The CPU needs a large amount of physical registers ⇒ expensive
On most systems, memory is used to implement a cheaper stack.
Gabriel Laskar (EPITA) CAAL 2015 176 / 378
Execution flow Function calls Call conventions
Principle
Stack
memory
space
frame
stack
Currentvariables
Local
arguments
Called function
Return address
argumentsInput
Frame
pointer
pointer
Stack
Sliding stack frame Process memory
Previous
Current
Next
Abstract stack layout
Local
context
Local
context
Local
context
calls
Function
returns
Function
Gabriel Laskar (EPITA) CAAL 2015 177 / 378
Execution flow Function calls Call conventions
Nested function calls
Local variables
Stack memory space
Called functionarguments
Return address
Inputarguments
Return address
Local variables
Called functionarguments
Inputarguments
Return address
Local variables
Currentfunctioncontext
Stackpointer
Framepointer
Functioncalls
Gabriel Laskar (EPITA) CAAL 2015 178 / 378
Execution flow Function calls Call conventions
Function prologue and epilogue
prologue: saves previous frame pointer, set new frame pointer, reservespace on memory stack for local variablesExample: a function that needs three 32 bits local variables(12 bytes) on its stack:
[%sp] <- %fp%fp <- %sp%sp <- %sp - 12
epilogue: restores previous frame and stack pointers (i.e. restoreprevious context).Example:
%sp <- %fp%fp <- [%fp]
Gabriel Laskar (EPITA) CAAL 2015 179 / 378
Execution flow Function calls Call conventions
Argument and local variable access
argument:Dereference the address “frame pointer + argument offset”:
ld [%fp + 16], %g1; Access to arg_a
local variable:Dereference the address “frame pointer - local variableoffset”
ld [%fp - 4], %g1; Access to var_b
Gabriel Laskar (EPITA) CAAL 2015 180 / 378
Execution flow Function calls Call conventions
Argument and local variable accessschema
previous fp
ret address
localvariables
functionarguments
Stackpointer
Framepointer
void foo(int arg_a,
int arg_b)
{
int var_a, var_, var_c;
...
}
int var_c
int var_b
int var_a
previous fp
int arg_b
int arg_a
ret address
Gabriel Laskar (EPITA) CAAL 2015 181 / 378
Execution flow Handling events
5: Execution flow
Branch, function calls
Function calls
Handling eventsEventsSystem callsFaultsHardware interruptions
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 182 / 378
Execution flow Handling events Events
5: Execution flow
Branch, function calls
Function calls
Handling eventsEventsSystem callsFaultsHardware interruptions
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 183 / 378
Execution flow Handling events Events
Events
What are events?
How to handle them?
Gabriel Laskar (EPITA) CAAL 2015 184 / 378
Execution flow Handling events Events
Categorizing events
An interrupt is caused by an external event.An exception is caused by instruction execution.
Unplanned DeliberateSynchronous fault syscall trap,
software interruptAsynchronous hardware
interruption
→ An incoming event must be executed with a high priority.
Gabriel Laskar (EPITA) CAAL 2015 185 / 378
Execution flow Handling events Events
User space / kernel spaceA trap is a critical event that must be handled by the kernel in a safememory space, the kernel space.
Kernel mode
Usr.stack
User data
User code
stack
Kernel
code
Kernel
Kernel
spacememory
Processmemoryspace
User mode
Invalid
User code
User data
Usr.stack
Gabriel Laskar (EPITA) CAAL 2015 186 / 378
Execution flow Handling events Events
Trap handler tableThe processor jumps to a trap routine defined by the operating system.The trap routine address is stored in a dedicated descriptor table.
Interruptiontable
addressregister
Kernel mode
Usr.stack
User data
User code
code
Kernel
data
Kernel
trap #4
Interruption table
Gabriel Laskar (EPITA) CAAL 2015 187 / 378
Execution flow Handling events System calls
5: Execution flow
Branch, function calls
Function calls
Handling eventsEventsSystem callsFaultsHardware interruptions
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 188 / 378
Execution flow Handling events System calls
System calls
The system call mechanism can be used by a process to request servicesfrom the operating system:
I executing a process, exitingI reading input, writing output (read, write...)I performing restricted actions such as accessing hardware devices or
accessing the memory management unit.I etc, ...
Gabriel Laskar (EPITA) CAAL 2015 189 / 378
Execution flow Handling events System calls
Processor modes
Hardware facilities have been implemented to ensure process isolation inmulti-user/multi-processor environment.Today, processors have two (or more) execution modes:user mode
dedicated to user applicationssupervisor mode
reserved for operating system kernel
Gabriel Laskar (EPITA) CAAL 2015 190 / 378
Execution flow Handling events System calls
Execution permissions
For security sake, some operations are restricted to kernel space:I peripheral input and outputI low-level memory management
System calls allow user to switch to kernel space and perform restrictedactions, under certain conditions.The kernel must check system call parameters validity.
Gabriel Laskar (EPITA) CAAL 2015 191 / 378
Execution flow Handling events System calls
System call implementation
1. System calls use a dedicated instruction which causes the processor tochange mode to superuser (or protected) mode.
2. Each system call is indexed by a single number: the syscall traphandler makes an indirect call through the system call dispatch tableto the handler for the specific system call.
Gabriel Laskar (EPITA) CAAL 2015 192 / 378
Execution flow Handling events System calls
Execution pathSystem calls run in kernel mode on the kernel memory space.
systemreturn
contextswitch
contextswitch
Kernel
code
User
codesystemcall
User
code
User mode Kernel mode
Gabriel Laskar (EPITA) CAAL 2015 193 / 378
Execution flow Handling events System calls
libc example
Standart C library’s read, write, pipe, ... functions are wrappers tocorresponding system calls:
_SYSENTRY(pipe)mov %o0, %o2mov SYS_pipe, %g1ta %xcc, ST_SYSCALLbcc,a,pt %xcc, 1fstw %o0, [%o2]ERROR()1: stw %o1, [%o2 + 4]retlclr %o0_SYSEND(pipe)
Gabriel Laskar (EPITA) CAAL 2015 194 / 378
Execution flow Handling events Faults
5: Execution flow
Branch, function calls
Function calls
Handling eventsEventsSystem callsFaultsHardware interruptions
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 195 / 378
Execution flow Handling events Faults
Faults
How to handle divide by zero?How to handle overflow?
How to handle ...
Gabriel Laskar (EPITA) CAAL 2015 196 / 378
Execution flow Handling events Faults
Similarities with system calls
Faults are similar to system calls in some respects:I Faults occur as a result of a process executing an instructionI The kernel exception handler may return to the faulty user context
But faults are different in other respects:I Syscalls are deliberate, faults are unexpectedI Not every execution of the instruction results in a fault
Gabriel Laskar (EPITA) CAAL 2015 197 / 378
Execution flow Handling events Faults
Handling a fault
Different actions may be taken by the operating system in response tofaults:
I kill the user processI notify the process that a fault occurred (so it may recover in its own
way)I solve the problem and resume the process transparently
Gabriel Laskar (EPITA) CAAL 2015 198 / 378
Execution flow Handling events Faults
Execution path
Instructionfault
contextsave
processexit
User
code
contextrestore
User
code
Kernel
code
User mode Kernel mode
Gabriel Laskar (EPITA) CAAL 2015 199 / 378
Execution flow Handling events Faults
Events interface
Unix systems can notify a user program of a fault with a signal.Signals are also used for other forms of asynchronous event notifications.
Gabriel Laskar (EPITA) CAAL 2015 200 / 378
Execution flow Handling events Hardware interruptions
5: Execution flow
Branch, function calls
Function calls
Handling eventsEventsSystem callsFaultsHardware interruptions
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 201 / 378
Execution flow Handling events Hardware interruptions
Hardware interruptions
How to handle devices interruptions?
Gabriel Laskar (EPITA) CAAL 2015 202 / 378
Execution flow Handling events Hardware interruptions
Execution path
contextload
contextsave
User
code
Kernel
code
User
code
interruptionreturn
hardwareinterruption
User mode Kernel mode
Gabriel Laskar (EPITA) CAAL 2015 203 / 378
Execution flow Multitasking
5: Execution flow
Branch, function calls
Function calls
Handling events
Multitasking
Gabriel Laskar (EPITA) CAAL 2015 204 / 378
Execution flow Multitasking
Multitasking
A single process may not use system resources at full capacity.The idea of multitasking is to simulate the execution of concurrentexecution of many processes, using a single processor.The operating system has to:
I create and delete processes,I organize processes in memory,I schedule processes for CPU use.
Gabriel Laskar (EPITA) CAAL 2015 205 / 378
Execution flow Multitasking
Memory mapping
Pysical memory space
B process addressingspace
A process addressingspace
A processmemory map
B processmemory map
Page tableaddress register
Gabriel Laskar (EPITA) CAAL 2015 206 / 378
Execution flow Multitasking
Context switching
B
process
registers
A
process
registers
Real
CPU
registers
Memory
contexts
backup
Reg 1
Reg 2
FlagReg
PageReg
Gabriel Laskar (EPITA) CAAL 2015 207 / 378
Execution flow Multitasking
Process life
Kernel
Process A Process B Process C
context savecontext load
Timer
interruption
This approach requires a way of handling events.Gabriel Laskar (EPITA) CAAL 2015 208 / 378
Object file formats
Part VI
Object file formats
Gabriel Laskar (EPITA) CAAL 2015 209 / 378
Object file formats Build process
6: Object file formats
Build processOverviewDevelopment toolsAnalysis tools
Binary formats
Executable code loading
Gabriel Laskar (EPITA) CAAL 2015 210 / 378
Object file formats Build process Overview
6: Object file formats
Build processOverviewDevelopment toolsAnalysis tools
Binary formats
Executable code loading
Gabriel Laskar (EPITA) CAAL 2015 211 / 378
Object file formats Build process Overview
The big picture
ld
C sourcecode
pure Ccode
Assemblycode
Objectfile
ascc1cpp
.o
.o
.o.s.i.c
.h
a.out
Header files
Gabriel Laskar (EPITA) CAAL 2015 212 / 378
Object file formats Build process Overview
Preprocessor
cpp as ld
C source
code
Pure C
code
Assembly
code file
Object
cc1
Executable
Header
files
.i .s .o.c
I Also called macro-processorI Transforms a text (source) file into another text (source) file:
I merges many files into one (#include)I replaces macros by their definitions (#define)I removes code considering conditions (#if*)
Example: cppI Input: C source file with directivesI Output: “pure” C source file
Gabriel Laskar (EPITA) CAAL 2015 213 / 378
Object file formats Build process Overview
C compiler
cpp as ld
C source
code
Pure C
code
Assembly
code file
Object
cc1
Executable
Header
files
.i .s .o.c
I Scans and parses the source fileI Analyzes the source (type checking)I Generates assembly code (another source file) for the target
architecture
Gabriel Laskar (EPITA) CAAL 2015 214 / 378
Object file formats Build process Overview
Assembler
cpp as ld
C source
code
Pure C
code
Assembly
code file
Object
cc1
Executable
Header
files
.i .s .o.c
I Translates instructions in binary opcode sequence for the targetarchitecture
I Collects symbols and resolve label addressesI Writes an object file
The assembler source file will be different depending on architecture!
Gabriel Laskar (EPITA) CAAL 2015 215 / 378
Object file formats Build process Overview
Linker
cpp as ld
C source
code
Pure C
code
Assembly
code file
Object
cc1
Executable
Header
files
.i .s .o.c
I Merges (many) object filesI Resolves external symbolsI Computes addressesI Writes an executable file
Gabriel Laskar (EPITA) CAAL 2015 216 / 378
Object file formats Build process Development tools
6: Object file formats
Build processOverviewDevelopment toolsAnalysis tools
Binary formats
Executable code loading
Gabriel Laskar (EPITA) CAAL 2015 217 / 378
Object file formats Build process Development tools
Development tools
I SPARC assembler: use aasmI Project at http://savannah.nongnu.org/projects/aasm
I Sparc, Mips, PPC, ARM, ... architecture emulatorsI SoCLib: https://www.soclib.frI QEmu
Gabriel Laskar (EPITA) CAAL 2015 218 / 378
Object file formats Build process Analysis tools
6: Object file formats
Build processOverviewDevelopment toolsAnalysis tools
Binary formats
Executable code loading
Gabriel Laskar (EPITA) CAAL 2015 219 / 378
Object file formats Build process Analysis tools
objdump
I Displays object (executable) file tables, on host architectureI Disassembles object (executable) file code-sectionsI Useful options:
I -h: display the section headersI -t: display the symbol tablesI -dx: disassemble
Gabriel Laskar (EPITA) CAAL 2015 220 / 378
Object file formats Build process Analysis tools
readelf
I Displays ELF object (executable) file tables, on any architecture
Gabriel Laskar (EPITA) CAAL 2015 221 / 378
Object file formats Binary formats
6: Object file formats
Build process
Binary formatsSimple binary formatsHow about code reuse and splittingLinkingFile formats history
Executable code loading
Gabriel Laskar (EPITA) CAAL 2015 222 / 378
Object file formats Binary formats Simple binary formats
6: Object file formats
Build process
Binary formatsSimple binary formatsHow about code reuse and splittingLinkingFile formats history
Executable code loading
Gabriel Laskar (EPITA) CAAL 2015 223 / 378
Object file formats Binary formats Simple binary formats
Simple binary formats
Consider an object program that does not need other object programs tobuild a binary program:
I the assembler collects code markers (labels)I the assembler resolves all labels and replaces all of them in the code
Gabriel Laskar (EPITA) CAAL 2015 224 / 378
Object file formats Binary formats Simple binary formats
Flat program
The labels are immediately replaced and written by the assembler in thebinary file:
00001200
...
90 nop
00001201 ...
my_function:
00000001
OpcodesAddress
00000006
00000000 90 nop
B8 78 56 34 12 mov eax, 0x12345678
E8 00 12 00 00 call my_function
Source code
Gabriel Laskar (EPITA) CAAL 2015 225 / 378
Object file formats Binary formats Simple binary formats
Flat binary format
In raw binary format, the whole program is written directly in file.Useful at boot time: the processor can only read raw code and there is nobinary loader.
Gabriel Laskar (EPITA) CAAL 2015 226 / 378
Object file formats Binary formats How about code reuse and splitting
6: Object file formats
Build process
Binary formatsSimple binary formatsHow about code reuse and splittingLinkingFile formats history
Executable code loading
Gabriel Laskar (EPITA) CAAL 2015 227 / 378
Object file formats Binary formats How about code reuse and splitting
Challenges
What happens when a function must be imported?
What happens when a function must be exported?
Gabriel Laskar (EPITA) CAAL 2015 228 / 378
Object file formats Binary formats How about code reuse and splitting
Complex program
When the symbol of a called function is unresolved, the assembler leavesthe call destination address undefined:
00000006 E8 00 12 00 00
External
reference
call printf
00001200
...
90 nop
00001201 ...
my_function:
00000001
OpcodesAddress
00000006
00000000 90 nop
B8 78 56 34 12 mov eax, 0x12345678
E8 00 12 00 00 call my_function
Source code
⇒ The binary format must hold information on created “holes”
Gabriel Laskar (EPITA) CAAL 2015 229 / 378
Object file formats Binary formats How about code reuse and splitting
Needs
To fill the holes left by the assembler later on, the binary file must hold:I A symbol table, which stores each label identifier,I A relocation table, which shows all the remaining “holes”I Labels associated to each “hole”.
More generaly, a binary file format must also hold:I A header containing general informations needed to access various
parts of the file,I Several sections holding code and data (raw data).
Gabriel Laskar (EPITA) CAAL 2015 230 / 378
Object file formats Binary formats How about code reuse and splitting
Abstraction of a binary file format
string table
symbol table
section table
file header
.bss
reloc. table
.data
.rodata
.text
.data
.rodata
.text
(not in file)
object file sectionsobject file headers
(Information regarding sections, symbol table etc. are usually identifiedwithin the file header)
Gabriel Laskar (EPITA) CAAL 2015 231 / 378
Object file formats Binary formats How about code reuse and splitting
The symbol table
The symbol table associates each symbol identifier to the code (or data)address it represents (when the address has been resolved):
symbol table
"printf"
"my_func"
"fopen"my_func:
...
0106
0105
0100
0000
0005
E8 ?? ?? ?? ??
90
E8 ?? ?? ?? ??
E8 00 01 00 00
E8 ?? ?? ?? ??
call printf
nop
call fopen
call my_func
call printf
Gabriel Laskar (EPITA) CAAL 2015 232 / 378
Object file formats Binary formats How about code reuse and splitting
The relocation table
The relocation table associates each “hole” address to a label identifieraddress:
relocation table
printf
0006
0107
printf
0101
fopen
symbol table
"printf"
"my_func"
"fopen"my_func:
...
0106
0105
0100
0000
0005
E8 ?? ?? ?? ??
90
E8 ?? ?? ?? ??
E8 00 01 00 00
E8 ?? ?? ?? ??
call printf
nop
call fopen
call my_func
call printf
Gabriel Laskar (EPITA) CAAL 2015 233 / 378
Object file formats Binary formats How about code reuse and splitting
BSS regions
Instead of keeping a bunch of empty bytes in the binary file for big staticvariables (arrays), the header can specify a whole region which must befilled with zero.BSS regions makes the file smaller and faster to load.
Gabriel Laskar (EPITA) CAAL 2015 234 / 378
Object file formats Binary formats Linking
6: Object file formats
Build process
Binary formatsSimple binary formatsHow about code reuse and splittingLinkingFile formats history
Executable code loading
Gabriel Laskar (EPITA) CAAL 2015 235 / 378
Object file formats Binary formats Linking
Linking
The operation that combines a list of object programs into a binaryprogram is called linking. The tasks that must be accomplished by thelinker are:1. Named sections from different object programs must be merged
together into one named section.2. Merged sections must be put together into the sections of the
memory model.3. Each use of a name in an object program’s references list must be
replaced by an address in the virtual address space.
Gabriel Laskar (EPITA) CAAL 2015 236 / 378
Object file formats Binary formats Linking
Merging object files
Combining each .text section together, each .data section together, etc.:
.data
.rodata
.text
.data
.rodata
.text
.data
.rodata
.text
Gabriel Laskar (EPITA) CAAL 2015 237 / 378
Object file formats Binary formats Linking
Zoom on .text section merging
Symbol = S Symbol = B + S
A object
section
B object
section
merged object
section
C
B
A
0x0 0x0 0x0
.text
.text
.text
Gabriel Laskar (EPITA) CAAL 2015 238 / 378
Object file formats Binary formats Linking
Symbol resolution
.text / .data
relocations symbols.data.textfile header
a.out format
raw binary format
Gabriel Laskar (EPITA) CAAL 2015 239 / 378
Object file formats Binary formats Linking
Executable binary files
When there is neither unresolved symbols nor relocations anymore, the fileis executable.Executable files usually have a fixed constant load address on a givensystem.Unresolved/unused symbols may exist in an executable file if no relocationuses it.The strip command wipes out unused symbols from the symbol table.
Gabriel Laskar (EPITA) CAAL 2015 240 / 378
Object file formats Binary formats File formats history
6: Object file formats
Build process
Binary formatsSimple binary formatsHow about code reuse and splittingLinkingFile formats history
Executable code loading
Gabriel Laskar (EPITA) CAAL 2015 241 / 378
Object file formats Binary formats File formats history
a.out: Assembler Output
relocations
raw binary format
a.out format
.text / .data
.text .data symbolsfile header
I Very simpleI .. but pretty fast to load
Gabriel Laskar (EPITA) CAAL 2015 242 / 378
Object file formats Binary formats File formats history
COFF (Common Object File Format) and ELF (Executableand Linking Format)
.data.text relocs
.data.textsymbolsrelocs
syms
file sectiontableheader
file sectiontableheader
ELF format
COFF format
I Permits use of dynamic libraries
Gabriel Laskar (EPITA) CAAL 2015 243 / 378
Object file formats Binary formats File formats history
Mach-O (Darwin)
rodata datatextload
commands
file
header
Gabriel Laskar (EPITA) CAAL 2015 244 / 378
Object file formats Binary formats File formats history
Mach-O (Darwin)“Fat binaries”
rodata datatextload
commands
file
header
rodata datatextload
commands
file
header
Fat arch table
Fat header
Mach−O fat binaries
Gabriel Laskar (EPITA) CAAL 2015 245 / 378
Object file formats Executable code loading
6: Object file formats
Build process
Binary formats
Executable code loadingBinary file loadingLoading processDynamic libraries
Gabriel Laskar (EPITA) CAAL 2015 246 / 378
Object file formats Executable code loading Binary file loading
6: Object file formats
Build process
Binary formats
Executable code loadingBinary file loadingLoading processDynamic libraries
Gabriel Laskar (EPITA) CAAL 2015 247 / 378
Object file formats Executable code loading Binary file loading
Binary file loading
Executable file format is like object file format, with the followingrestrictions:
I It has no relocations,I All addresses are resolvedI It is ready to load in memory
Gabriel Laskar (EPITA) CAAL 2015 248 / 378
Object file formats Executable code loading Binary file loading
Executable format
string table
symbol table
section table
file header
.bss
reloc. table
.data
.rodata
.text
.data
.rodata
.text
(not in file)
object file sectionsobject file headersGabriel Laskar (EPITA) CAAL 2015 249 / 378
Object file formats Executable code loading Binary file loading
Executable loading
Process memory structure is mapped to internal executable file format:I Loadable sections are loaded from file to memoryI Uninitialised data (.bss section) is allocated in process memoryI Internal file sections are ignoredI Heap and stack are allocated
Gabriel Laskar (EPITA) CAAL 2015 250 / 378
Object file formats Executable code loading Loading process
6: Object file formats
Build process
Binary formats
Executable code loadingBinary file loadingLoading processDynamic libraries
Gabriel Laskar (EPITA) CAAL 2015 251 / 378
Object file formats Executable code loading Loading process
Executable memory mapping
.data
.rodata
.text
memoryinitialized
noninitializedmemory
stack
heap
.bss
memory sections
process memory
string table
symbol table
section table
file header
.bss
reloc. table
.data
.rodata
.text
.data
.rodata
.text
(not in file)
object file sectionsobject file headers
Gabriel Laskar (EPITA) CAAL 2015 252 / 378
Object file formats Executable code loading Loading process
Memory attributes
Memory attributes depends on section and area type:I .text section may be loaded in executable only region (depending on
OS)I memory used to store .rodata section will be marked as read-onlyI memory used to store .data, .bss sections, heap and stack will be
marked as read/write
Gabriel Laskar (EPITA) CAAL 2015 253 / 378
Object file formats Executable code loading Loading process
Memory attributes
rw−
r−−
−−x
initializedmemory
noninitializedmemory
memory sections
from file
loaded
data
file datainternal
executable file
object file sections
executable
pages
read/write
virtual memory rights
process memory
.text
.rodata
.data
.text
.rodata
.data
.bss
heap
stack
not in file
sym. table
string table
pages
read only
pages
Gabriel Laskar (EPITA) CAAL 2015 254 / 378
Object file formats Executable code loading Dynamic libraries
6: Object file formats
Build process
Binary formats
Executable code loadingBinary file loadingLoading processDynamic libraries
Gabriel Laskar (EPITA) CAAL 2015 255 / 378
Object file formats Executable code loading Dynamic libraries
Dynamic libraries
Use of dynamic libraries implies:I different and more complex memory mappingI position independent codeI different way to handle relocationsI plenty of cool complicated stuff
Gabriel Laskar (EPITA) CAAL 2015 256 / 378
Object file formats Executable code loading Dynamic libraries
Dynamic libs in memory
Libraries have specific memory mapping and organisation:I A dynamic library is loaded only once even if several processes use itI Extra .data and .text sections are mapped in process memory-spaceI Library .text section is mapped in several process memoryI Library .data section is copied in each process memory
Gabriel Laskar (EPITA) CAAL 2015 257 / 378
Object file formats Executable code loading Dynamic libraries
Dynamic process memory layout
stack
heap heap
stack
copy
library A
.text
.data
.data
.rodata
.text .text
.rodata
.data
.data
.text
.text
.data
.data
.text
.text
.text
.data
.text
library B
executable A process A process B executable B
copy
Gabriel Laskar (EPITA) CAAL 2015 258 / 378
Object file formats Executable code loading Dynamic libraries
Handling code location change
Dynamic libraries may not be mapped with the same virtual address in allprocesses:
I Address may already be in use by an other libraryI The same code must be able to run with different base addresses
Position independent code solves these problems without going throughthe complex relocation process.Relative jumps and relative data memory accesses need to be handled bythe CPU.
Gabriel Laskar (EPITA) CAAL 2015 259 / 378
Object file formats Executable code loading Dynamic libraries
Position independent code
positionindependent
code
relativebranch0112 mov ...
0100 jmp +12
heap
stack
process
.text
.data
.rodata
.data
.rodata
.text
executable
.text
library
Gabriel Laskar (EPITA) CAAL 2015 260 / 378
Object file formats Executable code loading Dynamic libraries
Position dependent code
Some location change are more complicated with dynamic libraries:I .data can’t be referred with relative addressing from .text section if
no relative data access is available.I .data section won’t have the same address in all process memory
maps.I .text section is common to all processes and can’t reflect .data
location differences.The solution is to use indirect memory access to reach data objects fromcode. One GOT (Global Offset Table) and PLT (Procedure Linkage Table)will hold data object addresses for each process.Some dynamic relocations and data copy will also help to solve thisproblem.
Gabriel Laskar (EPITA) CAAL 2015 261 / 378
Object file formats Executable code loading Dynamic libraries
Use of GOT/PLT
GOT/PLTin process A
GOT/PLT
.textcall printf.data
heap
stack
process
.text
.data
.rodata
.data
.rodata
.text
executable
.text
library
Gabriel Laskar (EPITA) CAAL 2015 262 / 378
Assembly programming
Part VII
Assembly programming
Gabriel Laskar (EPITA) CAAL 2015 263 / 378
Assembly programming Introduction
7: Assembly programming
Introduction
Pure assembly files
Inline Assembly in C
Gabriel Laskar (EPITA) CAAL 2015 264 / 378
Assembly programming Introduction
What is assembly?
Assembly is notI binary dataI directly interpreted by the processorI don’t mix it up with machine code
Assembly isI a language, with a syntax, keywords, etc.I translated in a binary format
Gabriel Laskar (EPITA) CAAL 2015 265 / 378
Assembly programming Introduction
Benefits of assembly programming
I Improve developper’s understanding of computer architecture andprogramming
I Direct access to hardware resourcesI Access to system featuresI Speed and efficiency of programsI Low footprint of programs
Gabriel Laskar (EPITA) CAAL 2015 266 / 378
Assembly programming Introduction
Drawbacks of assembly programming
I Low-level programming forbids abstractionI Assembly code is hard to keep cleanI Slows development down: lots of keywords, unusual behaviorsI Each processor architecture has its own language: conventions,
registers, instructions, privileges, allowed arguments
Gabriel Laskar (EPITA) CAAL 2015 267 / 378
Assembly programming Pure assembly files
7: Assembly programming
Introduction
Pure assembly filesAnatomy of assembly codeExamples
Inline Assembly in C
Gabriel Laskar (EPITA) CAAL 2015 268 / 378
Assembly programming Pure assembly files Anatomy of assembly code
7: Assembly programming
Introduction
Pure assembly filesAnatomy of assembly codeExamples
Inline Assembly in C
Gabriel Laskar (EPITA) CAAL 2015 269 / 378
Assembly programming Pure assembly files Anatomy of assembly code
Assembly file organization
Source file is organized in different sections:code
Contains program binary opcodesdata
Contains program global variablesrodata
Contains read-only program variables
Gabriel Laskar (EPITA) CAAL 2015 270 / 378
Assembly programming Pure assembly files Anatomy of assembly code
Assembly language statements
directivesDetermine the assembler behavior
instructionsChosen in the CPU instruction set, translated in binaryformat, written in output file
operandsExpressions containing register identifiers, immediateoperands, symbols
labelsBetween instructions, used to mark specific locations insource code
Gabriel Laskar (EPITA) CAAL 2015 271 / 378
Assembly programming Pure assembly files Examples
7: Assembly programming
Introduction
Pure assembly filesAnatomy of assembly codeExamples
Inline Assembly in C
Gabriel Laskar (EPITA) CAAL 2015 272 / 378
Assembly programming Pure assembly files Examples
Assembly language statementsExample
.mod_load asm-sparc
.define FOO 5
.section code .text.mod_asm opcodes v8
add %g0, FOO, %g1.ends
Gabriel Laskar (EPITA) CAAL 2015 273 / 378
Assembly programming Pure assembly files Examples
Assembly language statementsExample
.mod_load asm-sparc
.mod_load out-elf32
.section code .text.mod_asm opcodes v8
mov 0x12, %g1mov 0x34533, %g2
add %g1, %g2, %g3.ends
Gabriel Laskar (EPITA) CAAL 2015 274 / 378
Assembly programming Pure assembly files Examples
Assembly language statementsExample
.mod_load asm-sparc
.include sparc/v8.def
.section data .datalbl:
.reserve 4.ends
.section code .text.mod_asm opcodes v8
@set .data:lbl, %g2ld [%g2],%g1
.ends
Gabriel Laskar (EPITA) CAAL 2015 275 / 378
Assembly programming Pure assembly files Examples
Assembly language statementsExample
.mod_load asm-sparc
.include sparc/v8.def
.section code .text.mod_asm opcodes v8
.export main
.proc mainretrestore
.endp.ends
Gabriel Laskar (EPITA) CAAL 2015 276 / 378
Assembly programming Pure assembly files Examples
Assembly language statementsExample
.mod_load asm-sparc
.include sparc/v8.def
.section code .text.mod_asm opcodes v8
.extern exit
.proc my_exitcall exitnop
.endp.ends
Gabriel Laskar (EPITA) CAAL 2015 277 / 378
Assembly programming Inline Assembly in C
7: Assembly programming
Introduction
Pure assembly files
Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz
Gabriel Laskar (EPITA) CAAL 2015 278 / 378
Assembly programming Inline Assembly in C Presentation
7: Assembly programming
Introduction
Pure assembly files
Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz
Gabriel Laskar (EPITA) CAAL 2015 279 / 378
Assembly programming Inline Assembly in C Presentation
Why?
I C can’t express everythingI Cpu-specific registers (flags, condition codes, hardware counters)I Features not addressed by language (atomic operations)I System features (Memory handling, interrupts handling, exception
handling)I Compiler are not always aware of possible optimizations
I Use of instruction side-effectsI Un-optimizable patterns (or optimization patterns specific to only one
algorithm)
Gabriel Laskar (EPITA) CAAL 2015 280 / 378
Assembly programming Inline Assembly in C Presentation
Inline AssemblyCompiler’s PoV
For the compiler, the assembly you pass in is:I a raw stringI with a printf-like syntaxI passed directly to the assemblerI surounded by optional statements
I input valuesI output valuesI clobbered values
Gabriel Laskar (EPITA) CAAL 2015 281 / 378
Assembly programming Inline Assembly in C Presentation
Inline AssemblyProgrammer’s PoV
For you, the assembly you pass is meant to eitherI compute a valueI change some hardware featureI have a side-effect
Gabriel Laskar (EPITA) CAAL 2015 282 / 378
Assembly programming Inline Assembly in C Simple examples
7: Assembly programming
Introduction
Pure assembly files
Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz
Gabriel Laskar (EPITA) CAAL 2015 283 / 378
Assembly programming Inline Assembly in C Simple examples
ExampleNo data
static inlinevoid disable_interrupts(void){
asm volatile("cli");}
Gabriel Laskar (EPITA) CAAL 2015 284 / 378
Assembly programming Inline Assembly in C Simple examples
ExampleOutput only
static inlinecpu_cycle_t cpu_cycle_count(void){
uint32_t low, high;
asm("rdtsc" : "=a" (low), "=d" (high));
return (low | ((uint64_t)high << 32));}
Gabriel Laskar (EPITA) CAAL 2015 285 / 378
Assembly programming Inline Assembly in C Simple examples
ExampleInput only
static inlinevoid cpu_io_write_8(uintptr_t addr, uint8_t data){
asm volatile("outb %0, %1":: "a" (data), "d" ((uint16_t)addr));
}
Gabriel Laskar (EPITA) CAAL 2015 286 / 378
Assembly programming Inline Assembly in C Simple examples
ExampleIn-out
static inlineuint32_t cpu_endian_swap32(uint32_t x){asm ("bswap %0"
: "=r" (x): "0" (x));
return x;}
Gabriel Laskar (EPITA) CAAL 2015 287 / 378
Assembly programming Inline Assembly in C Syntax
7: Assembly programming
Introduction
Pure assembly files
Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz
Gabriel Laskar (EPITA) CAAL 2015 288 / 378
Assembly programming Inline Assembly in C Syntax
Syntax
I one statement with optional argumentsasm [volatile]("statements \n\t"
"on many lines \n\t"[: [output_variables][: [input_variables][: clobbered_registers]]]
);I arguments are referenced in-order (%0..%n), whether they are input or
output!I arguments types abide constraints, enclosed in ""
Gabriel Laskar (EPITA) CAAL 2015 289 / 378
Assembly programming Inline Assembly in C Complex examples
7: Assembly programming
Introduction
Pure assembly files
Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz
Gabriel Laskar (EPITA) CAAL 2015 290 / 378
Assembly programming Inline Assembly in C Complex examples
Add with carry and overflowC
uint32_t add_cv(uint32_t a, uint32_t b, uint8_t cin,uint8_t *cout, uint8_t *vout)
{uint64_t sum = (uint64_t)a + (uint64_t)b + (uint64_t)cin;*cout = sum >> 32;*vout = ( (b ^ ((uint32_t)sum)) & ~(a^b) ) >> 31;return sum;
}
Gabriel Laskar (EPITA) CAAL 2015 291 / 378
Assembly programming Inline Assembly in C Complex examples
Add with carry and overflowASM
uint32_t add_cv(uint32_t a, uint32_t b, uint8_t cin,uint8_t *cout, uint8_t *vout)
{uint32_t result;
asm(" btl $0, %k5 \n"" adcl %k3, %k4 \n"" setc %b1 \n"" seto %b2 \n": "=r" (result), "=qm" (*cout), "=qm" (*vout): "r" (a), "0" (b), "r" (cin)
);
return result;}
Gabriel Laskar (EPITA) CAAL 2015 292 / 378
Assembly programming Inline Assembly in C Complex examples
Atomic increment for ARMstatic inlinebool_t cpu_atomic_inc(volatile atomic_int_t *a){
reg_t tmp = 0, tmp2;
asm volatile("1: \n\t""ldrex %0, [%2] \n\t""add %0, %0, #1 \n\t""strex %1, %0, [%2] \n\t""tst %1, #1 \n\t""bne 1b \n\t": "=&r" (tmp), "=&r" (tmp2): "r" (a): "m" (*a));
return tmp != 0;}
Gabriel Laskar (EPITA) CAAL 2015 293 / 378
Assembly programming Inline Assembly in C Named values
7: Assembly programming
Introduction
Pure assembly files
Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz
Gabriel Laskar (EPITA) CAAL 2015 294 / 378
Assembly programming Inline Assembly in C Named values
Named values
I Using numbers for values makes code harder to write and readI GCC permits named values in inline assembly
Gabriel Laskar (EPITA) CAAL 2015 295 / 378
Assembly programming Inline Assembly in C Named values
Named valuesWith numbers
static inlinebool_t cpu_atomic_inc(volatile atomic_int_t *a){
reg_t tmp = 0, tmp2;
asm volatile("1: \n\t""ldrex %0, [%2] \n\t""add %0, %0, #1 \n\t""strex %1, %0, [%2] \n\t""tst %1, #1 \n\t""bne 1b \n\t": "=&r" (tmp), "=&r" (tmp2): "r" (a): "m" (*a));
return tmp != 0;}
Gabriel Laskar (EPITA) CAAL 2015 296 / 378
Assembly programming Inline Assembly in C Named values
Named valuesNamed
static inlinebool_t cpu_atomic_inc(volatile atomic_int_t *a){
reg_t tmp = 0, tmp2;
asm volatile("1: \n\t""ldrex %[tmp], [%[atomic]] \n\t""add %[tmp], %[tmp], #1 \n\t""strex %[tmp2], %[tmp], [%[atomic]] \n\t""tst %[tmp2], #1 \n\t""bne 1b \n\t": [tmp] "=&r" (tmp), [tmp2] "=&r" (tmp2), [clobber] "=m" (*a): [atomic] "r" (a));
return tmp != 0;}
Gabriel Laskar (EPITA) CAAL 2015 297 / 378
Assembly programming Inline Assembly in C Quizz
7: Assembly programming
Introduction
Pure assembly files
Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz
Gabriel Laskar (EPITA) CAAL 2015 298 / 378
Assembly programming Inline Assembly in C Quizz
Quizz!What does this do?
asm volatile("":::"memory");
Gabriel Laskar (EPITA) CAAL 2015 299 / 378
Focus on x86
Part VIII
Focus on x86
Gabriel Laskar (EPITA) CAAL 2015 300 / 378
Focus on x86 x86 story
8: Focus on x86
x86 storyTimelinex86 architectureInstruction set
Gabriel Laskar (EPITA) CAAL 2015 301 / 378
Focus on x86 x86 story Timeline
8: Focus on x86
x86 storyTimelinex86 architectureInstruction set
Gabriel Laskar (EPITA) CAAL 2015 302 / 378
Focus on x86 x86 story Timeline
Early events and CPU development
1971 Intel 40041978 Intel 8086 & 8088, first x86 CPUs1981 Intel 801861982 Intel 80286: 16 bits, 24 address bits, protection
Gabriel Laskar (EPITA) CAAL 2015 303 / 378
Focus on x86 x86 story Timeline
Successful CPU development
1985 Intel 80386: 32 bits, MMU1989 Intel 80486: on-chip cache, pipeline1993 Intel PentiumTM: superscalar, mmx, 64 address bits1995 Intel Pentium Pro: ooo1997 Intel Pentium II: internal L21999 Intel Pentium III: cpuid!1999 AMD Athlon: ooo, 3dnow
Gabriel Laskar (EPITA) CAAL 2015 304 / 378
Focus on x86 x86 story Timeline
Technology barriers
2000 Intel Pentium 4: HT2001 Intel Itanium2003 AMD Athlon 642003 Intel Pentium M: P6 (PPro to PIII)2006 Intel Pentium 4 Prescott 2M: 64 bits2006 Intel Core/Core2: fusion2009 Intel Core i5/i7: ...
Gabriel Laskar (EPITA) CAAL 2015 305 / 378
Focus on x86 x86 story x86 architecture
8: Focus on x86
x86 storyTimelinex86 architectureInstruction set
Gabriel Laskar (EPITA) CAAL 2015 306 / 378
Focus on x86 x86 story x86 architecture
x86 architecture
The x86 architecture is:I based on a CISC instruction set.I based on little-endian memory access.I based on two operands instructions including memory access.I backwards compatible: all new x86 processors are fully compatible
with their predecessors.I very complex: Pentium IV has a 20 stages pipeline.I newer processors have less pipeline stages
Gabriel Laskar (EPITA) CAAL 2015 307 / 378
Focus on x86 x86 story x86 architecture
Available registers
First x86 generation had a few 16-bits registers available:I General purpose 16 bits registers:
I ax, bx, cx, dx, si, diI General purpose 8 bits registers:
I al, bl, cl, dlI ah, bh, ch, dh
I Stack and frame registers:I sp, bp
I Flag registers, ...
Gabriel Laskar (EPITA) CAAL 2015 308 / 378
Focus on x86 x86 story x86 architecture
Nested registers
registers8 bits
register16 bits
AH AL
AX
Gabriel Laskar (EPITA) CAAL 2015 309 / 378
Focus on x86 x86 story x86 architecture
Register extensions
I Starting with the 386 processor, all registers are now 32-bits wide.I Due to compatibility issue, 16 bits register are still present.I eax, ebx, ecx, edx, esi, edi, esp, ebp were added.I Even if registers size is increased, registers count remained the same:
only 6 registers are available for operations.
I For amd64, AMD extended the register count to 16, only availablewith 64 bits mode enabled
I rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp,I r8, r9, r10, r11, r12, r13, r14, r15
I AMD licensed the amd64 to Intel after the Itanium flop...
Gabriel Laskar (EPITA) CAAL 2015 310 / 378
Focus on x86 x86 story x86 architecture
Register extensions
I Starting with the 386 processor, all registers are now 32-bits wide.I Due to compatibility issue, 16 bits register are still present.I eax, ebx, ecx, edx, esi, edi, esp, ebp were added.I Even if registers size is increased, registers count remained the same:
only 6 registers are available for operations.I For amd64, AMD extended the register count to 16, only available
with 64 bits mode enabledI rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp,
I r8, r9, r10, r11, r12, r13, r14, r15
I AMD licensed the amd64 to Intel after the Itanium flop...
Gabriel Laskar (EPITA) CAAL 2015 310 / 378
Focus on x86 x86 story x86 architecture
Register extensions
I Starting with the 386 processor, all registers are now 32-bits wide.I Due to compatibility issue, 16 bits register are still present.I eax, ebx, ecx, edx, esi, edi, esp, ebp were added.I Even if registers size is increased, registers count remained the same:
only 6 registers are available for operations.I For amd64, AMD extended the register count to 16, only available
with 64 bits mode enabledI rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp,I r8, r9, r10, r11, r12, r13, r14, r15
I AMD licensed the amd64 to Intel after the Itanium flop...
Gabriel Laskar (EPITA) CAAL 2015 310 / 378
Focus on x86 x86 story x86 architecture
Register extensions
I Starting with the 386 processor, all registers are now 32-bits wide.I Due to compatibility issue, 16 bits register are still present.I eax, ebx, ecx, edx, esi, edi, esp, ebp were added.I Even if registers size is increased, registers count remained the same:
only 6 registers are available for operations.I For amd64, AMD extended the register count to 16, only available
with 64 bits mode enabledI rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp,I r8, r9, r10, r11, r12, r13, r14, r15
I AMD licensed the amd64 to Intel after the Itanium flop...
Gabriel Laskar (EPITA) CAAL 2015 310 / 378
Focus on x86 x86 story x86 architecture
32/64 bits nested registers
RAX64 bitsregister
EAX 32 bitsregister
registers8 bits
register16 bits
AH AL
AX
Gabriel Laskar (EPITA) CAAL 2015 311 / 378
Focus on x86 x86 story x86 architecture
Instruction format
x86 architecture is a CISC based architecture:I All instructions have a variable length,I Instructions length can’t be determined before reading first bytes,I Many instructions with different formats are available.
Gabriel Laskar (EPITA) CAAL 2015 312 / 378
Focus on x86 x86 story x86 architecture
Instruction format
8086-80286
opcode opc2 immediate memorydisplacement
modr/m
Gabriel Laskar (EPITA) CAAL 2015 313 / 378
Focus on x86 x86 story x86 architecture
Instruction format
80386-Pentium
size sizedata
prefix prefix
address
SIBprefixopcode
opcode opc2 immediate memorydisplacement
modr/m
Gabriel Laskar (EPITA) CAAL 2015 313 / 378
Focus on x86 x86 story x86 architecture
Instruction format
3dnow!
3dnowimm
size sizedata
prefix prefix
address
SIBprefixopcode
opcode opc2 immediate memorydisplacement
modr/m
Gabriel Laskar (EPITA) CAAL 2015 313 / 378
Focus on x86 x86 story x86 architecture
Instruction format
x86-64
prefixx86−64
3dnowimm
size sizedata
prefix prefix
address
SIBprefixopcode
opcode opc2 immediate memorydisplacement
modr/m
Gabriel Laskar (EPITA) CAAL 2015 313 / 378
Focus on x86 x86 story x86 architecture
Addressing mode
The x86 architecture supports several complex addressing modes. On 32bit processors (386 and later) a memory address can contain:
I A base address registerI A address displacementI An index registerI An multiply factor on index register
Example: mov eax, [ebx + 0x12345 + ecx * 8]
Gabriel Laskar (EPITA) CAAL 2015 314 / 378
Focus on x86 x86 story x86 architecture
Wired stack management
Specific instructions are available for stack management:I push x instruction can be used to store data on the top of the stack
and decrement the stack pointer.I pop x instruction removes data from the stack.I call and ret instructions directly push and pop return address on
and from the stack.
The (e)sp register is used as stack pointer.
Gabriel Laskar (EPITA) CAAL 2015 315 / 378
Focus on x86 x86 story x86 architecture
Branch
I The x86 architecture uses a flag register to manage conditionalbranches.
I Many different instructions with different lengthes can be useddepending on jump size.
I No delayed slot are used
Gabriel Laskar (EPITA) CAAL 2015 316 / 378
Focus on x86 x86 story x86 architecture
Jump size
immediate16/32 bits
opc
E9 EB
imm.8 bits
opc
long jump(−32768 to +32767)
short jump(−128 to +127)
Gabriel Laskar (EPITA) CAAL 2015 317 / 378
Focus on x86 x86 story Instruction set
8: Focus on x86
x86 storyTimelinex86 architectureInstruction set
Gabriel Laskar (EPITA) CAAL 2015 318 / 378
Focus on x86 x86 story Instruction set
Instruction set
x86 instruction set is huge:I Over 580 instructions handled by Pentium 3 CPUI Over 850 opcodes handled by Pentium 3 CPU
A single instruction can be very complex:I Memory access capableI Handle different data widthsI Perform complex computation
Gabriel Laskar (EPITA) CAAL 2015 319 / 378
Focus on x86 x86 story Instruction set
Instruction set extensions
Extensions to the x86 instruction set are common and all instructions arenot available on all CPUs. Starting with the pentium processor,SIMD-based instruction sets appeared:
I MMXI 3dNow!I MMX2, SSEI 3dNow2!, SSE2, SSE3, SSE4sI To be continued...
Gabriel Laskar (EPITA) CAAL 2015 320 / 378
Focus on x86 x86 story Instruction set
Instruction setSIMD operations
source 1register
source 2register
destinationregister
normal operation
64 bits value
64 bits value
64 bits value
SIMD operation
4 x 16 bits
4 x 16 bits
4 x 16 bits
Gabriel Laskar (EPITA) CAAL 2015 321 / 378
Focus on x86 x86 story Instruction set
x86-specific coding
Due to its amazing number of available instructions, x86 code is highlytunable for optimizations:
I High level languages compilers are not able to use all instructionsefficiently,
I Hand written assembly is often faster,I Complex Instructions can be used to process unexpected tasks.
Gabriel Laskar (EPITA) CAAL 2015 322 / 378
Focus on x86 x86 story Instruction set
Instruction sethas-been parts
I aaa, aad, aam, aas, daa, dasI xlat
Gabriel Laskar (EPITA) CAAL 2015 323 / 378
Focus on x86 x86 story Instruction set
x86-specific codingexample: LEA
The LEA instruction is designed to compute memory addresses. But it canbe used to add and multiply values.
lea eax, [ebx + ecx]
lea eax, [ebx * 8 + ebx]
Gabriel Laskar (EPITA) CAAL 2015 324 / 378
Focus on x86 x86 story Instruction set
x86-specific codingexample: tiniest mem*()?
memcpy rep movbs
memset rep stosd
memcmp repe cmpsb
strncmp repnz cmpsb
Gabriel Laskar (EPITA) CAAL 2015 325 / 378
Focus on RISC processors
Part IX
Focus on RISC processors
Gabriel Laskar (EPITA) CAAL 2015 326 / 378
Focus on RISC processors History
9: Focus on RISC processors
History
Some instruction sets
Gabriel Laskar (EPITA) CAAL 2015 327 / 378
Focus on RISC processors History
Let’s see a timeline...
Alpha
EV6
21264
Alpha
EV7
21364
STI project CELL
1965 1970 1990 2000 20101975 1980 1985 1995 2005
CDC 6600
(Cray)
Data General Nova
(Edson de Castro)
RISC−I RISC−II SPARC SPARC−v8 SPARC−v9 Ultra−1 Ultra−2 Ultra−3
Mips−r2000 Mips−r3000 Mips−r4000
64−bit
Mips−r10k Mips−r12k
Mips V
Mips−r8k
Mips IV Mips32
Mips−r16k r24k
Cheetah America Bellatrix
RS−6000
RIOS 1/9
RSC PPC POWER 3
64 bits
POWER 4 POWER 5
ARM project ARM 2ARM 1 ARM 6 (v3) ARM 7
Thumb
ARM 9−tdmi ARM 11 M3 A8 M1
A9
M0
A9 >2GHz
Alpha
EV5
21164
Alpha
EV4
21064
RISC CPU
POWER 7DA
RPA
ARM
ltd
SUN
SGI
IBM
Aco
rn
Superscalar
Que
en
awar
d
PRISM
Ope
n so
urce
Har
d / S
oft c
ore
+licen
sing
Soft−
core
+licen
sing
Har
d−co
re
Super H SH−2 SH−4SH−3 SH−6
SH−7
TS1
PA−Risc
PCX−LPCX−S
PA−7000 PA−7100−LC
(SIMD)
PCX−U
PA−8000 PA−8600
PCX−W+
PA−8900
ShortfinHP
DEC
Hita
chi
A−I−
M
(ann
ounc
ed)
licen
sing
issu
es
Uns
ure
Uns
ure
Uns
ure
N64
PlayStation 2
PlayStation
PSP
Saturn
Dreamcast
3DO
GameCube
XBox 360
Wii
PS3
DS
GBA
DA
RPA
Mips
RISC ProjectDA
RPA
UC−Berkeley
Patterson
Stanford
Hennessy
Cambridge
Gabriel Laskar (EPITA) CAAL 2015 328 / 378
Focus on RISC processors Some instruction sets
9: Focus on RISC processors
History
Some instruction setsMipsSPARCPPCARM
Gabriel Laskar (EPITA) CAAL 2015 329 / 378
Focus on RISC processors Some instruction sets Mips
9: Focus on RISC processors
History
Some instruction setsMipsSPARCPPCARM
Gabriel Laskar (EPITA) CAAL 2015 330 / 378
Focus on RISC processors Some instruction sets Mips
MipsInstruction format
012345678910111213141516171819202122232425262728293031
J/JAL offset
OP rs rt immSpecial rs rt rd ... op
I 32 GPRI hard-wired r0 to 0I 32 FPU registers, all sizes aliasedI delay slot
Gabriel Laskar (EPITA) CAAL 2015 331 / 378
Focus on RISC processors Some instruction sets Mips
MipsAsm code
addiu r2, r3, 0x42 ; alu reg / immor r3, r4, r5 ; alu reg / reg
lui r2, 0x1234 ; load upper immediateori r2, r2, 0x5678 ; r2 = 0x12345678
lw r4, 0x1234(r9) ; load word from r9 + 0x1234
slt r1, r2, r3 ; test if r2 < r3bnez r1, 2f ; jump if truenop ; delay slot
lbu r2, (r3) ; load byte, not sign extended
jal foo ; pc = foo; r31 = return addressnopGabriel Laskar (EPITA) CAAL 2015 332 / 378
Focus on RISC processors Some instruction sets SPARC
9: Focus on RISC processors
History
Some instruction setsMipsSPARCPPCARM
Gabriel Laskar (EPITA) CAAL 2015 333 / 378
Focus on RISC processors Some instruction sets SPARC
SPARCInstruction format 012345678910111213141516171819202122232425262728293031
01 offset012345678910111213141516171819202122232425262728293031
00 rd op2 imm
00 a cond op2 imm012345678910111213141516171819202122232425262728293031
1x rd op3 rs1 0 asi rs2
1x rd op3 rs1 1 imm
1x rd op3 rs1 opf rs2
I 32 GPRI hard-wired %g0 to 0I register windowI unwindowed aliased FPU registers, 8 * 128 bits, 32 * 32 bitsI delay slotGabriel Laskar (EPITA) CAAL 2015 334 / 378
Focus on RISC processors Some instruction sets SPARC
SPARCAsm code
add %g3, 0x42, %g2 ; alu reg / immor %g3, %g4, %g5 ; alu reg / reg
sethi 0x12345, %g2 ; load upper immediate (20 bits)or %g2, 0x678, %g2 ; g2 = 0x12345678
ld [%l2 + 0x1234], %g4 ; load word from l2 + 0x1234ld [%l2 + %i3], %g4 ; load word from l2 + i3
tst %l2, %l3 ; test if l2 < l3blt 3f ; jump if true
Gabriel Laskar (EPITA) CAAL 2015 335 / 378
Focus on RISC processors Some instruction sets PPC
9: Focus on RISC processors
History
Some instruction setsMipsSPARCPPCARM
Gabriel Laskar (EPITA) CAAL 2015 336 / 378
Focus on RISC processors Some instruction sets PPC
PowerPCInstruction format
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
op jump offset aa lk
op rd ra immop rd ra rb mb me rc
op rd ra rb op2 rc
I 32 GPRI 32 FPR, fixed internal representationI additional registers for lr, ctrI no global flags, but 8 “cause registers” crxI can merge causes with logical operations
Gabriel Laskar (EPITA) CAAL 2015 337 / 378
Focus on RISC processors Some instruction sets PPC
PowerPCAsm code
addi 3, 2, 42 ; alu reg / immor. 3, 4, 5 ; alu reg / reg, update cr0
lis 3, 0x1234 ; load upper immediate (16 bits)ori 3, 3, 0x5678 ; r3 = 0x12345678
lwz 4, 9, 0x1234 ; load word from r9 + 0x1234lwzx 4, 9, 3 ; load word from r9 + r3
mtctr 4 ; put r4 in count register...bdnzl 2f ; Decrement CTR, Branch if CTR != 0
rlwinm ... ; Rotate and mask
Gabriel Laskar (EPITA) CAAL 2015 338 / 378
Focus on RISC processors Some instruction sets ARM
9: Focus on RISC processors
History
Some instruction setsMipsSPARCPPCARM
Gabriel Laskar (EPITA) CAAL 2015 339 / 378
Focus on RISC processors Some instruction sets ARM
ARMInstruction features
I 16 GPR, one of them is pc (r15)I Every instruction is “guarded” by a condition. It may be:
I always, >, >=, “higher”, overflow, negative, carry, ==I their opposites
I aliased FPR: 16 double, 32 float
You can prefix any instruction with “never”:I 1/16th of the instruction set is “nop”...
Gabriel Laskar (EPITA) CAAL 2015 340 / 378
Focus on RISC processors Some instruction sets ARM
ARMInstruction features
I 16 GPR, one of them is pc (r15)I Every instruction is “guarded” by a condition. It may be:
I always, >, >=, “higher”, overflow, negative, carry, ==I their opposites
I aliased FPR: 16 double, 32 float
You can prefix any instruction with “never”:I 1/16th of the instruction set is “nop”...
Gabriel Laskar (EPITA) CAAL 2015 340 / 378
Focus on RISC processors Some instruction sets ARM
ARMInstruction format
012345678910111213141516171819202122232425262728293031
cond 00 i op s rn rd imm/reg
cond 01 i p u bw l rn rd imm/reg
cond 100 p u bw l rn reg-list
cond 101 l jmp offset
cond 11 coproc, sys
Gabriel Laskar (EPITA) CAAL 2015 341 / 378
Focus on RISC processors Some instruction sets ARM
ARMInstruction format (2)
012345678910111213141516171819202122232425262728293031
cond 00 i op s rn rd imm/reg
cond 01 i p u bw l rn rd imm/reg
1 rotate imm
0 rs 0 ty 1 rm
0 amount ty 0 rm
and r3, r7, #42 ; r3 = r7 & 0x2aadd r2, r8, r5, lsl r1 ; r2 = r8 + (r5 << r1)sub r1, r9, r1, lsl #1 ; r1 = r9 - (r1 << 1)
012345678910111213141516171819202122232425262728293031
1110 00 1 and 0 r7 r3 « 0 0x2a
1110 00 0 add 0 r8 r2 r1 0 lsl 1 r5
1110 00 0 sub 0 r9 r1 0x1 lsl 0 r1
Gabriel Laskar (EPITA) CAAL 2015 342 / 378
Focus on RISC processors Some instruction sets ARM
ARMInstruction format (2)
012345678910111213141516171819202122232425262728293031
cond 00 i op s rn rd imm/reg
cond 01 i p u bw l rn rd imm/reg
1 rotate imm
0 rs 0 ty 1 rm
0 amount ty 0 rm
and r3, r7, #42 ; r3 = r7 & 0x2aadd r2, r8, r5, lsl r1 ; r2 = r8 + (r5 << r1)sub r1, r9, r1, lsl #1 ; r1 = r9 - (r1 << 1)
012345678910111213141516171819202122232425262728293031
1110 00 1 and 0 r7 r3 « 0 0x2a
1110 00 0 add 0 r8 r2 r1 0 lsl 1 r5
1110 00 0 sub 0 r9 r1 0x1 lsl 0 r1Gabriel Laskar (EPITA) CAAL 2015 342 / 378
Focus on RISC processors Some instruction sets ARM
ARMAsm code
add r3, r2, #0x42 ; alu reg / immorr r3, r4, r5 ; alu reg / regbic r3, r4, r5, lsr #4 ; alu reg / reg & shift
cmp r2, #0 ; test if r2 < 0rsblt r2, r2, #0 ; then r2 = 0 - r2
ldr r2, #0x12345678 ; rewritten asldr r2, [pc, #offset] ; pc-relative load
streq r4, [r2, r3, lsl #4] ; store word to r2 + r3 << 4; only if last cmp is ‘eq’
str r4, [r2, #8]! ; store word to r2 + 8; and r2 = r2 + 8
offset: ; 32-bit immediates zone0x12345678Gabriel Laskar (EPITA) CAAL 2015 343 / 378
Focus on RISC processors Some instruction sets ARM
ARMBlock transfers
012345678910111213141516171819202122232425262728293031
cond 100 p u bw l rn reg-list
stmdb sp!, {r2, r3, r4, r8}push {r2, r3, r4, r8}
012345678910111213141516171819202122232425262728293031
cond 100 1 0 0 1 0 r13 0000000100011100
Gabriel Laskar (EPITA) CAAL 2015 344 / 378
Focus on RISC processors Some instruction sets ARM
ARMBlock transfers hacks (1)The link register is r14, then the following code saves r14 and restores it toPC afterwards...
push {r2, r3, r4, r8, lr}...pop {r2, r3, r4, r8, pc}
..00
..04 sp → r2
..08 r3
..0c r4
..10 r8
..14 lr
..18 old sp → xxx
..1c
..20Gabriel Laskar (EPITA) CAAL 2015 345 / 378
Focus on RISC processors Some instruction sets ARM
ARMBlock transfers hacks (2)
Do a memcpy with some free registers, for instance, copy 16 bytes from*r0 to *r1, not using r2 nor r5, update r0 and r1 to point on next block...
ldmia r0!, {r3, r4, r6, r7}stmia r1!, {r3, r4, r6, r7}
Gabriel Laskar (EPITA) CAAL 2015 346 / 378
Focus on RISC processors Some instruction sets ARM
ARM Instruction-space exhaustion
Eventually, ARM instruction-set continued to grow despite the obviousexhaustion of the “clean” instruction-set space.
I abuse of concatenation of seldom bits around the instruction wordI sometimes forbid r15 as an operand, and reuse other fieldsI reuse the “never” condition code
Gabriel Laskar (EPITA) CAAL 2015 347 / 378
Focus on RISC processors Some instruction sets ARM
ARM Instruction-space exhaustionIllustration
012345678910111213141516171819202122232425262728293031
cond 00 I op s rn rd op2
1 rotate imm
0 rs 0 ty 1 rm
0 amount ty 0 rm
How to add the multiplication?
cond 00 0 000 a s rn rd rs 1001 rm
Gabriel Laskar (EPITA) CAAL 2015 348 / 378
Focus on RISC processors Some instruction sets ARM
ARM Instruction-space exhaustionIllustration
012345678910111213141516171819202122232425262728293031
cond 00 I op s rn rd op2
1 rotate imm
0 rs 0 ty 1 rm
0 amount ty 0 rm
How to add the multiplication?
cond 00 0 000 a s rn rd rs 1001 rm
Gabriel Laskar (EPITA) CAAL 2015 348 / 378
Focus on RISC processors Some instruction sets ARM
Thumb mode
One goal: Code compression.Concessions to pack a maximum of operations in a minimal space:
I Access to only 8 registers among 16I Implicit stack opsI Implicit pc-relative operationsI No predicates any more
Dirty hacks for making it work:I Use lower bit of r15 as a mode indicatorI Use a new bx/blx instruction
Gabriel Laskar (EPITA) CAAL 2015 349 / 378
Focus on RISC processors Some instruction sets ARM
Thumb2
Keep up with thumb, this is great.Ideas:
I Have an asm-code compatibility with ARMI Remap the whole ARM 32-bit instruction set in 16-bit thumb
I 16 bit instruction words are not enough any moreNever mind
I Keep the basic 16-bit thumb encodingI When we need more, just make the instruction 32-bits long...I Decode mixed 16/32-bits instructionsI Only 16-bit alignment constraint for 32-bit long thumb-2 ops
Then Thumb-2 is a RISC with a CISC encoding!
Gabriel Laskar (EPITA) CAAL 2015 350 / 378
Focus on RISC processors Some instruction sets ARM
Thumb2
Keep up with thumb, this is great.Ideas:
I Have an asm-code compatibility with ARMI Remap the whole ARM 32-bit instruction set in 16-bit thumbI 16 bit instruction words are not enough any more
Never mindI Keep the basic 16-bit thumb encodingI When we need more, just make the instruction 32-bits long...I Decode mixed 16/32-bits instructionsI Only 16-bit alignment constraint for 32-bit long thumb-2 ops
Then Thumb-2 is a RISC with a CISC encoding!
Gabriel Laskar (EPITA) CAAL 2015 350 / 378
Focus on RISC processors Some instruction sets ARM
Thumb2
Keep up with thumb, this is great.Ideas:
I Have an asm-code compatibility with ARMI Remap the whole ARM 32-bit instruction set in 16-bit thumbI 16 bit instruction words are not enough any more
Never mindI Keep the basic 16-bit thumb encodingI When we need more, just make the instruction 32-bits long...
I Decode mixed 16/32-bits instructionsI Only 16-bit alignment constraint for 32-bit long thumb-2 ops
Then Thumb-2 is a RISC with a CISC encoding!
Gabriel Laskar (EPITA) CAAL 2015 350 / 378
Focus on RISC processors Some instruction sets ARM
Thumb2
Keep up with thumb, this is great.Ideas:
I Have an asm-code compatibility with ARMI Remap the whole ARM 32-bit instruction set in 16-bit thumbI 16 bit instruction words are not enough any more
Never mindI Keep the basic 16-bit thumb encodingI When we need more, just make the instruction 32-bits long...I Decode mixed 16/32-bits instructions
I Only 16-bit alignment constraint for 32-bit long thumb-2 opsThen Thumb-2 is a RISC with a CISC encoding!
Gabriel Laskar (EPITA) CAAL 2015 350 / 378
Focus on RISC processors Some instruction sets ARM
Thumb2
Keep up with thumb, this is great.Ideas:
I Have an asm-code compatibility with ARMI Remap the whole ARM 32-bit instruction set in 16-bit thumbI 16 bit instruction words are not enough any more
Never mindI Keep the basic 16-bit thumb encodingI When we need more, just make the instruction 32-bits long...I Decode mixed 16/32-bits instructionsI Only 16-bit alignment constraint for 32-bit long thumb-2 ops
Then Thumb-2 is a RISC with a CISC encoding!
Gabriel Laskar (EPITA) CAAL 2015 350 / 378
Focus on RISC processors Some instruction sets ARM
Thumb2
Keep up with thumb, this is great.Ideas:
I Have an asm-code compatibility with ARMI Remap the whole ARM 32-bit instruction set in 16-bit thumbI 16 bit instruction words are not enough any more
Never mindI Keep the basic 16-bit thumb encodingI When we need more, just make the instruction 32-bits long...I Decode mixed 16/32-bits instructionsI Only 16-bit alignment constraint for 32-bit long thumb-2 ops
Then Thumb-2 is a RISC with a CISC encoding!
Gabriel Laskar (EPITA) CAAL 2015 350 / 378
CPU-aware optimizations
Part X
CPU-aware optimizations
Gabriel Laskar (EPITA) CAAL 2015 351 / 378
CPU-aware optimizations Rationale
10: CPU-aware optimizations
Rationale
Examples
Code readability guidelines
Gabriel Laskar (EPITA) CAAL 2015 352 / 378
CPU-aware optimizations Rationale
Purpose
Knowing the low-level implementation of the processors, we can:I write code easier for the compiler to translateI write code easier for the CPU to run
I because nicer with the pipelineI because nicer with the memory subsystem
Gabriel Laskar (EPITA) CAAL 2015 353 / 378
CPU-aware optimizations Examples
10: CPU-aware optimizations
Rationale
Examplesabs()Sign extensionPower of two detectionMask merging
Code readability guidelines
Gabriel Laskar (EPITA) CAAL 2015 354 / 378
CPU-aware optimizations Examples abs()
10: CPU-aware optimizations
Rationale
Examplesabs()Sign extensionPower of two detectionMask merging
Code readability guidelines
Gabriel Laskar (EPITA) CAAL 2015 355 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineexample: abs()
The absolute value is a good example of optimized code which is easy toimplement in an efficient and smart way.
Gabriel Laskar (EPITA) CAAL 2015 356 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineabs() basic code
int abs(int x){
if (x < 0)return -x;
elsereturn x;
}
int abs(int x){
return (x < 0) ? -x : x;}
Gabriel Laskar (EPITA) CAAL 2015 357 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineexample: abs()
Let’s go back to some mathematical principles:
I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1
I if (x < 0), returnI if (x > 0), return
I Sign extension: (x » 31)
Gabriel Laskar (EPITA) CAAL 2015 358 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineexample: abs()
Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)
I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1
I if (x < 0), returnI if (x > 0), return
I Sign extension: (x » 31)
Gabriel Laskar (EPITA) CAAL 2015 358 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineexample: abs()
Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffff
I Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1
I if (x < 0), returnI if (x > 0), return
I Sign extension: (x » 31)
Gabriel Laskar (EPITA) CAAL 2015 358 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineexample: abs()
Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1
I not a = a xor 0xffffffff = a xor -1
I if (x < 0), returnI if (x > 0), return
I Sign extension: (x » 31)
Gabriel Laskar (EPITA) CAAL 2015 358 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineexample: abs()
Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1
I if (x < 0), returnI if (x > 0), return
I Sign extension: (x » 31)
Gabriel Laskar (EPITA) CAAL 2015 358 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineexample: abs()
Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1
I if (x < 0), return -xI if (x > 0), return x
I Sign extension: (x » 31)
Gabriel Laskar (EPITA) CAAL 2015 358 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineexample: abs()
Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1
I if (x < 0), return ((not x) + 1)I if (x > 0), return x
I Sign extension: (x » 31)
Gabriel Laskar (EPITA) CAAL 2015 358 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineexample: abs()
Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1
I if (x < 0), return ((x ^ 0xffffffff) - 0xffffffff)I if (x > 0), return ((x ^ 0) + 0)
I Sign extension: (x » 31)
Gabriel Laskar (EPITA) CAAL 2015 358 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineexample: abs()
Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1
I if (x < 0), return ((x ^ 0xffffffff) - 0xffffffff)I if (x > 0), return ((x ^ 0) + 0)
How to produce -1 when x < 0?
I Sign extension: (x » 31)
Gabriel Laskar (EPITA) CAAL 2015 358 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineexample: abs()
Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1
I if (x < 0), return ((x ^ 0xffffffff) - 0xffffffff)I if (x > 0), return ((x ^ 0) + 0)
How to produce -1 when x < 0?I Sign extension: (x » 31)
Gabriel Laskar (EPITA) CAAL 2015 358 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineabs() better code
int abs(int x){
int sign_word = x >> 31;return (x ^ sign_word) - sign_word;
}
int abs(int x){
int sign_word = -(1 & (x >> 31));return (x ^ sign_word) - sign_word;
}
Gabriel Laskar (EPITA) CAAL 2015 359 / 378
CPU-aware optimizations Examples abs()
Nicer with the pipelineabs() better code
int abs(int x){
int sign_word = x >> 31;return (x ^ sign_word) - sign_word;
}
int abs(int x){
int sign_word = -(1 & (x >> 31));return (x ^ sign_word) - sign_word;
}
Gabriel Laskar (EPITA) CAAL 2015 359 / 378
CPU-aware optimizations Examples Sign extension
10: CPU-aware optimizations
Rationale
Examplesabs()Sign extensionPower of two detectionMask merging
Code readability guidelines
Gabriel Laskar (EPITA) CAAL 2015 360 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary size
Sometimes, we need to sign extend a value from an arbitrary word width(say 13 bits) to a CPU word (say 32 bits).How to do it?
x 0000000000000000000snnnnnnnnnnnn
-high 11111111111111111111000000000000
result ssssssssssssssssssssnnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits - 1);return (val & high) ? (val | (-high)) : val;
}
Gabriel Laskar (EPITA) CAAL 2015 361 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary size
Sometimes, we need to sign extend a value from an arbitrary word width(say 13 bits) to a CPU word (say 32 bits).How to do it?
x 0000000000000000000snnnnnnnnnnnn
-high 11111111111111111111000000000000
result ssssssssssssssssssssnnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits - 1);return (val & high) ? (val | (-high)) : val;
}
Gabriel Laskar (EPITA) CAAL 2015 361 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary size
Sometimes, we need to sign extend a value from an arbitrary word width(say 13 bits) to a CPU word (say 32 bits).How to do it?
x 0000000000000000000snnnnnnnnnnnn
-high 11111111111111111111000000000000
result ssssssssssssssssssssnnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits - 1);return (val & high) ? (val | (-high)) : val;
}
Gabriel Laskar (EPITA) CAAL 2015 361 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary size
Sometimes, we need to sign extend a value from an arbitrary word width(say 13 bits) to a CPU word (say 32 bits).How to do it?
x 0000000000000000000snnnnnnnnnnnn
-high 11111111111111111111000000000000
result ssssssssssssssssssssnnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits - 1);return (val & high) ? (val | (-high)) : val;
}
Gabriel Laskar (EPITA) CAAL 2015 361 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary size
x 0000000000000000000snnnnnnnnnnnn
« snnnnnnnnnnnn0000000000000000000
» ssssssssssssssssssssnnnnnnnnnnnn
int sign_ext(int val, int bits){
int shift_bits = 32 - bits;return (val << shift_bits) >> shift_bits;
}
Gabriel Laskar (EPITA) CAAL 2015 362 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary size
x 0000000000000000000snnnnnnnnnnnn
« snnnnnnnnnnnn0000000000000000000
» ssssssssssssssssssssnnnnnnnnnnnn
int sign_ext(int val, int bits){
int shift_bits = 32 - bits;return (val << shift_bits) >> shift_bits;
}
Gabriel Laskar (EPITA) CAAL 2015 362 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary size
x 0000000000000000000snnnnnnnnnnnn
« snnnnnnnnnnnn0000000000000000000
» ssssssssssssssssssssnnnnnnnnnnnn
int sign_ext(int val, int bits){
int shift_bits = 32 - bits;return (val << shift_bits) >> shift_bits;
}
Gabriel Laskar (EPITA) CAAL 2015 362 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary size
x 0000000000000000000snnnnnnnnnnnn
« snnnnnnnnnnnn0000000000000000000
» ssssssssssssssssssssnnnnnnnnnnnn
int sign_ext(int val, int bits){
int shift_bits = 32 - bits;return (val << shift_bits) >> shift_bits;
}
Gabriel Laskar (EPITA) CAAL 2015 362 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnn
sb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn
x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn
x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits-1);return (val ^ high) - high;
}
Gabriel Laskar (EPITA) CAAL 2015 363 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000
xor 0000000000000000000Snnnnnnnnnnnn
x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn
x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits-1);return (val ^ high) - high;
}
Gabriel Laskar (EPITA) CAAL 2015 363 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn
x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn
x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits-1);return (val ^ high) - high;
}
Gabriel Laskar (EPITA) CAAL 2015 363 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn
x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn
x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits-1);return (val ^ high) - high;
}
Gabriel Laskar (EPITA) CAAL 2015 363 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn
x 00000000000000000001nnnnnnnnnnnn
sb 00000000000000000001000000000000x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn
x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits-1);return (val ^ high) - high;
}
Gabriel Laskar (EPITA) CAAL 2015 363 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn
x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn
x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits-1);return (val ^ high) - high;
}
Gabriel Laskar (EPITA) CAAL 2015 363 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn
x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn
x 00000000000000000000nnnnnnnnnnnn
sb 00000000000000000001000000000000x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits-1);return (val ^ high) - high;
}
Gabriel Laskar (EPITA) CAAL 2015 363 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn
x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn
x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits-1);return (val ^ high) - high;
}
Gabriel Laskar (EPITA) CAAL 2015 363 / 378
CPU-aware optimizations Examples Sign extension
Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn
x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn
x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000
x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn
int sign_ext(int val, int bits){
int high = 1 << (bits-1);return (val ^ high) - high;
} Gabriel Laskar (EPITA) CAAL 2015 363 / 378
CPU-aware optimizations Examples Power of two detection
10: CPU-aware optimizations
Rationale
Examplesabs()Sign extensionPower of two detectionMask merging
Code readability guidelines
Gabriel Laskar (EPITA) CAAL 2015 364 / 378
CPU-aware optimizations Examples Power of two detection
Power of twoProperties
I Powers of two have only one “1” bitI Substracting 1 from a power of two flips the only “1”
Examples:I 1110010 - 1 = 1110001I 0100000 - 1 = 0011111
I Lower bits up to the lowest 1 get flippedI Other bits stay the same
int is_pow2(int n){
return !(n & (n - 1));}
Gabriel Laskar (EPITA) CAAL 2015 365 / 378
CPU-aware optimizations Examples Power of two detection
Power of twoProperties
I Powers of two have only one “1” bitI Substracting 1 from a power of two flips the only “1”
Examples:I 1110010 - 1 = 1110001I 0100000 - 1 = 0011111
I Lower bits up to the lowest 1 get flippedI Other bits stay the same
int is_pow2(int n){
return !(n & (n - 1));}
Gabriel Laskar (EPITA) CAAL 2015 365 / 378
CPU-aware optimizations Examples Power of two detection
Power of twoProperties
I Powers of two have only one “1” bitI Substracting 1 from a power of two flips the only “1”
Examples:I 1110010 - 1 = 1110001I 0100000 - 1 = 0011111
I Lower bits up to the lowest 1 get flippedI Other bits stay the same
int is_pow2(int n){
return !(n & (n - 1));}
Gabriel Laskar (EPITA) CAAL 2015 365 / 378
CPU-aware optimizations Examples Mask merging
10: CPU-aware optimizations
Rationale
Examplesabs()Sign extensionPower of two detectionMask merging
Code readability guidelines
Gabriel Laskar (EPITA) CAAL 2015 366 / 378
CPU-aware optimizations Examples Mask merging
Mask merging
01234567
where_0 1 1 1 0 1 0 0 0where_1 1 0 0 0 1 1 1 0
mask 0 0 1 1 1 1 0 0
result 1 1 0 0 1 1 0 0
int mask_merge(int mask, int where_0, int where_1){
return (mask & where_1) | (~mask & where_0);}
Gabriel Laskar (EPITA) CAAL 2015 367 / 378
CPU-aware optimizations Examples Mask merging
Mask merging
01234567
where_0 1 1 1 0 1 0 0 0where_1 1 0 0 0 1 1 1 0
mask 0 0 1 1 1 1 0 0
result 1 1 0 0 1 1 0 0
int mask_merge(int mask, int where_0, int where_1){
return (mask & where_1) | (~mask & where_0);}
Gabriel Laskar (EPITA) CAAL 2015 367 / 378
CPU-aware optimizations Examples Mask merging
Mask mergingLess instructions
01234567
where_0 1 1 1 0 1 0 0 0where_1 1 0 0 0 1 1 1 0
where_0 ^ where_1 0 1 1 0 0 1 1 0
mask 0 0 1 1 1 1 0 0
(w0 ^ w1) &mask 0 0 1 0 0 1 0 0
((w0 ^ w1) &mask) ^ w0 1 1 0 0 1 1 0 0
int mask_merge(int mask, int where_0, int where_1){
return ((where_0 ^ where_1) & mask) ^ where_0;}
Gabriel Laskar (EPITA) CAAL 2015 368 / 378
CPU-aware optimizations Examples Mask merging
Mask mergingLess instructions
01234567
where_0 1 1 1 0 1 0 0 0where_1 1 0 0 0 1 1 1 0
where_0 ^ where_1 0 1 1 0 0 1 1 0
mask 0 0 1 1 1 1 0 0
(w0 ^ w1) &mask 0 0 1 0 0 1 0 0
((w0 ^ w1) &mask) ^ w0 1 1 0 0 1 1 0 0
int mask_merge(int mask, int where_0, int where_1){
return ((where_0 ^ where_1) & mask) ^ where_0;}
Gabriel Laskar (EPITA) CAAL 2015 368 / 378
CPU-aware optimizations Examples Mask merging
Mask mergingLess instructions
01234567
where_0 1 1 1 0 1 0 0 0where_1 1 0 0 0 1 1 1 0
where_0 ^ where_1 0 1 1 0 0 1 1 0
mask 0 0 1 1 1 1 0 0
(w0 ^ w1) &mask 0 0 1 0 0 1 0 0
((w0 ^ w1) &mask) ^ w0 1 1 0 0 1 1 0 0
int mask_merge(int mask, int where_0, int where_1){
return ((where_0 ^ where_1) & mask) ^ where_0;}
Gabriel Laskar (EPITA) CAAL 2015 368 / 378
CPU-aware optimizations Examples Mask merging
Mask mergingLess instructions
01234567
where_0 1 1 1 0 1 0 0 0where_1 1 0 0 0 1 1 1 0
where_0 ^ where_1 0 1 1 0 0 1 1 0
mask 0 0 1 1 1 1 0 0
(w0 ^ w1) &mask 0 0 1 0 0 1 0 0
((w0 ^ w1) &mask) ^ w0 1 1 0 0 1 1 0 0
int mask_merge(int mask, int where_0, int where_1){
return ((where_0 ^ where_1) & mask) ^ where_0;}
Gabriel Laskar (EPITA) CAAL 2015 368 / 378
CPU-aware optimizations Code readability guidelines
10: CPU-aware optimizations
Rationale
Examples
Code readability guidelines
Gabriel Laskar (EPITA) CAAL 2015 369 / 378
CPU-aware optimizations Code readability guidelines
Code readability
If you ever code with this kind of hackI always create functions with explicit name and prototypeI eventually document the intended behaviorI may use a static inline
int abs(int x);int sign_ext(int val, int bits);int is_pow2(int n);int mask_merge(int mask, int where_0, int where_1);
Gabriel Laskar (EPITA) CAAL 2015 370 / 378
Multi-/Many-core, heterogeneoussystems
Part XI
Multi-/Many-core, heterogeneous systems
Gabriel Laskar (EPITA) CAAL 2015 371 / 378
Multi-/Many-core, heterogeneoussystems Topology evolution
11: Multi-/Many-core, heterogeneous systems
Topology evolution
Challenges
Gabriel Laskar (EPITA) CAAL 2015 372 / 378
Multi-/Many-core, heterogeneoussystems Topology evolution
History of hardware topologies
Video
Floppy
Serial
Keyboard
Sound
Network Hard Disk
RAM
CPU
Gabriel Laskar (EPITA) CAAL 2015 373 / 378
Multi-/Many-core, heterogeneoussystems Topology evolution
History of hardware topologies
Floppy
Serial
BridgeBridge
VideoNetwork
Hard DiskSound
RAM
Keyboard
CPU
Gabriel Laskar (EPITA) CAAL 2015 373 / 378
Multi-/Many-core, heterogeneoussystems Topology evolution
History of hardware topologies
Floppy
Serial
BridgeBridge
VideoNetwork
Hard DiskSound
CPU CPU
RAM
Keyboard
Gabriel Laskar (EPITA) CAAL 2015 373 / 378
Multi-/Many-core, heterogeneoussystems Topology evolution
History of hardware topologies
CPU CPU
Floppy
Serial
BridgeRAM
BridgeBridge
Network
Hard DiskSound
Video
USB
Keyboard
Gabriel Laskar (EPITA) CAAL 2015 373 / 378
Multi-/Many-core, heterogeneoussystems Topology evolution
History of hardware topologies
CPU CPU
BridgeRAM Video
Bridge
ATA
Net
USB
Sound
Floppy
Kbd
Mouse
PCI
USB
Serial
Gabriel Laskar (EPITA) CAAL 2015 373 / 378
Multi-/Many-core, heterogeneoussystems Topology evolution
History of hardware topologies
CPU CPU
BridgeRAM Video
Bridge
ATA
Net
USB
Sound
Floppy
Kbd
Mouse
PCI
USB
SATA
Wlan
SerialSD
Gabriel Laskar (EPITA) CAAL 2015 373 / 378
Multi-/Many-core, heterogeneoussystems Topology evolution
History of hardware topologies
BridgeRAM
CPU
Many core
CPU
Many core
Bridge
ATA
Net
USB
Sound
Floppy
Kbd
Mouse
PCI
USB
SATA
Wlan
SerialSD
Video
PCI−e
Gabriel Laskar (EPITA) CAAL 2015 373 / 378
Multi-/Many-core, heterogeneoussystems Topology evolution
History of hardware topologies
Bridge
CPU
Many core
CPU
Many coreRAM RAM
Video
Bridge
Net
USB
Sound
PCI
USB
SATA
Wlan
SD
PCI−e
PCI−e
ATA Floppy
Kbd
Mouse
Serial
Gabriel Laskar (EPITA) CAAL 2015 373 / 378
Multi-/Many-core, heterogeneoussystems Topology evolution
Future of hardware topologies?
USB
Bridge
Net
USB
Sound
SATA
Wlan
SD
ATA Floppy
Kbd
Mouse
Serial
PCI
Video
CPU
Many core
CPU
Many core
CPU
Many core
CPU
Many core
CPU
Many core
CPU
Many core
RAMRAM
RAMRAM
RAMRAM
GPGPUGPGPU
GPGPUGPGPU
GPGPUGPGPU
Gabriel Laskar (EPITA) CAAL 2015 374 / 378
Multi-/Many-core, heterogeneoussystems Topology evolution
Hardware topologiesObservations
Busses don’t scaleI Bandwidth is shared among connected peersI They need rest cycles between elections
We don’t have busses any moreI We use networks (QPI, HyperTransport)I This forbids snoopingI Coherence has to be done explicitely
Gabriel Laskar (EPITA) CAAL 2015 375 / 378
Multi-/Many-core, heterogeneoussystems Topology evolution
NUMAObservations
I There are more and more coresI Memory gets closer to the coresI Memory connections get distributed among coresI Systems become NUMA
Gabriel Laskar (EPITA) CAAL 2015 376 / 378
Multi-/Many-core, heterogeneoussystems Challenges
11: Multi-/Many-core, heterogeneous systems
Topology evolution
Challenges
Gabriel Laskar (EPITA) CAAL 2015 377 / 378
Multi-/Many-core, heterogeneoussystems Challenges
Some scalability bottlenecks
Coherent shared-memory systemsI are hard to designI dont scale wellI get slower with the load
NUMA systemsI are hard to program forI are not well supported in all OSes
Uncoherent shared-memory systems are not ready for prime-time yet
Gabriel Laskar (EPITA) CAAL 2015 378 / 378