ARM assembly language and machine code · cs107e.github.io/lectures/asm/slides.pdf · 2020-02-26 ·...
TRANSCRIPT
ARM
Assembly Language and
Machine Code
Goal: Blink an LED
// SET0 = 0x2020001c
// SET1 = 0x20200020
mov r0, #0x20       // 0x00000020
lsl r1, r0, #24     // 0x20000000
lsl r2, r0, #16     // 0x00200000
orr r0, r1, r2      // 0x20200000
orr r0, r0, #0x1c   // 0x2020001c
mov r1, #1          // 0x00000001
lsl r1, #20         // 0x00100000
str r1, [r0]        // store 1<<20 to 0x2020001c
// loop forever
loop: b loop
// configure GPIO 20 for output
//
// FSEL0 = 0x20200000
// FSEL1 = 0x20200004
// FSEL2 = 0x20200008
// ...
mov r0, #0x20       // 0x00000020
lsl r1, r0, #24     // 0x20000000
lsl r2, r0, #16     // 0x00200000
orr r0, r1, r2      // 0x20200000
orr r0, r0, #0x08   // 0x20200008
mov r1, #1          // 1 indicates OUTPUT
str r1, [r0]        // store 1 to 0x20200008
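Watch out for… The manual says 0x7E200000. Replace 7E with 20: 0x20200000. (The manual lists bus addresses; the ARM core sees the same peripherals at physical addresses starting at 0x20000000.)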
Ref: BCM2835-ARM-Peripherals.pdf
GPIO Function Select Register Addresses
Specifying 1 of 8 functions requires 3 bits
Bit pattern   Pin function
000           The pin is an input
001           The pin is an output
100           The pin does alternate function 0
101           The pin does alternate function 1
110           The pin does alternate function 2
111           The pin does alternate function 3
011           The pin does alternate function 4
010           The pin does alternate function 5
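Because ALT4 and ALT5 wrap around below ALT0, these codes are easy to misremember. A minimal C sketch that records the table as an enum; the names `gpio_function`, `GPIO_FUNC_*`, and `gpio_alt_function` are hypothetical, not part of any course library:

```c
#include <assert.h>

/* 3-bit function select codes, copied from the table above.
   ALT4 and ALT5 do not follow ALT0-ALT3 numerically. */
enum gpio_function {
    GPIO_FUNC_INPUT  = 0, /* 000 */
    GPIO_FUNC_OUTPUT = 1, /* 001 */
    GPIO_FUNC_ALT0   = 4, /* 100 */
    GPIO_FUNC_ALT1   = 5, /* 101 */
    GPIO_FUNC_ALT2   = 6, /* 110 */
    GPIO_FUNC_ALT3   = 7, /* 111 */
    GPIO_FUNC_ALT4   = 3, /* 011 */
    GPIO_FUNC_ALT5   = 2, /* 010 */
};

/* Map an alternate-function number n (0-5) to its 3-bit code. */
enum gpio_function gpio_alt_function(unsigned n) {
    static const enum gpio_function alt[6] = {
        GPIO_FUNC_ALT0, GPIO_FUNC_ALT1, GPIO_FUNC_ALT2,
        GPIO_FUNC_ALT3, GPIO_FUNC_ALT4, GPIO_FUNC_ALT5,
    };
    assert(n < 6);
    return alt[n];
}
```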
GPIO Pins can be configured to be INPUT, OUTPUT, or ALT0-ALT5
[Figure: one function select register; bits 0-29 hold the 3-bit fields for GPIO 0 through GPIO 9, bits 30-31 are unused]
GPIO Function Select Register
3 bits per GPIO pin
10 pins per 32-bit register (2 wasted bits)
Function is INPUT, OUTPUT, or ALT0-ALT5
54 GPIO pins require 6 registers
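The arithmetic behind "10 pins per register, 3 bits per pin" can be sketched in C. This assumes the function select registers sit 4 bytes apart starting at FSEL0 = 0x20200000, as listed earlier; `fsel_locate` and `struct fsel_field` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Where a pin's 3-bit field lives in the function select registers. */
struct fsel_field {
    unsigned reg;   /* 0 for FSEL0, 1 for FSEL1, ..., 5 for FSEL5 */
    unsigned shift; /* bit position of the field's lowest bit */
    uint32_t addr;  /* physical address of that register */
};

struct fsel_field fsel_locate(unsigned pin) {
    assert(pin < 54);
    struct fsel_field f;
    f.reg = pin / 10;                   /* 10 pins per register */
    f.shift = 3 * (pin % 10);           /* 3 bits per pin */
    f.addr = 0x20200000u + 4u * f.reg;  /* FSEL0, then every 4 bytes */
    return f;
}
```

For GPIO 20 this gives FSEL2 (0x20200008) with the field at bits 0-2, matching the address used in the blink code.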
[Figure: SET0 has one bit for each of GPIO 0-31; SET1 has one bit for each of GPIO 32-53]
0x2020001C: GPIO SET0 register
0x20200020: GPIO SET1 register
GPIO Pin Output SET Register
Notes
1. 1 bit per GPIO pin
2. 54 pins require 2 registers
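The same division applies to the SET registers, with 32 pins per register instead of 10. A sketch, assuming SET0 = 0x2020001C and SET1 = 0x20200020 as given above; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Address of the SET register that controls a pin:
   SET0 covers GPIO 0-31, SET1 covers GPIO 32-53. */
uint32_t set_register_addr(unsigned pin) {
    assert(pin < 54);
    return 0x2020001Cu + 4u * (pin / 32);
}

/* The single bit to write within that register. */
uint32_t set_register_mask(unsigned pin) {
    assert(pin < 54);
    return 1u << (pin % 32);
}
```

For GPIO 20 this yields SET0 and the mask 0x00100000, i.e. 1<<20.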
Memory Map (4 GB)
[Figure: the address space runs from 0x00000000 to 0x100000000 (4 GB); peripherals start at 0x20000000]
Ref: BCM2835-ARM-Peripherals.pdf
Peripheral registers are mapped into address space
Memory-Mapped IO (MMIO)
MMIO space is above physical memory (512 MB, which ends at 0x20000000)
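In C, memory-mapped registers are usually accessed through a `volatile` pointer so the compiler does not optimize away or merge the stores. A sketch under stated assumptions: the GPIO block starts at 0x20200000 (so SET0 at offset 0x1C is word 7), and the base is passed in as a parameter so the logic can be exercised against an ordinary array; `gpio_write_high` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Word offset of SET0 from the GPIO base: 0x1C / 4 = 7. */
enum { GPIO_SET0 = 7 };

/* On the Pi a caller would pass (volatile uint32_t *)0x20200000. */
void gpio_write_high(volatile uint32_t *gpio_base, unsigned pin) {
    assert(pin < 54);
    /* Writing 1 sets the output; 0 bits are ignored by the hardware,
       so no read-modify-write is needed. */
    gpio_base[GPIO_SET0 + pin / 32] = 1u << (pin % 32);
}
```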
3 Types of Instructions
1. Data processing instructions
2. Loads from and stores to memory
3. Conditional branches to new program locations
Data Processing Instructions and
Machine Code
From armisa.pdf
# data processing instruction
# ra = rb op rc
#           op     rb   ra   rc
  1110 00 i oooo s bbbb aaaa cccccccccccc
1110: always execute the instruction
00: data processing instruction
i: immediate mode instruction
s: set condition codes
Assembly code   Opcode   Operation
AND             0000     ra = rb & rc
EOR (XOR)       0001     ra = rb ^ rc
SUB             0010     ra = rb - rc
RSB             0011     ra = rc - rb
ADD             0100     ra = rb + rc
ADC             0101     ra = rb + rc + CARRY
SBC             0110     ra = rb - rc - (1 - CARRY)
RSC             0111     ra = rc - rb - (1 - CARRY)
TST             1000     rb & rc (ra not set)
TEQ             1001     rb ^ rc (ra not set)
CMP             1010     rb - rc (ra not set)
CMN             1011     rb + rc (ra not set)
ORR (OR)        1100     ra = rb | rc
MOV             1101     ra = rc
BIC             1110     ra = rb & ~rc
MVN             1111     ra = ~rc
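The opcode table can be read as a small ALU written in C. This sketch takes the opcode, the two operands, and the current carry flag, and returns the computed value even for TST, TEQ, CMP, and CMN, where the hardware only updates the condition codes. The function name `alu` is hypothetical, and shifted operands are ignored:

```c
#include <assert.h>
#include <stdint.h>

/* Value computed for each 4-bit opcode; carry is the C flag (0 or 1).
   Unsigned arithmetic wraps modulo 2^32, like the 32-bit ALU. */
uint32_t alu(unsigned op, uint32_t rb, uint32_t rc, unsigned carry) {
    assert(op < 16 && carry <= 1);
    switch (op) {
    case 0x0: return rb & rc;               /* AND */
    case 0x1: return rb ^ rc;               /* EOR */
    case 0x2: return rb - rc;               /* SUB */
    case 0x3: return rc - rb;               /* RSB */
    case 0x4: return rb + rc;               /* ADD */
    case 0x5: return rb + rc + carry;       /* ADC */
    case 0x6: return rb - rc - (1 - carry); /* SBC: borrow when C clear */
    case 0x7: return rc - rb - (1 - carry); /* RSC */
    case 0x8: return rb & rc;               /* TST (flags only) */
    case 0x9: return rb ^ rc;               /* TEQ (flags only) */
    case 0xA: return rb - rc;               /* CMP (flags only) */
    case 0xB: return rb + rc;               /* CMN (flags only) */
    case 0xC: return rb | rc;               /* ORR */
    case 0xD: return rc;                    /* MOV */
    case 0xE: return rb & ~rc;              /* BIC */
    default:  return ~rc;                   /* MVN (0xF) */
    }
}
```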
# i=0, s=0: add r0, r1, r2
#           op     rb   ra   rc
#           add    r1   r0   r2
  1110 00 0 0100 0 0001 0000 000000000010
= 1110 0000 1000 0001 0000 0000 0000 0010
= 0xE0810002
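The bit assembly above is mechanical enough to write as shifts and ORs. A sketch of the register form only (i=0, no shift applied to rc, cond fixed to 1110); `encode_dp_reg` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Assemble "ra = rb op rc" using the layout
   1110 00 i oooo s bbbb aaaa cccccccccccc with i = 0. */
uint32_t encode_dp_reg(unsigned op, unsigned s,
                       unsigned ra, unsigned rb, unsigned rc) {
    assert(op < 16 && s <= 1 && ra < 16 && rb < 16 && rc < 16);
    return (0xEu << 28)  /* cond 1110: always execute */
         | (0u << 25)    /* i = 0: second operand is a register */
         | (op << 21)    /* opcode from the table */
         | (s << 20)     /* 1 = set condition codes */
         | (rb << 16)    /* first operand register */
         | (ra << 12)    /* destination register */
         | rc;           /* second operand register, unshifted */
}
```

The arguments follow the field names, with ra (the destination) before rb and rc, so `add r0, r1, r2` is `encode_dp_reg(0x4, 0, 0, 1, 2)`.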
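little-endian (LSB first): the word 0xE0810002 (bytes E0 81 00 02, most-significant first) is stored as
ADDR: 02   ADDR+1: 00   ADDR+2: 81   ADDR+3: E0
The least-significant byte (LSB) is at the lowest address; the most-significant byte (MSB) is at the highest.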
ARM uses little-endian
big-endian (MSB first): the word 0xE0810002 is stored as
ADDR: E0   ADDR+1: 81   ADDR+2: 00   ADDR+3: 02
The most-significant byte (MSB) is at the lowest address; the least-significant byte (LSB) is at the highest.
Read: Holy Wars and a Plea For Peace, D. Cohen
The 'little-endian' and 'big-endian' terminology which is used to denote the two approaches [to addressing memory] is derived from Swift's Gulliver's Travels. The inhabitants of Lilliput, who are well known for being rather small, are, in addition, constrained by law to break their eggs only at the little end. When this law is imposed, those of their fellow citizens who prefer to break their eggs at the big end take exception to the new rule and civil war breaks out. The big-endians eventually take refuge on a nearby island, which is the kingdom of Blefuscu. The civil war results in many casualties.
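[Figure: datapath with registers, ALU, and memory (ADDR, DATA, INST); executing the instruction below, the ALU adds r1 and the immediate 1 and writes the result to r0]
add r0, r1, #1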
Immediate value (#1) stored in INST
# data processing instruction, immediate operand
# ra = rb op #imm
# imm = uuuuuuuu
#           op     rb   ra        imm
#           add    r1   r0        imm
  1110 00 1 0100 0 0001 0000 0000 uuuuuuuu
add r0, r1, #1
# i=1, s=0
# i as in "immediately available",
# i.e. no need to fetch it from memory
add r0, r1, #1
#           add    r1   r0        #1
  1110 00 1 0100 0 0001 0000 0000 00000001
= 1110 0010 1000 0001 0000 0000 0000 0001
= 0xE2810001
[Figure: datapath as before, with a shifter on the path from the immediate field of INST to the ALU]
Rotate Right (ROR): the rotation amount is 2x the 4-bit rotate field
# data processing instruction
# ra = rb op imm
# imm = (uuuuuuuu) ROR (2*rrrr)
#           op     rb   ra   ror  imm
  1110 00 1 oooo 0 bbbb aaaa rrrr uuuuuuuu
ROR means rotate right (imm >>> rotate)
add r0, r1, #0x10000
#           add    r1   r0   0x01 >>> 2*8
  1110 00 1 0100 0 0001 0000 1000 00000001
0x01 >>> 16:
00000000000000000000000000000001   (0x00000001)
00000000000000010000000000000000   (0x00010000)
= 1110 0010 1000 0001 0000 1000 0000 0001
= 0xE2810801
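C has no rotate operator, so a rotate right is usually written as two shifts. This sketch decodes the 12-bit rrrr uuuuuuuu field back into the constant it represents; `ror32` and `decode_rotated_imm` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits. n == 0 is handled separately because
   shifting a 32-bit value by 32 is undefined in C. */
uint32_t ror32(uint32_t x, unsigned n) {
    n &= 31;
    return n == 0 ? x : (x >> n) | (x << (32 - n));
}

/* imm = uuuuuuuu ROR (2 * rrrr), from the field rrrr uuuuuuuu. */
uint32_t decode_rotated_imm(uint32_t field12) {
    assert(field12 < 0x1000);
    unsigned rot = (field12 >> 8) & 0xF; /* rrrr */
    uint32_t imm8 = field12 & 0xFF;      /* uuuuuuuu */
    return ror32(imm8, 2 * rot);
}
```

The field 0x801 (rrrr = 1000, uuuuuuuu = 00000001) decodes to 0x10000, as in the example above.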
# Determine the machine code for
sub r7, r5, #0x300
# imm = (uuuuuuuu) ROR (2*rrrr)
# Remember that ra is the result
#           op     rb   ra   ror  imm
  1110 00 i oooo s bbbb aaaa rrrr uuuuuuuu
// What is the machine code?
hint:
SUB   0010   ra = rb - rc
sub r7, r5, #0x300
#           sub    r5   r7   0x03 >>> 24
  1110 00 1 0010 0 0101 0111 1100 00000011
= 1110 0010 0100 0101 0111 1100 0000 0011
= 0xE2457C03
…
// SET0 = 0x2020001c
mov r0, #0x20000000   // 0x20 >>> 8
orr r0, #0x00200000   // 0x20 >>> 16
orr r0, #0x0000001c   // 0x1c >>> 0
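Load from Memory to Register (LDR)
ldr r0, [r1, #4]
ADDR = r1 + 4
DATA = Memory[ADDR], which is written to r0
[Figure: datapath; the ALU adds r1 and the immediate 4 from INST to form ADDR, and the DATA read from memory goes to r0]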
// configure GPIO 20 for output
ldr r0, FSEL2
mov r1, #1
str r1, [r0]

// set bit 20
ldr r0, SET0
mov r1, #0x00100000
str r1, [r0]

loop: b loop

FSEL0: .word 0x20200000
FSEL1: .word 0x20200004
FSEL2: .word 0x20200008
SET0:  .word 0x2020001C
SET1:  .word 0x20200020
CLR0:  .word 0x20200028
CLR1:  .word 0x2020002C
3 steps to run an instruction: Fetch, Decode, Execute

3 instructions take 9 steps:
Fetch   Decode  Execute
                        Fetch   Decode  Execute
                                                Fetch   Decode  Execute

To speed things up, steps are overlapped ("pipelined"):
Fetch   Decode  Execute
        Fetch   Decode  Execute
                Fetch   Decode  Execute
The pc value read by the executing instruction equals the address of the instruction being fetched, which is 2 instructions ahead: PC+8 (8 bytes past the executing instruction).
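The PC+8 rule shows up whenever an instruction reads pc, for example when `ldr r0, FSEL2` is assembled as a pc-relative load from the `.word` table. A sketch of the offset calculation, assuming 4-byte, word-aligned ARM instructions; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* With the 3-stage pipeline, an instruction at addr reads pc as addr + 8. */
uint32_t pc_seen_by(uint32_t addr) {
    assert((addr & 3) == 0); /* ARM instructions are word aligned */
    return addr + 8;
}

/* Byte offset for "ldr rd, [pc, #offset]" so that the instruction
   at addr loads the word stored at label_addr. */
int64_t literal_offset(uint32_t addr, uint32_t label_addr) {
    return (int64_t)label_addr - (int64_t)pc_seen_by(addr);
}
```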
Blink
mov r1, #(1<<20)

// Turn on LED connected to GPIO 20
ldr r0, SET0
str r1, [r0]

// Turn off LED connected to GPIO 20
ldr r0, CLR0
str r1, [r0]
[Figure: register bit positions 0-31 and 32-53, one bit per GPIO pin]
// Configure GPIO 20 for OUTPUT

loop:
    // Turn on LED
    // Turn off LED
    b loop
Loops and Condition Codes
// define constant
.equ DELAY, 0x3f0000

    mov r2, #DELAY
loop:
    subs r2, r2, #1   // s: set condition codes
    bne loop          // branch if r2 != 0
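The loop's behavior can be checked with a C model that mimics `subs` setting Z and `bne` testing it. This is a host-side sketch for counting iterations, not a delay routine for the Pi (an optimizing compiler is free to replace this loop with a direct calculation); `delay_loop_iterations` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Model of:
       mov  r2, #DELAY
   loop:
       subs r2, r2, #1
       bne  loop
   Returns how many times the loop body runs. */
uint32_t delay_loop_iterations(uint32_t delay) {
    assert(delay != 0); /* 0 would wrap to 0xFFFFFFFF and run 2^32 times */
    uint32_t r2 = delay, count = 0;
    int z;
    do {
        r2 = r2 - 1;   /* subs r2, r2, #1 */
        z = (r2 == 0); /* Z flag: result is 0 */
        count++;
    } while (!z);      /* bne loop: branch while Z is clear */
    return count;
}
```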
Manipulating Bit Fields
// Set GPIO 20 to OUTPUT
mov r1, #1
str r1, [r0]

// Set GPIO 21 to OUTPUT
mov r1, #(1<<3)
str r1, [r0]

// What value is in FSEL2 now?
// What mode is GPIO 20 set to now?
[Figure: FSEL2 layout; bits 0-29 hold the 3-bit fields for GPIO 20 through GPIO 29]
// Set GPIO 20 to OUTPUT
mov r1, #1
str r1, [r0]
…
// Preserve GPIO 20, set GPIO 21 to OUTPUT
ldr r1, [r0]
and r1, #~(0x7<<3)
orr r1, #(0x1<<3)
str r1, [r0]

// What value is in FSEL2 now?
// ldr FSEL2: GPIO 20 is OUTPUT
ldr r1, [r0]           00000000000000000000000000000001
// 0x7                 00000000000000000000000000000111
// 0x7<<3              00000000000000000000000000111000
// ~(0x7<<3)           11111111111111111111111111000111
and r1, #~(0x7<<3)     00000000000000000000000000000001
orr r1, #(0x1<<3)      00000000000000000000000000001001
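The and/orr sequence is a general read-modify-write on a 3-bit field. A C sketch that applies it to a register value (rather than to the hardware register itself); `fsel_update` is a hypothetical name, and the function codes follow the table earlier:

```c
#include <assert.h>
#include <stdint.h>

/* Replace one pin's 3-bit field in a function select register value,
   leaving the other nine fields untouched. */
uint32_t fsel_update(uint32_t reg, unsigned pin, unsigned function) {
    assert(pin < 54 && function < 8);
    unsigned shift = 3 * (pin % 10);
    reg &= ~(0x7u << shift);             /* and: clear the old field */
    reg |= (uint32_t)function << shift;  /* orr: insert the new code */
    return reg;
}
```

Starting from FSEL2 = 1 (GPIO 20 is OUTPUT), setting GPIO 21 to OUTPUT yields 0b1001 = 9, matching the bit-by-bit trace above.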
Orthogonal Instructions
Any operation
Register vs. immediate operands
All registers the same**
Predicated/conditional execution
Set or not set condition code
Orthogonality leads to composability
Summary
You need to understand how processors represent and execute instructions.
The instruction set architecture is often easier to understand by looking at the bits. Encoding instructions in 32 bits requires trade-offs and careful design.
Only write assembly when it is needed. Reading assembly is more important than writing assembly: it allows you to see what the compiler and processor are actually doing.
Normally write code in C (Julie, starting Fri)
The Fun Begins …
Lab 1
■ Install tool chain before lab
■ Read lab1 instructions (now online)
■ Assemble Raspberry Pi Kit
■ Bring USB-C to USB-A adapter (if you need it)
Assignment 1
■ Larson scanner
■ YEAH office hours Thu 3-4pm in B21
Definitive References
BCM2835 peripherals document + errata
Raspberry Pi schematic
ARMv6 architecture reference manual
see Resources on cs107e.github.io
Extra Material on Branches
Branch Instructions
Condition Codes
Z - the result is 0
N - the result is negative
C - a carry was generated
V - an arithmetic overflow occurred
Carry and overflow will be covered later
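For a subtraction (as in `subs` or `cmp`), the four flags can be computed in C as below. This sketch uses the standard ARM rules for subtraction: C is set when no borrow occurs, and V is set when the operands have different signs and the result's sign differs from rb's. The struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

struct flags { unsigned n, z, c, v; };

/* Flags after "subs ra, rb, rc" or "cmp rb, rc" (both compute rb - rc). */
struct flags flags_after_sub(uint32_t rb, uint32_t rc) {
    uint32_t r = rb - rc;
    struct flags f;
    f.n = r >> 31;                      /* N: bit 31 of the result */
    f.z = (r == 0);                     /* Z: result is 0 */
    f.c = (rb >= rc);                   /* C: no borrow (unsigned) */
    f.v = ((rb ^ rc) & (rb ^ r)) >> 31; /* V: signed overflow */
    return f;
}
```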
# branch
#    cond       addr
     cccc 101 L oooooooooooooooooooooooo
# L = 1 also saves the return address in lr (bl); addr is a signed word offset from PC+8

b = bal = branch always
     1110 101 L oooooooooooooooooooooooo

bne
     0001 101 L oooooooooooooooooooooooo
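The 24-bit field holds a signed word offset measured from PC+8, so a branch can be assembled as below. A sketch assuming word-aligned addresses; `encode_branch` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Assemble cccc 101 L oooooooooooooooooooooooo for a branch at addr
   that jumps to target. */
uint32_t encode_branch(unsigned cond, unsigned link,
                       uint32_t addr, uint32_t target) {
    assert(cond < 16 && link <= 1);
    assert((addr & 3) == 0 && (target & 3) == 0);
    /* Offset in words, relative to pc = addr + 8. */
    int64_t words = ((int64_t)target - ((int64_t)addr + 8)) / 4;
    assert(words >= -(1 << 23) && words < (1 << 23)); /* fits in 24 bits */
    return (cond << 28) | (0x5u << 25) | (link << 24)
         | ((uint32_t)words & 0xFFFFFFu);
}
```

`loop: b loop` branches to itself, giving an offset of -2 words and the encoding 0xEAFFFFFE.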