arm assembly tutorial
TRANSCRIPT
-
7/29/2019 ARM ASSEMBLY TUTORIAL
1/85
EE382N-4 Embedded Systems Architecture
TheARMInstructionSetArchitecture
MarkMcDermott
WithhelpfromourgoodfriendsatARM
Fall2008
8/22/2008
-
7/29/2019 ARM ASSEMBLY TUTORIAL
2/85
EE382N-4 Embedded Systems Architecture
MainfeaturesoftheARMInstructionSet
Allinstructions
are
32
bits
long.
Mostinstructionsexecuteinasinglecycle.
Mostinstructionscanbeconditionallyexecuted.
Aload/storearchitectureDataprocessinginstructionsactonlyonregisters
Threeoperandformat
Combined
ALU
and
shifter
for
high
speed
bit
manipulationSpecificmemoryaccessinstructionswithpowerfulautoindexingaddressingmodes. 32bitand8bitdatatypes
andalso16bitdatatypesonARMArchitecturev4.
Flexiblemultiple
register
load
and
store
instructions
Instructionsetextensionviacoprocessors
Verydense16bitcompressedinstructionset(Thumb)
28/22/2008
-
7/29/2019 ARM ASSEMBLY TUTORIAL
3/85
EE382N-4 Embedded Systems Architecture
Coprocessors
3
Upto16 coprocessorscanbedefined
ExpandstheARMinstructionset
Eachcoprocessorcanhaveupto16privateregistersofanyreasonablesize
Loadstore
architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
4/85
EE382N-4 Embedded Systems Architecture
Thumb
Thumbis
a16
bit
instruction
set
OptimizedforcodedensityfromCcodeImprovedperformanceformnarrowmemorySubsetofthefunctionalityoftheARMinstructionset
Corehastwoexecutionstates ARMandThumbSwitchbetweenthemusingBXinstruction
Thumbhascharacteristicfeatures:Most
Thumb
instruction
are
executed
unconditionally
ManyThumbdataprocessinstructionusea2addressformatThumbinstructionformatsarelessregularthanARMinstructionformats,as
aresultofthedenseencoding.
4
-
7/29/2019 ARM ASSEMBLY TUTORIAL
5/85
EE382N-4 Embedded Systems Architecture
ProcessorModes
TheARM
has
six
operating
modes:
User(unprivilegedmodeunderwhichmosttasksrun)
FIQ(enteredwhenahighpriority(fast)interruptisraised)IRQ(enteredwhenalowpriority(normal)interruptisraised)Supervisor(enteredonresetandwhenaSoftwareInterruptinstructionis
executed)
Abort(usedtohandlememoryaccessviolations)Undef(usedtohandleundefinedinstructions)
ARMArchitectureVersion4addsaseventhmode:System(privilegedmodeusingthesameregistersasusermode)
58/22/2008
-
7/29/2019 ARM ASSEMBLY TUTORIAL
6/85
EE382N-4 Embedded Systems Architecture
TheRegisters
ARMhas
37
registers
in
total,
all
of
which
are
32
bits
long.
1dedicatedprogramcounter1dedicatedcurrentprogramstatusregister5dedicatedsavedprogramstatusregisters30generalpurposeregisters
Howeverthesearearrangedintoseveralbanks,withthe
accessiblebankbeinggovernedbytheprocessormode.Each
modecan
access
aparticularsetofr0r12registersaparticularr13(thestackpointer)andr14(linkregister)r15(theprogramcounter)cpsr
(the
current
program
status
register)
Andprivilegedmodescanalsoaccessaparticularspsr(savedprogramstatusregister)
68/22/2008
-
7/29/2019 ARM ASSEMBLY TUTORIAL
7/85
-
7/29/2019 ARM ASSEMBLY TUTORIAL
8/85
EE382N-4 Embedded Systems Architecture
RegisterOrganizationSummary
88/22/2008
Usermode
r0-r7,r15,andcpsr
r8
r9
r10
r11
r12
r13 (sp)
r14 (lr)
spsr
FIQ
r8
r9
r10
r11
r12
r13 (sp)
r14 (lr)r15 (pc)
cpsr
r0
r1
r2
r3
r4
r5
r6
r7
User
r13 (sp)
r14 (lr)
spsr
IRQ
Usermoder0-r12,
r15,andcpsr
r13 (sp)
r14 (lr)
spsr
Undef
Usermoder0-r12,
r15,andcpsr
r13 (sp)
r14 (lr)
spsr
SVC
Usermoder0-r12,
r15,andcpsr
r13 (sp)
r14 (lr)
spsr
Abort
Usermoder0-r12,
r15,andcpsr
Thumb stateLow register
Thumb stateHigh register
Note: System mode uses the User mode register set
-
7/29/2019 ARM ASSEMBLY TUTORIAL
9/85
EE382N-4 Embedded Systems Architecture
AccessingRegistersusingARMInstructions
Nobreakdown
of
currently
accessible
registers.
Allinstructionscanaccessr0r14directly.MostinstructionsalsoallowuseofthePC.
SpecificinstructionstoallowaccesstoCPSRandSPSR.
Note:When
in
aprivileged
mode,
it
is
also
possible
to
load
store
the(bankedout)usermoderegisterstoorfrommemory.
98/22/2008
-
7/29/2019 ARM ASSEMBLY TUTORIAL
10/85
EE382N-4 Embedded Systems Architecture
TheProgramStatusRegisters (CPSRandSPSRs)
108/22/2008
CopiesoftheALUstatusflags(latchediftheinstructionhasthe"S"bitset).
N=NegativeresultfromALUflag.Z=ZeroresultfromALUflag.C=ALUoperationCarriedoutV=ALUoperationoVerflowed
* InterruptDisablebits.I =
1,
disables
the
IRQ.
F =1,disablestheFIQ.
* TBit (Architecturev4Tonly)
T=0,
Processor
in
ARM
stateT=1,ProcessorinThumbstate
*
Condition
Code
Flags
ModeN Z C V
2831 8 4 0
I F T
* ModeBitsM[4:0]definetheprocessormode.
-
7/29/2019 ARM ASSEMBLY TUTORIAL
11/85
EE382N-4 Embedded Systems Architecture
LogicalInstruction ArithmeticInstruction
Flag
Negative Nomeaning Bit31oftheresulthasbeenset(N=1) Indicatesanegativenumberin
signedoperations
Zero Resultisallzeroes Resultofoperationwaszero(Z=1)
Carry AfterShiftoperation Resultwasgreaterthan32bits
(C=1) 1
was
left
in
carry
flag
oVerflow Nomeaning Resultwasgreaterthan31bits(V=1) Indicatesapossiblecorruptionof
thesignbitinsignednumbers
ConditionFlags
118/22/2008
-
7/29/2019 ARM ASSEMBLY TUTORIAL
12/85
EE382N-4 Embedded Systems Architecture
TheProgramCounter(R15)
Whenthe
processor
is
executing
in
ARM
state:
Allinstructionsare32bitsinlengthAllinstructionsmustbewordalignedThereforethePCvalueisstoredinbits[31:2]withbits[1:0]equaltozero(as
instructioncannot
be
halfword
or
byte
aligned).
R14isusedasthesubroutinelinkregister(LR)andstoresthe
returnaddresswhenBranchwithLinkoperationsareperformed,
calculatedfrom
the
PC.
Thustoreturnfromalinkedbranch:MOVr15,r14
orMOVpc,lr
128/22/2008
-
7/29/2019 ARM ASSEMBLY TUTORIAL
13/85
EE382N-4 Embedded Systems Architecture
ExceptionHandlingandtheVectorTable
Whenan
exception
occurs,
the
core:
CopiesCPSRintoSPSR_SetsappropriateCPSRbits
IfcoreimplementsARMArchitecture4Tandis
currentlyin
Thumb
state,
then
ARMstateisentered.
Modefieldbits
Interruptdisableflagsifappropriate.
Mapsin
appropriate
banked
registers
StoresthereturnaddressinLR_SetsPCtovectoraddress
Toreturn,exceptionhandlerneedsto:Restore
CPSR
from
SPSR_
RestorePCfromLR_
138/22/2008
-
7/29/2019 ARM ASSEMBLY TUTORIAL
14/85
EE382N-4 Embedded Systems Architecture
TheOriginalInstructionPipeline
TheARM
uses
apipeline
in
order
to
increase
the
speed
of
the
flowofinstructionstotheprocessor.Allowsseveraloperationstobeundertakensimultaneously,ratherthan
serially.
Ratherthanpointingtotheinstructionbeingexecuted,thePC
pointsto
the
instruction
being
fetched.
148/22/2008
FETCH
DECODE
EXECUTE
Instruction fetched from memory
Decoding of registers used in instruction
Register(s) read from Register Bank
Shift and ALU operationWrite register(s) back to Register Bank
PC
PC - 4
PC - 8
-
7/29/2019 ARM ASSEMBLY TUTORIAL
15/85
EE382N-4 Embedded Systems Architecture
PipelinechangesforARM9TDMI
InstructionFetch Shift + ALU MemoryAccess RegWriteRegRead
RegDecode
FETCH DECODE EXECUTE MEMORY WRITE
ARM9TDMI
ARM or Thumb
Inst Decode
Reg Select
RegRead
Shift ALURegWrite
Thumb ARMdecompress
ARM decodeInstructionFetch
FETCH DECODE EXECUTE
ARM7TDMI
-
7/29/2019 ARM ASSEMBLY TUTORIAL
16/85
EE382N-4 Embedded Systems Architecture
Pipelinechangesfor ARM10vs.ARM11Pipelines
ARM11
Fetch1
Fetch2
Decode Issue
Shift ALU Saturate
Writeback
MAC1
MAC2
MAC3
AddressDataCache
1
DataCache
2
Shift + ALUMemoryAccess Reg
Write
FETCH DECODE EXECUTE MEMORY WRITE
Reg Read
Multiply
BranchPrediction
InstructionFetch
ISSUE
ARM orThumb
InstructionDecode Multiply
Add
ARM10
EE N E b dd d S A hi
-
7/29/2019 ARM ASSEMBLY TUTORIAL
17/85
EE382N-4 Embedded Systems Architecture
ARMInstructionSetFormat
178/22/2008
3
1
3
0
2
9
2
8
2
7
2
6
2
5
2
4
2
3
2
2
2
1
2
0
1
9
1
8
1
7
1
6
1
5
1
4
1
3
1
2
1
1
1
0 9 8 7 6 5 4 3 2 1 0InstructionType
Condition 0 0 I OPCODE S Rn Rs OPERAND2 Dataprocessing
Condition 0 0 0 0 0 0 A S Rd Rn Rs 1 0 0 1 Rm Multiply
Condition 0 0 0 0 1 U A S RdHIGH Rd LOW Rs 1 0 0 1 Rm Long
Multiply
Condition 0 0 0 1 0 B 0 0 Rn Rd 0 0 0 0 1 0 0 1 Rm Swap
Condition 0 1 I P U B W L Rn Rd OFFSET Load/Store Byte/Word
Condition 1 0 0 P U B W L Rn REGISTERLIST Load/Store Multiple
Condition 0 0 0 P U 1 W L Rn Rd OFFSET1 1 S H 1 OFFSET2 Halfword TransferImm Off
Condition 0 0 0 P U 0 W L Rn Rd 0 0 0 0 1 S H 1 Rm Halfword TransferReg Off
Condition 1 0 1 L BRANCH OFFSET Branch
Condition 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 Rn Branch Exchange
Condition 1 1 0 P U N W L Rn CRd CPNum OFFSET COPROCESSOR DATAXFER
Condition 1 1 1 0 Op1 CRn CRd CPNum OP2 0 CRm COPROCESSOR DATAOP
Condition OP1 L CRn Rd CPNum OP2 1 CRm COPROCESSOR REGXFER
Condition 1 1 1 1 SWI NUMBER SoftwareInterrupt
EE382N 4 E b dd d S A hi
-
7/29/2019 ARM ASSEMBLY TUTORIAL
18/85
EE382N-4 Embedded Systems Architecture
ConditionalExecution
Mostinstruction
sets
only
allow
branches
to
be
executed
conditionally.
Howeverbyreusingtheconditionevaluationhardware, ARM
effectivelyincreases
number
of
instructions.
AllinstructionscontainaconditionfieldwhichdetermineswhethertheCPUwillexecutethem.
Nonexecutedinstructionsconsume1cycle. Cant
collapse
the
instruction
like
aNOP.
Still
have
to
complete
cycle
so
as
to
allow
fetchinganddecodingofthefollowinginstructions.
Thisremovestheneedformanybranches,whichstallthe
pipeline(3
cycles
to
refill).Allowsverydenseinlinecode,withoutbranches.
TheTimepenaltyofnotexecutingseveralconditionalinstructionsisfrequentlylessthanoverheadofthebranch
orsubroutine
call
that
would
otherwise
be
needed.
188/22/2008
EE382N 4 E b dd d S t A hit t
-
7/29/2019 ARM ASSEMBLY TUTORIAL
19/85
EE382N-4 Embedded Systems Architecture
TheConditionField
198/22/2008
1001 = LS - C c lear o r Z ( se t uns igned l ow e ro r same)
1010 = GE - N se t and V set , o r N c l ea r and Vcl ea r ( > o r = )
1011 = LT - N se t and V cl ear , o r N cl ear andV se t ( > )
1100 = GT - Z c lear , and e i t he r N se t and Vset , o r N c lea r and V se t ( > )
1 1 0 1 = LE - Z se t , o r N se t a n d V cl ea r ,o r Nclear and V set ( < , o r = )
1 11 0 = AL - a lw ay s
1 1 1 1 = N V - r ese rv ed .
0 0 0 0 = EQ - Z se t ( e q u al )
0 0 0 1 = N E - Z cl ea r ( n o t e q u al )
0 0 1 0 = H S / CS - C s et ( u n s ig n ed h i gh e r orsame)
0 0 1 1 = LO / CC - C cl ea r ( u n s i g n ed l o w e r )
0 1 0 0 = M I - N se t ( n e g a t i v e)
0 1 0 1 = PL - N cl ea r ( p o s it i v e o r ze r o )
0 1 1 0 = V S - V set ( o v er f lo w )
0 1 1 1 = VC - V cl ea r ( n o o v er f l o w )
1 0 0 0 = H I - C se t a n d Z cl ea r ( u n s i g n ed
h ighe r )
3
1
3
0
2
9
2
8
2
7
2
6
2
5
2
4
2
3
2
2
2
1
2
0
1
9
1
8
1
7
1
6
1
5
1
4
1
3
1
2
1
1
1
0 9 8 7 6 5 4 3 2 1 0InstructionType
Condition 0 0 I OPCODE S Rn Rs OPERAND2 Dataprocessing
EE382N 4 E b dd d S t A hit t
-
7/29/2019 ARM ASSEMBLY TUTORIAL
20/85
EE382N-4 Embedded Systems Architecture
UsingandupdatingtheConditionField
Toexecute
an
instruction
conditionally,
simply
postfix
it
with
the
appropriate
condition: Forexampleanaddinstructiontakestheform:
ADDr0,r1,r2 ;r0=r1+r2(ADDAL)
Toexecute
this
only
if
the
zero
flag
is
set:
ADDEQr0,r1,r2 ;Ifzeroflagsetthen
;...r0=r1+r2
Bydefault,
data
processing
operations
do
not
affect
the
condition
flags
(apart
fromthecomparisonswherethisistheonlyeffect).Tocausethecondition
flagstobeupdated,theSbitoftheinstructionneedstobesetbypostfixing
theinstruction(andanyconditioncode)withanS.
Forexample
to
add
two
numbers
and
set
the
condition
flags:
ADDSr0,r1,r2 ;r0=r1+r2 ;...
andsetflags
208/22/2008
EE382N 4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
21/85
EE382N-4 Embedded Systems Architecture
ConditionalExecutionandFlags
ARMinstructions
can
be
made
to
execute
conditionally
by
postfixing
them
with
the
appropriateconditioncodefield.
Thisimprovescodedensityandperformancebyreducingthenumberofforwardbranchinstructions.
CMP r3,#0 CMP r3,#0BEQ skip ADDNE r0,r1,r2
ADD r0,r1,r2
skip
Bydefault,dataprocessinginstructionsdonotaffecttheconditioncodeflagsbuttheflagscanbeoptionallysetbyusingS. CMPdoesnotneedS.
loop
SUBS r1,r1,#1BNE loop
218/22/2008
if Z flag clear then branch
decrement r1 and set flags
EE382N 4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
22/85
EE382N-4 Embedded Systems Architecture
Branchinstructions(1)
228/22/2008
Branch: B{}label
BranchwithLink: BL{}sub_routine_label
Theoffsetforbranchinstructionsiscalculatedbytheassembler: Bytakingthedifferencebetweenthebranchinstructionandthetargetaddress
minus8(toallowforthepipeline).
Thisgivesa26bitoffsetwhichisrightshifted2bits(asthebottomtwobitsarealwayszeroasinstructionsareword aligned)andstoredintotheinstructionencoding.
Thisgivesarangeof 32Mbytes.
Conditionfield
Linkbit 0=Branch1=Branchwithlink
3
1
3
0
2
9
2
8
2
7
2
6
2
5
2
4
2
3
2
2
2
1
2
0
1
9
1
8
1
7
1
6
1
5
1
4
1
3
1
2
1
1
1
0 9 8 7 6 5 4 3 2 1 0
Condition 1 0 1 L BRANCH OFFSET
EE382N 4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
23/85
EE382N-4 Embedded Systems Architecture
Branchinstructions(2)
Whenexecuting
the
instruction,
the
processor:
shiftstheoffsetlefttwobits,signextendsitto32bits,andaddsittoPC.
ExecutionthencontinuesfromthenewPC,oncethepipelinehas
beenrefilled.
The"Branchwithlink"instructionimplementsasubroutinecall
bywritingPC4intotheLRofthecurrentbank.
i.e.
the
address
of
the
next
instruction
following
the
branch
with
link
(allowingforthepipeline).
Toreturnfromsubroutine,simplyneedtorestorethePCfrom
theLR:MOV
pc,
lr
Again,pipelinehastorefillbeforeexecutioncontinues.
238/22/2008
EE382N 4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
24/85
EE382N-4 Embedded Systems Architecture
Branchinstructions(3)
The"Branch"
instruction
does
not
affect
LR.
Note:Architecture4ToffersafurtherARMbranchinstruction,BXSeeThumbInstructionSetModulefordetails.
BL
StoresreturnaddressinLRReturningimplementedbyrestoringthePCfromLRFornonleaffunctions,LRwillhavetobestacked
248/22/2008
STMFDsp!,{regs,lr}:BLfunc2
:LDMFDsp!,{regs,pc}
func1 func2
::BLfunc1
::
:::
::MOVpc,lr
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
25/85
EE382N-4 Embedded Systems Architecture
ConditionalBranches
258/22/2008
Branch Interpretation NormalusesB
BAL
Unconditional
Always
Alwaystakethisbranch
Alwaystakethisbranch
BEQ Equal Comparisonequalorzeroresult
BNE Notequal Comparisonnotequalornonzeroresult
BPL
Plus
Result
positive
or
zero
BMI Minus Resultminusornegative
BCC
BLO
Carryclear
Lower
Arithmeticoperationdidnotgivecarryout
Unsignedcomparisongavelower
BCS
BHS
Carryset
Higher
or
same
Arithmeticoperationgavecarryout
Unsigned
comparison
gave
higher
or
same
BVC Overflowclear Signedintegeroperation;nooverflowoccurred
BVS Overflowset Signedintegeroperation;overflowoccurred
BGT Greaterthan Signedintegercomparisongavegreaterthan
BGE Greaterorequal Signedintegercomparisongavegreaterorequal
BLT
Less
than
Signed
integer
comparison
gave
less
than
BLE Lessorequal Signedintegercomparisongavelessthanorequal
BHI Higher Unsignedcomparisongavehigher
BLS Lowerorsame Unsignedcomparisongavelowerorsame
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
26/85
EE382N-4 Embedded Systems Architecture
DataprocessingInstructions
Largestfamily
of
ARM
instructions,
all
sharing
the
same
instructionformat.
Contains:
Arithmeticoperations
Comparisons(noresults justsetconditioncodes)LogicaloperationsDatamovementbetweenregisters
Remember,this
is
aload
/store
architecture
Theseinstructiononlyworkonregisters, NOT memory.
Theyeachperformaspecificoperationononeortwooperands.
Firstoperand
always
aregister
Rn
SecondoperandsenttotheALUviabarrelshifter.
Wewillexaminethebarrelshiftershortly.
268/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
27/85
EE382N-4 Embedded Systems Architecture
ArithmeticOperations
Operationsare:
ADD operand1+operand2 ;AddADC operand1+operand2+carry ;AddwithcarrySUB operand1 operand2 ;SubtractSBC operand1
operand2
+carry
1
;Subtract
with
carry
RSB operand2 operand1 ;ReversesubtractRSC operand2 operand1+carry 1 ;Reversesubtractwithcarry
Syntax:{}{S}Rd,Rn,Operand2
ExamplesADDr0,r1,r2SUBGT
r3,
r3,
#1
RSBLESr4,r5,#5
278/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
28/85
EE382N 4 Embedded Systems Architecture
Comparisons
Theonly
effect
of
the
comparisons
is
to
update
the
condition
flags.ThusnoneedtosetSbit.
Operationsare:
CMP operand1
operand2 ;Compare
CMN operand1+operand2 ;ComparenegativeTST operand1ANDoperand2 ;TestTEQ operand1EORoperand2 ;Testequivalence
Syntax:{}Rn,Operand2
Examples:
CMP r0,r1
TSTEQ r2,#5
288/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
29/85
EE382N 4 Embedded Systems Architecture
LogicalOperations
Operationsare:
AND operand1ANDoperand2
EOR operand1EORoperand2
ORR operand1ORoperand2
ORN
operand1NOR
operand2
BIC operand1ANDNOToperand2[iebitclear]
Syntax:
{}{S}Rd,
Rn,
Operand2
Examples:AND r0,r1,r2
BICEQ r2,r3,#7
EORS r1,r3,r0
298/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
30/85
EE382N 4 Embedded Systems Architecture
DataMovement
Operationsare:
MOV operand2
MVN NOToperand2
Notethat
these
make
no
use
of
operand1.
Syntax:{}{S}Rd,Operand2
Examples:MOV r0,r1MOVS r2,#10
MVNEQ r1,#0
308/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
31/85
EE382N 4 Embedded Systems Architecture
TheBarrelShifter
TheARM
doesnt
have
actual
shift
instructions.
Insteadithasabarrelshifterwhichprovidesamechanismto
carryout
shifts
as
part
of
other
instructions.
Sowhatoperationsdoesthebarrelshiftersupport?
318/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
32/85
38 bedded Syste s c tectu e
BarrelShifter LeftShift
Shiftsleft
by
the
specified
amount
(multiplies
by
powers
of
two)
e.g.LSL#5 =>multiplyby32
328/22/2008
LogicalShift
Left
(LSL)
DestinationCF 0
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
33/85
y
BarrelShifter RightShifts
338/22/2008
LogicalShiftRight(LSR)
Shiftsrightbythespecified
amount(dividesbypowersof
two)e.g.
LSR#5=divideby32
ArithmeticShiftRight(ASR)
Shiftsright(dividesbypowersof
two)andpreservesthesignbit,
for2'scomplementoperations.e.g.
ASR#5=divideby32
Destination CF
Destination CF
LogicalShiftRight
ArithmeticShiftRight
...0
Signbitshiftedin
zeroshiftedin
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
34/85
y
BarrelShifter Rotations
348/22/2008
RotateRight(ROR)
SimilartoanASRbutthebitswraparoundastheyleavethe
LSB
and
appear
as
the
MSB.e.g. ROR#5
NotethelastbitrotatedisalsousedastheCarryOut.
RotateRightExtended(RRX)
ThisoperationusestheCPSRCflagasa33rdbit.
Rotatesrightby1bit.Encodedas ROR#0
Destination CF
RotateRight
Destination CF
RotateRightthroughCarry
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
35/85
y
UsingtheBarrelShifter:TheSecondOperand
358/22/2008
Register,optionallywithshift
operationapplied.
Shiftvaluecanbeeitherbe:
5bitunsignedinteger
Specifiedinbottombyteofanotherregister.
* Immediatevalue
8bitnumber
Canberotatedrightthroughanevennumber
ofpositions.
Assemblerwillcalculaterotateforyoufromconstant.
Operand1
Result
ALU
BarrelShifter
Operand2
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
36/85
y
SecondOperand:ShiftedRegister
Theamount
by
which
the
register
is
to
be
shifted
is
contained
in
either:theimmediate5bitfieldintheinstruction
NOOVERHEAD
Shiftis
done
for
free
executes
in
single
cycle.
thebottombyteofaregister(notPC) Thentakesextracycletoexecute
ARMdoesnthaveenoughreadportstoread3registersatonce.
Thensame
as
on
other
processors
where
shift
is
separateinstruction.
Ifnoshiftisspecifiedthenadefaultshiftisapplied:LSL#0i.e.barrelshifterhasnoeffectonvalueinregister.
368/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
37/85
SecondOperand:UsingaShiftedRegister
Usingamultiplication
instruction
to
multiply
by
aconstant
means
first
loading
theconstantintoaregisterandthenwaitinganumberofinternalcyclesfor
theinstructiontocomplete.
Amoreoptimumsolutioncanoftenbefoundbyusingsomecombinationof
MOVs,ADDs,
SUBs
and
RSBs
with
shifts.
Multiplicationsbyaconstantequaltoa((powerof2) 1)canbedoneinonecycle.
MOVR2,R0,LSL#2 ;ShiftR0leftby2,writetoR2,(R2=R0x4)
ADDR9,
R5,
R5,
LSL
#3
;R9
=R5
+R5
x8or
R9
=R5
x9
RSBR9,R5,R5,LSL#3 ;R9=R5x8 R5orR9=R5x7
SUBR10,R9,R8,LSR#4;R10=R9 R8/16
MOVR12,R4,RORR3 ;R12=R4rotatedrightbyvalueofR3
378/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
38/85
SecondOperand:ImmediateValue(1)
Thereis
no
single
instruction
which
will
load
a32
bit
immediate
constant
into
aregisterwithoutperformingadataloadfrommemory. AllARMinstructionsare32bitslong ARMinstructionsdonotusetheinstructionstreamasdata.
Thedata
processing
instruction
format
has
12
bits
available
for
operand2
Ifuseddirectlythiswouldonlygivearangeof4096.
Insteaditisusedtostore8bitconstants,givingarangeof0 255.
These
8
bits
can
then
be
rotated
right
through
an
even
number
of
positions
(ie
RORsby0,2,4,..30). Thisgivesamuchlargerrangeofconstantsthatcanbedirectlyloaded,thoughsome
constantswillstillneedtobeloadedfrommemory.
388/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
39/85
SecondOperand: ImmediateValue(2)
Thisgives
us:
0 255 [0 0xff] 256,260,264,..,1020 [0x1000x3fc,step4,0x400xffror 30] 1024,1040,1056,..,4080 [0x4000xff0,step16,0x400xffror 28]
4096,4160,
4224,..,16320 [0x1000
0x3fc0,
step
64,
0x40
0xff
ror 26] Thesecanbeloadedusing,forexample:
MOVr0,#0x40,26 ;=>MOVr0,#0x1000 (ie4096)
Tomakethiseasier,theassemblerwillconverttothisformforusifsimply
giventhe
required
constant:
MOVr0,#4096 ;=>MOVr0,#0x1000 (ie0x40ror 26)
ThebitwisecomplementscanalsobeformedusingMVN: MOVr0,#0xFFFFFFFF ;assemblestoMVNr0,#0
Iftherequiredconstantcannotbegenerated,anerrorwillbereported.
398/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
40/85
Loadingfull32bitconstants
Althoughthe
MOV/MVN
mechanism
will
load
alarge
range
of
constants
into
a
register,sometimesthismechanismwillnotgeneratetherequiredconstant.
Therefore,theassembleralsoprovidesamethodwhichwillloadANY32bit
constant: LDR
rd,=numeric
constant
IftheconstantcanbeconstructedusingeitheraMOVorMVNthenthiswillbe
theinstructionactuallygenerated.
Otherwise,the
assembler
will
produce
an
LDR
instruction
with
aPC
relative
addresstoreadtheconstantfromaliteralpool.LDRr0,=0x42 ;generates MOVr0,#0x42
LDRr0,=0x55555555 ;generate LDRr0,[pc,offsettolitpool]
:
:
DCD 0x55555555
Asthismechanismwillalwaysgeneratethebestinstructionforagivencase,it
is
the
recommended
way
of
loading
constants.
408/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
41/85
MultiplicationInstructions
TheBasic
ARM
provides
two
multiplication
instructions.
MultiplyMUL{}{S}Rd,Rm,Rs ;Rd=Rm*Rs
MultiplyAccumulate
does
addition
for
free
MLA{}{S}Rd,Rm,Rs,Rn ;Rd=(Rm*Rs)+Rn
Restrictionsonuse:
Rd
and
Rm
cannot
be
the
same
register CanbeavoidedbyswappingRmandRsaround.Thisworksbecausemultiplicationiscommutative.
CannotusePC.
Thesewill
be
picked
up
by
the
assembler
if
overlooked.
OperandscanbeconsideredsignedorunsignedUptousertointerpretcorrectly.
418/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
42/85
MultiplicationImplementation
TheARM
makes
use
of
Booths
Algorithm
to
perform
integer
multiplication.
OnnonMARMsthisoperateson2bitsofRsatatime.
Foreach
pair
of
bits
this
takes
1cycle
(plus
1cycle
to
start
with).
Howeverwhentherearenomore1sleftinRs,themultiplicationwillearlyterminate.
Example:Multiply18and 1:Rd=Rm*Rs
Note:Compilerdoesnotuseearlyterminationcriteriato
decideon
which
order
to
place
operands.
428/22/2008
0000 00100001000000000000 0000 0000
1111 1
1111
1111
1111
1111
111 1
111 1
111
Rm
Rs
17cycles
Rs
Rm
4cycles
18
1
18
1
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
43/85
ExtendedMultiplyInstructions
Mvariants
of
ARM
cores
contain
extended
multiplication
hardware.Thisprovidesthreeenhancements:An8bitBoothsAlgorithmisused
Multiplicationiscarriedoutfaster(maximumforstandardinstructionsisnow5
cycles).
Earlyterminationmethodimprovedsothatnowcompletesmultiplicationwhenallremainingbitsetscontain allzeroes(aswithnonMARMs),or
allones.
Thusthepreviousexamplewouldearlyterminatein2cyclesinbothcases.
64bitresultscannowbeproducedfromtwo32bitoperands
Higheraccuracy.
Pairofregistersusedtostoreresult.
438/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
44/85
MultiplyLong&MultiplyAccumulateLong
Instructionsare
MULLwhichgivesRdHi,RdLo:=Rm*RsMLALwhichgivesRdHi,RdLo:=(Rm*Rs)+RdHi,RdLo
However
the
full
64
bit
of
the
result
now
matter
(lower
precision
multiplyinstructionssimplythrowstop32bitsaway)Needtospecifywhetheroperandsaresignedorunsigned
Thereforesyntaxofnewinstructionsare:UMULL{}{S}
RdLo,RdHi,Rm,Rs
UMLAL{}{S}RdLo,RdHi,Rm,RsSMULL{}{S}RdLo,RdHi,Rm,RsSMLAL{}{S}RdLo,RdHi,Rm,Rs
Notgeneratedbythecompiler.
Warning:UnpredictableonnonMARMs.
448/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
45/85
Load/StoreInstructions
TheARM
is
aLoad
/Store
Architecture:
Doesnotsupportmemorytomemorydataprocessingoperations.Mustmovedatavaluesintoregistersbeforeusingthem.
This
might
sound
inefficient,
but
in
practice
it
isnt:Loaddatavaluesfrommemoryintoregisters.Processdatainregistersusinganumberofdataprocessinginstructions
whicharenotsloweddownbymemoryaccess.
Storeresults
from
registers
out
to
memory.
TheARMhasthreesetsofinstructionswhichinteractwithmain
memory.Theseare:Singleregisterdatatransfer(LDR/STR).Blockdatatransfer(LDM/STM).SingleDataSwap(SWP).
458/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
46/85
Singleregisterdatatransfer
Thebasic
load
and
store
instructions
are:
LoadandStoreWordorByte LDR/STR/LDRB/STRB
ARM
Architecture
Version
4
also
adds
support
for
Halfwords
and
signeddata.LoadandStoreHalfword
LDRH/STRH
LoadSigned
Byte
or
Halfword
load
value
and
sign
extend
it
to
32
bits. LDRSB/LDRSH
Alloftheseinstructionscanbeconditionallyexecutedby
insertingtheappropriateconditioncodeafterSTR/LDR.e.g.
LDREQB
Syntax:{}{}Rd,
468/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
47/85
LoadandStoreWordorByte: BaseRegister
Thememory
location
to
be
accessed
is
held
in
abase
register
STRr0,[r1] ;Storecontentsofr0tolocationpointedto
;
by
contents
of
r1.LDRr2,[r1] ;Loadr2withcontentsofmemorylocation
;pointedtobycontentsofr1.
478/22/2008
r1
0x200Base
Register
Memory
0x50x200
r0
0x5Source
RegisterforSTR
r2
0x5Destination
RegisterforLDR
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
48/85
Load/StoreWordorByte:OffsetsfromtheBaseRegister
Aswell
as
accessing
the
actual
location
contained
in
the
base
register,theseinstructionscanaccessalocationoffsetfromthe
baseregisterpointer.
Thisoffset
can
be
Anunsigned12bitimmediatevalue(ie0 4095bytes).Aregister,optionallyshiftedbyanimmediatevalue
Thiscanbeeitheraddedorsubtractedfromthebaseregister:Prefixtheoffsetvalueorregisterwith+(default)or.
Thisoffsetcanbeapplied:beforethetransferismade:Preindexedaddressing
optionallyauto
incrementing
the
base
register,
by
postfixing
the
instruction
with
an!.
afterthetransferismade:Postindexedaddressing causingthebaseregistertobeautoincremented.
488/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
49/85
Load/StoreWordorByte:PreindexedAddressing
Example:STR
r0,
[r1,#12]
Tostoretolocation0x1f4insteaduse:STRr0,[r1,#12]Toautoincrementbasepointerto0x20cuse:STRr0,[r1,#12]!Ifr2contains3,access0x20cbymultiplyingthisby4:
STRr0,[r1,r2,LSL#2]
498/22/2008
r1
0x200
Base
Register
Memory
0x5
0x200
r0
0x5Source
RegisterforSTR
Offset
12 0x20c
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
50/85
LoadandStoreWordorByte:PostindexedAddressing
Example:STR
r0,
[r1],
#12
Toautoincrementthebaseregistertolocation0x1f4insteaduse: STRr0,[r1],#12
Ifr2contains3,autoincrementbaseregisterto0x20cbymultiplyingthisby4: STRr0,[r1],r2,LSL#2
508/22/2008
r1
0x200Original
BaseRegister
Memory
0x50x200
r0
0x5Source
Registerfor STR
Offset12 0x20c
r10x20c
UpdatedBase
Register
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
51/85
LoadandStoreswithUserModePrivilege
Whenusing
post
indexed
addressing,
there
is
afurther
form
of
Load/StoreWord/Byte:{}{B}TRd,
Whenusedinaprivilegedmode,thisdoestheload/storewith
usermodeprivilege.
Normallyusedbyanexceptionhandlerthatisemulatingamemoryaccess
instructionthatwouldnormallyexecuteinusermode.
518/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
52/85
ExampleUsageofAddressingModes
Imaginean
array,
the
first
element
of
which
is
pointed
to
by
the
contents
of
r0.
Ifwewanttoaccessaparticularelement,
thenwecanusepreindexedaddressing: r1iselementwewant.
LDRr2,
[r0,
r1,
LSL
#2]
Ifwewanttostepthroughevery
element
of
the
array,
for
instancetoproducesumofelementsinthe
array,thenwecanusepostindexedaddressingwithinaloop: r1isaddressofcurrentelement(initiallyequaltor0). LDRr2,[r1],#4
Useafurtherregistertostoretheaddressoffinalelement,
sothattheloopcanbecorrectlyterminated.
528/22/2008
0
1
2
3
element
0
4
8
12
MemoryOffset
r0
Pointer tostart of array
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
53/85
OffsetsforHalfwordandSignedHalfword/ByteAccess
TheLoad
and
Store
Halfword
and
Load
Signed
Byte
or
Halfword
instructionscanmakeuseofpre andpostindexedaddressingin
muchthesamewayasthebasicloadandstoreinstructions.
Howeverthe
actual
offset
formats
are
more
constrained:
Theimmediatevalueislimitedto8bits(ratherthan12bits)givinganoffsetof0255bytes.
Theregisterformcannothaveashiftappliedtoit.
538/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
54/85
Effectofendianess
TheARM
can
be
set
up
to
access
its
data
in
either
little
or
big
endianformat.
Littleendian:
Leastsignificant
byte
of
aword
is
stored
in
bits
07of
an
addressed
word.
Bigendian:Leastsignificantbyteofawordisstoredinbits2431ofanaddressedword.
Thishas
no
real
relevance
unless
data
is
stored
as
words
and
then
accessedinsmallersizedquantities(halfwords orbytes).Whichbyte/halfwordisaccessedwilldependontheendianess ofthe
systeminvolved.
548/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
55/85
YAEndianess Example
558/22/2008
Big-endianLittle-endian
r1 = 0x100
r0 = 0x1122334431 24 23 16 15 8 7 0
11 22 33 44
31 24 23 16 15 8 7 0
11 22 33 44
31 24 23 16 15 8 7 0
44 33 22 11
31 24 23 16 15 8 7 0
00 00 00 44
31 24 23 16 15 8 7 0
00 00 00 11
r2 = 0x44 r2 = 0x11
STR r0, [r1]
LDRB r2, [r1]
r1 = 0x100Memory
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
56/85
BlockDataTransfer(1)
TheLoad
and
Store
Multiple
instructions
(LDM
/STM)
allow
betweeen1and16registerstobetransferredtoorfrom
memory.
Thetransferred
registers
can
be
either:
Anysubsetofthecurrentbankofregisters(default).Anysubsetoftheusermodebankofregisterswheninapriviledgedmode
(postfixinstructionwitha^).
568/22/2008
Cond 1 0 0 P U S W L Rn Register list
Condition field Base registerLoad/Store bit0 = Store to memory1 = Load from memory
Write- back bit0 = no write-back1 = write address into base
PSR and force user bit0 = dont load PSR or force user mode1 = load PSR or force user mode
Up/Down bit0 = Down; subtract offset from base1 = Up ; add offset to base
Pre/Post indexing bit0 = Post; add offset after transfer,1 = Pre ; add offset before transfer
2831 22 16 023 21 1527 20 1924
Each bit corresponds to a particularregister. For example: Bit 0 set causes r0 to be transferred. Bit 0 unset causes r0 not to be transferred.
At least one register must betransferred as the list cannot be empty.
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
57/85
BlockDataTransfer(2)
Baseregister
used
to
determine
where
memory
access
should
occur.4differentaddressingmodesallowincrementanddecrementinclusiveor
exclusiveofthebaseregisterlocation.
Baseregister
can
be
optionally
updated
following
the
transfer
(by
appending
itwithan!.
Lowestregisternumberisalwaystransferredto/fromlowestmemorylocationaccessed.
TheseinstructionsareveryefficientforSavingandrestoringcontext
Forthisusefultoviewmemoryasastack.
Movinglarge
blocks
of
data
around
memory
Forthisusefultodirectlyrepresentfunctionalityoftheinstructions.
578/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
58/85
Stacks
Astack
is
an
area
of
memory
which
grows
as
new
data
is
pushedontothetopofit,andshrinksasdataispoppedoff
thetop.
Twopointers
define
the
current
limits
of
the
stack.
Abasepointer usedtopointtothebottomofthestack(thefirstlocation).
Astackpointer
usedto
point
the
current
top
of
the
stack.
588/22/2008
SPBASE
PUSH{1,2,3}
1
23
BASE
SP
POP
1
2Result ofpop = 3
BASE
SP
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
59/85
StackOperation
Traditionally,astack
grows
down
in
memory,
with
the
last
pushed
value
at
thelowestaddress.TheARMalsosupportsascendingstacks,wherethestack
structuregrowsupthroughmemory.
Thevalueofthestackpointercaneither:
Pointto
the
last
occupied
address
(Full
stack)
andsoneedspredecrementing(iebeforethepush)
Pointtothenextoccupiedaddress(Emptystack) andsoneedspostdecrementing(ieafterthepush)
Thestack
type
to
be
used
is
given
by
the
postfix
to
the
instruction:
STMFD/LDMFD:FullDescendingstack STMFA/LDMFA:FullAscendingstack. STMED/LDMED:EmptyDescendingstack
STMEA
/
LDMEA
:
Empty
Ascending
stack Note:ARMCompilerwillalwaysuseaFulldescendingstack.
598/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
60/85
StackExamples
608/22/2008
STMFD sp!,{r0,r1,r3-r5}
r5
r4
r3
r1r0SP
Old SP
STMED sp!,{r0,r1,r3-r5}
r5r4
r3
r1
r0SP
Old SP
r5
r4
r3
r1
r0
STMFA sp!,{r0,r1,r3-r5}
SP
Old SP 0x400
0x418
0x3e8
STMEA sp!,{r0,r1,r3-r5}
r5
r4
r3
r1
r0
SP
Old SP
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
61/85
StacksandSubroutines
Oneuse
of
stacks
is
to
create
temporary
register
workspace
for
subroutines.
Anyregistersthatareneededcanbepushedontothestackatthestartofthe
subroutineandpoppedoffagainattheendsoastorestorethembefore
returntothecaller:
STMFD sp!,{r0-r12, lr} ; stack all registers
........ ; and the return address
........
LDMFD sp!,{r0-r12, pc} ; load all the registers
; and return automatically
SeethechapterontheARMProcedureCallStandardintheSDTReference
Manualforfurtherdetailsofregisterusagewithinsubroutines.
IfthepopinstructionalsohadtheSbitset(using^)thenthetransferofthe
PCwhen
in
aprivileged
mode
would
also
cause
the
SPSR
to
be
copied
into
the
CPSR(seeexceptionhandlingmodule).
618/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
62/85
DirectfunctionalityofBlockDataTransfer
WhenLDM
/STM
are
not
being
used
to
implement
stacks,
it
is
clearertospecifyexactlywhatfunctionalityoftheinstructionis:i.e.specifywhethertoincrement/decrementthebasepointer,beforeor
afterthememoryaccess.
Inordertodothis,LDM/STMsupportafurthersyntaxin
additiontothestackone:STMIA/LDMIA:IncrementAfter
STMIB/LDMIB
:Increment
Before
STMDA/LDMDA:DecrementAfterSTMDB/LDMDB:DecrementBefore
628/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
63/85
Example:BlockCopy
Copyablock
of
memory,
which
is
an
exact
multiple
of
12
words
long
from
the
locationpointedtobyr12tothelocationpointedtobyr13.r14pointstothe
endofblocktobecopied.
; r12 points to the start of the source data
; r14 points to the end of the source data
; r13 points to the start of the destination data
loop LDMIA r12!, {r0-r11} ; load 48 bytes
STMIA r13!, {r0-r11} ; and store them
CMP r12, r14 ; check for the end
BNE loop ; and loop until done
Thislooptransfers48bytesin31cyclesOver50Mbytes/secat33MHz
638/22/2008
r13
r14
r12
IncreasingMemory
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
64/85
SwapandSwapByteInstructions
Atomicoperation
of
amemory
read
followed
by
amemory
write
whichmovesbyteorwordquantitiesbetweenregistersand
memory.
Syntax:SWP{}{B}Rd,Rm,[Rn]
ToimplementanactualswapofcontentsmakeRd=Rm.
Thecompilercannotproducethisinstruction.
648/22/2008
Rm Rd
Rn
32
1temp
Memory
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
65/85
SoftwareInterrupt(SWI)
Ineffect,aSWIisauserdefinedinstruction.
Itcauses
an
exception
trap
to
the
SWI
hardware
vector
(thus
causingachangetosupervisormode,plustheassociatedstate
saving),thuscausingtheSWIexceptionhandlertobecalled.
Thehandler
can
then
examine
the
comment
field
of
the
instructiontodecidewhatoperationhasbeenrequested.
BymakinguseoftheSWImechanism,anoperatingsystemcan
implementaset
of
privileged
operations
which
applications
runninginusermodecanrequest.
SeeExceptionHandlingModuleforfurtherdetails.
658/22/2008
31
30
29
28
27
26
25
24
23
22
21
20
19
18
17
16
15
14
13
12
11
10 9 8 7 6 5 4 3 2 1 0 InstructionType
Condition 1 1 1 1 SWI NUMBER SoftwareInterrupt
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
66/85
Backup
8/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
67/85
Assembler:Pseudoops
AREA >chunksofdata($data)orcode($code)
ADR >loadaddressintoaregister
ADRR0,BUFFER
ALIGN >adjustlocationcountertowordboundaryusuallyaftera
storagedirective
END >nomoretoassemble
678/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
68/85
Assembler:Pseudoops
DCD >definedwordvaluestoragearea
BOW DCD 1024,2055,9051
DCB >definedbytevaluestoragearea
BOBDCB10,12,15
% >zeroedoutbytestoragearea
BLBYTE%30
688/22/2008
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
69/85
Assembler:Pseudoops
IMPORT >nameofroutinetoimportforuseinthisroutine
IMPORT_printf;Cprintroutine
EXPORT >nameofroutinetoexportforuseinotherroutines
EXPORTadd2;add2routine
EQU >symbolreplacement
loopcntEQU5
698/22/2008
EE382N-4 Embedded Systems Architecture
bl
-
7/29/2019 ARM ASSEMBLY TUTORIAL
70/85
AssemblyLineFormat
708/22/2008
label instruction ; comment
label: created by programmer, alphanumeric
whitespace: space(s) or tab character(s)
instruction: op-code mnemonic or pseudo-op with required fields
comment: preceded by ;ignored by assembler but usefulto the programmer for documentation
NOTE: All fields are optional.
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
71/85
Example:Cassignments
C:
x = (a + b) - c;
Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b, reusing r4
LDR r1,[r4] ; get value of b
ADD r3,r0,r1 ; compute a+b
ADR r4,c ; get address for c
LDR r2,[r4] ; get value of cSUB r3,r3,r2 ; complete computation of x
ADR r4,x ; get address for x
STR r3,[r4] ; store value of x
718/22/2008
2008WayneWolf ComputersasComponents2nded.
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
72/85
Example:Cassignment
C:y = a*(b+c);
Assembler:
ADR r4,b ; get address for bLDR r0,[r4] ; get value of b
ADR r4,c ; get address for c
LDR r1,[r4] ; get value of c
ADD r2,r0,r1 ; compute partial resultADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
MUL r2,r2,r0 ; compute final value for y
ADR r4,y ; get address for ySTR r2,[r4] ; store y
728/22/2008
2008WayneWolf ComputersasComponents2nded.
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
73/85
Example:Cassignment
C:z = (a
-
7/29/2019 ARM ASSEMBLY TUTORIAL
74/85
Example:ifstatement
C:
if (a > b) { x = 5; y = c + d; } else x = c - d;
Assembler:
; compute and test condition
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b
LDR r1,[r4] ; get value for b
CMP r0,r1 ; compare a < b
BLE fblock ; if a >
-
7/29/2019 ARM ASSEMBLY TUTORIAL
75/85
ifstatement,contd.
; true block
MOV r0,#5 ; generate value for x
ADR r4,x ; get address for x
STR r0,[r4] ; store xADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value of d
ADD r0,r0,r1 ; compute y
ADR r4,y ; get address for y
STR r0,[r4] ; store yB after ; branch around false block
758/22/2008
2008WayneWolf ComputersasComponents2nded.
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
76/85
ifstatement,contd.
; false block
fblock ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value for d
SUB r0,r0,r1 ; compute a-b
ADR r4,x ; get address for xSTR r0,[r4] ; store value of x
after ...
768/22/2008
2008WayneWolf ComputersasComponents2nded.
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
77/85
Example:Conditionalinstructionimplementation
; true block
MOVLT r0,#5 ; generate value for x
ADRLT r4,x ; get address for x
STRLT r0,[r4] ; store x
ADRLT r4,c ; get address for c
LDRLT r0,[r4] ; get value of c
ADRLT r4,d ; get address for dLDRLT r1,[r4] ; get value of d
ADDLT r0,r0,r1 ; compute y
ADRLT r4,y ; get address for ySTRLT r0,[r4] ; store y
778/22/2008
2008WayneWolf ComputersasComponents2nded.
EE382N-4 Embedded Systems Architecture
-
7/29/2019 ARM ASSEMBLY TUTORIAL
78/85
Conditionalinstructionimplementation,contd.
; false block
ADRGE r4,c ; get address for c
LDRGE r0,[r4] ; get value of c
ADRGE r4,d ; get address for d
LDRGE r1,[r4] ; get value for d
SUBGE r0,r0,r1 ; compute a-b
ADRGE r4,x ; get address for xSTRGE r0,[r4] ; store value of x
788/22/2008
2008WayneWolf ComputersasComponents2nded.
EE382N-4 Embedded Systems Architecture
l h
-
7/29/2019 ARM ASSEMBLY TUTORIAL
79/85
Example:switchstatement
C:
switch (test) { case 0: break; case 1: }
Assembler:
ADR r2,test ; get address for test
LDR r0,[r2] ; load value for test
ADR r1,switchtab ; load address for switch table
LDR r1,[r1,r0,LSL #2] ; index switch table
switchtab DCD case0
DCD case1
...
798/22/2008
2008WayneWolf ComputersasComponents2nded.
EE382N-4 Embedded Systems Architecture
l fil
-
7/29/2019 ARM ASSEMBLY TUTORIAL
80/85
Example:FIRfilter
C:for (i=0, f=0; i
-
7/29/2019 ARM ASSEMBLY TUTORIAL
81/85
FIRfilter,cont.d
ADR r3,c ; load r3 with base of c
ADR r5,x ; load r5 with base of x
; loop body
loop LDR r4,[r3,r8] ; get c[i]
LDR r6,[r5,r8] ; get x[i]
MUL r4,r4,r6 ; compute c[i]*x[i]
ADD r2,r2,r4 ; add into running sumADD r8,r8,#4 ; add one word offset to array index
ADD r0,r0,#1 ; add 1 to i
CMP r0,r1 ; exit?BLT loop ; if i < N, continue
818/22/2008
2008WayneWolf ComputersasComponents2nded.
EE382N-4 Embedded Systems Architecture
ARM Instruction Set Summary (1/4)
-
7/29/2019 ARM ASSEMBLY TUTORIAL
82/85
ARMInstructionSetSummary(1/4)
82
EE382N-4 Embedded Systems Architecture
ARM I t ti S t S (2/4)
-
7/29/2019 ARM ASSEMBLY TUTORIAL
83/85
83
ARMInstructionSetSummary(2/4)
EE382N-4 Embedded Systems Architecture
ARM I t ti S t S (3/4)
-
7/29/2019 ARM ASSEMBLY TUTORIAL
84/85
84
ARMInstructionSetSummary(3/4)
EE382N-4 Embedded Systems Architecture
ARM I t ti S t S (4/4)
-
7/29/2019 ARM ASSEMBLY TUTORIAL
85/85
ARMInstructionSetSummary(4/4)