Apps with Hardware: Enabling Run-time Architectural Customization in Smart Phones
Michael Coughlin, Ali Ismail, Eric Keller, University of Colorado Boulder
Mobile Devices
Devices are designed around certain restrictions
This leads vendors to make tradeoffs
What if users and developers could choose?
Vision: Smart Phone with an FPGA
[Diagram: an App with HW and SW parts; Android running on the ARM processor, next to an FPGA]
Software-defined Radio
High-performance Computing
Cryptography
http://www.nallatech.com/40gbit-aes-encryption-using-opencl-and-fpgas/
Analytics
http://www.datanami.com/2015/03/10/fpga-system-smokes-spark-on-streaming-analytics/
Architectural Enhancements
Somniloquy (NSDI09) (SEC04)
Why is now the right time?
• SoCs with programmable logic coupled with an ARM Cortex-A9 (the same CPU core used in the iPhone 4S and many other smart phones)
• High-level synthesis: write C/C++/SystemC/OpenCL code
Fundamental Problem: Sharing the FPGA between applications
What we can already do
App loads: software runs on the processor, FPGA is configured with hardware
[Diagram: App X Software on the Processor; App X Hardware on the FPGA]
This is currently possible with run-time reconfiguration (sort of)
What we can’t do
What if we have two apps?
[Diagram: App X and App Y, each with software on the Processor and hardware on the FPGA]
What if it’s a single chip (and some I/O goes through the FPGA)?
[Diagram: App X and App Y software on the Processor, their hardware on the FPGA, with I/O attached]
Why hasn’t this been solved before?
• Over a decade of research has proposed two main solutions:
  – Run-time place-and-route
  – Slot-based reconfiguration
Approach 1: Run-time Place/Route
• There is free space in the FPGA
• Place a new module there
• Routing can fail
• Routing is also very time-consuming
• Therefore, this approach is not practical
Approach 2: Slot-Based Reconfiguration
[Diagram: FPGA with three reserved regions: Slot 1, Slot 2, Slot 3]
• Identical empty regions (slots) are reserved in the FPGA
• Constrain the tools to:
  – Not use wires or logic inside the slots
  – Use exactly the same wires for the interface
• Hardware is loaded into slots
• Problem: if other logic exists, wire routing becomes very constrained
• Therefore, this approach is also not practical
Previous Research
• Run-time place and route
  – Is very computationally expensive
  – Can fail
• Slot-based reconfiguration
  – Constrained routing is very restrictive and not generally applicable
• Therefore, previous research is not practical
Introducing CloudRTR
• Allows the FPGA to be shared between general apps
• Uses existing vendor technologies
• Adopts the idea of slots from previous research
• CloudRTR makes existing vendor technology work for general apps
The App Deployment Model
CloudRTR
[Diagram: Manufacturers, Developer, CloudRTR, and Consumer (Android on ARM, with an FPGA); each manufacturer's static design reserves Slots 1, 2, 3]
Manufacturer
• Creates a static design
  – All logic that does not change
• The design includes areas reserved for slots
• Sends this to the cloud compiler
[Diagram: static design with Slots 1, 2, 3, GPU, and AXI]
Developer
• Creates an app using existing tools
• Creates a hardware definition in C
#include "ap_int.h"
bool example(ap_uint<32> *in, ap_uint<32> *out, bool *enabled);
App Store (Cloud Compiler)
• Compiles hardware for each app
  – For each device variant
  – For each slot in each variant
[device1: [slot1: a.bit, slot2: b.bit, slot3: c.bit]]
[device2: [slot1: d.bit, slot2: e.bit]]
[Diagram: App X goes to the Cloud Compiler, which compiles it against each manufacturer's static design (Slots 1, 2, 3)]
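The per-device, per-slot compilation above can be sketched in plain C++ to show how the work multiplies. This is a minimal sketch, not CloudRTR's implementation: the `DeviceVariant` type, the `compile_app` and `count_jobs` functions, and the `<app>_<device>_slot<N>.bit` naming scheme are hypothetical, and the call into the vendor tools is replaced by simply recording the file each job would produce.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one manufacturer's static design.
struct DeviceVariant {
    std::string name;       // e.g. "device1"
    std::size_t num_slots;  // slots reserved in the static design
};

// device name -> (slot number -> bitstream file), mirroring the
// [device1: [slot1: a.bit, ...]] listing.
using BitstreamTable =
    std::map<std::string, std::map<std::size_t, std::string>>;

// Brute-force compilation: one job per (device variant, slot) pair.
// A real service would run the vendor tools for each job; this sketch
// only records the name of the bitstream each job would produce.
BitstreamTable compile_app(const std::string& app,
                           const std::vector<DeviceVariant>& variants) {
    BitstreamTable table;
    for (const DeviceVariant& dev : variants) {
        assert(dev.num_slots > 0);  // a static design reserves at least one slot
        for (std::size_t slot = 1; slot <= dev.num_slots; ++slot) {
            table[dev.name][slot] =
                app + "_" + dev.name + "_slot" + std::to_string(slot) + ".bit";
        }
    }
    return table;
}

// Total number of compile jobs (bitstreams) produced for one app.
std::size_t count_jobs(const BitstreamTable& table) {
    std::size_t jobs = 0;
    for (const auto& entry : table) jobs += entry.second.size();
    return jobs;
}
```

With device1 (3 slots) and device2 (2 slots), `compile_app` yields five bitstreams, the same count as the five files in the listing; the number of jobs per app is the sum of slots over all device variants.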
User (Operating System)
• A system service manages slots
• Downloaded apps include slot hardware
• The system service loads hardware for apps
.apk: [device1: [slot1: a.bit, slot2: b.bit, slot3: c.bit]]
[Diagram: FPGA with GPU, AXI, and Slots 1, 2, 3; App X's hardware is loaded into a slot]
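A slot-managing system service could be organized roughly as below. This is a hedged sketch under assumptions of our own: the `SlotManager` class, its first-free-slot policy, and the use of the per-device table from the .apk listing as the package format are hypothetical, and actually writing a bitstream to the FPGA is left out. The point it illustrates is that the service, not the app, chooses the slot, and then picks the bitstream that was precompiled for that slot on this device.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Per-device, per-slot bitstreams shipped inside an app package (hypothetical format).
using AppPackage = std::map<std::string, std::map<std::size_t, std::string>>;

class SlotManager {
public:
    SlotManager(std::string device, std::size_t num_slots)
        : device_(std::move(device)), owner_(num_slots + 1) {}  // index 0 unused

    // Finds a free slot for which the app ships a bitstream for this device.
    // Returns the chosen bitstream name, or nothing if no slot can be used.
    std::optional<std::string> load(const std::string& app, const AppPackage& pkg) {
        auto dev = pkg.find(device_);
        if (dev == pkg.end()) return std::nullopt;  // not compiled for this device
        for (std::size_t slot = 1; slot < owner_.size(); ++slot) {
            auto bit = dev->second.find(slot);
            if (owner_[slot].empty() && bit != dev->second.end()) {
                owner_[slot] = app;  // a real service would reconfigure the slot here
                return bit->second;
            }
        }
        return std::nullopt;  // every usable slot is busy
    }

    // Frees every slot owned by the app (e.g. when it exits).
    void unload(const std::string& app) {
        for (auto& o : owner_)
            if (o == app) o.clear();
    }

private:
    std::string device_;
    std::vector<std::string> owner_;  // owner_[slot] == "" means the slot is free
};
```

For example, if App X takes Slot 1 and App Y (compiled only for Slots 1 and 2) takes Slot 2, a third app compiled only for Slots 1 and 2 is refused until App X unloads, even though Slot 3 is free. A smarter placement policy could avoid that, which is one reason to ship a bitstream for every slot.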
Security Considerations
• The slot manager enforces access to hardware
• However, FPGAs can theoretically access sensitive resources directly (bypassing the OS)
• A secure loading system ensures that apps cannot access sensitive resources
Secure loading system
How does the secure loader work?
[Diagram: Processor, FPGA, Operating System, Slot 1, Slot 2, Memory Controller, Signature Verification, Reconfiguration Module, ICAP]
• The OS wants to reconfigure Slot 1 with a signed module
• The signature of the module is verified
• The module is written to the ICAP
• The ICAP performs the reconfiguration
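The four steps can be modeled as a short sequence in software. This is only an illustrative sketch with hypothetical names (`Module`, `SecureLoader`, `verify_signature`, the `trusted_signer` string). In particular, `verify_signature` is a plain string comparison standing in for real public-key signature verification, and the ICAP is simulated by a vector recording which module occupies which slot (slots are numbered from 0 here). What it does show is the ordering: nothing reaches the ICAP unless verification succeeds first.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A partial bitstream plus the signature that accompanies it.
struct Module {
    std::string name;
    std::string bitstream;
    std::string signature;  // placeholder for a real cryptographic signature
};

class SecureLoader {
public:
    SecureLoader(std::size_t num_slots, std::string trusted_signer)
        : slots_(num_slots), trusted_signer_(std::move(trusted_signer)) {}

    // The OS asks to reconfigure `slot` with module `m`.
    bool reconfigure(std::size_t slot, const Module& m) {
        if (slot >= slots_.size()) return false;  // no such slot
        if (!verify_signature(m)) return false;   // reject unverified modules
        write_to_icap(slot, m);                   // ICAP reconfigures the slot
        return true;
    }

    const std::string& slot_contents(std::size_t slot) const { return slots_.at(slot); }

private:
    // TOY CHECK ONLY: stands in for cryptographic signature verification.
    bool verify_signature(const Module& m) const {
        return m.signature == trusted_signer_ + ":" + m.name;
    }

    // Simulated ICAP: records which module now occupies the slot.
    void write_to_icap(std::size_t slot, const Module& m) { slots_[slot] = m.name; }

    std::vector<std::string> slots_;  // empty string means the slot is unconfigured
    std::string trusted_signer_;
};
```

Keeping verification and the ICAP write inside one component means the OS can request a reconfiguration but cannot skip the check on its own.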
Evaluation
• Is there value in apps with hardware?
• Is the cloud-based compilation of CloudRTR practical?
Microbenchmark 1: QAM demodulator
4 orders of magnitude
Microbenchmark 2: AES
FPGA is 3x vs. OpenSSL
Microbenchmark 3: Memory Scanner
• We also implemented a hardware memory scanner
• It can scan the entire address space transparently to the OS
  – 2.7% memory read performance hit
  – 5.5% memory write performance hit
• We tested this using the LMbench benchmark suite
Brute-force compilation
Google Play Store figures (provided by AppFigures):
  # of apps as of Dec 14          1.43 million
  Average monthly app growth      6.10%
  # of apps for January 16        117,521
Max # of apps compiled per day, by number of slots:
  Slots   Apps
  2       121
  3       96
  4       76
  5       59
  6       51
2-slot requirements: # of machines required to compile apps
  Device variants   % of April apps that use hardware (# of apps uploaded per day)
                    0.1% (3)   1% (34)   10% (347)
  1                 1          1         3
  10                1          3         29
  100               3          29        288
  1000              29         288       2875
Reasonable for most scenarios
6-slot requirements: # of machines required to compile apps
  Device variants   % of April apps that use hardware (# of apps uploaded per day)
                    0.1% (3)   1% (34)   10% (347)
  1                 1          1         7
  10                1          7         69
  100               7          69        681
  1000              69         681       6809
Still reasonable for most scenarios
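The machine counts in these tables appear consistent with a simple capacity estimate. This is our reading of the tables, not a formula stated on the slides, and it assumes the "max # of apps compiled per day" figures are per machine: each hardware app is compiled once per device variant, so machines = ⌈apps per day × device variants ÷ apps per machine per day⌉. For example, with 2 slots, 347 apps per day across 10 device variants gives ⌈3470 / 121⌉ = 29 machines, matching the table. Several entries come out slightly lower with the rounded inputs shown (for example 2868 rather than 2875 for 1000 variants at 10%), which suggests the slides used unrounded rates. The function name `machines_required` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Estimated number of compile machines needed per day.
//   apps_per_day:     hardware apps uploaded per day
//   device_variants:  distinct static designs each app must be compiled for
//   apps_per_machine: apps one machine can compile per day at this slot count
// Uses integer ceiling division: a fractional machine still needs a whole machine.
std::uint64_t machines_required(std::uint64_t apps_per_day,
                                std::uint64_t device_variants,
                                std::uint64_t apps_per_machine) {
    assert(apps_per_machine > 0);
    const std::uint64_t work = apps_per_day * device_variants;
    return (work + apps_per_machine - 1) / apps_per_machine;
}
```

Under this estimate, the cost grows linearly in both upload rate and device variants, which is why reusing static designs across manufacturers (next slide) shrinks the numbers so effectively.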
Reducing the numbers even more
• Compilation can be offloaded to manufacturers
• Manufacturers will likely reuse designs (Qualcomm and ARM chips are often reused)
• Developers will likely use libraries
Implementation Case Study: Orbot
• Tor on Android
• AES is on the critical path
• Examine AES as an integration study
What we found:
• Memory operations are the bottleneck
  – Data must be placed correctly in memory
  – Userspace I/O (UIO) has high overhead
  – Many system calls are incompatible with UIO
• It is easier to build an application from the ground up
Conclusion
• We have presented our vision of apps with hardware
• CloudRTR implements our vision by leveraging the mobile app deployment model
• We have demonstrated the value and practicality of our vision
Vendor-Supported Partial Reconfiguration
[Diagram: a Static Design and Dynamic Module(s), for a Target FPGA, are processed by the vendor tools]
Output:
• base.bit
• partial_1.bit
• partial_2.bit
(Partial bitstreams work in only one location, and only with base.bit)
Goal: space saving for the customer
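The restriction in parentheses can be expressed as a compatibility check. This is a hypothetical model for illustration (the `PartialBitstream` struct, its fields, and `can_load` are ours, not a vendor file format or API): a partial bitstream records the base design it was generated against and the single region it targets, and loading is only valid when both match.

```cpp
#include <cassert>
#include <string>

// Hypothetical metadata for a vendor-generated partial bitstream.
struct PartialBitstream {
    std::string file;         // e.g. "partial_1.bit"
    std::string base_design;  // static design it was built against, e.g. "base.bit"
    int region;               // the one reconfigurable region it targets
};

// A partial bitstream is only valid for the exact base design and
// region it was generated for.
bool can_load(const PartialBitstream& p, const std::string& loaded_base,
              int target_region) {
    return p.base_design == loaded_base && p.region == target_region;
}
```

Under this model, a module compiled for one slot cannot simply be moved to another free slot or another device's base design, which is why compiling a separate bitstream for every slot of every device variant makes the vendor flow usable for general apps.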
Examples of Libraries
• Crypto
  – Asymmetric (RSA, ECDSA, etc.)
  – Symmetric (3DES, Twofish, Blowfish)
• Soft processors
• Encoding
  – Network encoding (Reed-Solomon, etc.)
  – Media encoding (JPEG, MPEG, etc.)
• DSP
  – FFTs, filters, etc.
Example hardware definition
#include "ap_int.h"
bool example(ap_uint<32> *in, ap_uint<32> *out, bool *enabled);
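Because a hardware definition is an ordinary C/C++ function, it can be exercised on the CPU before synthesis (HLS tools call this C simulation). The sketch below is hypothetical: the slides show only the signature of `example`, so this body, which copies `*in` to `*out` when `*enabled` is set and reports whether it did, is an invented stand-in. It also substitutes `std::uint32_t` for `ap_uint<32>` so it compiles without the HLS headers.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for ap_uint<32> so the sketch builds with a normal C++ compiler.
using word_t = std::uint32_t;

// Hypothetical body: the slides only show the signature of `example`.
// Copies one input word to the output when enabled; returns whether it did.
bool example(const word_t* in, word_t* out, const bool* enabled) {
    if (!*enabled) return false;  // leave the output untouched when disabled
    *out = *in;
    return true;
}
```

A test harness like this runs in milliseconds on a laptop, so a developer can check functional behavior long before the (much slower) per-slot bitstream compilation happens in the cloud.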
More complicated hardware definition
#include "ap_int.h"
#include "hls_stream.h"
typedef ap_uint<32> uint32_t_hw;
typedef hls::stream<uint32_t_hw> mem_stream32;
bool aes(volatile unsigned int m_mm2s_ctl[500], volatile unsigned int m_s2mm_ctl[500], volatile unsigned sourceAddress, ap_uint<128> *key_in, ap_uint<128> *iv,
         volatile unsigned destinationAddress, unsigned int numBytes, int mode, mem_stream32 &s_in, mem_stream32 &s_out);
The problem
Let’s examine the problem
[Diagram: App X software on the Processor, App X hardware on the FPGA, with I/O attached]
• First, various interconnects are needed
• Control signals and logic must also be placed
• The app may have complex inputs, or need to interact with other logic
Secure loading system
• A trusted system is booted with Secure Boot
• It includes a static module that reconfigures slots
• This module only allows signed modules into slots that access sensitive resources
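The loading rule in the last bullet amounts to a simple policy check. A minimal sketch, assuming each slot is tagged (in the trusted static design) as having or not having access to sensitive resources, and that signature verification has already produced a yes/no answer; `SlotInfo` and `may_load` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical per-slot property recorded in the trusted static design.
struct SlotInfo {
    bool sensitive;  // true if the slot can reach sensitive resources
};

// Slots with sensitive access accept only signed modules;
// slots without it accept any module.
bool may_load(const SlotInfo& slot, bool module_signed) {
    return !slot.sensitive || module_signed;
}
```

Because this check lives in the static module loaded via Secure Boot, a compromised OS cannot bypass it to place an unsigned module in a sensitive slot.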
Our solution
• Builds on prior research…
• …but in a way that is compatible with vendor tools
• To do this, we leverage the deployment model for mobile apps