nabla containers: a new approach to container isolation · 2019-05-15 · making and running a...
![Page 1: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/1.jpg)
Nabla containers: a new approach to container isolation
Brandon Lum, Ricardo Koller, Dan Williams, Sahil Suneja
IBM Research
https://nabla-containers.github.io
KubeCon China 2018
![Page 2: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/2.jpg)
Containers are not securely isolated
![Page 3: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/3.jpg)
Containers are not securely isolated
- What does this exactly mean?
- Why are VMs considered secure but not containers?
- How do we improve container isolation?
![Page 4: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/4.jpg)
Overview
• Threat model: isolation
• Isolation through surface reduction
• Our approach: Nabla
• Measuring isolation
• Nabla vs VMs?
![Page 5: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/5.jpg)
What does it mean to be isolated?
• Containers that are co-located should not be able to access the data of another
• Scenarios:
  • Horizontal attacks from vulnerable services
  • Container-native multi-tenant cloud
[Diagram: two containers on a shared kernel, one running the attacker and one running Service A, which holds a secret]
![Page 6: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/6.jpg)
Container Isolation Reality
• Containers == namespaced processes → kernel exploits mostly work
  • Sep 2018: CVE-2018-14634
  • Dirty COW (CVE-2016-5195)
  • Many more (CVE database), 2018: code execution (3), memory corruption (8)
• Horizontal attack possible via shared privileged component (kernel)
[Diagram: the attacker container exploits the shared kernel via syscalls to reach the secret held by Service A]
![Page 7: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/7.jpg)
Dirty COW
• Dirty COW exploit sketch:
  • mmap a page
  • Create a thread that invokes madvise
  • Create a thread that reads/writes procfs
  • Triggers a race condition in kernel memory management code

    // FROM: https://dirtycow.ninja/
    map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, f, 0);
    printf("mmap %zx\n\n", (uintptr_t) map);
    /* You have to do it on two threads. */
    pthread_create(&pth1, NULL, madviseThread, argv[1]);      // madvise
    pthread_create(&pth2, NULL, procselfmemThread, argv[2]);  // R/W procfs
    /* You have to wait for the threads to finish. */
    pthread_join(pth1, NULL);
    pthread_join(pth2, NULL);
    return 0;
![Page 8: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/8.jpg)
Container Isolation Reality
[Diagram: attacker and Service A containers on a shared kernel; the attacker has reached Service A's secret]
![Page 9: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/9.jpg)
Kernel Footprint
[Diagram: the application reaches the kernel (FS, disk) through >300 syscalls]
• Exploits target vulnerable parts of the kernel via syscalls.
• If we restrict the number of syscalls:
  • → Fewer reachable kernel functions
  • → Fewer potential vulnerabilities
  • → Fewer possible exploits
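The chain of reasoning on this slide (fewer syscalls, fewer reachable kernel functions) can be illustrated with a small reachability sketch. The call graph below is a made-up toy: the function names are hypothetical stand-ins, not real kernel symbols, and the counts mean nothing beyond showing the shape of the argument.

```python
from collections import deque

# Hypothetical toy call graph: each entry point or function maps to the
# functions it may call. Names are illustrative, not real kernel symbols.
CALL_GRAPH = {
    "sys_read": ["vfs_read"],
    "sys_write": ["vfs_write"],
    "sys_madvise": ["madvise_dontneed"],
    "sys_mount": ["do_mount"],
    "vfs_read": ["fs_read_iter"],
    "vfs_write": ["fs_write_iter"],
    "madvise_dontneed": ["zap_page_range"],
    "do_mount": ["fs_fill_super"],
}


def reachable_functions(allowed_syscalls):
    """Return every function reachable from the allowed syscall entry points."""
    seen = set()
    queue = deque(allowed_syscalls)
    while queue:
        fn = queue.popleft()
        if fn in seen:
            continue
        seen.add(fn)
        queue.extend(CALL_GRAPH.get(fn, []))
    return seen


all_syscalls = ["sys_read", "sys_write", "sys_madvise", "sys_mount"]
restricted = ["sys_read", "sys_write"]

print(len(reachable_functions(all_syscalls)), "functions reachable without a filter")
print(len(reachable_functions(restricted)), "functions reachable with the restricted set")
```

In this toy graph, a bug in `zap_page_range` can only be triggered if `sys_madvise` is allowed, which is the point the slide makes about restricting the syscall interface.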
![Page 10: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/10.jpg)
Docker Default Seccomp Policy
[Diagram: the application reaches the kernel (FS, disk) through ~280 syscalls; labels: "44 syscalls", "seccomp (whitelisting policy)", "Greyed: unreachable functions"]
• Docker default seccomp policy
  • Disables around 44 system calls out of 300+.
• Generic seccomp policies are hard to create such that they are secure
• Syscall profiling is mostly heuristic-based
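A whitelisting policy of the kind this slide describes can be sketched as a lookup: deny by default, allow only what is listed. The profile below borrows the `defaultAction` / `syscalls` / `names` / `action` layout of Docker-style seccomp JSON profiles, but it is a heavily simplified stand-in: the four allowed names are an arbitrary illustration, and argument filters, architectures, and capability-dependent rules are left out. `action_for` is a hypothetical helper, not part of any Docker API.

```python
# Simplified stand-in for a Docker-style seccomp profile: deny by default,
# allow only the listed names. The name list is an arbitrary illustration.
PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {
            "names": ["read", "write", "openat", "close"],
            "action": "SCMP_ACT_ALLOW",
        },
    ],
}


def action_for(profile, syscall_name):
    """Return the action the profile assigns to a syscall (first match wins)."""
    for rule in profile["syscalls"]:
        if syscall_name in rule["names"]:
            return rule["action"]
    # Anything not explicitly listed falls through to the default action.
    return profile["defaultAction"]


for name in ("read", "kexec_load"):
    print(name, "->", action_for(PROFILE, name))
```

The difficulty the slide points at is choosing the name list: a generic profile has to allow whatever any application might need, so it ends up large.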
![Page 11: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/11.jpg)
Nabla
[Diagram: the application runs on a LibOS that offers the original 300+ syscall interface*; behind seccomp, only 7 syscalls reach the kernel (FS, disk)]
• Deterministic and generic seccomp policy
• Only 7 syscalls!
• Uses LibOS techniques
![Page 12: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/12.jpg)
Nabla
• Taking unikernel ideas and putting them into containers
• Using tools/technologies from the rumprun and solo5 community
• Modify unikernel to work as a process
“Unikernels as Processes” (ACM SoCC ’18)
(https://dl.acm.org/citation.cfm?id=3267845)
![Page 13: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/13.jpg)
Making and running a Nabla
• Build app. with custom build process*
• Nabla runtime, runnc, loads the nabla binaries and sets up seccomp profiles
*Current limitation of the build process; we are investigating ways to remove the need for a custom build process
[Diagram: an application (>300 syscalls) goes through the build process* to become a Nabla binary (application + LibOS). Under the container runtime, runc runs ordinary applications, while runnc runs Nabla binaries, each an application on a LibOS behind seccomp with 7 syscalls]
![Page 14: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/14.jpg)
Demo
![Page 15: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/15.jpg)
strace/ftrace measurements (low is good)
[Diagram: the application reaches the kernel (FS, disk) through >300 syscalls; kernel functions are drawn as boxes]
ftrace measures the number of boxes touched.
strace measures syscalls invoked.
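As a rough illustration of the two metrics, the sketch below counts distinct syscall names in strace-style output and distinct function names in ftrace function-tracer output. The sample traces are invented for the example, and the regular expressions assume the common line shapes (`name(args) = ret` for strace, `: function <-caller` for ftrace); real traces have more variants, such as unfinished/resumed calls, that this ignores. This is not the project's measurement tooling.

```python
import re

# strace line: optional "[pid N] " or "N " prefix, then "name(".
STRACE_CALL = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")
# ftrace function tracer line: "...: function_name <-caller".
FTRACE_FUNC = re.compile(r":\s+([\w.]+)\s+<-")


def unique_syscalls(strace_text):
    """Distinct syscall names invoked, ignoring signal and exit lines."""
    names = set()
    for line in strace_text.splitlines():
        match = STRACE_CALL.match(line)
        if match:
            names.add(match.group(1))
    return names


def unique_kernel_functions(ftrace_text):
    """Distinct kernel functions touched (the 'boxes' in the diagram)."""
    return {match.group(1) for match in FTRACE_FUNC.finditer(ftrace_text)}


# Invented sample data, for illustration only.
SAMPLE_STRACE = """\
read(3, "hello", 5) = 5
write(1, "hello", 5) = 5
[pid 4242] write(1, "again", 5) = 5
+++ exited with 0 +++
"""

SAMPLE_FTRACE = """\
 app-4242  [000] ....  100.000001: vfs_read <-ksys_read
 app-4242  [000] ....  100.000002: vfs_write <-ksys_write
 app-4242  [000] ....  100.000003: vfs_write <-ksys_write
"""

print(sorted(unique_syscalls(SAMPLE_STRACE)))
print(sorted(unique_kernel_functions(SAMPLE_FTRACE)))
```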
![Page 16: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/16.jpg)
ftrace measurements (lower is better)
[Chart: ftrace measurements for Kata Containers (VMs) and Nabla]
What does this say about our isolation vs VMs?
![Page 17: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/17.jpg)
Have we surpassed VM isolation?
• We explored and contested this idea in our paper:
  “Say Goodbye to Virtualization for a Safer Cloud” (USENIX HotCloud 2018)
  (https://www.usenix.org/conference/hotcloud18/presentation/williams)
• Maybe… But several questions:
  • Implementation-specific comparisons? KVM vs other hypervisors
  • Hardware-inclusive threat model (Spectre/Meltdown, etc.)
  • Other metrics
![Page 18: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/18.jpg)
What’s Next?
• We want to engage the community:
  • Development work for runnc/nabla-base-build/nabla-demo-apps
  • Remove need to rebuild nabla containers (support for dynamic linking LibOS)
  • Create new images and more language support for applications
• Chime in on improving security analysis/metrics
  • https://github.com/nabla-containers/nabla-measurements
![Page 19: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/19.jpg)
Thank You!
https://nabla-containers.github.io
Brandon Lum (@lumjjb), [email protected]
#NablaContainers
![Page 20: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/20.jpg)
Backup
![Page 21: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/21.jpg)
ftrace measurements (lower is better)
[Diagram: the application reaches the kernel (FS, disk) through >300 syscalls; kernel functions are drawn as boxes]
Measuring the number of boxes touched.
![Page 22: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/22.jpg)
Throughput (higher is better)
![Page 23: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/23.jpg)
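Demo
[Diagram: Kubernetes integration. The Kubelet talks CRI to cri-containerd/containerd, which pulls images from the image registry (OCI image spec) and runs containers (OCI runtime spec) through a container runtime, runc or runnc. Networking goes through CNI and a CNI plugin. Other config comes from the podSpec, i.e. mounts, security, etc.]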
![Page 24: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/24.jpg)
Inside a Nabla container
• Unmodified user code (e.g., Node.js, redis, nginx, etc.)
• Rumprun library OS
  • Unmodified NetBSD code + some glue
  • Runs on thin Solo5 unikernel interface
• Nabla Tender
  • Setup of seccomp policy
  • Translates Solo5 calls to system calls
[Diagram: layered stack of Application, Libc, Rumprun glue, NetBSD (FS, TCP/IP, …), Solo5, and 𝛁 Tender, shown next to a box labelled "Original Container"]
![Page 25: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/25.jpg)
Backup: Containers vs VMs
![Page 26: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/26.jpg)
Overview
• Threat model: isolation
• What makes VMs isolated?
• Nabla: how do we get those isolation properties without overhead?
Disclaimer: In this talk, we are doing a 1:1 comparison. Defense in depth is a valid discussion with a different set of trade-offs.
![Page 27: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/27.jpg)
Containers vs VMs
• Containers: Process ☠ on the host kernel
  • High-level interface (syscalls): filesystem interface, socket interface, etc.
• VMs: Guest OS ☠ on the hypervisor (+ host kernel (root))
  • Low-level interface (VT): block device interface, TAP interface, etc.
![Page 28: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/28.jpg)
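Containers vs VMs
[Diagram: side-by-side stacks for a container process and for a guest application with its guest OS, each showing the interface, the infrastructure below it, and where the FS and disk sit]
A LOT more exploitable code in the infrastructure!!!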
![Page 29: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/29.jpg)
Lower level interface
Less code
Fewer vulnerabilities
Stronger isolation
![Page 30: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/30.jpg)
![Page 31: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/31.jpg)
Kernel functions accessed by applications
• Compared to standard containers:
  • 5-6x fewer kernel functions accessed
  • 8-14x fewer syscalls
• About half the number of kernel functions accessed as VMs!
[Chart: unique kernel functions accessed (0 to 1600) for nginx, nginx-large, node-express, redis-get, and redis-set, comparing process (container), ukvm (VM), and nabla]
![Page 32: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/32.jpg)
Accessible kernel functions under Nabla policy
[Chart: unique kernel functions (0 to 700) over an x-axis running 0 to 300, for the accept, nabla, and block policies, with a zoomed inset (0 to 30 over 0 to 10)]
• Trinity kernel fuzz tester to try to access as much of the kernel as possible
• Nabla policy reduces the amount of accessible kernel functions by 98%
![Page 33: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/33.jpg)
Unikernel isolation comes from the interface
• Direct mapping between 10 hypercalls and system call/resource pairs
  • 6 for I/O
    • Network: packet level
    • Storage: block level
  • vs. >350 syscalls

| Hypercall | System call | Resource |
|---|---|---|
| walltime | clock_gettime | |
| puts | write | stdout |
| poll | ppoll | net_fd |
| blkinfo | | |
| blkwrite | pwrite64 | blk_fd |
| blkread | pread64 | blk_fd |
| netinfo | | |
| netwrite | write | net_fd |
| netread | read | net_fd |
| halt | exit_group | |
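The hypercall table on this slide can be read as a lookup from hypercall to a (system call, resource) pair. The sketch below transcribes it into a dictionary under one assumption: that the flattened rows pair up in order, with blkinfo and netinfo having no system call of their own. `allowed_syscalls` and `is_permitted` are hypothetical helpers for illustration, not the tender's actual code.

```python
# Hypercall -> (system call, resource), transcribed from the slide's table.
# None means the slide lists no system call or resource for that entry.
HYPERCALL_MAP = {
    "walltime": ("clock_gettime", None),
    "puts": ("write", "stdout"),
    "poll": ("ppoll", "net_fd"),
    "blkinfo": (None, None),
    "blkwrite": ("pwrite64", "blk_fd"),
    "blkread": ("pread64", "blk_fd"),
    "netinfo": (None, None),
    "netwrite": ("write", "net_fd"),
    "netread": ("read", "net_fd"),
    "halt": ("exit_group", None),
}


def allowed_syscalls(mapping):
    """Distinct system calls the mapping needs at all."""
    return {syscall for syscall, _ in mapping.values() if syscall is not None}


def is_permitted(mapping, syscall, resource):
    """True only if this exact (system call, resource) pair appears in the mapping."""
    return syscall is not None and (syscall, resource) in set(mapping.values())


print(sorted(allowed_syscalls(HYPERCALL_MAP)))
print(is_permitted(HYPERCALL_MAP, "write", "stdout"))
print(is_permitted(HYPERCALL_MAP, "write", "blk_fd"))
```

Under this reading, the distinct system calls come to seven, which lines up with the "7 syscalls" figure used elsewhere in the deck. Pairing each call with a resource is what makes the policy tighter than a plain name list: `write` is allowed to stdout and the network descriptor, but not to the block device.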
![Page 34: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/34.jpg)
SOCC
![Page 35: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/35.jpg)
Implementation: nabla 𝛁
• Extended Solo5 unikernel ecosystem and ukvm
• Prototype supports:
  • MirageOS
  • IncludeOS
  • Rumprun
• https://github.com/solo5/solo5
![Page 36: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/36.jpg)
Measuring isolation: common applications
[Chart: unique kernel functions accessed (0 to 1600) for nginx, nginx-large, node-express, redis-get, and redis-set, comparing process, ukvm, and nabla]
• Code reachable through the interface is a metric for attack surface
• Used kernel ftrace
• Results:
  • Processes: 5-6x more
  • VMs: 2-3x more
![Page 37: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/37.jpg)
Measuring isolation: fuzz testing
[Chart: unique kernel functions (0 to 700) over an x-axis running 0 to 300, for the accept, nabla, and block policies, with a zoomed inset (0 to 30 over 0 to 10)]
• Used kernel ftrace
• Used trinity system call fuzzer to try to access more of the kernel
• Results:
  • Nabla policy reduces by 98% over a “normal” process
![Page 38: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/38.jpg)
Measuring performance: throughput
[Chart: normalized throughput (80% to 200%, with one bar labelled 245) for ukvm, nabla, and QEMU/KVM, grouped into "no I/O" and "with I/O" benchmarks: py_tornado, py_chameleon, node_fib, mirage_HTTP, py_2to3, node_express, nginx_large, redis_get, redis_set, includeos_TCP, nginx, includeos_UDP]
• Applications include:
  • Web servers
  • Python benchmarks
  • Redis
  • etc.
• Results:
  • 101%-245% higher throughput than ukvm
![Page 39: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/39.jpg)
Measuring performance: CPU utilization
[Chart: (a) CPU %, (b) VM exits/ms, and (c) IPC (ins/cycle), each plotted against requests/sec (0 to 20000) for nabla and ukvm]
• vmexits have an effect on instructions per cycle
• Experiment with MirageOS web server
• Results:
  • 12% reduction in CPU utilization over ukvm
![Page 40: Nabla containers: a new approach to container isolation · 2019-05-15 · Making and running a Nabla • Build app. with custom build process* • Nablaruntime, runncloads the nablabinaries](https://reader030.vdocuments.mx/reader030/viewer/2022040303/5e8f43583f87dd461749c839/html5/thumbnails/40.jpg)
Measuring performance: startup time
[Chart: startup time for the "Hello world" and "HTTP POST" workloads against number of cores (2 to 16), with panels for QEMU/KVM, ukvm, nabla, and process]
• Startup time is important for serverless, NFV
• Results:
  • Ukvm has 30-370% higher latency than nabla
  • Mostly due to avoiding KVM overheads