kpatch without stop machine - linux foundation

43
© Hitachi, Ltd. 2014. All rights reserved. Kpatch Without Stop Machine The Next Step of Kernel Live Patching Masami Hiramatsu <[email protected]> Linux Technology Research Center Yokohama Research Lab. Hitachi Ltd., LinuxCon NA 2014 (Aug. 2014) 1

Upload: others

Post on 16-Oct-2021

8 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Kpatch Without Stop MachineThe Next Step of Kernel Live Patching

Masami Hiramatsu<[email protected]>Linux Technology Research CenterYokohama Research Lab. Hitachi Ltd.,

LinuxCon NA 2014(Aug. 2014)

1

Page 2: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Speaker

• Masami Hiramatsu– A researcher, working for Hitachi

• Researching many RAS features

– A linux kprobes-related maintainer• Ftrace dynamic kernel event (a.k.a. kprobe-tracer)• Perf probe (a tool to set up the dynamic events)• X86 instruction decoder (in kernel)

2

Page 3: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Agenda: Kpatch Without Stop Machine

Background Story

Kpatch internal

Kpatch without stop_machine

Conclusion and Discussion

Note: this presentation is only focusing on the kernel-module side of kpatch.

More generic design and implementation, please attend to -

kpatch: Have Your Security And Eat It Too! – Josh Poimboeuf

Aug. 22. pm2:30

3

Page 4: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Agenda: Kpatch Without Stop Machine

Background Story What is kpatch?

Live patching requirements

Major updates and Minor updates

Kpatch internal Kpatch Overview

Active Safeness check

Stop_machine

Kpatch without stop_machine Live Patcing Rules

Kpatch Reference Counter

Safeness check without stop_machine

Conclusion and Discussion

4

Page 5: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

What’s Kpatch?

• Kpatch is a LIVE patching function for kernel– This applys a binary patch to kernel on-line– Patching is done without shutdown

• Only for a small and critical issues– Not for major kernel update

5

Page 6: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Why Live Patching Required?

• Live patching is important for appliances for mission critical systems– Some embedded appliances are hard to maintain

frequently• Those are distributed widely in country side• Not in the big data center!

– Some appliances can’t accept 10ms downtime• Factory control system etc.

6

Page 7: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

But for Major Updates?

• M.C. systems have periodic maintenance– Major fixes can be applied and rebooted– In between the maintenance, live patching will be used

• Live patching and major update are complementeach other– Live patching temporarily fixes small critical incidents– Major update permanently fixes all bugs

7

Planned Maintenance

Major UpdateLive Patching

criticalminor

Page 8: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

The History of Live patching on Linux

• Live patching is not new

8

Pannus Live Patch (2004-2006)http://pannus.sourceforge.net/

Livepatch (2005)http://ukai.jp/Slides/2005/1202-b2con/mop/livepatch.html

ksplice (2007-)https://www.ksplice.com/

kGraft (2014)

kpatch (2014)

2004

For application

For kernel

2014

Use ftrace to replace functionWill be supported by major distributors

Build from scratchAcquired and supported by Oracle

Developed for CGLNo distribution support

Ptrace and mmap based one

Page 9: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Agenda: Kpatch Without Stop Machine

Background Story What is kpatch?

Live patching requirements

Major updates and Minor updates

Kpatch internal Kpatch Overview

Active Safeness check

Stop_machine

Kpatch without stop_machine Live Patcing Rules

Kpatch Reference Counter

Safeness check without stop_machine

Conclusion and Discussion

9

Page 10: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Kpatch: Overview

• Kpatch has 2 components– Kpatch build: Build a binary patch module– Kpatch.ko: The kernel module of Kpatch

10

Originalsrc

Patchedsrc

Patch

Build Currentvmlinux

Build Patchedvmlinux

Per-functiondiff

Patch module

A module whichcontains new

functions

Done by Kpatch build

Runninglinux

Kpatch.ko

Done by kpatch.ko

Do patchNew functions

On old functions

Today’s talk

Page 11: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Kpatch: How to Patch

• Kpatch uses Ftrace to patch– Hook the target function entry with registers– Change regs->ip to new function (change the flow)

11

Call foo()Call foo()

Call fentryCall fentrySave RegsSave Regs

Call ftrace_opsCall ftrace_ops Get new IP fromHash table

Get new IP fromHash table

Restore RegsRestore RegsChange regs->ipChange regs->ip

ReturnReturn

……

ReturnReturn

……

foo()

New foo()Ret to regs->ipRet to regs->ip

……

Ftrace hooksFoo() entry

Find the new IP from hashtable

Ftrace returnsTo new foo()

ReturnReturn

Foo() is called evenAfter patching.

Function pointer isAvailable

ftrace

kpatch.ko

Page 12: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Conflict of Old and New Functions

• Kpatch will update the execution path of a function– Q: What happen if the patched function is under

executed?– A: Old and new functions are executed at the

same time

!!This should not happen!!

• Kpatch ensures the old functions are not executed when patching– “Active Safeness Check”

12

Page 13: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Active Safeness Check

• Executing functions are on the stack• And IP register points current function too

• Active Safeness Check– Do stack dump to check the target functions are

not executed, for each thread.– Need to be done when the process is stopped.

– stop_machine is used13

Func1Func1

Func2Func2

Func3Func3 Func1+XX

Func2+YY

Stack at here

Func1+XX

Stack at here

Page 14: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Active Safeness Check With Stop_machine

• Kpatch uses stop_machine to check stacks

14

Time

PatchingProcess

Process1

Process2

Process3

AddFtrace entry

Call old func1

Call old func2

Page 15: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Active Safeness Check With Stop_machine

• Kpatch uses stop_machine to check stacks

15

Time

PatchingProcess

Process1

Process2

Process3

AddFtrace entry

Call old func1

Call old func2

All running processes and interrupts are stopped

Call Stop Machine

Page 16: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Active Safeness Check With Stop_machine

• Kpatch uses stop_machine to check stacks

16

Time

PatchingProcess

Process1

Process2

Process3

AddFtrace entry

SafenessCheck

Call old func1

Call old func2

Walk through the allThread and check

Old funcs on stacks

All running processes and interrupts are stopped

Call Stop Machine

Page 17: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Active Safeness Check With Stop_machine

• Kpatch uses stop_machine to check stacks

17

Time

PatchingProcess

Process1

Process2

Process3

AddFtrace entry

Update hashtable

SafenessCheck

Call old func1

Call old func2

Call New func2

Call New func2

Call New func1

Call New func1

Walk through the allThread and check

Old funcs on stacks

Now switch toNew functionsAll running processes and

interrupts are stopped

Call Stop Machine Return

Page 18: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Stop_machine: Pros and Cons

• Pros– Safe, simple and easy to review, Good for the 1st

version

• Cons– Stop_machine stops all processes a while

• It is critical for control/network appliances

– In virtual environment, this takes longer time• We need to wait all VCPUs are scheduled on the host

machine

18

Page 19: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Agenda: Kpatch Without Stop Machine

Background Story What is kpatch?

Live patching requirements

Major updates and Minor updates

Kpatch internal Kpatch and Ftrace

Kpatch and Safeness check

Stop_machine

Kpatch without stop_machine Live Patching Rules

Kpatch Reference Counter

Safeness check without stop_machine

Conclusion and Discussion

19

Page 20: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Live Patching Rules

• Live patching must follow the rules1. All the new functions in a patch must be applied

at once● We need an atomic operation

2. After switching new function, the old function must not be executed● We have to ensure no threads runs on old

functions● And no threads sleeps on them

20

Page 21: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Two actions for the solution

1. Introduce an atomic reference counter

2. Active safeness check at the context switch

21

Page 22: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Atomic Function Reference Counter

1. Introduce an atomic reference counter– Without stop_machine, functions can be called

while patching• Ensure no one actually runs functions -> refcounter• Increment the refcounter at entry• Decrement the refcounter at exit

– If refcounter is 0, update ALL function paths• We are sure there is no users

22

Page 23: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Kpatch Reference Counter

• Patching(switching) controlled by refcountTime

PatchingProcess

Process1

Process2

Process3

AddFtrace entry+1

Start patcingRefcnt = 1

Page 24: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Kpatch Reference Counter

• Patching(switching) controlled by refcountTime

PatchingProcess

Process1

Process2

Process3

AddFtrace entry

Prepare newHash table

SafenessCheck

Call old func1

Call old func2

Call old func1

+1 +1

+1

-1-1

-1

+1

Start patcingRefcnt = 1

Each function callInc/dec refcnt

While patching

Page 25: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Kpatch Reference Counter

• Patching(switching) controlled by refcount

25

Time

PatchingProcess

Process1

Process2

Process3

AddFtrace entry

Prepare newHash table

SafenessCheck

Call old func1

Call old func2

Call old func3

Call old func3

Call old func1

+1 +1

+1

+1

+1

-1

-1-1

-1

-1

+1

Patching processIs over, but refcnt

Is not 0

Start patcingRefcnt = 1

Each function callInc/dec refcntWhile patching

Page 26: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Kpatch Reference Counter

• Patching(switching) controlled by refcount

26

Time

PatchingProcess

Process1

Process2

Process3

AddFtrace entry

Prepare newHash table

SafenessCheck

Call old func1

Call old func2

Call old func3

Call old func3

Call old func1

Call New func2

Call New func2

Call New func1

Call New func1

+1 +1

+1

+1

+1

-1

-1-1 -1

-1

-1

+1

Patching processIs over, but refcnt

Is not 0

Refcnt is 0Patch enabled on

all funcs

Start patcingRefcnt = 1

Don’t countrefcnt anymore

Each function callInc/dec refcntWhile patching

Page 27: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Kpatch Reference Counter (cont.)

• Control the reference counter– Need to stop counting before and after patching– Use atomic_inc_not_zero/dec_if_positive

• These are stopped automatically if counter == 0

27

func call

00

Do not hook functionEntry/exit before patching

refcnt

Page 28: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Kpatch Reference Counter (cont.)

• Control the reference counter– Need to stop counting before and after patching– Use atomic_inc_not_zero/dec_if_positive

• These are stopped automatically if counter == 0

28

1 2 1 0

atomic_dec_if_positiveatomic_inc_not_zerofunc call

atomic_incforcibly inc refcnt Patching

atomic_dec

func call

00refcnt

Do not hook functionEntry/exit before patching

Page 29: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Kpatch Reference Counter (cont.)

• Control the reference counter– Need to stop counting before and after patching– Use atomic_inc_not_zero/dec_if_positive

• These are stopped automatically if counter == 0

29

1 2 1 0 0 0

atomic_dec_if_positive

->do nothing

atomic_inc_not_zero-> do nothing

func call

atomic_dec_if_positiveatomic_inc_not_zerofunc call

atomic_incPatching

atomic_dec

func call

0

atomic_dec_if_positive

->do nothing

0refcnt

Do not hook functionEntry/exit before patching

Page 30: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Active Safeness check without stop_machine

2. Active safeness check at the context switch– To find threads sleeping(or going to sleep) on the

functions– For all running processes, hook the context

switch and check stack entries safely.– For the sleeping tasks, we can check it safely.

30

Page 31: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Safeness Check without Stop_machine

• 2 stages safeness checking

31

PatchingProcess

stack check if task is not running

rcu

lock

rcu unlo ck

check over all threads

List upRunningthreads

TimeStage1: Check sleeping tasks

Running process A

Running process B

wait on runqRun

Run

context switch

wait on runq

Page 32: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Safeness Check without Stop_machine

• 2 stages safeness checking

32

PatchingProcess

stack check if task is not running

rcu

lock

rcu unlo ck

check over all threadsrunning pid

running pidrunning pid

List upRunningthreads

hook

stack checkfor current process

Update

Wait for all runningtask is checked

Time

Stage1: Check sleeping tasks Stage2: Check running tasks

Running process A

Running process B

wait on runqRun

Run

context switch

wait on runq

Page 33: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

How to add refcnt and context switch hook?

• To hook the function entry/return– Use kretprobe to hook it– For each function entry/return, inc/dec refcount

• To hook the context switch– Use kprobe to hook it– Do safeness check (on stack) and update running pid list

• Both are dynamic probe– After checking the safeness, all kretprobes/kprobes are r

emoved from the target functions– We have minimal overhead

33

Page 34: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Demo

• Demonstrating kpatching with/without stop_machine– Using ftrace to trace stop_machine()

34

(Setup ftrace)# echo stop_machine > /sys/kernel/debug/tracing/set_ftrace_filter# echo function_graph > /sys/kernel/debug/tracing/current_tracer

(Run the kpatch)# kpatch load kpatch-test-patch.ko

(Check the result)# echo 0 > /sys/kernel/debug/tracing/tracing_on# cat /sys/kernel/debug/tracing/trace

(Setup ftrace)# echo stop_machine > /sys/kernel/debug/tracing/set_ftrace_filter# echo function_graph > /sys/kernel/debug/tracing/current_tracer

(Run the kpatch)# kpatch load kpatch-test-patch.ko

(Check the result)# echo 0 > /sys/kernel/debug/tracing/tracing_on# cat /sys/kernel/debug/tracing/trace

Page 35: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Demo result

• With stop_machine

• Without stop_machine

35

cat trace# tracer: function_graph## CPU DURATION FUNCTION CALLS# | | | | | | | 0) ! 6410.455 us | stop_machine();

cat trace# tracer: function_graph## CPU DURATION FUNCTION CALLS# | | | | | | | 0) ! 6410.455 us | stop_machine();

cat trace# tracer: function_graph## CPU DURATION FUNCTION CALLS# | | | | | | |

cat trace# tracer: function_graph## CPU DURATION FUNCTION CALLS# | | | | | | |

No stop_machine is executed

Page 36: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Agenda: Kpatch Without Stop Machine

Background Story What is kpatch?

Live patching requirements

Major updates and Minor updates

Kpatch internal Kpatch Overview

Kpatch and Safeness check

Stop_machine

Kpatch without stop_machine Live Patcing Rules

Kpatch Reference Counter

Safeness check without stop_machine

Conclusion and Discussion

36

Page 37: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Conclusion

• Succeed to get rid of stop_machine() from kpatch– This is a proof of concept of stop_machine-free

kpatch• This means kpatch CAN BE ready for mission critical

systems• But still under discussion stage

• Upstreaming could be a long way– At first, push current stop_machine-based kpatch

to upstream– Stop_machine-free will be the next step

37

Page 38: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Current Issues

• Possible to miscount the function reference– Kretprobe has no error notification

• Kretprobe can be failed to handle the function return because of return-address buffer shortage

• Possible to fail patching with big patch– We have to monitor all the functions are safe in the patch– Big patch has many patched functions

• Some of them can be always used in the system• Incremental patching could be better

• Module unloading using stop_machine– This will happen if we replace old patch with new one– Incremental patching can avoid this.

38

Page 39: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Future work

• Use a generic return-hook mechanism– Kretprobe is for PoC, not for general use

• It can’t detect the failure of hooking

– Should be more safe (e.g. miss-hook handler)

• Context switch hook can be more general– Tracepoint/traceevent makes it better.– This requires kpatch as embedded feature

• Upstreaming

39

Page 40: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Resources and Links

• Get rid of the stop_machine from kpatch• https://github.com/dynup/kpatch/issues/138

• My no-stopmachine branch• https://github.com/mhiramathitachi/kpatch/tree/no-stop

machine-v1

– This requires IPMODIFY flag patchset for kernel• http://thread.gmane.org/gmane.linux.kernel/1757201

40

Page 41: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Questions?

Page 42: Kpatch Without Stop Machine - Linux Foundation
Page 43: Kpatch Without Stop Machine - Linux Foundation

© Hitachi, Ltd. 2014. All rights reserved.

Trademarks

43

• Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

• Other company, product, or service names may be trademarks or service marks of others.