Deadlock Detection in Linux By: Gal Nave and Dan Slov Supervisor: Dmitri Perelman Technion, Electrical engineering department NSSL Laboratory


Post on 22-Feb-2016


Page 1: Deadlock Detection in Linux

Deadlock Detection in Linux

By: Gal Nave and Dan Slov

Supervisor: Dmitri Perelman

Technion, Electrical engineering department

NSSL Laboratory

Page 2: Deadlock Detection in Linux

Overview

• Introduction

• Deadlock detection algorithm

• Algorithm implementation in Linux

• Summary

Page 3: Deadlock Detection in Linux

Introduction

Parallel vs Serial computation

• An evolution of serial computing

• Parallel computing solves larger problems

• Provides concurrency

• "Faster, bigger, better, more" principle

Page 4: Deadlock Detection in Linux

Introduction (cont.)

Parallel computing pitfalls

• Synchronization

• Ordering

• DEADLOCKS

• We will concentrate on deadlocks from now on

Page 5: Deadlock Detection in Linux

Problem definition

A deadlock is a condition in which two or more processes (or threads) are each waiting for another to release a resource, so that the waits form a circular chain.

Page 6: Deadlock Detection in Linux

Project Definition:

• Design and implement a deadlock detection mechanism in Linux

• Provide support for successful debugging:

  - backtrace

  - dependency description

• Evaluate the performance overhead of the solution

Page 7: Deadlock Detection in Linux

The Algorithm

Deadlock detection algorithm by M. Herlihy and E. Koskinen, which they called:

DREADLOCKS

Page 8: Deadlock Detection in Linux

Deadlock Detection Algorithm

• The algorithm exploits the fact that no useful work is done during a busy wait

• A thread that tries to lock a mutex checks whether it is already locked, using the atomic test-and-set operation (TSL)

• If the mutex is already locked, the thread waits for it to be unlocked (in other words, the thread spins on the lock)

• From time to time the thread tests the mutex again using TSL:

1. Test-and-set

2. If the lock has already been set by another thread, wait

3. Repeat until TSL returns zero (the lock was free and is now set by this thread)
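The spin loop described above can be sketched with C11 atomics. This is only an illustration of the test-and-set idea, not the glibc/NPTL mutex code; the names tsl_lock_t, tsl_try_lock, tsl_lock and tsl_unlock are made up for this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A minimal test-and-set spinlock built on C11 atomic_flag. */
typedef struct {
    atomic_flag locked;   /* clear = free, set = held */
} tsl_lock_t;

#define TSL_LOCK_INIT { ATOMIC_FLAG_INIT }

/* One TSL attempt: returns true if this call took the lock. */
static bool tsl_try_lock(tsl_lock_t *l)
{
    /* test_and_set returns the previous value: false means the lock was free. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy-wait until the lock is acquired. */
static void tsl_lock(tsl_lock_t *l)
{
    while (!tsl_try_lock(l)) {
        /* spinning: no useful work is done between two attempts */
    }
}

/* Release the lock so that the next test-and-set succeeds. */
static void tsl_unlock(tsl_lock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The empty loop body in tsl_lock is the idle time that the algorithm proposes to reuse.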

Page 9: Deadlock Detection in Linux

Deadlock Detection Algorithm (cont.)

The idea:

• Why not use the time between two consecutive test-and-set operations to look for a deadlock?

THE BASIC ASSUMPTION OF THE ALGORITHM IS THAT THE TIME WHILE A THREAD IS SPINNING COULD BE USED FOR SOME USEFUL WORK – DEADLOCK DETECTION!!!

Page 10: Deadlock Detection in Linux

Deadlock Detection Algorithm (cont.)

• The lock algorithm now looks like this:

1. Try to lock the mutex using test-and-set

2. If the lock has already been set by another thread, try to detect a deadlock

3. If there is no deadlock, try to lock the mutex again using test-and-set; if there is a deadlock, alert the user

4. Repeat until TSL returns zero (the lock is set) or a deadlock is detected
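The four steps above could be sketched as a spin loop with a detection hook. The callback type, the function names, and the use of EDEADLK as the error code are assumptions made for this illustration; the slides do not show the actual glibc changes.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical callback: returns true if the caller is part of a deadlock. */
typedef bool (*deadlock_check_fn)(void *ctx);

/* Example callback for demonstration: reads the verdict from a bool. */
static bool check_from_flag(void *ctx)
{
    return *(const bool *)ctx;
}

/*
 * Spin on `flag` with test-and-set; between two attempts, run the check.
 * Returns 0 once the lock is taken, or EDEADLK (an assumed error code)
 * if the check reports a deadlock.
 */
static int lock_with_detection(atomic_flag *flag, deadlock_check_fn check, void *ctx)
{
    for (;;) {
        /* Steps 1 and 3: try to take the lock. */
        if (!atomic_flag_test_and_set_explicit(flag, memory_order_acquire))
            return 0;
        /* Step 2: the lock is held by someone else, look for a deadlock. */
        if (check(ctx))
            return EDEADLK;
        /* Step 4: no deadlock found, go around again. */
    }
}
```

Passing the check as a callback keeps the spinning logic separate from whatever data structure the detection uses.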

Page 11: Deadlock Detection in Linux

Deadlock Detection: how it gets done

• Each thread keeps a list of the processes/threads it is waiting for in order to acquire some resource (a mutex). Let's call this list the digest.

• A thread that tries to acquire a mutex that is already locked checks the owner's digest for its own TID.

Digest of A:

{}

Digest of B:

{}

Page 12: Deadlock Detection in Linux

Deadlock Detection: how it gets done

• If the TID is found in the mutex owner's digest, the thread is waiting for the mutex owner to release the lock while the owner is waiting for the thread itself to release another lock: a classic deadlock.

Digest of A:

{B}

Digest of B:

{A}

Page 13: Deadlock Detection in Linux

Deadlock Detection: how it gets done

• If the TID is not found, set the thread's digest to the union of its own digest and the mutex owner's digest.

• Keep spinning until the lock is acquired or a deadlock is detected

Digest of A:

{B}

Digest of B:

{}
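The digest check and union can be illustrated with digests stored as bitmasks. This is a single-threaded sketch under several assumptions: thread IDs are small integers (0 to 63), the owner's own TID is added to the waiter's digest (consistent with the A/B example above), and all names are invented here. The implementation described in these slides uses a linked list instead.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_THREADS 64          /* assumption: small, fixed thread IDs 0..63 */

typedef uint64_t digest_set_t;  /* bit i set = waiting, directly or not, for thread i */

static digest_set_t digest[MAX_THREADS];   /* all digests start empty */

static digest_set_t bit(unsigned tid)
{
    return (digest_set_t)1 << tid;
}

/*
 * One detection step for thread `me` spinning on a mutex owned by `owner`.
 * Returns true if `me` appears in the owner's digest (deadlock); otherwise
 * merges the owner and the owner's digest into the digest of `me`.
 */
static bool digest_step(unsigned me, unsigned owner)
{
    if (digest[owner] & bit(me))
        return true;
    digest[me] |= bit(owner) | digest[owner];
    return false;
}

/* Called once the mutex is acquired: `me` no longer waits for anyone. */
static void digest_clear(unsigned me)
{
    digest[me] = 0;
}
```

With A = 0 and B = 1, digest_step(A, B) leaves the digest of A as {B}, and a following digest_step(B, A) reports the deadlock because B finds itself in the digest of A.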

Page 14: Deadlock Detection in Linux

Algorithm Implementation for Linux :

The Implementation is based upon:

• GLIBC GNU C library version 2.6

• NPTL Native POSIX Thread Library

• Any distribution of Linux supporting glibc 2.6

Page 15: Deadlock Detection in Linux

Implementation Details

• The three most important structures in the implementation are:

• thread struct

• mutex struct

• digest struct

• The details of each of these structs are on the next slides

Page 16: Deadlock Detection in Linux

Thread struct modification

• Each thread is described by a structure defined in descr.h

• The structure defining a thread includes fields such as the thread ID, attributes, etc.

• We added a field to the structure to hold the digest entry (the list of TIDs the thread is waiting for)

• Initially the digest is empty

• If a thread tries to acquire a mutex and sees that it is already locked, it scans the digest of the thread that owns the mutex for its own ID.

Page 17: Deadlock Detection in Linux

Digest struct

• The dependency list is implemented as a linked list.

• The digest of a thread that is waiting for a mutex has a field pointing at the mutex owner's digest.

[Figure: the digest of a thread that is waiting for a mutex points at the digest of the mutex-owning thread; the mutex-owning thread's digest may point to other digests.]

Page 18: Deadlock Detection in Linux

Digest structure detailed

• The digest structure is defined in pthread_digest.h:

typedef struct _digest_t {
    unsigned __tid;
    int __time_stamp;
    int __cnt;
    int __ref_cnt;
    int __is_alive;
    pthread_digest_p digest;
} pthread_digest_t;

• An explanation of each field's purpose follows

Page 19: Deadlock Detection in Linux

Digest structure detailed

• __tid specifies the thread ID of the thread owning the digest

• __time_stamp specifies the time of the last update of the digest

• __cnt counts the number of mutexes the thread holds

• __ref_cnt specifies the number of threads pointing at the digest

• digest points at the next digest (NULL if the thread is not waiting for a mutex)
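Scanning such a chain for the caller's own thread ID might look like the sketch below. The struct is a simplified stand-in with illustrative field names (only the TID and the link are kept), and the max_hops bound is an addition of this sketch to keep the walk finite if the chain already loops without containing the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the digest node; field names are illustrative. */
struct digest_node {
    unsigned tid;                 /* thread that owns this digest */
    struct digest_node *next;     /* digest of the thread it waits for, or NULL */
};

/*
 * Follow the chain starting at the mutex owner's digest and report whether
 * `self_tid` is reachable. `max_hops` bounds the walk in case the chain
 * contains a cycle that does not include `self_tid`.
 */
static bool digest_chain_contains(const struct digest_node *owner,
                                  unsigned self_tid, size_t max_hops)
{
    for (size_t hops = 0; owner != NULL && hops < max_hops; hops++) {
        if (owner->tid == self_tid)
            return true;          /* the owner waits, possibly indirectly, for us */
        owner = owner->next;
    }
    return false;
}
```

A thread about to spin on a mutex would call this with the owner's digest and its own TID; a true result corresponds to the deadlock case.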

Page 20: Deadlock Detection in Linux

Implementation Details (cont.)

Since the digest is an ADT, some methods should be added to allow stronger decoupling.

The most important actions on a digest are:

• append another thread's digest

• release the dependency list upon acquiring a mutex

• update the mutex owner

• compare the time stamps of two digests

• print the dependency list and the stack in case of a deadlock

Page 21: Deadlock Detection in Linux

The usage of digest ADT methods in the algorithm

1. Init_thread_digest()

2. Try to lock the mutex using test-and-set

3. If the lock has already been set by another thread:

   append_owner_digest()

   compare_time_stamp()

   if time_stamp is outdated: scan_digest()

   else: do nothing

4. If there is no deadlock, try to lock the mutex again using test-and-set; if there is a deadlock, print_dependency_list()

5. Repeat steps 2-4 until the lock is set or a deadlock is detected

6. If the mutex is acquired:

   release_dependency_list()

   update_mutex_owner()

Page 22: Deadlock Detection in Linux

Mutex structure

• How do we get a pointer to the digest of the thread that owns a mutex?

• The mutex struct needs to be modified!

• Each mutex is described by a structure defined in pthreadtypes.h

• An additional field is added to the mutex_t structure to hold the owner's digest entry

• When a thread acquires a mutex, it updates the owner digest field in the mutex structure to point to the thread's digest

Page 23: Deadlock Detection in Linux

What if deadlock is detected?

The following is done:

• Print dependency list to stderr

• Print backtrace to stderr

• Return deadlock_found error code
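A reporting routine along these lines could be sketched with glibc's backtrace() and backtrace_symbols_fd() from execinfo.h. The slides do not say which calls the modified library uses, so this is an assumption, as are the function name report_deadlock, its parameters, and the choice of EDEADLK as the returned error code.

```c
#include <errno.h>
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_FRAMES 32

/*
 * Print the dependency chain and a backtrace to stderr, then return an
 * error code. waiter[i] is waiting for holder[i]; `self` is the thread
 * that detected the deadlock.
 */
static int report_deadlock(const unsigned long *waiter,
                           const unsigned long *holder,
                           size_t n, unsigned long self)
{
    void *frames[MAX_FRAMES];
    int depth;

    for (size_t i = 0; i < n; i++)
        fprintf(stderr, "Thread ID %lu is waiting for thread ID %lu\n",
                waiter[i], holder[i]);
    fprintf(stderr, "Thread ID %lu has detected a deadlock...\nBacktrace\n", self);
    fflush(stderr);   /* make sure the text precedes the raw fd output below */

    /* backtrace_symbols_fd writes straight to the descriptor, without malloc. */
    depth = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);

    return EDEADLK;   /* assumed stand-in for the deadlock_found error code */
}
```

Function names appear in the backtrace only for symbols that are exported, for example when the program is linked with -rdynamic.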

Page 24: Deadlock Detection in Linux

What if deadlock is detected?

stderr example (real example from the test):

Thread ID 1090525520 is waiting for thread ID 1082132816
Thread ID 1082132816 is waiting for thread ID 1090525520
Thread ID 1090525520 has detected a deadlock...
Backtrace
/home/dmitri/deadlock_detection/glibc261_build/nptl/libpthread.so.0 [0x2afa80b53fbc]
/home/dmitri/deadlock_detection/glibc261_build/nptl/libpthread.so.0 [0x2afa80b54088]
/home/dmitri/deadlock_detection/glibc261_build/nptl/libpthread.so.0(pthread_mutex_lock+0x1db) [0x2afa80b4c6ab]
./simp_test(lock_mutex+0x15) [0x400d41]
./simp_test(thr_b_func+0x5b) [0x400e4c]

Page 25: Deadlock Detection in Linux

Problems and solutions

• Memory leakage

Description:

When a thread finishes its task and exits, all the memory it used is freed. But other threads may still point at its digest, which leaves them with dangling pointers.

Solution:

Add 2 additional fields to the digest structure:

• __is_alive, to delete the digest logically

• __ref_cnt, to count how many threads reference the digest

Free the memory only if __is_alive is false and __ref_cnt is zero.
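The freeing rule can be sketched as follows. The struct is a reduced stand-in with illustrative names, and the code ignores synchronization: a real multi-threaded version would need atomic updates or a lock around these fields.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified digest header; names are illustrative, not the patch's. */
struct digest_hdr {
    int  ref_cnt;    /* other digests currently pointing at this one */
    bool is_alive;   /* false once the owning thread has exited */
};

/*
 * Free the digest only when it is logically dead and unreferenced.
 * Returns true if the memory was released.
 */
static bool digest_try_free(struct digest_hdr *d)
{
    if (d->is_alive || d->ref_cnt > 0)
        return false;
    free(d);
    return true;
}

/* Owning thread exits: mark the digest dead, free it if nobody points at it. */
static bool digest_on_thread_exit(struct digest_hdr *d)
{
    d->is_alive = false;
    return digest_try_free(d);
}

/* A waiter drops its pointer to the digest. */
static bool digest_unref(struct digest_hdr *d)
{
    d->ref_cnt--;
    return digest_try_free(d);
}
```

Whichever event happens last, the owner exiting or the last waiter letting go, is the one that releases the memory.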

Page 26: Deadlock Detection in Linux

Problems and solutions

• Memory leakage take 2

Description:

With the fix above, the thread does not necessarily free the digest memory when it exits, but the operating system does: it has a maintenance daemon process that frees all the leaked memory.

Solution:

Add a global hash table that references all digests. Delete entries from the hash table using the same principle as for freeing the digest memory. The standard hash table of glibc 2.6 is not suitable because it supports a very limited number of operations, so another library was added: gnu hash table (ghtlib).

Page 27: Deadlock Detection in Linux

Problems and solutions

• The mutex struct is used by many (all?) processes.

Description:

The mutex struct is used by many, if not all, processes, and a change in its size requires changes in the kernel.

Solution:

Remove the changes from the mutex struct and add another hash table that contains all mutexes, using the mutex address as the key and a pointer to the owner's digest as the data.

Performance degradation? Not really!
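The side table idea can be illustrated with a small fixed-size hash table keyed by mutex address. This is a stand-in written for this sketch, not the ghtlib API, and it assumes a single thread and a table that never fills up; all names are invented.

```c
#include <stddef.h>
#include <stdint.h>

#define OWNER_SLOTS 256   /* assumption: power of two, never allowed to fill up */

struct owner_entry {
    const void *mutex;    /* key: address of the mutex, NULL = empty slot */
    void       *digest;   /* value: owner's digest, NULL = no current owner */
};

static struct owner_entry owner_table[OWNER_SLOTS];

static size_t owner_hash(const void *mutex)
{
    /* Drop the low alignment bits, then mask into the table. */
    return (size_t)(((uintptr_t)mutex >> 4) & (OWNER_SLOTS - 1));
}

/* Find the slot holding `mutex`, or the empty slot where it would go. */
static struct owner_entry *owner_slot(const void *mutex)
{
    size_t i = owner_hash(mutex);
    while (owner_table[i].mutex != NULL && owner_table[i].mutex != mutex)
        i = (i + 1) & (OWNER_SLOTS - 1);   /* linear probing */
    return &owner_table[i];
}

/* Record `digest` as the owner of `mutex`; pass NULL when unlocking. */
static void owner_set(const void *mutex, void *digest)
{
    struct owner_entry *e = owner_slot(mutex);
    e->mutex = mutex;
    e->digest = digest;
}

/* Return the owner's digest, or NULL if the mutex is unknown or unowned. */
static void *owner_get(const void *mutex)
{
    return owner_slot(mutex)->digest;
}
```

Keeping the key in place and clearing only the value on unlock avoids the need for deletion markers in the probe sequence, at the cost of never reclaiming slots.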

Page 28: Deadlock Detection in Linux

Problems and solutions

• The thread struct is used by many (all?) processes.

Description:

The thread struct is used by many, if not all, processes, and a change in its size requires changes in the kernel.

Solution:

Remove the changes from the thread struct and use one of the fields provided by the glibc authors as a size buffer.

Page 29: Deadlock Detection in Linux

Verification

The following test suite was used:

Controlled deadlock

Description: Deliberately create a deadlock using a small number of threads, so it is easy to monitor the detection. Compare to standard glibc.

Test result: Passed. The dependency list was printed, while standard glibc froze in a deadlock.

Statistical deadlock

Description: Create a number of threads (from 10 to 150, depending on the test) that randomly lock mutexes, controlling the probability of deadlock by the number of locked mutexes and the number of threads.

Test result: The number of detected deadlocks is proportional to the number of randomly locked mutexes.

Page 30: Deadlock Detection in Linux

Verification (cont.)

[Figure: the typical graph (50 threads locking a number of random mutexes)]

Page 31: Deadlock Detection in Linux

Performance Evaluation

The following benchmarks were used to evaluate performance:

1. Locking performance

• Create a fixed number of threads

• Let each thread lock and unlock the same number of mutexes, without doing any other task

• Repeat the step above for different numbers of mutexes, for both standard and modified glibc.

Page 32: Deadlock Detection in Linux

[Figure: locking performance (50 threads locking a number of mutexes)]

Page 33: Deadlock Detection in Linux

Performance Evaluation (cont.)

2. CPU bound performance

• Create a fixed number of threads (50 for instance)

• Let each thread lock and unlock the same number of mutexes, while doing a heavy calculation in between

• Repeat the step above for different numbers of mutexes, for both standard and modified glibc. The graph below shows a typical picture, where 50 mutexes were used

Page 34: Deadlock Detection in Linux

Performance Evaluation (cont.)

[Figure: CPU bound performance]

Page 35: Deadlock Detection in Linux

Usage Proposal

• Compile the library using the provided makefile

• Either install it or keep it as an alternative to mainstream glibc, selected with the LD_LIBRARY_PATH environment variable

• Could be configured to do the following:

a) Quit the task and return the error code in case of deadlock

b) Print the deadlock information to stderr and remain in deadlock, letting the user decide what to do

c) Detect hotspots (printing the __ref_cnt field from the digest structure); this may be easily added

Page 36: Deadlock Detection in Linux

Summary

• Integrating a deadlock detection mechanism into Linux is definitely possible

• Simple deadlocks may and should be detected

• Programs that are not performance-critical and perform relatively few locking operations could use the deadlock detection mechanism all the time without a major impact on performance

• NPTL is not completely decoupled from the kernel, and therefore some changes in the kernel are needed to make deadlock detection more effective

Page 37: Deadlock Detection in Linux

Final thoughts:

• Most personal computers have more than one core

• Parallel programming is becoming more and more essential

• The deadlock problem becomes THE PROBLEM

• Deadlock prevention is not in our hands

• Deadlock avoidance demands operating system overhead

• Deadlock detection may be effective and low cost in particular cases

• GNU C library allows code modification