LKCD: Linux Kernel Crash Dumps
Matt D. Robinson
04/19/23 Version 1.0 2
LKCD Overview
Description
Kernel Implementation
Configuration
Invocation/Kernel State
User-Level Analysis (lcrash)
lcrash Example Output
Future Development/Evolution
Description
LKCD is a set of kernel and application code to configure, implement, and analyze system crash dumps.
These slides will cover a high-level view of the kernel side of LKCD, with a brief introduction to the user-level analysis tools.
Kernel Implementation
dump.o is the primary kernel driver; it can be built either as a module or directly into the kernel
The dump driver is dormant until it is invoked, either for configuration or for dumping
The configuration of the dump device determines what occurs on invocation
Both disruptive and non-disruptive dumping are available
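The invocation behavior described above can be modeled as a small decision function. This is a minimal user-space sketch, not LKCD kernel code: `struct sk_dump_config`, `enum sk_dump_outcome`, and `sk_dump_on_invoke()` are hypothetical names (the `sk_` prefix marks sketch code), and treating "no dump device set" as "not configured" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the dump driver's configuration (not LKCD's real structures). */
struct sk_dump_config {
    const char *device;    /* dump device, e.g. "/dev/sda4"; NULL = not configured (assumption) */
    bool nondisruptive;    /* true: try to keep the system running after the dump */
};

/* What an invocation of the otherwise dormant driver leads to. */
enum sk_dump_outcome {
    SK_DUMP_SKIPPED,        /* not configured: no dump is taken */
    SK_DUMP_THEN_HALT,      /* disruptive dump: the system stops afterwards */
    SK_DUMP_THEN_CONTINUE   /* non-disruptive dump: the system tries to continue */
};

/* Decide the outcome of an invocation from the current configuration alone. */
enum sk_dump_outcome sk_dump_on_invoke(const struct sk_dump_config *cfg)
{
    assert(cfg != NULL);
    if (cfg->device == NULL)
        return SK_DUMP_SKIPPED;
    return cfg->nondisruptive ? SK_DUMP_THEN_CONTINUE : SK_DUMP_THEN_HALT;
}
```

A caller on the crash path would check the result and either return immediately (`SK_DUMP_SKIPPED`), halt after writing the dump, or try to resume normal operation.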
Kernel Implementation
Dump compression (GZIP or RLE) is available through loadable modules or built into the kernel
The dump driver is accessed through /dev/dump (major number 227, minor number 0)
panic() or die_if_kernel() invokes the dumping process – a dump only occurs if dumping has been configured
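RLE suits crash dumps because memory pages often contain long runs of identical bytes, zero-filled pages in particular. The following is a minimal sketch of a generic byte-level RLE encoder for one page. The (run length, byte value) pair format and the name `sk_rle_compress_page()` are assumptions for illustration; this is not LKCD's on-disk RLE format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic RLE (assumed format, not LKCD's): emit (run length, byte value) pairs,
 * each run at most 255 bytes. Returns bytes written to dst, or 0 if dst is too
 * small (or len is 0); a caller could then store the page uncompressed. */
size_t sk_rle_compress_page(const uint8_t *src, size_t len, uint8_t *dst, size_t dst_len)
{
    size_t in = 0, out = 0;
    assert(src != NULL && dst != NULL);
    while (in < len) {
        uint8_t value = src[in];
        size_t run = 1;
        while (in + run < len && src[in + run] == value && run < 255)
            run++;
        if (out + 2 > dst_len)
            return 0;                 /* would overflow the output buffer */
        dst[out++] = (uint8_t)run;
        dst[out++] = value;
        in += run;
    }
    return out;
}
```

With this format a zero-filled 4096-byte page shrinks to 17 pairs (34 bytes), while a page with no repeated bytes would double in size, which is why a fallback to raw storage matters.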
Kernel Implementation
The current dump path uses the existing I/O subsystem for dumping
Disks (primarily swap) are used for now – the future direction will be MUCH different
Dump path (diagram):
panic() / die_if_kernel()
dump()
dump_execute()
dump_add_page()
dump_write_pages()
dump_compress_page()
I/O Subsystem (Disk, Network, Etc.)
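To make the dump path concrete, here is a minimal user-space model of the same kind of flow: walk memory page by page, pass each page through a compression hook, stage the result in a buffer, and write the buffer to the device when it fills. The `sk_` functions are hypothetical stand-ins whose comments note which function in the path they play the role of; the in-memory "device", the 4-page buffer, and the copy-only compression hook are assumptions made so the sketch runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_PAGE_SIZE 4096u
#define SK_BUF_PAGES 4u          /* pages staged before each device write (assumed) */

/* Stand-in for the dump device: an in-memory byte array. */
struct sk_device {
    uint8_t *data;
    size_t cap;
    size_t used;
    size_t writes;               /* number of writes issued to the device */
};

struct sk_buffer {
    uint8_t data[SK_BUF_PAGES * SK_PAGE_SIZE];
    size_t used;
};

/* Role of dump_compress_page(): here a plain copy (raw format); a GZIP or RLE
 * compressor would plug in at this point. Returns the number of bytes produced. */
static size_t sk_compress_page(const uint8_t *page, uint8_t *out)
{
    memcpy(out, page, SK_PAGE_SIZE);
    return SK_PAGE_SIZE;
}

/* Role of dump_write_pages(): flush the staged bytes to the device. */
static int sk_write_pages(struct sk_device *dev, struct sk_buffer *buf)
{
    if (buf->used > dev->cap - dev->used)
        return -1;               /* device is full */
    memcpy(dev->data + dev->used, buf->data, buf->used);
    dev->used += buf->used;
    dev->writes++;
    buf->used = 0;
    return 0;
}

/* Role of dump_add_page(): flush first if the buffer is full, then stage one page. */
static int sk_add_page(struct sk_device *dev, struct sk_buffer *buf, const uint8_t *page)
{
    if (buf->used + SK_PAGE_SIZE > sizeof buf->data && sk_write_pages(dev, buf) != 0)
        return -1;
    buf->used += sk_compress_page(page, buf->data + buf->used);
    return 0;
}

/* Role of dump_execute(): walk every page, then flush whatever remains. */
int sk_dump_execute(const uint8_t *mem, size_t npages, struct sk_device *dev)
{
    static struct sk_buffer buf; /* static: avoids a 16 KiB stack frame; not reentrant */
    assert(mem != NULL && dev != NULL && dev->used <= dev->cap);
    buf.used = 0;
    for (size_t i = 0; i < npages; i++)
        if (sk_add_page(dev, &buf, mem + i * SK_PAGE_SIZE) != 0)
            return -1;
    return buf.used > 0 ? sk_write_pages(dev, &buf) : 0;
}
```

Staging pages before writing keeps the number of device writes low: dumping 10 pages with a 4-page buffer issues 3 writes.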
Configuration
Dump configuration takes place via ioctl() calls to the kernel driver:
DIOSDUMPLEVEL
  DUMP_LEVEL_NONE – Don’t dump any pages
  DUMP_LEVEL_ALL – Dump all memory pages
  DUMP_LEVEL_KERN – Dump only kernel pages
DIOSDUMPFLAGS
  DUMP_FLAGS_NONE – No flags set
  DUMP_FLAGS_NONDISRUPT – Try to continue standard system operation after a dump takes place
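The dump level acts as a per-page filter. A minimal sketch of that filter follows; the enum `sk_dump_level` and its values are hypothetical stand-ins for DUMP_LEVEL_NONE, DUMP_LEVEL_KERN, and DUMP_LEVEL_ALL (whose real numeric values are not given here), and the sketch assumes the caller already knows whether each page belongs to the kernel, which is the hard part in a real implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for DUMP_LEVEL_NONE / _KERN / _ALL; values are not LKCD's. */
enum sk_dump_level { SK_LEVEL_NONE, SK_LEVEL_KERN, SK_LEVEL_ALL };

/* Decide whether a page is written, given the level and the page's (assumed known) owner. */
bool sk_should_dump_page(enum sk_dump_level level, bool is_kernel_page)
{
    switch (level) {
    case SK_LEVEL_NONE: return false;
    case SK_LEVEL_KERN: return is_kernel_page;
    case SK_LEVEL_ALL:  return true;
    }
    return false;                /* unknown level: dump nothing */
}

/* Count how many pages a dump at this level would write. */
size_t sk_pages_to_dump(enum sk_dump_level level, const bool *is_kernel, size_t npages)
{
    size_t count = 0;
    assert(is_kernel != NULL || npages == 0);
    for (size_t i = 0; i < npages; i++)
        if (sk_should_dump_page(level, is_kernel[i]))
            count++;
    return count;
}
```

On a machine where most memory holds user data, a kernel-only dump can be far smaller than a full one, at the cost of losing user-space state.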
Configuration
DIOSDUMPCOMPRESS
  DUMP_COMPRESS_NONE – Raw dump format
  DUMP_COMPRESS_RLE – Use RLE compression
  DUMP_COMPRESS_GZIP – Use GZIP compression
DIOSDUMPDEV
  The device to dump to (for example, /dev/sda4)
Which settings are valid for each parameter depends on the system state, for example whether dump compression is loaded into the kernel
User-Level Analysis (lcrash)
Linux Crash (lcrash) is used to analyze system crash dumps. It is an extremely powerful tool that helps support and engineering personnel find solutions to kernel crashes:
Evaluates CPU state – mode, register settings, etc.
Displays all tasks – including which task is running on a given CPU
Produces a stack trace for each running task – this works even when the kernel is built WITHOUT frame pointers (-fomit-frame-pointer)
Allows memory dumping, struct analysis, symbol lookup, etc.
lcrash is amazingly versatile for problem analysis
Crash dump reports can be created automatically on boot-up after a system crash
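Without frame pointers there is no chain of saved frame addresses to follow, so a debugger has to work from the raw stack contents. One simple heuristic, shown here only to illustrate the problem and not as lcrash's actual algorithm, is to treat every stack word that falls inside the kernel text range as a candidate return address. The function name `sk_scan_stack()` and the text bounds used in the test are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified heuristic (not lcrash's method): collect stack words that point into
 * [text_start, text_end) as candidate return addresses, in stack order (most recent
 * frame first). Returns how many candidates were stored in out. */
size_t sk_scan_stack(const uint32_t *stack, size_t nwords,
                     uint32_t text_start, uint32_t text_end,
                     uint32_t *out, size_t max_out)
{
    size_t found = 0;
    assert(stack != NULL || nwords == 0);
    for (size_t i = 0; i < nwords && found < max_out; i++)
        if (stack[i] >= text_start && stack[i] < text_end)
            out[found++] = stack[i];
    return found;
}
```

This naive scan also picks up stale return addresses and function pointers stored as data; a more careful tool would, for example, check that the instruction preceding each candidate is a call before accepting it as a frame.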
lcrash Example Output
>> stat | head
sysname : Linux
nodename : crashme.atmyhouse.com
release : 2.4.8
version : #9 SMP Mon Dec 10 00:05:19 PST 2001
machine : i686
domainname : (none)
LOG_BUF:
>> dump log_buf 10
0xc0332c60: 4c3e343c 78756e69 72657620 6e6f6973 : <4>Linux version
0xc0332c70: 342e3220 2820382e 746f6f72 74617740 : 2.4.8 (root@cra
0xc0332c80: 79657265 70612e65 : shme.atm
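The hex words in the `dump log_buf` output are 32-bit values from a little-endian i386 machine, so the bytes of each word appear in reverse order relative to the text. A minimal sketch of the decoding step follows (the function name is hypothetical); the first line of the dump, for instance, decodes to `<4>Linux version`, where `<4>` is the printk log-level prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode 32-bit words read from a little-endian (i386) dump into ASCII, lowest
 * address byte first; non-printable bytes become '.'. out needs 4*n + 1 bytes. */
void sk_words_to_ascii(const uint32_t *words, size_t n, char *out)
{
    size_t pos = 0;
    assert(out != NULL && (words != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        for (int shift = 0; shift < 32; shift += 8) {
            unsigned char c = (unsigned char)((words[i] >> shift) & 0xffu);
            out[pos++] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
        }
    }
    out[pos] = '\0';
}
```

Shifting explicitly, rather than copying the words' memory, keeps the result the same regardless of the byte order of the machine running the analysis.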
lcrash Example Output
>> task
ADDR        UID  PID  PPID  STATE  FLAGS  CPU  NAME
======================================================================
0xc02e4000    0    0     0      0      0    -  swapper
0xdfffc000    0    1     0      0  0x100    -  init
0xdfff2000    0    2     1      1   0x40    -  keventd
0xdffee000    0    3     0      0   0x40    -  ksoftirqd_CPU0
[ . . . ]
0xde47a000    0  867     1      1  0x100    -  mingetty
0xda0fe000    0 1017   660      0  0x140    -  sshd
0xd9c06000    0 1018  1017      1  0x100    -  bash
0xde4b4000    0 1101  1018      0  0x100    0  insmod
======================================================================
31 active task structs found
lcrash Example Output
>> t 0xda0fe000
=========================================================
STACK TRACE FOR TASK: 0xda0fe000(sshd)
 0 schedule+1040 [0xc0111250]
 1 schedule_timeout+121 [0xc0110d89]
 2 do_select+506 [0xc014251a]
 3 sys_select+820 [0xc01428c4]
 4 system_call+44 [0xc0106ed4]
=========================================================
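Each `name+offset [address]` entry in the trace pairs a return address with the closest symbol at or below it. A minimal sketch of that lookup using binary search over a sorted table; `sk_resolve_symbol()` is a hypothetical helper, and the start addresses in the example table are back-computed from the trace itself (for example, 0xc0111250 minus 1040 gives 0xc0110e40 for `schedule`) rather than taken from a real System.map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sk_ksym {
    uint32_t addr;               /* start address of the function */
    const char *name;
};

/* Example table sorted by address. Start addresses are back-computed from the
 * lcrash trace (return address minus printed offset), not read from System.map. */
static const struct sk_ksym sk_example_symtab[] = {
    { 0xc0106ea8u, "system_call" },
    { 0xc0110d10u, "schedule_timeout" },
    { 0xc0110e40u, "schedule" },
    { 0xc0142320u, "do_select" },
    { 0xc0142590u, "sys_select" },
};

/* Find the symbol with the highest start address <= addr. Returns its name and
 * stores addr - start in *offset, or returns NULL if addr precedes every symbol. */
const char *sk_resolve_symbol(const struct sk_ksym *tab, size_t n,
                              uint32_t addr, uint32_t *offset)
{
    size_t lo = 0, hi = n;       /* invariant: tab[0..lo) <= addr < tab[hi..n) */
    assert(offset != NULL && (tab != NULL || n == 0));
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;
    *offset = addr - tab[lo - 1].addr;
    return tab[lo - 1].name;
}
```

A production lookup would also reject addresses that lie past the end of the last function's code instead of attributing them to it.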
>> fsym panic_timeout
ADDR        OFFSET  TYPE         NAME
============================================================
0xc0332804       0  GLOBAL_DATA  panic_timeout
============================================================
1 symbol found
>> od panic_timeout
0xc0332804: 00000005 : ....
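Reading a kernel variable such as `panic_timeout` means translating its kernel virtual address into a location in the dump. On i386 with the default 3G/1G split, kernel low memory is mapped at PAGE_OFFSET (0xc0000000), so 0xc0332804 corresponds to physical address 0x332804. A minimal sketch, assuming a flat image whose byte offsets equal physical addresses; real LKCD dumps carry headers and per-page records, and high memory is not direct-mapped, so both need extra handling that is omitted here. `sk_read_kernel_u32()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* i386 default 3G/1G split: kernel low memory is mapped at 0xc0000000. */
#define SK_PAGE_OFFSET 0xc0000000u

/* Read a little-endian 32-bit kernel variable from a flat physical-memory image
 * (assumption: no dump headers, image offset == physical address). */
int sk_read_kernel_u32(const uint8_t *image, size_t image_len, uint32_t vaddr, uint32_t *value)
{
    assert(image != NULL && value != NULL);
    if (vaddr < SK_PAGE_OFFSET)
        return -1;               /* not a direct-mapped kernel address */
    size_t phys = (size_t)(vaddr - SK_PAGE_OFFSET);
    if (image_len < 4 || phys > image_len - 4)
        return -1;               /* outside the captured memory */
    *value = (uint32_t)image[phys] | (uint32_t)image[phys + 1] << 8 |
             (uint32_t)image[phys + 2] << 16 | (uint32_t)image[phys + 3] << 24;
    return 0;
}
```

With the symbol address from `fsym` and this translation, the value 5 shown by `od` is simply the little-endian word at physical offset 0x332804.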
lcrash Example Output
>> px ((struct task_struct *)0xd8abf000).thread.esp0
0x15a159
>> px ((struct task_struct *)0xd8abf000).thread.debugreg[0]
0x0
>> whatis user_struct
struct user_struct {
    atomic_t __count;
    atomic_t processes;
    atomic_t files;
    struct user_struct *next;
    struct user_struct **pprev;
    uid_t uid;
};
>> px (struct user_struct *)(((struct task_struct *)0xd8abf000).user).uid
0xfffff000
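An expression like the last `px` command is evaluated by pointer chasing: read the `user` pointer stored inside the task_struct, then read `uid` at that pointer plus the field's offset within struct user_struct. A minimal sketch under stated assumptions: a 32-bit i386 target where atomic_t, pointers, and uid_t are all 4 bytes (so `uid` sits at offset 20), a flat image indexed by physical address, and a caller-supplied offset of `user` within task_struct, since that layout is not shown. All `sk_` names and the test addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_OFFSET 0xc0000000u   /* i386 default 3G/1G split (assumption) */

/* struct user_struct as printed by "whatis", laid out for a 32-bit i386 target.
 * Assumption: atomic_t, both pointers and uid_t are 4 bytes each. */
struct sk_user_struct_i386 {
    uint32_t count;              /* __count */
    uint32_t processes;
    uint32_t files;
    uint32_t next;               /* struct user_struct *, stored as a target address */
    uint32_t pprev;              /* struct user_struct ** */
    uint32_t uid;
};

/* Read a little-endian u32 at a direct-mapped kernel address from a flat image. */
static int sk_read_u32(const uint8_t *image, size_t len, uint32_t vaddr, uint32_t *out)
{
    if (vaddr < SK_PAGE_OFFSET)
        return -1;
    size_t phys = (size_t)(vaddr - SK_PAGE_OFFSET);
    if (len < 4 || phys > len - 4)
        return -1;
    *out = (uint32_t)image[phys] | (uint32_t)image[phys + 1] << 8 |
           (uint32_t)image[phys + 2] << 16 | (uint32_t)image[phys + 3] << 24;
    return 0;
}

/* Follow task->user, then read user->uid. user_field_offset is a hypothetical
 * input because task_struct's layout is not shown here. */
int sk_task_uid(const uint8_t *image, size_t len, uint32_t task,
                uint32_t user_field_offset, uint32_t *uid)
{
    uint32_t user;
    assert(image != NULL && uid != NULL);
    if (sk_read_u32(image, len, task + user_field_offset, &user) != 0)
        return -1;
    return sk_read_u32(image, len,
                       user + (uint32_t)offsetof(struct sk_user_struct_i386, uid), uid);
}
```

This is the work lcrash does for you when it evaluates a C-like expression against the dump using the kernel's type information.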
Future Development/Evolution
The 2.5 implementation of LKCD will use dump methods to allow multiple dumping paths through the kernel (multiple devices!)
Low-level device drivers will register their own set of dump functions, so that each driver can dump in the way that is correct for its device
lcrash and other LKCD utilities will be extended to support this functionality
LKCD will be extended to work on other operating systems (such as FreeBSD)
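Per-driver dump methods amount to a registration pattern: each driver supplies function pointers, and the dump core calls every registered path. A minimal sketch of that pattern follows; `struct sk_dump_method`, `sk_register_dump_method()`, and `sk_dump_to_all()` are hypothetical names and do not reflect the interface LKCD actually adopted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-driver dump method: each low-level driver supplies its own writer. */
struct sk_dump_method {
    const char *name;
    int (*write)(void *ctx, const uint8_t *data, size_t len);   /* 0 on success */
    void *ctx;                                                   /* driver-private state */
    struct sk_dump_method *next;
};

static struct sk_dump_method *sk_methods;   /* registered methods, most recent first */

void sk_register_dump_method(struct sk_dump_method *m)
{
    assert(m != NULL && m->write != NULL);
    m->next = sk_methods;
    sk_methods = m;
}

/* Send one block of dump data down every registered path; returns how many succeeded. */
size_t sk_dump_to_all(const uint8_t *data, size_t len)
{
    size_t ok = 0;
    for (struct sk_dump_method *m = sk_methods; m != NULL; m = m->next)
        if (m->write(m->ctx, data, len) == 0)
            ok++;
    return ok;
}
```

Because a failing path does not stop the others, a dump can still reach, say, a disk when a network target is unavailable.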
Questions/Comments?