Windows Kernel Debugging (with lab), Session 2
Sisimon Soman

Agenda
Part 1: Kernel Debugging concepts
- Why Blue Screen of DEATH?
- Crash dump writing.
- Types of crash dumps.
- Classifying Windows Kernel issues.
- Common terms before starting the Workshop.

Part 2: Workshop with crash dumps/live system
- Debugging IRQL_NOT_LESS_OR_EQUAL.
- Debugging Higher IRQL scenarios.
- Debug Memory leaks.
- Debugging Hung system.
- Debugging Citrix CDM driver issue.
- Show me your crash dumps.

Why Kernel Debugging is Important
- Kernel debugging is a skill for the entire career (OSR).
- Kernel land changes very little: legacy NT driver knowledge is still relevant. WDM, WDF, Filter Manager, the MPIO DSM framework and others are built on top of it.
- If you are an expert in kernel debugging, you know how the OS works. You know the important kernel data structures, routines, etc.

Why BSOD?
- Mostly caused by buggy drivers. Drivers and the kernel share the same memory address space.
- The system cannot recover from a fatal error.
- The system or a driver calls KeBugCheckEx(BugCheckCode, <params>).
- KeBugCheck then:
  - Turns off interrupts.
  - On an SMP box, tells the other processors, 'I am dying...'
  - Paints the blue screen.
  - Writes the crash dump to the page file.

Crash dump writing
- The dump is not written through the normal storage stack, because the error may have been caused by a filter driver in that stack.
- An alternate, small stack is used instead; it may not even have a file system.
- Microsoft is improving crash dump writing in each Windows version.

After Reboot
- The Session Manager checks whether the page file has a crash dump header and, if so, protects it.
- Winlogon later checks whether it needs to write the crash dump out of the page file.
- Winlogon executes \windows\system32\savedump.exe.
- savedump.exe writes the dump file and creates an event log entry.
- These steps may not be the same on all Windows versions; I cannot see savedump.exe in Vista.

Types of crash dumps
- Minidump and Fulldump.
- Minidump contains:
  - Faulting thread context.
  - Faulting thread call stack.
  - Other basic information about the faulting thread (the contents of the !thread output).
  - Usually 64K in size.
- Fulldump contains:
  - Entire system information.
  - Data is saved to the page file.
  - The page file must be on the boot drive.
  - Written to the memory.dmp file on the next reboot.

Types of Issues
- System hung situation: get a dump using WinDBG with .dump <filename>.
- Manual crash dump using the keyboard (if the system is still able to respond to keys):
  - Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters.
  - Set the value CrashOnCtrlScroll to 1.
  - Reboot the box, then hold the right Ctrl key and press Scroll Lock twice.
- Crash with bugcheck.
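
The registry change for the keyboard-initiated crash can be captured as a .reg file. The following is a minimal Python sketch that only generates the file text; build_crash_on_ctrl_scroll_reg is a hypothetical helper name, while the key path and value name are the ones listed above. Importing the file and rebooting are assumed to be done by hand by an administrator.

```python
# Key path and value name are taken from the slide; everything else is illustrative.
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters"
VALUE_NAME = "CrashOnCtrlScroll"


def build_crash_on_ctrl_scroll_reg(enabled: bool = True) -> str:
    """Return .reg file text that sets CrashOnCtrlScroll to 1 (or 0 to disable)."""
    dword = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        f'"{VALUE_NAME}"=dword:{dword:08x}',
        "",
    ]
    # .reg files conventionally use CRLF line endings.
    return "\r\n".join(lines)
```

Writing the returned string to a file such as crashonctrlscroll.reg (a made-up name) gives something that can be reviewed before it is merged into the registry.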

Debugging System Hung
- Check the IRQL of every processor (use ~<processor>). If a processor is spinning on a spin lock, you can see it with ~<processor>, kv, !irql.
- Check the currently running thread and look for any possibility of an endless loop.
- Check all IRPs using !irpfind and see whether any filter driver is blocking the flow of IRPs.
- Use !devobj to see how the stack is built up and which drivers are in the stack.
- Check system memory usage using !poolused, !pool.
- Use !locks to look for ERESOURCE deadlocks.
- If there is no other way, use !stacks, traverse each stack, and see what is really going on.

Crash with Bugcheck
- Check the bugcheck code.
- IRQL related issues.
- Stack overflow, usually seen as a double fault. (Why is it a double fault?)
- Memory related issues.
- Issues related to Swine Flu
- Issues related to Bird Flu
- Continues...
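
As a starting point for "check the bugcheck code", a small lookup can map a stop code to the area worth investigating first. The sketch below is illustrative Python, not debugger functionality: the table holds only a handful of well-known codes, and the area labels are this sketch's own grouping, following the categories listed above. It also assumes the usual convention that bugcheck 0x7F with a first parameter of 8 denotes a double fault.

```python
# A few well-known bugcheck codes; the "area" strings are this sketch's own labels.
BUGCHECKS = {
    0x0A: ("IRQL_NOT_LESS_OR_EQUAL", "IRQL"),
    0xD1: ("DRIVER_IRQL_NOT_LESS_OR_EQUAL", "IRQL"),
    0x7F: ("UNEXPECTED_KERNEL_MODE_TRAP", "trap"),
    0x19: ("BAD_POOL_HEADER", "memory"),
    0xC2: ("BAD_POOL_CALLER", "memory"),
    0x50: ("PAGE_FAULT_IN_NONPAGED_AREA", "memory"),
}


def classify_bugcheck(code: int, param1: int = 0) -> str:
    """Return a one-line hint: code, symbolic name, and area to look at first."""
    name, area = BUGCHECKS.get(code, ("UNKNOWN", "unclassified"))
    if code == 0x7F and param1 == 0x8:
        # Trap 8 is the double fault, which is how a kernel stack overflow usually shows up.
        area = "double fault (suspect stack overflow)"
    return f"0x{code:08X} {name}: {area}"
```

For example, classify_bugcheck(0x7F, 0x8) points at the stack, while classify_bugcheck(0xD1) points at IRQL handling in a driver.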

Methods for Crash Analysis
- Check each area one by one: bugcheck code, stack trace, IRP, IRQL, running thread info, etc.
- Slowly, with experience, you will be able to guess which area to check more closely, depending on the symptom, the bugcheck and the !analyze -v output.

- Trap frame
- PoolTag
- Special pool
  - Allocate the buffer at the end of a page.
  - Write a header at the beginning of the page.
  - The header contains the pattern, size, pool tag, etc.
  - Fill the rest of the page with the pattern.
  - Reserve the next page as a guard page.
  - Put a special protection in the PTE of the guard page so that a buffer overrun causes an access violation.
  - When the memory is freed, the OS checks the pattern area against the actual pattern from the header. If it mismatches, the system is crashed.
  - The special pool check can be configured with Driver Verifier.

[Figure: special pool layout. Page n holds the Header (pattern, size, tag), the Signature pattern and the Buffer; page n+1 is the Guard Page. An arrow marks the direction of higher addresses.]
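
The special pool steps can be modelled in user-mode Python to see why overruns fault immediately and underruns are caught at free time. This is only a sketch under stated assumptions: the 16-byte header layout, the 0xA5 fill byte, and the names SpecialPoolPage, GuardPageHit and PatternMismatch are all invented for illustration, and real special pool also applies alignment rules that are ignored here.

```python
import struct

PAGE_SIZE = 4096
HEADER_FMT = "<BxH4s8x"   # assumed layout: pattern byte, pad, size, 4-byte tag, pad
HEADER_SIZE = struct.calcsize(HEADER_FMT)


class GuardPageHit(Exception):
    """Stands in for the access violation caused by touching the guard page."""


class PatternMismatch(Exception):
    """Stands in for the bugcheck raised when the fill pattern was damaged."""


class SpecialPoolPage:
    def __init__(self, size: int, tag: bytes, pattern: int = 0xA5):
        if not 0 < size <= PAGE_SIZE - HEADER_SIZE:
            raise ValueError("allocation must fit in one page after the header")
        # Page n: filled with the pattern, header at the start, buffer at the very end.
        self.page = bytearray([pattern]) * PAGE_SIZE
        self.page[:HEADER_SIZE] = struct.pack(HEADER_FMT, pattern, size, tag)
        self.buffer_start = PAGE_SIZE - size

    def write(self, offset: int, data: bytes) -> None:
        start = self.buffer_start + offset
        if start < HEADER_SIZE:
            raise ValueError("offset reaches into the header; not modelled")
        for i, byte in enumerate(data):
            if start + i >= PAGE_SIZE:
                # Page n+1 is the guard page: the first byte past the buffer faults.
                raise GuardPageHit(f"overrun at buffer offset {offset + i}")
            self.page[start + i] = byte

    def free(self) -> None:
        pattern, size, _tag = struct.unpack(HEADER_FMT, bytes(self.page[:HEADER_SIZE]))
        filler = self.page[HEADER_SIZE:PAGE_SIZE - size]
        if any(b != pattern for b in filler):
            raise PatternMismatch("fill pattern damaged before the buffer")
```

Writing one byte past the end raises GuardPageHit at once, whereas a write at a negative offset silently damages the fill pattern and is only reported when free() runs.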

Analyze Bugcheck - IRQL_NOT_LESS_OR_EQUAL
- Most common bugcheck.
- Demonstrate !irql, !irp, !devobj.
- Let's see the sample dump...

Debug pool corruption
Make sure the pool chaining is correct.

kd> !pool bc00248c
Pool page bc00248c region is Paged session pool
 bc002000 size:   90 previous size:    0  (Allocated)  Gla@
 bc002090 size:   10 previous size:   90  (Allocated)  Glnk
 bc0020a0 size:   20 previous size:   10  (Allocated)  Vtfd
 bc0020c0 size:    8 previous size:   20  (Free)       Gtmp
 bc0020c8 size:   38 previous size:    8  (Free)       Usqm
 bc002100 size:   28 previous size:   38  (Allocated)  Gldv
 bc002128 size:   58 previous size:   28  (Allocated)  GFil
 bc002180 size:  198 previous size:   58  (Allocated)  Uspi Process: 856b2a58
 bc002318 size:   18 previous size:  198  (Allocated)  Uspi Process: 856b4528
 bc002330 size:   90 previous size:   18  (Allocated)  Gla@
 bc0023c0 size:   c0 previous size:   90  (Allocated)  Gla4
*bc002480 size:   80 previous size:   c0  (Allocated) *Usms P
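
"Pool chaining is correct" means two things for every pair of neighbouring entries: the next entry starts at the previous address plus the previous size, and its "previous size" field equals the previous entry's size. The Python sketch below checks both on text pasted from !pool; the function names are hypothetical, the regular expression assumes the line format shown in this dump, and all numbers are read as hexadecimal.

```python
import re

# Matches e.g. "*bc002480 size:   80 previous size:   c0  (Allocated) *Usms"
ENTRY_RE = re.compile(
    r"\*?([0-9a-f]+)\s+size:\s*([0-9a-f]+)\s+previous size:\s*([0-9a-f]+)",
    re.IGNORECASE,
)

SAMPLE = """\
Pool page bc00248c region is Paged session pool
 bc002000 size:   90 previous size:    0  (Allocated)  Gla@
 bc002090 size:   10 previous size:   90  (Allocated)  Glnk
 bc0020a0 size:   20 previous size:   10  (Allocated)  Vtfd
 bc0020c0 size:    8 previous size:   20  (Free)       Gtmp
 bc0020c8 size:   38 previous size:    8  (Free)       Usqm
 bc002100 size:   28 previous size:   38  (Allocated)  Gldv
 bc002128 size:   58 previous size:   28  (Allocated)  GFil
 bc002180 size:  198 previous size:   58  (Allocated)  Uspi Process: 856b2a58
 bc002318 size:   18 previous size:  198  (Allocated)  Uspi Process: 856b4528
 bc002330 size:   90 previous size:   18  (Allocated)  Gla@
 bc0023c0 size:   c0 previous size:   90  (Allocated)  Gla4
*bc002480 size:   80 previous size:   c0  (Allocated) *Usms P
"""


def parse_pool_entries(text):
    """Return a list of (address, size, previous_size) tuples, all parsed as hex."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            entries.append(tuple(int(group, 16) for group in match.groups()))
    return entries


def check_chain(entries):
    """Return a list of human-readable problems; an empty list means the chain is consistent."""
    problems = []
    for (p_addr, p_size, _), (addr, _, prev_size) in zip(entries, entries[1:]):
        if addr != p_addr + p_size:
            problems.append(f"{addr:08x}: expected to start at {p_addr + p_size:08x}")
        if prev_size != p_size:
            problems.append(f"{addr:08x}: previous size {prev_size:x} != {p_size:x}")
    return problems
```

Running check_chain(parse_pool_entries(SAMPLE)) on the dump above reports no problems, which is what a healthy page looks like; a corrupted header typically breaks one of the two checks at the entry that follows the overrun.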

Debug Memory leaks
- !poolused
- !vm
- !pool
- Analyze the sample setup.
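
One common way to use !poolused for leak hunting is to take two snapshots some time apart and see which pool tags keep growing. The Python sketch below assumes the per-tag byte counts have already been transcribed into dictionaries; pool_growth is a hypothetical helper, and it does not parse real !poolused output.

```python
def pool_growth(before, after, min_growth=0):
    """Return (tag, bytes_grown) pairs for tags whose usage grew, largest growth first.

    `before` and `after` map pool tag -> bytes used, e.g. copied by hand
    from two !poolused runs. Tags that shrank or stayed flat are dropped.
    """
    growth = {}
    for tag, used in after.items():
        delta = used - before.get(tag, 0)  # a tag missing from `before` counts from zero
        if delta > min_growth:
            growth[tag] = delta
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)
```

A tag that stays at the top of this list across several snapshots is the first candidate to inspect with !pool on one of its allocations.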

Debugging System Hung
- Analyze the sample dump.
- Demonstrate !thread, !irql, kv, ~<processor>.

Debugging Wrong IRQL
- Analyze the sample dump.

Debug system hung: Citrix CDM driver issue
- Analyze the CDM dump.
- Demonstrate !irpfind, !vm usage.

Show me your dumps

Questions?