wcl303 russinovich

Upload: conleyc

Post on 20-May-2015

Case of the Unexplained 3
Mark Russinovich
Technical Fellow
Microsoft Corporation
Session Code: WCL303

About Me

Technical Fellow, Microsoft
Co-founder and chief software architect of Winternals Software
Co-author of Windows Internals 4th and 5th edition and Inside Windows 2000 3rd edition with David Solomon
Author of TechNet Sysinternals
Home of blog and forums
Contributing Editor TechNet Magazine, Windows IT Pro Magazine
Ph.D. in Computer Engineering

Outline

Introduction
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens

Case of the Unexplained…

This is the 2009 version of the “case of the unexplained” talk series

2007 & 2008 versions covered different cases
Can view webcast on Sysinternals->Mark’s webcasts

Based on real case studies
Some of these have been written up on my blog

Troubleshooting

Most applications do a poor job of reporting unexpected errors

Locked, missing or corrupt files
Missing or corrupt registry data
Permissions problems

Errors manifest in several different ways
Misleading error messages
Crashes or hangs

Purpose of Talk

Show you how to solve these classes of problems by peering beneath the surface

Interpreting file and registry activity
Interpreting call stacks

You’ll learn tools and techniques to help you solve seemingly unsolvable problems

Tools We’ll Use

Sysinternals: www.microsoft.com/technet/sysinternals
Process Explorer - process/thread viewer
Process Monitor - file/registry/process/thread tracing
Autoruns - displays all autostart locations
SigCheck - shows file version information
PsExec - execute processes remotely or in the system account
Pslist - list process information
Strings - dumps printable strings in any file
ADInsight - real time LDAP (Active Directory) monitor
Zoomit - presentation tool I’m using

Microsoft downloads:
Kernrate - sample-based system profiler
Visual Studio: Spy++ - Window analysis utility
Debugging Tools for Windows: Windbg application and kernel debugger: www.microsoft.com/whdc/devtools/debugging/Windbg

Outline

Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens

The Case of the Slow Outlook Attachment

User would see CPU burst and Outlook would hang for 15+ seconds whenever they received an attachment:

Process Monitor

Process Monitor is a real-time file, registry, process and thread monitor
It requires Windows 2000 SP4 w/Update Rollup 1, XP SP2 or higher, Server 2003 SP1 or higher, Vista, or Server 2008 (including 64-bit versions of Windows)
It replaces Filemon and Regmon, but you can use Filemon and Regmon on older operating systems

Enhancements over Filemon/Regmon include:
More advanced filtering
Operation call stacks
Boot-time logging
Data mining views
Process tree to see short-lived processes

When in doubt, run Process Monitor!
It will often show you the cause for error messages
It often tells you what is causing sluggish performance

The Case of the Slow Outlook Attachment (Continued)

Process Monitor trace of next received attachment implicated antivirus:

The Case of the Slow Outlook Attachment: Solved

Searched web for confirmation:

Checked AV settings, found the problematic option, and disabled scanning:

Process Explorer

Process Explorer is a Task Manager replacement
You can literally replace Task Manager with Options->Replace Task Manager
Hide-when-minimized to always have it handy
Hover the mouse to see a tooltip showing the process consuming the most CPU

Open System Information graph to see CPU usage history
Graphs are time stamped with hover showing biggest consumer at point in time
Also includes other activity such as I/O, kernel memory limits

The Case of the Periodic VMWare Freezes

Noticed CPU peg every 10 seconds and the desktop freeze when running VMWare
Saw in the Process Explorer System Information graph that it was the System process:

Processes and Threads

A process represents an instance of a running program
Address space
Resources (e.g., open handles)
Security profile (token)

A thread is an execution context within a process
Unit of scheduling (threads run, processes don’t run)
All threads in a process share the same per-process address space

The System process is the default home for kernel mode system threads
Functions in OS and some drivers that need to run as real threads
E.g., need to run concurrently with other system activity, wait on timers, perform background “housekeeping” work

Other host processes: svchost, Iexplore, mmc, dllhost
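
The point that all threads in a process share one address space can be illustrated with a minimal Python sketch. It is not from the talk; the names `shared_log` and `worker` are made up for illustration. Each thread appends to the same list object, which works only because the threads live in the same process.

```python
import threading

shared_log = []          # one object in the process, visible to every thread
lock = threading.Lock()  # serializes updates made by concurrent threads

def worker(name: str) -> None:
    # Every thread writes into the same list: no copying between threads.
    with lock:
        shared_log.append(name)

threads = [threading.Thread(target=worker, args=(f"thread-{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared_log))
```

Separate processes would each get their own copy of `shared_log`, which is why the thread, not the process, is the unit that actually runs.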

Viewing Threads

Task Manager doesn’t show thread details within a process
Process Explorer does on “Threads” tab
Displays thread details such as ID, CPU usage, start time, state, priority
Start address is where the thread began running (not where it is now)
Click Module to get details on module containing thread start address

Thread Start Functions and Symbol Information

Process Explorer can map the addresses within a module to the names of functions

This can help identify which component within a process is responsible for CPU usage

Requires symbol information:
Download the latest Debugging Tools for Windows from Microsoft (free)
Configure Process Monitor’s symbol engine:
Use dbghelp.dll from the Debugging Tools
Point at the Microsoft public symbol server (or internal symbol server if you have access)
Can configure multiple symbol paths separated by “;”
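
As a rough illustration of the “;”-separated symbol path, the sketch below assembles one in Python. It assumes the common `srv*<local cache>*<server>` form and the public Microsoft symbol server URL; the cache folder `C:\Symbols` and the internal share `\\corp-symbols\share` are hypothetical placeholders, not values from the talk.

```python
from typing import List

def symbol_server_entry(cache_dir: str, server_url: str) -> str:
    # "srv*<cache>*<server>" asks the symbol engine to download symbols
    # from the server and keep a local copy in the cache directory.
    return f"srv*{cache_dir}*{server_url}"

def build_symbol_path(entries: List[str]) -> str:
    # Multiple symbol locations are joined with ";".
    return ";".join(entries)

symbol_path = build_symbol_path([
    symbol_server_entry(r"C:\Symbols", "https://msdl.microsoft.com/download/symbols"),
    r"\\corp-symbols\share",  # hypothetical internal symbol share
])
print(symbol_path)
```

The resulting string is what would be pasted into the symbol path field of the tool's symbol configuration dialog.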

The Case of the Periodic VMWare Freezes: Solved

Opened Threads tab for System process and paused after a spike:

Ftser2k was XM Radio USB/Serial driver
Stopping it didn’t remove spikes

Http.sys is IIS kernel-mode cache driver
Went to device manager and showed hidden devices
Stopped http.sys and hangs went away
Didn’t care about dependent services

The Case of the Runaway Internet Explorer

Noticed a CPU spike and hovered over Process Explorer to see culprit:

That was unexpected, because I had just installed Adobe Acrobat Reader and exited Internet Explorer

IE’s window wasn’t visible, but it was still in the process list

The Case of the Runaway Internet Explorer: Investigation

The thread had a generic start address:

Required deeper investigation…

Call Stacks

Sometimes a thread start address doesn’t tell you what a thread is doing
The stack might provide a hint:
The stack is a per-thread region of memory that records a history of function nesting
The bottom frame (Function 3) is where the thread will continue executing

Function 1
Function 2
Function 3
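
The nesting described here can be reproduced in a few lines of Python. This is only an analogy for what a debugger or Process Explorer shows for a native thread; the function names mirror the slide's Function 1, 2 and 3, and the standard `traceback` module stands in for the stack walker.

```python
import traceback

def function_3():
    # Innermost frame: this is where execution currently is.
    # extract_stack() returns frames oldest first, current frame last.
    return [frame.name for frame in traceback.extract_stack()]

def function_2():
    return function_3()

def function_1():
    return function_2()

frames = function_1()

# Print the three frames most-recent-first, the order a stack viewer uses.
for name in reversed(frames[-3:]):
    print(name)
```

Reading the output top to bottom walks back through the callers: Function 3 was called by Function 2, which was called by Function 1.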

Viewing Call Stacks

Click Stack on the Threads tab to view a thread’s call stack

Lists functions in reverse chronological order

Note that start address on Threads tab is different than first function shown in stack

This is because all threads created by Windows programs start in a library function in Kernel32.dll which calls the programmed start address

The Case of the Runaway Internet Explorer: Stack Investigation

I double-clicked on the thread to see its stack:

The Case of the Runaway Internet Explorer: What is GP.OCX?

Opened DLL view to see DLL’s version information:

DLL Search Online didn’t return any useful results

The Case of the Runaway Internet Explorer: Solved

Searched for NOS Microsystems:

Conclusion: Adobe uses gp.ocx, which had hit an infinite-loop bug

Terminated IE process to stop CPU usage

Outline

Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens

The Case of the Logon Script Hangs

Multiple users complained that logon would take three minutes
Investigation revealed that all complaints were from Dell Precision 670 workstations
But only some of the 670 workstations were affected

User configured Process Explorer to run during logon and saw Lisa Client consuming CPU:

Lisa Client was custom logon application that checked system for installed applications
Lisa Client CPU then went idle for several minutes, then exited and system would start acting normally

The Case of the Logon Script Hangs (Continued)

User captured a Process Monitor trace after manually running Lisa Client

Saw three-minute delay correspond to device error:

Details column showed IOCTL_SCSI_PASS_THROUGH

Captured trace on working system and looked for IOCTL_SCSI_PASS_THROUGH operation

No device error and no delay:

The Case of the Logon Script Hangs: Solved

Device error led user to look at disks:
Working systems had Fujitsu disks
Systems with hangs had Seagate

Solution:
Temporary: wrote WMI script that queried disk type and would not launch Lisa Client on Seagate systems
Final: Application developers changed Lisa Client to avoid performing problematic command
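
The user's WMI script is not shown in the talk, so the following is only a sketch of the decision it would have to make. It assumes the disk model strings have already been collected (for example from the `Model` property of WMI's `Win32_DiskDrive` class) and that the vendor name appears in them. The function name and the sample model strings are hypothetical.

```python
from typing import Iterable

def should_launch_lisa_client(disk_models: Iterable[str]) -> bool:
    # Skip the logon application when any disk looks like a Seagate drive.
    # Assumption: the vendor name is present in the model string.
    return not any("seagate" in model.lower() for model in disk_models)

# Hypothetical model strings for a working and an affected workstation.
print(should_launch_lisa_client(["FUJITSU Example Disk"]))   # True
print(should_launch_lisa_client(["Seagate Example Disk"]))   # False
```

A check like this only hides the symptom, which is why the final fix was a change to Lisa Client itself.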

Outline

Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
Undocumented Settings

The Case of the MMC Startup Failure

User would get an error every time they started an MMC snapin:

The Case of the MMC Startup Failure: Solved

Ran Process Monitor and saw an Access Denied error on an IE registry key:

Checked permissions and Administrators had no access
Solution: added full-access for Administrators and MMC started successfully
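
When a trace is large, the same search for Access Denied results can be scripted against a saved trace. The sketch below assumes a trace exported to CSV with the column names "Process Name", "Operation", "Path" and "Result", and the result text "ACCESS DENIED". The sample rows, including the registry paths, are made up and are not the ones from this case.

```python
import csv
import io

# Made-up sample in the assumed shape of a Process Monitor CSV export.
SAMPLE_TRACE = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:01:02.1","mmc.exe","3120","RegOpenKey","HKCU\Software\Example\Allowed","SUCCESS",""
"10:01:02.2","mmc.exe","3120","RegOpenKey","HKCU\Software\Example\Locked","ACCESS DENIED","Desired Access: Read"
"10:01:02.3","mmc.exe","3120","CreateFile","C:\Windows\System32\mmc.exe","SUCCESS",""
'''

def access_denied_rows(trace_csv: str, process_name: str):
    # Keep only the operations of one process that failed with ACCESS DENIED.
    reader = csv.DictReader(io.StringIO(trace_csv))
    return [
        (row["Operation"], row["Path"])
        for row in reader
        if row["Process Name"].lower() == process_name.lower()
        and row["Result"] == "ACCESS DENIED"
    ]

denied = access_denied_rows(SAMPLE_TRACE, "mmc.exe")
for operation, path in denied:
    print(operation, path)
```

The paths that come back are the places to check permissions first, as was done for the IE registry key here.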

The Case of the Favorite that Wouldn’t Save

User tried to change the URL for one of his IE favorites:

Trying to save a new favorite resulted in a similar error:

The Case of the Favorite that Wouldn’t Save: Solved

Captured a Process Monitor trace:

AccessChk showed that folder was Medium Integrity (IE requires Low):

Fixed integrity with Icacls and problem solved

The Case of the Persistent Executable

Noticed that opening volumes in Explorer was really slow
Volume context menu indicated presence of Autorun.inf

The Case of the Persistent Executable (Continued)

Files reappeared after deleting, so monitored activity with Process Monitor

File was recreated by Explorer, so looked at stack

Viewing Autostarts

Use Autoruns to see what’s configured to start when the system boots and you log in
Windows MsConfig shows only a subset of the defined autostart locations
MsConfig doesn’t show as much information

The Case of the Persistent Executable (Solved)

Process Explorer DLL search showed that amvo.dll loaded into Explorer and all its children

Found amv0.exe and used Autoruns to delete it from the system Run key

Outline

Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens

Application Crashes

In most cases, there’s nothing you can do about application crashes

They are caused by a bug in the program
Only the developer can fix a bug

However, the crash may be caused by misconfiguration or an extension (a plugin)

Monitor the application’s crash with Process Monitor if it’s reproducible
Look for extensions in the crash file with Windbg

Finding the Crash Dump

On pre-Vista systems, finding the dump file is easy:

Attaching to the Dying Process

Vista doesn’t save crash dumps for most crashes
Only if Microsoft requests a dump for study and you send it in

When a crash occurs, don’t dismiss the crash dialog:
Launch Windbg and attach to the process
You can save a dump with the .dump command

Identifying the Crashed Process

On Vista, the process name might not be enough to identify the instance that’s crashed:

To determine the PID of the crashed instance, look at WerFault’s command line:
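
The screenshot of the command line is not part of this transcript, so the sketch below rests on an assumption: that WerFault is started with the crashed process's PID after a `-p` switch. The example command line and the PID 4520 are made up.

```python
import re
from typing import Optional

def crashed_pid(werfault_command_line: str) -> Optional[int]:
    # Assumption: the PID of the crashed process follows a "-p" switch.
    match = re.search(r"(?:^|\s)-p\s+(\d+)", werfault_command_line)
    return int(match.group(1)) if match else None

# Hypothetical command line as it might appear in a process viewer.
example = r"C:\Windows\system32\WerFault.exe -u -p 4520 -s 128"
print(crashed_pid(example))
```

The number it returns is the process to attach the debugger to when several instances share the same name.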

Enabling Dump Archiving on Vista and Windows Server 2008

Or you can configure Vista SP1 and Windows Server 2008 to always generate and save a dump file

Create a key named:HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps

Dumps go to %LOCALAPPDATA%\CrashDumps
Override with a DumpFolder value (REG_EXPAND_SZ)
Limit dump history with a DumpCount value (DWORD)
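
One way to apply these settings is with the built-in `reg add` command. The Python sketch below only builds the command lines, so they can be reviewed before being run from an elevated prompt. The key and value names come from the slide; the dump count of 10 is an arbitrary illustration.

```python
from typing import List

KEY = r"HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps"

def local_dumps_commands(dump_folder: str, dump_count: int) -> List[List[str]]:
    # Each inner list is one "reg add" invocation; /f suppresses the overwrite prompt.
    return [
        ["reg", "add", KEY, "/f"],  # create the LocalDumps key itself
        ["reg", "add", KEY, "/v", "DumpFolder", "/t", "REG_EXPAND_SZ", "/d", dump_folder, "/f"],
        ["reg", "add", KEY, "/v", "DumpCount", "/t", "REG_DWORD", "/d", str(dump_count), "/f"],
    ]

commands = local_dumps_commands(r"%LOCALAPPDATA%\CrashDumps", 10)
for command in commands:
    print(" ".join(f'"{part}"' if " " in part else part for part in command))
```

Keeping the commands as argument lists means they can later be handed to `subprocess.run` unchanged, without worrying about quoting the spaces in the key name.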

Analyzing a Crash

Basic crash dump analysis is easy and it might tell you the cause
Requires Windbg and symbol configuration

Once the dump is loaded, find the faulting thread
The debugger might identify it
If the debugger doesn’t, examine each thread stack looking for “fault”, “exception”, or “error” names

Examine the stack of the faulting thread to look for third-party plugins
If you suspect an extension:
Check for a new version
Uninstall it if the problem persists
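
The manual scan for “fault”, “exception”, or “error” names can be sketched as a small filter over stack text. This assumes the per-thread stacks have already been copied out of the debugger as lists of `module!function` strings; `plugin!DoWork` is a hypothetical third-party frame.

```python
from typing import Dict, List

KEYWORDS = ("fault", "exception", "error")

def suspect_threads(stacks: Dict[int, List[str]]) -> List[int]:
    # Return the IDs of threads whose stack mentions any of the keywords.
    suspects = []
    for thread_id, frames in stacks.items():
        if any(keyword in frame.lower() for frame in frames for keyword in KEYWORDS):
            suspects.append(thread_id)
    return suspects

# Made-up stacks: thread 0 is simply waiting, thread 1 is in exception dispatch.
sample_stacks = {
    0: ["ntdll!NtWaitForSingleObject", "kernel32!WaitForSingleObjectEx"],
    1: ["ntdll!KiUserExceptionDispatcher", "plugin!DoWork"],
}
print(suspect_threads(sample_stacks))
```

A hit only narrows the search: the frames below the exception dispatch are where third-party modules are worth looking for.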

The Case of the Explorer Context Menu Crash

Explorer would randomly crash when the user right-clicked on a file
Attached to process and executed !analyze -v:

Didn’t know what muangys.dll was and because module was unloaded, Windbg provided no information

The Case of the Explorer Context Menu Crash (Cont)

Ran Process Explorer and looked at Explorer DLL view to find muangys.dll:

File had no version information, but Strings identified the company and application:

The Case of the Explorer Context Menu Crash: Solved

Was part of Icon editing software, which developer relied upon
No newer version
Solution: disable shell extension with Autoruns

Outline

Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens

Crashes and Hangs

Windows has various components that run in Kernel Mode, the highest privilege mode of the OS
OS components: Ntoskrnl.exe, Hal.dll
Drivers: Ntfs.sys, Tcpip.sys, device drivers

Kernel-mode components are privileged extensions to the OS that have to adhere to various rules
Not accessing invalid memory
Accessing memory at the right “Interrupt Request Level”
Not causing resource deadlocks

When a kernel-mode component performs an illegal operation, Windows crashes (blue screens)
Crashing helps preserve the integrity of user data
A resource deadlock can hang the system

Online Crash Analysis

When you reboot after a crash, Windows offers to upload it to Microsoft Online Crash Analysis (OCA)
Automated server generates a thumbprint of the crash and uses it as a key in a database
If the database has an entry, the user is told the cause and directed at a fix

Basic Crash Dump Analysis

Many times OCA doesn’t know the cause:

Basic crash dump analysis is easy and it might tell you the cause

Requires Windbg and symbol configuration
Dump files are in either:
\Windows\Memory.dmp: Vista and servers
\Windows\Minidump: Windows 2000 Pro and Windows XP

The Case of the Crashed Phone Call

Laptop crashed during a Skype VOIP call
User reconnected and system crashed again
Minidump file pointed at Intel wireless driver:

The Case of the Crashed Phone Call (Cont)

Looked at file properties to determine what device the driver was for:

Found device in Device Manager:

The Case of the Crashed Phone Call (Cont)

Right-clicked and checked Windows Update for newer driver:

Need to check OEM site, so had to find version number

The Case of the Crashed Phone Call: Solved

OEM site had older version:

Intel site had newer one:

Installed and crashes stopped

Summary and More Information

A few basic tools and techniques can solve seemingly impossible problems
I learn by always trying to determine the root cause

Resources:
Webcasts of two previous “Case of the Unexplained” talks
Sysinternals->Mark’s Webcasts
Sysinternals Video Library: in-depth dive on tools and troubleshooting
My blog
Windows Internals: understand the way the OS works

If you’ve solved one, send me a description, screenshots and log files!

I’ll send you a signed copy of Windows Internals

Resources

www.microsoft.com/teched
Sessions On-Demand & Community

http://microsoft.com/technet
Resources for IT Professionals

http://microsoft.com/msdn
Resources for Developers

www.microsoft.com/learning
Microsoft Certification & Training Resources

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.