Checking the Health of Your Active Directory Environment
Stanley Lopez, Senior Premier Field Engineer
February 24, 2012
Checking the Health of your Active Directory Environment
[Figure: PFE regions: Canada, US, Latam, UK, WE, France, Germany, CEE, Japan, India, APAC, MEA, GCR]
Overview of PFE
Premier Field Engineering (PFE) provides technical leadership for Microsoft’s Premier customers around the world to promote health in their IT environments through onsite, remote and dedicated support services.
[Figure: phases: Envision, Project Planning, Build, Stabilize, Deploy, Operate]
Microsoft Confidential
Assess, Plan, Stabilize, Educate, Prevent, Optimize
Driving Operations Excellence
[Figure: PFE service offerings across the phases above]
• Active Directory, Exchange & Windows Server Risk Assessment & Health Check Program - ADRAP
• Operations RAP
• Operation Strategic Review
• Messaging Service Map
• ADRAP Remediation
• Dedicated Support Engineer for Exchange & Windows Servers
• Troubleshooting & Disaster Recovery Workshop
• Roles & Knowledge Management
• Desired Configuration Management
• Proactive Monitoring Management
• Software Update Management
• Monthly Hot Fix
• Change and Configuration Management
• Service Level Management
• Service Catalog Design
• Capacity Management
Get Healthy, Stay Healthy
Ready for Business & Mission Critical Support
Is Your AD Healthy?
Major Components of Active Directory
Active Directory Replication
SYSVOL Replication
Name Resolution
Domain Controller health
Why DR is important for AD
Major Components of Active Directory
Active Directory Replication
SYSVOL Replication
Name Resolution
Domain Controller Health
Disaster Recovery
Active Directory Replication
Active Directory Replication
Synchronizes changes between domain controllers in a multi-master environment
Ensures data stored on all domain controllers is consistent
Replication Model and Benefits
Multi-Master
– Scalability, Reliability and High Availability
Store and forward
– Reduces communication over WAN links
Pull Replication
– Request-Pull
– The request states which data has already been received
State-based and Attribute Level Replication
– Minimize replication traffic
Active Directory Replication 101
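The pull model and attribute-level replication described above can be illustrated with a toy model. The Python sketch below is a simplified illustration, not how the directory service is implemented: the `DomainController` class and its methods are hypothetical, and real replication also relies on timestamps, originating-DC identifiers and up-to-dateness vectors, which are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class DomainController:
    """Hypothetical toy model of a DC; not an Active Directory API."""
    name: str
    usn: int = 0  # local update sequence number, bumped on every change
    # attribute store: (object, attribute) -> (value, version, local usn)
    data: dict = field(default_factory=dict)
    # highest source USN already received from each replication partner
    high_watermark: dict = field(default_factory=dict)

    def write(self, obj, attr, value):
        """Originating write: bump the attribute version and stamp a new USN."""
        self.usn += 1
        _, version, _ = self.data.get((obj, attr), (None, 0, 0))
        self.data[(obj, attr)] = (value, version + 1, self.usn)

    def changes_since(self, usn):
        """Source side: return only the attributes changed after the given USN."""
        return {k: v for k, v in self.data.items() if v[2] > usn}

    def pull_from(self, source):
        """Destination side: request changes, stating what was already received."""
        already = self.high_watermark.get(source.name, 0)
        changes = source.changes_since(already)
        for key, (value, version, _) in changes.items():
            _, local_version, _ = self.data.get(key, (None, 0, 0))
            if version > local_version:  # state-based: keep the newer attribute state
                self.usn += 1
                self.data[key] = (value, version, self.usn)
        self.high_watermark[source.name] = source.usn
        return len(changes)
```

In this model a second pull with no intervening writes transfers nothing, and a change to one attribute transfers only that attribute rather than the whole object, which is the traffic-saving property the slide refers to.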
Replication occurs at partition level
Directory Partition Replicas
[Figure: the Active Directory database (NTDS.DIT) on a DC in Domain Y, holding the Schema, Configuration, Domain, Forest DNS Zone and Domain DNS Zone partitions; labels: Forest-wide Replication, Domain-wide Replication, Global Catalogue]
Note: a directory partition is sometimes called an NC (Naming Context)
Replication Topology
[Figure: Site A, Site B and Site C, each with Subnets, Connection Objects, an ISTG and a Bridgehead Server; Site Link A-B (Cost 100 / Interval 15); Site Link A-C (Cost 100 / Interval 180)]
Connections
A one-way, inbound route from one DC (the source) to another DC (the destination)
Site
Defines a set of DCs that are well connected in terms of speed and cost
A site contains one or more subnets
A site can contain more than one domain and one domain can span more than one site
Within a site, the replication topology is generated by KCC automatically
Site Links
Between sites, site links have to be established in order for the KCC (ISTG) to generate the topology across the sites
A site link contains the schedule that determines when replication can take place, as well as an assigned ‘cost’
Site Link Bridge
When more than 2 sites are linked for replication and use the same transport, all of the site links are ‘bridged’
Site link bridges are ‘transitive’
Bridgehead Server
Designated server to perform site-to-site replication, for each directory partition
Bridgehead servers can be designated by the administrator or automatically assigned by KCC
Inter-Site Topology Generator (ISTG)
Within a site, KCC will run on each DC to generate the topology for the site
Between sites, a DC will be designated as the ISTG to generate the topology for inter-site replication
The first DC for the site automatically becomes the ISTG
The ISTG need not necessarily be a bridgehead server
Inter-site Replication Topology
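Because bridged site links are transitive, two sites with no direct link can still be connected through an intermediate site, at a cost equal to the sum of the link costs along the route. The Python sketch below illustrates that idea with a plain shortest-path search. It is not the KCC/ISTG algorithm, and the function name and the site and cost values are illustrative only.

```python
import heapq


def cheapest_route(site_links, start, goal):
    """Return (total_cost, path) for the lowest-cost route over bridged site links.

    site_links maps a pair of site names to the cost of the link between them.
    Returns None when no route exists.
    """
    graph = {}
    for (a, b), cost in site_links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, site, path = heapq.heappop(queue)
        if site == goal:
            return cost, path
        if site in visited:
            continue
        visited.add(site)
        for neighbour, link_cost in graph.get(site, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None


# Illustrative values: two links of cost 100 that share Site A.
links = {("Site A", "Site B"): 100, ("Site A", "Site C"): 100}
print(cheapest_route(links, "Site B", "Site C"))
```

With these values, Site B reaches Site C through Site A at a combined cost of 200.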
KCC vs. Manually created connection objects
No automatic fail-over for manually created connection objects
Directory partition connection
One for Schema and Configuration, one for Domain
Global Catalog Replication
Connection required for ISTG to create inter-site topology
Bridgehead Servers
Windows 2000 – one per domain/per site
Windows Server 2003 and above – more than one may be selected
Subnets to site mapping
Ensure that clients communicate with the ‘closest’ DC
Things to note…
Repadmin
Active Directory Sites and Services
Event viewer
DCDiag
Replmon
Active Directory Topology Diagrammer (ADTD)
Checking Replication
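Of the tools listed above, Repadmin lends itself to scripted checks because `repadmin /showrepl * /csv` emits one row per replication link. The Python sketch below parses such output and reports links with a non-zero failure count. The function name is hypothetical, and the column names ("Destination DSA", "Source DSA", "Naming Context", "Number of Failures") are assumed to match the header Repadmin emits, so verify them against your own output before relying on this.

```python
import csv
import io


def failing_links(showrepl_csv):
    """Return (destination, source, naming context, failures) for failing links.

    showrepl_csv is the text produced by `repadmin /showrepl * /csv`.
    Column names are an assumption; check them against real output.
    """
    problems = []
    for row in csv.DictReader(io.StringIO(showrepl_csv)):
        failures = int(row.get("Number of Failures") or 0)
        if failures > 0:
            problems.append((row["Destination DSA"], row["Source DSA"],
                             row["Naming Context"], failures))
    return problems
```

On a domain controller or management workstation, the input could be captured with `subprocess.run(["repadmin", "/showrepl", "*", "/csv"], capture_output=True, text=True).stdout` and passed to `failing_links`.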
Verify Forest-wide replication status at least once a week and prior
to making major changes that rely on directory replication
Monitor ISTGs and Bridgehead servers more frequently
DO NOT
Try to repair and reconnect a DC that has not replicated for longer than the tombstone lifetime (TSL)
Restore backups that are older than the TSL
Decrease the TSL without a proper understanding of the impact and a strong justification
Create manual connection objects unnecessarily
Assign preferred bridgehead servers without both a compelling reason and a thorough understanding of the expected results
Change default settings without a proper understanding of the implications
AD Replication Best Practices
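The two TSL rules above come down to a date comparison: a DC whose last successful replication, or a backup whose creation time, is older than the tombstone lifetime should not be brought back into the forest. The Python sketch below shows that check. The function names and the 180-day value in the example are illustrative assumptions. The effective TSL should be read from the forest's own configuration rather than assumed.

```python
from datetime import datetime, timedelta


def older_than_tsl(timestamp, now, tsl_days):
    """True if a last-replication or backup timestamp is older than the TSL."""
    return now - timestamp > timedelta(days=tsl_days)


def unsafe_to_reconnect(last_success_by_dc, now, tsl_days):
    """Return the names of DCs whose last successful replication exceeds the TSL."""
    return sorted(dc for dc, ts in last_success_by_dc.items()
                  if older_than_tsl(ts, now, tsl_days))


# Illustrative data; tsl_days=180 is an assumption, not a value from this deck.
now = datetime(2012, 2, 24)
last_success = {"DC1": datetime(2012, 2, 20), "DC2": datetime(2011, 6, 1)}
print(unsafe_to_reconnect(last_success, now, tsl_days=180))
```

The same `older_than_tsl` check applies to a backup's creation time before a restore is attempted.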
SYSVOL Replication
File Replication Service (FRS)
Distributed File System Replication (DFSR)
SYSVOL Replication
Verify dependent services are functioning
Name Resolution
AD Replication
Review FRS status
SONAR
Event Logs
FRSDiag
Review DFSR status
DFS Replication has an in-box diagnostic report for the
replication backlog, replication efficiency, and the number of files
and folders in a given replication group
Dfsrdiag.exe is a command-line tool that can generate a backlog
count or trigger a propagation test. Both show the state of
replication.
Checking SYSVOL replication
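A backlog check with Dfsrdiag.exe has to be run per sending/receiving pair, which becomes tedious with more than a few domain controllers. The Python sketch below only builds the command lines for every ordered pair. The function name and the DC names are hypothetical, and the replication group and folder names ("Domain System Volume", "SYSVOL Share") together with the `/rgname`, `/rfname`, `/smem` and `/rmem` switches are assumptions that should be confirmed with `dfsrdiag backlog /?` in your environment.

```python
from itertools import permutations


def backlog_commands(domain_controllers,
                     group="Domain System Volume",
                     folder="SYSVOL Share"):
    """Build one `dfsrdiag backlog` argument list per (sender, receiver) pair."""
    commands = []
    for sender, receiver in permutations(domain_controllers, 2):
        commands.append([
            "dfsrdiag", "backlog",
            f"/rgname:{group}", f"/rfname:{folder}",
            f"/smem:{sender}", f"/rmem:{receiver}",
        ])
    return commands


# Hypothetical DC names, for illustration only.
for command in backlog_commands(["DC1", "DC2"]):
    print(" ".join(command))
```

Each argument list can be passed to `subprocess.run` on a Windows host that has the DFS management tools installed.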
Replication/FRS failures undetected
Journal Wrap failures
FRS service not running
Improper decommissioning of domain controllers
SYSVOL partition running out of disk space
Storing non-group policy files in SYSVOL
Configuring inappropriate permissions on SYSVOL folders
Manual copying/deleting of files
Improper use of D2/D4
Excessive Replication
File system policy
Anti-Virus Software
Defragmenter
Sharing Violation
Files held open by applications
Common pitfalls for FRS
Proactively monitor AD and FRS replication
Monitor the FRS event logs regularly for errors, sharing violations and excessive replication
Clean up the metadata of improperly decommissioned DCs
Do not stop the FRS service for extended periods of time
Never manually copy SYSVOL files between DCs; instead, troubleshoot why the files aren’t replicating
Use D2 (non-authoritative) and D4 (authoritative) with care
Do not configure file system policies on SYSVOL
Do not scan or defrag SYSVOL
Do not store non-group policy files in SYSVOL
FRS best practices
DFS Replication is a multi-master replication engine, which means that changes can be made at all locations. Do not change the same document at two locations at the same time: the changes will not merge, and the conflict is resolved using last-writer-wins.
Sharing violations (users open files and gain exclusive WRITE locks in order to modify their data) will prevent DFSR from replicating the modified file. Periodically those changes are written to NTFS by the application and the USN Change Journal is updated. DFSR monitors that journal and will attempt to replicate the file, only to find that it cannot because the file is still open.
An event will be logged if DFSR repeatedly has trouble replicating open files. Entries for events 4302 and 4304 will appear in the DFS Replication event log.
The option to adjust the replication schedule in DFS Management is greyed out, because SYSVOL replication follows the same replication path and schedule as Active Directory. If the time window is open, DFSR will replicate almost instantly. If replication is not possible because of the schedule, it will start when the time window opens. This means that if AD replication is not permitted between 6:00 am and 10:00 am, DFS Replication will also not replicate. As soon as the schedule allows replication, the changed files will be replicated.
DFSR Best Practices
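The schedule behaviour described above can be expressed as a simple window check. The Python sketch below is an illustration only: the function name is hypothetical, and it treats the schedule as a list of blocked (start, end) times within a single day, whereas the real schedule is configured per site link and covers the whole week.

```python
from datetime import time


def replication_allowed(now, blocked_windows):
    """True if `now` falls outside every blocked (start, end) window.

    Windows are half-open: replication resumes exactly at the end time.
    Windows that cross midnight are not handled in this sketch.
    """
    return not any(start <= now < end for start, end in blocked_windows)


# The example from the text: no replication between 6:00 am and 10:00 am.
blocked = [(time(6, 0), time(10, 0))]
print(replication_allowed(time(7, 30), blocked))
print(replication_allowed(time(10, 0), blocked))
```

A file changed at 7:30 am would therefore wait until 10:00 am, when the window opens, before being replicated.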
Name Resolution
Domain Name System
Provides name resolution service
Used by
Clients & applications – for locating DCs as well as the ‘services’ provided by DCs
Domain Controllers – for Active Directory Replication and File Replication Service
DNS 101
TCP/IP Configurations
Domain controllers must be configured with a proper IP address and point to valid DNS servers
DNS Records
Required records must be registered properly on the DNS servers
Servers must be functioning properly
Forwarders, delegations, secondaries, etc. must be valid and configured properly
What needs to be in place for AD to function properly
Host (A) record
IP Address of domain controllers
Registered by DHCP Client
Registered by DNS Client on Windows 2008
Service Resource Record (SRV) Records
Registered by Netlogon service on DC
Used by clients/services to locate the various types of services provided by domain controllers
GUID (CNAME) Record
Required for AD Replication
Registered only on the forest root DNS server
Records Registered by DCs
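When auditing these registrations it helps to know the names to look for. The Python sketch below assembles a few of the record names a domain controller is expected to register. The function name and all input values are hypothetical, and the SRV list is a small subset of what Netlogon registers, not a complete inventory.

```python
def expected_dc_records(dc_host, domain, forest_root, dsa_guid, site):
    """Return some DNS record names a DC is expected to register.

    dsa_guid is the GUID of the DC's NTDS Settings object, in string form.
    """
    return {
        "A": f"{dc_host}.{domain}",
        "SRV": [
            f"_ldap._tcp.dc._msdcs.{domain}",
            f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
            f"_kerberos._tcp.dc._msdcs.{domain}",
        ],
        "CNAME": f"{dsa_guid}._msdcs.{forest_root}",
    }


# Hypothetical values, for illustration only.
records = expected_dc_records(
    dc_host="dc1",
    domain="child.example.com",
    forest_root="example.com",
    dsa_guid="01234567-89ab-cdef-0123-456789abcdef",
    site="SiteA",
)
for record_type, names in records.items():
    print(record_type, names)
```

Each name can then be queried with NSLookup to confirm that the record exists and resolves to the right host.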
Verify TCP/IP configurations
IPConfig
Verify DNS server functionality
NSLookup
DCDiag /test:DNS
DNS server console
Event Logs
Verify GUID and Glue Records
DNSLint
Re-register records
Cycle Netlogon
Cycle DHCP Client/DNS Client or IPConfig /RegisterDNS
Capture Network Trace
Netmon
Checking your DNS
Administrators not familiar with or aware of the name resolution design
Invalid (stale) TCP/IP, forwarder, delegation, etc. settings
DCs pointing to external (invalid) DNS servers
Single point of failure configurations
DNS forwarder loop
Zone Transfer not secured
Dynamic update not enabled
DNS scavenging not enabled
Multi-homed domain controllers
Common Pitfalls
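One of the pitfalls above, DCs pointing to external DNS servers, is easy to check once the DNS client settings of each DC have been collected (for example from IPConfig output). The Python sketch below flags any configured DNS server that falls outside the networks you consider internal. The function name, addresses and network ranges are illustrative assumptions.

```python
import ipaddress


def external_dns_servers(configured, internal_networks):
    """Return the configured DNS servers that are outside all internal networks."""
    networks = [ipaddress.ip_network(n) for n in internal_networks]
    return [server for server in configured
            if not any(ipaddress.ip_address(server) in net for net in networks)]


# Illustrative values: one internal resolver and one public resolver.
print(external_dns_servers(["10.0.0.10", "8.8.8.8"], ["10.0.0.0/8"]))
```

Any address returned is a candidate for review, since a DC that resolves names through an external server may fail to locate its replication partners.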
Audit the DNS entries used for DC replication on a monthly basis
Ensure that disconnected NICs are disabled
Adopt a standardized configuration for domain
controllers and DNS servers
Allow zone transfer to specific servers only
Allow only secured dynamic updates
Configure DNS Scavenging to remove stale records
DNS Best Practices
Service Pack level
When was the last time your DC was restarted?
Event Logs
How often do you review the logs for errors or warnings?
Is time synchronization configured properly in the environment (W32tm)?
Domain Controller Health
Potential Failures not detected
Service Failing
DC experiencing bottleneck
System running low on disk space
No proper management of event logs
DCs running on outdated service pack
DCs not patched with security updates
Time Synchronization improperly configured
Common Pitfalls
Run DCDiag on a weekly basis to verify the overall well-being of domain controllers
Review event logs on domain controllers regularly to uncover problems at an early stage
Perform baselining and regular monitoring of domain controllers to uncover any potential resource bottlenecks
Configure only the forest root domain PDC emulator (PDCe) as an NTP-type time server
Best Practices
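A weekly DCDiag run is easier to sustain when the output is summarised automatically. The Python sketch below extracts the failed tests from saved DCDiag text output. The function name is hypothetical, and the pattern assumes the English-language result lines of the form "......... DC1 failed test Replications", so adjust it if your output differs.

```python
import re

# Matches result lines such as "......................... DC1 passed test Connectivity"
RESULT_LINE = re.compile(r"\.{3,}\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def failed_tests(dcdiag_output):
    """Return (server, test name) for every failed test in DCDiag text output."""
    return [(m.group(1), m.group(3))
            for m in RESULT_LINE.finditer(dcdiag_output)
            if m.group(2) == "failed"]
```

The output of a scheduled `dcdiag` run can be saved to a file each week and passed to `failed_tests`, so that only the failures need a manual review.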
Loss of DCs
Loss of data
Re-introduction of lingering objects
Loss of configuration partition data
Disaster Recovery
Questions?