
Upload: stuart-robinson

Post on 23-Dec-2015


The Top Ten Lessons Learned in Managing SQL & Reporting
Benjamin Reynolds, Service Ops Engineer
Shitanshu Verma, Service Engineering Manager
UD-B328

Something special about Benjamin

Session Objectives and Takeaways

Session Objectives
• Deep dives into lessons learned and day-to-day management of SQL components in System Center 2012 SP1 Configuration Manager.
• The session is for Configuration Manager admins who work in depth with SQL Server.

Key Takeaways
• Understand how Microsoft IT manages SQL Server for Configuration Manager
• Implement the SQL recommendations to improve health and availability

Unified Management Infrastructure @ Microsoft IT

[Diagram: Configuration Manager hierarchy at Microsoft IT]
• Redmond Site 1: 75k clients
• Redmond Site 2: 75k clients
• North & South America: 35k clients
• Europe, Middle East, Africa: 40k clients
• Australia & Asia: 75k clients
• Unified Device Mgmt. Site: ~98K devices *
• Connected services: MS Online Directory Services (MSODS), Active Directory Federation Server 2.0, MS Online Directory Sync (DirSync), AD User Discovery (corp domains), Intune Subscription, Connector site role

Infrastructure
• 6 Primary Sites
• 13 Secondary Sites
• 250 Distribution Points

PCs & Devices
• ~300,000 clients
• ~125,000 mobile devices

Users
• ~98k FTEs
• ~82k Vendors

* Projected devices count

Key facts @ MSIT Configuration Manager DB

CAS Database (1.13 TB)
• Database file size: 914 GB
• Log file size: 217 GB

Redmond Primary Site 1 Database (561 GB)
• Database file size: 342 GB
• Log file size: 219 GB

Redmond Primary Site 2 Database (466 GB)
• Database file size: 405 GB
• Log file size: 61 GB

Asia Primary Site Database (633 GB)
• Database file size: 588 GB
• Log file size: 45 GB

Europe Primary Site Database (462 GB)
• Database file size: 402 GB
• Log file size: 60 GB

North America Primary Site Database (351 GB)
• Database file size: 305 GB
• Log file size: 46 GB

Lesson #1: Database File Configuration

Database File Configuration
Benefits of a properly configured subsystem
• Less disk latency
• More disk IOPS
• Better throughput
• Fewer replication backlogs

To increase performance you need to configure the subsystem appropriately
• Drive/Array Configuration
• Database File Configuration

Disk Configuration
Recommended Disk & Array Configuration
• File allocation unit size = 64KB
• Array configuration: stripe size = 64KB
• RAID level = 1+0
• Partition alignment: http://msdn.microsoft.com/en-us/library/dd758814(v=SQL.100).aspx

If SQL Server is still pegged, look at the array controller settings
• Physical Drive Request Elevator sort (our setting is “disabled”)
• Maximum Drive Request queue depth (our setting is “32”)
• Physical drive write cache state (our standard is “enabled”)
• Surface scan analysis priority (our standard is “idle (w/ delay) = 15 seconds”)

Database File Configuration
Separate Data and Log Files
• Keep these files on different drives

Create Multiple Data Files*
• Create 4-8 data files
• The only supported method of achieving this is to pre-create the database before installing ConfigMgr
• We have a script for this, available for download: http://sdrv.ms/11oBObv
• Do not alter the database files after site installation!
• When possible, locate these files on different drives

*There is no need to have multiple log files; one will do! (there is no performance enhancement)

Database File Configuration
Size and Growth do matter!
• Data files should be equally sized
• Pre-size these files to avoid frequent/unnecessary file growth
• Define the same growth for every file; do NOT use percentages but rather a set amount such as 1024MB

TempDB
• Create multiple data files (4-8 data files rather than one file per CPU core)
• Follow the size and growth recommendations
• Keep TempDB on a completely separate drive
• Consider pre-sizing files to utilize the entire disk
• Consider turning autogrowth off when you pre-size to consume the disk

Database File Configuration – What we do
CAS Server Defaults:
o Data files on H: drive
o Log files on O: drive
o Backup files on E: drive

ConfigMgr database specifics:
o 2 Data files on H: drive
o 2 Data files on O: drive
o Log file on O: drive

TempDB specifics:
o Files all on T: drive (4 data & 1 log)
o 75% of T: drive allocated for data files
o 25% of T: drive allocated for log file
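
The file-layout guidance in this lesson (several equally sized data files, a single log file, fixed-size growth instead of percentages) can be sketched as a small Python helper that emits T-SQL for pre-creating the database. This is only an illustration and not the downloadable script mentioned in the slides: the function name, the folder paths, and the 10240MB starting size are assumptions, while the drive letters and the 1024MB growth value follow the slides.

```python
def build_create_database(db_name, data_drives, log_drive,
                          size_mb=10240, growth_mb=1024):
    """Return a T-SQL CREATE DATABASE statement with equally sized data files.

    data_drives lists one drive letter per data file (4-8 files recommended).
    Folder names (SQLData, SQLLogs) and size_mb are hypothetical placeholders.
    """
    specs = []
    for i, drive in enumerate(data_drives, start=1):
        logical = f"{db_name}_{i}"
        ext = "mdf" if i == 1 else "ndf"  # first file is the primary data file
        specs.append(
            f"    (NAME = N'{logical}', "
            f"FILENAME = N'{drive}:\\SQLData\\{logical}.{ext}', "
            f"SIZE = {size_mb}MB, FILEGROWTH = {growth_mb}MB)"
        )
    # One log file only: the slides note that extra log files add no performance.
    log_spec = (
        f"    (NAME = N'{db_name}_log', "
        f"FILENAME = N'{log_drive}:\\SQLLogs\\{db_name}_log.ldf', "
        f"SIZE = {size_mb}MB, FILEGROWTH = {growth_mb}MB)"
    )
    return (
        f"CREATE DATABASE [{db_name}]\nON PRIMARY\n"
        + ",\n".join(specs)
        + "\nLOG ON\n"
        + log_spec
        + ";"
    )


# Layout from the slides: 2 data files on H:, 2 on O:, log on O:
print(build_create_database("CM_XXX", ["H", "H", "O", "O"], "O"))
```

Review the generated statement before running it; real sizes should come from your own capacity planning.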

Lesson #2: Backup & Recovery

Backup & Recovery
Why Backup & Recovery?
• You must prepare a backup and recovery strategy to avoid loss of critical data & services

Two options for Site Server Backup
1. Configuration Manager “Backup Site Server” maintenance task
2. SQL database backup (yes, it’s supported)

The most commonly used backup option is #1; we use #2

More Info on Backup & Recovery: http://technet.microsoft.com/en-us/library/gg712697.aspx

Backup & Recovery
Database backup using SQL native backup
• The Configuration Manager maintenance job for site backup is not required; a SQL DB backup alone is enough, because Site Control Files & other files have now moved to the DB
• Configuration Manager backup interrupts other long-running syncs (Exchange Connector)

Benefits of SQL Backup
• SQL native compression reduces backup size & cost
• Restore to Replica DB is easier due to smaller size
• DBA teams take SQL backups anyway
• DPM cost savings: $1095 → $103

Backup & Recovery
Primary Site / CAS Recovery
• Change Tracking retention period matters: balance it with recovery requirements & size
• Retention period at Microsoft = 5 days (default value)
• Easier recovery even without backups (as long as one other reference site is present)
• Recovery from a reference site: lost site data will come back from clients EXCEPT status messages

Secondary Site Recovery
• No need for a SQL backup; the site can be re-initialized if required
• Recovery of a secondary site now persists child DPs’ information and content metadata (since ConfigMgr SP1)

Lesson #3: SQL and ConfigMgr Maintenance Tasks

SQL and CM Maintenance Task Recommendations
Configuration Manager discovery schedules
• Review the discovery schedules configured across all sites
• Ensure discovery processes complete without causing DDR backlogs
• We have configured discoveries 1 day apart to ensure the previous run completes before the next one starts

Configuration Manager maintenance tasks
• Review frequently recurring tasks to ensure they complete before the next scheduled run
• We discovered 2 tasks in our hierarchy that were not completing before the next scheduled run

SQL and CM Maintenance Task Recommendations
SUM Update Group Status Summarizer task & SUM Update Status Summarizer task
• At Microsoft IT we changed the schedule from 1 hour to 4 hours, as these tasks were running in a continuous loop and causing blocking and disk IO

Other Misc. Configurations
• Ensure a weekly SQL indexing job is enabled and that job failures are monitored
• Enable policy randomization for SUM deployments to minimize the impact of monthly patching on SQL replication

Together, the configurations above helped us raise Replication Link status availability from 94% to 98%

Lesson #4: Database Index Maintenance

Index Maintenance
Why Index Maintenance?
• Indexes are like disks: they get fragmented and need to be “defragged”
• Improper index maintenance will impact performance
o Overall data processing on site
o Replication processing
o Querying/Reporting

Options for Index Maintenance
ConfigMgr Out of Box “Rebuild Indexes” Maintenance Task
• Only worry about the indexes that are more than 10% fragmented
• Rebuild indexes when greater than 30% fragmented; otherwise Reorganize
• If running on Enterprise Edition (of SQL), the rebuild operation will try an online operation
• Disabled by default!

Custom Index Maintenance
• Only worry about the indexes that meet YOUR criteria (or those defined by best practices)
• Configurable fragmentation levels for rebuilding and reorganizing, as well as index size
• Configurable time limits on the task, to account for maintenance windows
• Consider using a different schedule for “problem” indexes

You Need To Do It!
• Index Maintenance is absolutely necessary, so make sure you are doing one of these options regularly

Index Maintenance – What we do
• We use a custom solution (created by Ola Hallengren)
• Fragmentation percentages between 5% and 25% = Reorganize
• Fragmentation percentages above 25% = Rebuild (online if possible)
• Run the Index solution once a week (on the weekend, due to the resources required to perform it)
• Run the Statistics solution daily (off hours)

CAS Stats
• Indexing Job: 1.5 days to run (down from 2-3 days!)
• Statistics Job: 1 hour and 40 minutes on average

Next Steps
• Determine if less aggressive fragmentation percentages can be used to decrease the run time
• Determine whether a time limit along with more frequent off-hour runs would be best to distribute the load throughout the week
• Determine whether more tables exist that can be decreased in size or ignored

Database Indexing Resources - Ola Hallengren: http://ola.hallengren.com/
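
The reorganize/rebuild decision described in "What we do" can be sketched in Python as follows. This is a minimal illustration of the thresholds on the slide, not code from Ola Hallengren's solution; the function name is hypothetical, and treating exactly 5% and exactly 25% as "reorganize" is an assumption, since the slide does not say how the boundaries are handled.

```python
def choose_index_action(fragmentation_pct, reorganize_at=5.0, rebuild_above=25.0):
    """Pick a maintenance action for one index from its fragmentation percentage.

    Defaults follow the slide: 5%-25% reorganize, above 25% rebuild.
    Boundary handling (inclusive at 5 and 25) is an assumption.
    """
    if fragmentation_pct > rebuild_above:
        return "REBUILD"
    if fragmentation_pct >= reorganize_at:
        return "REORGANIZE"
    return "NONE"  # not fragmented enough to be worth touching


for pct in (2.0, 12.5, 40.0):
    print(pct, choose_index_action(pct))
```

Raising either threshold is the kind of "less aggressive" tuning listed under Next Steps; it shrinks the set of indexes touched per run.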

Lesson #5: Moving a ConfigMgr Database

Moving a ConfigMgr Database
Scenario 1
• Redmond Primary Site RD2’s SQL server was not performing as well as those of the other primary sites
• A new server was obtained with more resources

Scenario 2
• Redmond Primary Site RD3’s SQL server was outdated and did not have enough resources to support the load
• A new server was obtained with sufficient resources

Database Moves – Prerequisites
Install SP1 CU1
• This fixes the known issue “Site replication fails after a site database is restored to a new server.”

Back up your SQL Logins & Permissions
• We have a script for this, available for download: http://sdrv.ms/Z3wRRm

Don’t forget about SUSDB!
• Does the SQL server also host the WSUS content?

More Information: http://technet.microsoft.com/en-us/library/hh427336.aspx#BKMK_ModifyDatabaseConfig

Database Moves – Lessons Learned
Required Certificates
• We were unable to complete the database move because certificates couldn’t be copied to the new server while the services were turned off
• We turned on the ‘old’ SQL service until the move was complete

Re-Initialization
• Normally this isn’t needed, but if you run into the overflow exception problem (SP1 CU1 is not installed) then the system will re-initialize

High CPU on Primary site
• This was an issue due to the amount of backlogs the site needed to process
• We installed the provider on the SQL box until the backlogs were gone and then moved it back to the Primary site

MP Connectivity problems
• We saw a very high number of TCP connections afterward due to how long we were offline

Database Moves
Our High Level Step by Step Guide
1. Stop SMS and WSUS services on all sites (including those that report to the site)
2. Stop SQL services
3. Copy data/log files (for CM_XXX and SUSDB) to the new server
4. Copy WSUS content to the new server
5. Attach the databases (aka restore them) on the new server
6. Create SQL Logins (which were backed up)
7. Ensure the SUSDB content location is correct
8. Start SMS services on the site requiring the move and the SQL services on the ‘old’ server
9. Run the Configuration Manager Setup Wizard to “Modify SQL Server configuration”
10. Run the Configuration Manager Setup Wizard to “Modify SMS Provider configuration”
11. Update SUP servers registry for “ContentDir” and “SqlServerName”
12. Update SUP servers IIS settings for the content location “Physical Path”
13. Start SMS and WSUS services on all sites
14. Provide permissions on the new SQL server on the WSUS Content Folder
15. Perform QC (Quality Checks)

Lesson #6: Database Size

Database Size
Key factors that will increase the size of your database:
• Inventory settings – are you really using all that information?
• Logging (some can be modified, others cannot)
• Keeping history information for long periods of time

What is an appropriate size?
• “Big” does not equal “Bad”
• “Small” does not equal “Good”
• It depends!

Reasons to be aware of your database’s size
• Database Backups – size and duration are impacted, so review your strategy
• Index Maintenance – you may need to tweak your strategy
• DBCC CHECKDB – depending on drive space this may prove difficult to complete without a new strategy

Database Size
What’s taking up the most space?
• Use “spDiagGetObjectSize” or this script to determine that: http://sdrv.ms/14Up144

Our top 5 tables by size:

Table Name                          NumOfRows     DataSize (MB)   DataSize (GB)
CI_CurrentComplianceStatusDetails   64,396,595    84,093          82.12
DRSSentMessages                     20,599,266    71,458          69.78
DRSReceivedMessages                 10,608,458    54,850          53.56
Logs                                133,989,969   42,162          41.17
INSTALLED_SOFTWARE_DATA             35,430,685    38,749          37.84

Can we take action on any of these tables to reduce the size?

Database Size – Before & After

Before our grooming

Table Name            NumOfRows    DataSize (MB)   DataSize (GB)   Index Maint
DRSSentMessages       20,599,266   71,458          69.78           19.62 hours
DRSReceivedMessages   10,608,458   54,850          53.56           9.87 hours

After our grooming

Table Name            NumOfRows    DataSize (MB)   DataSize (GB)   Index Maint
DRSSentMessages       23,886       27              0.026           0 hours
DRSReceivedMessages   4,808        5               0.005           0 hours

Database Size - Logging & History Information
DRS Message Logging
• These two tables store every DRS message sent or received for the replication groups listed in the following registry key:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SMS\COMPONENTS\SMS_REPLICATION_CONFIGURATION_MONITOR] | "DRS Replication Group Message Logging"

• This information can be used to troubleshoot some replication issues
• The default value is “Site Control Data,Configuration Data”
• We updated this from “Site Control Data,Configuration Data,General_Site_Data,Medium_Priority_Site,Status_Messages” to “Site Control Data,Replication Data”

Database Size
After updating our message logging:

Table Name                          NumOfRows     DataSize (MB)   DataSize (GB)
CI_CurrentComplianceStatusDetails   64,396,595    84,093          82.12
Logs                                133,989,969   42,162          41.17
INSTALLED_SOFTWARE_DATA             35,430,685    38,749          37.84
Services_HIST                       52,213,928    22,828          22.29
Services_DATA                       50,560,825    19,843          19.38

Next steps:
• Evaluate how long to store “Logs” data
• Evaluate hardware inventory class “Win32_Service” – are we using this information?

Lesson #7: Replication: When to Panic

Replication: States Defined
The “Degraded” State
• When a site takes longer than the defined “degraded” threshold to process a particular sync message for a replication group
• If the failed threshold is surpassed but the site processing the sync message is sending “In Progress” messages to the sending site, the state will stay as “Degraded”

The “Failed” State
• When a site takes longer than the defined “failed” threshold to process a sync message for a replication group AND the sites are not sending or receiving “In Progress” messages for the sync message

Replication Thresholds
Degraded/Failed thresholds are configurable
• These are configured via the console (“Link Properties” under “Monitoring Database Replication”)
• Each link has its own thresholds defined
• The default threshold for the degraded state is 12 retries
• The default threshold for the failed state is 24 retries

Replication Thresholds
Retries vs. Minutes:
• A retry does not equal a minute
• Each replication group has an interval defined which determines how often the group is replicated (see the dbo.vReplicationData view)
• To determine a threshold in minutes, multiply the SyncInterval by the retry threshold
• Examples:

ReplicationGroup     SyncInterval   Degraded Retries   Failed Retries   Degraded Threshold (minutes)   Failed Threshold (minutes)
Site Control Data    1              12                 24               12                             24
Hardware_Inventory   5              12                 24               60                             120
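
The retries-to-minutes rule above is a single multiplication, shown here as a short Python sketch that reproduces the two example rows. The function name and the dictionary are hypothetical; the SyncInterval values and the default retry counts (12 and 24) come from the slide.

```python
def threshold_minutes(sync_interval_minutes, retry_threshold):
    """Convert a retry threshold into minutes for one replication group."""
    return sync_interval_minutes * retry_threshold


# SyncInterval (minutes) per replication group, as listed in the examples
sync_intervals = {"Site Control Data": 1, "Hardware_Inventory": 5}

for group, interval in sync_intervals.items():
    degraded = threshold_minutes(interval, 12)  # default degraded retries
    failed = threshold_minutes(interval, 24)    # default failed retries
    print(f"{group}: degraded after {degraded} min, failed after {failed} min")
```

The same retry count therefore means very different wall-clock times for different groups, which is worth checking before changing a link's thresholds.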

Replication: Panic Time?
• Determine what an acceptable backlog is
• Configure your degraded and failed states appropriately
• We have defined degraded with a retry count of 20 and failed with a retry count of 35

Degraded = Keep an eye on things but don’t panic (yet)
• Generally this simply means the link has a backlog and is just taking longer to finish
• Determine which link and replication group is causing the site to be in this state
• If replication stays in this state for a long time, start looking into the root cause (use the Replication Link Analyzer (RLA) & look for blocking)

Failed = Start troubleshooting immediately (aka panic)
• Generally this means something really is wrong or broken with the link
• In most cases the link won’t just fix itself, so begin your troubleshooting steps immediately
• RLA is helpful, but make sure to read the messages before clicking “Ok”!

Lesson #8: A Database Replica for Reporting

Database Replica for Reporting
Some reasons for creating a “replica”
• Multiple groups/organizations requiring data from our database
• Long running queries impact production performance
• Poorly written queries can cause blocking
• Too many people “testing” queries on production!

Database Replica for Reporting
Steps we perform in the restore process
• RESTORE the backup
• Remove any schemas owned by a user or group
• Remove any logins and users that should not have access to this server/database
• Create approved logins and users if they don’t already exist
• Grant any additional or special permissions
• Put the database into READ_ONLY mode
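
As a rough sketch of how the restore steps above might be scripted, the Python helper below returns an ordered list of T-SQL statements. Only the first and last steps are written out, because the schema, login, and permission steps depend on each environment and are left as SQL comments. The function name and the backup path are hypothetical, and this is not the process Microsoft IT actually runs.

```python
def build_replica_refresh(db_name, backup_path):
    """Return T-SQL statements, in order, for refreshing a reporting replica.

    Middle steps are placeholders (SQL comments) because they are site-specific.
    A restore to a different server may also need WITH MOVE clauses for the
    data and log files; that is omitted here.
    """
    return [
        f"RESTORE DATABASE [{db_name}] FROM DISK = N'{backup_path}' WITH REPLACE;",
        "-- TODO: remove any schemas owned by a user or group",
        "-- TODO: remove logins and users that should not have access",
        "-- TODO: create approved logins and users if they do not exist",
        "-- TODO: grant any additional or special permissions",
        f"ALTER DATABASE [{db_name}] SET READ_ONLY WITH ROLLBACK IMMEDIATE;",
    ]


for statement in build_replica_refresh("CM_XXX", r"E:\Backups\CM_XXX.bak"):
    print(statement)
```

Setting READ_ONLY last matters: the cleanup and permission steps all modify the database, so they have to run before it is locked.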

Database Replica for Reporting
Other possible steps to include in the restore process
• Create custom indexes (for specific scenarios/queries)
• Create flat tables of App Model data for easier/faster querying

Additional benefits of this backup/restore process
• Backup validation
• Increased redundancy (for your Disaster Recovery Plan)
• Ability to run important system checks without impacting production (DBCC CHECKDB)

Lesson #9: High Availability & Disaster Recovery

High Availability & Disaster Recovery
Scenario Requirements:
• A highly available environment
• A disaster recovery plan which doesn’t have a lot of downtime
• A disaster recovery plan which has geographic separation

High Availability & Disaster Recovery

[Diagram: Windows Failover Cluster (SCCMSQL) spanning two datacenters]
• DC1 Datacenter: SQL Cluster (DC1SQLCLSTR) with servers DC1SCCMSQL01 and DC1SCCMSQL02
• DC2 Datacenter: SQL Cluster (DC2SQLCLSTR) with servers DC2SCCMSQL01 and DC2SCCMSQL02
• AlwaysOn Availability Group between the two SQL clusters, using asynchronous synchronization

Note: AlwaysOn Availability Groups only work as a database mirroring solution without auto failover

High Availability & Disaster Recovery
High Availability
• The SQL Clusters provide high availability in each data center

Disaster Recovery
• The AlwaysOn Availability Group provides database mirroring to a SQL Cluster in a separate data center
• If using AlwaysOn Availability Groups with ConfigMgr (unsupported), ensure that auto failover has not been configured; a manual failover will require a site reset in order for ConfigMgr to work

Reporting as a Service Overview

Reporting Service @ Microsoft
Evolving Service
• Smart Stakeholders – Microsoft IT, Auditing, Security, Internal Teams
• Combination of Data Sources – multiple ConfigMgr feeds, Active Directory, HR datafeed (HeadTrax), SCEP, custom online monitoring, historical data
• Power Pivot based reports
• Replicate data from Production to Replica DB
• Backup & Restore serves as Disaster Recovery validation

Mixed Access to Live Data & Data Warehouse
• 1 Data warehouse with 186 distinct ETLs
• 2 cubes for Patching and Operating System Deployment custom reports
• Overall 400+ custom reports

Reporting as a Service – Future State
• Access to One Reporting DB & DW
• Dashboard for Modern Device Management
• Consolidate Data Sources & Report Catalog on one Server
• Custom Metrics Dashboard
• Displays Tiles of data
• Annotations
• Dashboard

Lesson #10: Replication Monitoring Reports

SQL Replication Status Daily Report
• Custom SSRS report to visually show the status of SQL replication within your System Center 2012 Configuration Manager (ConfigMgr) environment
• At Microsoft IT, this SSRS report is sent to ConfigMgr admins as a subscription report every day for awareness
• Custom SQL Replication reports for System Center 2012 Configuration Manager: http://blogs.technet.com/b/system_center_in_action/archive/2012/05/02/custom-sql-replication-reports-for-system-center-2012-configuration-manager.aspx

Measuring Replication Health
• Data captured from ConfigMgr views every 15 minutes and dumped to a custom SQL table
• Availability based on Active/Degraded/Failed states
• Custom SSRS report for tracking replication link status history
• Link for downloading the custom SQL job and SSRS report: http://sdrv.ms/10zD3T3
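
One way an availability percentage could be derived from the 15-minute state samples is sketched below in Python. The slides do not say which states count as "available", so that choice is exposed as a parameter and the default (only "Active") is an assumption. The function name and the sample data are hypothetical and do not come from the downloadable SQL job.

```python
from collections import Counter


def link_availability(samples, available_states=("Active",)):
    """Return the percentage of samples whose state counts as available.

    samples is a list of state names, one per 15-minute capture.
    Which states count as available is an assumption, not from the slides.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    counts = Counter(samples)
    available = sum(counts[state] for state in available_states)
    return 100.0 * available / len(samples)


# One day at 15-minute intervals = 96 samples (made-up data)
day = ["Active"] * 90 + ["Degraded"] * 3 + ["Failed"] * 3
print(link_availability(day))                          # strict: Active only
print(link_availability(day, ("Active", "Degraded")))  # Degraded tolerated
```

Whether Degraded counts against availability changes the reported figure, so the definition should be agreed before tracking a number over time.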

In Review
Session Objective
• Deep dives into lessons learned and day-to-day management of SQL components in System Center 2012 SP1 Configuration Manager.

Key Takeaways
• Understand how Microsoft IT is managing SQL Server for Configuration Manager
• Implement the SQL recommendations to improve health and availability

Related Content from Microsoft IT
• UD-B305 How Microsoft IT Uses System Center Configuration Manager 2012 SP1
• UD-B319 How Microsoft IT Upgrades System Center Configuration Manager 2012 Hierarchy with System Center Orchestrator Automation
• UD-B311 Deploying System Center 2012 Configuration Manager SP1 With Windows Intune

Resources
• MSDN Reorganize and Rebuild Indexes: http://msdn.microsoft.com/en-us/library/ms189858.aspx
• White Paper on Index Defragmentation: http://msdn.microsoft.com/sv-se/library/cc966523(en-us).aspx

More Information
• System Center in Action Site: http://blogs.technet.com/b/system_center_in_action
• Technical Case Study: How Microsoft IT Deployed System Center 2012 Configuration Manager: http://technet.microsoft.com/en-us/library/hh913620.aspx
• Technical Case Study: User-Centric Client Management with System Center 2012 Configuration Manager in Microsoft IT: http://technet.microsoft.com/en-us/library/hh925141.aspx
• Shitanshu Verma’s Blog: http://blogs.msdn.com/b/shitanshu

Evaluation

Complete your session evaluations today and enter to win prizes daily. Provide your feedback at a CommNet kiosk or log on at www.2013mms.com. Upon submission you will receive instant notification if you have won a prize. Prize pickup is at the Information Desk located in Attendee Services in the Mandalay Bay Foyer. Entry details can be found on the MMS website.

We want to hear from you!

Resources

http://channel9.msdn.com/Events

Access MMS Online to view session recordings after the event.

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.