the high availability mantra - how dcim can help

Upload: greenfield-software-private-limited

Post on 17-Jun-2015

Category:

Documents


DESCRIPTION

This is the last of a 3-part series, "DCIM for High Availability", presented by GreenField Software. It first defines "high availability" and then cites recent high-profile data center failures that occurred despite robust designs and extensive built-in redundancy. The business impact of data center failures is highlighted. Data center topology has changed over the last two decades as a result of the High Availability Mantra, and new tools are required to manage the modern data center effectively. DCIM software has matured to a level where it is no longer optional. Data centers of all sizes need to implement DCIM, not just to reduce the risk of data center failures but also to arrest increasing capital costs and operating expenses. GFS Crane DCIM Software is a great example, as the two DCIM case studies in this presentation show.

The following GFS Crane capabilities are covered in this presentation:
- Improved availability through predictability, visibility and change tracking.
- Controlling capex costs through better visibility of under-utilized capacity, thereby deferring expensive capital expenditure and minimizing stranded capacity.
- Reducing operating expenses: real-time monitoring and multi-level PUE help reduce power costs; process automation improves productivity; and asset rationalization reduces AMC and space rentals.

The presentation concludes with two GFS Crane DCIM case studies, in the financial services and telecom verticals.

GreenField Software's mission is to help data centers control capital expenditure, reduce operating expenses and mitigate the risk of data center failures. Besides DCIM software, GFS offers Data Center Advisory Services in the areas of best practices, capacity planning, energy efficiency and business continuity of data centers.

TRANSCRIPT

Page 1: The High Availability Mantra - How DCIM Can Help

High Availability Mantra: How DCIM Can Help

Page 2: The High Availability Mantra - How DCIM Can Help

Today’s Topics

• High Availability Mantra Revisited

• Anatomy of a DCIM Software: GFS Crane

• How GFS Crane DCIM Delivers Higher Availability

• How GFS Crane DCIM Helps to Reduce Costs

• GFS Crane DCIM Case Studies

Page 3: The High Availability Mantra - How DCIM Can Help

The High Availability Mantra Revisited

Amazon Data Centers (built to Tier 4 standards and with an expected availability of 99.995%) had two outages in 2012 – each over 3 hours!

• Tier 3/Tier 4 just defined by hardware redundancies

• Glaring gaps in operating procedures to prevent fatal human errors

• Lack of purpose-built BCP software to predict failures

• Lack of chain of custody to detect root cause

Availability %               | Downtime per year | Downtime per month* | Downtime per week
99% ("two nines")            | 3.65 days         | 7.20 hours          | 1.68 hours
99.5%                        | 1.83 days         | 3.60 hours          | 50.4 minutes
99.8%                        | 17.52 hours       | 86.23 minutes       | 20.16 minutes
99.9% ("three nines")        | 8.76 hours        | 43.8 minutes        | 10.1 minutes
99.95%                       | 4.38 hours        | 21.56 minutes       | 5.04 minutes
99.99% ("four nines")        | 52.56 minutes     | 4.32 minutes        | 1.01 minutes
99.999% ("five nines")       | 5.26 minutes      | 25.9 seconds        | 6.05 seconds
99.9999% ("six nines")       | 31.5 seconds      | 2.59 seconds        | 0.605 seconds
99.99999% ("seven nines")    | 3.15 seconds      | 0.259 seconds       | 0.0605 seconds
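
The yearly and weekly figures in the availability table can be reproduced with simple arithmetic. The sketch below is illustrative only: the helper name `downtime_hours` is hypothetical, and it assumes a 365-day year and a 7-day week. The monthly column is left out because the slide does not state which month length it assumes.

```python
# Hypothetical helper: downtime allowed by a given availability percentage.
HOURS_PER_YEAR = 365 * 24   # 8,760 hours; leap years ignored (assumption)
HOURS_PER_WEEK = 7 * 24     # 168 hours


def downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Return the downtime, in hours, that the availability allows per period."""
    return (1 - availability_pct / 100) * period_hours


for pct in (99.0, 99.9, 99.99, 99.999):
    per_year = downtime_hours(pct, HOURS_PER_YEAR)
    per_week_min = downtime_hours(pct, HOURS_PER_WEEK) * 60
    print(f"{pct}%: {per_year:.2f} hours/year, {per_week_min:.2f} minutes/week")
```

For example, `downtime_hours(99.9, HOURS_PER_YEAR)` gives roughly 8.76 hours, matching the "three nines" row.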

Page 4: The High Availability Mantra - How DCIM Can Help

Did You Know?

90% of DC Failures Are From Common Preventable Causes

Page 5: The High Availability Mantra - How DCIM Can Help

Did You Know?

Average Failure of an Online System: 36 hours per annum. That’s only 99.6% Uptime
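
As a quick check of the uptime figure, the same arithmetic can be run in reverse. This is a minimal sketch assuming a 365-day year; the function name `availability_pct` is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours (assumption: non-leap year)


def availability_pct(downtime_hours_per_year: float) -> float:
    """Convert yearly downtime in hours into an availability percentage."""
    return (1 - downtime_hours_per_year / HOURS_PER_YEAR) * 100


print(f"{availability_pct(36):.2f}%")  # 99.59%, which the slide rounds to 99.6%
```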

Page 6: The High Availability Mantra - How DCIM Can Help

Did You Know?

75% of Businesses Without a BC Plan Fail Within 3 Years after a Major Disruption in their IT Systems

Page 7: The High Availability Mantra - How DCIM Can Help

Anatomy of a DCIM Software: GFS Crane

Page 8: The High Availability Mantra - How DCIM Can Help

Improves Availability: Predictability, Visibility & Change Tracking

Predictive Analytics
• Advanced alarm management and analytics help with failure predictability, faster turn-around-time, improved availability and SLA compliance
• Consolidation of alarms from different facilities helps in centralized monitoring

Visibility from Power Chain
• Improved visibility of the power chain and of the relationships among critical components of the infrastructure helps with impact analysis of device malfunction or failure and with root cause analysis (RCA)

Change Tracking
• Change tracking in the data center environment helps with impact analysis of any change and with root cause analysis of any outage caused by a change

Page 9: The High Availability Mantra - How DCIM Can Help

Improves Availability: Predictability from Proactive Alarms

Proactive Real-time Alarms
• Alarms on power, PUE and environmental conditions such as temperature, humidity, smoke, fire, water leak detection (WLD), door-open and motion
• Alarms can be sent by e-mail and SMS

Alarm Dashboard
• Alarms from multiple data centers are consolidated on a dashboard
• Alarms can be analyzed by severity, type, source, duration, etc.

Advanced Alarm Management helps in failure predictability, faster turn-around-time, improved availability & SLA compliance
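
The slide does not describe how GFS Crane evaluates or consolidates alarms. The sketch below only illustrates the general idea of threshold-based alarms that are then summarized by severity across sites. Every name, threshold and reading in it is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    site: str
    source: str
    metric: str
    value: float
    severity: str


# Hypothetical (warning, critical) upper limits per metric; not vendor defaults.
THRESHOLDS = {
    "temperature_c": (27.0, 32.0),
    "humidity_pct": (60.0, 80.0),
    "pue": (1.8, 2.2),
}


def evaluate(site: str, source: str, metric: str, value: float):
    """Return an Alarm if the reading crosses a threshold, otherwise None."""
    warning, critical = THRESHOLDS[metric]
    if value >= critical:
        return Alarm(site, source, metric, value, "critical")
    if value >= warning:
        return Alarm(site, source, metric, value, "warning")
    return None


# Illustrative readings from two sites.
readings = [
    ("DC-1", "rack-07-probe", "temperature_c", 33.5),
    ("DC-1", "rack-02-probe", "humidity_pct", 55.0),
    ("DC-2", "facility-meter", "pue", 1.9),
]

alarms = [a for a in (evaluate(*r) for r in readings) if a is not None]
by_severity = Counter(a.severity for a in alarms)
print(by_severity)
```

Grouping by `a.site`, `a.metric` or `a.source` instead of `a.severity` would give the other dashboard breakdowns mentioned on the slide.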

Page 10: The High Availability Mantra - How DCIM Can Help

Improves Availability: Visibility from Power Chain

• Maps relationships among critical components of the electrical infrastructure
• Creates a power chain for the electrical infrastructure
• Maps asset relationships and redundancies, starting from the power source through to customers and applications

Asset Relationship Mapping

Improved visibility of the power chain and of the relationships among critical components of the infrastructure helps with impact analysis of device malfunction or failure and with root cause analysis
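
Impact analysis over a power chain can be pictured as a graph problem. The following is a minimal sketch, not the product's actual algorithm: the `FEEDS` topology and asset names are made up, and it assumes an asset loses power only when every one of its upstream feeds is down.

```python
# Hypothetical power chain: each asset maps to the upstream assets feeding it.
FEEDS = {
    "utility": [],
    "generator": [],
    "ups-a": ["utility", "generator"],
    "ups-b": ["utility", "generator"],
    "pdu-a": ["ups-a"],
    "pdu-b": ["ups-b"],
    "rack-1": ["pdu-a", "pdu-b"],  # dual-fed rack (redundant)
    "rack-2": ["pdu-a"],           # single-fed rack
}


def impacted_assets(failed: set[str]) -> set[str]:
    """Return downstream assets that lose power because all their feeds are down."""
    down = set(failed)
    changed = True
    while changed:  # repeat until no further asset goes down
        changed = False
        for asset, feeds in FEEDS.items():
            if asset not in down and feeds and all(f in down for f in feeds):
                down.add(asset)
                changed = True
    return down - set(failed)


print(sorted(impacted_assets({"ups-a"})))  # ['pdu-a', 'rack-2']
```

In this toy topology the dual-fed `rack-1` survives the loss of `ups-a`, while the single-fed `rack-2` does not. That kind of redundancy gap is what a mapped power chain makes visible.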

Page 11: The High Availability Mantra - How DCIM Can Help

Improves Availability: Change Tracking

• Maintains an audit trail for all Installation/Move/Add/Change activity in the data center
• Integration with an existing ITSM tool enables running the tracked changes through a workflow system for change approvals

Audit Trail of DC Configuration Changes

Tracking changes in the data center environment helps in doing impact analysis of any change and root cause analysis of any outage occurring due to a change
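
To illustrate how an audit trail supports root cause analysis, the sketch below looks up recent changes on the assets affected by an outage. The record fields, sample data and 24-hour window are all assumptions made for the example, not the product's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeRecord:
    timestamp: datetime
    asset: str
    action: str  # install / move / add / change
    user: str


# Hypothetical audit trail entries.
AUDIT_TRAIL = [
    ChangeRecord(datetime(2015, 6, 1, 9, 0), "pdu-a", "change", "alice"),
    ChangeRecord(datetime(2015, 6, 3, 14, 30), "rack-2", "add", "bob"),
    ChangeRecord(datetime(2015, 6, 3, 16, 0), "rack-9", "move", "carol"),
]


def suspect_changes(outage_time, affected_assets, window=timedelta(hours=24)):
    """Changes to affected assets made within `window` before the outage, newest first."""
    hits = [
        c for c in AUDIT_TRAIL
        if c.asset in affected_assets
        and outage_time - window <= c.timestamp <= outage_time
    ]
    return sorted(hits, key=lambda c: c.timestamp, reverse=True)


outage = datetime(2015, 6, 3, 18, 0)
for change in suspect_changes(outage, {"pdu-a", "rack-2"}):
    print(change.timestamp, change.asset, change.action, change.user)
```

Widening `window` pulls in older changes, such as the `pdu-a` change two days earlier, at the cost of more candidates to review.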

Page 12: The High Availability Mantra - How DCIM Can Help

Reduces Cost: Capex & Opex

Reduces Capex
• Better visibility helps discover under-utilized computing capacity, which defers capex purchases
• Better visibility helps avoid stranded capacity in rack space and power use, maximizing utilization of available capacity

Reduces Opex
• Better monitoring and analytics reduce operating cost on power
• Automation of processes such as asset tracking, provisioning and monitoring improves productivity
• Rationalizing the asset base lowers maintenance costs such as equipment AMC

Page 13: The High Availability Mantra - How DCIM Can Help

Reduces CapEx: Monitoring IT Utilization

Visibility of hidden compute capacity
• Calculates the average utilization of all computing devices in the data center
• Identifies the unused compute capacity

Under-utilized servers can be repurposed
• 'Repurpose Candidates' are identified from power consumption and utilization patterns, hardware specs and age, which helps defer new server hardware purchases

Hidden Computing Capacity

Repurpose Hardware

Discovery of hidden compute capacity defers capital investment on new server hardware and software licenses
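
The slide does not give the criteria used to flag repurpose candidates. As a rough illustration only, the sketch below computes fleet-wide average utilization and flags servers that are lightly used and not too old. The inventory, field names and cut-offs (`max_cpu_pct`, `max_age_years`) are hypothetical.

```python
from statistics import mean

# Hypothetical inventory: average CPU utilization (%) and age (years) per server.
SERVERS = {
    "srv-01": {"avg_cpu_pct": 4.0, "age_years": 2},
    "srv-02": {"avg_cpu_pct": 55.0, "age_years": 3},
    "srv-03": {"avg_cpu_pct": 8.0, "age_years": 7},
    "srv-04": {"avg_cpu_pct": 37.0, "age_years": 1},
}


def repurpose_candidates(servers, max_cpu_pct=10.0, max_age_years=5):
    """Servers that are lightly used and still young enough to be worth reusing."""
    return sorted(
        name for name, s in servers.items()
        if s["avg_cpu_pct"] <= max_cpu_pct and s["age_years"] <= max_age_years
    )


fleet_avg = mean(s["avg_cpu_pct"] for s in SERVERS.values())
print(f"Fleet average utilization: {fleet_avg:.1f}%")
print("Repurpose candidates:", repurpose_candidates(SERVERS))
```

Here `srv-03` is also lightly used but is excluded by age, so it would be a replacement candidate instead of a repurpose candidate.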

Page 14: The High Availability Mantra - How DCIM Can Help

Reduces Capex: Minimizing Stranded Capacities

Visibility of consumed power against max capacity in a rack
• Provides real-time information on actual IT load in a rack
• Provides maximum power capacity
• Provides available power capacity

Visibility of occupied rack space against max available space
• Provides real-time information on occupied space in the rack in RU
• Provides maximum space capacity
• Provides available space capacity

Hidden Power Capacity

Hidden Space Capacity
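
Stranded capacity arises when a rack runs out of one resource while another is still free. The sketch below models this under simple assumptions: available capacity is the maximum minus what is consumed, and a device fits only if both power and space are sufficient. The `Rack` class and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    name: str
    max_power_kw: float
    used_power_kw: float
    max_space_ru: int
    used_space_ru: int

    @property
    def available_power_kw(self) -> float:
        return self.max_power_kw - self.used_power_kw

    @property
    def available_space_ru(self) -> int:
        return self.max_space_ru - self.used_space_ru


def fits(rack: Rack, power_kw: float, space_ru: int) -> bool:
    """A device fits only if the rack has both enough power and enough space."""
    return rack.available_power_kw >= power_kw and rack.available_space_ru >= space_ru


racks = [
    Rack("R1", max_power_kw=5.0, used_power_kw=4.5, max_space_ru=42, used_space_ru=20),
    Rack("R2", max_power_kw=5.0, used_power_kw=2.0, max_space_ru=42, used_space_ru=40),
]

for rack in racks:
    print(rack.name, rack.available_power_kw, "kW free,", rack.available_space_ru, "RU free")
```

In this example R1 has 22 RU free but only 0.5 kW of power headroom, so most of that space is stranded. R2 has 3 kW free but only 2 RU of space.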

Page 15: The High Availability Mantra - How DCIM Can Help

Reduces OpEx: Power Costs

Multi-level PUE Comparison
• Compares PUE calculated at multiple levels and identifies power distribution losses that can be rectified to improve efficiency and reduce OpEx on power

Detect Power Distribution Loss
• L1 PUE: UPS output
• L2 PUE: PDU output
• L3 PUE: Device-level reading

Detecting power distribution losses in the electrical infrastructure helps improve the energy efficiency of the data center and reduce operating cost on power
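
The slide does not spell out the calculation, so the sketch below assumes the standard definition of PUE as total facility power divided by IT load, with the IT load measured at each of the three levels named above. All readings are made-up sample values.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """PUE = total facility power / IT load (standard definition, assumed here)."""
    return total_facility_kw / it_load_kw


# Hypothetical simultaneous readings, in kW.
total_facility_kw = 180.0
ups_output_kw = 100.0    # L1 measurement point
pdu_output_kw = 96.0     # L2 measurement point
device_level_kw = 90.0   # L3 measurement point

pue_levels = {
    "L1": pue(total_facility_kw, ups_output_kw),
    "L2": pue(total_facility_kw, pdu_output_kw),
    "L3": pue(total_facility_kw, device_level_kw),
}

# The drop in measured IT load between levels is lost in power distribution.
ups_to_pdu_loss_kw = ups_output_kw - pdu_output_kw
pdu_to_device_loss_kw = pdu_output_kw - device_level_kw

for level, value in pue_levels.items():
    print(f"{level} PUE: {value:.3f}")
print(f"UPS-to-PDU loss: {ups_to_pdu_loss_kw} kW, PDU-to-device loss: {pdu_to_device_loss_kw} kW")
```

Because the measured IT load shrinks as the measurement point moves closer to the devices, PUE rises from L1 to L3. The gap between levels points to where distribution losses occur.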

Page 16: The High Availability Mantra - How DCIM Can Help

Reduces Opex: Process Automation & Improved Productivity

• Automated discovery and inventory of both IT and infrastructure assets
• Intelligent assets are automatically discovered using SNMP/IPMI
• Manufacturer Repository contains information on static attributes of assets
• Asset data can be imported from spreadsheets or an asset management tool
• Single management console to manage IT and non-IT assets
• Maintenance management for assets uses plug-ins that send scheduler-based proactive alerts
• Workflow-based auto-provisioning improves speed and reduces errors

Advanced Asset Management

Page 17: The High Availability Mantra - How DCIM Can Help

Reduces Opex: Asset Rationalization

Asset Rationalization
• The Asset Management module tracks and maintains an inventory of all assets (IT and non-IT) in the data center
• Helps identify legacy servers and replacement candidates
• Reduces AMC and space rentals

[Diagram labels: GFS Crane DCIM; Asset Rationalization; Server Virtualization; Capacity Planning; Data Center Consolidation; Legacy Data Center; Server & Rack Consolidation; Multiple Data Centers]

Page 18: The High Availability Mantra - How DCIM Can Help

How GFS Crane DCIM Helps

• Helps the Data Center Manager avoid unnecessary over-provisioning
• Helps plan investments and new capacity
• Helps reduce capital costs
• Helps reduce power use and other operating costs
• Helps reduce the risk of failures through critical alerts
• Helps adapt to technical and business change more easily
• Helps drive improvement plans through real-time metrics and dashboards

Page 19: The High Availability Mantra - How DCIM Can Help

GFS Crane DCIM Case Study 1: Financial Services

Industry: Project Financing & Mutual Funds

Data Center Location: India

Data Center Details: Tier III certified by 451 Research; energy-efficient 'green' data center certified by TÜV Rheinland

DCIM Implementation Date: January 2012

Business requirements driving DCIM implementation:
• Improve energy efficiency through better energy management
• Comply with Green Grid recommendations and adopt best practices in data center operations
• Improve data center availability and meet business SLA through better monitoring, failure prediction and faster turn-around-time

Integration Touch Points:
• Power Systems: LT transformer panels, UPS, PDUs and Distribution Panels, BUSBAR panels, Multifunction Energy Meters
• Environmental Systems: PAC units, temperature and humidity probes
• Servers, network devices, storage devices
• Siemens Building Management System

Page 20: The High Availability Mantra - How DCIM Can Help

GFS Crane DCIM Case Study 2: Telecom

Industry: Mobile Operator

Data Center Location: South Asia

Data Center Details: Multiple data centers spread across 4 locations, covering 8,500 sq. ft. of whitespace and housing 320 racks

DCIM Implementation Date: Ongoing

Business requirements driving DCIM implementation:
• Improve data center efficiency through better energy management
• Improve operational efficiency through better asset management, capacity planning and converged infrastructure monitoring capability
• Improve data center availability and meet business SLA through better monitoring, failure prediction and faster turn-around-time

Integration Touch Points:
• Power Systems: LT transformer panels, UPS, A/C & D/C PDUs and Distribution Panels, BUSBAR panels, Multifunction Energy Meters
• Environmental Systems: PAC units, temperature and humidity probes
• Diesel generator, flow and level sensors
• IBM Netcool (ITSM), VESDA, ACS and IP Surveillance

Page 21: The High Availability Mantra - How DCIM Can Help

Thank You
http://www.greenfieldsoft.com
Email: [email protected]

See the other two presentations in this series:
- The Modern Data Center Topology: The High Availability Mantra
- Data Center Infrastructure Management: ERP for the Data Center Manager