
© Copyright 2010 EMC Corporation. All rights reserved.

Leveraging Ionix IT Operations Intelligence to make your Private Cloud more efficient using APIs

Bill Kuhhirte


EMC’s Vision Begins with your Core Asset

“It is a very sad thing that nowadays there is so little useless information.” - Oscar Wilde


Agenda

Brief Technology Overview and Definition of Terms

Points of Extensibility
– Dynamic MODEL
  Use Case #1 – Configuring Multiple Thresholds
  Use Case #2 – Site Failure Analysis
– Business Impact and Maintenance
  Use Case #3 – Maintenance
  Use Case #4 – Business Impact Management
– SAM Automatic Actions and Notification List Subscribers
  Use Case #5 – SAM Actions
  Use Case #6 – Combining RCA with Abstract Events
– Notification Manager
  Use Case #7 – Advanced Event Management with Notification Manager

Recap

Questions


Smarts Founding Vision

Automate the management of dynamic distributed systems

– Management by delegation
– Model based management

Patented technology builds intelligence into software that automatically adapts to managed system


Value of Automated Root Cause Analysis

Up to 80% of the time to resolve a service-affecting failure can be attributed to finding the source

Accelerate resolution
– Lower operational costs

Terminology
– MTTI = Mean Time to Identify – the time to identify the cause of the incident
– MTTF = Mean Time to Fix – the time to actually restore service once the cause is isolated
– MTTR = Mean Time to Resolution

[Figure: incident timeline with Identify, Escalate, and Restore phases; the MTTI and MTTF spans together make up MTTR. Values shown: 5 min, 25 min, 20 min, 55 min, 75 min.]
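The terminology above implies MTTR = MTTI + MTTF. A minimal Python sketch of that arithmetic follows. It assumes the figure's values pair up as a manual case (55 min to identify, 20 min to fix) and an automated case (5 min to identify, 20 min to fix); that pairing is a reading of the figure, not something the slide states in text, and the function names are illustrative only.

```python
def mttr(mtti_min: float, mttf_min: float) -> float:
    """Mean Time to Resolution = time to identify + time to fix."""
    return mtti_min + mttf_min


def identify_share(mtti_min: float, mttf_min: float) -> float:
    """Fraction of the total resolution time spent finding the cause."""
    return mtti_min / mttr(mtti_min, mttf_min)


# Assumed pairing of the figure's values (minutes).
manual_total = mttr(55, 20)
automated_total = mttr(5, 20)

print(manual_total, automated_total)
print(round(identify_share(55, 20), 2))
```

Under that reading, the time spent identifying the cause dominates the manual case, which is the point the "up to 80%" claim makes.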


Current IT Management State

Business Gap

Technology Gap

Operational Gap

Automation Gap

Management Information Gap


Analyze in Context
automatically analyzes any behavior, in any technology domain

Collect
• Auto-discovery
• Mediation
• 900+ certified devices
• Adapters

Integration Layer
automatically builds a knowledge base across infrastructure and business

“Automating the Automatable”


Doing More with Ionix IT Operations Insight

Produces an “actionable” root-cause that can be refined by
– Identifying at a more granular detail what is wrong
– Identifying based on similar events what was done to resolve them
– Actually taking an action (automated or user-directed) to resolve the problem or gather additional information

Focused only on the technology domains and specific problems
– There will always be new problems to solve
– Some problems are specific to a very narrow set of environments or configurations
– Some problems are simply hard to diagnose in a generic way (e.g. firmware bugs, transient conditions)

By having a flexible framework
– Analysis can be adaptive to your needs
– Automation can be applied to reduce personnel costs/time
– Can rapidly address new problems without waiting for new releases from EMC


Global Network Solution Architecture

[Figure: architecture diagram. Labels: Managed Domain (Routers, Servers, Switches, Firewalls; OSPF, IS-IS, BGP, EIGRP, MPLS); Discovery & Monitoring (SNMP, ICMP & Traps; SNMP, Syslog, SSH and Telnet; SNMP & EMS); Topology; Root Cause; Cross Correlation as Applicable; Business Impacts; Ionix CMDB.]


Definition of Terms

Repository
The Repository is an in-memory database representing the topology constructed automatically by applying behavior models to the discovered infrastructure. It represents physical and logical objects in the managed environment and their relationships and is used to compute problem signatures for the Codebook.

MODEL
Managed Object Definition Language (MODEL) is a language used to express the logical and physical relationships between components of the topology as well as how symptoms propagate across those relationships from the problems they relate to.

ECIM
The Repository leverages the industry-standard Common Information Model defined by the DMTF, and is the first commercial implementation of this important standard. The EMC Ionix implementation of this model is called the EMC Common Information Model (ECIM). It provides a single common topological context for all of the EMC Smarts analysis tools as well as events received from 3rd party tools. This means that when an operator receives a notification of a problem they can rapidly view all the current problem information for the device regardless of the information source. The infrastructure devices and their components are also related to the logical topologies that are overlain on the physical topology. This permits impact analysis to extend to customers, business processes, geographies, etc.
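To make the idea of a single topological context concrete, here is a toy Python sketch of an in-memory repository. It is not the ECIM schema or any Ionix API: the class, the relationship name "Serves", and the instance names are all hypothetical, chosen only to show events from different sources landing on one object and a relationship being followed to a logical entity.

```python
from collections import defaultdict


class Repository:
    """Toy in-memory topology: objects, named relationships, and events."""

    def __init__(self):
        self.objects = set()                  # (class_name, instance_name)
        self.relations = defaultdict(set)     # (obj, relation) -> related objects
        self.events = defaultdict(list)       # obj -> [(source, event_name)]

    def add(self, class_name, instance_name):
        obj = (class_name, instance_name)
        self.objects.add(obj)
        return obj

    def relate(self, source, relation, target):
        self.relations[(source, relation)].add(target)

    def record_event(self, obj, source, event_name):
        self.events[obj].append((source, event_name))

    def events_for(self, obj):
        """All current events for one object, regardless of source."""
        return list(self.events[obj])

    def related(self, obj, relation):
        """Objects reachable from obj over one named relationship."""
        return sorted(self.relations[(obj, relation)])


repo = Repository()
router = repo.add("Router", "edge-1")
customer = repo.add("Customer", "acme")
repo.relate(router, "Serves", customer)
repo.record_event(router, "domain-manager", "Down")
repo.record_event(router, "third-party-tool", "HighTemperature")

print(repo.events_for(router))
print(repo.related(router, "Serves"))
```

Because both events are keyed by the same object, an operator view can list them together, and the relationship lookup is what lets impact analysis reach the customer.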


Dynamic Model

Dynamic Model is an alternate implementation of the Managed Object Definition Language (MODEL)
– Traditional Model produces executable code
– Dynamic Model produces a platform- and (mostly) version-independent output
– Languages and semantics are identical, with some minor limitations

Dynamic Model enables you to add new classes, and refine classes that are already defined in the data model libraries without needing the sources.

Dynamic Model can add attributes, events, and relationships to an existing class

New attributes are saved and restored with the repository

Load dynamic model extensions into IP and SAM servers

Populate attributes via ad hoc scripts, discovery scripts and SNMP polling


Use Case #1: Configuring Multiple Thresholds

Example in point:
– Provide 4 thresholds for FileSystem utilization with different severities

Overview of the solution:
– Dynamic MODEL code to configure multiple events
– Dynamic MODEL code to adjust the UI
– Assign severities in SAM for the new events


Use Case #1: Configuring Multiple Thresholds

Dynamic MODEL to generate new events:
– Define and Export the events
– Provide default threshold values

Solution:

    refine interface FileSystem_Performance {
        export ModerateUtilization85PercentMarkerExceeded;
        export ModerateUtilization90PercentMarkerExceeded;

        event ModerateUtilization85PercentMarkerExceeded
            "Utilization is higher than Utilization85PercentMarker and less than Utilization90PercentMarker."
            = Mounted && StorageSize > 0 && UtilizationPct > Utilization85PercentMarker
              && UtilizationPct <= Utilization90PercentMarker;

        event ModerateUtilization90PercentMarkerExceeded
            "Utilization is higher than Utilization90PercentMarker and less than Utilization95PercentMarker."
            = Mounted && StorageSize > 0 && UtilizationPct > Utilization90PercentMarker
              && UtilizationPct <= Utilization95PercentMarker;

        attribute double Utilization85PercentMarker
            "Threshold for percentage of total size currently in use."
            = 85.0;

        attribute double Utilization90PercentMarker
            "Threshold for percentage of total size currently in use."
            = 90.0;
    }
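The banding logic in the two event expressions can be checked outside the product with a small Python sketch. This is only an analogue of the MODEL conditions, not generated code. The default of 95.0 for the third marker is an assumption: the excerpt references Utilization95PercentMarker but does not show its declaration.

```python
def utilization_events(mounted, storage_size, utilization_pct,
                       marker_85=85.0, marker_90=90.0, marker_95=95.0):
    """Return the names of the threshold events that would be active.

    Mirrors the two event expressions: each band is open at the lower
    marker and closed at the upper one. marker_95 is an assumed default.
    """
    events = []
    if not (mounted and storage_size > 0):
        return events
    if marker_85 < utilization_pct <= marker_90:
        events.append("ModerateUtilization85PercentMarkerExceeded")
    if marker_90 < utilization_pct <= marker_95:
        events.append("ModerateUtilization90PercentMarkerExceeded")
    return events


print(utilization_events(True, 100, 87.5))
print(utilization_events(True, 100, 92.0))
```

Because the bands do not overlap, at most one of these two events is active for a given file system at a time, which is what allows a distinct severity to be assigned to each in SAM.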


Use Case #1: Configuring Multiple Thresholds

Dynamic MODEL to adjust the UI:
– Provide an attribute (ranged) for each of the thresholds

Solution:
– Posted to https://community.emc.com/message/458163

    refine interface FileSystem_Performance_Setting {
        attribute double [0 .. 100] Utilization85PercentMarker
            "The lower threshold for moderate filesystem utilization expressed as a "
            "percentage of the total capacity of the filesystem."
            = 85;

        attribute double [0 .. 100] Utilization90PercentMarker
            "The higher threshold for moderate filesystem utilization expressed as a "
            "percentage of the total capacity of the filesystem."
            = 90;
        …
    }


New thresholds visible in the UI


Use Case #2 – Site Failure Analysis

Failures of a physical location (Rack, Floor, Building, etc.)
– Desirable RCA – especially in areas with power or cooling issues
– Easy to perform, but based on undiscoverable data
– Solution posted to https://community.emc.com/message/458163

    interface Site : ICIM_Collection
    {
        export SiteDown;

        event SiteDown "The site is down."
            = IsSiteDown && (|ICIM_UnitaryComputerSystem(ConsistsOf)| > 0);

        propagate attribute boolean and AllUnresponsive
            = ICIM_UnitaryComputerSystem, ConsistsOf, IsUnresponsive;

        propagate attribute boolean or SuperSiteDown
            = Site, MemberOf, AllUnresponsive;

        computed attribute boolean IsSiteDown
            = SuperSiteDown ? FALSE : AllUnresponsive else AllUnresponsive;
        …
    };
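The intent of the Site model can be sketched in Python: a site is reported down only when all of its member systems are unresponsive and no containing site already explains the outage. This is an illustrative analogue, not the MODEL semantics themselves. In particular, how MODEL evaluates an "and" propagation over an empty set is not stated on the slide; the sketch relies on the explicit member-count guard instead.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Site:
    name: str
    # One flag per member system (ConsistsOf): True means unresponsive.
    systems_unresponsive: List[bool] = field(default_factory=list)
    # Sites this site is a MemberOf (for example rack -> floor -> building).
    parents: List["Site"] = field(default_factory=list)

    @property
    def all_unresponsive(self) -> bool:
        # "and" propagation over the member systems.
        return all(self.systems_unresponsive)

    @property
    def super_site_down(self) -> bool:
        # "or" propagation over the containing sites.
        return any(p.all_unresponsive for p in self.parents)

    @property
    def is_site_down(self) -> bool:
        return False if self.super_site_down else self.all_unresponsive

    @property
    def site_down(self) -> bool:
        # The event also requires at least one member system.
        return self.is_site_down and len(self.systems_unresponsive) > 0


building = Site("building-A", [True, True])
rack = Site("rack-12", [True, True], parents=[building])
print(building.site_down, rack.site_down)
```

With the whole building unresponsive, only the building is flagged; the rack inside it is suppressed, so the operator sees a single root cause at the highest affected location.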


Use Case #3 - Maintenance

IT Departments normally have scheduled component or service outages
– Want the Operations staff to ignore those conditions
– Need the alarms to be visible again if the component or service is still unavailable after the planned window.

Ionix SAM 8.0 and higher provides a good mechanism for handling scheduled outages
– Provided through MBIM (Maintenance and Business Impact Manager)
– The GUI exposes ways to configure scheduled maintenance for topology objects in SAM

So, what if the object doesn’t exist in SAM?
– More granular pieces:
  Network Adapters
  FileSystems
  TemperatureSensors
– Comes from an abstract event source


Maintenance – API Accessibility

The Scheduled Maintenance *is* an event and can be created using our standard APIs
– Can even be driven by a configuration file
– Could even be coupled to a server-side tool in SAM as a way to suppress alarms for a fixed duration
– Solution posted to https://community.emc.com/message/458163

    // Build a notification name from the system and the current time
    notiName = "NOTIFICATION-" . systemClass . "_" . systemName . "_SchedMaint" . currentTime;
    schedMaintNotiObj = create("ICS_Notification", notiName);

    // Describe the maintenance window on the affected interface
    schedMaintNotiObj->ClassName = "Interface";
    schedMaintNotiObj->ClassDisplayName = "Interface";
    schedMaintNotiObj->InstanceName = ifName;
    schedMaintNotiObj->InstanceDisplayName = ifName;
    schedMaintNotiObj->EventType = "MOMENTARY";
    schedMaintNotiObj->EventText = "Sched maint from: " . time(schedMaintTimeValue) .
        " to: " . time(maintEndTime) . ", by EXTERNAL";
    schedMaintNotiObj->EventName = "SchedMaint";
    schedMaintNotiObj->EventDisplayName = "SchedMaint";
    schedMaintNotiObj->OccurredOn = systemObj;
    schedMaintNotiObj->Severity = 5;
    schedMaintNotiObj->ClearOnAcknowledge = TRUE;

    // Raise the notification, take ownership, and tell clients it changed
    schedMaintNotiObj->notify("maint", "EXTERNAL", "Maint Window Created from External Source", currentTime,
        schedMaintDuration);
    schedMaintNotiObj->takeOwnership("maint");
    schedMaintNotiObj->changed();
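The behaviour the maintenance use case asks for (hide alarms during the window, show them again once it has passed) can be modelled in a few lines of Python. This is a sketch of the window logic only and does not call SAM; the class and function names are hypothetical. The name helper follows the same concatenation pattern as the ASL excerpt.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceWindow:
    instance_name: str
    start: float      # epoch seconds
    duration: float   # seconds

    @property
    def end(self) -> float:
        return self.start + self.duration

    def covers(self, instance_name: str, when: float) -> bool:
        return instance_name == self.instance_name and self.start <= when < self.end


def is_suppressed(windows, instance_name, when):
    """True while any window for this instance is open at time `when`."""
    return any(w.covers(instance_name, when) for w in windows)


def notification_name(system_class, system_name, current_time):
    # Same concatenation pattern as the ASL excerpt.
    return f"NOTIFICATION-{system_class}_{system_name}_SchedMaint{current_time}"


windows = [MaintenanceWindow("eth0", start=1000, duration=600)]
print(is_suppressed(windows, "eth0", 1200))   # inside the window
print(is_suppressed(windows, "eth0", 1600))   # window has ended
print(notification_name("Router", "r1", 1000))
```

A list of such windows could be loaded from a configuration file, which is the kind of external driver the slide suggests.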


Maintenance – View of Suppressed Events


Maintenance – View of Scheduled Maintenance


Use Case #4 – Business Impact Management

What is Maintenance & Business Impact Manager (MBIM)?
– A method of implying service impacts based on relationships to topological components
– Allows the Operations staff to prioritize simultaneous problems based on business impact
– Handles the creation and manipulation of scheduled maintenance windows for components.

Perceived limitations:
– Service impacts can be calculated only against topology in SAM
– Any event regardless of severity triggers the service impact
– Service impact can vary depending on when the problem occurs

All of those can be overcome through the use of the API (and a little creativity)


Notification List Processing - Term Overview

Notification List
– A subset of the overall set of alarms (notifications) within SAM based on the application of a filter.

Notification List Subscriber
– An adapter using any of the forms of API which will be sent indications of change to the notifications within a Notification List
– The adapter may then perform any number of actions based on the reception of that data
  Output the data to another interface
  Manipulate the notification
  Perform specific user-defined actions
– For more information on how to construct a Notification List subscriber take a look here: https://community.emc.com/docs/DOC-1268
– Support exists in all flavors of the API


MBIM/SAM Subtleties

MBIM is driven by a Notification List:
– In $SM_SITEMOD/rules/bim/bim-start-sam-sync.asl:

    BUSINESS_IMPACT_SUB {
    }
    do {
        …
        sub = create("GA_NLSubscription", bimDriverName."-SUB");
        sub->NLName = "ALL_NOTIFICATIONS";
        subscriberFE->SubscribesTo += sub;
        …
    }

So, you can change the set of notifications subscribed to that will drive business impacts by changing the NL and applying your own filters

Sidebar: What if you wanted to be granular beyond what you can express in a filter?


Sidebar: ASL Notification List Filter

Instead of using the typical NL filter construction (using the UI or XML) we can use ASL:

Option to use an ASL filter is only available when creating the Notification List filter
– The use of ASL can make a filter arbitrarily complex
– The variable “Result” must contain a Boolean value indicating whether the event passes the filter or not

Sample code is below ($SM_HOME/rules/ics/nl-sample-filter.asl):

    default Result = TRUE;
    default NotificationName = "";

    START do {
        notification = object(NotificationName);
        if (notification->EventName == "Failure") {
            Result = TRUE;
        } else {
            Result = FALSE;
        }
    }
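The filter contract is simply "take a notification, return a Boolean". A Python analogue makes it easy to prototype a rule before writing it in ASL. The first function mirrors the sample filter; the second is a hypothetical, more selective rule, and its severity cut-off assumes that numerically lower severities are more severe.

```python
def passes_filter(notification: dict) -> bool:
    """Python analogue of the sample filter: pass only 'Failure' events."""
    return notification.get("EventName") == "Failure"


def passes_complex_filter(notification: dict) -> bool:
    """Hypothetical rule combining several notification fields.

    Assumes lower Severity numbers are more severe; the class name and
    the cut-off of 2 are illustrative values, not product defaults.
    """
    is_failure = notification.get("EventName") == "Failure"
    is_interface = notification.get("ClassName") == "Interface"
    is_severe = notification.get("Severity", 5) <= 2
    return is_failure and is_interface and is_severe


sample = {"ClassName": "Interface", "EventName": "Failure", "Severity": 1}
print(passes_filter(sample), passes_complex_filter(sample))
```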


Use Case #4 – Business Impact Management

Key Points:
– Example is done via hook script, but can be easily done through a NotificationList subscriber
– Utilizes general (string) key/value tables of the class “GA_StringDictionary”
  Not persistent data – will be reloaded every time SAM restarts
– Map is based around the ElementName but can be adapted to any field of the Notification
– Example code shows the creation for a single Customer, but the data may have any number of business impacts
– Solution posted to https://community.emc.com/message/458163


Use Case #4 – Business Impact Management

    timeNow = time();
    notiObj = notiFactory->makeNotification("Customer", custKey, "ServiceImpacted");
    notiObj->ClassName = "Customer"; // Instance class must be set
    notiObj->SourceDomainName = eventObj->SourceDomainName;
    notiObj->Severity = eventSeverity;
    notiObj->EventType = "DURABLE"; //Set the event to autoclear based on duration
    notiObj->EventText =
        "Customer ".customerName." impacted by device ".keyElementName."::".eventText;
    notiObj->Category = "IMPACT";
    notiObj->CausedBy += eventObj;
    notiObj->Impact = numeric(custWeight);
    notiObj->InstanceDisplayName = custKey;
    notiObj->InstanceName = customerName;
    notiResult = notiObj->notify("", "", "", timeNow);
    notiResult = notiObj->changed();
    notifInstance->Causes += notiObj;


Use Case #4 – Business Impact Management

Sample Code Continued:
– Similar processing required when clearing
– Remember to call changed() to indicate to all clients that the Notification has been altered.

    // Loop through the list of Customers associated to the specified component
    //
    if (debug) { print("devCustList = ".devCustList); }
    foreach custKey (devCustList) {
        if (debug) { print("Customer Key : ".custKey); }
        notifKey = "NOTIFICATION-Customer_".custKey."_ServiceImpacted";
        custNotif = self->object(notifKey);
        if (custNotif->isNull()) {
            print("WARNING: Missing Customer Notification ".notifKey.
                  " for Device Mapping GA_StringDictionary DSLAM_AD. Device =" .keyElementName);
        } else {
            notifResult = custNotif->clear("", "", "");
            notifResult = custNotif->changed();
        }
    }
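The create-on-notify and clear-on-clear flow of this use case can be sketched in Python with plain dictionaries standing in for the GA_StringDictionary tables. Everything here is illustrative: the ImpactTracker class, its method names, and the sample data are hypothetical, and only the notification naming pattern is taken from the ASL above.

```python
class ImpactTracker:
    """Toy model of device-to-customer impact notifications."""

    def __init__(self, device_customers, customer_weights):
        self.device_customers = device_customers   # element name -> [customer key]
        self.customer_weights = customer_weights   # customer key -> weight (string)
        self.active = {}                           # notification name -> record

    @staticmethod
    def key(cust_key):
        # Same naming pattern as notifKey in the ASL.
        return f"NOTIFICATION-Customer_{cust_key}_ServiceImpacted"

    def on_notify(self, element_name, event_text, severity):
        """Create one impact record per customer mapped to the element."""
        created = []
        for cust_key in self.device_customers.get(element_name, []):
            name = self.key(cust_key)
            self.active[name] = {
                "Category": "IMPACT",
                "Severity": severity,
                "Impact": float(self.customer_weights.get(cust_key, 0)),
                "EventText": f"Customer {cust_key} impacted by device "
                             f"{element_name}::{event_text}",
            }
            created.append(name)
        return created

    def on_clear(self, element_name):
        """Clear the impact records; return names that were already missing."""
        missing = []
        for cust_key in self.device_customers.get(element_name, []):
            name = self.key(cust_key)
            if self.active.pop(name, None) is None:
                missing.append(name)   # the ASL prints a warning in this case
        return missing


tracker = ImpactTracker({"dslam-1": ["acme", "globex"]}, {"acme": "10", "globex": "3"})
created = tracker.on_notify("dslam-1", "Down", severity=1)
print(created)
```

Keeping the mapping in ordinary key/value tables is what makes the approach data-driven: adding a customer means adding a dictionary entry, not changing code.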


Actions within ASL/Java/Perl/C++

Say you have a custom Notification List subscriber built
– Want to be able to do more with it?

Suite of actions available:
– Executed as a MODEL method, inherently language and mostly platform independent
– ACT_SNMP
  Send a trap/traps/informs
  Request data via get or getNext
– ACT_ICMP
  Ping the designated target IP or system
– ACT_Mail
  Send an SMTP mail message
– ACT_Script
  Execute a script (run) on the server within the Ionix directory structure and return an integer value
  Execute the script (run_ex) and return both a result (integer) as well as any text (stdout)
– ACT_Perl
  Similar to the above, but invokes Perl natively

Imagine the possibilities!
– Feedback loops for collecting additional data for the audit log


Use Case #5 – Script Actions in SAM

[Figure: flow diagram. Labels: managed devices (Routers, Servers, Switches, Firewalls; OSPF, IS-IS, BGP, EIGRP, MPLS); Discovery & Monitoring; NL Subscriber; ACT_Script; run_ex(); VI-SDK; Expect; Results; Add to audit text.]


Use Case #5 – Script Actions using SAM and IP

[Figure: flow diagram. Labels: managed devices (Routers, Servers, Switches, Firewalls; OSPF, IS-IS, BGP, EIGRP, MPLS); Discovery & Monitoring; NL Subscriber; ACT_Script; Expect; run_ex(); Results; Add to audit text.]


Use Case #5 – Script Actions in SAM

General Code Overview
– After a notification is received, create an ACT_Script object (if one doesn’t already exist)
  Should have one per NL subscriber
– Invoke the ACT_Script, passing parameters:
      readonly script_result_t run_ex(in string parameters = "", in string stdindata = "");
  The “parameters” argument describes a space delimited set of arguments to be passed
  • For example “--version --output”
  The stdindata is a string passed to the stdin of the process created to run the script
– Retrieving results
  Results are returned in a data structure
      struct script_result_t { int result_code; string result_text; };
  The data structure is returned as a list in ASL
– Interpretation of the results can then interact with the domain manager
– Note: This implementation is a single thread, but you can launch new processing agents by calling GA_Driver::start() or startWithParameters() with waitForCompletion set to FALSE
– Solution posted to https://community.emc.com/message/458163
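The run_ex contract (space-delimited parameters in, optional stdin data in, an integer code plus stdout text out) has a close analogue in Python's standard library. The sketch below is not the ACT_Script implementation; it only reproduces the same shape of call and result so the calling pattern can be tried outside SAM. The function and type names are chosen to echo the slide, and the child command is a throwaway Python one-liner.

```python
import shlex
import subprocess
import sys
from typing import NamedTuple


class ScriptResult(NamedTuple):
    result_code: int
    result_text: str


def run_ex(command, parameters="", stdindata=""):
    """Run a command, feed it stdin, and return its exit code and stdout."""
    argv = list(command) + shlex.split(parameters)
    completed = subprocess.run(argv, input=stdindata, capture_output=True, text=True)
    return ScriptResult(completed.returncode, completed.stdout)


# Example child process: echoes its first argument and whatever arrives on stdin.
child = [sys.executable, "-c",
         "import sys; print(sys.argv[1], sys.stdin.read().strip())"]
result = run_ex(child, parameters="--version", stdindata="hello")

# The result unpacks like a two-element list, similar to what the slide
# describes for ASL.
code, text = result
print(code, text.strip())
```

A subscriber could append the returned text to a notification's audit trail, which is the feedback loop the diagrams for this use case depict.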


Use Case #6 – Combining RCA with Abstract Events

Sometimes we can have abstract events that we want to combine with root-causes
– Perhaps to perform accounting of various data sources
– Perhaps you simply want to have an explanation tree

Approach is based around three key points:
– SAM considers events from any two sources that share the same triplet (class, instance, event) to be the *same* event
– The use of Aggregate Notifications
– Events can be explained in SAM even if the source is not the same as the explanation
  Domain A presents problem X causes event Y, but Y is not subscribed
  Domain B presents event Y
  SAM will indicate X causes Y even though they are in different domains

Aggregate notifications
– Active if one or more related notifications are active
– Severity is the maximum of the related notifications
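Two of the rules above lend themselves to a short Python sketch: merging reports by their (class, instance, event) triplet, and deriving an aggregate's state from its related notifications. The sketch rests on two assumptions that the slide does not spell out: that a numerically lower severity is more severe (so the "maximum" severity is the smallest number), and that only active related notifications contribute to the aggregate's severity. The function names are hypothetical.

```python
from collections import defaultdict


def merge_by_triplet(reports):
    """Group per-source reports that share (class, instance, event)."""
    merged = defaultdict(dict)
    for r in reports:
        key = (r["class"], r["instance"], r["event"])
        merged[key][r["source"]] = r["active"]
    return merged


def aggregate_state(related):
    """related: list of (is_active, severity) pairs for the related notifications."""
    active_severities = [sev for is_active, sev in related if is_active]
    if not active_severities:
        return False, None
    # Assumption: lower number = more severe, so take the minimum.
    return True, min(active_severities)


reports = [
    {"class": "Interface", "instance": "if-1", "event": "Down", "source": "A", "active": True},
    {"class": "Interface", "instance": "if-1", "event": "Down", "source": "B", "active": True},
]
merged = merge_by_triplet(reports)
print(len(merged))
print(aggregate_state([(True, 3), (True, 1), (False, 2)]))
```

Two sources reporting the same triplet collapse into one entry, which is what lets a root cause from one domain explain an event delivered by another.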


Use Case #6 – Combining RCA with Abstract Events

[Figure: architecture diagram. Labels: Managed Domain (Routers, Servers, Switches, Firewalls; OSPF, IS-IS, BGP, EIGRP, MPLS); Discovery & Monitoring (SNMP, ICMP & Traps); Smart Adapter Platform (OI); Root Cause; Business Impacts; Ionix CMDB.]


Use Case #6 – Combining RCA with Abstract Notifications

Specific case details:
– Specific alarms that indicate Network Adapter problems
– Don’t want or need to know what the “TRUE” RCA happens to be
– Configured a filter to receive just those messages in a Notification List subscriber

In SAM:
– AM produces some RCA
  RCA explains NetworkAdapter_Fault::<instance>::DownOrFlapping
– SMART Adapter Platform receives events
  Aggregates the events to the same class, instance, event as the explained symptom
– Causal links are formed: RCA->explains->Aggregate->aggregates->abstract notifications

Solution:
– Posted to https://community.emc.com/message/458163


Use Case #6 – Combining RCA with Abstract Notifications

    NL {
        type: { "NL_NOTIFY" | "NL_CHANGE" | "NL_CLEAR" | "NL_DELETE" } fs
        classDisplayName: word fs
        instanceDisplayName: word fs
        eventDisplayName: word fs
        localPropObjectName: word
        .. eol
    }
    do {
        // locate properties object and extract true C:I:E
        localPropObj = self->object( "ASL_NLData" , localPropObjectName ) ;

        instance = localPropObj->get( "InstanceName" ) ;
        class = localPropObj->get( "ClassName" ) ;
        event = localPropObj->get( "EventName" ) ;
        icsNotificationFactory = object( getInstances( "ICS_NotificationFactory" )[0] ) ? IGNORE ;


Use Case #6 – Combining RCA with Abstract Notifications Cont.

        AggClassName = "Network_Adapter_Fault" ;
        AggInstanceName = instance ;
        AggEventName = "DownOrFlapping" ;
        eventObj = icsNotificationFactory->findNotification( class , instance , event ) ;

        aggObj = icsNotificationFactory->makeAggregate( AggClassName , AggInstanceName , AggEventName , eventObj ) ? IGNORE ;

        admin = "admin" ;
        OIDomainName = "INCHARGE-OI" ;
        aggObj->notify( admin , OIDomainName ) ;
        aggObj->changed() ;
    }


Use Case #6 – Classic ACM Example

[Figure: aggregation diagram with columns Raw Symptoms, Aggregation, and Aggregated Notifications, linked by "Causes" arrows. UnitaryComputerSystem::ResourceException is fed by IP Server Performance Manager, Host Monitoring Software, and ESX Performance Data. SoftwareService::Major/Minor/DegradedSymptoms is fed by ACM Internal Polling, Host Monitoring Software, and ESX Performance Data. ApplicationTaskCheck::Degraded is fed by VMWare AppSpeed, Cisco Netflow, and Synthetic Transaction Tests.]


Notification Manager – Operational Challenge

Not all notifications are associated with hard failures

Need to reduce time and effort associated with events

Some ‘sympathetic’ alarms become root cause problems

Recurring notifications can indicate future problems

The sheer volume of notifications is overwhelming

Manually customized scripting is complex and inefficient

Customers of EMC Ionix want/need an effective solution to analyze unmanaged alarms


Notification Manager – Key Values

Converts unmanaged notifications into meaningful information

Eliminates the need for manual event scripting and rules writing

Improves event processing significantly

Allows for easy, modular distribution of new event-handling policies

Tracks and documents event policy changes automatically

Adapts to a wide variety of event sources


Notification Manager – UI Example


Notification Manager - Sample Capabilities

Active/Inactive check-box

Expiration Clearing (lifetime of event)

Unknown Agent (create or ignore)

Logging specifications

Notification field setting

Enumerated value mapping

De-duplication

Time-based threshold

Dynamic-discard flag

Is-Managed check

In-Maintenance

Calculated values for any field operators

Hook scripts

Clears-For (uses NCI & ECI)

Delayed Publication

Aggregation

Causes/CausedBy Relationship Support


Use Case #7 – Notification Manager

[Figure: DSLAM Fault Handling. Diagram of DSLAM, Cable, Interface, and EMSAgent notifications (including DSLAM::LINKDOWN, DSLAM::SNTPCOMM, and DSLAM::ISOLATION from OI-3, each with a 10 min delay, and a 60 sec. delay elsewhere) linked to SAM-AGG through AggregatesTo and CausedBy relationships (E-E::Self::Sam-agg-1, E-E::ConnectedSystems::Sam-agg-1, E-E::HostedBy::DSL-X) and notification hooks, producing DSLAM::InBandDOWN and DSLAM::Agg.CommunicationsProblem.]

Page 47: Leveraging Ionix IT Operations Intelligence to make your Private Cloud more efficient using APIs

Use Case #7 – Notification Manager
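The DSLAM fault-handling flow for Use Case #7 combines delayed publication with CausedBy relationships: an event is held for a delay, and if an aggregate that explains it appears in the meantime, the event is attached to that aggregate instead of being published on its own. The following is a minimal sketch of that idea only. The names `DelayedPublisher`, `symptom`, `cause`, and `tick` are hypothetical and do not come from the Notification Manager product, and the element names in the usage note are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Held:
    name: str          # held event, e.g. "DSLAM-1::LinkDown" (illustrative)
    parent: str        # key of the aggregate that could explain it (assumption)
    release_at: float  # time at which the event is published if still unexplained


@dataclass
class DelayedPublisher:
    delay: float  # seconds to hold an event before publishing it
    held: List[Held] = field(default_factory=list)
    active_causes: Dict[str, List[str]] = field(default_factory=dict)

    def symptom(self, name: str, parent: str, now: float) -> None:
        # If the explaining aggregate is already active, attach immediately.
        if parent in self.active_causes:
            self.active_causes[parent].append(name)
        else:
            self.held.append(Held(name, parent, now + self.delay))

    def cause(self, key: str) -> None:
        # An aggregate appeared: absorb every held event it explains.
        explained = [h.name for h in self.held if h.parent == key]
        self.held = [h for h in self.held if h.parent != key]
        self.active_causes.setdefault(key, []).extend(explained)

    def tick(self, now: float) -> List[str]:
        # Publish the events whose delay expired without an explanation.
        due = [h.name for h in self.held if h.release_at <= now]
        self.held = [h for h in self.held if h.release_at > now]
        return due
```

With `delay=60.0`, a link-down event held at time 0 is absorbed if `cause("Sam-agg-1")` is called before time 60; otherwise `tick(60.0)` returns it for publication. A longer delay, such as the 10 minutes shown on the slide, only changes the `delay` value.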

Page 48: Leveraging Ionix IT Operations Intelligence to make your Private Cloud more efficient using APIs

Agenda

Brief Technology Overview and Definition of Terms

Points of Extensibility
– Dynamic MODEL
   Use Case #1 – Configuring Multiple Thresholds
   Use Case #2 – Site Failure Analysis
– Business Impact and Maintenance
   Use Case #3 – Maintenance
   Use Case #4 – Business Impact Management
– SAM Automatic Actions and Notification List Subscribers
   Use Case #5 – SAM Actions
   Use Case #6 – Combining RCA with Abstract Events
– Notification Manager
   Use Case #7 – Advanced Event Management with Notification Manager

Recap

Questions

Page 49: Leveraging Ionix IT Operations Intelligence to make your Private Cloud more efficient using APIs

Recap

You should now have a good understanding of:
– The automation capabilities in the Ionix IT Operations Intelligence suite
   Some ideas about how those can be applied to your environment
   Excitement to apply these ideas
– How Dynamic Model can be used to extend the functionality of the existing suite
   The suite is highly data-driven; you can accomplish a lot with a few small changes
– How to use business impact weighting and maintenance windows to prioritize work
   Helps focus the operations staff on what is important
   Keeps known issues away from the staff until unplanned effects are noticed

Next Steps
– Use these techniques in your environment
– Feedback is always appreciated

Page 50: Leveraging Ionix IT Operations Intelligence to make your Private Cloud more efficient using APIs

EMC Developer Network

The Essential Community for the EMC Developer

EDN: EMC Developer Network

http://developer.emc.com

Code, content, collaboration

For and by developers

Accelerate your development

Register for EDN

Find the community for you

Search, view, post, question or collaborate on code and content

Participate in Open Exchange

Meet or link with other developers

Page 51: Leveraging Ionix IT Operations Intelligence to make your Private Cloud more efficient using APIs

Ionix Developer Community

Ionix Developer Community on EDN

https://community.emc.com/community/edn/ionix

Downloads, discussions, documentation, adapters

Q&A and collaboration

And experts just like you…

Page 52: Leveraging Ionix IT Operations Intelligence to make your Private Cloud more efficient using APIs

EMC Community Network @ EMC World

ECN Lounge on the Exhibit Floor, Booth #429

• Learn about ECN communities & get a free shirt!

• Briefings on EMC Proven Professional, ECN, Networker, VMware, Studio E (for customers), Support Forums, Celerra & Centera, & a new RSA customer community

• Wed 2pm EDN Monster Mash Developer Challenge Award Announcement

Ionix developer sessions

• Wed 5/12 9:30am Ionix UIM: Managing your Vblock Infrastructure with Service Catalog APIs. Fred Crable

• Thur 5/13 11:30am Leveraging Ionix IT Operations Intelligence to make your Private Cloud more efficient using APIs. Bill Kuhhirte

• View session content and discussions on EDN - Ionix

Page 53: Leveraging Ionix IT Operations Intelligence to make your Private Cloud more efficient using APIs

EMC Community Network

http://community.emc.com

Page 54: Leveraging Ionix IT Operations Intelligence to make your Private Cloud more efficient using APIs

Questions

Questions?

Page 55: Leveraging Ionix IT Operations Intelligence to make your Private Cloud more efficient using APIs