Automated Out-of-Band Management with Ansible and Redfish

Jose Delarosa, Dell EMC, October 23, 2017

Page 1: Automated out-of-band management with Ansible and Redfish


Automated Out-of-Band Management with Ansible and Redfish

Jose Delarosa, Dell EMC

October 23, 2017

Page 2: Automated out-of-band management with Ansible and Redfish

Before we start

• Thank you for coming

• Please ask questions: It’s OK to interrupt

• If time runs out, happy to talk to you afterwards

Page 3: Automated out-of-band management with Ansible and Redfish

Who am I

• Jose Delarosa (@jdelaros1)

– Linux Engineer at Dell EMC (13 years)

– Part-time technology evangelist

– Part-time developer

– Part-time systems engineer

– Full-time problem solver

Page 4: Automated out-of-band management with Ansible and Redfish

Agenda

1. iDRAC: provides out-of-band management

2. Redfish: provides scalability

3. Ansible: provides automation

4. Putting it all together: iDRAC + Redfish + Ansible

Page 5: Automated out-of-band management with Ansible and Redfish

iDRAC Overview

Page 6: Automated out-of-band management with Ansible and Redfish

Integrated Dell Remote Access Controller (iDRAC)

• Embedded chip on a PowerEdge server, independent of the server’s operating system and main power:

– Provides device inventory

– Detects hardware failure

– Manages power: turn off, on, hard reset

• Has its own Ethernet port, usually connected to a separate management network

• Referred to as “out-of-band” (OOB) management, as opposed to “in-band” management provided by the operating system.

Page 7: Automated out-of-band management with Ansible and Redfish

Login

Page 8: Automated out-of-band management with Ansible and Redfish

Dashboard

Page 9: Automated out-of-band management with Ansible and Redfish

Storage Controller

Page 10: Automated out-of-band management with Ansible and Redfish

Hard Drives

Page 11: Automated out-of-band management with Ansible and Redfish

Fans

Page 12: Automated out-of-band management with Ansible and Redfish

Temperatures

Page 13: Automated out-of-band management with Ansible and Redfish

System Event Logs

Page 14: Automated out-of-band management with Ansible and Redfish

Simple Out-of-Band Management

Management Network

Page 15: Automated out-of-band management with Ansible and Redfish

Redfish Overview

Page 16: Automated out-of-band management with Ansible and Redfish

What is Redfish?

• Open source, open industry standard specification published by the DMTF for hardware management.

• A RESTful API used to obtain information and exert control over servers via an OOB controller.

• Built on a modern tool-chain, which includes HTTPS, JSON and the OData standard.

• A Redfish “command” is sent as a URI request, so a client can be any application on a server, workstation or mobile device.
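Because a Redfish “command” is just an HTTPS request to a URI, any language with an HTTP client can act as the client. The following is a minimal sketch that only builds (does not send) an authenticated GET request using Python's standard library. The host name `idrac.example.com` and the helper name `build_redfish_get` are assumptions made for illustration; the path and the `root:calvin` credentials are the ones used in the curl examples of this talk.

```python
import base64
import urllib.request


def build_redfish_get(host: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Return an unsent HTTPS GET request for a Redfish resource (hypothetical helper)."""
    url = f"https://{host}{path}"
    # HTTP Basic authentication: base64("user:password")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


# Assumed host name; replace with the address of a real OOB controller.
req = build_redfish_get(
    "idrac.example.com", "/redfish/v1/Systems/System.Embedded.1", "root", "calvin"
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The sketch has no equivalent of curl's `-k` flag, so a controller with a self-signed certificate would additionally need an explicitly configured `ssl.SSLContext`.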

Page 17: Automated out-of-band management with Ansible and Redfish

Scalable Out-of-Band Management

Management Network

https://<idrac>/redfish/v1/Systems/System.Embedded.1

{ "Health": "OK", "HealthRollup": "OK", "State": "Enabled" }
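The same request can be fanned out to many controllers at once, which is where the scalability comes from. This is a rough sketch of that idea using a thread pool. The host names and the `fake_fetch` stand-in (which returns the status object shown on the slide instead of making a network call) are assumptions for illustration; a real client would replace it with an HTTPS GET per host.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def collect_health(
    hosts: Iterable[str], fetch_status: Callable[[str], dict], workers: int = 8
) -> Dict[str, str]:
    """Query every host in parallel and map host name -> reported Health."""
    host_list = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as host_list
        results = pool.map(fetch_status, host_list)
        return {host: status["Health"] for host, status in zip(host_list, results)}


def fake_fetch(host: str) -> dict:
    """Stand-in for a real Redfish GET; returns the Status object from the slide."""
    return {"Health": "OK", "HealthRollup": "OK", "State": "Enabled"}


summary = collect_health(["idrac-01", "idrac-02", "idrac-03"], fake_fetch)
print(summary)
```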

Page 18: Automated out-of-band management with Ansible and Redfish

What can we do with Redfish?

• Get server health status

• Alert on server health status changes

• Retrieve hardware and firmware inventory

• Reset, reboot, and power control servers

• Access system logs

• Configure OOB controller

• Much more!

https://www.dmtf.org/standards/redfish
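Reading data uses GET, while control operations such as a server reset are POSTs to an action URI with a small JSON body. Below is a hedged sketch that builds (but does not send) such a request. The action path is the `ComputerSystem.Reset` URI listed in this talk; the host name and the helper name are assumptions, authentication is omitted for brevity, and the set of `ResetType` values a given controller accepts depends on its firmware, so treat the list here as illustrative.

```python
import json
import urllib.request

RESET_PATH = "/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset"

# ResetType values defined by the Redfish schema; a given controller may
# support only a subset of them.
KNOWN_RESET_TYPES = {
    "On", "ForceOff", "GracefulShutdown", "GracefulRestart",
    "ForceRestart", "PushPowerButton", "Nmi",
}


def build_reset_request(host: str, reset_type: str) -> urllib.request.Request:
    """Return an unsent POST request for the reset action (hypothetical helper)."""
    if reset_type not in KNOWN_RESET_TYPES:
        raise ValueError(f"unknown ResetType: {reset_type}")
    body = json.dumps({"ResetType": reset_type}).encode()
    return urllib.request.Request(
        f"https://{host}{RESET_PATH}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


reset_req = build_reset_request("idrac.example.com", "GracefulRestart")
print(reset_req.get_method(), reset_req.full_url, reset_req.data.decode())
```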

Page 19: Automated out-of-band management with Ansible and Redfish

Redfish API tree structure

Page 20: Automated out-of-band management with Ansible and Redfish

iDRAC operation APIs

Dell Redfish API URL | Comments
/redfish/v1/Managers
/redfish/v1/Managers/iDRAC.Embedded.1
/redfish/v1/Managers/iDRAC.Embedded.1/Actions/Manager.Reset | Used to perform iDRAC reset
/redfish/v1/Managers/iDRAC.Embedded.1/NetworkProtocol | Reports information about iDRAC's network services. Includes Web server, SNMP, vMedia, Telnet, SSH, IPMI & KVM.
/redfish/v1/Managers/iDRAC.Embedded.1/SerialInterfaces | iDRAC BMC serial interface
/redfish/v1/Managers/iDRAC.Embedded.1/SerialInterfaces/<Serial-key>
/redfish/v1/Managers/iDRAC.Embedded.1/LogServices
/redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Sel | Access to server System Event Log
/redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Lclog | Access to Lifecycle Controller Log
/redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Sel/Actions/LogService.ClearLog | Used to clear LC Log
/redfish/v1/Managers/iDRAC.Embedded.1/VirtualMedia | Status of iDRAC virtual media
/redfish/v1/Managers/iDRAC.Embedded.1/VirtualMedia/<media-type>
/redfish/v1/Managers/iDRAC.Embedded.1/EthernetInterfaces | iDRAC network interface
/redfish/v1/Managers/iDRAC.Embedded.1/EthernetInterfaces/<FQDD>
/redfish/v1/Managers/iDRAC.Embedded.1/AccountService
/redfish/v1/Managers/iDRAC.Embedded.1/Accounts | iDRAC user accounts
/redfish/v1/Managers/iDRAC.Embedded.1/Accounts/<Account-Id>

Page 21: Automated out-of-band management with Ansible and Redfish

Chassis Inventory APIs

Dell Redfish API URL | Comments
/redfish/v1/Chassis
/redfish/v1/Chassis/System.Embedded.1 | Top-level URI for server chassis
/redfish/v1/Chassis/System.Embedded.1/Thermal | System temperatures
/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans | Reports fan status for server and FX2 chassis
/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/<Fan-FQDD>
/redfish/v1/Chassis/System.Embedded.1/Sensors/Temperatures | Reports thermal data for server and FX2 chassis
/redfish/v1/Chassis/System.Embedded.1/Sensors/Temperatures/<Sensor-FQDD> | <Sensor-FQDD> addresses each temperature probe
/redfish/v1/Chassis/System.Embedded.1/Power | Power consumption and supply status
/redfish/v1/Chassis/System.Embedded.1/Power/PowerControl
/redfish/v1/Chassis/System.Embedded.1/Sensors/Voltages
/redfish/v1/Chassis/System.Embedded.1/Sensors/Voltages/<Voltage-FQDD> | <Voltage-FQDD> addresses each voltage output
/redfish/v1/Chassis/System.Embedded.1/Power/PowerSupplies
/redfish/v1/Chassis/System.Embedded.1/Power/PowerSupplies/<PSU-FQDD> | <PSU-FQDD> addresses each power supply
/redfish/v1/Chassis/System.Embedded.1/Power/Redundancy/<PSRedundancy-FQDD> | <PSRedundancy-FQDD> addresses power supply redundancy

Page 22: Automated out-of-band management with Ansible and Redfish

System status APIs

Dell Redfish API URL | Comments
/redfish/v1 | Top-level API access
/redfish/v1/Systems | Server inventory and status information access
/redfish/v1/Systems/<ServiceTag+nodeid>
/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset | Server reset operation
/redfish/v1/Systems/System.Embedded.1/Processors | Details on CPUs
/redfish/v1/Systems/System.Embedded.1/Processors/<Processor-FQDD>
/redfish/v1/Systems/System.Embedded.1/EthernetInterfaces | Reports NIC IP address, DHCP and DNS information.
/redfish/v1/Systems/System.Embedded.1/EthernetInterfaces/<EthernetInterface-FQDD> | Example: <EthernetInterface-FQDD> = NIC.Embedded.1-1-1
/redfish/v1/Systems/System.Embedded.1/EthernetInterfaces/<EthernetInterface-FQDD>/Vlans
/redfish/v1/Systems/System.Embedded.1/EthernetInterfaces/<EthernetInterface-FQDD>/Vlans/<Vlan-FQDD>
/redfish/v1/Systems/System.Embedded.1/Storage/Controllers | Manages storage controllers (e.g., PERC).
/redfish/v1/Systems/System.Embedded.1/Storage/Controllers/<Controller-FQDD> | Typical <Controller-FQDD> = RAID.Slot.N-1; describes details of controller, backplane, enclosure, attached drives

Page 23: Automated out-of-band management with Ansible and Redfish

Registries, Sessions, Tasks and Event APIs

Dell Redfish API URL | Comments
/redfish/v1/Registries/Messages/En | PowerEdge message registry
/redfish/v1/odata | Enables OData clients to navigate iDRAC Redfish resources
/redfish/v1/$metadata | Provides a metadata document describing the resources and collections that are available at the iDRAC Redfish service root URI
/redfish/v1/$metadata#<Collection or a single resource>
/redfish/v1/JSONSchemas | Schema descriptions for all supplied data
/redfish/v1/JSONSchemas/<file>
/redfish/v1/SessionService | Redfish session management
/redfish/v1/Sessions
/redfish/v1/Sessions/<SessionId>
/redfish/v1/TaskService | Redfish internal task management
/redfish/v1/EventService | Redfish event management
/redfish/v1/EventService/Actions/EventService.SubmitTestEvent
/redfish/v1/EventSubscriptions
/redfish/v1/EventSubscriptions/<Subscription ID>

Page 24: Automated out-of-band management with Ansible and Redfish

Example: Get system health

$ curl -s https://<idrac>/redfish/v1/Systems/System.Embedded.1 -k -u root:calvin | jq .Status
{
  "Health": "OK",
  "HealthRollup": "OK",
  "State": "Enabled"
}
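The `jq .Status` step simply picks one key out of the JSON document that the controller returns. A minimal Python equivalent, operating on a trimmed sample response rather than a live server, might look like the sketch below. The `SAMPLE_BODY` payload is an assumption (a real response contains many more properties), and `is_healthy` is a hypothetical helper showing how a script could turn the status into a yes/no answer.

```python
import json

# Trimmed, assumed sample of a /redfish/v1/Systems/System.Embedded.1 response.
SAMPLE_BODY = """
{
  "Id": "System.Embedded.1",
  "Status": {"Health": "OK", "HealthRollup": "OK", "State": "Enabled"}
}
"""


def system_status(body: str) -> dict:
    """Equivalent of `jq .Status`: return the Status object of the response."""
    return json.loads(body)["Status"]


def is_healthy(status: dict) -> bool:
    """True when the system is enabled and both health fields report OK."""
    return (
        status.get("State") == "Enabled"
        and status.get("Health") == "OK"
        and status.get("HealthRollup") == "OK"
    )


status = system_status(SAMPLE_BODY)
print(status, is_healthy(status))
```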

Page 25: Automated out-of-band management with Ansible and Redfish

Example: Get storage controller information

$ curl -s https://<idrac>/redfish/v1/Systems/System.Embedded.1/Storage/Controllers/RAID.Slot.4-1 -k -u root:calvin | jq '{name: .Name, status: .Status.Health}'
{
  "name": "PERC H740P Adapter ",
  "status": "OK"
}

Page 26: Automated out-of-band management with Ansible and Redfish

Example: Get HDD information

$ curl -s https://<idrac>/redfish/v1/Systems/System.Embedded.1/Storage/Controllers/RAID.Slot.4-1 -k -u root:calvin | jq .Devices
[
  {
    "CapacityBytes": 599550590976,
    "Manufacturer": "SEAGATE",
    "Model": "ST600MM0238",
    "Name": "Physical Disk 0:1:0",
    "Status": {
      "Health": "OK",
      "HealthRollup": "OK",
      "State": "Enabled"
    }
  },
  {
    "CapacityBytes": 599550590976,
    "Manufacturer": "SEAGATE",
    "Model": "ST600MM0238",
    "Name": "Physical Disk 0:1:1",
    "Status": {
      "Health": "OK",
      "HealthRollup": "OK",
      "State": "Enabled"
    }
  },
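Once the drive list is available as JSON, turning it into a one-line-per-disk report takes only a few lines. The sketch below works on the two drive entries shown on the slide, copied into a Python list; `summarize_drives` is a hypothetical helper, and capacity is reported in whole decimal gigabytes.

```python
# The two drive entries from the slide output.
SAMPLE_DEVICES = [
    {
        "CapacityBytes": 599550590976,
        "Manufacturer": "SEAGATE",
        "Model": "ST600MM0238",
        "Name": "Physical Disk 0:1:0",
        "Status": {"Health": "OK", "HealthRollup": "OK", "State": "Enabled"},
    },
    {
        "CapacityBytes": 599550590976,
        "Manufacturer": "SEAGATE",
        "Model": "ST600MM0238",
        "Name": "Physical Disk 0:1:1",
        "Status": {"Health": "OK", "HealthRollup": "OK", "State": "Enabled"},
    },
]


def summarize_drives(devices: list) -> list:
    """Return one summary string per drive, with capacity in whole GB (10**9 bytes)."""
    return [
        f'{d["Name"]}: {d["Manufacturer"]} {d["Model"]}, '
        f'{d["CapacityBytes"] // 10**9} GB, health {d["Status"]["Health"]}'
        for d in devices
    ]


drive_report = summarize_drives(SAMPLE_DEVICES)
for line in drive_report:
    print(line)
```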

Page 27: Automated out-of-band management with Ansible and Redfish

Example: Get fan information

$ curl -s https://<idrac>/redfish/v1/Chassis/System.Embedded.1/Thermal -k -u root:calvin | jq '.Fans[] | {name:.Name, reading: .Reading, health: .Status.Health}'
{
  "name": "System Board Fan1",
  "reading": 5640,
  "health": "OK"
}
{
  "name": "System Board Fan2",
  "reading": 5640,
  "health": "OK"
}
{
  "name": "System Board Fan3",
  "reading": 5640,
  "health": "OK"
}
..

Page 28: Automated out-of-band management with Ansible and Redfish

Example: Get thermal information

$ curl -s https://10.9.165.12/redfish/v1/Chassis/System.Embedded.1/Thermal -k -u root:calvin | jq '.Temperatures[] | {name:.Name, readingCelsius: .ReadingCelsius, health: .Status.Health}'
{
  "name": "CPU1 Temp",
  "readingCelsius": 29,
  "health": "OK"
}
{
  "name": "CPU2 Temp",
  "readingCelsius": 28,
  "health": "OK"
}
{
  "name": "System Board Exhaust Temp",
  "readingCelsius": 29,
  "health": "OK"
}
..
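Fan and temperature readings both come from the same `Thermal` resource, so one response can feed several checks. The following sketch assumes a trimmed payload built from the values shown on the fan and temperature slides; `hottest_sensor` and `unhealthy_items` are hypothetical helpers that a monitoring script might use.

```python
# Trimmed, assumed sample of a .../Chassis/System.Embedded.1/Thermal response.
SAMPLE_THERMAL = {
    "Fans": [
        {"Name": "System Board Fan1", "Reading": 5640, "Status": {"Health": "OK"}},
        {"Name": "System Board Fan2", "Reading": 5640, "Status": {"Health": "OK"}},
        {"Name": "System Board Fan3", "Reading": 5640, "Status": {"Health": "OK"}},
    ],
    "Temperatures": [
        {"Name": "CPU1 Temp", "ReadingCelsius": 29, "Status": {"Health": "OK"}},
        {"Name": "CPU2 Temp", "ReadingCelsius": 28, "Status": {"Health": "OK"}},
        {"Name": "System Board Exhaust Temp", "ReadingCelsius": 29, "Status": {"Health": "OK"}},
    ],
}


def hottest_sensor(thermal: dict) -> tuple:
    """Return (name, reading) of the warmest probe; ties go to the first one listed."""
    hottest = max(thermal["Temperatures"], key=lambda t: t["ReadingCelsius"])
    return hottest["Name"], hottest["ReadingCelsius"]


def unhealthy_items(thermal: dict) -> list:
    """Return the names of fans or probes whose health is anything other than OK."""
    items = thermal["Fans"] + thermal["Temperatures"]
    return [item["Name"] for item in items if item["Status"]["Health"] != "OK"]


print(hottest_sensor(SAMPLE_THERMAL), unhealthy_items(SAMPLE_THERMAL))
```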

Page 29: Automated out-of-band management with Ansible and Redfish

Example: Get system event logs

$ curl -s https://<idrac-ip>/redfish/v1/Managers/iDRAC.Embedded.1/Logs/Sel -k -u root:calvin | jq '.Members[] | {date: .Created, message: .Message, severity: .Severity}'
..
{
  "date": "2017-09-26T13:33:00-05:00",
  "message": "Power supply redundancy is lost.",
  "severity": "Critical"
}
{
  "date": "2017-09-26T13:32:53-05:00",
  "message": "The power input for power supply 2 is lost.",
  "severity": "Critical"
}
{
  "date": "2017-09-16T10:37:59-05:00",
  "message": "Log cleared.",
  "severity": "Ok"
}
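Log entries become more useful once they are filtered, for example to show only critical events in chronological order. This is a small sketch over the three entries from the slide, stored under the `Created`, `Message` and `Severity` keys that the jq filter reads; `critical_events` is a hypothetical helper.

```python
from datetime import datetime

# The three log entries from the slide, keyed as in the raw Redfish response.
SAMPLE_SEL = {
    "Members": [
        {"Created": "2017-09-26T13:33:00-05:00",
         "Message": "Power supply redundancy is lost.", "Severity": "Critical"},
        {"Created": "2017-09-26T13:32:53-05:00",
         "Message": "The power input for power supply 2 is lost.", "Severity": "Critical"},
        {"Created": "2017-09-16T10:37:59-05:00",
         "Message": "Log cleared.", "Severity": "Ok"},
    ]
}


def critical_events(sel: dict) -> list:
    """Return critical entries sorted from oldest to newest."""
    critical = [m for m in sel["Members"] if m["Severity"] == "Critical"]
    return sorted(critical, key=lambda m: datetime.fromisoformat(m["Created"]))


for event in critical_events(SAMPLE_SEL):
    print(event["Created"], event["Message"])
```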

Page 30: Automated out-of-band management with Ansible and Redfish

Redfish Roadmap

• Version 1.x focused on servers. Will expand over time to cover storage and network infrastructure.

• Will add devices over time to cover new technologies (e.g., NVDIMMs, multifunction network adapters)

• SNIA is developing Swordfish, which builds upon Redfish’s local storage management to address advanced storage devices.

• Open source efforts (http://github.com/dmtf)

– Client libraries (Python, Java, PowerShell)

– Redfish Mockup Creator / Server

– Redfishtool (CLI utility similar to ipmitool)

Page 31: Automated out-of-band management with Ansible and Redfish

Ansible Overview

Page 32: Automated out-of-band management with Ansible and Redfish

What is Ansible

• Automation software: makes repetitive tasks easy

• Agentless: minimal footprint

• No database back end: easy to install and use

• Defines desired state: OK to run a task more than once

• Remote tasks are run in parallel: efficient

• Easier to learn and use than shell scripts
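The “desired state” idea can be illustrated without Ansible itself: a task checks the current state first and only acts when it differs from what was requested, so running it twice is harmless. The sketch below is an assumption-laden illustration of that principle, not Ansible's implementation. `ensure_directory` is a hypothetical function, it works in a temporary directory, and it assumes a POSIX file system for the permission check.

```python
import os
import stat
import tempfile


def ensure_directory(path: str, mode: int = 0o755) -> bool:
    """Make sure `path` is a directory with `mode`; return True only if something changed."""
    changed = False
    if not os.path.isdir(path):
        os.makedirs(path)
        changed = True
    # Compare only the permission bits with the requested mode.
    if stat.S_IMODE(os.stat(path).st_mode) != mode:
        os.chmod(path, mode)
        changed = True
    return changed


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "tools")
    first_run = ensure_directory(target)   # directory is created
    second_run = ensure_directory(target)  # already in the desired state
    print(first_run, second_run)
```

The first call reports a change and the second does not, which is the behavior that makes it safe to run the same task repeatedly.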

Page 33: Automated out-of-band management with Ansible and Redfish

Ansible use cases

OpenStack

• Compute nodes

• Storage nodes

• Controller nodes

IT Security Hardening

• Firewall rules

• Remove invalid users

• Install latest updates

Container Management

• Stop/remove containers

• Refresh container images

• Deploy with new images

1-to-n management: executes tasks in parallel

Page 34: Automated out-of-band management with Ansible and Redfish

Popular Ansible use cases

• Cloud infrastructure deployments

• Application installation & configuration

• Container management

• Security compliance

• IT audits

• OOB systems management

Page 35: Automated out-of-band management with Ansible and Redfish

Ansible definitions

• Task: A task is the smallest unit of work. It can be an action like “Install a package”, “Remove a user”, “Create a firewall rule” or “Copy this file to this location”.

• Play: A play is made up of tasks. For example, the play “Prepare a database” is made up of these tasks:

– Task: “Install the database package”

– Task: “Set password”

– Task: “Create database”

– Task: “Set database access”

• Playbook: A playbook is made up of plays. For example, the playbook “Prepare a web site with a database” is made up of two plays: 1) “Set up the database server” and 2) “Set up the web server”.

Playbook: Set up my web application
  Play 1: Set up database
    Task 1: Install mysql package
    Task 2: Create database customer_db
  Play 2: Set up web server
    Task 1: Install httpd package
    Task 2: Configure site for TLS

Page 36: Automated out-of-band management with Ansible and Redfish

Example: Automating routine tasks

Say you provision 100 servers every day and you run these commands on each server:

$ groupadd admin
$ useradd -c "Sys Admin" -g admin -m sysman
$ mkdir /opt/tools
$ chmod 755 /opt/tools
$ chown sysman /opt/tools
$ yum -y install httpd
$ yum -y update
$ systemctl enable httpd
$ systemctl start httpd
$ rm /etc/motd

The same commands can be placed in an Ansible playbook and executed on 100 servers:

- name: daily tasks
  hosts: my_100_daily_servers
  tasks:
    - group: name=admin state=present
    - user: name=sysman comment="Sys Admin" group=admin
    - file: path=/opt/tools state=directory owner=sysman mode=0755
    - yum: name=httpd state=latest
    - yum: name=* state=latest
    - service: name=httpd state=started enabled=yes
    - file: path=/etc/motd state=absent

YAML format = easy to learn! YAML format = easy to organize!

Page 37: Automated out-of-band management with Ansible and Redfish

One more Ansible definition

• Module

– An Ansible module is a program where a task is implemented.

– A playbook is where you specify the instructions (tasks) you want to run; a module is the code to implement those instructions.

– Modules can be written in any language, but Python is the most popular.

– If you are an operator, you will work mostly with playbooks.

– If you are a developer, you will work mostly with modules.
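To make the playbook/module split concrete, here is a minimal sketch of the kind of logic a module carries: it receives task parameters, decides whether anything needs to change, and reports a JSON result with a `changed` flag. Real Ansible modules written in Python normally use the `AnsibleModule` helper from `ansible.module_utils.basic` for argument parsing and result reporting; this sketch deliberately avoids that dependency so it runs standalone. The `power_state` parameter and the `run_module` function are hypothetical names, not part of any published module.

```python
import json


def run_module(params: dict, current_state: str) -> dict:
    """Decide what a hypothetical power-control task should do and report the result."""
    desired = params["power_state"]
    if desired not in ("On", "Off"):
        return {"failed": True, "msg": f"unsupported power_state: {desired}"}
    if desired == current_state:
        # Already in the desired state: nothing to do.
        return {"changed": False, "power_state": current_state}
    # A real module would call the OOB controller here (for example, a Redfish
    # reset action) before reporting the change.
    return {"changed": True, "power_state": desired}


# Modules hand their result back as JSON on standard output.
print(json.dumps(run_module({"power_state": "On"}, current_state="Off")))
```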

Page 38: Automated out-of-band management with Ansible and Redfish

Scalable & Automated Out-of-Band Management

Management Network

https://<idrac>/redfish/v1/Systems/System.Embedded.1

{ "Health": "OK", "HealthRollup": "OK", "State": "Enabled" }

Page 39: Automated out-of-band management with Ansible and Redfish

Putting it all together: iDRAC + Redfish + Ansible

Page 40: Automated out-of-band management with Ansible and Redfish

Ansible module and playbooks for iDRAC

• Manage your entire Dell EMC IT infrastructure (servers, routers, switches, storage) from an Ansible controller.

• Automated monitoring, provisioning, firmware updates at scale.

• Open source, so you can write your own extensions as needed and contribute back to the community.

• To be included as an Ansible community module.

Page 41: Automated out-of-band management with Ansible and Redfish

Key lifecycle management tasks

• Server Power On/Off; Reboot; Hard Reset

• Install BIOS, Configure BIOS, Reset to Default

• Configure iDRAC (CRUD operations):

– User & Password Management

– NTP and Time Zone settings

– Storage (RAID, Physical Disks, Virtual Disks)

• System Inventory – H/W, Firmware, Sensor

• OS Deployment – remote file share, vMedia

• Import / Export SCP – remote file share, vMedia

• Backup and Restore – Server Profiles

• Upgrade using DSU (Dell Server Update) or DUEC (Dell Update Engine for Consoles)

– Get list of available and applicable updates

– Firmware Upgrade

– BIOS Upgrade

– OS Drivers Upgrade

• Job Management

– Check JOB status

– Create JOB

– Delete JOB

– Create JOB Queue

– Delete JOB Queue

• Get Logs

– Export LC logs

– Export System Event Logs

Page 42: Automated out-of-band management with Ansible and Redfish

Implementation example: get system inventory

Playbook: Execute: Output:

Page 43: Automated out-of-band management with Ansible and Redfish

Inventory spreadsheet

• Collect inventory data, maintain in spreadsheet or database

Server | iDRAC IP | Model | IP address | BIOS | CPUs | CPU Type | RAM | Service Tag | Status
webserver-1 | 192.168.2.10 | PowerEdge R630 | 10.0.1.30 | 2.3.4 | 2 | Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 | 5W14Q47 | OK
webserver-2 | 192.168.2.11 | PowerEdge R630 | 10.0.1.31 | 2.3.4 | 2 | Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 | 5XR7Q32 | OK
webserver-3 | 192.168.2.12 | PowerEdge R630 | 10.0.1.33 | 2.3.2 | 2 | Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 128 | 5XT3QYY | OK
appserver-1 | 192.168.2.13 | PowerEdge R830 | 10.0.1.34 | 2.3.2 | 4 | Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.60GHz | 512 | 5XR7QXY | OK
dbserver-1 | 192.168.3.10 | PowerEdge R730 | 10.0.2.30 | 2.1.2 | 2 | Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.33GHz | 256 | 5XR7Q67 | OK
dbserver-2 | 192.168.3.11 | PowerEdge R730 | 10.0.2.31 | 2.3.4 | 2 | Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.33GHz | 256 | 5WT4Q37 | OK
dbserver-3 | 192.168.3.12 | PowerEdge R730 | 10.0.2.32 | 2.3.4 | 2 | Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.33GHz | 256 | 5WR4Q12 | OK
dbserver-4 | 192.168.3.13 | PowerEdge R730 | 10.0.2.33 | 2.3.4 | 2 | Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.33GHz | 256 | 5TT1Q21 | OK
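Inventory collected this way can be written straight to a file that a spreadsheet opens. The sketch below writes two of the rows from the table above to CSV in memory using Python's standard `csv` module. The column names are adapted from the slide's table, and `to_csv` is a hypothetical helper; in practice the rows would come from the inventory data gathered per server rather than being typed in.

```python
import csv
import io

FIELDS = ["Server", "iDRAC IP", "Model", "IP address", "BIOS",
          "CPUs", "CPU Type", "RAM", "Service Tag", "Status"]

# Two rows copied from the slide's inventory table.
ROWS = [
    {"Server": "webserver-1", "iDRAC IP": "192.168.2.10", "Model": "PowerEdge R630",
     "IP address": "10.0.1.30", "BIOS": "2.3.4", "CPUs": 2,
     "CPU Type": "Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz", "RAM": 128,
     "Service Tag": "5W14Q47", "Status": "OK"},
    {"Server": "dbserver-1", "iDRAC IP": "192.168.3.10", "Model": "PowerEdge R730",
     "IP address": "10.0.2.30", "BIOS": "2.1.2", "CPUs": 2,
     "CPU Type": "Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.33GHz", "RAM": 256,
     "Service Tag": "5XR7Q67", "Status": "OK"},
]


def to_csv(rows: list) -> str:
    """Render inventory rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(ROWS))
```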

Page 44: Automated out-of-band management with Ansible and Redfish

Implementation example: get HDD info and health

Playbook: Output:

Page 45: Automated out-of-band management with Ansible and Redfish

Implementation example: get BIOS boot order

Playbook: Execute: Output:

Page 46: Automated out-of-band management with Ansible and Redfish

https://github.com/dell/idrac-ansible-module

Development is ongoing. Contributions are welcome!

Page 47: Automated out-of-band management with Ansible and Redfish

Resources

• iDRAC with Lifecycle Controller: http://dell.to/2qdBd0y

• Redfish API specification: http://bit.ly/2gb9VBj

• Dell EMC PowerEdge Redfish API Overview: http://dell.to/2odsH1p

• iDRAC Redfish API Reference Guide: http://dell.to/2oyjMTy

• Getting started with Ansible: http://bit.ly/2oCj5xy

• jq JSON parser: https://stedolan.github.io/jq/

Page 48: Automated out-of-band management with Ansible and Redfish

Thank you

Q & A
