
1 © 2015 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public Information.

In collaboration with MapR

Highlights

Ease of deployment

• Cisco UCS® Manager automates deployment and scaling, reducing the risk of configuration errors that can cause downtime.

Scalability for big data workloads

• The Cisco UCS Integrated Infrastructure for Big Data solution offers linear scalability and simplification of essential operations for single-rack and multiple-rack deployments.

Comprehensive integrated infrastructure

• Cisco UCS Integrated Infrastructure for Big Data solutions include computing, storage, connectivity, and unified management.

Simplified management

• Cisco UCS Director Express for Big Data offers one-click provisioning, installation, and configuration.

Multi-tenancy with MapR

• The MapR Distribution including Apache Hadoop offers multi-tenancy with no need for additional setup. It supports logical partitions in a physical cluster for separate administrative control, data placement, and job processing.

Simplified management through MapR Control System (MCS)

• MCS gives Hadoop administrators a single place for configuring, monitoring, and managing their clusters. Two major features exposed by MCS, heatmaps and job metrics, dramatically simplify administration of a cluster.

Cisco UCS Integrated Infrastructure for Big Data with MapR

Cisco and MapR Deliver Performance and Multi-tenancy to Help Tame Big Data

Big data provides an enormous wealth of information to your organization. But to gain the most benefit, you need to manage it efficiently. And you must make sure that all this data is separated and isolated so that each set of users can see and work on only the data that they are authorized to use.

Challenges of Multi-tenancy for Big Data

Organizations seek to share IT resources cost-efficiently and securely among multiple applications, data sets, and user groups. Platforms that support this architecture are commonly known as multitenant technologies.

Multi-tenancy is the capability of a single instance of software to serve multiple tenants, where a tenant is a group of users who share the same view of the system. As Hadoop becomes an enterprise data hub, it must support multi-tenancy, and big data platforms are increasingly expected to provide it by default. Multi-tenancy requires isolating tenants from one another, both in the data stored on the platform and in the compute resources that process it.

To support multi-tenancy, solutions need to:

• Help ensure that service-level agreements (SLAs) are met

• Help guarantee data and compute isolation

• Enforce quotas

• Establish security and delegation

• Help ensure low-cost operations and simpler manageability

The Solution: Cisco UCS Integrated Infrastructure for Big Data with MapR

The Cisco UCS® Integrated Infrastructure for Big Data solution includes computing, storage, connectivity, and unified management capabilities to help companies manage the rapidly growing volumes of data they face today. It is built on Cisco Unified Computing System™ (Cisco UCS) infrastructure using Cisco UCS 6200 Series Fabric Interconnects, optional Cisco Nexus® 2200 platform fabric extenders, and Cisco UCS C-Series Rack Servers. Installed in pairs, the fabric interconnects offer redundant, active-active connectivity and embedded management through Cisco UCS Manager.

MapR is a complete distribution for Apache Hadoop that packages more than a dozen projects from the Hadoop ecosystem to provide a broad set of big data capabilities. The MapR platform provides enterprise-class features such as high availability, disaster recovery, security, and full data protection. It also allows Hadoop data to be easily accessed as traditional network-attached storage (NAS), with read-write capabilities and multi-tenancy.

The MapR Distribution offers multi-tenancy from the start. It provides powerful features to logically partition a physical cluster, giving separate administrative control, data placement, job processing, user quotas, and network access. Volumes, a feature unique to MapR, are the foundation of multi-tenancy. Volumes provide a way to organize data and apply different policies to different data sets, applications, users, and groups. A single cluster can have many volumes: up to hundreds of thousands.
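To make the idea of a volume as a policy boundary more concrete, here is a minimal Python sketch, assuming a volume behaves like a mounted subtree of the cluster namespace that carries its own policies. The `Volume` and `VolumeRegistry` classes and the policy fields (`replication`, `quota_gb`, `snapshot_schedule`) are hypothetical illustrations, not the MapR API: any path resolves to the volume with the longest matching mount point, and that volume's policies apply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Volume:
    """Hypothetical model of a volume: a mounted subtree that carries its own policies."""
    name: str
    mount_path: str
    owner: str                               # tenant: user, group, application, or business unit
    replication: int = 3                     # illustrative per-volume policy fields
    quota_gb: Optional[int] = None
    snapshot_schedule: Optional[str] = None


class VolumeRegistry:
    def __init__(self) -> None:
        self._by_mount: dict[str, Volume] = {}

    def mount(self, volume: Volume) -> None:
        # Normalize "/projects/finance/" to "/projects/finance"; keep "/" as the root.
        self._by_mount[volume.mount_path.rstrip("/") or "/"] = volume

    def volume_for(self, path: str) -> Volume:
        """Return the volume whose mount point is the longest prefix of `path`."""
        matches = [
            (mount, vol)
            for mount, vol in self._by_mount.items()
            if mount == "/" or path == mount or path.startswith(mount + "/")
        ]
        if not matches:
            raise KeyError(f"no volume covers {path!r}")
        return max(matches, key=lambda m: len(m[0]))[1]


registry = VolumeRegistry()
registry.mount(Volume("root", "/", owner="admin"))
registry.mount(Volume("finance", "/projects/finance", owner="g:finance",
                      quota_gb=500, snapshot_schedule="hourly"))
registry.mount(Volume("marketing", "/projects/marketing", owner="g:marketing", replication=2))

vol = registry.volume_for("/projects/finance/q3/ledger.csv")
print(vol.name, vol.quota_gb, vol.snapshot_schedule)   # finance 500 hourly
```

Because policies hang off the volume rather than individual files or the whole cluster, each tenant's data can be governed as one unit, which is the granularity argument the brief makes for volumes.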

Together, Cisco and MapR provide enterprises with transparent, simplified data and management integration with the enterprise application ecosystem. Working together, they deliver a uniquely capable, industry-leading architectural platform for Hadoop-based applications.

Cisco UCS Solution for MapR

The Cisco UCS solution for MapR is based on Cisco UCS Integrated Infrastructure for Big Data, a highly scalable architecture that includes computing, storage, connectivity, and unified management capabilities and is designed to meet a variety of scale-out application demands. It achieves this through transparent data and management integration capabilities built from the components described here and shown in Figure 1.

Cisco UCS 6200 Series Fabric Interconnects

Fabric interconnects establish a single point of connectivity and management for the entire system. They provide high-bandwidth, low-latency connectivity for servers, with integrated, unified management for all connected devices provided by Cisco UCS Manager. Deployed in redundant pairs, the interconnects offer the full active-active redundancy, performance, and exceptional scalability needed to support the large number of nodes typical of clusters serving big data applications. Cisco UCS Manager enables rapid and consistent server configuration using service profiles and automates ongoing system maintenance activities, such as firmware updates, across the entire cluster as a single operation. It also offers advanced monitoring, with options to raise alarms and send notifications about the health of the entire cluster.
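The value of service profiles is that policy is defined once and identity is stamped out per server. The following Python sketch models that idea under stated assumptions: `ServiceProfileTemplate`, `ServiceProfile`, `MacPool`, and `instantiate` are hypothetical names, not the Cisco UCS Manager object model or API, and the MAC prefix and firmware bundle names are purely illustrative. Profiles hold a reference to their template, so a single template edit (for example, a new firmware bundle) is seen by every server bound to it.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class ServiceProfileTemplate:
    """Hypothetical stand-in for a service profile template (not the UCS Manager API).

    Profiles keep a reference to the template, so one change to the template
    is seen by every bound profile.
    """
    name: str
    firmware_bundle: str
    bios_policy: str
    vnic_names: tuple[str, ...]


@dataclass
class ServiceProfile:
    name: str
    server: str
    template: ServiceProfileTemplate
    macs: tuple[str, ...]   # unique per server, drawn from a pool

    @property
    def firmware_bundle(self) -> str:
        return self.template.firmware_bundle


class MacPool:
    """Hands out unique MAC addresses under an illustrative prefix."""

    def __init__(self, prefix: str = "00:25:B5:00:00") -> None:
        self._prefix = prefix
        self._counter = count()

    def allocate(self) -> str:
        n = next(self._counter)
        if n > 0xFF:
            raise RuntimeError("MAC pool exhausted")
        return f"{self._prefix}:{n:02X}"


def instantiate(template: ServiceProfileTemplate, servers: list[str],
                pool: MacPool) -> list[ServiceProfile]:
    """Create one profile per server: identity (name, MACs) differs, policy is shared."""
    return [
        ServiceProfile(
            name=f"{template.name}-{i:03d}",
            server=server,
            template=template,
            macs=tuple(pool.allocate() for _ in template.vnic_names),
        )
        for i, server in enumerate(servers, start=1)
    ]


template = ServiceProfileTemplate("hadoop-node", "fw-bundle-A", "bigdata-bios", ("eth0", "eth1"))
profiles = instantiate(template, [f"c240-{n:02d}" for n in range(1, 17)], MacPool())
template.firmware_bundle = "fw-bundle-B"   # one edit, seen by all 16 profiles
assert all(p.firmware_bundle == "fw-bundle-B" for p in profiles)
```

Sharing policy by reference while allocating identity from pools is what lets a 16-server rack be configured, and later re-configured, as one operation in this model.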

Cisco UCS C240 M4 Rack Server

The rack server supports a wide range of computing, I/O, and storage-capacity demands in a compact design. Based on the Intel® Xeon® E5 v3 family of processors, it supports 12-Gbps SAS throughput, delivering significant performance and efficiency gains over the previous generation of servers. The server uses dual Intel Xeon processor E5-2600 v3 series CPUs and supports up to 768 GB of main memory (128 or 256 GB is typical for big data applications) and a range of disk drive and SSD options. The performance-optimized option supports 24 small-form-factor (SFF) disk drives, and the capacity-optimized option supports 12 large-form-factor (LFF) disk drives; the server also provides two embedded 1 Gigabit Ethernet LAN-on-motherboard (LOM) ports. The Cisco UCS Virtual Interface Card (VIC) 1227, designed for the M4 generation of Cisco UCS C-Series Rack Servers, is optimized for high-bandwidth, low-latency cluster connectivity and supports up to 256 virtual devices that are configured on demand through Cisco UCS Manager.

Figure 1: Cisco UCS Integrated Infrastructure for Big Data: A 64-Node Cluster (labels: 2 x Cisco UCS 6296 Fabric Interconnects; 16 x Cisco UCS C240 M4 Servers)

MapR Distribution Including Apache Hadoop: Complete Hadoop Platform

As one of the technology leaders in Hadoop, MapR provides an enterprise-class Hadoop solution that can be quickly developed and easily administered. With significant investment in critical technologies, MapR offers a comprehensive Hadoop platform fully optimized for performance and scalability. The MapR Distribution includes more than 20 tested and validated Hadoop software modules on an advanced data platform, offering exceptional ease of use, reliability, and performance for Hadoop deployments (see Figure 2).

The benefits of the MapR Distribution include:

• Performance: Ultra-fast throughput

• Scalability: Up to a trillion files, with no restrictions on the number of nodes in a cluster

• Standards-based APIs and tools: Standard Hadoop APIs, along with Open Database Connectivity (ODBC), Java Database Connectivity (JDBC), Lightweight Directory Access Protocol (LDAP), and Linux Pluggable Authentication Modules (PAM)

• MapR Direct Access Network File System (NFS): High-speed random read-write operations, real-time data flows, and transparent support for existing non-Java applications

• Manageability: Advanced management console, rolling upgrades, and a Representational State Transfer (REST) API

• Integrated security: Kerberos and non-Kerberos options with wire-level encryption

Figure 2: MapR Distribution Including Apache Hadoop (components shown: MapR Control System (management); MapR Data Platform with MapR File System (MapR-FS) and MapR-DB; YARN; execution engines from the Apache Hadoop and operations support system ecosystem for batch (Tez, Spark, Cascading, Pig, MapReduce v1 and v2), ML and graph (GraphX, MLlib, Mahout), SQL (Drill, Spark SQL, Impala, Hive), NoSQL and search (Solr, HBase), and streaming (Storm, Spark Streaming); data integration and access (Hue, HttpFS, Flume, Sqoop); security; data governance and operations (Oozie, ZooKeeper))


• Advanced multi-tenancy: Volumes, data placement control, job placement control, and queues

• Consistent snapshots: Full data protection with point-in-time recovery

• High availability: Ubiquitous high availability with no-NameNode architecture, YARN high availability, and NFS high availability

• Disaster recovery: Cross-site replication with mirroring

• MapR-DB: Integrated enterprise-class NoSQL database

Main Benefits of Multi-tenancy in MapR with Cisco UCS

Volumes (unique to MapR) form the foundation of multi-tenancy as offered by MapR.

In a typical deployment, the data for each user, group, application, or business unit is placed in a single volume so that it can be managed separately from the data of other users, groups, applications, and business units.

Other Hadoop distributions do not support volumes, so policies can be defined only at the file or directory level (too fine-grained) or at the cluster level (too coarse). As a workaround, organizations using other Hadoop distributions create separate physical clusters for each tenant, which adds architectural complexity and thus a higher risk of errors and failures.

Multi-tenancy in MapR also has significant total cost of ownership (TCO) advantages. It allows organizations to use a single cluster for multiple use cases rather than maintain a large number of isolated clusters. This approach reduces overall administrative overhead and improves efficiency by drawing on a common resource pool.

Here are some of the unique features of multi-tenancy in Cisco UCS Integrated Infrastructure for Big Data with MapR:

• Data placement control: MapR provides the ability to restrict a volume to a subset of a cluster’s nodes. This feature allows administrators to isolate sensitive data and applications and to use heterogeneous hardware. For example, data placement control can keep specific data on separate nodes with different configurations, or keep Apache Spark data on nodes that have SSDs. It can also support more advanced storage-tiering policies, such as keeping old data on nodes that have higher storage capacity and less computing power (such as Cisco UCS C3160 servers), and hence a lower cost per terabyte (TB) of storage. In combination with MapR warden pluggable services, data placement control also enables administrators to designate specific nodes for a given application or service, such as Spark, effectively creating a mini-cluster within the larger cluster to help guarantee SLAs and resource availability.

• Job placement control: MapR provides the ability to restrict a specific job, or the jobs of a specific user or group, to a subset of the nodes in the cluster. This feature enables administrators to help guarantee SLAs for specific applications and to create separation between different applications or business units. It also allows administrators to designate a small subset of the nodes for low-priority jobs or for jobs that require access to external systems through the corporate firewall.

• Access control and security: MapR provides fine-grained, role-based access controls (RBAC) with access control expressions (ACEs) for tables, column families, and columns in MapR-DB; Unix permissions for files; and field-level access control via Apache Drill views.

• MapR also provides cryptographically secure wire-level authentication and encryption. Organizations that have a Kerberos infrastructure can use it for authentication. Organizations that do not can use an integrated, simpler scheme that provides the same security without the complexity associated with Kerberos deployment and management. This scheme uses Linux Pluggable Authentication Modules (PAM) to enable integration with any PAM-supported registry.

• Administration and reporting: MapR allows organizations to define and enforce storage, CPU, and memory quotas at the volume, user, and group levels. To help service providers supply accurate usage and billing information, MapR offers resource usage reports encompassing more than 60 different metrics. These metrics are available through the MCS browser-based user interface and, for upstream integration, through the command-line interface (CLI) and the REST API.
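The data placement control described in the list above can be pictured as restricting a volume to a branch of a node topology tree. The Python sketch below assumes each node is tagged with a topology path and each volume names a topology prefix; the `Node` class, `nodes_under`, `place_replicas`, the path layout, and the rack-spreading rule are all hypothetical illustrations, not MapR's placement algorithm. It keeps a Spark volume's replicas on SSD nodes only, spreading them across racks first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    hostname: str
    topology: str   # illustrative topology path, e.g. "/data/ssd/rack1"


def nodes_under(nodes: list[Node], volume_topology: str) -> list[Node]:
    """Nodes whose topology is the volume's topology or lies beneath it."""
    prefix = volume_topology.rstrip("/")
    return [n for n in nodes if n.topology == prefix or n.topology.startswith(prefix + "/")]


def place_replicas(nodes: list[Node], volume_topology: str, replication: int) -> list[Node]:
    """Pick `replication` distinct eligible nodes, spreading across racks first."""
    candidates = nodes_under(nodes, volume_topology)
    if len(candidates) < replication:
        raise ValueError(f"only {len(candidates)} node(s) under {volume_topology}")
    # Treat each leaf topology path as a "rack" for spreading purposes.
    by_rack: dict[str, list[Node]] = {}
    for node in sorted(candidates, key=lambda n: n.hostname):
        by_rack.setdefault(node.topology, []).append(node)
    chosen: list[Node] = []
    while len(chosen) < replication:          # round-robin over racks
        for rack in sorted(by_rack):
            if by_rack[rack] and len(chosen) < replication:
                chosen.append(by_rack[rack].pop(0))
    return chosen


cluster = [
    Node("ssd-01", "/data/ssd/rack1"), Node("ssd-02", "/data/ssd/rack1"),
    Node("ssd-03", "/data/ssd/rack2"), Node("ssd-04", "/data/ssd/rack2"),
    Node("hdd-01", "/data/hdd/rack3"), Node("c3160-01", "/data/archive/rack4"),
]
spark_replicas = place_replicas(cluster, "/data/ssd", replication=3)
print([n.hostname for n in spark_replicas])   # ['ssd-01', 'ssd-03', 'ssd-02']
```

A storage-tiering policy follows the same pattern: an archive volume pointed at `/data/archive` would only ever land on the high-capacity node.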
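Job placement control, as described in the list above, amounts to matching a job's owner to a node requirement and then filtering nodes. This Python sketch assumes nodes carry labels and that an ordered rule table maps users or groups to required labels; `JobPlacementPolicy`, `hosts_for_job`, the `u:`/`g:` principal notation, and the label names are hypothetical and do not represent MapR's or YARN's configuration format. It shows ad hoc jobs confined to low-priority nodes and external ETL jobs confined to the node that can reach through the firewall.

```python
from dataclasses import dataclass, field


@dataclass
class JobPlacementPolicy:
    """Ordered (principal, required node labels) rules; the first match wins.

    Principals use an illustrative "u:<user>" / "g:<group>" notation.
    """
    rules: list[tuple[str, frozenset[str]]] = field(default_factory=list)

    def required_labels(self, user: str, groups: set[str]) -> frozenset[str]:
        principals = {f"u:{user}"} | {f"g:{g}" for g in groups}
        for principal, labels in self.rules:
            if principal in principals:
                return labels
        return frozenset()   # no rule matched: the job may run on any node


def hosts_for_job(node_labels: dict[str, set[str]], required: frozenset[str]) -> list[str]:
    """Hosts carrying every required label, sorted for deterministic output."""
    return sorted(host for host, labels in node_labels.items() if required <= labels)


node_labels = {
    "n01": {"prod"},
    "n02": {"prod"},
    "n03": {"prod", "lowpri"},
    "n04": {"lowpri", "egress"},   # e.g. the only node allowed through the firewall
}
policy = JobPlacementPolicy([
    ("g:etl-external", frozenset({"egress"})),
    ("g:adhoc", frozenset({"lowpri"})),
    ("g:finance", frozenset({"prod"})),
])
print(hosts_for_job(node_labels, policy.required_labels("bob", {"adhoc"})))   # ['n03', 'n04']
```

Keeping the rules ordered makes precedence explicit when a user belongs to several groups with different placement requirements.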
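The access control expressions (ACEs) mentioned in the list above combine users and groups with Boolean operators. The Python sketch below evaluates a simplified expression language modeled loosely on that idea: `u:<user>`, `g:<group>`, `p` (public), `!`, `&`, `|`, and parentheses. The grammar, precedence (`!` tightest, then `&`, then `|`), and every function name here are assumptions for illustration; this is not MapR's parser, and the real ACE syntax may include further elements.

```python
import re
from typing import Optional

_TOKEN = re.compile(r"[()!&|]|[ug]:[\w.\-]+|\bp\b")


def tokenize(expr: str) -> list[str]:
    tokens = _TOKEN.findall(expr)
    # Reject anything the token pattern skipped over (other than whitespace).
    if "".join(tokens) != re.sub(r"\s+", "", expr):
        raise ValueError(f"malformed expression: {expr!r}")
    return tokens


class _AceParser:
    """Recursive descent: '|' binds loosest, then '&', then unary '!'."""

    def __init__(self, tokens: list[str], user: str, groups: set[str]) -> None:
        self.tokens, self.pos = tokens, 0
        self.user, self.groups = user, groups

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return tok

    def parse_or(self) -> bool:
        value = self.parse_and()
        while self.peek() == "|":
            self.take()
            rhs = self.parse_and()   # always parse, so every token is consumed
            value = value or rhs
        return value

    def parse_and(self) -> bool:
        value = self.parse_not()
        while self.peek() == "&":
            self.take()
            rhs = self.parse_not()
            value = value and rhs
        return value

    def parse_not(self) -> bool:
        if self.peek() == "!":
            self.take()
            return not self.parse_not()
        return self.parse_atom()

    def parse_atom(self) -> bool:
        tok = self.take()
        if tok == "(":
            value = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok == "p":
            return True                      # public: everyone matches
        if tok.startswith("u:"):
            return tok[2:] == self.user
        if tok.startswith("g:"):
            return tok[2:] in self.groups
        raise ValueError(f"unexpected token {tok!r}")


def ace_allows(expr: str, user: str, groups: set[str]) -> bool:
    parser = _AceParser(tokenize(expr), user, set(groups))
    result = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"trailing tokens in {expr!r}")
    return result


rule = "g:finance & !u:mallory | u:auditor"
print(ace_allows(rule, "alice", {"finance"}))     # True
print(ace_allows(rule, "mallory", {"finance"}))   # False
print(ace_allows(rule, "auditor", set()))         # True
```

Expressing a rule as one Boolean formula lets a single entry say "the finance group, except one user, plus an auditor", which would otherwise need several separate permission entries.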
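The quota and reporting features in the list above can be sketched as two small functions: one that checks a write against advisory (alert-only) and hard (blocking) limits at several levels, and one that rolls usage records up per tenant for billing. This is a minimal Python model under stated assumptions: `Quota`, `QuotaStatus`, `check_write`, `check_write_all`, `usage_report`, and the metric names in the example are hypothetical, not MapR's CLI, REST API, or its actual set of 60+ metrics.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class QuotaStatus(Enum):
    OK = 0
    ADVISORY_EXCEEDED = 1   # alert the administrator, but allow the write
    HARD_EXCEEDED = 2       # reject the write


@dataclass(frozen=True)
class Quota:
    advisory_gb: Optional[float] = None
    hard_gb: Optional[float] = None


def check_write(used_gb: float, write_gb: float, quota: Quota) -> QuotaStatus:
    new_total = used_gb + write_gb
    if quota.hard_gb is not None and new_total > quota.hard_gb:
        return QuotaStatus.HARD_EXCEEDED
    if quota.advisory_gb is not None and new_total > quota.advisory_gb:
        return QuotaStatus.ADVISORY_EXCEEDED
    return QuotaStatus.OK


def check_write_all(write_gb: float, levels: Iterable[tuple[float, Quota]]) -> QuotaStatus:
    """Check one write against every level (volume, user, group); the most severe result wins."""
    statuses = [check_write(used, write_gb, q) for used, q in levels]
    return max(statuses, key=lambda s: s.value, default=QuotaStatus.OK)


def usage_report(records: Iterable[tuple[str, str, float]]) -> dict[str, dict[str, float]]:
    """Sum (tenant, metric, value) records per tenant and metric, e.g. for chargeback."""
    totals: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for tenant, metric, value in records:
        totals[tenant][metric] += value
    return {tenant: dict(metrics) for tenant, metrics in totals.items()}


volume_quota = Quota(advisory_gb=400, hard_gb=500)
print(check_write(350, 100, volume_quota))   # QuotaStatus.ADVISORY_EXCEEDED
```

Taking the most severe status across levels means a write that fits the volume quota can still be blocked by the owning group's hard limit.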

Reference Architecture

The current version of the Cisco UCS Integrated Infrastructure for Big Data offers the configurations listed in Table 1. Which configuration to use depends on the computing and storage requirements of the Hadoop workload.

For More Information

For more information about Cisco UCS big data solutions, please visit http://www.cisco.com/go/bigdata_design.

For more information about Cisco UCS Integrated Infrastructure for Big Data, please visit http://blogs.cisco.com/datacenter/cpav3/.

For more information about MapR, please visit www.MapR.com.

For more information about the Cisco® SmartPlay program, please visit http://www.cisco.com/go/smartplay.

Table 1: Cisco UCS Integrated Infrastructure for Big Data Configuration Details

Capacity Optimized

• Connectivity: 2 Cisco UCS 6296UP 96-Port Fabric Interconnects
• Scaling: Up to 80 servers per domain; up to 160 servers per domain with Cisco Nexus 2232PP 10GE Fabric Extender
• 16 Cisco UCS C240 M4 Rack Servers (LFF), each with:
  • 2 Intel Xeon processor E5-2620 v3 CPUs
  • 128 GB of memory
  • Cisco 12-Gbps SAS modular RAID controller with 2-GB flash-based write cache (FBWC)
  • 12 x 4-TB 7200-rpm LFF SAS drives (768 TB total)
  • 2 x 120-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for bootup
  • Cisco UCS VIC 1227 (with 2 x 10 Gigabit Ethernet SFP+ ports)

Performance Optimized

• Connectivity: 2 Cisco UCS 6296UP 96-Port Fabric Interconnects
• Scaling: Up to 80 servers per domain; up to 160 servers per domain with Cisco Nexus 2232PP 10GE Fabric Extender
• 16 Cisco UCS C240 M4 Rack Servers (SFF), each with:
  • 2 Intel Xeon processor E5-2680 v3 CPUs
  • 256 GB of memory
  • Cisco 12-Gbps SAS modular RAID controller with 2-GB FBWC
  • 24 x 1.2-TB 10,000-rpm SFF SAS drives (460 TB total)
  • 2 x 120-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for bootup
  • Cisco UCS VIC 1227 (with 2 x 10 Gigabit Ethernet SFP+ ports)
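The "total" figures in Table 1 are raw drive capacity: 16 servers x 12 x 4 TB = 768 TB, and 16 servers x 24 x 1.2 TB = 460.8 TB (listed as 460 TB). The Python sketch below reproduces that arithmetic and adds a rough usable-capacity estimate. The replication factor of 3 (a common Hadoop default) and the free-space reserve fraction are illustrative assumptions supplied by the caller, not sizing guidance from the Cisco Validated Design; `StorageConfig` and `usable_tb` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageConfig:
    name: str
    servers: int
    drives_per_server: int
    drive_gb: int          # decimal GB as marketed (4 TB = 4000 GB)

    @property
    def raw_tb(self) -> float:
        return self.servers * self.drives_per_server * self.drive_gb / 1000


def usable_tb(cfg: StorageConfig, replication: int = 3, free_space_reserve: float = 0.0) -> float:
    """Rough usable capacity: raw / replication, minus a free-space reserve.

    Both parameters are illustrative assumptions, not values from Table 1.
    """
    if replication < 1 or not 0.0 <= free_space_reserve < 1.0:
        raise ValueError("invalid sizing parameters")
    return cfg.raw_tb / replication * (1.0 - free_space_reserve)


capacity = StorageConfig("Capacity Optimized", 16, 12, 4000)
performance = StorageConfig("Performance Optimized", 16, 24, 1200)

for cfg in (capacity, performance):
    print(f"{cfg.name}: {cfg.raw_tb:.1f} TB raw, "
          f"~{usable_tb(cfg, free_space_reserve=0.2):.1f} TB usable")
```

With these inputs the sketch reports 768.0 TB and 460.8 TB raw, matching Table 1. The boot SSDs are left out because the table lists them for bootup only.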

For more information on the Cisco Validated Design (CVD) for the solution, please visit: http://www.cisco.com/c/dam/en/us/td/docs/unified_computing/ucs/UCS_CVDs/Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_MapR.pdf.
