Big Data Architectural Series: Creating a Next-Generation Big Data Architecture
DESCRIPTION

If you’ve spent time investigating Big Data, you quickly realize that the issues surrounding it are often complex to analyze and solve. The sheer volume, velocity and variety change the way we think about data – including how enterprises approach data architecture. Significant reductions in the cost of processing, managing, and storing data, combined with the need for business agility and analytics, require CIOs and enterprise architects to rethink their enterprise data architecture and develop a next-generation approach that solves the complexities of Big Data. Creating that architecture while integrating Big Data into the heart of the enterprise data architecture is a challenge. This webinar covered:
• Why Big Data capabilities must be strategically integrated into an enterprise’s data architecture
• How a next-generation architecture can be conceptualized
• The key components of a robust next-generation architecture
• How to incrementally transition to a next-generation data architecture

TRANSCRIPT

Page 1: Creating a Next-Generation Big Data Architecture

Big Data Architectural Series: Creating a Next-Generation Big Data Architecture

facebook.com/perficient twitter.com/Perficient linkedin.com/company/perficient

Page 2: Creating a Next-Generation Big Data Architecture

About Perficient

Perficient is a leading information technology consulting firm serving clients throughout North America. We help clients implement business-driven technology solutions that integrate business processes, improve worker productivity, increase customer loyalty and create a more agile enterprise to better respond to new business opportunities.

Page 3: Creating a Next-Generation Big Data Architecture

Perficient Profile

• Founded in 1997
• Public, NASDAQ: PRFT
• 2013 revenue: $373 million
• Major market locations: Allentown, Atlanta, Boston, Charlotte, Chicago, Cincinnati, Columbus, Dallas, Denver, Detroit, Fairfax, Houston, Indianapolis, Lafayette, Minneapolis, New York City, Northern California, Oxford (UK), Philadelphia, Southern California, St. Louis, Toronto, Washington, D.C.
• Global delivery centers in China and India
• >2,200 colleagues
• Dedicated solution practices
• ~90% repeat business rate
• Alliance partnerships with major technology vendors
• Multiple vendor/industry technology and growth awards

Page 4: Creating a Next-Generation Big Data Architecture

Our Solutions Expertise

BUSINESS SOLUTIONS
• Business Intelligence
• Business Process Management
• Customer Experience and CRM
• Enterprise Performance Management
• Enterprise Resource Planning
• Experience Design (XD)
• Management Consulting

TECHNOLOGY SOLUTIONS
• Business Integration/SOA
• Cloud Services
• Commerce
• Content Management
• Custom Application Development
• Education
• Information Management
• Mobile Platforms
• Platform Integration
• Portal & Social

Page 5: Creating a Next-Generation Big Data Architecture

Our Speaker

Bill Busch

Sr. Solutions Architect, Enterprise Information Solutions, Perficient

• Leads Perficient's enterprise data practice

• Specializes in business-enabling BI solutions for the agile enterprise

• Responsible for executive data strategy, roadmap development, and the delivery of high-impact solutions that enable organizations to leverage enterprise data

• Bill has over 15 years of experience in executive leadership, business intelligence, data warehousing, data governance, master data management, information/data architecture and analytics

Page 6: Creating a Next-Generation Big Data Architecture

Perficient’s Big Data Architectural Series

Today’s Webinar: Business Case, Next-Generation Architecture

Future Topics:
• Data Integration
• Stream Processing
• NoSQL
• SQL on Hadoop
• Data Quality
• Governance
• Use Cases & Case Studies

Page 7: Creating a Next-Generation Big Data Architecture

Today’s Objectives

• Hadoop Ecosystem: Potential vs. Reality
• 5 Architectural Roles for Hadoop
• Realizing a Hadoop-Centric Architecture


Page 9: Creating a Next-Generation Big Data Architecture

Three Views of Big Data

• “Big Data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
• The convergence of structured, unstructured, and dark data
• Big Data is an evolution of data that recreates, at larger scale, the same data management issues IT has struggled to address for the last 20+ years.


Page 11: Creating a Next-Generation Big Data Architecture

Common Big Data Business Use Cases

• Improve Strategic Decision Making
• Customer Experience Analysis
• Operational Optimization
• Risk and Fraud Reduction
• Data Monetization
• Security Event Detection and Analysis
• IT Cost Management

Page 12: Creating a Next-Generation Big Data Architecture

Expanding Data Ecosystem

Use cases:
• Customer Intelligence
• Operations
• Risk & Fraud
• Data Monetization
• Strategic Development
• Security Intelligence
• IT Optimization

[Diagram: the data ecosystem. Structured data is only 5-20% of the total; the rest includes point-of-sale, text messages, contracts & regulatory, preferences & emotions, security access, weather, machine data, automobile, mobile communications, geospatial, and social data.]

Page 13: Creating a Next-Generation Big Data Architecture

Enterprise Data Architecture: Next Generation

Page 14: Creating a Next-Generation Big Data Architecture

The Promise: Data Architecture Simplification

[Diagram: a single Hadoop cluster consolidating the roles of data integration, data hub, analytics, stream processing, data warehouse, and operational data store.]

Page 15: Creating a Next-Generation Big Data Architecture

The Reality: Maturity Limits the Use Cases

Realizing the potential of Hadoop is constrained by maturity:
• Multi-tenancy is in its infancy
  • Hadoop 2.0 and YARN
  • Most third-party applications are just moving to YARN
• Hive (and other SQL-on-Hadoop solutions) are still maturing
• Robust enterprise functionality is evolving
  • Security
  • High availability

Page 16: Creating a Next-Generation Big Data Architecture

Different Types of “Open Source Hadoop”

• Apache projects only
• Apache projects + proprietary add-ons
• Proprietary value-add & re-development
• Packaged and online solutions: IBM BigInsights, Oracle Big Data Appliance, HDInsight, many others!

Choosing a Hadoop Distribution:
• Company philosophy
• Current relationships
• Acceptable risk
• Specialized functionality

Page 17: Creating a Next-Generation Big Data Architecture

Quick Primer on YARN

What is YARN?
• Yet Another Resource Negotiator
• Sometimes referred to as MapReduce 2.0
• A data operating system
• Fault tolerance

Why is this important?
• Enables multi-tenancy on Hadoop
• Moves processing to the data

*Image provided by Hortonworks
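
As a small illustration of YARN as the cluster’s data operating system, here is a minimal sketch that lists running applications through the ResourceManager’s REST API; the host name is an assumption (8088 is the usual default port):

    import requests

    # Query the YARN ResourceManager REST API for running applications.
    # "rm-host:8088" is an assumed address; adjust for your cluster.
    RM_URL = "http://rm-host:8088/ws/v1/cluster/apps"

    resp = requests.get(RM_URL, params={"states": "RUNNING"})
    resp.raise_for_status()

    apps = resp.json().get("apps") or {}
    for app in apps.get("app", []):
        # Each entry is one tenant workload sharing the cluster.
        print(app["id"], app["user"], app["queue"], app["state"])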

Page 18: Creating a Next-Generation Big Data Architecture

Today’s Objectives: Hadoop Ecosystem Potential vs. Reality • 5 Architectural Roles for Hadoop • Realizing a Hadoop-Centric Architecture

Page 19: Creating a Next-Generation Big Data Architecture

Five Common Architectural Roles: Hadoop Big Data Use Cases

• Analytics
• Data Warehouse
• Stream Processing
• Data Factory
• Transactional Data Store

Page 20: Creating a Next-Generation Big Data Architecture

Enterprise Data Architecture: Next Generation

Page 21: Creating a Next-Generation Big Data Architecture

Five Common Architectural Roles (Analytics • Data Warehouse • Stream Processing • Data Factory • Transactional Data Store)

Page 22: Creating a Next-Generation Big Data Architecture

Analytical Processing

Analytical process: 1. Source → 2. Wrangle Data → 3. Model & Tune → 4. Operationalize

Architectural capabilities by step:
1. Source: Data Ingestion • Metadata Management • Data Access
2. Wrangle Data: Data Preparation Tools • Data Discovery & Visualization • Data Wrangling Tools • Business Glossary & Search
3. Model & Tune: Data Access • Data Discovery & Visualization • Analytical Tools • Analytical Sandbox
4. Operationalize: Business-Created Reporting • Model Execution & Management • Knowledge Management (Portal)
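
A minimal end-to-end sketch of the four steps, assuming a Python analytics stack (pandas and scikit-learn); the file name and columns are hypothetical:

    import pandas as pd
    import pickle
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # 1. Source: ingest a raw extract (hypothetical file pulled from the cluster).
    raw = pd.read_csv("customer_events.csv")

    # 2. Wrangle: basic preparation -- drop incomplete rows, derive a feature.
    tidy = raw.dropna(subset=["age", "visits", "churned"])
    tidy["visits_per_year"] = tidy["visits"] / tidy["tenure_years"].clip(lower=1)

    # 3. Model & tune: fit and evaluate a simple classifier in a sandbox.
    X = tidy[["age", "visits_per_year"]]
    y = tidy["churned"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    model = LogisticRegression().fit(X_train, y_train)
    print("holdout accuracy:", model.score(X_test, y_test))

    # 4. Operationalize: persist the fitted model for scheduled scoring.
    with open("churn_model.pkl", "wb") as f:
        pickle.dump(model, f)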


Page 24: Creating a Next-Generation Big Data Architecture

Data Access

• There are many methods of accessing Big Data:
  • Direct HDFS
  • NoSQL / connector
  • Hive / SQL on Hadoop
• Align tools to the access methods and file types:
  • Data preparation
  • Analytics

[Diagram: within the Hadoop cluster, a data preparation tool reads source files/data and writes tidy data; an analytics tool reads the tidy data and writes analytical results.]
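
A hedged sketch of the three access methods from Python; the hdfs, HappyBase, and PyHive client libraries, and every host name, port, path, and table name, are assumptions for illustration:

    # Direct HDFS access via WebHDFS (hdfs package).
    from hdfs import InsecureClient
    hdfs_client = InsecureClient("http://namenode:50070", user="analyst")
    with hdfs_client.read("/data/published/events.csv") as reader:
        head = reader.read(1024)  # peek at the raw file

    # NoSQL access via an HBase Thrift connector (happybase package).
    import happybase
    hbase = happybase.Connection("hbase-thrift-host")
    row = hbase.table("events").row(b"customer#42")

    # SQL-on-Hadoop access via HiveServer2 (PyHive package).
    from pyhive import hive
    cursor = hive.connect(host="hiveserver2-host", port=10000).cursor()
    cursor.execute("SELECT event_type, COUNT(*) FROM events GROUP BY event_type")
    print(cursor.fetchall())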

Page 25: Creating a Next-Generation Big Data Architecture

Five Common Architectural Roles (Analytics • Data Warehouse • Stream Processing • Data Factory • Transactional Data Store)

Page 26: Creating a Next-Generation Big Data Architecture

Data Warehouse Roles

• Two models for splitting processing:
  • Hot – Cold
  • Data Warehouse Layer
• Push high user loads to traditional data warehouses
• Fully investigate DW–Hadoop connector functionality
• Leverage the opportunity to use in-memory database solutions

[Diagrams: Data Warehouse Layer approach – the Hadoop cluster feeds a traditional DW/DM. Hot – Cold approach – hot data lives in the traditional DW/DM, cold data on the Hadoop cluster.]
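
As a toy illustration of the Hot – Cold split, a sketch that routes a query to the warehouse or the cluster based on the age of the data it touches; the 90-day hot window and both target names are assumptions:

    from datetime import date, timedelta

    HOT_WINDOW_DAYS = 90  # assumed retention for "hot" data in the traditional DW

    def route_query(oldest_date_needed: date) -> str:
        """Send queries touching only recent data to the DW; everything else to Hadoop."""
        hot_cutoff = date.today() - timedelta(days=HOT_WINDOW_DAYS)
        return "traditional_dw" if oldest_date_needed >= hot_cutoff else "hadoop_cluster"

    # A dashboard over last month's sales stays on the DW;
    # a five-year trend analysis is pushed down to the cold store.
    print(route_query(date.today() - timedelta(days=30)))    # traditional_dw
    print(route_query(date.today() - timedelta(days=1825)))  # hadoop_cluster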

Page 27: Creating a Next-Generation Big Data Architecture

Data Warehouse: Organize Your Data

• Types of data stored on the cluster
• Analytical sandboxes
  • Team
  • Individual
  • Quotas
• Potential to replace information lifecycle management solutions
• No right answer – clearly define usage

[Diagram: Hadoop cluster zones – Raw Data (consolidated data, streaming queues, deltas/incrementals), Processed Data (common data such as dimensions and master data; improved/modeled data; published, analytical, and aggregate data), a Sandbox Zone, and Archived Data.]
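
One way to realize these zones is as an HDFS directory layout with a space quota on the sandbox. A minimal sketch, assuming shell access to the standard hdfs CLI; all paths and the quota size are assumptions:

    import subprocess

    # Assumed zone layout; adjust paths to your cluster's conventions.
    ZONES = [
        "/data/raw/consolidated",
        "/data/raw/streaming_queues",
        "/data/raw/deltas",
        "/data/processed/common",      # dimensions, master data
        "/data/processed/modeled",
        "/data/processed/published",
        "/data/sandbox/team_a",
        "/data/archive",
    ]

    for zone in ZONES:
        subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", zone])

    # Cap the team sandbox at an assumed 10 TB so exploration
    # cannot crowd out the production zones.
    subprocess.check_call(
        ["hdfs", "dfsadmin", "-setSpaceQuota", "10t", "/data/sandbox/team_a"]
    )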

Page 28: Creating a Next-Generation Big Data Architecture

Five Common Architectural Roles (Analytics • Data Warehouse • Stream Processing • Data Factory • Transactional Data Store)

Page 29: Creating a Next-Generation Big Data Architecture

Stream and Event Processing

• Dedicated vs. shared model
• Persistence of messages, logs, etc.
  • Long-term storage
  • Queuing
• Pre-load (HDFS) vs. post-load processing
• Micro-batch vs. one-at-a-time
• Programming language support
• Processing guarantee (see the sketch after this list):
  • At most once
  • At least once
  • Exactly once

Let business requirements drive the need for streaming solutions. It is acceptable to use more than one solution as long as the roles/purposes of each are clearly defined.
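
A minimal sketch of how the choice of processing guarantee shows up in code, using an assumed kafka-python consumer against a hypothetical broker and topic. Committing the offset only after the work succeeds gives at-least-once delivery: a crash between processing and commit causes a redelivery, never a loss.

    from kafka import KafkaConsumer

    # Assumed broker, topic, and group; one-at-a-time processing for clarity.
    consumer = KafkaConsumer(
        "security-events",
        bootstrap_servers=["broker:9092"],
        group_id="event-loader",
        enable_auto_commit=False,  # commit manually for at-least-once semantics
    )

    def process(payload: bytes) -> None:
        print("handling event:", payload[:80])

    for message in consumer:
        process(message.value)
        # Commit only after successful processing. Committing *before*
        # processing would give at-most-once semantics instead.
        consumer.commit()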

Page 30: Creating a Next-Generation Big Data Architecture

Five Common Architectural Roles (Analytics • Data Warehouse • Stream Processing • Data Factory • Transactional Data Store)

Page 31: Creating a Next-Generation Big Data Architecture

The Data Integration Challenge

• Volume, variety, and velocity create unique challenges for data integration
• 10,000+ unique entities (or file groups) may have to be managed
• Batch windows stay the same or are shrinking

Key point: Hadoop and Hadoop-related technologies can address these challenges. However, they must be architected and governed properly.

Page 32: Creating a Next-Generation Big Data Architecture

Data Factory & Integration

Approaches to Big Data Integration:
• Hadoop Distribution Tools
  • Leverages tools included in the Hadoop distribution plus programming languages
  • Sqoop, Flume, Spark, Java, and MapReduce are examples
  • Tools can be implemented in many different modes: hand-coded/scripted, runtime-configured, or generated (a runtime-configured example follows after this list)
• Data Integration Packages
  • Leverage commercial data integration packages to move and transform data
  • IBM InfoSphere BigInsights and Informatica are examples
  • Key questions: where does processing take place, and does the tool use the YARN resource manager?
• Hybrid (both Hadoop and a Data Integration Package)
  • Based on the use case, leverages both Hadoop and COTS tools to move and transform data
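
A hedged sketch of the runtime-configured mode: a small driver that builds a Sqoop import command from a metadata record instead of hand-coding each feed. The metadata fields and connection details are assumptions; the Sqoop flags shown are standard ones.

    import subprocess

    # Hypothetical metadata record for one of the 10,000+ managed entities.
    feed = {
        "jdbc_url": "jdbc:oracle:thin:@dbhost:1521/ORCL",
        "table": "SALES.ORDERS",
        "target_dir": "/data/raw/deltas/orders",
        "mappers": 8,
    }

    def sqoop_import_command(cfg: dict) -> list:
        """Translate a metadata record into a Sqoop import invocation."""
        return [
            "sqoop", "import",
            "--connect", cfg["jdbc_url"],
            "--table", cfg["table"],
            "--target-dir", cfg["target_dir"],
            "--num-mappers", str(cfg.get("mappers", 4)),
        ]

    # The same driver serves every feed; only the metadata changes.
    subprocess.check_call(sqoop_import_command(feed))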

Page 33: Creating a Next-Generation Big Data Architecture

Define Pipelines and Stages

[Diagram: a staged pipeline. Sources (cloud sources, RDBMS, file hub/FTP, log data, stream/message bus, object DBMS) feed an Extract stage, followed by HDFS Load & Formatting, Scraping & Normalization, and Cleansing/Aggregation/Transformation stages, then Data Distribution and Data Access & Distribution out to targets (RDBMS/DW/IMDB, Hive, HBase, file extracts, NoSQL, stream output, message bus). Tools at the various stages include Sqoop, FTP, Kafka, Storm, MCF, packaged tools, ETL tools, and custom code.]

Page 34: Creating a Next-Generation Big Data Architecture

Big Data Integration Framework: Typical Services

Key guidance:
• In lieu of using an ETL product, consider building a Big Data integration framework (a skeleton sketch follows below)
• Apache Falcon provides pipeline management
• Focus on making all components runtime-configurable with metadata
• Can offer significant cost savings over the long run

[Diagram: a pipeline master (e.g., Falcon) coordinating pipeline utilities – parser (delimiter), data standardization, Hive publishing, MF coding converters, file joiner & transport, logging, checksum, retention, replication, late-arriving data, exception handling, DB copy, archival, and audit – alongside Sqoop, Flume, and the HDFS shell, a load utility, metadata collection, and metadata/pipeline config files.]
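
A minimal skeleton of such a framework, assuming the metadata lives in a JSON config file; the stage names and config shape are hypothetical, and a real build-out would register the utilities above (parser, checksum, exception handling, and so on):

    import json

    # Registry of runtime-configurable pipeline utilities (toy implementations).
    def parse_delimited(ctx):
        print("parsing", ctx["source"], "with delimiter", ctx.get("delimiter", "|"))

    def checksum(ctx):
        print("verifying checksums for", ctx["source"])

    def publish_hive(ctx):
        print("publishing", ctx["source"], "to Hive table", ctx["hive_table"])

    STAGES = {"parse": parse_delimited, "checksum": checksum, "publish": publish_hive}

    def run_pipeline(config_path: str) -> None:
        """Execute the stages a metadata file requests, in order."""
        with open(config_path) as f:
            ctx = json.load(f)
        for stage_name in ctx["stages"]:
            STAGES[stage_name](ctx)

    # Example metadata: {"source": "/data/raw/orders", "hive_table": "orders",
    #                    "stages": ["checksum", "parse", "publish"]}
    run_pipeline("orders_pipeline.json")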

Page 35: Creating a Next-Generation Big Data Architecture

Five Common Architectural Roles (Analytics • Data Warehouse • Stream Processing • Data Factory • Transactional Data Store)

Page 36: Creating a Next-Generation Big Data Architecture

SQL on Hadoop

• SQL on Hadoop is changing
• Historically focused on read functionality for analytics
• A new breed of SQL on Hadoop supports:
  • BI and operational reporting
  • Transaction processing

*Image provided by Splice Machine

Page 37: Creating a Next-Generation Big Data Architecture

Transactions In Hive
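
Hive’s transaction support (introduced in Hive 0.14) requires ORC storage, bucketing, and the transactional table property, plus server-side transaction settings. A minimal sketch through the PyHive client, where the host, table, and data are assumptions:

    from pyhive import hive

    # Assumes HiveServer2 with the transaction manager enabled
    # (hive.txn.manager=DbTxnManager and related settings).
    cursor = hive.connect(host="hiveserver2-host", port=10000).cursor()

    # Hive ACID tables must be bucketed and stored as ORC.
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS customer_profile (
            id INT,
            email STRING
        )
        CLUSTERED BY (id) INTO 4 BUCKETS
        STORED AS ORC
        TBLPROPERTIES ('transactional' = 'true')
    """)

    # INSERT/UPDATE/DELETE become available on transactional tables.
    cursor.execute("INSERT INTO customer_profile VALUES (42, 'old@example.com')")
    cursor.execute("UPDATE customer_profile SET email = 'new@example.com' WHERE id = 42")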

Page 38: Creating a Next-Generation Big Data Architecture

Today’s Objectives: Hadoop Ecosystem Potential vs. Reality • 5 Architectural Roles for Hadoop • Realizing a Hadoop-Centric Architecture

Page 39: Creating a Next-Generation Big Data Architecture

Common Big Data Business Use Cases: Improve Strategic Decision Making • Customer Experience Analysis • Operational Optimization • Risk and Fraud Reduction • Data Monetization • Security Event Detection and Analysis • IT Cost Management

Page 40: Creating a Next-Generation Big Data Architecture

Architectural Scenarios

Business Use Case                     | Analytics | Data Warehouse | Stream Processing | Data Factory | Transactional Data Store*
Strategic Decision Making             | P         | s              |                   |              |
Customer Experience Analysis          | P         | s              | P                 | s            |
Operational Optimization              | P         | s              | s                 | s            |
Risk and Fraud Reduction              | P         | s              | P                 |              |
Data Monetization                     | s         | s              | P                 |              |
Security Event Detection and Analysis | P         | s              | s                 | s            |
IT Cost Management                    | P         | s              | P                 | P            |

P = primary use case; s = secondary use case
* This capability is just emerging within the Hadoop ecosystem. Consider it only for isolated business cases and early adopters.

Page 41: Creating a Next-Generation Big Data Architecture

Integrating Hadoop into the Enterprise

1. Determine business use cases
2. Understand current tools & architecture
3. Align business use case priorities
4. Build roadmap
5. Specify solution architecture
6. Implement roadmap
7. Update & maintain roadmap

Page 42: Creating a Next-Generation Big Data Architecture

Final Thoughts

Do
• Match the business use case to the big data role
• Clearly define a roadmap
• Establish clear architectural standards to drive consistency and re-use of resources
• Do your homework when defining a solution architecture

Don’t
• Select an initial use case that relies on immature Hadoop functionality
• Leverage tools that move data off the cluster for processing and then store it back on the cluster
• Assume all Hadoop technologies integrate well together

Page 43: Creating a Next-Generation Big Data Architecture

As a reminder, please submit your questions in the chat box. We will get to as many as possible.

Page 44: Creating a Next-Generation Big Data Architecture

Daily unique content about content management, user experience, portals and other enterprise information technology solutions across a variety of industries.

Perficient.com/SocialMedia

Facebook.com/Perficient

Twitter.com/Perficient

Page 45: Creating a Next-Generation Big Data Architecture

Thank you for your participation today. Please fill out the survey at the close of this session.