Big Data Architectural Series: Creating a Next-Generation Big Data Architecture
DESCRIPTION

If you’ve spent time investigating Big Data, you quickly realize that the issues surrounding it are often complex to analyze and solve. The sheer volume, velocity and variety change the way we think about data – including how enterprises approach data architecture. Significant reductions in the cost of processing, managing, and storing data, combined with the need for business agility and analytics, require CIOs and enterprise architects to rethink their enterprise data architecture and develop a next-generation approach that solves the complexities of Big Data. Creating that architecture while integrating Big Data into the heart of the enterprise data architecture is a challenge. This webinar covered:
• Why Big Data capabilities must be strategically integrated into an enterprise’s data architecture
• How a next-generation architecture can be conceptualized
• The key components of a robust next-generation architecture
• How to incrementally transition to a next-generation data architecture

TRANSCRIPT

Page 1: Creating a Next-Generation Big Data Architecture

Big Data Architectural Series: Creating a Next-Generation Big Data Architecture

facebook.com/perficient twitter.com/Perficient linkedin.com/company/perficient

Page 2: Creating a Next-Generation Big Data Architecture

About Perficient

Perficient is a leading information technology consulting firm serving clients throughout North America. We help clients implement business-driven technology solutions that integrate business processes, improve worker productivity, increase customer loyalty and create a more agile enterprise to better respond to new business opportunities.

Page 3: Creating a Next-Generation Big Data Architecture

Perficient Profile

• Founded in 1997
• Public, NASDAQ: PRFT
• 2013 revenue: $373 million
• Major market locations: Allentown, Atlanta, Boston, Charlotte, Chicago, Cincinnati, Columbus, Dallas, Denver, Detroit, Fairfax, Houston, Indianapolis, Lafayette, Minneapolis, New York City, Northern California, Oxford (UK), Philadelphia, Southern California, St. Louis, Toronto, Washington, D.C.
• Global delivery centers in China and India
• >2,200 colleagues
• Dedicated solution practices
• ~90% repeat business rate
• Alliance partnerships with major technology vendors
• Multiple vendor/industry technology and growth awards

Page 4: Creating a Next-Generation Big Data Architecture

Our Solutions Expertise

BUSINESS SOLUTIONS
• Business Intelligence
• Business Process Management
• Customer Experience and CRM
• Enterprise Performance Management
• Enterprise Resource Planning
• Experience Design (XD)
• Management Consulting

TECHNOLOGY SOLUTIONS
• Business Integration/SOA
• Cloud Services
• Commerce
• Content Management
• Custom Application Development
• Education
• Information Management
• Mobile Platforms
• Platform Integration
• Portal & Social

Page 5: Creating a Next-Generation Big Data Architecture

Our Speaker

Bill Busch

Sr. Solutions Architect, Enterprise Information Solutions, Perficient

• Leads Perficient's enterprise data practice

• Specializes in business-enabling BI solutions for the agile enterprise

• Responsible for executive data strategy, roadmap development, and the delivery of high-impact solutions that enable organizations to leverage enterprise data

• Bill has over 15 years of experience in executive leadership, business intelligence, data warehousing, data governance, master data management, information/data architecture and analytics

Page 6: Creating a Next-Generation Big Data Architecture

Perficient’s Big Data Architectural Series

Today’s Webinar: Business Case, Next-Generation Architecture

Future Topics:
• Data Integration
• Stream Processing
• NoSQL
• SQL on Hadoop
• Data Quality
• Governance
• Use Cases & Case Studies

Page 7: Creating a Next-Generation Big Data Architecture

Today’s Objectives

• Hadoop Ecosystem: Potential vs. Reality
• 5 Architectural Roles for Hadoop
• Realizing a Hadoop-Centric Architecture


Page 9: Creating a Next-Generation Big Data Architecture

Three Views of Big Data

• “Big Data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
• The convergence of structured, unstructured, and dark data
• Big Data is an evolution of data that recreates, at larger scale, the same data management issues IT has struggled to address for the last 20+ years.


Page 11: Creating a Next-Generation Big Data Architecture

Common Big Data Business Use Cases

• Improve Strategic Decision Making
• Customer Experience Analysis
• Operational Optimization
• Risk and Fraud Reduction
• Data Monetization
• Security Event Detection and Analysis
• IT Cost Management

Page 12: Creating a Next-Generation Big Data Architecture

Expanding Data Ecosystem

Use cases:
• Customer Intelligence
• Operations
• Risk & Fraud
• Data Monetization
• Strategic Development
• Security Intelligence
• IT Optimization

[Diagram: the data ecosystem. Structured data is only 5-20% of the total; the rest includes point-of-sale, text messages, contracts & regulatory, preferences & emotions, security access, weather, machine data, automobile, mobile communications, geospatial, and social data.]

Page 13: Creating a Next-Generation Big Data Architecture

Enterprise Data Architecture: Next Generation

Page 14: Creating a Next-Generation Big Data Architecture

The Promise: Data Architecture Simplification

[Diagram: a single Hadoop cluster consolidating the roles of data integration, data hub, analytics, stream processing, data warehouse, and operational data store.]

Page 15: Creating a Next-Generation Big Data Architecture

The Reality: Maturity Limits the Use Cases

Realizing the potential of Hadoop is constrained by maturity:
• Multi-tenancy is in its infancy
  • Hadoop 2.0 and YARN
  • Most third-party applications are just moving to YARN
• Hive (and other SQL-on-Hadoop solutions) are still maturing
• Robust enterprise functionality is evolving
  • Security
  • High availability

Page 16: Creating a Next-Generation Big Data Architecture

Different Types of “Open Source Hadoop”

• Apache projects only
• Apache projects + proprietary add-ons
• Proprietary value-add & re-development
• Packaged and online solutions: IBM BigInsights, Oracle Big Data Appliance, HDInsight, many others!

Choosing a Hadoop Distribution:
• Company philosophy
• Current relationships
• Acceptable risk
• Specialized functionality

Page 17: Creating a Next-Generation Big Data Architecture

Quick Primer on YARN

What is YARN?
• Yet Another Resource Negotiator
• Sometimes referred to as MapReduce 2.0
• A data operating system
• Fault tolerance

Why is this important?
• Enables multi-tenancy on Hadoop
• Moves processing to the data

*Image provided by Hortonworks
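
As a small illustration of YARN as the cluster’s data operating system, here is a minimal sketch that lists running applications through the ResourceManager’s REST API; the host name is an assumption (8088 is the usual default port):

    import requests

    # Query the YARN ResourceManager REST API for running applications.
    # "rm-host:8088" is an assumed address; adjust for your cluster.
    RM_URL = "http://rm-host:8088/ws/v1/cluster/apps"

    resp = requests.get(RM_URL, params={"states": "RUNNING"})
    resp.raise_for_status()

    apps = resp.json().get("apps") or {}
    for app in apps.get("app", []):
        # Each entry is one tenant workload sharing the cluster.
        print(app["id"], app["user"], app["queue"], app["state"])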

Page 18: Creating a Next-Generation Big Data Architecture

Today’s Objectives: Hadoop Ecosystem Potential vs. Reality • 5 Architectural Roles for Hadoop • Realizing a Hadoop-Centric Architecture

Page 19: Creating a Next-Generation Big Data Architecture

Five Common Architectural Roles: Hadoop Big Data Use Cases

• Analytics
• Data Warehouse
• Stream Processing
• Data Factory
• Transactional Data Store

Page 20: Creating a Next-Generation Big Data Architecture

Enterprise Data Architecture: Next Generation

Page 21: Creating a Next-Generation Big Data Architecture

Five Common Architectural Roles (Analytics • Data Warehouse • Stream Processing • Data Factory • Transactional Data Store)

Page 22: Creating a Next-Generation Big Data Architecture

Analytical Processing

Analytical process: 1. Source → 2. Wrangle Data → 3. Model & Tune → 4. Operationalize

Architectural capabilities by step:
1. Source: Data Ingestion • Metadata Management • Data Access
2. Wrangle Data: Data Preparation Tools • Data Discovery & Visualization • Data Wrangling Tools • Business Glossary & Search
3. Model & Tune: Data Access • Data Discovery & Visualization • Analytical Tools • Analytical Sandbox
4. Operationalize: Business-Created Reporting • Model Execution & Management • Knowledge Management (Portal)
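
A minimal end-to-end sketch of the four steps, assuming a Python analytics stack (pandas and scikit-learn); the file name and columns are hypothetical:

    import pandas as pd
    import pickle
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # 1. Source: ingest a raw extract (hypothetical file pulled from the cluster).
    raw = pd.read_csv("customer_events.csv")

    # 2. Wrangle: basic preparation -- drop incomplete rows, derive a feature.
    tidy = raw.dropna(subset=["age", "visits", "churned"])
    tidy["visits_per_year"] = tidy["visits"] / tidy["tenure_years"].clip(lower=1)

    # 3. Model & tune: fit and evaluate a simple classifier in a sandbox.
    X = tidy[["age", "visits_per_year"]]
    y = tidy["churned"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    model = LogisticRegression().fit(X_train, y_train)
    print("holdout accuracy:", model.score(X_test, y_test))

    # 4. Operationalize: persist the fitted model for scheduled scoring.
    with open("churn_model.pkl", "wb") as f:
        pickle.dump(model, f)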


Page 24: Creating a Next-Generation Big Data Architecture

Data Access

• There are many methods of accessing Big Data:
  • Direct HDFS
  • NoSQL / connector
  • Hive / SQL on Hadoop
• Align tools to the access methods and file types:
  • Data preparation
  • Analytics

[Diagram: within the Hadoop cluster, a data preparation tool reads source files/data and writes tidy data; an analytics tool reads the tidy data and writes analytical results.]
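
A hedged sketch of the three access methods from Python; the hdfs, HappyBase, and PyHive client libraries, and every host name, port, path, and table name, are assumptions for illustration:

    # Direct HDFS access via WebHDFS (hdfs package).
    from hdfs import InsecureClient
    hdfs_client = InsecureClient("http://namenode:50070", user="analyst")
    with hdfs_client.read("/data/published/events.csv") as reader:
        head = reader.read(1024)  # peek at the raw file

    # NoSQL access via an HBase Thrift connector (happybase package).
    import happybase
    hbase = happybase.Connection("hbase-thrift-host")
    row = hbase.table("events").row(b"customer#42")

    # SQL-on-Hadoop access via HiveServer2 (PyHive package).
    from pyhive import hive
    cursor = hive.connect(host="hiveserver2-host", port=10000).cursor()
    cursor.execute("SELECT event_type, COUNT(*) FROM events GROUP BY event_type")
    print(cursor.fetchall())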

Page 25: Creating a Next-Generation Big Data Architecture

Five Common Architectural Roles (Analytics • Data Warehouse • Stream Processing • Data Factory • Transactional Data Store)

Page 26: Creating a Next-Generation Big Data Architecture

Data Warehouse Roles

• Two models for splitting processing:
  • Hot – Cold
  • Data Warehouse Layer
• Push high user loads to traditional data warehouses
• Fully investigate DW–Hadoop connector functionality
• Leverage the opportunity to use in-memory database solutions

[Diagrams: Data Warehouse Layer approach – the Hadoop cluster feeds a traditional DW/DM. Hot – Cold approach – hot data lives in the traditional DW/DM, cold data on the Hadoop cluster.]
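
As a toy illustration of the Hot – Cold split, a sketch that routes a query to the warehouse or the cluster based on the age of the data it touches; the 90-day hot window and both target names are assumptions:

    from datetime import date, timedelta

    HOT_WINDOW_DAYS = 90  # assumed retention for "hot" data in the traditional DW

    def route_query(oldest_date_needed: date) -> str:
        """Send queries touching only recent data to the DW; everything else to Hadoop."""
        hot_cutoff = date.today() - timedelta(days=HOT_WINDOW_DAYS)
        return "traditional_dw" if oldest_date_needed >= hot_cutoff else "hadoop_cluster"

    # A dashboard over last month's sales stays on the DW;
    # a five-year trend analysis is pushed down to the cold store.
    print(route_query(date.today() - timedelta(days=30)))    # traditional_dw
    print(route_query(date.today() - timedelta(days=1825)))  # hadoop_cluster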

Page 27: Creating a Next-Generation Big Data Architecture

Data Warehouse: Organize Your Data

• Types of data stored on the cluster
• Analytical sandboxes
  • Team
  • Individual
  • Quotas
• Potential to replace information lifecycle management solutions
• No right answer – clearly define usage

[Diagram: Hadoop cluster zones – Raw Data (consolidated data, streaming queues, deltas/incrementals), Processed Data (common data such as dimensions and master data; improved/modeled data; published, analytical, and aggregate data), a Sandbox Zone, and Archived Data.]
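
One way to realize these zones is as an HDFS directory layout with a space quota on the sandbox. A minimal sketch, assuming shell access to the standard hdfs CLI; all paths and the quota size are assumptions:

    import subprocess

    # Assumed zone layout; adjust paths to your cluster's conventions.
    ZONES = [
        "/data/raw/consolidated",
        "/data/raw/streaming_queues",
        "/data/raw/deltas",
        "/data/processed/common",      # dimensions, master data
        "/data/processed/modeled",
        "/data/processed/published",
        "/data/sandbox/team_a",
        "/data/archive",
    ]

    for zone in ZONES:
        subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", zone])

    # Cap the team sandbox at an assumed 10 TB so exploration
    # cannot crowd out the production zones.
    subprocess.check_call(
        ["hdfs", "dfsadmin", "-setSpaceQuota", "10t", "/data/sandbox/team_a"]
    )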

Page 28: Creating a Next-Generation Big Data Architecture

Five Common Architectural Roles (Analytics • Data Warehouse • Stream Processing • Data Factory • Transactional Data Store)

Page 29: Creating a Next-Generation Big Data Architecture

Stream and Event Processing

• Dedicated vs. shared model
• Persistence of messages, logs, etc.
  • Long-term storage
  • Queuing
• Pre-load (HDFS) vs. post-load processing
• Micro-batch vs. one-at-a-time
• Programming language support
• Processing guarantee (see the sketch after this list):
  • At most once
  • At least once
  • Exactly once

Let business requirements drive the need for streaming solutions. It is acceptable to use more than one solution as long as the roles/purposes of each are clearly defined.
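
A minimal sketch of how the choice of processing guarantee shows up in code, using an assumed kafka-python consumer against a hypothetical broker and topic. Committing the offset only after the work succeeds gives at-least-once delivery: a crash between processing and commit causes a redelivery, never a loss.

    from kafka import KafkaConsumer

    # Assumed broker, topic, and group; one-at-a-time processing for clarity.
    consumer = KafkaConsumer(
        "security-events",
        bootstrap_servers=["broker:9092"],
        group_id="event-loader",
        enable_auto_commit=False,  # commit manually for at-least-once semantics
    )

    def process(payload: bytes) -> None:
        print("handling event:", payload[:80])

    for message in consumer:
        process(message.value)
        # Commit only after successful processing. Committing *before*
        # processing would give at-most-once semantics instead.
        consumer.commit()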

Page 30: Creating a Next-Generation Big Data Architecture

Five Common Architectural Roles (Analytics • Data Warehouse • Stream Processing • Data Factory • Transactional Data Store)

Page 31: Creating a Next-Generation Big Data Architecture

The Data Integration Challenge

• Volume, variety, and velocity create unique challenges for data integration
• 10,000+ unique entities (or file groups) may have to be managed
• Batch windows stay the same or are shrinking

Key point: Hadoop and Hadoop-related technologies can address these challenges. However, they must be architected and governed properly.

Page 32: Creating a Next-Generation Big Data Architecture

Data Factory & Integration

Approaches to Big Data Integration:
• Hadoop Distribution Tools
  • Leverages tools included in the Hadoop distribution plus programming languages
  • Sqoop, Flume, Spark, Java, and MapReduce are examples
  • Tools can be implemented in many different modes: hand-coded/scripted, runtime-configured, or generated (a runtime-configured example follows after this list)
• Data Integration Packages
  • Leverage commercial data integration packages to move and transform data
  • IBM InfoSphere BigInsights and Informatica are examples
  • Key questions: where does processing take place, and does the tool use the YARN resource manager?
• Hybrid (both Hadoop and a Data Integration Package)
  • Based on the use case, leverages both Hadoop and COTS tools to move and transform data
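
A hedged sketch of the runtime-configured mode: a small driver that builds a Sqoop import command from a metadata record instead of hand-coding each feed. The metadata fields and connection details are assumptions; the Sqoop flags shown are standard ones.

    import subprocess

    # Hypothetical metadata record for one of the 10,000+ managed entities.
    feed = {
        "jdbc_url": "jdbc:oracle:thin:@dbhost:1521/ORCL",
        "table": "SALES.ORDERS",
        "target_dir": "/data/raw/deltas/orders",
        "mappers": 8,
    }

    def sqoop_import_command(cfg: dict) -> list:
        """Translate a metadata record into a Sqoop import invocation."""
        return [
            "sqoop", "import",
            "--connect", cfg["jdbc_url"],
            "--table", cfg["table"],
            "--target-dir", cfg["target_dir"],
            "--num-mappers", str(cfg.get("mappers", 4)),
        ]

    # The same driver serves every feed; only the metadata changes.
    subprocess.check_call(sqoop_import_command(feed))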

Page 33: Creating a Next-Generation Big Data Architecture

Define Pipelines and Stages

[Diagram: a staged pipeline. Sources (cloud sources, RDBMS, file hub/FTP, log data, stream/message bus, object DBMS) feed an Extract stage, followed by HDFS Load & Formatting, Scraping & Normalization, and Cleansing/Aggregation/Transformation stages, then Data Distribution and Data Access & Distribution out to targets (RDBMS/DW/IMDB, Hive, HBase, file extracts, NoSQL, stream output, message bus). Tools at the various stages include Sqoop, FTP, Kafka, Storm, MCF, packaged tools, ETL tools, and custom code.]

Page 34: Creating a Next-Generation Big Data Architecture

Big Data Integration Framework: Typical Services

Key guidance:
• In lieu of using an ETL product, consider building a Big Data integration framework (a skeleton sketch follows below)
• Apache Falcon provides pipeline management
• Focus on making all components runtime-configurable with metadata
• Can offer significant cost savings over the long run

[Diagram: a pipeline master (e.g., Falcon) coordinating pipeline utilities – parser (delimiter), data standardization, Hive publishing, MF coding converters, file joiner & transport, logging, checksum, retention, replication, late-arriving data, exception handling, DB copy, archival, and audit – alongside Sqoop, Flume, and the HDFS shell, a load utility, metadata collection, and metadata/pipeline config files.]
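
A minimal skeleton of such a framework, assuming the metadata lives in a JSON config file; the stage names and config shape are hypothetical, and a real build-out would register the utilities above (parser, checksum, exception handling, and so on):

    import json

    # Registry of runtime-configurable pipeline utilities (toy implementations).
    def parse_delimited(ctx):
        print("parsing", ctx["source"], "with delimiter", ctx.get("delimiter", "|"))

    def checksum(ctx):
        print("verifying checksums for", ctx["source"])

    def publish_hive(ctx):
        print("publishing", ctx["source"], "to Hive table", ctx["hive_table"])

    STAGES = {"parse": parse_delimited, "checksum": checksum, "publish": publish_hive}

    def run_pipeline(config_path: str) -> None:
        """Execute the stages a metadata file requests, in order."""
        with open(config_path) as f:
            ctx = json.load(f)
        for stage_name in ctx["stages"]:
            STAGES[stage_name](ctx)

    # Example metadata: {"source": "/data/raw/orders", "hive_table": "orders",
    #                    "stages": ["checksum", "parse", "publish"]}
    run_pipeline("orders_pipeline.json")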

Page 35: Creating a Next-Generation Big Data Architecture

Five Common Architectural Roles (Analytics • Data Warehouse • Stream Processing • Data Factory • Transactional Data Store)

Page 36: Creating a Next-Generation Big Data Architecture

SQL on Hadoop

• SQL on Hadoop is changing
• Historically focused on read functionality for analytics
• A new breed of SQL on Hadoop supports:
  • BI and operational reporting
  • Transaction processing

*Image provided by Splice Machine

Page 37: Creating a Next-Generation Big Data Architecture

Transactions In Hive
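
Hive’s transaction support (introduced in Hive 0.14) requires ORC storage, bucketing, and the transactional table property, plus server-side transaction settings. A minimal sketch through the PyHive client, where the host, table, and data are assumptions:

    from pyhive import hive

    # Assumes HiveServer2 with the transaction manager enabled
    # (hive.txn.manager=DbTxnManager and related settings).
    cursor = hive.connect(host="hiveserver2-host", port=10000).cursor()

    # Hive ACID tables must be bucketed and stored as ORC.
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS customer_profile (
            id INT,
            email STRING
        )
        CLUSTERED BY (id) INTO 4 BUCKETS
        STORED AS ORC
        TBLPROPERTIES ('transactional' = 'true')
    """)

    # INSERT/UPDATE/DELETE become available on transactional tables.
    cursor.execute("INSERT INTO customer_profile VALUES (42, 'old@example.com')")
    cursor.execute("UPDATE customer_profile SET email = 'new@example.com' WHERE id = 42")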

Page 38: Creating a Next-Generation Big Data Architecture

Today’s Objectives: Hadoop Ecosystem Potential vs. Reality • 5 Architectural Roles for Hadoop • Realizing a Hadoop-Centric Architecture

Page 39: Creating a Next-Generation Big Data Architecture

Common Big Data Business Use Cases: Improve Strategic Decision Making • Customer Experience Analysis • Operational Optimization • Risk and Fraud Reduction • Data Monetization • Security Event Detection and Analysis • IT Cost Management

Page 40: Creating a Next-Generation Big Data Architecture

Architectural Scenarios

Business Use Case                     | Analytics | Data Warehouse | Stream Processing | Data Factory | Transactional Data Store*
Strategic Decision Making             | P         | s              |                   |              |
Customer Experience Analysis          | P         | s              | P                 | s            |
Operational Optimization              | P         | s              | s                 | s            |
Risk and Fraud Reduction              | P         | s              | P                 |              |
Data Monetization                     | s         | s              | P                 |              |
Security Event Detection and Analysis | P         | s              | s                 | s            |
IT Cost Management                    | P         | s              | P                 | P            |

P = primary use case; s = secondary use case
* This capability is just emerging within the Hadoop ecosystem. Consider it only for isolated business cases and early adopters.

Page 41: Creating a Next-Generation Big Data Architecture

Integrating Hadoop into the Enterprise

1. Determine business use cases
2. Understand current tools & architecture
3. Align business use case priorities
4. Build roadmap
5. Specify solution architecture
6. Implement roadmap
7. Update & maintain roadmap

Page 42: Creating a Next-Generation Big Data Architecture

Final Thoughts

Do
• Match the business use case to the big data role
• Clearly define a roadmap
• Establish clear architectural standards to drive consistency and re-use of resources
• Do your homework when defining a solution architecture

Don’t
• Select an initial use case that relies on immature Hadoop functionality
• Leverage tools that move data off the cluster for processing and then store it back on the cluster
• Assume all Hadoop technologies integrate well together

Page 43: Creating a Next-Generation Big Data Architecture

As a reminder, please submit your questions in the chat box. We will get to as many as possible.

Page 44: Creating a Next-Generation Big Data Architecture

Daily unique content about content management, user experience, portals and other enterprise information technology solutions across a variety of industries.

Perficient.com/SocialMedia

Facebook.com/Perficient

Twitter.com/Perficient

Page 45: Creating a Next-Generation Big Data Architecture

Thank you for your participation today. Please fill out the survey at the close of this session.