FEBRUARY 3, 2015

Arno Kolster Sr. Database Architect Advanced Technology Group

@2015 PayPal Inc. All rights reserved. Confidential and proprietary.

Evolution of HPC Usage at PayPal

Image from Boris Müller’s “Visual Poetry 6” (http://www.esono.com/boris/projects/poetry06/visualpoetry06/)

About your speaker

25+ years in database architecture and operations.

Has been with eBay Inc. for 12 years, with a focus on database and operations architecture.

Has spoken at a number of domestic and international Big Data and HPC conferences.

Career interest in solving real time, high volume analytics problems using HPC and new technology architectures.

Along with his colleague Ryan Quick, won IDC HPC Innovation Awards at SC ’12 and SC ’14.

Why did we start leveraging HPC?

We had a number of large scale compute problems we thought we could solve with non-traditional enterprise solutions.

Many HPC installations had already solved these problems (large data set analytics, heterogeneous compute architectures, etc.).

We saw an eventual ‘merge’ of HPC and enterprise technologies and wanted to get in front of that trend.

HPC price points had come down enough for enterprise capex budgets.

PayPal HPC Timeline

[Timeline figure: 2008 to 2015+]

Real Time Fraud Detection Problems

Detecting fraud in ‘real time’ as millions of transactions are processed between disparate systems at volume is extremely difficult.

Ability to create and deploy new fraud models into event and transaction flows quickly and with minimal effort.

Provide environment for fraud modeling, analytics, visualization, M/R, dimensioning and further processing.

Finding suspicious patterns that we don’t even know exist in related data sets.

The Challenges

5 9’s availability, scalability and reliability in a 24x7x365 environment with servers requiring less power and committing to cleaner and greener commerce.

Maintain a graph of identities, transactions, bank accounts, credit cards, IP addresses, etc. to support the models.

Keep operations simple. Small team of SAs and DBAs.

How to keep fraud models current and ensure integrity of incoming events and data.

Educate peers and higher-ups about new technology and concepts so they ‘get it’. “HPC what?”

What kind of volume?

11 million+ PayPal logins / day.

500+ variables calculated per event for some models.

~4 Billion inserts / day.

14 million+ financial transactions / day.

~8 Billion selects / day.
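
To put the daily figures above in per-second terms, a quick back-of-the-envelope conversion helps. This sketch assumes load is spread evenly over 24 hours, which real traffic is not, so peaks will be higher than these averages. The helper name and dictionary keys are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(per_day):
    """Average per-second rate for a daily total, assuming uniform load."""
    return per_day / SECONDS_PER_DAY

# Daily volumes quoted on the slide (all approximate).
DAILY = {
    "logins": 11_000_000,
    "financial transactions": 14_000_000,
    "inserts": 4_000_000_000,
    "selects": 8_000_000_000,
}

if __name__ == "__main__":
    for name, total in DAILY.items():
        print(f"{name}: ~{per_second(total):,.0f} / sec on average")
```

Even as averages, the database figures work out to tens of thousands of operations per second.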

Our Solution - Trinity

Real time linking platform for identities from various source systems. Built a ‘financial’ social network.

Intelligent gateways, message routing & delivery to heterogeneous systems.

Inline stream analytics using CEP and ESP.

Highly distributed open source databases for OLTP storage of edges and nodes. Architected for scale up, out and HA.

Standardized operations – h/w and s/w deployment, monitoring, command & control processes, etc.

Downstream analytics environments for further processing.

Leveraged HPC architecture and hardware where it made sense.
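
As a rough illustration of the ‘financial’ social network idea above, the following sketch links accounts to the attributes they have used (such as an IP address or a card) and then finds everything connected to one account. The data, the node naming scheme and the function names are hypothetical and not taken from Trinity. A production system would use the distributed databases described above rather than an in-memory dictionary.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def linked_nodes(graph, start):
    """Return every node reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Hypothetical edges: accounts linked to the attributes they used.
EDGES = [
    ("account:alice", "ip:10.0.0.7"),
    ("account:bob", "ip:10.0.0.7"),
    ("account:bob", "card:1111"),
    ("account:carol", "card:2222"),
]

if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(sorted(linked_nodes(graph, "account:alice")))
```

In this toy data set, alice and bob end up linked through a shared IP address, while carol stays in a separate component.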

Trinity – EFL Flow

[Diagram: Trinity EFL flow, steps 1 to 7. Components shown: PP EFL; REST, SOAP, RES and ATE interfaces; MESSAGE BUS; CEP (two instances); IDENTITY SGW POOL; INDIGO SGW POOL; AZURE APP POOL; INDIGO APP POOL; TIS POOL; CERULEAN SGW POOL; CERULEAN POOL; BES/RES POOL; COBALT POOL (SFS); AZURE DB; INDIGO DB; TIS DB.]

SGI ALTIX 8200/8400 ICE CLUSTERS - 2008

156 sockets

1872 cores

7.5 TB RAM

Intel Xeon X5650, 2.67 GHz

78 nodes

SGI ALTIX ICE 8200/8400 CLUSTER

Supports multiple deployment strategies in the same cluster

EFL Cluster Provisioning By Application

Cyber Monday 2014 – Trinity only

Metric                 Per Day    Per Second    Total / Sec
Messages sent          -          160K          240K
Messages rcvd          -          80K
Events sent            -          1200          6200
Events rcvd            -          5000
DB op: SELECT          6B         69K           235K
DB op: INSERT          10B        119K
DB op: UPDATE          4B         47K
Rows read              39B        458K          694K
Rows inserted          8B         95K
Rows updated           12B        141K
Bytes sent             8TB        98MB          163MB
Bytes rcvd             5.3TB      65MB

“BABAR” – Hadoop Cluster

Hadoop / HDFS deployment architected differently than traditional Hadoop installations.

Separation of storage from compute to allow independent expansion model.

SSDs for shuffle/sort and spinning disk (Lustre) for ingress / egress.

Used for offline analytics by where.com for geo-marketing data, spatial vectoring, array modeling, etc.

Also houses OLTP, OLAP and vector databases, R, MatLab, etc.

“BABAR” SGI ALTIX 8400 ICE CLUSTER

1152 sockets

2304 cores

14.2 TB RAM

Intel Xeon X5690, 3.47 GHz

128 nodes

IS5500 QDR IB arrays

Shared storage on 6 IS5500 arrays, DDN SFA10K & DDN SFA12K arrays

The problem with Big Data analytics…

Everyone seems to think there is a single solution to solve the problem. There isn’t.

“The Three Legged Stool”

OLTP

Analytics DB

HDFS

Systems Intelligence

“MAINTAINING THE WELLNESS OF THE PAYPAL ECOSYSTEM”

OPERATIONAL ANALYTICS

The problems that kicked off Systems Intelligence:

No analytics being done on Terabytes of operational data being pushed from app servers.

No understanding of how this could even be analyzed by operations teams.

No interest from the business units, because it wasn’t business analytics.

No vision of possible future benefits or integrations to other operational areas.

In steps ATG, with a proven architecture and a vision to deploy HPC to solve the problem.

A FINELY TUNED ECOSYSTEM

Ecosystem: a system involving the interactions between a community of living organisms in a particular area and its nonliving environment

What are we trying to accomplish with operational Big Data?

Gather a holistic view of PayPal’s ecosystem (i.e. interactions between physical sites, infrastructure, applications and customers). Think “Internet of Things” inside the data center.

Create a self-healing environment through the use of predictive analytics, event correlation and behavior and remediation rule sets.

Model the entire ecosystem’s capacity and capabilities for growth, performance and efficiency.

Leverage real time streaming analytics with dynamic models built offline to recognize patterns and take appropriate actions.

Educate our peers and management about real time analytics augmented with ancillary datasets.

25Tb of data ingested every hour

What are we up against? Operations analytics in real time…

Real time anomaly detection in correlated event streams using predictive analytical models based on historical data sets.

Streams include application logs, server machine data, data center metrics and social media.

What are we up against with real time?

3 million events / sec from 1000s of sources in our data centers. “IoT in the data center”

20Mb / sec machine data

Increasing social media trends / customer interaction per day

50K metadata relationships
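
One simple way to picture predictive models based on historical data sets is a baseline learned offline and then applied to live values. The sketch below uses a mean and standard deviation as the model and flags live values that deviate by more than a chosen number of standard deviations. The metric, the threshold and the function names are assumptions for illustration; the slides do not say which models were actually used.

```python
import statistics

def fit_baseline(history):
    """Learn a trivial 'model' offline: mean and standard deviation."""
    return statistics.mean(history), statistics.pstdev(history)

def is_anomaly(value, baseline, threshold=3.0):
    """Flag a live value that deviates too far from the baseline."""
    mean, stdev = baseline
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

# Hypothetical historical response times in milliseconds.
HISTORY = [100, 102, 98, 101, 99, 100, 103, 97]

if __name__ == "__main__":
    baseline = fit_baseline(HISTORY)
    for live in (101, 140):
        print(live, "anomaly" if is_anomaly(live, baseline) else "ok")
```

Real deployments would correlate many streams at once, but the split is the same: fit offline on history, evaluate inline on each event.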

Our Solution – Systems Intelligence

Common ontology for concepts and relationships.

Purpose built systems driven by underlying technology.

RDMA, clustered file systems for reduced copy times.

Inline stream analytics using CEP, ESP and patented technologies.

Downstream analytics environments for model building and further processing.

Systems Intelligence Flow

[Diagram: Systems Intelligence flow, steps 1 to 8. Labeled steps: 1 Source Events (MACHINE DATA, APP LOGS, ENVIRON DATA, SOCIAL DATA); 2 Message Bus; 3 In Line Processing (SHARED MEMORY EVENT WINDOW, APP, CEP); 5 Destination Data Stores; 6 NEW MODELS (VISUALIZATION, MACHINE LEARNING, “Data Scientist”). Other labels shown: GRAPH DB, OLTP DB, ANALYTICS DB, LINKED DATA, MONITORING, ALERTING, SELF-HEALING.]

UV2000 Installation – Jan 2014

Systems Intelligence Cluster

THE UV2000

“The big brain”

24 sockets

96 cores

Intel Xeon E5

6 TB RAM

Shared storage on six SGI IS5500 arrays, DDN SFA10K and DDN SFA12K arrays

IS5500 QDR IB arrays

BABAR and….

Systems Intelligence Analytics

Keep events in memory so you have a ‘rolling window’ for CEP and ESP processing.

The event window is a function of memory size, events/sec and event size: roughly, window duration = memory size / (events/sec × event size).

Shared memory data set can be acted upon by a number of different processes.

Data is streamed through predictive models generated from offline machine learning and deep analytics of historical data sets.
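
The relationship between memory, event rate and event size can be made concrete with a small calculation, followed by a toy rolling window. The 6 TB memory figure and the 3 million events / sec rate come from elsewhere in these slides; the 1 KB average event size is an assumption, and the calculation ignores memory needed for anything other than raw events. The `RollingWindow` class is a hypothetical single-process stand-in for the shared memory event window.

```python
from collections import deque

def window_seconds(memory_bytes, events_per_sec, event_bytes):
    """Seconds of events that fit in memory at a steady rate."""
    return memory_bytes / (events_per_sec * event_bytes)

class RollingWindow:
    """Keep only events newer than `span` seconds."""

    def __init__(self, span):
        self.span = span
        self.events = deque()  # (timestamp, payload), oldest first

    def add(self, timestamp, payload):
        self.events.append((timestamp, payload))
        # Evict events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.span:
            self.events.popleft()

    def __len__(self):
        return len(self.events)

if __name__ == "__main__":
    # 6 TB of RAM (decimal), 3 million events / sec, assumed 1 KB per event.
    secs = window_seconds(6e12, 3e6, 1e3)
    print(f"window holds about {secs / 60:.0f} minutes of events")

    window = RollingWindow(span=10)
    for t in range(25):
        window.add(t, {"seq": t})
    print(len(window), "events currently in the window")
```

Under these assumptions the window covers about 2000 seconds of events; doubling the event size or the event rate halves it.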

What can we tell from Systems Intelligence and IoT streams?

Application flows slowing down. (e.g. due to database performance degradation or new code push)

Aberrant server or server pool behavior.

Environmental issue in the data center. (e.g. temperature deviation, accidental operator error)

Bug in new code shows up as a change in social media sentiment or increased customer service activity.

Real time business metrics. (e.g. total payment volume)

Benefits derived from Systems Intelligence

Visibility into the health of the complete ecosystem all the time.

Self healing of potential issues through predictive models and remediation rule sets.

Modeling the entire PayPal system into a Linked Data paradigm for future use.

Less reliance on humans. Computers don’t need sleep.

Ecosystem has become too complex for humans to comprehend.

Cost benefit to business – running a more efficient system, up to its capabilities, not its capacity.

Our latest development for Systems Intelligence…

“Complex Event Processing as Digital Signals”

Real-Time Analogy

Everyone likes to go to concerts…

The Concert Experience

You’re at the concert listening to your favorite piece of music or song.

You’re really enjoying yourself, you’ve had a glass of wine, you feel in tune with the musicians…

Suddenly you hear a bad note.

But it’s a concert, the show goes on, you ignore it and you have another sip of wine.

The Concert Experience

But… what just happened?

You analyzed data in real time.

Not a second later, not a minute later, not a day later…

But at the instant the event occurred.

You used predictive models to look for anomalies in the event stream, in REAL TIME.

Difficult stuff, right? Yes it is.

How do we create a solution that allows us to do this?

Cheaper

Faster

Greener

…meanwhile, in a completely unrelated meeting…

An idea evolved regarding the m800 cartridge...

… HPC in a SoC …

Difficult stuff, right? Yes, but not impossible.

Cheaper

Faster

Greener

Complex Event Processing as Digital Signals

Familiar Systems Integration (ARM)

• Linux for general purpose work

• integrating with enterprise systems (databases, marshaling, command & control)

• short development learning curve (Python, Java, OpenCL, Open MPI)

Efficient, Real-Time Parallel Processing

• Implement signal analysis in hardware

• solve encoding, marshaling, atomicity

• apply both global shared memory and scale-out process best practices

• leverage cross-platform development to decrease ramp-up and testing time (OpenCL)

Complex Event Processing as Digital Signals (continued)

Parallel, True Real-Time Analytics

• Multiple filters/atomic event stream

• Multiple streams/filter

• Multiple filters/multiple streams

• Pattern recognition (outliers, clusters, frequency matrices, etc.)

• Rich library of functions (notch/high pass/low pass filters, DFT/FFT, z- and bilinear transforms, etc.)

HPC and Enterprise Best Practices

• Multicore implementation

• Tiered shared memory and queuing

• High-speed, low-latency transports inter/intra SoC

• Support for common development libraries and standards (OpenCL, OpenMP/MPI)

• Efficient, low-power solutions (~55 W / cartridge, 4 SoCs / cartridge)

• Extreme performance (11.2 GF/watt)
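
To illustrate what treating an event stream as a digital signal might look like, the sketch below takes per-second event counts, smooths them with a moving average (a basic low pass filter) and uses a naive discrete Fourier transform to find the dominant period. It is pure Python for readability and is not the hardware implementation the slides describe; the sample data and all function names are made up for the example.

```python
import cmath
import math

def moving_average(samples, width):
    """Simple low pass filter: mean of each run of `width` samples."""
    return [
        sum(samples[i:i + width]) / width
        for i in range(len(samples) - width + 1)
    ]

def dft_magnitudes(samples):
    """Naive O(n^2) DFT; returns magnitudes for bins 0 .. n//2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

def dominant_period(samples):
    """Period (in samples) of the strongest non-constant component."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return len(samples) / k

# Hypothetical events-per-second counts: a steady load of 1000
# with a burst pattern that repeats every 8 seconds.
COUNTS = [1000 + 200 * math.sin(2 * math.pi * t / 8) for t in range(64)]

if __name__ == "__main__":
    print("dominant period:", dominant_period(COUNTS), "seconds")
    print("smoothed:", [round(v) for v in moving_average(COUNTS, 8)][:3])
```

A change in the dominant period, or energy appearing in a frequency bin that is normally quiet, is the kind of pattern a filter bank could flag without inspecting individual events.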

Challenges along the way (pt 1)…

Misunderstanding of what HPC is. “I just bought a cloud… why do I need HPC now?”

Industry wants to follow trends of what other ‘valley’ companies are doing.

People are comfortable with what they know and resistant to change.

Had to apply e-commerce high availability and high volume standards to HPC.

Challenges along the way (pt 2)…

Not able to realize there is no ‘box’, let alone think outside one.

No ability to apply new technology to existing problems in a different way.

Very few people can take a vision all the way to production deployment.

In terms of analytics – shortage of analysts with technical skills.

How we’re addressing the challenges…

Educating our peers, managers and executives about the benefits and ROI of HPC as it pertains to the specific use cases we’ve identified.

Evangelize, socialize and showcase the work we’ve been doing.

Keep presenting PayPal HPC technology and use cases at conferences.

Formalize a roadshow and brown bag sessions about technical computing.

Try to host BOFs about HPC and industry to gain larger industry momentum.

Continue collaborations with ORNL, IBM Labs, HP and TI.

What we’re looking at in a post split world….

HPDA – High Performance Data Analytics

More ‘Atoms’ in our ‘Atoms to Galaxies’ architecture toolbox.

Continue with socially responsible computing – not only through business initiatives, but bringing lower energy computing into the data centers.

Thank you.

akolster@paypal.com
