ParStream - Big Data for Business Users

Real-time in Big Data

Uploaded by: parstream-inc

Posted on 28-Nov-2014

DESCRIPTION

ParStream CTO Joerg Bienert gave a presentation on February 25, 2014, about Big Data for Business Users. He covered several use cases from current ParStream customers as well as ParStream's technology itself.

TRANSCRIPT

Page 1: ParStream - Big Data for Business Users

Real-time in Big Data

Page 2: ParStream - Big Data for Business Users

Big Data

“Every two days now we create as much information as we did from the dawn of civilization up until 2003.”

Eric Schmidt, Ex Google CEO

Real Time

“85% of respondents say the issue is not about volume but the ability to analyze and act on data in real time”

Cap Gemini Study on Big Data 2012

Fast Data

“It’s About Fast (not just Big) Data”

Karl Keirstead, BMO Capital Markets 2013

Page 3: ParStream - Big Data for Business Users

Real-time on Big Data becomes essential for survival of businesses

Realtime

Campaign steering

Trading risk analytics

Network monitoring

Algorithmic decisions

Algo trading

Programmatic ad-serving

Recommendation engine

Fraud prevention

Interactive Analytics

A/B-Testing

App analytics

M2M

Big Data

Network Data, Point of Sale Data, Shopping Cart, Web Logs, Sensors, Twitter, Stock Data, Locations, Financial TX, Logistics, Car Data

Page 4: ParStream - Big Data for Business Users

Real Time?

Page 5: ParStream - Big Data for Business Users

Immediate Answers

Page 6: ParStream - Big Data for Business Users

Immediate Availability

Page 7: ParStream - Big Data for Business Users

Immediate Answers & Availability

[Chart: use cases plotted by data availability (import frequency) against answer response time, with regions labelled "Lag Time" and "Real-Time"]

Availability (Batch Import to Continuous Import): Weekly, Daily, Hourly, Every minute, Every second

Answers (response time): 1 h, 10 min, 1 min, 10 sec, 1 sec, 10..100 millisec, < 1..10 millisec

Analytics classes: Post-mortem Analytics, Online Investigation, Interactive Analytics, Automatic response systems

Use cases shown:

● Smart Grids
● Fraud detection
● Ad-Serving
● Guided Shopping
● Campaign Control
● Web-Analytics
● Re-Targeting
● Offer-Caches
● Trend-Spotting
● Customer churn rate reduction
● Revenue assurance
● Recommendation / promotional items
● Application monitoring
● Trading analytics
● Prepaid-accounts
● Customer account analytics
● Investment risk analytics
● Geo-spatial analytics
● Geo-Steering
● SEO analytics

Page 8: ParStream - Big Data for Business Users

USE CASES IN ALL INDUSTRIES

eCommerce Services

Facetted Search

Web analytics

SEO-analytics

Online-Advertising

Ad serving, Profiling, Targeting

Social Networks

Trend analysis

Fraud detection

Automatic trading

Risk analysis

Finance

Customer attrition prevention

Network monitoring

Targeting, Prepaid account mgmt

Telco

Smart metering

Smart grids, Wind parks, Mining, Solar Panels

Energy, Oil and Gas

Many More

Production, Mining, M2M, Sensors, Genetics, Intelligence, Weather

Many Applications

All Industries

Page 9: ParStream - Big Data for Business Users

Real-time Requires New Technology

Realtime Big Data Engine

Continuous Data Import: Any Bus, Any File, Any Stream

Real-Time Monitoring

Interactive Analytics

Real-Time Dashboarding

Ultra-fast Querying

1 Immediate Availability
2 Billion Records
3 Immediate Answers
4 Interactive Analytics
5 Geo-Distributed Processing
6 Low TCO

Page 10: ParStream - Big Data for Business Users

etracker is a leading web-analytics and campaign steering company in Europe

Web-Analytics

Real-time web-analytics for 50,000 domains delivering 10 billion web-clicks

Continuous data import with maximum latency of 30 seconds

Complex interactive analytics for live segmentation of customer groups

< 2 sec query response time for > 100 concurrent interactive users

Campaign steering – moving ahead from trial and error to continuous multidimensional optimization

Page 11: ParStream - Big Data for Business Users

ParStream imports 500,000 sensor readings per sec delivering real-time monitoring and long-term analytics

Gas Turbines

5,000 sensors are delivering 1,800,000,000 measurements per hour

ParStream immediately imports and stores all sensor readings

Real-time monitoring with ParStream ensures early issue identification

Long-term analytics for predictive maintenance reduces downtime

Maintenance of gas turbines is a more lucrative business than the initial build
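
The two throughput figures on this slide (500,000 readings per second and 1,800,000,000 measurements per hour) can be checked against each other with a quick calculation. The sketch below uses only the numbers quoted on the slide; the per-sensor rate is derived here and is not stated in the talk.

```python
# Figures quoted on the slide
sensors = 5_000
measurements_per_hour = 1_800_000_000

# Derived rates (not stated on the slide)
readings_per_second = measurements_per_hour / 3600
readings_per_sensor_per_second = readings_per_second / sensors

print(f"{readings_per_second:,.0f} readings/s overall")
print(f"{readings_per_sensor_per_second:.0f} readings/s per sensor")
```

Under these figures the hourly total matches the headline import rate, and each sensor would report on the order of a hundred values per second.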

Page 12: ParStream - Big Data for Business Users

ParStream extends usage of QlikView installation from 400M to 6B records for interactive analytics

FMCG Retailer

Customer is the leading retail chain in Austria and a long-term QlikView customer

POS-data analytics is heavily used for price negotiations with vendors

QlikView is easy to use and ultra fast but limits data volume to 400M records

Limited volume, time range and granularity of data hinder negotiations

ParStream extends usage of QlikView from 2 weeks to 6 months of data

Further extension to 30 billion records planned to cover 2.5 years of data
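
As a rough plausibility check of the volumes on this slide, the record counts can be related to the time ranges they cover. This is only a back-of-the-envelope sketch: it assumes records arrive at a constant rate, which the slide does not state, and the variable names are illustrative.

```python
records_before = 400_000_000       # QlikView limit quoted on the slide
records_after = 6_000_000_000      # volume reached with ParStream
records_planned = 30_000_000_000   # planned extension

# How much more data can be analyzed interactively
volume_factor = records_after / records_before

# Assumption: constant data rate, so 6 billion records correspond to 6 months
records_per_month = records_after / 6
planned_years = records_planned / records_per_month / 12

print(f"Volume increase: {volume_factor:.0f}x")
print(f"Planned coverage: {planned_years} years")
```

Under the constant-rate assumption, the planned 30 billion records line up with the 2.5 years quoted on the slide.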

Page 13: ParStream - Big Data for Business Users

End-to-end network monitoring at packet-level detail unveils bottlenecks unseen for decades

Telecom

Continuous import with >1 million rows per second per node

Packet-level granularity delivers previously impossible insights

Field trial discovered a bottleneck nobody expected, saving a billion-dollar investment

Decentralized architecture capturing, storing and analyzing data at source

Massive reduction in network traffic due to decentralized storage

Solution is a blueprint for Internet-of-Things use cases

[Diagram: Decentralized storage & analytics. Labels: Local storage at several NDC sites; Federation Server with Cache; Logical data warehouse; Ad-hoc integration; NoSQL; analytics applications: M2M Analytics, Network Analytics, CRM/CEM Analytics, NPI Analytics]

Page 14: ParStream - Big Data for Business Users

SEO Analytics at Searchmetrics

• Keyword-Analysis of competitor domains
• Complex SQL Queries in Realtime
• 7 TByte import
• 10 billion records
• < 1 sec Response time
• Reduction from 150 to 4 Servers

[Diagram labels: Google Search; Application Server]

Complex correlative SQL queries of many concurrent users

10,000,000,000 domain keyword relations

< 1 sec response time

First 100 domains for 10 million keywords in 10 countries

Interactive domain traffic competitor report & analysis

Page 15: ParStream - Big Data for Business Users

INRA MetaGenoPolis (MGP) analyzes 17 billion records interactively – growing 100x per year

Bio-Technology

INRA is the world leader in meta-genomic research

Up to 50 million different bacteria are identified per stool sample

Sample size will grow by 100x over the next 12 months

Data volume will grow from 17 billion to 2 trillion records

Researchers analyze correlation of bacteria presence with illnesses

ParStream is used to interactively discover and analyze correlations

Page 16: ParStream - Big Data for Business Users

Detection of Hurricane Risk Areas

Science: Climate Research

• Interactive Analytics of weather simulation data

• Response time 0.1 sec on 3 billion data records

• Multi-dimensional querying on geo-location data

• Run complex queries In-Database at very high speed

• No need for Cubes – up-to-date & full granularity

• Continuously import new data with low-latency

Page 17: ParStream - Big Data for Business Users

Coface Services is the Innovation Leader in reliable Business Information

Facetted Search

Interactive guided selection process delivers better conversion rate

Multi-lingual text search and numeric-multiple-choice filters

15 billion data points

1,000 Coface columns + 10,000 Customer columns

>100 concurrent users

< 100 ms response time

Page 18: ParStream - Big Data for Business Users

Real-time Requires New Technology

Realtime Big Data Engine

Continuous Data Import: Any Bus, Any File, Any Stream

Real-Time Monitoring

Interactive Analytics

Real-Time Dashboarding

Ultra-fast Querying

1 Immediate Availability
2 Billion Records
3 Immediate Answers
4 Interactive Analytics
5 Geo-Distributed Processing
6 Low TCO

Page 19: ParStream - Big Data for Business Users

Needs vs. Reality

You want… | What you get…
Sub-second queries, high-speed import | Too slow (Hadoop, Map Reduce)
Fully flexible, fully granular | Inflexible (Cassandra, KVS)
Scales on big data and big streams | Does not scale (traditional DBMS)

Page 20: ParStream - Big Data for Business Users

ParStream Is Built For Fast Data

Billions of Records

Ultra-fast Querying

Continuous Import

Thousands of Columns

High Query Throughput

ParStream is the fastest real-time database for smart data

Unique combination of continuous high-speed import and ultra-fast query response times

Page 21: ParStream - Big Data for Business Users

Outstanding Technology with USP – high performance compressed index

[Architecture diagram labels: Front-End; Raw-Data; Map-Reduce; RDBMS; Application; Tool; Real-Time Analytics Engine; High Speed Loader with Low Latency; C++ UDF API; SQL API / JDBC / ODBC; In-Memory and Disk Technology; Massively Parallel Processing (MPP); Multi-Dimensional Partitioning; Shared Nothing Architecture; 3rd generation Columnar Storage; High Performance Compressed Index (HPCI)]

Patented high performance compressed index - USP!

Built from scratch in C++

100 % own patented IP

Leading edge DB architecture

Massively parallel shared nothing cluster architecture

Optimized for standard hardware and many Linux distributions

Runs on single server, cluster and all clouds

Page 22: ParStream - Big Data for Business Users

Massive Performance Gain On Analytical Operations – Major Technological Innovation and Differentiation

High Performance Compressed Index (HPCI)

Standard index architecture

– High Memory Requirements
– High Load on CPUs
– Latency due to Decompression
– Not Suitable for Big Data

Superior ParStream index architecture

+ Immediate Query Processing
+ No Need for Decompression
+ Massively reduced memory + IO load
+ Ultra-high Throughput
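
The slide does not explain how HPCI works internally, and the following is not ParStream's patented implementation. As a generic illustration of the idea of answering a filter on a compressed index without decompressing it, here is a minimal sketch of a run-length-encoded bitmap index whose AND operation works directly on the runs. All names and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Run = Tuple[int, int]  # half-open interval [start, end) of matching row ids


def build_index(column: List[str]) -> Dict[str, List[Run]]:
    """Build one run-length-encoded bitmap per distinct column value."""
    index: Dict[str, List[Run]] = {}
    for row_id, value in enumerate(column):
        runs = index.setdefault(value, [])
        if runs and runs[-1][1] == row_id:
            runs[-1] = (runs[-1][0], row_id + 1)  # extend the current run
        else:
            runs.append((row_id, row_id + 1))     # start a new run
    return index


def intersect(a: List[Run], b: List[Run]) -> List[Run]:
    """AND two run lists directly, without expanding them into row bitmaps."""
    result: List[Run] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever run ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def count(runs: List[Run]) -> int:
    """Number of rows covered by a run list."""
    return sum(end - start for start, end in runs)


# Hypothetical toy columns
dest_state = ["NY", "NY", "NY", "CA", "CA", "NY", "NY", "NY"]
origin_state = ["FL", "FL", "TX", "FL", "FL", "FL", "FL", "TX"]

dest_idx = build_index(dest_state)
origin_idx = build_index(origin_state)

# Rows where DestState = 'NY' AND OriginState = 'FL'
matches = intersect(dest_idx["NY"], origin_idx["FL"])
print(matches, count(matches))
```

The filter is evaluated on the compact run lists alone, so neither the raw column nor a full per-row bitmap has to be materialized.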

Page 23: ParStream - Big Data for Business Users

Highly Scalable

Embedded Systems

Single Server

Cluster

Cloud

Standard Hardware + Standard Linux

Page 24: ParStream - Big Data for Business Users

Real-time Query Performance

[Bar chart: query response time in ms (axis 0 to 9000) for queries 1 to 4, ParStream vs. RedShift]

Query # QUERY

1 select count(distinct AirlineID) as airlines, count(distinct FlightNum) from otp where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3 AND DayOfWeek=4 AND OriginState='FL'

2 select count(distinct AirlineID) as airlines, count(distinct FlightNum), sum(Distance) from otp where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3 AND DayOfWeek=4 AND OriginState='FL'

3 select count(distinct AirlineID) as airlines, count(distinct FlightNum), count(distinct Distance), sum(Distance) from otp where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3 AND DayOfWeek=4 AND OriginState='FL'

4 select max(TaxiIn), sum(DepDelayMinutes), min(TaxiIn), avg(ArrDelayMinutes) from otp where YearD BETWEEN 1997 AND 2012 AND DestState='NY' AND Quarter=3 AND DayOfWeek=4 AND OriginState='FL'
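
To make concrete what query 1 computes, it can be run against a tiny stand-in table. The sketch below uses SQLite from the Python standard library, not ParStream or RedShift, and the six rows are invented for illustration; only the table name, column names and the query itself come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE otp (
        AirlineID INTEGER, FlightNum INTEGER, YearD INTEGER, Quarter INTEGER,
        DayOfWeek INTEGER, OriginState TEXT, DestState TEXT, Distance INTEGER
    )"""
)

# Hypothetical sample rows (not from the real OTP data set)
rows = [
    (1, 100, 2005, 3, 4, "FL", "NY", 1000),  # matches the filter
    (1, 101, 2006, 3, 4, "FL", "NY", 1100),  # matches the filter
    (2, 100, 2010, 3, 4, "FL", "NY", 1000),  # matches the filter
    (3, 300, 2010, 1, 4, "FL", "NY", 1200),  # wrong quarter
    (4, 400, 1995, 3, 4, "FL", "NY", 1300),  # year out of range
    (5, 500, 2011, 3, 4, "TX", "NY", 1400),  # wrong origin state
]
conn.executemany("INSERT INTO otp VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Query 1 from the slide
query_1 = """
    SELECT count(DISTINCT AirlineID) AS airlines, count(DISTINCT FlightNum)
    FROM otp
    WHERE YearD BETWEEN 1997 AND 2012 AND DestState = 'NY' AND Quarter = 3
      AND DayOfWeek = 4 AND OriginState = 'FL'
"""
airlines, flight_nums = conn.execute(query_1).fetchone()
print(airlines, flight_nums)
```

Each benchmark query combines the same five-column filter with distinct counts or aggregates, which is the kind of workload where an index on the filter columns pays off.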

Query # | RedShift (ms) | ParStream (ms) | Factor
1 | 7797 | 264 | 29
2 | 8036 | 313 | 25
3 | 7949 | 381 | 20
4 | 7086 | 129 | 55
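
The Factor column appears to be the RedShift response time divided by the ParStream response time. The sketch below recomputes the ratios from the timings on the slide; how the slide rounded them is not stated, so the printed factors are only compared loosely with the computed values.

```python
# Query number -> (RedShift ms, ParStream ms, factor printed on the slide)
results = {
    1: (7797, 264, 29),
    2: (8036, 313, 25),
    3: (7949, 381, 20),
    4: (7086, 129, 55),
}

# Speedup = RedShift time / ParStream time
ratios = {q: rs / ps for q, (rs, ps, _) in results.items()}

for q, (rs, ps, slide_factor) in results.items():
    print(f"Query {q}: {rs} ms / {ps} ms = {ratios[q]:.1f}x (slide: {slide_factor}x)")
```

The recomputed ratios stay within one unit of the factors printed on the slide.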

Environment: Single EC2 XL node with 15 GB RAM, 2 TB disk on Amazon AWS. OTP data set with about 150 million records.

Comparisons with leading analytical databases are available on request


Page 25: ParStream - Big Data for Business Users

ParStream – real-time demo

Try out the interactive ParStream demo at https://www.parstream.com/product/demos/

Page 26: ParStream - Big Data for Business Users

ParStream – The Company

• Founded 2008 in Cologne

• 50 employees in Cologne, Paris, Silicon Valley, Boston

• International Customers

• Running 24x7 in production for more than 3 years

• $ 15.6 M funding: Khosla Ventures (lead), Andy Bechtolsheim, Crunchfund, Data Collective, Baker Capital, Tola Capital, and others

Page 27: ParStream - Big Data for Business Users

Thank you

Yes, we are hiring

[email protected]