Interactive Data Analytics with Couchbase N1QL: Couchbase Connect 2015
TRANSCRIPT
UNLEASH THE POWER OF COUCHBASE THROUGH
N1QL (NICKEL)
ARVIND JADE ([email protected])
GOVINDARAJAN RAGHUNATHAPURAM
AGENDA
About Nielsen
Answers on Demand, Big Data Platform
Business Challenges
Why NoSQL with Couchbase?
Couchbase Usage Models
Next Steps
Q & A
ABOUT NIELSEN
Nielsen is a leading global information and measurement company that enables companies to understand consumers and consumer behavior
Nielsen measures and monitors what consumers watch (programming, advertising) and what consumers buy (categories, brands, products) on a global and local basis
Nielsen has a presence in approximately 100 countries spread across Africa, Asia, Australia, Europe, Middle East, North America, South America and Russia
REPORTING
Avg 175,000 Scan reports per month (>25% in cross cat)
Avg 7,000 Panel reports per month
MONTHLY USAGE
Avg 5,500 unique users AND
11,000 named users on the system
DATABASES
Scan
• Core – 159
• Custom – 141
Panel
• Core – 34
• Custom – 65
PROCESSING
175B records processed per week on scan, 10X that amount on the monthly
430MM purchase transactions across 100,000 Nielsen panelists per week
DATA VOLUMES
1.5PB of scan data: 130K stores, 4.3M UPCs, 5 years
SLAS
Guaranteed system availability of 14 hours per day M-F
Weekly refreshed scan data updates available +9 6AM ET
Weekly refreshed panel data updates available +16 6AM ET
METRICS
AOD PLATFORM
Disaggregated Data Warehouse
USER EXPERIENCE
ONDEMAND ENGINE
SOURCE DATA
Flexible Adapters to accommodate multiple input source files.
Disaggregated data warehouse supports ultimate flexibility. Messaging architecture supports seamless orchestration.
Powerful BI and Rendering engine to provide fast, rich insights.
[Diagram: AOD platform architecture]
Source data: Trips, T-log, POS, ProdRef, Stores, Households, Loyalty Cards
ETL POWERED BY IBM
MESSAGING AND ORCHESTRATION POWERED BY TIBCO
Operational data stores: Panel ODS, Scan ODS, Loyalty ODS
FACT PROCESSING and DIMENSION BUILDS (Fact and Dim tables)
PROCESSING POWERED BY IBM'S NETEZZA
BI ENGINE POWERED BY JAVA
RENDERING ENGINE BY EXTJS
On-the-fly Virtual Aggregations
Answers
WHERE WE WERE?
BIG Problems
o Growth - Expensive scaling
o Relational Fatigue
o Fragmented Caching
o No Unified Analytics
o Ad Hoc querying capability
[Diagram: separate data stores]
Application Log, Access Log, Netezza Reporting, Oracle Report Data, SQL Server Audit Data, Report Cache
[Diagram: Unified Big Data Platform]
Caching Layer, Netezza Reporting, Application Log, Oracle Report Metadata, SQL Server Audit Data, Access Log
OUR WANTS & NEEDS
Better Scalability – Be elastic to accommodate new data growth with ease.
Faster Performance
Cheaper – Can we get utopia for cheap?
Insights – Ability to run analytics
Faster feature release.
WHY NOSQL
Schema-less – Schema updates and cost of change were very high
Horizontal scaling – Sharding and replication with no single point of failure
Deep Analytics – Incremental map/reduce, aggregated searching
Cost – Commodity hardware
High Performance – Low latency and high throughput
COUCHBASE JOURNEY
2012: Couchbase 2.0 mobilization, prototyping
2013: Couchbase 2.0, 4-node cluster live in 1 data center, for document and cache storage
2014: Upgraded to Couchbase 2.5.1, 16-node clusters in 2 data centers, advanced views, Unified Analytics with Elasticsearch
2015: Upgrade to Couchbase 4.0, adopt N1QL (Nickel) for ad hoc querying, Couchbase Lite for mobile prototyping
COUCHBASE USAGE MODEL
As Reverse index store using map/reduce for faster look up
For Unified analytics combining Indexes from Couchbase and Elastic
Needed a solution that keeps client reports agnostic of back-end changes by updating reports at a scale of millions
Provide holistic insight into report metrics and system health
As Document and Cache persistence store
Real time application uses Couchbase for responsive UI
For Ad Hoc Querying – Instant Analytics (NEW)
Ability to query key spaces and indexes using SQL-like interfaces
N1QL – SQL-Like Query Language
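The reverse index usage model above can be sketched in a few lines. This is a minimal in-memory illustration of the map/reduce idea, not the presenters' actual view code: the document IDs, the `client` and `upcs` fields, and the function names are all assumptions made for the example. The map step emits one (product, report) pair per product a report references, and grouping those pairs gives a lookup from a product to the reports that use it.

```python
from collections import defaultdict

# Hypothetical report documents keyed by document ID.
# The field names ("client", "upcs") are assumptions for this sketch.
REPORTS = {
    "report::1": {"client": "acme", "upcs": ["0001", "0002"]},
    "report::2": {"client": "acme", "upcs": ["0002"]},
    "report::3": {"client": "globex", "upcs": ["0003"]},
}


def map_report(doc_id, doc):
    """Map step: emit one (upc, doc_id) pair per product a report references."""
    for upc in doc.get("upcs", []):
        yield upc, doc_id


def build_reverse_index(docs):
    """Group the emitted pairs by key, giving upc -> sorted list of report IDs."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        for key, value in map_report(doc_id, doc):
            index[key].append(value)
    return {key: sorted(ids) for key, ids in index.items()}


REVERSE_INDEX = build_reverse_index(REPORTS)


def reports_for(upc):
    """Look up the reports that reference a product, without scanning every document."""
    return REVERSE_INDEX.get(upc, [])
```

With an index of this shape, finding the reports touched by a changed product is a single key lookup, which is the "faster look up" the slide refers to.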
[Diagram: unified analytics data flow]
Sources: Reporting Data, Report Audit Data, Application Log Data
Connectors: Custom Connector, Map/Reduce, Logstash, JDBC Connector
Metrics
1.4 B Reporting Data Points
3 TB of Index Size
20 M Audit Records
5 TB of Application Log
UNIFIED ANALYTICS
N1QL BUSINESS USE CASES
Derive metrics – Get stats on user selections, usage patterns
Quantify Impacts - During a data refresh, identify the set of impacted reports to predict cost of change and impact
Identify and update JSON documents - Operational Need
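The three use cases above map naturally onto N1QL statements. The sketch below shows what such statements might look like, alongside plain Python functions that mirror their logic on sample data. Everything specific here is an assumption for illustration: the `reports` keyspace, the `client`, `upcs` and `needs_refresh` fields, and the `$upc` parameter are hypothetical, and the N1QL strings are shown as text only and are not executed against a cluster.

```python
from collections import Counter

# N1QL text for illustration only; keyspace and field names are hypothetical.
USAGE_QUERY = """
SELECT r.client, COUNT(*) AS report_count
FROM reports r
GROUP BY r.client
"""
IMPACT_QUERY = """
SELECT META(r).id AS report_id
FROM reports r
WHERE ANY u IN r.upcs SATISFIES u = $upc END
"""
FLAG_UPDATE = """
UPDATE reports
SET needs_refresh = TRUE
WHERE ANY u IN upcs SATISFIES u = $upc END
"""

# Sample documents standing in for the keyspace (assumed shape).
DOCS = {
    "report::1": {"client": "acme", "upcs": ["0001", "0002"]},
    "report::2": {"client": "acme", "upcs": ["0002"]},
    "report::3": {"client": "globex", "upcs": ["0003"]},
}


def reports_per_client(docs):
    """Derive metrics: count reports per client (mirrors USAGE_QUERY)."""
    return dict(Counter(doc["client"] for doc in docs.values()))


def impacted_reports(docs, upc):
    """Quantify impact: IDs of reports referencing a product (mirrors IMPACT_QUERY)."""
    return sorted(doc_id for doc_id, doc in docs.items() if upc in doc.get("upcs", []))


def flag_for_refresh(docs, upc):
    """Identify and update: mark impacted documents in place (mirrors FLAG_UPDATE)."""
    ids = impacted_reports(docs, upc)
    for doc_id in ids:
        docs[doc_id]["needs_refresh"] = True
    return len(ids)
```

Counting the impacted set before running the update gives a rough estimate of the cost of a data refresh, and only the matching documents are rewritten.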
BUSINESS GAINS
Faster Performance - Overall processing time for fixing client reports is now reduced to one fifth of what it was
Smart search - With creation of reverse index, able to perform targeted search and convert only affected documents.
Real time insights – Combining Couchbase and Elasticsearch, able to derive instant analytics, near real time.
Scalable - Able to onboard new clients rapidly
Ad Hoc Querying – Able to empower analysts to run ad hoc analytics.
HEADING TOWARDS
Upgrade cluster to Couchbase 4.0 to leverage Multi Dimensional Scaling
Prototype Couchbase Lite for mobile certification for AOD Application
Leverage the power of N1QL for Instant Analytics