Unlock the Power of Big Data: Data Lake Industrialization
Pierrick Condette
Crédit Agricole Consumer Finance
IT Marketing & BigData Lead France
Jean-François Guilmard
Accenture
Big Data Lead France
Crédit Agricole Consumer Finance
A key European consumer finance player
Part of Crédit Agricole S.A. group
French market leader
A Digital Transformation Plan: “CA CF 3.0”
Our Digital Transformation
CA CF IT Transformation Principles
CA CF Data Lake Illustration
A Journey: Deploying a Big Data Platform
A Major Ambition: Becoming a Data-Centric Company
• INSEE
• Eulerian
• Dynatrace
• 1000mercis
• etc.
The “Data Lake” Value
• Centralize and leverage company data to meet the business needs and expectations of all CA CF departments
• Ease and accelerate project delivery in the operational, BI, and data science domains
• Separate platform (no data exchanges)
• Integration & ML industrialization planned in 2018
For streaming:
POC in progress, first use in 2018
The “Data Lake” Program
Infrastructure
Industrialization
Business Projects
More than 20 projects identified:
8 live (e.g., website KPI monitoring, 360° customer view, regulatory calculation, DMP data integration, reseller support)
More than 10 in progress (client segmentation, real-time reporting, loan subscription via social media, data science lab, BI decommissioning)
Mixed Scrum team (architects, data engineers, Ops)
Planning & Definition → Infrastructure → Industrialization → Business Projects
Timeline: Winter 2016, Summer 2016, Fall 2016, Winter 2017, Summer 2017 (major go-live), Fall 2017
• Paris: 4 people
• Nantes (France): 4-6 people
• Mauritius: 6-8 people
Building a common asset base: policies, procedures, reference data, standard reusable components, normalized data, etc.
Provided by a specialized group entity
MapR Converged Data Platform • HPE Vertica • Elasticsearch
Why TALEND?
HISTORY
• Already used as an ETL tool for operational data transformations (mainframe, etc.)
• Positive internal feedback, existing guidelines, and qualified administrators
BENEFITS
✓ Ease the internal team's transition from BI to Big Data
✓ Simplify, standardize, and then accelerate the creation of distributed processing (component-based logic)
✓ Enable industrialization and consistency: a common workspace for all projects and developers, where policies and best practices are applied (e.g., naming rules, pre- and post-jobs, joblet reuse)
✓ Ease testing and deployment across environments thanks to the Talend Administration Center (TAC)
✓ Foster client ownership and job maintenance through readable jobs
TALEND Spark Job Examples (1/2)
1. Reading the Hive table to get “Contracts” over 24 months (date filter)
2. Partitioning the 900 million contract rows by client
3. Client aggregation & KPI calculation
4. Writing results into MapR-DB
The MapR table is truncated by a shell script before the job is launched
Spark tuning (config file)
Client Segmentation (Monthly)
• 13 similar jobs calculate the pre-KPIs, then 3 final jobs generate the two axes ‘X’ and ‘Y’, which are crossed to produce the segmentation
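The following plain-Python sketch illustrates the logic of the monthly segmentation job. It is not Talend or Spark code, and several details are hypothetical:
• the record fields (`client_id`, `start_date`, `amount`)
• the KPI names
• the rule that the window starts on the first day of the month
• using contract count as the ‘X’ axis and total amount as the ‘Y’ axis
• the thresholds

A dict stands in for the MapR-DB table, and `clear()` stands in for the truncate performed by the shell script.

```python
from collections import defaultdict
from datetime import date


def months_back(ref: date, months: int) -> date:
    """First day of the month lying `months` months before `ref` (assumed window rule)."""
    index = ref.year * 12 + (ref.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def client_kpis(contracts, ref: date, window_months: int = 24) -> dict:
    # Step 1: date filter, keep contracts started inside the 24-month window.
    cutoff = months_back(ref, window_months)
    recent = (c for c in contracts if cutoff <= c["start_date"] <= ref)
    # Step 2: partition (group) the remaining rows by client.
    by_client = defaultdict(list)
    for c in recent:
        by_client[c["client_id"]].append(c)
    # Step 3: aggregate per client into illustrative KPIs (hypothetical names).
    return {
        client: {"nb_contracts": len(rows), "total_amount": sum(r["amount"] for r in rows)}
        for client, rows in by_client.items()
    }


def write_results(store: dict, kpis: dict) -> None:
    # Step 4: the target table is truncated first (done by a shell script in the talk), then written.
    store.clear()
    store.update(kpis)


def segment(x_score: float, y_score: float, x_cut: float, y_cut: float) -> str:
    # Crossing the 'X' and 'Y' axes: each client falls into one cell of a 2x2 grid.
    x_level = "high" if x_score >= x_cut else "low"
    y_level = "high" if y_score >= y_cut else "low"
    return f"X-{x_level}/Y-{y_level}"


contracts = [
    {"client_id": "C1", "start_date": date(2017, 3, 1), "amount": 1000.0},
    {"client_id": "C1", "start_date": date(2016, 11, 15), "amount": 500.0},
    {"client_id": "C2", "start_date": date(2014, 1, 1), "amount": 9000.0},  # outside the window
    {"client_id": "C2", "start_date": date(2017, 6, 30), "amount": 200.0},
]
store = {"stale_client": {}}  # stands in for the MapR-DB target table
write_results(store, client_kpis(contracts, ref=date(2017, 9, 30)))
segments = {
    client: segment(k["nb_contracts"], k["total_amount"], x_cut=2, y_cut=1000.0)
    for client, k in store.items()
}
```

Each function maps to one numbered step, which makes the sketch easy to compare with the job design. In the real job, Spark distributes the grouping across partitions rather than building a single in-memory dict.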
TALEND Spark Job Examples (2/2)
1. Reading the MapR-DB table to get JSON files (filter on rowkey; Hive not required)
2. JSON data extraction (based on a joblet shared by the Full and Delta jobs)
3. Data transformation and writing into Vertica (JDBC)
4. Reject handling
5. If OK, deleting the rowkeys from the MapR-DB source table (custom Java code)
IDD3.0: Portfolio (every 5 min)
• Delta and Full jobs that structure raw information (JSON files) for a business application (expected to evolve toward streaming)
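The following plain-Python sketch shows the control flow of the portfolio job: read, shared extraction, transform and load, reject handling, and conditional deletion. It rests on several assumptions:
• A dict of JSON strings stands in for the MapR-DB source table.
• A list stands in for the Vertica JDBC insert.
• The schema, the rowkey format, and the `rowkey_prefix` filter are hypothetical.
• "If OK" is read per row, so rejected rows stay in the source table. The actual job may instead delete rowkeys only when the whole batch succeeds.

```python
import json

REQUIRED_FIELDS = ("contract_id", "client_id", "balance")  # hypothetical schema


def extract(raw: str) -> dict:
    """Shared extraction step, reused by the Full and Delta jobs (the joblet's role)."""
    doc = json.loads(raw)  # malformed JSON raises json.JSONDecodeError, a ValueError
    missing = [f for f in REQUIRED_FIELDS if f not in doc]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {f: doc[f] for f in REQUIRED_FIELDS}


def run_job(source: dict, target: list, rejects: list, rowkey_prefix: str = "") -> None:
    # Step 1: select only the rowkeys matching the filter (no Hive layer involved).
    keys = sorted(k for k in source if k.startswith(rowkey_prefix))
    loaded = []
    for key in keys:
        try:
            # Steps 2-3: extract the JSON, transform it, then "write" it to the target.
            row = extract(source[key])
            row["balance"] = round(float(row["balance"]), 2)
            target.append(row)
            loaded.append(key)
        except (ValueError, TypeError) as err:
            # Step 4: reject handling, keep the rowkey and the reason aside.
            rejects.append((key, str(err)))
    # Step 5: delete only the rowkeys whose rows were written successfully.
    for key in loaded:
        del source[key]


source = {
    "2017-10-01#C1": '{"contract_id": "K1", "client_id": "C1", "balance": "120.5"}',
    "2017-10-01#C2": '{"contract_id": "K2", "client_id": "C2"}',
    "2017-09-30#C3": '{"contract_id": "K3", "client_id": "C3", "balance": 10}',
}
vertica_rows, rejects = [], []
run_job(source, vertica_rows, rejects, rowkey_prefix="2017-10-01")
```

Putting extraction in a single function mirrors the idea of one joblet for both Full and Delta runs. In this sketch the two runs differ only in which rowkeys they select.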
Lessons Learned: 4 Success Factors
Invest in technical expertise from the beginning
• Architecture decisions based on prototypes
• Support complex developments
• Perform performance analysis, and tune and optimize processing jobs
Build a common asset base
• Ensure consistency between projects (technical, process, data)
• Reduce delivery time and costs
• Ease maintenance and supervision
Organize a multidisciplinary core team to promote co-working and knowledge sharing
• Build and maintain internal staff skills on the complex, evolving Big Data ecosystem
• Ease the integration process through direct access to internal resources (key POCs, legacy documentation, etc.)
Prioritize business projects to grow at the right pace
• MVP approach led by a Product Owner / Scrum Master pair
• Deliver quick value to selected business projects
• Maintain positive momentum based on proven business results