from data analytics to big data
from business analytics to
an overview from telecom data scientists
Ph.D. Ismail REBAI, Analytical CRM & BI Director
Ines TEACĂ, BI Project Manager
April 23rd, 2013
agenda
general overview of Big Data
telecom industry
big data at Orange France. Portal use case.
the data scientist
big data overview
big data is everywhere
Cell phones, texts, digital music and movies
Financial markets and services
Real time traffic
Security and automation
Biomedical research and personal health
Weather and climate
Air traffic
Data Analytics
People
Cloud Devices
Network Ecosystems
CDRs
big data definition
big data: 3V’s or collaboration opportunity?
with Big Data, business is back in control of data
[Figure: timeline 1970, 1980, 1990, 2000, 2010 (Mainframe, Standalone PC, Connected PC, Connected PC, Mobile). Data ownership: IT, Business]
big data market. explore opportunities
The Hadoop-MapReduce market is forecast to grow at a compound annual growth rate (CAGR) of 58%, reaching $2.2 billion in 2018.
In 2012, the Big Data market had a clear ranking of players, led by IBM, HP and Teradata.
Verizon is the only telecom operator listed by the media as a pioneer in Big Data innovation.
Source: http://wikibon.org/wiki/
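To make the CAGR figure concrete, here is a minimal sketch of the arithmetic behind it. The end value and the rate come from the slide. The length of the forecast window (`YEARS`) is an assumption, since the slide does not state the base year.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def implied_start(end_value, rate, years):
    """Value `years` earlier that grows to `end_value` at the given CAGR."""
    return end_value / (1 + rate) ** years


END_VALUE_USD = 2.2e9  # from the slide: $2.2 billion in 2018
RATE = 0.58            # from the slide: 58% CAGR
YEARS = 6              # ASSUMPTION: a 2012-2018 window, not stated on the slide

base = implied_start(END_VALUE_USD, RATE, YEARS)
print(f"Implied base-year market size: ${base / 1e6:.0f} million")
print(f"Round-trip CAGR check: {cagr(base, END_VALUE_USD, YEARS):.0%}")
```

Changing `YEARS` shows how sensitive the implied starting market size is to the assumed window.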
telecom industry
in telecom. data increased a thousand-fold in the past 20 years
Orange presentation
Figure 1: Data volume on telecoms networks, worldwide, 1986–2013 [Source: Analysys Mason, 2013]
Figure 2: Data on telecoms networks by type, worldwide, 1990 and 2010 [Source: Analysys Mason, 2013]
1990: 97% analogue, 3% digital
2010: 2% analogue, 98% digital
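As a rough illustration of what "a thousand-fold in 20 years" means year over year, the sketch below derives the implied average annual growth rate and doubling time. It assumes steady exponential growth across the whole period, which is a simplification and not something the slide claims.

```python
import math

GROWTH_FACTOR = 1000  # from the slide: a thousand-fold increase
PERIOD_YEARS = 20     # from the slide: over the past 20 years

# ASSUMPTION: growth is spread evenly (exponentially) over the period.
annual_rate = GROWTH_FACTOR ** (1 / PERIOD_YEARS) - 1
doubling_time = PERIOD_YEARS * math.log(2) / math.log(GROWTH_FACTOR)

print(f"Implied average annual growth: {annual_rate:.1%}")
print(f"Implied doubling time: {doubling_time:.1f} years")
```

Under that assumption, volume grows by roughly 41% per year, which means it doubles about every two years.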
telecom data monetization. exploration phase
Telefonica Dynamic Insight (TDI) was launched on 9th October 2012
Focus: to become an analytical insight provider for companies and public sector organizations.
TDI target: R&D, venture capital, digital service development and global partnership.
Processing: separate divisions and business units for Telefonica innovation activities
Verizon's Precision Market Insight division was launched on 1st October 2012
Focus: to monetize collected customer data
In addition to basic data, Verizon also used geographic location, apps downloaded and websites accessed.
Smart Steps
Outdoor Media Measurement
Venue Audience Measurement
Retail site analytics
PRIVACY
telecom Orange France
use case
orange portal. a success story based on a big data platform
an innovation nominee for Orange Innovation Awards 2013
a «fast adopter» built on open source technologies.
DMGP Orange France
Orange Labs Product and Services
http://www.orange.fr/portail
the project setup
building a Big Data platform to extract value from the usage data of 50 million customers and offer better personalization by applying predictive models at mass scale
1st Orange Big Data service success story
Project foundation based on open source
Direct impact on digital contents business
Anticipation of Data Science & Scientist need!
• industrialization and experimentation (treatment of 100% anonymized customer usage logs)
• Khiops, the Orange Labs data mining environment, connected to the open source environment
a platform to distribute high-volume data computations across a set of PCs (Hadoop)
a high-availability data service to deliver customer data to all Orange web services
• Improved customer experience and click rates through better targeting
• Reuse of the Portal's highly available data service to fill and answer anonymous profiles for decision making
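The Hadoop platform mentioned above distributes computations using the MapReduce model. The sketch below simulates that model in a single Python process to show the idea. It is not Orange's implementation: the log format, profile identifiers and function names are all hypothetical, and a real job would run the map and reduce phases in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain

# HYPOTHETICAL anonymized usage log lines: "<profile_id>\t<content_category>"
LOG_LINES = [
    "p1\tsport",
    "p2\tnews",
    "p1\tsport",
    "p1\tmusic",
    "p2\tnews",
]


def map_phase(line):
    """Emit one ((profile, category), 1) pair per log line."""
    profile_id, category = line.split("\t")
    yield (profile_id, category), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce sequentially over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(LOG_LINES)
for (profile_id, category), n in sorted(counts.items()):
    print(profile_id, category, n)
```

Per-profile counts like these are the kind of aggregate a targeting model could take as input. Because each map call and each reduce call depends only on its own input, the work can be split across a set of PCs.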
the data scientist
a data science team transformation phases
Phase 1
description: several teams with different data skills are dealing with their own data projects
storage location: each team is storing and dealing with its own data (decentralized data)
Phase 2
description: one team is ahead of the other teams and has developed more data skills
storage location: each team is storing and dealing with its own data (decentralized data)
Phase 3
description: team 3 becomes the data service provider of the group and provides dedicated services to all BU teams when required (as a service center)
storage location: all data are centralized and stored in the same data storage infrastructure (centralized data)
Business Units in each phase: Team 1, Team 2, Team 3, Team 4 (in Phase 3, Team 3 is the service provider)
the data scientist or data science team?
Data mining and data transformation
Mathematics, statistics, econometrics and BI
General techniques: platform administrator, database administrator
Specific techniques: ETL, CEP, open source solutions (Hadoop…)
Legal
Business (marketing/sales)
Communication
from data to big data
(big) data analytics. are we there yet?
http://www.youtube.com/watch?v=LrNlZ7-SMPk
movie
big data in Romania. today
via Google Trends as seen on 18 April
the data scientist
thanks
More questions?
Ismail.Rebai at orange.com
Ines.Teacă at orange.com
appendix
big data routine for searching a data scientist
Class DataScientist {
  Is skeptical, curious. Has inquisitive mind
  Knows Machine Learning, Statistics, Probability
  Applies Scientific Method.
  Runs Experiments
  Is good at Coding & Hacking
  Able to deal with IT Data Engineering
  Knows how to build data products
  Able to find answers to known unknowns
  Tells relevant business stories from data
  Has Domain Knowledge
}
big data. ecosystem
data scientists
Vincent Granville: Analytic Bridge, Data Science Central
DJ Patil: Former LinkedIn Scientist, Current Data Scientist @ Greylock
Jeff Hammerbacher: Former Facebook, Current Chief Scientist @ Cloudera
Mok Oh: Former PayPal Chief Scientist, Current Entrepreneur
Daniel Tunkelang: Data Science Director @