Why Open Data Science Matters | Gartner BI & Analytics Summit '16
TRANSCRIPT
© 2016 Continuum Analytics - Proprietary
WHY OPEN DATA SCIENCE MATTERS Why open source is eating the world
Travis Oliphant, CEO
Michele Chambers, CMO & VP Products
Continuum Analytics
Gartner BI & Analytics Summit 2016
Agenda
© 2016 Continuum Analytics- Confidential & Proprietary
• About Us
• What is Modern Analytics and how is it different?
• Why does Open Data Science matter?
• Q&A
• Travis Oliphant (@teoliphant)
  CEO & co-founder, Continuum Analytics
  Ph.D. in Biomedical Engineering, Mayo Clinic
  B.S., M.S. in Mathematics & Electrical Engineering, BYU
  Open source contributor and leader since 1997
  Creator of NumPy and SciPy; started Numba
  Author of Guide to NumPy
About Us
• Michele Chambers (@mcAnalytics)
  CMO & VP Product, Continuum Analytics
  M.B.A., Duke University; B.S. in Computer Engineering
  Author of:
  Big Data, Big Analytics (Wiley)
  Modern Analytics Methodologies: Driving Business Value with Analytics (Pearson FT Press)
  Advanced Analytics Methodologies: Driving Business Value with Analytics (Pearson FT Press)
About Us
WHAT’S THE PROBLEM?
Why are major corporations moving to Modern Analytics & Open Data Science?
Large Investment Banks: How can I create and deploy timely risk models?
Major Upstream Oil & Gas: How can I take advantage of all this new sensor information now?
Global CPG Manufacturers: How can I possibly identify the root causes of my complex problem and remediate early enough to create revenue assurance?
Industry Leaders Trusting Open Data Science
The Past vs. Present
Proprietary Software: Decreasing Use
• Vendor lock-in
• High costs
• Lack of integration
• Inability to easily deploy
• Skills gap

Open Source Software: Accelerating Adoption
• Avoids vendor lock-in
• Reduces cost
• Open APIs and connectors
• Eliminates chasm between build & deploy
• Accessible to tomorrow’s talent
Evolving Technology
Proprietary Software: Status Quo
• Limited Data Sources
• Legacy Compute Engines
• On-premise

Open Source Software: Next Generation
• Big Data
• Modern Analytics
• Distributed Computing
• High Performance Computing
• Cloud
• Streaming
Business Intelligence
How’s Modern Analytics Different from Traditional Analytics?
Traditional Analytics SQL
Analytics Descriptive Statistics
Data Mining
Predictive Analytics
Prescriptive Analytics Modern Analytics
Machine Learning
Simulation Optimization
How are Modern Architectures Different from Traditional Architectures?
Distributed Modern Architecture
Cloud Computing
Parallel Computing
Parallel & Distributed Computing
Stream Computing
Monolithic Traditional Architecture
Centralized Computing
Network Computing
High Performance Computing
Evolving Roles
Proprietary Software: Status Quo
• Statistician
• Programmer

Open Source Software: Next Generation
• Data Science Teams
• Business Analyst
• Data Scientist
• Developer
• Data Engineer
• DevOps
How are Modern Roles Different from Traditional Roles?
Modern Roles: Team | Collaborative
Traditional Roles: Individual | Silo
Modern Data Science Teams use…
Data Scientist
• Hadoop / Spark
• Programming Languages
• Analytic Libraries
• IDE
• Notebooks
• Visualization

Biz Analyst
• Spreadsheets
• Visualization
• Notebooks
• Analytic Development Environment

Data Engineer
• Database / Data Warehouse
• ETL

Developer
• Programming Languages
• Analytic Libraries
• IDE
• Notebooks
• Visualization

DevOps
• Database / Data Warehouse
• Middleware
• Programming Languages
RIGHT TECHNOLOGY FOR THE PROBLEM
Modern Data Science Teams Want
Collaboration
• Iterate on analysis
• Share discoveries with team
• Interact with teams across the globe

Interactivity
• Interact with data
• Build high performance models
• Visualize results in context

Integration
• Work with open source and legacy data systems
• Leverage data science languages: Python, R, Matlab, SAS, SPSS, Excel, Java, C/C++, C#, .NET, FORTRAN and more
Predict
Share
Deploy
with Open Data Science
WHAT’S OPEN DATA SCIENCE?
Data Science is …
“An interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms”
Wikipedia
Data Science is not just Machine Learning…
Distributed Systems
Business Intelligence
Machine Learning / Statistics
Web
Scientific Computing / HPC
Data Science is Interdisciplinary…
Hadoop, Spark
GPUs, multi-cores
Classification, deep learning
Regression, PCA
Web crawling, scraping, 3rd party data & API providers, predictive services & APIs
Data warehouse, querying, reporting
Distributed Systems
Business Intelligence
Machine Learning / Statistics
Web
Scientific Computing / HPC
Open Data Science is …
an inclusive movement
that makes the open source tools of data science (data, analytics, & computation) easily work together as a connected ecosystem
Open Data Science Means Open….
Availability Innovation
Interoperability Transparency
For everyone in the data science team
OPEN DATA SCIENCE IS THE FOUNDATION TO MODERNIZATION
In other words, if you want to…
• Control the Chaos
• Empower the User
• Lead with Analytics
• Evangelize the New
• Modernize the Core
YOU’VE GOT TO USE OPEN DATA SCIENCE
Open Data Science Vibrant and Growing Community
Python Community: 30M+
Anaconda Downloads: 3M+
Packages in Anaconda: 720+
R Community: 16M+
Spark Python Usage: 50%+
Open Data Science Community
• A gathering of Python enthusiasts for sharing ideas and learning from each other for ever-evolving challenges
• Promotes and supports ongoing R&D of open source computing tools through educational, community and public channels
• Support for the Apache Community of open source software projects which are for the public good
• Supporting the R community, the R Foundation, and others developing and distributing R
Open Source Communities Create Powerful Technology for Data Science
Numba
dask
xlwings
Airflow
Blaze
Distributed Systems
Business Intelligence
Machine Learning / Statistics
Web
Scientific Computing / HPC
Python is the Common Language
Numba
dask
xlwings
Airflow
Blaze
Distributed Systems
Business Intelligence
Machine Learning / Statistics
Web
Scientific Computing / HPC
Python Trusted by Industry Leaders
“Everyone at JPMorgan now needs to know Python and there are around 5,000 developers using it at Bank of America. There are close to 10 million lines of Python code in Quartz and we got close to 3,000 commits a day. It’s a good scripting language and easily integrated into both the front and back ends, which was one of the reasons we chose it in the first place.”
Kirat Singh, Former Global Head of Risk Systems, Bank of America Merrill Lynch
Python is Everywhere
Why Companies are Migrating to ODS…
Large Investment Bank
Problem
• Hard to find people to create proprietary risk assessment models
• Takes months or years to deploy
Solution
• Moved to ODS and leveraged innovation in ODS
Results
• Create and deploy risk models in days, not years, with data scientists who are easier to find and hire

Major Upstream Oil & Gas
Problem
• Unable to ingest Big Data from sensors to proactively monitor oil wells
Solution
• Moved to ODS and leveraged diversity of ODS analytics to create novel visualizations and predictive models using sensor data
Results
• Gained insights into oil well issues in weeks, not years, to detect issues earlier and increase profitability

Global CPG Manufacturer
Problem
• Complex model and simulation required with disparate internal and external data
Solution
• Moved to ODS to easily integrate multiple data feeds and leverage open source innovation
Results
• Created full lifecycle predictive model and simulation for revenue assurance
Python’s Not the Only One…
SQL
Distributed Systems
Business Intelligence
Machine Learning / Statistics
Web
Scientific Computing / HPC
But it’s also a Great Glue Language
SQL
Distributed Systems
Business Intelligence
Machine Learning / Statistics
Web
Scientific Computing / HPC
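As a rough illustration of the "glue language" point, the sketch below uses only the Python standard library to pull rows out of a SQL database and summarize them in Python. It is a minimal sketch, not something shown in the talk: the table name `readings`, the helper names `load_readings` and `mean_pressure_by_well`, and the sample values are all made up for illustration.

```python
import sqlite3
import statistics


def load_readings(conn):
    """Create and fill a small, made-up sensor table (illustrative data only)."""
    conn.execute("CREATE TABLE readings (well TEXT, pressure REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [("A", 10.0), ("A", 12.0), ("B", 20.0), ("B", 22.0), ("B", 24.0)],
    )


def mean_pressure_by_well(conn):
    """Fetch rows with SQL, then group and average them in Python."""
    rows = conn.execute(
        "SELECT well, pressure FROM readings ORDER BY well"
    ).fetchall()
    grouped = {}
    for well, pressure in rows:
        grouped.setdefault(well, []).append(pressure)
    return {well: statistics.mean(values) for well, values in grouped.items()}


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
load_readings(conn)
summary = mean_pressure_by_well(conn)
print(summary)
```

The same pattern extends to the other areas named on the slide: Python code sits between a data source and an analytic library and passes results from one to the other.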
Anaconda is the Open Data Science Platform Bringing Technology Together…
Numba
dask Airflow
SQL
xlwings Blaze
Distributed Systems
Business Intelligence
Machine Learning / Statistics
Web
Scientific Computing / HPC
But Most Importantly Empowering Everyone on the Data Science Team
Data Scientist Biz Analyst Data Engineer Developer DevOps
Deploy & Operate
Explore & Analyze
Collaborate & Publish
• Accelerates Time-to-Value
• Empowers the Data Science team
• Connects Open Source Communities
Anaconda is…
• the leading modern open source analytics platform powered by Python, the fastest growing open data science language
• the innovative open data science platform to exploit data, analytics, and computation
Introducing Anaconda The Modern Open Source Analytics Platform Powered by Python
• Enterprise Ready Platform
  – Simplify administration
  – Use modern data science
  – Collaborate with entire team
  – Leverage modern architectures
  – Integrate data sources
  – Accelerate performance
Security
Governance
Provenance
R Scala
Python
R | Scala
JS
C | C++ Fortran
APPLICATIONS
DATA
HARDWARE
ANALYTICS
Model Building
Data Exploration Software Development
HIGH PERFORMANCE
Business Analyst
Data Scientist
Developer
Data Engineer
DevOps
Data Science Team
Cloud On-premises
LANGUAGES
OPERATIONS
ANACONDA GIVES SUPERPOWERS TO PEOPLE WHO CHANGE THE WORLD
Modern Data Science Teams Love Anaconda
Data Scientist
• Hadoop / Spark
• Programming Languages
• Analytic Libraries
• Notebooks
• Visualization
• IDE

Biz Analyst
• Spreadsheets
• Visualization
• Notebooks
• Analytic Development Environment

Data Engineer
• Database / Data Warehouse
• ETL

Developer
• Programming Languages
• Analytic Libraries
• IDE
• Notebooks
• Visualization

DevOps
• Database / Data Warehouse
• Middleware
• Programming Languages
Anaconda Trusted by Industry Leaders
• Financial Services: Risk management, quant modeling, data exploration and processing, algorithmic trading, compliance reporting
• Government: Fraud detection, data crawling, web & cyber data analytics, statistical modeling
• Healthcare & Life Sciences: Genomics data processing, cancer research, natural language processing for health data science
• High Tech: Customer behavior, recommendations, ad bidding, retargeting, social media analytics
• Retail & CPG: Engineering simulation, supply chain modeling, scientific analysis
• Oil & Gas: Pipeline monitoring, noise logging, seismic data processing, geophysics
Questions?
… and so that is why Open Data Science is eating the world.
221 W. 6th Street, Suite #1550, Austin, TX 78701
+1 512.222.5440
[email protected] @ContinuumIO
CONTINUUM ANALYTICS We Empower Data Science Teams to Change the World
Stop by booth 421 to get a signed book