data science: the art of foul play by serhiy shelpuk

Data Science the art of foul play Sergey Shelpuk SoftServe, Inc. [email protected] September, 2013

Upload: softserve-inc

Post on 11-May-2015


DESCRIPTION

Serhiy Shelpuk, Lead Data Scientist and Competence Manager at SoftServe, Inc., presented on Data Science and SoftServe's Data Science Group Knowledge Model at the 2013 IT Weekend Ukraine conference, held on September 14, 2013, in Kyiv, Ukraine. Here's his presentation.

TRANSCRIPT

Page 1: Data Science: The Art of Foul Play by Serhiy Shelpuk

Data Science the art of foul play

Sergey Shelpuk

SoftServe, Inc.

[email protected]

September, 2013

Page 2: Data Science: The Art of Foul Play by Serhiy Shelpuk
Page 3: Data Science: The Art of Foul Play by Serhiy Shelpuk

“Your goal should not be to buy players, it should be to buy wins. In order to buy wins you should buy runs” (c)

Page 4: Data Science: The Art of Foul Play by Serhiy Shelpuk
Page 5: Data Science: The Art of Foul Play by Serhiy Shelpuk

More data is available to companies

Storage technologies make it possible to store and process that data

Advanced analytics can be applied to this new data to gain a competitive advantage

Page 6: Data Science: The Art of Foul Play by Serhiy Shelpuk

Data Scientist: The Sexiest Job of the 21st Century

For Today’s Graduate, Just One Word: Statistics

“I keep saying that the sexy job in the next 10 years will be statisticians, and I’m not kidding.”

Hal Varian, chief economist at Google

Data Scientist: The Hottest Job You Haven't Heard Of

Page 7: Data Science: The Art of Foul Play by Serhiy Shelpuk

Top 10 Strategic Technology Trends for 2013

• Hybrid IT and Cloud Computing

• Strategic Big Data

• Actionable Analytics

• In Memory Computing

• Integrated Ecosystems

• Mobile Device Battles

• Mobile Applications and HTML5

• Personal Cloud

• Enterprise App Stores

• The Internet of Things

Page 8: Data Science: The Art of Foul Play by Serhiy Shelpuk

McKinsey Global Institute projects a shortage of approximately 140,000 to 190,000 data analytics experts in the U.S. by 2018, along with a shortage of 1.5 million managers and analysts who are able to understand and make decisions using big data.

Page 9: Data Science: The Art of Foul Play by Serhiy Shelpuk
Page 10: Data Science: The Art of Foul Play by Serhiy Shelpuk

Business Tasks

• Identify prospective customers

• Detect traffic jams in the city

• Recommend restaurants and menus

• Adapt the UI to a particular user

• Classify body parts in X-ray images

• Identify market niches

• Identify influencers in social networks

• Find similar customers or projects in a portfolio

• Identify informal groups in an organization

• Detect fraudulent bank transactions

• Detect network intrusion attempts

• Provide automatic aircraft engine testing

• Provide automatic IT infrastructure monitoring

• Provide clinical test analysis

• Find the price for goods or services that maximizes profit

• Find the best working schedule for a store

• Find the best production volume

• Find the best business rules

Model Families and Algorithms

Classification

• Naïve Bayes

• Logistic regression

• Support Vector Machines

• Neural Networks

Clustering

• K-Means

• K-nearest neighbors

• Self-organizing maps

• Mixture of Gaussians

Anomaly Detection

• Mixture of Gaussians

• Self-learning anomaly detection

Optimization

• Gradient descent

• Simplex method

• Newton's method

• Normal equations

• Genetic algorithms
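Two of the optimization methods listed here, gradient descent and the normal equations, can be compared on the same small problem. The sketch below is only an illustration and is not taken from the presentation: it fits a straight line to made-up points, once with the closed-form solution of the normal equations for a single feature and once with iterative gradient descent. The data, the function names, the learning rate, and the step count are all assumptions.

```python
def fit_normal_equations(xs, ys):
    """Closed-form least-squares line fit (normal equations, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize mean squared error by repeatedly stepping against the gradient."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_intercept = 2 * sum(errors) / n
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

exact = fit_normal_equations(xs, ys)
approx = fit_gradient_descent(xs, ys)
print("normal equations:", exact)
print("gradient descent:", approx)
```

Both approaches should arrive at essentially the same line. The closed form needs no tuning but relies on solving a linear system, while gradient descent needs a learning rate and many iterations but extends to models that have no closed-form solution.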

Page 11: Data Science: The Art of Foul Play by Serhiy Shelpuk

Cross Industry Standard Process for Data Mining

Page 12: Data Science: The Art of Foul Play by Serhiy Shelpuk

SoftServe Data Science Group Knowledge Model

Business Level

• Basics of Business Analysis

• Basics of Economics

• Basics of Product Management

• Basics of Organizational Behavior

Logic Level

• Statistics/Probability

• Machine Learning

• Data Mining

• Artificial Intelligence

Technology Level

• Matlab/Octave

• R

• SQL

• Parallel Computing

Page 13: Data Science: The Art of Foul Play by Serhiy Shelpuk

Deep Learning Neural Networks

Page 14: Data Science: The Art of Foul Play by Serhiy Shelpuk

Learning algorithm

Task: recognize a motorcycle

Feature extractor
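The arrangement on this slide, a hand-written feature extractor feeding a learning algorithm, can be sketched as a two-stage pipeline. The example below is a toy illustration and not the system shown in the talk: the "images" are made-up 2x2 brightness grids, the two features (mean brightness and contrast) are arbitrary choices, and a nearest-centroid rule stands in for the learning algorithm. All names and values are assumptions.

```python
def extract_features(pixels):
    """Hand-crafted features: mean brightness and contrast (max - min)."""
    flat = [p for row in pixels for p in row]
    return (sum(flat) / len(flat), max(flat) - min(flat))


def train_centroids(examples):
    """Learning step: average the feature vectors of each label."""
    sums, counts = {}, {}
    for pixels, label in examples:
        f = extract_features(pixels)
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += f[0]
        s[1] += f[1]
        counts[label] = counts.get(label, 0) + 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}


def predict(pixels, centroids):
    """Assign the label whose centroid is closest in feature space."""
    f = extract_features(pixels)
    return min(
        centroids,
        key=lambda lab: (f[0] - centroids[lab][0]) ** 2 + (f[1] - centroids[lab][1]) ** 2,
    )


# Made-up training set: high-contrast grids are labeled "motorcycle".
training = [
    ([[0.9, 0.1], [0.8, 0.2]], "motorcycle"),
    ([[1.0, 0.0], [0.9, 0.1]], "motorcycle"),
    ([[0.5, 0.5], [0.6, 0.4]], "other"),
    ([[0.4, 0.5], [0.5, 0.5]], "other"),
]
centroids = train_centroids(training)
print(predict([[0.95, 0.05], [0.9, 0.1]], centroids))
print(predict([[0.5, 0.45], [0.5, 0.55]], centroids))
```

The point of the sketch is the division of labor: the classifier only ever sees what `extract_features` gives it, so the quality of the hand-written features limits what the learning algorithm can achieve.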

Page 15: Data Science: The Art of Foul Play by Serhiy Shelpuk

The concept of Autoencoder

Page 16: Data Science: The Art of Foul Play by Serhiy Shelpuk


The concept of Autoencoder

© Andrew Y. Ng
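An autoencoder is a network trained to reproduce its own input through a narrower hidden layer, so the hidden layer has to learn a compact representation. The following is a minimal sketch of that idea, not the network from the slides: a linear autoencoder with two inputs, one hidden unit, and two outputs, trained by plain gradient descent. The toy data, initial weights, learning rate, and function names are all assumptions.

```python
def train_autoencoder(data, epochs=500, lr=0.1):
    """Train a 2 -> 1 -> 2 linear autoencoder on mean squared reconstruction error."""
    w = [0.5, 0.5]  # encoder weights: two inputs -> one hidden unit
    v = [0.5, 0.5]  # decoder weights: one hidden unit -> two outputs
    n = len(data)
    history = []    # mean reconstruction error per epoch
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        loss = 0.0
        for x in data:
            h = w[0] * x[0] + w[1] * x[1]          # encode
            x_hat = [v[0] * h, v[1] * h]           # decode
            err = [x_hat[0] - x[0], x_hat[1] - x[1]]
            loss += err[0] ** 2 + err[1] ** 2
            d_h = 2 * (err[0] * v[0] + err[1] * v[1])  # d(loss)/d(hidden)
            for i in range(2):
                grad_v[i] += 2 * err[i] * h
                grad_w[i] += d_h * x[i]
        history.append(loss / n)
        for i in range(2):
            w[i] -= lr * grad_w[i] / n
            v[i] -= lr * grad_v[i] / n
    return w, v, history


def reconstruct(x, w, v):
    """Pass one input through the encoder and decoder."""
    h = w[0] * x[0] + w[1] * x[1]
    return [v[0] * h, v[1] * h]


# Toy inputs lying on a line, so a single hidden unit can represent them.
DATA = [(t, 0.5 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

w, v, history = train_autoencoder(DATA)
print("first and last error:", history[0], history[-1])
print("reconstruction of (1.0, 0.5):", reconstruct((1.0, 0.5), w, v))
```

Because the toy inputs have only one degree of freedom, the single hidden unit is enough to reconstruct them almost exactly. Real autoencoders use nonlinear units and many more dimensions, but the training signal is the same: the target output is the input itself.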

Page 17: Data Science: The Art of Foul Play by Serhiy Shelpuk

Large scale deep learning networks

See more: Building high-level features using large scale unsupervised learning

Page 18: Data Science: The Art of Foul Play by Serhiy Shelpuk
Page 19: Data Science: The Art of Foul Play by Serhiy Shelpuk

Deep learning neural networks

Pre-trained as Autoencoder

Typical classification neural network

Page 20: Data Science: The Art of Foul Play by Serhiy Shelpuk

A few results

Video

Text/NLP

Images

© Andrew Y. Ng

Page 21: Data Science: The Art of Foul Play by Serhiy Shelpuk

Deep Learning in SoftServe

Phase 1 results (old-fashioned anomaly detection)

Phase 2 prototype (deep learning approach)
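The slides do not say which technique Phase 1 used. As a generic illustration of what classical anomaly detection can look like, the sketch below fits a single Gaussian to readings assumed to be normal and flags new readings that fall too many standard deviations from the mean. The readings, the function names, and the threshold of 3 standard deviations are all assumptions.

```python
import statistics


def fit_gaussian(values):
    """Estimate the mean and sample standard deviation of normal behavior."""
    return statistics.mean(values), statistics.stdev(values)


def is_anomaly(x, mean, std, z_threshold=3.0):
    """Flag a reading whose distance from the mean exceeds the threshold."""
    return abs(x - mean) / std > z_threshold


# Made-up sensor readings assumed to represent normal operation.
normal_readings = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 49.7, 50.0]
mean, std = fit_gaussian(normal_readings)

# Score two new readings: one typical, one far outside the usual range.
flags = [is_anomaly(x, mean, std) for x in (50.1, 57.5)]
print(flags)
```

This kind of detector depends on someone choosing which quantities to measure and what threshold to use, which is one reason a learned representation is an attractive next step.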

Page 22: Data Science: The Art of Foul Play by Serhiy Shelpuk

Useful Resources

• Introduction to Statistics

• Introduction to Artificial Intelligence

• Machine Learning

• Probabilistic Graphical Models

• Statistics One

Page 23: Data Science: The Art of Foul Play by Serhiy Shelpuk

Thank you!