data science

Upload: shankarradhakrishnan

Post on 22-Jun-2015

Category:

Technology


TRANSCRIPT

Page 1: Data science

Data Science

Shankar Radhakrishnan Cognizant

Page 2: Data science

History…

• Questions first, data later

• Data model first, data processing later

• Size first, project second, react over time

• Focus on accuracy, assume little

• Emphasis on completeness and comprehensiveness

• Expose raw data to decision makers

• Provide insights, but not actionable ones

• Bound by constraints (Procurement, Process, Build Insights, Interaction)

Page 3: Data science

What’s Changed?

• Channels for participation are vast

• Modes of reach have expanded

• Data types are vast and voluminous

• Noise is huge, yet accepted

• Urgency precedes accuracy

• Guidance is better than completeness

• Cost to store and process has fallen (and is still falling)

• More ways and means to process data at scale

Page 4: Data science

Speaking of Data

• Volume - Data at rest

• Variety - Data in many forms

• Velocity - Data in motion

• Veracity - Data in doubt

Page 5: Data science

Data Science

“Data Science is the art of turning data into actions”

This is accomplished through the creation of data products that provide actionable information without exposing the underlying data or analytics.

“Scientific study of the creation, validation and transformation of data to create meaning”

http://www.datascienceassn.org/code-of-conduct.html

Page 6: Data science

While we are on definitions…

Data Mining

“Non-trivial process of identifying valid, novel, potentially useful and understandable structures or patterns or models or relationships in data to enable data driven decision making”

Statistics

“Science of learning from data or of making sense out of data”

Page 7: Data science

Science of Data Science

• Analyze and understand data that’s available

• Find and acquire any additional data that is needed

• Discover what’s not known from data

• Predict and build “actionable insights” from data

• Build data products that have “immediate” business impact

• Make it easy for business to “use”

• Help decision making to drive “business value”

Page 8: Data science

Data Science Toolkit

Languages: Python, R, Java, TextWrangler, SQL, C, C++

Libraries: Mahout, NLTK, OpenNLP, GPText, SciPy, Pandas, scikit-learn

Database: Hadoop, Hive, HAWQ, PL/Python, PL/R, PL/Java, Proprietary

Visualization: D3.js, Gephi, Graphviz, R, Tableau, Proprietary

Page 9: Data science

Approach, Techniques

Step

• Classification

• Filtering

• Structure

• Clustering

• Disambiguation

• De-duplication

• Normalization

• Correlation

• Prediction

Process

• Discover

• Reason

• Model

• Deploy

• Visualize

• Recommend

• Predict

• Explore

Technology

• Machine Learning

• Decision Trees

• Bayesian Networks

• Logistic Regression

• Monte Carlo Methods

• Component Analysis

• Fuzzy Modeling

• Neural Networks

• Genetic Algorithms
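Two of the data preparation steps listed above, Normalization and De-duplication, can be sketched in Python, one of the languages in the toolkit. This is a minimal illustration under stated assumptions, not the author's method. The record fields (`name`, `spend`), the helper names and the normalization rules are all hypothetical choices: text is lowercased, punctuation is stripped and whitespace is collapsed, and numbers are min-max scaled.

```python
import re


def normalize_text(value: str) -> str:
    """Normalization (hypothetical rules): lowercase, drop punctuation, collapse whitespace."""
    value = value.lower()
    value = re.sub(r"[^\w\s]", "", value)  # strip punctuation characters
    return " ".join(value.split())         # collapse runs of whitespace into single spaces


def min_max_scale(values):
    """Normalization of a numeric column onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                           # constant column: avoid dividing by zero
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def deduplicate(records, key_field):
    """De-duplication: keep the first record seen for each normalized key."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_text(record[key_field])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical customer records; field names and values are illustrative only.
customers = [
    {"name": "Alice  Smith", "spend": 120.0},
    {"name": "alice smith.", "spend": 120.0},  # same person once normalized
    {"name": "Bob Jones", "spend": 40.0},
    {"name": "Carol White", "spend": 200.0},
]

unique_customers = deduplicate(customers, "name")
scaled_spend = min_max_scale([c["spend"] for c in unique_customers])
print(len(unique_customers), scaled_spend)
```

Normalizing before comparing keys is what lets "Alice  Smith" and "alice smith." collapse into one record. Real de-duplication usually needs fuzzier matching than exact normalized keys.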

Page 10: Data science

Data Science In Action

• Improving User Experience

• Multi-device event stream analysis

• Intrusion detection and avoidance

• Collocation analysis from cell-phone towers

• Text Mining, Bandwidth Throttling

• Network Performance & Optimization

• Mobile User Location Analytics

• Customer Churn Prevention

• Social Media and Sentiment Analysis

• Location Based Initiatives
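The deck lists Customer Churn Prevention among its applications and Logistic Regression among its techniques. The sketch below connects the two in Python. It makes several assumptions, none of which come from the original deck:

* a single hypothetical feature (monthly usage, already scaled to [0, 1]);
* invented toy labels;
* a hand-rolled batch gradient descent in place of a library model.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(xs, ys, learning_rate=0.5, epochs=2000):
    """Fit p(churn | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log loss with respect to the logit is (prediction - label).
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical toy data: usage scaled to [0, 1]; label 1 means the customer churned.
usage = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]
churned = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic_regression(usage, churned)


def churn_risk(x: float) -> float:
    """Estimated probability that a customer with scaled usage x churns."""
    return sigmoid(w * x + b)


for x in (0.05, 0.5, 0.95):
    print(f"usage={x:.2f}  churn risk={churn_risk(x):.2f}")
```

Scaling the feature to [0, 1] beforehand keeps a fixed learning rate stable. On real data, a library model such as scikit-learn's `LogisticRegression` would normally replace this loop, and the estimated risk would be used to target retention efforts.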

Page 11: Data science

Thanks!