data science
Data Science
Shankar Radhakrishnan Cognizant
History…
• Questions first, data later
• Data model first, data processing later
• Size first, project second, react over time
• Focus on accuracy, assume little
• Importance given to completeness and comprehensiveness
• Expose raw data to decision makers
• Provide insights, but not actionable ones
• Bound by constraints (Procurement, Process, Build Insights, Interaction)
What’s Changed?
• Media for participation are vast
• Modes of reach have expanded
• Data types are vast and voluminous
• Noise is huge, yet accepted
• Urgency precedes accuracy
• Guidance is better than completeness
• Cost to store and process has fallen (and is still falling)
• More ways and means to process data at scale
Speaking of Data
• Volume - Data at rest
• Variety - Data in many forms
• Velocity - Data in motion
• Veracity - Data in doubt
Data Science
“Data Science is the art of turning data into actions”
This is accomplished through the creation of data products that provide actionable information without exposing the underlying data or analytics.
“Scientific study of the creation, validation and transformation of data to create meaning”
http://www.datascienceassn.org/code-of-conduct.html
While we are on definitions…
Data Mining
“Non-trivial process of identifying valid, novel, potentially useful and understandable structures or patterns or models or relationships in data to enable data driven decision making”
Statistics
“Science of learning from data or of making sense out of data”
Science of Data Science
• Analyze and understand data that’s available
• Find and acquire what more is needed
• Discover what’s not known from data
• Predict and build “actionable insights” from data
• Build data products that have “immediate” business impact
• Make it easy for business to “use”
• Help decision making to drive “business value”
Data Science Toolkit
• Languages: Python, R, Java, Textwrangler, SQL, C, C++
• Libraries: Mahout, NLTK, OpenNLP, GPText, SciPy, Pandas, scikit-learn
• Database: Hadoop, Hive, HAWQ, PL/Python, PL/R, PL/Java, Proprietary
• Visualization: D3.js, Gephi, Graphviz, R, Tableau, Proprietary
Approach, Techniques
Step
• Classification
• Filtering
• Structure
• Clustering
• Disambiguation
• De-duplication
• Normalization
• Correlation
• Prediction
Process
• Discover
• Reason
• Model
• Deploy
• Visualize
• Recommend
• Predict
• Explore
Technology
• Machine Learning
• Decision Trees
• Bayesian Networks
• Logistic Regression
• Monte Carlo Methods
• Component Analysis
• Fuzzy Modeling
• Neural Networks
• Genetic Algorithms
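Three of the techniques named above (normalization, de-duplication and correlation) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (normalize, deduplicate, pearson) and the customer records are made up for the example and do not come from the talk.

```python
import math


def normalize(name):
    """Normalization: trim, lowercase, and collapse internal whitespace."""
    return " ".join(name.lower().split())


def deduplicate(records):
    """De-duplication: keep the first value seen for each normalized name."""
    seen = {}
    for name, value in records:
        key = normalize(name)
        if key not in seen:
            seen[key] = value
    return seen


def pearson(xs, ys):
    """Correlation: Pearson coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical records: (customer name, monthly usage). Purely illustrative.
records = [
    ("Acme Corp", 10.0),
    ("  acme   corp ", 10.0),  # same customer, written differently
    ("Globex", 20.0),
    ("Initech", 30.0),
]
clean = deduplicate(records)

# Hypothetical second measurement per remaining customer, in the same order.
usage = list(clean.values())
support_calls = [1.0, 2.0, 3.0]
r = pearson(usage, support_calls)
```

In practice the same steps would more likely be done with the libraries from the toolkit slide (for example Pandas), but the standard-library version keeps the logic visible.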
Data Science In Action
• Improving User Experience
• Multi-device event stream analysis
• Intrusion detection, avoidance
• Collocation analysis from cell-phone towers
• Text Mining, Bandwidth Throttling
• Network Performance & Optimization
• Mobile User Location Analytics
• Customer Churn Prevention
• Social Media and Sentiment Analysis
• Location Based Initiatives
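As one hedged illustration of the use cases above, customer churn prevention can be framed as a classification problem and approached with logistic regression, one of the technologies listed earlier. The sketch below is a minimal standard-library version trained by batch gradient descent. The feature choice (dropped calls per week, tenure in years), the toy data and the function names are assumptions made for the example, not details from the talk.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def churn_probability(x, weights, bias):
    """Predicted probability that a customer with features x churns."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = churn_probability(x, weights, bias) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Made-up training data: [dropped calls per week, tenure in years].
features = [[5, 0.5], [4, 1.0], [6, 0.2], [1, 3.0], [0, 4.0], [1, 2.5]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = churned, 0 = stayed

weights, bias = train_logistic(features, labels)
at_risk = churn_probability([6, 0.3], weights, bias)
loyal = churn_probability([0, 3.5], weights, bias)
```

Here at_risk should come out well above 0.5 and loyal well below it, which is the kind of score a retention team could act on. A production model would need real data, validation and a library such as scikit-learn rather than this hand-rolled loop.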
Thanks!