1
© 2016 The MathWorks, Inc.
Data Analytics for Engineers
Will Wilson
Application Engineer
MathWorks
2
Agenda
Definition
Common Challenges
Case Study
Wrap Up
3
What is Data Analytics?
Data Analytics is the process of leveraging the information in your data so you can take action!
Data analytics helps companies and organizations to make better business decisions.
Scope
– Engineering and Business data
– May include Machine Learning
4
Challenges with Data Analytics
Aggregating data from multiple sources
Cleaning data
Choosing a model
Moving to production
5
Case Study: Day-Ahead Load Forecasting
Goal:
– Implement a tool for easy and accurate computation of day-ahead system load forecast
Requirements:
– Acquire and clean data from multiple sources
– Accurate predictive model
– Easily deploy to production environment
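One way to make the "accurate predictive model" requirement measurable is to score a forecast against observed load. The sketch below is plain Python, not the MATLAB tool built in the case study. It assumes a naive persistence baseline (tomorrow repeats today) and the mean absolute percentage error (MAPE) as the metric. Both choices, the function names, and the load values are illustrative assumptions; a four-sample "day" keeps the numbers short.

```python
def persistence_forecast(history, samples_per_day):
    """Naive baseline (an assumption): the next day repeats the latest day."""
    if len(history) < samples_per_day:
        raise ValueError("need at least one full day of history")
    return list(history[-samples_per_day:])


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical load values in MW, four samples per day for brevity.
history = [100.0, 120.0, 150.0, 130.0]
actual_next_day = [110.0, 120.0, 140.0, 130.0]

baseline = persistence_forecast(history, samples_per_day=4)
error = mape(actual_next_day, baseline)
print(f"Persistence baseline MAPE: {error:.2f}%")
```

A candidate model is only worth deploying if its error on held-out days is lower than that of a simple baseline such as this one.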
6
Source Data – Energy Load & Weather
Energy load: http://mis.nyiso.com/public/
Weather: http://www.ncdc.noaa.gov/
File formats: .CSV, .txt
7
Techniques to Handle Missing Data
List-wise deletion
– Unbiased estimates
– Reduces sample size
Implementation options
– Built in to many MATLAB functions
– Manual filtering
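The "manual filtering" option can be sketched in a few lines. This is plain Python rather than the MATLAB functions the slide refers to. The record layout (hour, load, temperature) and the values are hypothetical, and both None and NaN are assumed to mark missing entries.

```python
import math


def is_missing(value):
    """Treat None and floating-point NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def listwise_delete(rows):
    """Keep only the rows in which every field is present."""
    return [row for row in rows if not any(is_missing(v) for v in row)]


# Hypothetical records: (hour, load in MW, temperature in deg F)
records = [
    (0, 5200.0, 41.0),
    (1, float("nan"), 40.5),  # missing load
    (2, 5050.0, None),        # missing temperature
    (3, 4990.0, 39.8),
]

complete = listwise_delete(records)
print(complete)
print(len(records) - len(complete), "rows dropped")
```

Half of the rows disappear here even though only two values are missing, which is the sample-size cost noted on the slide.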
8
Techniques to Handle Missing Data
Substitution
– Replace missing data points with a reasonable approximation
Easy to model
Too important to exclude
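Linear interpolation is one possible "reasonable approximation" for a smoothly varying signal such as temperature. The sketch below is plain Python, not MATLAB; the function name and sample values are illustrative, and the choice to fill leading or trailing gaps with the nearest known value is an assumption.

```python
import math


def fill_linear(values):
    """Return a copy with NaN gaps filled.

    Interior gaps are filled by linear interpolation between the nearest
    known neighbors; leading or trailing gaps repeat the nearest known value.
    """
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Hypothetical hourly temperatures with a two-hour gap and a missing last reading.
temps = [40.0, float("nan"), float("nan"), 46.0, float("nan")]
filled_temps = fill_linear(temps)
print(filled_temps)
```

The two interior gaps land on the straight line between 40.0 and 46.0, and the trailing gap repeats 46.0.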
9
Merge Different Sets of Data
Join along a common axis
Popular Joins:
– Inner
– Full Outer
– Left Outer
– Right Outer
[Diagrams: Inner Join, Full Outer Join, Left Outer Join]
10
Full Outer Join
First Data Set (key: X)
X    Y    Z
1    0.1  0.2
3    0.3  0.4
5    0.5  0.6
7    0.7  0.8
Second Data Set (key: A)
A    B
1    1.1
4    1.4
7    1.7
9    1.9
Joined Data Set
Key  B    Y    Z
1    1.1  0.1  0.2
3    NaN  0.3  0.4
4    1.4  NaN  NaN
5    NaN  0.5  0.6
7    1.7  0.7  0.8
9    1.9  NaN  NaN
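The full outer join on this slide can be reproduced with a short sketch. It is plain Python, not the MATLAB join functions. The two data sets use the slide's values, with each table stored as a dictionary keyed on the join key; the `join` helper and its `how` argument are illustrative names, not a library API.

```python
import math

NAN = float("nan")

# Slide data: the first data set has columns Y and Z, the second has column B.
first = {1: {"Y": 0.1, "Z": 0.2}, 3: {"Y": 0.3, "Z": 0.4},
         5: {"Y": 0.5, "Z": 0.6}, 7: {"Y": 0.7, "Z": 0.8}}
second = {1: {"B": 1.1}, 4: {"B": 1.4}, 7: {"B": 1.7}, 9: {"B": 1.9}}


def join(left, right, left_cols, right_cols, how="outer"):
    """Join two {key: {column: value}} tables on their keys.

    how: "inner", "left", "right", or "outer". Cells with no match are NaN.
    """
    if how == "inner":
        keys = left.keys() & right.keys()
    elif how == "left":
        keys = set(left)
    elif how == "right":
        keys = set(right)
    elif how == "outer":
        keys = left.keys() | right.keys()
    else:
        raise ValueError(f"unknown join type: {how}")
    rows = []
    for key in sorted(keys):
        row = {"Key": key}
        for col in left_cols:
            row[col] = left.get(key, {}).get(col, NAN)
        for col in right_cols:
            row[col] = right.get(key, {}).get(col, NAN)
        rows.append(row)
    return rows


joined = join(second, first, ["B"], ["Y", "Z"], how="outer")
for row in joined:
    print(row)

inner = join(second, first, ["B"], ["Y", "Z"], how="inner")
```

Keys 3 and 5 exist only in the first data set, so their B cells are NaN; keys 4 and 9 exist only in the second, so their Y and Z cells are NaN. The inner join keeps only keys 1 and 7.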
11
Challenges with Data Analytics
Aggregating data from multiple sources
Cleaning data
Choosing a model
Moving to production
12
Machine Learning: Characteristics and Examples
Characteristics
– Lots of variables
– System too complex to know the governing equation (e.g., black-box modeling)
Examples
– Pattern recognition (speech, images)
– Financial algorithms (credit scoring, algo trading)
– Energy forecasting (load, price)
– Biology (tumor detection, drug discovery)
[Figure: 8 × 8 matrix of percentages by credit rating, AAA to D]
       AAA      AA       A        BBB      BB       B        CCC      D
AAA    93.68%   5.55%    0.59%    0.18%    0.00%    0.00%    0.00%    0.00%
AA     2.44%    92.60%   4.03%    0.73%    0.15%    0.00%    0.00%    0.06%
A      0.14%    4.18%    91.02%   3.90%    0.60%    0.08%    0.00%    0.08%
BBB    0.03%    0.23%    7.49%    87.86%   3.78%    0.39%    0.06%    0.16%
BB     0.03%    0.12%    0.73%    8.27%    86.74%   3.28%    0.18%    0.64%
B      0.00%    0.00%    0.11%    0.82%    9.64%    85.37%   2.41%    1.64%
CCC    0.00%    0.00%    0.00%    0.37%    1.84%    6.24%    81.88%   9.67%
D      0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    100.00%
13
Overview – Machine Learning
Machine Learning
Type of Learning:
– Supervised Learning: develop predictive model based on both input and output data
– Unsupervised Learning: group and interpret data based only on input data
Categories of Algorithms:
– Supervised Learning: Classification, Regression
– Unsupervised Learning: Clustering
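The difference between the two types of learning shows up in what each algorithm is given. In the sketch below, the regression routine receives inputs and outputs, while the clustering routine receives inputs only. This is plain Python for illustration, not a MATLAB toolbox workflow; the function names, the data, and the choice of least squares and k-means as representatives are assumptions.

```python
def fit_line(xs, ys):
    """Supervised (regression): least-squares slope and intercept from (x, y) pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kmeans_1d(points, centers, iterations=10):
    """Unsupervised (clustering): group 1-D points around centers, no outputs needed."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Supervised: hypothetical temperature (input) and load (output) pairs.
temps = [50.0, 60.0, 70.0, 80.0]
loads = [5000.0, 5500.0, 6000.0, 6500.0]
slope, intercept = fit_line(temps, loads)
print("predicted load at 65 degrees:", slope * 65.0 + intercept)

# Unsupervised: hypothetical readings with no labels attached.
readings = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, groups = kmeans_1d(readings, centers=[0.0, 6.0])
print("cluster centers:", centers)
```

The regression result can be checked against known outputs; the clustering result can only be interpreted, since no output data exists to compare against.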
14
Challenges with Data Analytics
Aggregating data from multiple sources
Cleaning data
Choosing a model
Moving to production
15
Integrate analytics with your enterprise systems
MATLAB Compiler and MATLAB Coder
.exe .lib .dll
MATLAB Compiler SDK
MATLAB Compiler
MATLAB Runtime
MATLAB Coder
16
Scale up with MATLAB Production Server
Most efficient path for creating enterprise applications
Deploy MATLAB programs into production
– Manage multiple MATLAB programs and versions
– Update programs without server restarts
– Reliably service large numbers of concurrent requests
Integrate with web, database, and application servers
MATLAB Production Server(s)
HTML
XML
JavaScript
Web Server(s)
17
Deployed Analytics
MATLAB Production Server
Diagram components:
– MATLAB Desktop
– Train in MATLAB
– Weather Data
– Energy Data
– Predictive Models
– CTF
– MATLAB Production Server
– Request Broker
– Web Application Server
– Apache Tomcat Web Server/Webservice
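From the web tier's point of view, a deployed function is something it calls over HTTP. The sketch below shows how a client might build such a call in Python. It assumes the server's RESTful JSON interface is enabled and that a request is a POST to `/<archive>/<function>` with `nargout` and `rhs` fields. The host, port, archive name `LoadForecast`, function name `forecastLoad`, and the argument values are all hypothetical. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_request(base_url, archive, function, args, nargout=1):
    """Build (but do not send) a JSON POST request for a deployed function.

    The payload layout follows the assumed RESTful interface:
    "rhs" holds the input arguments, "nargout" the number of outputs wanted.
    """
    payload = {"nargout": nargout, "rhs": list(args)}
    return urllib.request.Request(
        url=f"{base_url}/{archive}/{function}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address, archive, function, and arguments.
request = build_request(
    "http://localhost:9910", "LoadForecast", "forecastLoad",
    ["2016-02-01", 38.5],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Against a running server, the call would be sent like this:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

Keeping request construction separate from sending makes the payload easy to inspect before any server is available.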
18
Data Analytics Products
Access and Explore Data
Preprocess Data
Develop Predictive Models
Integrate Analytics with Systems
MATLAB
MATLAB Production Server
Statistics and Machine Learning Toolbox
Database Toolbox
Neural Network Toolbox
Data Acquisition Toolbox
Image Processing Toolbox
Signal Processing Toolbox
Computer Vision System Toolbox
Curve Fitting Toolbox
MATLAB Compiler
MATLAB Compiler SDK
Parallel Computing Toolbox, MATLAB Distributed Computing Server
Mapping Toolbox
Image Acquisition Toolbox
OPC Toolbox
Econometrics Toolbox
Legend: Used in today’s demo; Additional Data Analytics products
19
Key Takeaways
Data preparation can be a big job; leverage built-in MATLAB tools and spend more time on the analysis.
Rapidly iterate through different predictive models, and find the one that’s best for your application.
Leverage parallel computing to scale up your analysis to large datasets.
Eliminate the need to recode by deploying your MATLAB algorithms into production.
20
© 2016 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list
of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.