1 © 2015 The MathWorks, Inc.
Big Data Analytics with MATLAB
Dmitrij Martynenko, Application Engineer, The MathWorks Germany
2
Data Science with MATLAB
§ Data Analysis
§ Statistics
§ Machine Learning
§ Software Engineering
§ Multivariable Calculus and Linear Algebra
§ Big Data
§ Data Cleaning
§ Data Visualization and Communication
3
How do you define Big Data?
“Any collection of data sets so large and complex that it becomes difficult to process using … traditional data processing applications.”
(Wikipedia)
“Any collection of data sets so large that it becomes difficult to process using traditional MATLAB functions, which assume all of the data is in memory.”
(MATLAB)
4
Big Data – Data Sources
File I/O
• Text
• Spreadsheet
• XML
• CDF/HDF
• Image
• Audio
• Video
• Geospatial
• Web content
Hardware Access
• Data acquisition
• Image capture
• GPU
• Lab instruments
Communication Protocols
• CAN (Controller Area Network)
• DDS (Data Distribution Service)
• OPC (OLE for Process Control)
• XCP (Universal Measurement and Calibration Protocol)
Database Access
• Financial Data
• ODBC
• JDBC
• HDFS (Hadoop)
5
Three Dimensions of Scaling
Compute power
• Larger, complex problems
• Cloud technologies
Data
• More data, more quickly
• Complicated, incomplete, and variable formats
• System too complex to know governing equation
People
• Share algorithms, protect IP
• Web and enterprise
6
Three Dimensions of Scaling - MathWorks’ Solutions
Compute power: MATLAB parallel computing solutions
Data: MATLAB Hadoop interface, distributed arrays
People: MATLAB deployment tools
7
Scale Your Data
Memory and Data Access
§ 64-bit processors
§ Memory Mapped Variables
§ Disk Variables
§ Databases
§ Datastores
Programming Constructs
§ Streaming
§ Block Processing
§ Parallel-for loops
§ GPU Arrays
§ SPMD and Distributed Arrays
§ MapReduce
Platforms
§ Desktop (Multicore, GPU)
§ Clusters
§ Cloud Computing (MDCS for EC2)
§ Hadoop
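As an illustration of one of the programming constructs listed above, here is a minimal sketch of a parallel-for loop. The variable names and the toy computation are assumptions chosen for the example, not taken from the slides. With Parallel Computing Toolbox the iterations can be spread across workers; without it, parfor runs the loop serially.

```matlab
% Sketch: independent iterations written as a parallel-for loop.
n = 8;
results = zeros(1, n);          % sliced output variable, one slot per iteration

parfor k = 1:n
    % Each iteration depends only on k, so iterations can run in any order.
    results(k) = sum((1:k).^2);
end

disp(results)
```

The loop body must not depend on other iterations; that independence is what lets the same code move from a multicore desktop to a cluster.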
8
Datastore
MATLAB – Access Data in HDFS
[Diagram: a Hadoop cluster with HDFS spanning three nodes, each holding data]
A datastore accesses portions of data stored in HDFS from MATLAB:
ds = datastore('hdfs://localhost:9000/datasets/airline/airlinedata.csv');
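To show what "portions of data" can look like in practice, the following sketch reads a datastore chunk by chunk and accumulates a running mean. It assumes the sample file airlinesmall.csv that ships with MATLAB, so it can run on a desktop without Hadoop. The column name ArrDelay and the 'NA' missing-value marker belong to that sample file and may not match the HDFS file on the slide.

```matlab
% Sketch: stream through a datastore in chunks instead of loading everything.
% Assumption: 'airlinesmall.csv' (shipped with MATLAB) stands in for the HDFS file.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay'};     % import only the column we need

total = 0;
count = 0;
while hasdata(ds)
    t = read(ds);                            % next chunk, returned as a table
    delays = t.ArrDelay(~isnan(t.ArrDelay)); % drop missing values
    total = total + sum(delays);
    count = count + numel(delays);
end

meanDelay = total / count;
fprintf('Mean arrival delay: %.2f minutes\n', meanDelay);
```

Only one chunk is in memory at a time, so the same loop should work when the location is an hdfs:// URL, provided MATLAB has been pointed at a Hadoop installation.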
9
Datastore
MATLAB Distributed Computing Server - Hadoop
[Diagram: MapReduce code is passed to MATLAB Distributed Computing Server; each of three HDFS nodes holds data and runs Map and Reduce tasks]
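A minimal sketch of the MapReduce code referred to on this slide, written with MATLAB's mapreduce function. The mapper and reducer names are hypothetical, and the data set is assumed to be the airlinesmall.csv sample that ships with MATLAB rather than the HDFS data in the diagram. Local functions in a script require a reasonably recent MATLAB release; in older releases, place each function in its own file.

```matlab
% Sketch: find the maximum arrival delay with mapreduce.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay'};

outds  = mapreduce(ds, @maxDelayMapper, @maxDelayReducer);
result = readall(outds);        % table with Key and Value columns
disp(result)

function maxDelayMapper(data, ~, intermKVStore)
    % Map: emit the maximum of each chunk under a single shared key.
    partMax = max(data.ArrDelay);            % max skips NaN values
    add(intermKVStore, 'MaxArrDelay', partMax);
end

function maxDelayReducer(key, valueIter, outKVStore)
    % Reduce: combine the per-chunk maxima into one overall maximum.
    maxVal = -Inf;
    while hasnext(valueIter)
        maxVal = max(maxVal, getnext(valueIter));
    end
    add(outKVStore, key, maxVal);
end
```

The mapper sees one chunk of the datastore at a time, which mirrors the per-node Map tasks in the diagram; the reducer then merges the intermediate values for each key.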
10
Scalable Data Workflow
Easily migrate from desktop to clusters/Hadoop
MATLAB (Parallel Computing)
§ Desktop: datastore/mapreduce, access HDFS
MATLAB Distributed Computing Server
§ Connected to clusters: mapreduce on clusters including Hadoop (HDFS)
MATLAB Compiler
§ Production clusters: deploy mapreduce for use on production clusters
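One way to picture the desktop-to-cluster migration is that the mapreduce call stays the same while the execution environment changes. The sketch below assumes the airlinesmall.csv sample file and hypothetical mapper and reducer names. The Hadoop lines are commented out because they need Parallel Computing Toolbox and a configured Hadoop installation, and the install folder shown is a placeholder.

```matlab
% Sketch: same algorithm, different execution environment.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay'};

% Desktop: run serially in the local MATLAB session.
mr = mapreducer(0);

% Cluster (assumption: hypothetical install folder, adjust for your site):
% cluster = parallel.cluster.Hadoop('HadoopInstallFolder', '/path/to/hadoop');
% mr = mapreducer(cluster);

outds = mapreduce(ds, @countMapper, @countReducer, mr);
tbl   = readall(outds);
disp(tbl)

function countMapper(data, ~, intermKVStore)
    % Map: emit the number of rows in this chunk.
    add(intermKVStore, 'NumRows', height(data));
end

function countReducer(key, valueIter, outKVStore)
    % Reduce: sum the per-chunk row counts.
    n = 0;
    while hasnext(valueIter)
        n = n + getnext(valueIter);
    end
    add(outKVStore, key, n);
end
```

Keeping the mapper and reducer unchanged and swapping only the mapreducer object is the design choice that makes the workflow scalable.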
11
Key Takeaways
§ Easy access to Big Data from your desktop with MATLAB
§ Work on the desktop with MATLAB and scale to clusters
§ Easy deployment into production including support for Hadoop
12
Resources
§ MATLAB MapReduce and Hadoop – http://www.mathworks.com/discovery/matlab-mapreduce-hadoop.html – Google “MATLAB Hadoop”
§ Consulting Team – MATLAB for Business Critical Applications
§ Reach out to your account team