big data analytics with matlab - it.mathworks.com · big data analytics with matlab dmitrij...

13
1 © 2015 The MathWorks, Inc. Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany

Upload: others

Post on 19-Aug-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

1 © 2015 The MathWorks, Inc.

Big Data Analytics with MATLAB

Dmitrij Martynenko, Application Engineer, The MathWorks Germany

Page 2: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

2

Data Science with MATLAB

§  Data Analysis §  Statistics §  Machine Learning §  Software Engineering §  Multivariable Calculus and Linear Algebra §  Big Data §  Data Cleaning §  Data Visualization and Communication

Page 3: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

3

How do you define Big Data?

“Any collection of data sets so large and complex that it becomes difficult to process using … traditional data processing applications.”

(Wikipedia)

“Any collection of data sets so large that it becomes difficult to process using

traditional MATLAB functions, which assume all of the data is in memory.” (MATLAB)

Page 4: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

4

Big Data – Data Sources

File I/O •  Text •  Spreadsheet •  XML •  CDF/HDF •  Image •  Audio •  Video •  Geospatial •  Web content

Hardware Access •  Data acquisition •  Image capture •  GPU •  Lab instruments

Communication Protocols • CAN (Controller Area Network) • DDS (Data Distribution Service) • OPC (OLE for Process Control) • XCP (eXplicit Control Protocol)

Database Access •  Financial Data •  ODBC •  JDBC •  HDFS (Hadoop)

Page 5: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

5

Three Dimensions of Scaling

Compute power •  Larger, complex problems •  Cloud technologies

Data •  More data, more quickly •  Complicated, incomplete, and variable formats •  System too complex to know governing equation

People •  Share algorithms, protect IP •  Web and enterprise

Page 6: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

6

Three Dimensions of Scaling - MathWorks’ Solutions

Compute power MATLAB parallel computing solutions

Data MATLAB Hadoop interface Distributed arrays

People MATLAB deployment tools

Page 7: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

7

Scale Your Data Memory and Data Access §  64-bit processors §  Memory Mapped Variables §  Disk Variables §  Databases §  Datastores

Programming Constructs §  Streaming §  Block Processing §  Parallel-for loops §  GPU Arrays §  SPMD and Distributed Arrays §  MapReduce

Platforms §  Desktop (Multicore, GPU) §  Clusters §  Cloud Computing (MDCS for EC2) §  Hadoop

Page 8: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

8

Datastore

MATLAB – Access Data in HDFS

HDFS

Node Data

Node Data

Node Data

Hadoop

Datastore access portions of data stored in HDFS from MATLAB

ds = datastore('hdfs://localhost:9000/datasets/airline/airlinedata.csv’);

Page 9: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

9

Datastore

MATLAB Distributed Computing Server - Hadoop

MapReduce Code

HDFS

Node Data

MATLAB Distributed Computing

Server

Node Data

Node Data

Map Reduce

Map Reduce

Map Reduce

Page 10: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

10

Scalable Data Workflow Easily migrate from desktop to Clusters/Hadoop

Desktop

datastore/mapreduce Access HDFS

Connected to Clusters

mapreduce on clusters including Hadoop (HDFS)

MATLAB Distributed Computing Server

MATLAB Compiler

MATLAB (Parallel Computing)

Desktop

datastore/mapreduce Access HDFS

Connected to Clusters

mapreduce on clusters including Hadoop (HDFS)

Production Clusters

Deploy mapreduce for use on production clusters

Page 11: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

11

Key Takeaways

§  Easy access to Big Data from your desktop with MATLAB

§  Work on the desktop with MATLAB and scale to clusters

§  Easy deployment into production including support for Hadoop

Page 12: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

12

Resources

§  MATLAB MapReduce and Hadoop –  http://www.mathworks.com/discovery/matlab-mapreduce-hadoop.html –  Google “MATLAB Hadoop”

§  Consulting Team –  MATLAB for Business Critical Applications

§  Reach out to your account team

Page 13: Big Data Analytics with MATLAB - it.mathworks.com · Big Data Analytics with MATLAB Dmitrij Martynenko, Application Engineer, The MathWorks Germany . 2 Data Science with MATLAB !

13 © 2015 The MathWorks, Inc.

Thank you!