Transcript
Page 1: Machine learning for java developers

Machine Learning for Java Developers

Nirmal Fernando, WSO2 Inc.

{Java Colombo}

Page 2: Machine learning for java developers

Few things about me...

● Associate Technical Lead at WSO2
● Team Lead of WSO2 Machine Learner
● Just completed 4th year in the industry
● Graduated from the Department of Computer Science, University of Moratuwa
● Schooled at St. Sebastian’s College, Moratuwa
● Can sing a bit :-)

https://goo.gl/qbAXLz

Page 3: Machine learning for java developers

Predictive Analytics

Extract information from existing datasets to determine patterns and predict future outcomes and trends.

It does not tell you what will happen in the future.

But it forecasts what might happen in the future with an acceptable level of reliability.

source: http://insidebigdata.com/2014/08/25/salespredict-marketo-partner-using-predictive-analytics/

Page 4: Machine learning for java developers

Predictive Analytics

The Forrester Research report “Big Data Predictive Analytics” was the second most read Forrester report in Q3 2015.

https://www.forrester.com

Page 5: Machine learning for java developers

Predictive Analytics - Use cases

http://californialoanfind.com/what-and-who-is-teletrack/

Page 6: Machine learning for java developers

Predictive Analytics - Use cases

http://www.chrisdunn.com/

Page 7: Machine learning for java developers

Machine Learning

Field of study that gives computers the ability to learn without being explicitly programmed.

- Arthur Samuel (1959)

Page 8: Machine learning for java developers

Machine Learning - Pipeline

Page 9: Machine learning for java developers

Machine Learning - Terminology

● Input data must be in tabular format
● Each row is called a data point
● Each column is called a feature
● The value you are going to predict is called the “response variable”

Page 10: Machine learning for java developers

Machine Learning - What type of a problem?

● Next value prediction

● Classification

● Clustering

● Recommendations

etc…

Page 11: Machine learning for java developers

Next value prediction

Example of linear regression on one independent variable
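A minimal sketch of next value prediction with linear regression on one independent variable, using the ordinary least-squares formulas in plain Java. The class name `SimpleLinearRegression` and the sample data are invented for illustration and are not from the talk.

```java
import java.util.Locale;

final class SimpleLinearRegression {

    /** Fits y = intercept + slope * x by ordinary least squares; returns {intercept, slope}. */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // slope = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    /** Predicts the response for a new x using a model returned by fit(). */
    static double predict(double[] model, double x) {
        return model[0] + model[1] * x;
    }

    public static void main(String[] args) {
        // Invented data that lies exactly on y = 1 + 2x.
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {3, 5, 7, 9, 11};
        double[] model = fit(x, y);
        System.out.printf(Locale.ROOT, "intercept=%.2f slope=%.2f next=%.2f%n",
                model[0], model[1], predict(model, 6));
    }
}
```

Here the "next value" is simply the fitted line evaluated at an x that was not in the training data.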

Page 12: Machine learning for java developers

Classification

Predicting a discrete value

Page 13: Machine learning for java developers

Clustering

Grouping similar data points together.
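Grouping similar data points can be sketched with k-means (Lloyd's algorithm) on one-dimensional data. This is a simplified illustration under stated assumptions: the class name `KMeans1D`, the sample points and the starting centroids are all invented, and real data would normally have several features per point.

```java
import java.util.Arrays;

final class KMeans1D {

    /** Runs k-means on 1-D points from the given starting centroids; returns the final centroids. */
    static double[] cluster(double[] points, double[] initialCentroids, int maxIterations) {
        double[] centroids = initialCentroids.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];

            // Assignment step: each point joins its nearest centroid.
            for (double p : points) {
                int c = nearest(centroids, p);
                sum[c] += p;
                count[c]++;
            }

            // Update step: move each centroid to the mean of its points.
            boolean changed = false;
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                double updated = sum[c] / count[c];
                if (updated != centroids[c]) {
                    centroids[c] = updated;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
        }
        return centroids;
    }

    /** Returns the index of the centroid closest to p. */
    static int nearest(double[] centroids, double p) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] points = {1.0, 1.5, 2.0, 9.0, 9.5, 10.0}; // two obvious groups
        double[] centroids = cluster(points, new double[] {0.0, 5.0}, 100);
        System.out.println(Arrays.toString(centroids));
    }
}
```

No response variable is involved: the algorithm only uses distances between points, which is what makes clustering an unsupervised technique.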

Page 14: Machine learning for java developers

Recommendations

Seek to predict the preference a user would give to an item/product.

Page 15: Machine learning for java developers

Machine Learning - Which algorithm category?

● Supervised learning

● Unsupervised learning

● Reinforcement learning

Page 16: Machine learning for java developers

Supervised vs Unsupervised

Page 17: Machine learning for java developers

Supervised Learning Algorithms

Regression

● Linear Regression
● Lasso Regression
● Ridge Regression

Classification

● Logistic Regression
● Support Vector Machine (SVM)
● Decision Tree
● Random Forest
● Naive Bayes
● Bayesian Network

Page 18: Machine learning for java developers

Unsupervised Learning Algorithms

Clustering

● K-means
● K-medians
● Hierarchical Clustering
● ….

Page 19: Machine learning for java developers

Java tools for Machine Learning

Tool         License                      URL
Weka         GNU General Public License   http://www.cs.waikato.ac.nz/ml/weka/
JSAT         GPL v3                       https://github.com/EdwardRaff/JSAT
Mahout       Apache v2                    https://mahout.apache.org/
Spark MLlib  Apache v2                    http://spark.apache.org/mllib/

Page 20: Machine learning for java developers

Apache Spark MLlib - scalable machine learning library

Speed

Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.

Ease of Use

Write applications quickly in Java, Scala, Python, R.

Easy to Deploy

Runs on existing Hadoop clusters and data.

Page 21: Machine learning for java developers

Apache Spark - a few terms

SparkConf - Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.

SparkContext / JavaSparkContext - Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster. Only one SparkContext may be active per JVM.

RDD / JavaRDD - A Resilient Distributed Dataset (RDD) is the basic abstraction in Spark. It represents an immutable, partitioned collection of elements that can be operated on in parallel.

Page 22: Machine learning for java developers

Apache Spark - a few operations on an RDD

Filter - Return a new dataset formed by selecting those elements of the source on which a function returns true.

Map - Return a new distributed dataset formed by passing each element of the source through a function.

Random Split - Split a dataset randomly based on a given ratio.

Cache - Persist (or cache) a dataset in memory across operations.
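The four operations on this slide can be mimicked on an ordinary in-memory list with `java.util.stream`, which may help before reading real Spark code. This is only a single-JVM analogy, not the Spark API: on a `JavaRDD` the corresponding methods are `filter`, `map`, `randomSplit` and `cache`. The class name `RddOperationsAnalogy` and the sample rows are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

final class RddOperationsAnalogy {

    /** Filter: keep only the lines for which the predicate (non-blank) returns true. */
    static List<String> dropBlankLines(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.trim().isEmpty())
                .collect(Collectors.toList());
    }

    /** Map: pass each CSV line through a function that turns it into a numeric row. */
    static List<double[]> parse(List<String> lines) {
        return lines.stream()
                .map(line -> Arrays.stream(line.split(","))
                        .mapToDouble(Double::parseDouble)
                        .toArray())
                .collect(Collectors.toList());
    }

    /**
     * Random split: each element goes to the first part with probability firstRatio,
     * otherwise to the second, so the resulting sizes only approximate the ratio.
     */
    static <T> List<List<T>> randomSplit(List<T> data, double firstRatio, long seed) {
        Random random = new Random(seed);
        List<T> first = new ArrayList<>();
        List<T> second = new ArrayList<>();
        for (T item : data) {
            (random.nextDouble() < firstRatio ? first : second).add(item);
        }
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        // Invented rows: two feature values followed by a 0/1 class label.
        List<String> raw = Arrays.asList("1.0,2.0,1", "", "3.0,4.0,0", "5.0,6.0,1");

        // Collecting into a list materializes the result once so later steps can
        // reuse it, which is loosely what caching does for an RDD.
        List<double[]> rows = parse(dropBlankLines(raw));

        List<List<double[]>> parts = randomSplit(rows, 0.7, 42L);
        System.out.println("training rows: " + parts.get(0).size()
                + ", test rows: " + parts.get(1).size());
    }
}
```

The fixed seed makes the split repeatable, which is useful when comparing models trained on the same training and test sets.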

Page 23: Machine learning for java developers

Let’s solve a classification problem using Apache Spark

● Dataset

Pima Indian diabetes dataset

https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes

Number of instances : 768

Number of features : 8

Page 24: Machine learning for java developers

Let’s solve a classification problem using Apache Spark

● Response variable

Name : class

Values : 0 or 1

Interpretation : Whether a given Pima Indian has diabetes or not

Page 25: Machine learning for java developers

Let’s solve a classification problem using Apache Spark

● Objective

Build a classification model to predict whether a given Pima Indian has diabetes or not.

Let’s try to build a Logistic Regression model for this.
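To give a feel for what a Logistic Regression model does with a 0/1 response variable, here is a minimal plain-Java sketch trained with batch gradient descent. It is not the Spark MLlib implementation and not the code from the talk. The class name `TinyLogisticRegression`, the learning rate and the single-feature toy data are assumptions made for illustration, and the feature is centered around zero so that plain gradient descent converges quickly.

```java
import java.util.Arrays;

final class TinyLogisticRegression {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Probability that the row belongs to class 1; weights[0] is the bias term. */
    static double probability(double[] weights, double[] features) {
        double z = weights[0];
        for (int j = 0; j < features.length; j++) {
            z += weights[j + 1] * features[j];
        }
        return sigmoid(z);
    }

    /** Predicts 1 when the estimated probability is at least 0.5, otherwise 0. */
    static int predict(double[] weights, double[] features) {
        return probability(weights, features) >= 0.5 ? 1 : 0;
    }

    /** Batch gradient descent on the average log-loss, starting from all-zero weights. */
    static double[] train(double[][] x, int[] y, double learningRate, int epochs) {
        int d = x[0].length;
        double[] weights = new double[d + 1];
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] gradient = new double[d + 1];
            for (int i = 0; i < x.length; i++) {
                double error = probability(weights, x[i]) - y[i];
                gradient[0] += error;
                for (int j = 0; j < d; j++) {
                    gradient[j + 1] += error * x[i][j];
                }
            }
            for (int j = 0; j <= d; j++) {
                weights[j] -= learningRate * gradient[j] / x.length;
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        // Invented, centered toy data: negative values are class 0, positive are class 1.
        double[][] x = {{-3}, {-2}, {-1}, {1}, {2}, {3}};
        int[] y = {0, 0, 0, 1, 1, 1};
        double[] weights = train(x, y, 0.1, 1000);
        System.out.println("weights: " + Arrays.toString(weights));
        System.out.println("class for 2.5: " + predict(weights, new double[] {2.5}));
    }
}
```

For the diabetes dataset each row would carry eight feature values instead of one, and a held-out test split would be needed to judge how well the model generalizes.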

Page 26: Machine learning for java developers

Solution using Apache Spark

Code:

https://github.com/nirmal070125/ml-java-meetup

Page 27: Machine learning for java developers

Few words on WSO2 Machine Learner

Powered by Apache Spark and Apache Spark MLlib.

● Manage and explore your data
● Analyze the data using machine learning algorithms
● Build machine learning models
● Compare and manage generated machine learning models
● Predict using the built models
● Use the built models with WSO2 CEP and WSO2 ESB

http://wso2.com/products/machine-learner/
