An explanation of machine learning for business
DESCRIPTION
Slides of the course on big data by Clement Levallois from EMLYON Business School. For business students. Check the online video connected with these slides. -> Machine learning explained in simple terms to a business audience: what is a training set, a test set, and how does machine learning differ from statistics.
TRANSCRIPT
![Page 1: An explanation of machine learning for business](https://reader038.vdocuments.mx/reader038/viewer/2022100602/5583e3fad8b42a2a4d8b4658/html5/thumbnails/1.jpg)
MK99 – Big Data 1
Big data & cross-platform analytics
MOOC lectures
Pr. Clement Levallois
![Page 2: An explanation of machine learning for business](https://reader038.vdocuments.mx/reader038/viewer/2022100602/5583e3fad8b42a2a4d8b4658/html5/thumbnails/2.jpg)
A short note on machine learning for business
![Page 3: An explanation of machine learning for business](https://reader038.vdocuments.mx/reader038/viewer/2022100602/5583e3fad8b42a2a4d8b4658/html5/thumbnails/3.jpg)
Machine Learning
• Family of techniques to formulate predictions, based on data
• Why is it called machine learning?
– Machine: it is about algorithms running on computers, not equations solved with pen and paper
– Learning: the algorithms start out inaccurate. Then they get more accurate as they are fed with data: the algorithm refines its parameters, it “learns”.
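A minimal sketch of what “refining its parameters” can look like, in Python. The toy data, the learning rate and the names `train` and `mean_squared_error` are assumptions made for illustration; they are not from the course. The model has a single parameter `w` and the error shrinks as `w` is adjusted step by step.

```python
# Toy data (made up for illustration): y is roughly 3 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 5.9, 9.2, 11.8, 15.1]


def mean_squared_error(w, xs, ys):
    """Average squared gap between predictions w * x and actual values y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, steps=50, learning_rate=0.01):
    """Refine the single parameter w step by step (gradient descent)."""
    w = 0.0  # the model starts with an uninformed parameter
    errors = [mean_squared_error(w, xs, ys)]
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * gradient  # nudge w to reduce the error
        errors.append(mean_squared_error(w, xs, ys))
    return w, errors


w, errors = train(xs, ys)
print(f"learned w = {w:.2f}")
print(f"error before training = {errors[0]:.2f}, after = {errors[-1]:.4f}")
```

Gradient descent is only one of many ways an algorithm can “learn”; it is used here because the refinement of the parameter is easy to follow.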
![Page 4: An explanation of machine learning for business](https://reader038.vdocuments.mx/reader038/viewer/2022100602/5583e3fad8b42a2a4d8b4658/html5/thumbnails/4.jpg)
Typical setup
1. We start with a training set
-> Data already collected: we know the actual values to be found. Ex: a list of consumers, their characteristics and their associated credit score
2. The algorithms are trained on this set
-> A series of algorithms run on the training set. Their parameters get adjusted so that the actual values are predicted as accurately as possible.
3. A test set (“fresh data”) is brought in
-> List of consumer characteristics. Their credit score is known but hidden.
4. Running the trained algorithms on the test set
-> Predict the credit score for each consumer in the test set, using the algorithms that were trained in step 2
5. A measure of accuracy
-> Given the correct values to be predicted in the test set, how accurate were the algorithms? Were the credit scores accurately predicted?
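The five steps can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the consumers are randomly generated, income is their only characteristic, and the “algorithm” is a straight line fitted by least squares.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible


def make_consumer():
    """Invented consumer: an income and a credit score loosely tied to it."""
    income = rng.uniform(20, 100)
    score = 300 + 4 * income + rng.gauss(0, 20)
    return income, score


consumers = [make_consumer() for _ in range(200)]

# Step 1: the training set (characteristics and actual scores both visible).
training_set = consumers[:150]


def fit_line(data):
    """Step 2: adjust slope and intercept to fit the data (least squares)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / sum(
        (x - mean_x) ** 2 for x, _ in data
    )
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(training_set)

# Step 3: the test set ("fresh data"); its scores are known but kept hidden.
test_set = consumers[150:]

# Step 4: predict the hidden scores from the characteristics alone.
predictions = [slope * income + intercept for income, _ in test_set]

# Step 5: reveal the actual scores and measure accuracy (mean absolute error).
actual = [score for _, score in test_set]
mae = sum(abs(p - a) for p, a in zip(predictions, actual)) / len(actual)
print(f"mean absolute error on the test set: {mae:.1f} points")
```

The key design point is that the test set plays no role in `fit_line`: its scores are only used in step 5.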
![Page 5: An explanation of machine learning for business](https://reader038.vdocuments.mx/reader038/viewer/2022100602/5583e3fad8b42a2a4d8b4658/html5/thumbnails/5.jpg)
Vocabulary
• Data scientists “train” their model and then test it
• They are concerned with “out-of-sample” prediction
– That their model accurately predicts data points in the training set (the “sample”) is trivial: it has already seen the answers
– It is the accuracy on the test set that matters!
– This is called “out-of-sample” prediction
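To see why in-sample accuracy is trivial, here is a hedged Python sketch using a model that simply memorizes the training set (1-nearest-neighbour). The data are invented: the label follows `x > 0.5`, with 20% of labels flipped at random to mimic noise.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def make_point():
    """Invented data: the label follows x > 0.5, but 20% of labels are flipped."""
    x = rng.random()
    label = x > 0.5
    if rng.random() < 0.2:
        label = not label
    return x, label


training_set = [make_point() for _ in range(200)]
test_set = [make_point() for _ in range(200)]


def predict(x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    _, label = min(training_set, key=lambda point: abs(point[0] - x))
    return label


def accuracy(data):
    """Share of points in data whose label is predicted correctly."""
    return sum(predict(x) == label for x, label in data) / len(data)


in_sample = accuracy(training_set)
out_of_sample = accuracy(test_set)
print(f"in-sample accuracy:     {in_sample:.0%}")
print(f"out-of-sample accuracy: {out_of_sample:.0%}")
```

On the training set each point is its own nearest neighbour, so the model is always “right”. On the test set it has to generalize, and the noise it memorized hurts it.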
![Page 6: An explanation of machine learning for business](https://reader038.vdocuments.mx/reader038/viewer/2022100602/5583e3fad8b42a2a4d8b4658/html5/thumbnails/6.jpg)
Why is machine learning (ML) so different from statistics?
• ML does not focus on causality – just prediction!
– Note: for this reason, ML cannot predict the effect of an intervention - it has no causal model.
• ML has a special concern for out-of-sample prediction
– Will be especially careful about over-fitting (a model that fits the training set closely but predicts poorly on new data)
• ML picks its algorithms from different academic disciplines
– Text, network relations, clustering, not just traditional statistics
• Coming from computer science, ML has affinities with big data
– Procedures optimized for speed and scale
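The first point, prediction without a causal model, can be illustrated with a small simulation. This is a sketch under invented assumptions: temperature drives both ice-cream and sunscreen sales, the model only sees the two sales figures, and the function `expected_sales` stands in for the “real world”.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible


def expected_sales(temperature, promotion=0.0):
    """Invented world: temperature drives both products; a promotion only
    adds ice-cream sales. Sunscreen does not depend on ice cream."""
    ice_cream = 10 * temperature + promotion
    sunscreen = 5 * temperature
    return ice_cream, sunscreen


# Observed days: the model sees the two noisy sales figures, not temperature.
days = []
for _ in range(300):
    ice_cream, sunscreen = expected_sales(rng.uniform(10, 35))
    days.append((ice_cream + rng.gauss(0, 10), sunscreen + rng.gauss(0, 5)))

# A purely predictive model: sunscreen sales as a line in ice-cream sales.
n = len(days)
mean_x = sum(x for x, _ in days) / n
mean_y = sum(y for _, y in days) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in days) / sum(
    (x - mean_x) ** 2 for x, _ in days
)

# Intervention: a promotion adds 100 ice-cream sales on a 20-degree day.
predicted_change = slope * 100  # what the predictive model would expect
_, before = expected_sales(20)
_, after = expected_sales(20, promotion=100)
actual_change = after - before  # what happens in the simulated world

print(f"model expects sunscreen sales to change by {predicted_change:.0f}")
print(f"actual change in sunscreen sales: {actual_change:.0f}")
```

The model predicts sunscreen sales well on ordinary days, yet it is wrong about the promotion, because the link it learned comes from temperature and not from ice cream.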
But the best data scientists often started as statisticians / econometricians. See Hal Varian, Chief Economist at Google.
![Page 7: An explanation of machine learning for business](https://reader038.vdocuments.mx/reader038/viewer/2022100602/5583e3fad8b42a2a4d8b4658/html5/thumbnails/7.jpg)
• Kaggle is a website hosting ML competitions; anybody can join
• Goal: make the best prediction on a dataset, with cash prizes
• From predicting clicks on ads to epileptic seizures
• Always the same setup: a training set, a test set, a score based on accuracy.
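The scoring step of such a competition can be sketched as follows. This is not Kaggle's actual code or API: the team names, the labels and the choice of plain accuracy as the metric are all hypothetical, and real competitions may use other measures.

```python
# Hidden test-set answers, known only to the organiser (invented values).
hidden_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

# Predictions submitted by three hypothetical teams for the same ten cases.
submissions = {
    "team_a": [1, 0, 1, 0, 0, 0, 1, 0, 1, 1],
    "team_b": [1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
    "team_c": [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
}


def accuracy(predictions, actual):
    """Share of test cases where the prediction matches the hidden answer."""
    return sum(p == a for p, a in zip(predictions, actual)) / len(actual)


# Rank the teams from the most to the least accurate.
leaderboard = sorted(
    ((accuracy(preds, hidden_labels), team) for team, preds in submissions.items()),
    reverse=True,
)
for score, team in leaderboard:
    print(f"{team}: {score:.0%}")
```

Participants never see `hidden_labels`; they only learn their score, which keeps the evaluation out-of-sample.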
![Page 8: An explanation of machine learning for business](https://reader038.vdocuments.mx/reader038/viewer/2022100602/5583e3fad8b42a2a4d8b4658/html5/thumbnails/8.jpg)
This slide presentation is part of a course offered by EMLYON Business School (www.em-lyon.com)
Contact Clement Levallois (levallois [at] em-lyon.com) for more information.