Sports Category prediction with Amazon ML
@nafu003 ookami, Inc. 2015/04/26
What is Amazon ML?
“Amazon Machine Learning (Amazon ML) is a robust AWS machine learning platform in the cloud that allows software developers to train predictive models and use them to create powerful predictive applications.”
Amazon Machine Learning Developer Guide
Amazon ML Key Concepts
• Datasources contain metadata associated with data inputs to Amazon ML
• ML models generate predictions using the patterns extracted from the input data
• Evaluations measure the quality of ML models
• Batch predictions asynchronously generate predictions for multiple input data observations
• Real-time predictions synchronously generate predictions for individual data observations
Amazon Machine Learning Developer Guide
ML models
• Binary
• Multiclass
• Regression
Datasources
• Extracted from the ookami news database
• 99999 records
• 69730 training data
• 30433 evaluation data
• Each row has
  • news title
  • news summary
  • news URL
  • sports category name
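The row layout and the training/evaluation split described above can be sketched in Python. This is only an illustration, not the pipeline used for the talk: the column names in `FIELDS`, the helpers `split_rows` and `to_csv`, the 0.7 training fraction, and the sample rows are all assumptions made for the example.

```python
import csv
import io
import random

# Hypothetical column names for the four fields listed above.
FIELDS = ["title", "summary", "url", "category"]

def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle rows and split them into (training, evaluation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def to_csv(rows):
    """Serialize rows (dicts keyed by FIELDS) to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Made-up rows for illustration only; not taken from the ookami database.
rows = [
    {
        "title": f"title {i}",
        "summary": f"summary {i}",
        "url": f"http://example.com/{i}",
        "category": "soccer" if i % 2 else "tennis",
    }
    for i in range(10)
]

training, evaluation = split_rows(rows)
print(len(training), len(evaluation))
print(to_csv(training).splitlines()[0])
```

Shuffling before the split keeps both parts drawn from the same mix of categories, which matters when the source table is ordered by date or by sport.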
Better than the baseline
• Average F1 score: 0.49
• Baseline F1 score: 0.02
F1 score
A measure of a test's accuracy
$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
F1 score equals two times precision times recall over precision plus recall
http://en.wikipedia.org/wiki/Precision_and_recall
• True positive
  • Predicted true, and it is actually true
• False negative
  • Predicted false, but it is actually true
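A small Python sketch of how an F1 score follows from true positive, false positive, and false negative counts, and how per-category scores could be averaged. The slides do not say how the average F1 score was computed; an unweighted (macro) average over categories is assumed here, and the function names and sample labels are made up for illustration.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall), or 0.0 if undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(actual, predicted):
    """Unweighted mean of the per-category F1 scores (an assumed averaging rule)."""
    classes = sorted(set(actual) | set(predicted))
    scores = []
    for c in classes:
        pairs = list(zip(actual, predicted))
        tp = sum(1 for a, p in pairs if a == c and p == c)
        fp = sum(1 for a, p in pairs if a != c and p == c)
        fn = sum(1 for a, p in pairs if a == c and p != c)
        scores.append(f1_from_counts(tp, fp, fn))
    return sum(scores) / len(scores)

# Made-up labels for illustration only.
actual = ["soccer", "soccer", "tennis", "golf"]
predicted = ["soccer", "tennis", "tennis", "soccer"]

print(f1_from_counts(tp=8, fp=2, fn=2))
print(macro_f1(actual, predicted))
```

With a macro average, a category that is never predicted correctly contributes 0 and pulls the average down regardless of how few records it has.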
71.65% is Soccer 39.50% actually
We can improve further
• Collect more data
• Choose the right categories
• Make sure each category has a similar number of records
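One way to check the last point is to count records per category before training. The sketch below is an illustration only: the helper `category_balance` and the sample labels are hypothetical, and the ratio of largest to smallest category is just one possible imbalance measure.

```python
from collections import Counter

def category_balance(categories):
    """Return per-category counts and the largest-to-smallest count ratio."""
    counts = Counter(categories)
    largest = max(counts.values())
    smallest = min(counts.values())
    return counts, largest / smallest

# Made-up labels for illustration only.
labels = ["soccer"] * 6 + ["tennis"] * 3 + ["golf"] * 1

counts, ratio = category_balance(labels)
for name, n in counts.most_common():
    print(f"{name}: {n} ({n / len(labels):.0%})")
print("largest / smallest:", ratio)
```

A ratio close to 1 means the categories are evenly represented; a large ratio suggests collecting more data for the small categories or merging them.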
References
• Amazon Machine Learning Developer Guide
• http://en.wikipedia.org/wiki/F1_score
• http://en.wikipedia.org/wiki/Precision_and_recall
• http://www.ees.dendai.ac.jp/Lectures/TechEng-Horio/HowToRead.pdf