![Page 1: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/1.jpg)
Assurance Scoring: Using Machine Learning and Analytics to Reduce Risk in the Public Sector
Matt Thomson
17/11/2016
![Page 2: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/2.jpg)
Copyright © Capgemini 2012. All Rights Reserved
Presentation Title | Date
Outline
• Introduction
• Traditional Fraud Detection
• Assurance Scoring
• Machine Learning
• Business Rules
• Anomaly Detection
• Graph Links
![Page 3: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/3.jpg)
Who am I?
Matt Thomson
• Senior Data Scientist at Capgemini
• PhD in Astrophysics (http://arxiv.org/abs/1010.3315)
• Several years' experience in fraud detection
Capgemini
• Big Data Analytics team
• ~100 Data Scientists, Big Data Engineers and Data Analysts
• Focus on Open Source and Big Data technologies to solve client problems
• Sponsor the meetup today!
![Page 4: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/4.jpg)
Introduction to the Problem
Public sector constantly working in an environment of reduced resources
Want to provide a better service but with greater efficiency
Therefore very important that limited resources are focussed correctly
Assurance Scoring: use ML and other analytical methods to identify the least risky people or applications, so that investigators' resources can be targeted on the most risky
![Page 5: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/5.jpg)
Hypothetical Example – 2016 Olympics tickets
Imagine running the application process for selling tickets to the 2016 Olympics
• Avoid selling tickets to touts/resellers
• Vast majority of people applying for tickets are genuine
• Fraud detection with a big class imbalance problem (<0.1%)
• Avoid the approach of investigating each person applying
Let's say we know from the 2012 Olympics which people ended up reselling their tickets – training data
Use ML to identify the 30% (say) least likely to be touts – fast tracked
Investigators focus on the high risk
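A minimal sketch of the fast-track step, assuming a model has already produced a risk score for every application. The function name `fast_track`, the application IDs and the scores are hypothetical; only the 30% fraction comes from the example above.

```python
def fast_track(scores, fraction=0.3):
    """Return the IDs of the lowest-risk `fraction` of applications.

    `scores` maps an application ID to a risk score, where a higher
    score means the applicant looks more like a tout (an assumption).
    """
    ranked = sorted(scores, key=scores.get)   # lowest risk first
    n_fast = round(len(ranked) * fraction)    # size of the fast-track group
    return set(ranked[:n_fast])


# Hypothetical risk scores for ten applications.
risk_scores = {
    "app-01": 0.02, "app-02": 0.91, "app-03": 0.10, "app-04": 0.05,
    "app-05": 0.40, "app-06": 0.01, "app-07": 0.33, "app-08": 0.75,
    "app-09": 0.08, "app-10": 0.20,
}

fast_tracked = fast_track(risk_scores)           # least likely to be touts
to_investigate = set(risk_scores) - fast_tracked  # left for investigators
```

Cutting on rank rather than on a fixed score keeps the fast-track volume predictable, which may matter when investigator capacity is the constraint.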
![Page 6: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/6.jpg)
Traditional Fraud Detection
Identify Historical Training Data
Feature Engineering
Model Training and Evaluation
Model Execution
Feedback
![Page 7: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/7.jpg)
Assurance Scoring
Focus on low-risk
Allows resources to be better focussed
Not limited to Machine Learning
Built using Python! Pandas, Scikit-learn etc.
Scala version using Spark MLlib
![Page 8: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/8.jpg)
Assurance Scoring
![Page 9: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/9.jpg)
POLE ‘Analytical’ Data Layer
Disparate data sources - Atomic Layer
Atomic data is Transformed and Loaded into POLE
POLE Layer
• Person
• Object
• Location
• Event
![Page 10: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/10.jpg)
POLE ‘Analytical’ Data Layer
POLE contains ALL entities from the Atomic Layer, plus their inter-linkages
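One way to picture a POLE layer is as a set of typed entities plus undirected links between them. The sketch below is illustrative only: the class names `Entity` and `PoleLayer`, their methods and the example keys are assumptions, not the schema used in the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    kind: str   # "Person", "Object", "Location" or "Event"
    key: str    # hypothetical identifier within the POLE layer


@dataclass
class PoleLayer:
    entities: set = field(default_factory=set)
    links: set = field(default_factory=set)   # each link is a frozenset pair

    def add_link(self, a, b):
        """Register both entities and an undirected link between them."""
        self.entities.update((a, b))
        self.links.add(frozenset((a, b)))

    def neighbours(self, entity):
        """Return every entity directly linked to `entity`."""
        return {other
                for link in self.links if entity in link
                for other in link if other != entity}


# Hypothetical records: a person makes an application (event) for a
# batch of tickets (object) from an address (location).
person = Entity("Person", "applicant-1")
event = Entity("Event", "application-42")
tickets = Entity("Object", "ticket-batch-7")
address = Entity("Location", "postcode-AB1")

layer = PoleLayer()
layer.add_link(person, event)
layer.add_link(event, tickets)
layer.add_link(event, address)
```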
![Page 11: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/11.jpg)
Assurance Scoring
![Page 12: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/12.jpg)
Machine learning
Transform Selection Model
Training
Validation
Test
Feature extraction and selection Model Building
Variety of output files: logs, graphics, pickle models, etc.
Testing: Unit tests, monitoring tests and integration tests
Vector Build
Input Data
Manipulate, Explore
Data
Framework: Structure, flexibility, consistency
![Page 13: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/13.jpg)
Machine Learning: Feature Engineering
SQL, Python
Transform
Explore
Select
Ask questions, validate
Refine features
• Feature Extraction
• Data exploration
• Feature selection
Historical Data
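As a rough illustration of the feature extraction step, the sketch below turns raw historical application rows into one feature vector per applicant. The field names (`applicant`, `event`, `value`) and the three features are hypothetical choices for the Olympics example, not features named in the talk.

```python
from collections import defaultdict


def extract_features(applications):
    """Aggregate raw application rows into one feature dict per applicant."""
    grouped = defaultdict(list)
    for app in applications:
        grouped[app["applicant"]].append(app)

    features = {}
    for applicant, apps in grouped.items():
        features[applicant] = {
            "n_applications": len(apps),
            "total_value": sum(a["value"] for a in apps),
            "n_distinct_events": len({a["event"] for a in apps}),
        }
    return features


# Hypothetical historical rows.
applications = [
    {"applicant": "A", "event": "100m final", "value": 300},
    {"applicant": "A", "event": "100m final", "value": 300},
    {"applicant": "A", "event": "Diving", "value": 150},
    {"applicant": "B", "event": "Diving", "value": 75},
]

feature_vectors = extract_features(applications)
```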
![Page 14: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/14.jpg)
Machine Learning: Model Building
Training
Validation
Test
Split Datasets
Build Models
Hyper-parameter tuning
Selected features
Models
Training results
Validation results
Test results
Compare Models
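The dataset split can be sketched in plain Python as below. The 60/20/20 proportions, the fixed seed and the function name `split_dataset` are assumptions for illustration. With a class imbalance as severe as the one described earlier, a stratified split would probably be needed in practice, which this sketch does not attempt.

```python
import random


def split_dataset(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle `rows` and split them into training, validation and test sets.

    Whatever is left after the training and validation shares becomes
    the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


rows = list(range(100))   # stand-in for 100 feature vectors
training_set, validation_set, test_set = split_dataset(rows)
```

Hyper-parameters would then be tuned against the validation set only, with the test set held back for the final comparison of models.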
![Page 15: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/15.jpg)
Low risk? High risk? Depends on classifier’s threshold
• True positives: applications the model correctly classifies as high risk
• True negatives: applications the model correctly classifies as low risk
• False positives: applications the model scores as high risk but which were in fact low risk
• False negatives: applications the model scores as low risk but which were in fact high risk
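To show how these four counts move with the threshold, here is a small sketch. The scores and labels are made up, and treating a score at or above the threshold as high risk is an assumed convention.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN and FN when score >= threshold means 'high risk'."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, is_high_risk in zip(scores, labels):
        predicted_high = score >= threshold
        if predicted_high and is_high_risk:
            counts["tp"] += 1
        elif predicted_high and not is_high_risk:
            counts["fp"] += 1
        elif not predicted_high and not is_high_risk:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Hypothetical model scores and true outcomes (True = actually high risk).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, False, True, False, True, False]

strict = confusion_counts(scores, labels, threshold=0.5)
cautious = confusion_counts(scores, labels, threshold=0.2)
```

Lowering the threshold from 0.5 to 0.2 removes the false negative at the cost of an extra false positive, which is the trade-off an assurance scoring threshold has to settle.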
![Page 16: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/16.jpg)
Assurance Scoring
![Page 17: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/17.jpg)
Business Rules
Identifying fraud has often been done using deterministic rules
Look for transactions near a threshold or at the end of the day
Primarily data queries on your feature vector
Olympics example – anyone applying for more than £10,000 of tickets
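A business rule of this kind can be written as a simple predicate over the feature vector. In the sketch below the rule name `large_order`, the feature name `total_value_gbp` and the sample applicants are hypothetical; only the £10,000 limit comes from the slide.

```python
# Each rule is a named predicate over one applicant's feature vector.
RULES = {
    "large_order": lambda features: features["total_value_gbp"] > 10_000,
}


def apply_rules(feature_vectors, rules=RULES):
    """Return {applicant: [names of rules fired]} for flagged applicants only."""
    flagged = {}
    for applicant, features in feature_vectors.items():
        fired = [name for name, rule in rules.items() if rule(features)]
        if fired:
            flagged[applicant] = fired
    return flagged


# Hypothetical feature vectors.
applicants = {
    "A": {"total_value_gbp": 12_500},
    "B": {"total_value_gbp": 180},
    "C": {"total_value_gbp": 10_000},   # exactly at the limit, so not flagged
}

high_risk = apply_rules(applicants)
```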
![Page 18: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/18.jpg)
Assurance Scoring
![Page 19: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/19.jpg)
Anomaly Detection
Use the training data to create a baseline of applications by postcode (say)
If a particular postcode has a larger than expected number of applications, then those cases are pushed into the high-risk bucket
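A minimal sketch of the postcode baseline, under stated assumptions: the baseline is each postcode's share of training applications, "larger than expected" means more than `factor` times the expected count, and a postcode never seen in training has an expected count of zero. None of these choices are taken from the talk, and the postcodes are invented.

```python
from collections import Counter


def baseline_shares(training_postcodes):
    """Share of historical applications coming from each postcode."""
    counts = Counter(training_postcodes)
    total = sum(counts.values())
    return {postcode: n / total for postcode, n in counts.items()}


def anomalous_postcodes(current_postcodes, shares, factor=2.0):
    """Postcodes whose current count exceeds `factor` times the expected count."""
    counts = Counter(current_postcodes)
    total = sum(counts.values())
    flagged = set()
    for postcode, n in counts.items():
        expected = shares.get(postcode, 0.0) * total   # unseen postcode: 0
        if n > factor * expected:
            flagged.add(postcode)
    return flagged


# Hypothetical data: one entry per application.
training = ["AB1"] * 50 + ["CD2"] * 30 + ["EF3"] * 20
current = ["AB1"] * 10 + ["CD2"] * 6 + ["EF3"] * 24

shares = baseline_shares(training)
high_risk_postcodes = anomalous_postcodes(current, shares)
```

Here EF3 supplied a fifth of historical applications but more than half of the current ones, so its cases would go to the high-risk bucket.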
![Page 20: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/20.jpg)
Assurance Scoring
![Page 21: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/21.jpg)
Graph Links - Matching
Key part of assurance scoring – bringing data together from disparate sources
Probability of Match: 80%
| Attribute | Data Source 1 | Data Source 2 |
| --- | --- | --- |
| Name | Matt Thomson | Matthew Thosmon |
| Phone Number | 07123 456 789 | 07123 456 798 |
| Favourite Sport | Football | Cricket |
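The slide does not say how the 80% figure was derived. As a rough stand-in, the sketch below scores two records with a weighted average of per-attribute string similarities from Python's `difflib`. The weights, the function names and the attribute keys are assumptions, and the result is a similarity score rather than a calibrated probability.

```python
from difflib import SequenceMatcher

# Hypothetical weights: name and phone count for more than a weak attribute.
WEIGHTS = {"name": 2, "phone": 2, "favourite_sport": 1}


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(record_1, record_2, weights=WEIGHTS):
    """Weighted average of per-attribute similarities."""
    weighted = sum(weight * similarity(record_1[attr], record_2[attr])
                   for attr, weight in weights.items())
    return weighted / sum(weights.values())


# The two records from the table above.
source_1 = {"name": "Matt Thomson", "phone": "07123 456 789",
            "favourite_sport": "Football"}
source_2 = {"name": "Matthew Thosmon", "phone": "07123 456 798",
            "favourite_sport": "Cricket"}

score = match_score(source_1, source_2)
```

Pairs scoring above a chosen cut-off would then be linked in the graph as the same person.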
![Page 22: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/22.jpg)
Assurance Scoring
![Page 23: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/23.jpg)
Further Details
Come and find [email protected] / @MattGThomson
Assurance Scoring brochure: http://ow.ly/4nbEUI
Blogs:
• Introduction: https://www.capgemini.com/node/1380596
• Integrating multiple techniques: http://bit.ly/24BmszV
• Machine Learning: http://bit.ly/1QTMGnq
• Many more on other topics
![Page 24: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/24.jpg)
We’re Hiring!
Data Science: https://www.uk.capgemini.com/careers/jobs/data-scientist-0
Big Data Engineer: https://www.uk.capgemini.com/careers/jobs/big-data-engineer
![Page 25: Assurance Scoring: using machine learning and analytics to reduce risk in the public sector](https://reader035.vdocuments.mx/reader035/viewer/2022070521/58f9a8c31a28aba5278b45af/html5/thumbnails/25.jpg)
The information contained in this presentation is proprietary. © 2012 Capgemini. All rights reserved.
www.capgemini.com
About Capgemini
With more than 120,000 people in 40 countries, Capgemini is one of the world's foremost providers of consulting, technology and outsourcing services. The Group reported 2011 global revenues of EUR 9.7 billion. Together with its clients, Capgemini creates and delivers business and technology solutions that fit their needs and drive the results they want. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model.
Rightshore® is a trademark belonging to Capgemini