Enhancing Search with Predictive Analytics
Text Analytics World – Boston 2013
Andrew Fast, Chief Scientist
Elder Research, Inc.
The Elephant Test
• “It is difficult to describe, but you know it when you see it.” – Lord Justice Stuart Smith, Cadogan Estates Limited v. Morris (1998)
• Likewise, most textual concepts cannot be easily defined with a single keyword query
Quote from: http://www.bailii.org/ew/cases/EWCA/Civ/1998/1671.html
Combining Search and Predictive Models
• Search and predictive modeling each provide a different trade-off between power and generality.
• Keyword queries can answer any query, but with limited depth for complex queries.
• A predictive model can answer one query well, especially a complex query.
[Figure: chart of Power vs. Generality, showing Document Classification and Keyword Search]
Our Approach
• A “search ensemble” ranking function that “boosts” keyword relevance based on a predictive model
[Figure: quadrant chart of Keyword Relevance vs. Model Ranking. Quadrants: High Keyword Relevance, High Model Ranking; High Keyword Relevance, Low Model Ranking; Low Keyword Relevance, High Model Ranking; Low Keyword Relevance, Low Model Ranking]
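One way to read the “search ensemble” idea is as a re-ranking step that scales each hit's keyword relevance by the model's score. The slides do not give the actual formula, so the multiplicative form, the `boost_weight` parameter, the `Hit` type, and the sample scores below are all assumptions made for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    doc_id: str
    keyword_relevance: float  # score from the search engine, higher is better
    model_score: float        # predicted probability of being shortlisted, 0..1


def ensemble_score(hit: Hit, boost_weight: float = 1.0) -> float:
    """Boost keyword relevance by the model score (assumed multiplicative form)."""
    return hit.keyword_relevance * (1.0 + boost_weight * hit.model_score)


def rerank(hits: list[Hit], boost_weight: float = 1.0) -> list[Hit]:
    """Sort hits by boosted score, best first."""
    return sorted(hits, key=lambda h: ensemble_score(h, boost_weight), reverse=True)


# Made-up hits, one per quadrant of the chart.
hits = [
    Hit("high_kw_low_model", 2.0, 0.1),
    Hit("high_kw_high_model", 2.0, 0.9),
    Hit("low_kw_high_model", 0.5, 0.9),
    Hit("low_kw_low_model", 0.5, 0.1),
]
ranked_ids = [h.doc_id for h in rerank(hits)]
print(ranked_ids)
```

With a multiplicative boost, a hit with no keyword relevance stays at zero, so the model can reorder topical results but cannot pull in off-topic ones. An additive form would behave differently.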
The Problem
• The Goal: Explore NEW interesting ideas using OLD social entrepreneurship contest entries
• The Data: A collection of contest entries from 19 different contests sponsored by our client
  – Contests cover a range of topics such as health, education, literacy, finance, technology, and geo-tourism.
• The Challenge: Emphasize high-quality entries in the results, as entry quality varies widely
Combining Search and Predictive Models
• Keyword ranking does not help you find high-quality entries …
• … but Model Ranking is not topic centric.
• Complementary strengths
  – Search for exploration and discovery
  – Predictive Models for trends and correlations
THE MODEL
Target Variable
• Identify characteristics of past entries that are correlated with that proposal being ‘Shortlisted’ by the Contest Judges
• Rankings:
  1 – Likely Finalist
  2 – Top Tier
  3 – Honorable Mention
  4 – Passed Screening
  5 – No
• Note: Not every contest used all 5 rankings
[Figure label: ‘Shortlisted’]
The Inputs
• Learn a logistic regression model to fit the feature weights
• Inputs:
  – Structured Data: Budget Size, Maturity, Impact
  – Taxonomy: Auto-tagging taxonomy terms
  – Textual Features: Length, Lexical Diversity
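As a rough illustration of fitting feature weights with logistic regression, the sketch below uses plain batch gradient descent in standard-library Python. The feature encoding, the scaling to the 0..1 range, the learning rate, and every data value are invented for the example. The slides do not say which tool or settings were actually used.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Probability of 'shortlisted'; weights[0] is the bias term."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical, pre-scaled features for six made-up entries:
# [budget_size, maturity, taxonomy_distance, length, lexical_diversity]
rows = [
    [0.8, 0.9, 0.1, 0.7, 0.8],
    [0.6, 0.7, 0.2, 0.9, 0.7],
    [0.7, 0.8, 0.3, 0.6, 0.9],
    [0.2, 0.1, 0.9, 0.2, 0.3],
    [0.3, 0.2, 0.8, 0.1, 0.2],
    [0.1, 0.3, 0.7, 0.3, 0.1],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = shortlisted (made-up labels)

weights = fit_logistic(rows, labels)
scores = [predict_proba(weights, x) for x in rows]
print([round(s, 2) for s in scores])
```

The predicted probabilities can then serve as the model ranking: sorting entries by `scores` puts the entries most likely to be shortlisted first.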
The Taxonomy
• Joint work with Beth Maser and Richard Iams at PPC
• Non-traditional, general approach
  – Broad, flexible taxonomy
• Focus on the range of interests of the organization
Using the Taxonomy
• Each contest emphasizes different branches of the taxonomy
  – Taxonomy features need to be contest specific
• Step 1: Use the “Wisdom of Crowds” to find the center of each contest
• Step 2: Rate each entry based on the distance from the center
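The two steps can be sketched as follows, under assumptions the slides do not confirm: each entry is represented by counts of its auto-tagged taxonomy terms, the contest “center” is the average of those vectors over all entries in the contest, and distance is cosine distance. The function names and the tag counts are made up.

```python
import math
from collections import Counter


def contest_center(entries: list[dict[str, float]]) -> dict[str, float]:
    """Step 1: average the taxonomy-term vectors of all entries in one contest."""
    total = Counter()
    for tags in entries:
        total.update(tags)  # adds the counts term by term
    return {term: count / len(entries) for term, count in total.items()}


def cosine_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Step 2: 0.0 means same direction as the center, 1.0 means no shared terms."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # an untagged entry is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Made-up tag counts for three entries in one hypothetical contest.
entries = [
    {"health": 3, "education": 1},
    {"health": 2, "technology": 1},
    {"finance": 4},
]
center = contest_center(entries)
distances = [cosine_distance(tags, center) for tags in entries]
print([round(d, 3) for d in distances])
```

Because the center is computed per contest, the same entry would get a different distance in a contest that emphasizes other branches of the taxonomy, which is what makes the feature contest specific.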
Evaluation: Area Under the ROC Curve
• Evaluate the overall ranking provided by the model.
  – Higher means more ‘Shortlisted’ entries at the top of the list
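For a ranking, the area under the ROC curve equals the probability that a randomly chosen shortlisted entry is scored above a randomly chosen non-shortlisted one. The sketch below computes it by direct pairwise comparison. The function name, labels, and scores are illustrative, not taken from the talk.

```python
def roc_auc(labels, scores):
    """Fraction of (shortlisted, not shortlisted) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = shortlisted, scores are model outputs.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

A value of 0.5 corresponds to a random ordering and 1.0 to every shortlisted entry ranked above every other entry.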
Evaluation: Lift
• Evaluates the improvement using the model at a fixed amount of work
  – How much more efficient are the judges using our model alone?
• Every contest showed positive lift.
  – Maximum lift of 3.3
  – Average lift of 1.67
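Lift at a fixed amount of work can be computed as the shortlist rate among the top-scored fraction of entries divided by the shortlist rate over all entries. This is a minimal sketch of that common definition. The slides do not state the review fraction that was used, and the data below is made up.

```python
def lift_at(labels, scores, fraction):
    """Shortlist rate in the top `fraction` of entries by score, relative to the base rate."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when no entry is shortlisted")
    n_top = max(1, int(round(fraction * len(labels))))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    return top_rate / base_rate


# Made-up contest: 10 entries, 4 shortlisted, scores already in descending order.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.75, 0.60, 0.50, 0.45, 0.30, 0.20, 0.10]
lift = lift_at(labels, scores, fraction=0.2)
print(lift)
```

A lift of 1.0 means reading the top of the model's list is no better than reading entries in random order. Values above 1.0 mean the judges would find shortlisted entries faster.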
THE SEARCH APPROACH
Our Approach
• A new search ranking function that “boosts” keyword relevance for probable shortlisted entries
[Figure: quadrant chart of Keyword Relevance vs. Model Ranking. Quadrants: High Keyword Relevance, High Model Ranking; High Keyword Relevance, Low Model Ranking; Low Keyword Relevance, High Model Ranking; Low Keyword Relevance, Low Model Ranking]
The Prototype Platform
[Figure: architecture diagram with components Data, ERI Text Mining, Model (Predictive + Taxonomy), Search Index, and Custom Search Interface]
Faceted Search with Solr
Apache Solr is an open-source faceted search engine (http://lucene.apache.org/solr)
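The slides do not show how the boost was wired into Solr. One possible approach, sketched here as an assumption, is to store the model score in an indexed field and use the eDisMax query parser's `boost` parameter, which multiplies the keyword score by a function query. The field names `shortlist_prob` and `contest` and the function `build_solr_query` are hypothetical.

```python
from urllib.parse import urlencode


def build_solr_query(keywords, facet_field="contest", score_field="shortlist_prob"):
    """Build a Solr select query string with a multiplicative model boost and one facet."""
    params = [
        ("q", keywords),
        ("defType", "edismax"),
        # Multiply keyword relevance by (1 + model score); field name is hypothetical.
        ("boost", f"sum(1,{score_field})"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("rows", "10"),
    ]
    return urlencode(params)


query_string = build_solr_query("clean water")
print(query_string)
```

The resulting string would be appended to a core's `/select` endpoint. The host and core name are deployment specific and are left out here.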
“Blind Men and the Elephant”
• Text mining can be viewed from many different perspectives
• No single view provides a complete solution
• Must consider the entire “beast” to get the best solution
Contact Information
Andrew Fast, Ph.D. Chief Scientist
(434) 973-7673 www.datamininglab.com
Practical Text Mining
• Winner of the 2012 PROSE award for Computing and Information Science
• Written for a technical audience seeking more text experience
• Includes trial versions of major software tools
Andrew Fast, Chief Scientist, Elder Research, Inc.
Dr. Fast graduated Magna Cum Laude from Bethel University and earned Master’s and Ph.D. degrees in Computer Science from the University of Massachusetts Amherst. There, his research focused on causal data mining and mining complex relational data such as social networks. At ERI, Andrew leads the development of new tools and algorithms for data and text mining for applications of capabilities assessment, fraud detection, and national security. Dr. Fast has published on an array of applications, including detecting securities fraud using the social network among brokers and understanding the structure of criminal and violent groups. Other publications cover modeling peer-to-peer music file sharing networks, understanding how collective classification works, and predicting playoff success of NFL head coaches (work featured on ESPN.com). With John Elder and other co-authors, Andrew has written a book, Practical Text Mining, that was awarded the PROSE Award for Computing and Information Science in 2012.
Dr. Andrew Fast leads research in Text Mining and Social Network Analysis at Elder Research, the nation’s leading data mining consultancy. ERI was founded in 1995 and has offices in Charlottesville, VA and Washington, DC (www.datamininglab.com). ERI focuses on Federal, commercial, investment, and security applications of advanced analytics, including stock selection, image recognition, biometrics, process optimization, cross-selling, drug efficacy, credit scoring, risk management, and fraud detection.