query understanding at linkedin [talk at facebook]
TRANSCRIPT
Query Understanding and Search Assistance @ LinkedIn
Abhi Lad (Engineering Lead, Search Quality)
Outline
● Search at LinkedIn
● Goal of search
● Search assistance / Guided search
● Query understanding & rewriting
Search at LinkedIn
Universal search box
Navigational People search
Exploratory People search (FACETS)
Job Search
Federated Search (JOBS, PEOPLE)
Goal of Search
Help users find who or what they are looking for
with minimal effort
1. Help users frame “good” queries
2. Understand the user’s underlying intent / information need
3. Rewrite the query to ensure a good result set
4. Rank the results based on the user and the query
5. Provide good result attribution: snippets, highlighting
6. Propose next actions to refine results
Search Assistance
● Query Assistance: [Pre-retrieval] Help users frame their queries easily
○ Autocomplete, Search suggestions in typeahead, Spellcheck, ...
● Guided Search: [Post-retrieval] Guide users through their search process
○ Facet suggestions, Related searches, ...
(Especially useful for exploratory queries)
Autocomplete & Search Suggestions
Query autocomplete
Search suggestions
Query autocomplete => Entity detection => Search suggestions
Autocomplete system:
● Based on query logs
● Index and retrieve using Lucene FST
● Can complete last part of the query (even if entire query was previously unseen)
(Do not index people names)
Autocomplete & Search Suggestions: Autocomplete
Use query logs to index unigrams (tokens), bigrams, and entities (companies, titles, skills, locations)
● Compute co-occurrence statistics
● Build FST for efficient “prefix => entity” retrieval
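A minimal sketch of “prefix => entity” retrieval. It uses a sorted list with binary search as a stand-in for the Lucene FST named on the slide, so it illustrates the lookup behaviour only, not the FST data structure. The class name PrefixIndex and the counts are hypothetical toy values.

```python
import bisect


class PrefixIndex:
    """Sorted-list stand-in for an FST: maps a typed prefix to known completions."""

    def __init__(self, entries):
        # entries: (text, count) pairs, e.g. mined from query logs (toy data here).
        self._counts = dict(entries)
        self._keys = sorted(self._counts)

    def complete(self, prefix, limit=5):
        # All keys sharing a prefix form one contiguous run in sorted order.
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            matches.append(key)
        # Most frequent completions first; ties broken alphabetically.
        matches.sort(key=lambda k: (-self._counts[k], k))
        return matches[:limit]


index = PrefixIndex([
    ("san francisco", 900),
    ("san diego", 400),
    ("sandisk", 50),
    ("seattle", 300),
])
print(index.complete("sa"))
```

Here frequency alone orders the candidates; scoring against the rest of the query is a separate step.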
Query: [senior digital product manager sa|n francisco]
Score based on entity co-occurrence using last entity in the query (product manager):
● P(san francisco | product manager)
● P(san diego | product manager)
● P(sandisk | product manager)
Fall back to bigram co-occurrence:
● P(francisco | san) x P(san | manager)
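The scoring above can be sketched roughly as follows: score each candidate by its conditional probability given the last entity in the query, and chain bigram probabilities when no entity co-occurrence is available. All counts are made-up toy numbers, and the function names (cond_prob, score) are hypothetical, not taken from the talk.

```python
from collections import Counter

# Toy counts; in practice these would be computed from query logs.
entity_pairs = Counter({
    ("product manager", "san francisco"): 60,
    ("product manager", "san diego"): 25,
    ("product manager", "sandisk"): 5,
})
entity_totals = Counter({"product manager": 100})
bigrams = Counter({("manager", "san"): 30, ("san", "francisco"): 45, ("san", "diego"): 15})
unigrams = Counter({"manager": 200, "san": 60})


def cond_prob(pair_counts, totals, given, item):
    """Estimate P(item | given) from raw counts; 0.0 if `given` was never seen."""
    total = totals[given]
    return pair_counts[(given, item)] / total if total else 0.0


def score(candidate, last_entity, last_token):
    # Preferred signal: entity co-occurrence with the last entity in the query.
    p = cond_prob(entity_pairs, entity_totals, last_entity, candidate)
    if p > 0:
        return p
    # Fallback: chain bigram probabilities, e.g. P(san | manager) x P(francisco | san).
    p, prev = 1.0, last_token
    for token in candidate.split():
        p *= cond_prob(bigrams, unigrams, prev, token)
        prev = token
    return p


candidates = ["sandisk", "san diego", "san francisco"]
ranked = sorted(candidates, key=lambda c: score(c, "product manager", "manager"), reverse=True)
print(ranked)
```

With an unseen last entity, score falls through to the bigram chain, which mirrors the fallback on the slide.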
Autocomplete & Search Suggestions: Autocomplete
● Personalization
○ [ma]
■ machinist
■ manager
■ machine learning?
● Implicit spelling correction
○ [macine lear] => machine learning
● Use similar entities to complete previously unseen queries
○ [software engineer] ⇔ [software developer]
○ Complete [hadoop software de|veloper] based on [hadoop software engineer]
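One way to picture the last bullet: substitute similar entities into previously seen queries to generate variants, then prefix-match against the variants. This is only a sketch under that assumption; the similarity table, the seen-query list, and the function names are hypothetical.

```python
# Hypothetical similarity table and query log, for illustration only.
similar_entities = {
    "software engineer": ["software developer"],
    "software developer": ["software engineer"],
}
seen_queries = ["hadoop software engineer", "java software engineer"]


def expand_variants(query):
    """Return the query plus variants with one entity swapped for a similar one."""
    variants = {query}
    for entity, alternatives in similar_entities.items():
        if entity in query:
            for alternative in alternatives:
                variants.add(query.replace(entity, alternative))
    return variants


def complete(prefix):
    candidates = set()
    for query in seen_queries:
        candidates |= expand_variants(query)
    return sorted(c for c in candidates if c.startswith(prefix))


# [hadoop software developer] never appears in seen_queries, yet it can be completed.
print(complete("hadoop software de"))
```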
Autocomplete & Search Suggestions: Search Suggestions
● Personalization
○ [hadoop]
■ “People with hadoop skills”
■ “Jobs requiring hadoop skills”
● Suggestions with multiple entities
○ [hadoop engineer san francisco]
■ “Hadoop engineer jobs in San Francisco”
Spellcheck
● Fix obvious typos
● Help users spell names
Spellcheck
People names
Companies
Titles
Past queries
Spellcheck
PROBLEM: User profiles as well as query logs contain many spelling errors
(Frequency alone is not helpful due to the long-tail distribution of entities)
SOLUTION: Use query chains and click data to infer correct spelling
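A rough sketch of the idea: when a query gets no click and the next query in the same session does, count that as a vote that the second query is the intended spelling. The session format, the min_support threshold, and the function name are assumptions for illustration; a real system would likely also check that the two queries are similar.

```python
from collections import Counter, defaultdict

# Each session is an ordered list of (query, got_click) pairs. Toy data.
sessions = [
    [("machne learning", False), ("machine learning", True)],
    [("machne learning", False), ("machine learning", True)],
    [("machne learning", False), ("machine leaning", False)],
    [("jon smith", True)],
]


def mine_corrections(sessions, min_support=2):
    votes = defaultdict(Counter)
    for session in sessions:
        # Walk consecutive query pairs (the "query chain").
        for (first, first_clicked), (second, second_clicked) in zip(session, session[1:]):
            # An unclicked query followed by a different, clicked query is a vote.
            if not first_clicked and second_clicked and first != second:
                votes[first][second] += 1
    corrections = {}
    for query, counter in votes.items():
        target, support = counter.most_common(1)[0]
        if support >= min_support:
            corrections[query] = target
    return corrections


print(mine_corrections(sessions))
```

Because votes come from behaviour rather than raw frequency, a common misspelling does not win simply by being common.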
Spellcheck
● Better error model
○ Improved metaphone (version 3)
○ Platform aware: Keyboard edit distance on mobile
● Machine-learned model
● Support for partial queries
○ Spellcheck-as-you-type for “Instant” search
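To illustrate what a keyboard-aware edit distance might look like, the sketch below is a weighted Levenshtein distance in which substituting a neighbouring key costs less than substituting a distant one. The QWERTY grid (ignoring row stagger), the 0.5 cost, and the function names are assumptions, not details from the talk.

```python
# Simplified QWERTY layout; row stagger is ignored (an assumption of this sketch).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def substitution_cost(a, b):
    if a == b:
        return 0.0
    if a in KEY_POS and b in KEY_POS:
        (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
        if abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1:
            return 0.5  # neighbouring keys: a likely "fat finger" typo
    return 1.0


def keyboard_edit_distance(source, target):
    # Standard dynamic-programming edit distance, keeping one previous row.
    previous = [float(j) for j in range(len(target) + 1)]
    for i, a in enumerate(source, 1):
        current = [float(i)]
        for j, b in enumerate(target, 1):
            current.append(min(
                previous[j] + 1.0,                          # delete a
                current[j - 1] + 1.0,                       # insert b
                previous[j - 1] + substitution_cost(a, b),  # substitute a -> b
            ))
        previous = current
    return previous[-1]


print(keyboard_edit_distance("nachine", "machine"))  # n and m are neighbours
print(keyboard_edit_distance("pachine", "machine"))  # p and m are far apart
```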
Facet Suggestions
● Query awareness
○ For TITLE queries, suggest seniority facet
○ Don’t suggest facets for name queries
○ Don’t suggest redundant/conflicting facets (location facet when query has location)
● User awareness
○ User profile: Users often restrict search results to their own location, industry, seniority
○ User behavior: Recruiters often restrict to particular industry, location
● Document set awareness
○ Ensure minimum number of results
○ Bias towards higher-quality results (people, jobs, …)
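The three kinds of awareness can be combined as simple rules, sketched below. This is an illustrative rule set only: the tag names, the profile fields, the facet_counts structure, and the min_results threshold are assumptions, and the talk does not say the production system is rule-based.

```python
# Query tag -> facet that would be redundant if suggested (assumed mapping).
TAG_TO_FACET = {"LOCATION": "location"}


def suggest_facets(query_tags, user_profile, facet_counts, min_results=10):
    # Query awareness: no facets for name queries.
    if "NAME" in query_tags:
        return []
    blocked = {TAG_TO_FACET[tag] for tag in query_tags if tag in TAG_TO_FACET}
    facets = ["location", "industry"]
    # Query awareness: title queries also get the seniority facet.
    if "TITLE" in query_tags:
        facets.insert(0, "seniority")
    suggestions = []
    for facet in facets:
        if facet in blocked:
            continue  # redundant: the query already constrains this facet
        # User awareness: propose the user's own value for the facet.
        value = user_profile.get(facet)
        # Document set awareness: only suggest if enough results would remain.
        if value is not None and facet_counts.get(facet, {}).get(value, 0) >= min_results:
            suggestions.append((facet, value))
    return suggestions


profile = {"location": "san francisco", "industry": "internet", "seniority": "senior"}
counts = {
    "location": {"san francisco": 120},
    "industry": {"internet": 4},
    "seniority": {"senior": 50},
}
print(suggest_facets(["TITLE"], profile, counts))
```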
Query Understanding and Rewriting
Query Understanding
Query Tagging
(Recognized entities: Names, titles, companies, schools, locations, skills)
Query Tagger
Sequential model trained on the following data:
● Emission probabilities (dictionary)
○ Profiles – Names, Titles, Schools, Locations
○ Standardized data – Companies, Skills
● Transition probabilities
○ Query logs
○ Tags for query tokens inferred based on result clicks
Query Tagger
Prediction:
1. Segmentation: Maximum likelihood using unigram/bigram counts
[data scientist] [linkedin] [mountain view]
2. Sequence labeling: Viterbi decoding
[TITLE] [COMPANY] [LOCATION]
3. Entity linking: Dictionary
[TITLE ID=435] [COMPANY ID=1337] [LOCATION ID=us:ca:mountain_view]
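Steps 2 and 3 can be sketched as follows, assuming segmentation has already produced the three segments. The start, emission, and transition probabilities are invented toy values (the real ones would come from profiles, standardized data, and query logs as described earlier); only the entity IDs are taken from the slide.

```python
import math

TAGS = ["TITLE", "COMPANY", "LOCATION"]

# Toy probabilities, assumed for illustration.
start = {"TITLE": 0.5, "COMPANY": 0.3, "LOCATION": 0.2}
emission = {
    "data scientist": {"TITLE": 0.9, "COMPANY": 0.05, "LOCATION": 0.05},
    "linkedin": {"TITLE": 0.05, "COMPANY": 0.9, "LOCATION": 0.05},
    "mountain view": {"TITLE": 0.05, "COMPANY": 0.15, "LOCATION": 0.8},
}
transition = {
    "TITLE": {"TITLE": 0.1, "COMPANY": 0.6, "LOCATION": 0.3},
    "COMPANY": {"TITLE": 0.3, "COMPANY": 0.1, "LOCATION": 0.6},
    "LOCATION": {"TITLE": 0.5, "COMPANY": 0.4, "LOCATION": 0.1},
}
# Dictionary for entity linking; IDs as shown on the slide.
entity_ids = {
    ("TITLE", "data scientist"): "435",
    ("COMPANY", "linkedin"): "1337",
    ("LOCATION", "mountain view"): "us:ca:mountain_view",
}


def viterbi(segments):
    """Return the most likely tag sequence for the given query segments."""
    # Each column maps tag -> (best log-probability ending in tag, previous tag).
    columns = [{
        tag: (math.log(start[tag]) + math.log(emission[segments[0]][tag]), None)
        for tag in TAGS
    }]
    for segment in segments[1:]:
        last = columns[-1]
        column = {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: last[p][0] + math.log(transition[p][tag]))
            log_prob = (last[prev][0] + math.log(transition[prev][tag])
                        + math.log(emission[segment][tag]))
            column[tag] = (log_prob, prev)
        columns.append(column)
    # Backtrack from the best final tag.
    tag = max(TAGS, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


segments = ["data scientist", "linkedin", "mountain view"]
tags = viterbi(segments)
linked = [(tag, entity_ids.get((tag, seg))) for seg, tag in zip(segments, tags)]
print(tags)
print(linked)
```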
Query Tagging
● Query tags used for ranking model selection
○ Name query => NAME MODEL
○ Title query, Skill query => TITLE MODEL
○ ...
● More precise matching with documents
[software engineer google new york]
is rewritten to
[TITLE:(software engineer) COMPANY:(google) GEO:(new york)]
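A tiny sketch of the rewrite step, assuming the tagger has already produced (text, tag) pairs. The function name rewrite_query is hypothetical, and the output simply mimics the fielded notation shown above rather than any specific query syntax.

```python
def rewrite_query(tagged_segments):
    """Turn (text, tag) pairs into a fielded query; untagged text is kept as-is."""
    parts = []
    for text, tag in tagged_segments:
        parts.append(f"{tag}:({text})" if tag else text)
    return "[" + " ".join(parts) + "]"


tagged = [("software engineer", "TITLE"), ("google", "COMPANY"), ("new york", "GEO")]
print(rewrite_query(tagged))
```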
Using query tags: Entity-based filtering
(Figure: BEFORE and AFTER views of entity-based filtering, with an escape hatch)
Query Expansion
Name synonyms, Job Title synonyms
● Titles
○ Query reformulations
■ [programmer] => [software engineer] => CLICK
■ [lawyer] => [attorney] => CLICK
■ [attorney] => [legal counsel] => CLICK
● Names
○ Query reformulations
○ Dictionaries
■ bob == robert
■ beth == elizabeth
■ ...
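A minimal sketch of using such pairs for expansion. It treats every pair as symmetric and does not chain synonyms transitively; both are simplifying assumptions, since the reformulations above are directional. The function names are hypothetical.

```python
from collections import defaultdict

# Pairs taken from the examples above.
title_pairs = [
    ("programmer", "software engineer"),
    ("lawyer", "attorney"),
    ("attorney", "legal counsel"),
]
name_pairs = [("bob", "robert"), ("beth", "elizabeth")]


def build_synonyms(pairs):
    """Build a symmetric synonym table (an assumption of this sketch)."""
    table = defaultdict(set)
    for a, b in pairs:
        table[a].add(b)
        table[b].add(a)
    return table


def expand(term, table):
    # Original term first, then its direct synonyms in alphabetical order.
    return [term] + sorted(table.get(term, ()))


titles = build_synonyms(title_pairs)
names = build_synonyms(name_pairs)
print(expand("attorney", titles))
print(expand("bob", names))
```

Note that [lawyer] expands to [attorney] but not to [legal counsel], because only direct pairs are followed.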
Name Clustering
Name spelling variants
Two-step clustering:
1. Coarse clustering – metaphone
2. Finer clustering – edit distance, hand-written rules…
Each name is assigned to a cluster
NC_SRIRAM = {sriram, sreeram, sriraam, shriram, …}
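The two-step clustering can be sketched as below. The phonetic_key function is a crude stand-in and NOT the metaphone algorithm; the max_distance threshold and the "compare to the first member of each cluster" rule are also assumptions, and hand-written rules are omitted.

```python
from collections import defaultdict


def phonetic_key(name):
    # Crude stand-in for metaphone: treat "sh" as "s", keep the first letter,
    # drop later vowels and "h", and collapse repeated letters.
    name = name.lower().replace("sh", "s")
    key = name[:1]
    for ch in name[1:]:
        if ch in "aeiouh":
            continue
        if ch != key[-1]:
            key += ch
    return key


def edit_distance(source, target):
    previous = list(range(len(target) + 1))
    for i, a in enumerate(source, 1):
        current = [i]
        for j, b in enumerate(target, 1):
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + (a != b)))
        previous = current
    return previous[-1]


def cluster_names(names, max_distance=2):
    # Step 1: coarse buckets by phonetic key.
    buckets = defaultdict(list)
    for name in names:
        buckets[phonetic_key(name)].append(name)
    # Step 2: split each bucket by edit distance to a cluster's first member.
    clusters = []
    for bucket in buckets.values():
        bucket_clusters = []
        for name in bucket:
            for cluster in bucket_clusters:
                if edit_distance(name, cluster[0]) <= max_distance:
                    cluster.append(name)
                    break
            else:
                bucket_clusters.append([name])
        clusters.extend(bucket_clusters)
    return clusters


print(cluster_names(["sriram", "sreeram", "sriraam", "shriram", "sarem"]))
```

In this toy run all five names share a coarse key, and the finer edit-distance step separates "sarem" from the NC_SRIRAM variants.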
Summary
● Search assistance and guided search are critical for ensuring search success
○ Good query => good results
● High degree of structure in queries and documents (profiles, jobs, …)
○ Query understanding and Document understanding are crucial
○ “Things not Strings” => entity-based retrieval
● Query understanding and rewriting play an important role in result set quality
○ A good initial set of documents simplifies the ranker’s job
○ Good result set => accurate facet counts
○ Allows for sorting options other than relevance (recency, number of connections, …)
Thank You!