Final Project Presentation
Knowledge Graph Based Keyword Update
Ashwin Kumar, IIT Delhi
Media.net, Directi
January 20, 2014 Knowledge Graph Based Keyword Update
Mentor: Jigar Patel
Problem Statement
Given a keyword (in its best form), identify whether it is about a product whose newer version is available in the market. If yes, modify the keyword appropriately.
Examples.
Buy new iPhone 3G => Buy new iPhone 5
Samsung Galaxy S III Reviews => Samsung Galaxy S4 Reviews
Apple iPad 2 16GB => Apple iPad Mini 16GB
Approach Followed
Created a graph containing all entities.
Entities have attributes associated with them.
Relationships exist between entities.
Parent-Child
Successor-Predecessor
Knowledge Graph
root
  smartphones
    apple
      iphone 3GS, iphone 4, iphone 4S, iphone 5
    samsung
      . . .
  automobiles
    . . .
  . . .
Knowledge Graph
iphone 5
Name: iphone 5
Brand: apple inc.
Developer:
Manufacturer: foxconn
Type: smartphone
Release Date: 2012
End Date: present
Parent: apple
Successor: -
Predecessor: iphone 4S
Children: null
Website: www.apple.com/iphone
External Links:
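A minimal sketch of how such a node and its parent-child links might be represented. The class names, field names and the `add` helper are assumptions made for illustration; the slides do not describe the actual storage format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """One node of the graph; fields mirror a subset of the slide's attributes."""
    name: str
    type: Optional[str] = None
    brand: Optional[str] = None
    manufacturer: Optional[str] = None
    release_year: Optional[int] = None
    parent: Optional[str] = None        # name of the parent node
    predecessor: Optional[str] = None   # name of the previous version
    successor: Optional[str] = None     # name of the next version
    children: List[str] = field(default_factory=list)


class KnowledgeGraph:
    """Hypothetical container: a dict of nodes keyed by entity name."""

    def __init__(self) -> None:
        self.nodes = {"root": Entity(name="root")}

    def add(self, entity: Entity) -> None:
        # Register the node and link it under its parent (parent must exist).
        self.nodes[entity.name] = entity
        if entity.parent is not None:
            self.nodes[entity.parent].children.append(entity.name)


graph = KnowledgeGraph()
graph.add(Entity("smartphones", parent="root"))
graph.add(Entity("apple", parent="smartphones"))
graph.add(Entity("iphone 4S", type="smartphone", parent="apple"))
graph.add(Entity("iphone 5", type="smartphone", brand="apple inc.",
                 manufacturer="foxconn", release_year=2012,
                 parent="apple", predecessor="iphone 4S"))
```

Storing relationships as names rather than object references keeps each node easy to serialise into a table row.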
Step1: Sources of Data
Looked at three different sources of data.
Wikipedia
Has roughly 7.5 million entities.
DBpedia
Contains datasets extracted from Wikipedia dumps.
Last updated in May 2012.
So it was too outdated to use.
Freebase
Stores wikipedia entities in RDF format.
Best data source available turned out to be Wikipedia.
Step1: Wikipedia Data Extraction
Downloaded the latest Wikipedia dump available (June 14, 2013; 42 GB).
Wrote my own parser to extract relevant information from each wikipedia page.
Created four tables.
Wikipedia Categories
Wikipedia Infobox
Wikipedia Redirection
Wikipedia External Links
Table creation took roughly 25 hours.
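The parser itself is not shown in the slides. As a rough sketch of one way to stream such a dump, the snippet below uses Python's `xml.etree.ElementTree.iterparse` to pull the title, redirect target and categories out of each page. The function names, the category regex and the tiny inline sample are assumptions; infobox and external-link extraction are left out.

```python
import io
import re
import xml.etree.ElementTree as ET

# Matches [[Category:Name]] and [[Category:Name|sort key]] in wikitext.
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)")


def local(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, redirect_target, categories) for every <page> element."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, redirect, text = None, None, ""
        for node in elem.iter():
            name = local(node.tag)
            if name == "title":
                title = node.text
            elif name == "redirect":
                redirect = node.get("title")
            elif name == "text":
                text = node.text or ""
        categories = [c.strip() for c in CATEGORY_RE.findall(text)]
        yield title, redirect, categories
        elem.clear()  # release the page's subtree so memory stays bounded


# Tiny hand-written sample in the shape of a MediaWiki export (illustrative).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>IPhone four</title>
    <redirect title="IPhone 4" />
    <revision><text>#REDIRECT [[IPhone 4]]</text></revision>
  </page>
  <page>
    <title>IPhone 5</title>
    <revision><text>A smartphone. [[Category:Smartphones]] [[Category:IPhone|5]]</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

Streaming with `iterparse` avoids loading the whole 42 GB file into memory; each yielded tuple could then be written to the redirection and category tables.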
Step1: Data Collection
Step2: Inserting Entities
Targeted approach for different classes of products.
Classes covered so far:
Smartphones
Automobiles
Ipods/Ipads
Cameras
Step2: Smartphones
Entity Identification.
Pages in categories Nokia mobile phones, Samsung mobile phones, Sony Ericsson mobile phones etc.
Pages in categories Smartphones, Touchscreen mobile phones, Multi-touch mobile phones etc.
Entity Classification.
Based on “manufacturer” / “developer” / “brand”.
Step2: Smartphones
How to get release date?
Infobox
Available
Releasedate
Released
Production
Model Years
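A small sketch of how a release year might be read from these infobox fields. It assumes the infobox has already been parsed into a dict with lowercase keys and that the fields are tried in the order listed on the slide; both are assumptions, as are the names `RELEASE_FIELDS` and `release_year`.

```python
import re

# Field names from the slide, tried in this (assumed) priority order.
RELEASE_FIELDS = ("available", "releasedate", "released", "production", "model years")
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def release_year(infobox):
    """Return the first four-digit year found in a release-related field, else None."""
    for field_name in RELEASE_FIELDS:
        value = infobox.get(field_name)
        if value:
            match = YEAR_RE.search(value)
            if match:
                return int(match.group(0))
    return None


year = release_year({"manufacturer": "Foxconn", "released": "{{Start date|2012|09|21}}"})
```

When no listed field holds a year, the function returns `None`, which is the case the first-paragraph fallback is meant to cover.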
Step2: Smartphones
How to get release date?
First paragraph of wikipedia article
Apple held an event to formally introduce the phone on September 12, 2012.
The beTouch E110 was released on February 15, 2010.
It is the fifth generation of the iPhone, succeeding the iPhone 4, and was announced on October 4, 2011.
Categories
Ford Freestar.
Only applicable in case of automobiles.
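The first-paragraph fallback could be approximated with a regular expression for dates written as "Month D, YYYY", which is the form used in the three example sentences above. This is only a sketch: the pattern and the `first_date` helper are assumptions, and other date formats would need extra patterns.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")


def first_date(paragraph):
    """Return the first 'Month D, YYYY' date in the text, or None if there is none."""
    match = DATE_RE.search(paragraph)
    if match is None:
        return None
    month, day, year = match.groups()
    return date(int(year), MONTHS.index(month) + 1, int(day))


sentence = "The beTouch E110 was released on February 15, 2010."
released = first_date(sentence)
```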
Step2: Smartphones
Algorithm of Keyword Replacement
Generate Ngrams of the given keyword.
For each Ngram, check whether it matches any entity present in the graph, and generate a list of all matching Ngrams.
Merge shorter Ngrams into the longer Ngrams that contain them to get a filtered list.
The keyword is subject to replacement only if the filtered list has exactly one Ngram.
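The matching and merging steps can be sketched as follows. The sketch assumes whitespace tokenisation, case-insensitive matching and an in-memory set of entity names; the function names are hypothetical.

```python
def ngrams(tokens):
    """All contiguous word sequences as (start, end, text), with end exclusive."""
    return [(i, j, " ".join(tokens[i:j]))
            for i in range(len(tokens))
            for j in range(i + 1, len(tokens) + 1)]


def filtered_matches(keyword, entity_names):
    """Ngrams that match an entity, after dropping those inside a longer match."""
    tokens = keyword.lower().split()
    matches = [m for m in ngrams(tokens) if m[2] in entity_names]
    kept = []
    for start, end, text in matches:
        covered = any(o_start <= start and end <= o_end
                      and (o_end - o_start) > (end - start)
                      for o_start, o_end, _ in matches)
        if not covered:
            kept.append(text)
    return kept


def is_replaceable(keyword, entity_names):
    """Only keywords with exactly one filtered match are candidates."""
    return len(filtered_matches(keyword, entity_names)) == 1


# Entity names are lowercased here because matching is case-insensitive.
ENTITIES = {"iphone", "iphone 3g", "samsung", "samsung galaxy s"}
single = filtered_matches("buy new iphone 3G now", ENTITIES)
double = filtered_matches("iphone 3G vs samsung galaxy S", ENTITIES)
```

Here `single` holds one match, so the keyword qualifies for replacement, while `double` holds two and the keyword is left unchanged.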
Step2: Smartphones
Example1.
Keyword: buy new iphone 3G now
Two Ngram matches: iphone, iphone 3G
Filtered match: iphone 3G
Go to parent of iphone 3G => apple
Get the entity with latest release date among the children of apple => iphone 5
Replace iphone 3G (2008) with iphone 5 (2012).
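The replacement step of this example might look like the sketch below. The dict-based graph, the helper names and the case-insensitive substitution are assumptions; the release years for iphone 3G and iphone 5 are the ones quoted on the slide, and 2011 for iphone 4S is the announcement year from the first-paragraph example.

```python
import re

# Hypothetical flat view of the graph: entity name -> parent and release year.
GRAPH = {
    "apple": {"parent": "smartphones", "year": None},
    "iphone 3g": {"parent": "apple", "year": 2008},
    "iphone 4s": {"parent": "apple", "year": 2011},
    "iphone 5": {"parent": "apple", "year": 2012},
}


def latest_sibling(name, graph):
    """Entity with the latest release year among the children of name's parent."""
    parent = graph[name]["parent"]
    dated = [n for n, e in graph.items()
             if e["parent"] == parent and e["year"] is not None]
    if not dated:
        return name
    return max(dated, key=lambda n: graph[n]["year"])


def update_keyword(keyword, matched, graph):
    """Swap the matched entity for the newest child of the same parent."""
    newest = latest_sibling(matched, graph)
    if newest == matched:
        return keyword
    pattern = re.compile(re.escape(matched), flags=re.IGNORECASE)
    return pattern.sub(lambda _m: newest, keyword)


updated = update_keyword("buy new iphone 3G now", "iphone 3g", GRAPH)
```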
Step2: Smartphones
Example2.
Keyword: iphone 3G vs samsung galaxy S
Four Ngram matches:
iphone
iphone 3G
samsung
samsung galaxy S
Filtered matches:
iphone 3G
samsung galaxy S
No replacement.
Step2: Smartphones
Issues.
No release date available.
Infobox not present.
If present, it has no “production” / “released” fields.
No date mentioned in the first paragraph.
Several smartphones do not have a wikipedia page.
Nokia 3585i, BlackBerry 7730
Entities have multiple names.
buy new apple iphone four
Fortunately, wikipedia redirection helps in this case.
iphone four => iphone 4
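A sketch of how the redirection table could be used to normalise alternate names before matching. The dict layout and the `canonical` helper are assumptions; following chains of redirects with a loop guard is a design choice made for this example, not something stated in the slides.

```python
# Hypothetical slice of the redirection table: alias -> target page name.
REDIRECTS = {"iphone four": "iphone 4"}


def canonical(name, redirects):
    """Follow redirects until a non-redirect name is reached (guarding against loops)."""
    seen = set()
    name = name.lower()
    while name in redirects and name not in seen:
        seen.add(name)
        name = redirects[name]
    return name


resolved = canonical("iPhone Four", REDIRECTS)
```

Applying `canonical` to each matching Ngram lets "buy new apple iphone four" resolve to the same graph entity as "iphone 4".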
Step2: Smartphones
Results.
Total Entities Inserted: 900
Test Keywords: 5200
Keywords Updated: 3200
General Keywords (no replacement needed): ~1000
Example: Nokia Connectivity Adapter Cable
Keywords that could not be updated: ~1000
..\Downloads\Data_wikipedia\enwiki-latest-pages-articles.xml\output8_smartphones.xls
Step3: Automobiles
Entity Identification.
Categories: Ford vehicles, BMW vehicles, Porsche vehicles etc.
Categories: Hatchbacks, SUVs, Sedans etc.
Entity Classification.
Based on “wikipedia category”.
Step3: Automobiles
An automobile cannot be replaced with another automobile of the same company arbitrarily.
Only a sedan car can replace a sedan car.
But “type” information is not available in organised form.
Decided to perform only year replacement.
Example.
2004 Chevrolet Silverado => 2013 Chevrolet Silverado
2003 Suzuki Aerio => 2007 Suzuki Aerio
Step3: Automobiles
Erred on the side of caution.
Replaced a keyword only on a complete match.
Generated a list of stopwords for this.
compare, discount, engine etc.
Example. 2005 Mustang GT Convertible
Token labels in the example: Year, Entity, Stopword
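One reading of "complete match" is that every token of the keyword must be a year, part of the matched entity, or a stopword. The sketch below implements that reading; the interpretation, the function names and the way the latest model year is passed in are all assumptions. The stopwords and the Chevrolet Silverado years come from the slides.

```python
import re

YEAR_RE = re.compile(r"^(?:19|20)\d{2}$")
STOPWORDS = {"compare", "discount", "engine"}  # examples listed on the slide


def find_run(tokens, run):
    """Index at which the token list `run` starts inside `tokens`, or -1."""
    for i in range(len(tokens) - len(run) + 1):
        if tokens[i:i + len(run)] == run:
            return i
    return -1


def update_year(keyword, entity, latest_year, stopwords):
    """Replace the year if the keyword is a complete match; otherwise return None."""
    tokens = keyword.split()
    lowered = [t.lower() for t in tokens]
    entity_tokens = entity.lower().split()
    start = find_run(lowered, entity_tokens)
    if start < 0:
        return None
    end = start + len(entity_tokens)
    year_positions = [i for i, t in enumerate(lowered) if YEAR_RE.match(t)]
    if len(year_positions) != 1:
        return None
    for i, token in enumerate(lowered):
        inside_entity = start <= i < end
        if i not in year_positions and not inside_entity and token not in stopwords:
            return None  # an unexplained token: leave the keyword alone
    tokens[year_positions[0]] = str(latest_year)
    return " ".join(tokens)


updated = update_year("2004 Chevrolet Silverado", "Chevrolet Silverado", 2013, STOPWORDS)
```

Returning `None` for anything short of a complete match keeps the update conservative: a keyword with unrecognised extra words is left as it is.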
Step3: Automobiles
Results.
Total Entities Inserted: 4100
Keywords: ~30000
Keywords Updated: 22000
Keywords that could not be updated: ~8000
..\Downloads\Data_wikipedia\enwiki-latest-pages-articles.xml\output12_automobiles.xls
Step3: Ipods
Found that there is no need to replace any ipod-related keywords, as almost all ipod models are still on sale.
Ipod Classic
Ipod Touch
Ipod Shuffle
Ipod Nano
Discontinued Models
Ipod Mini
Ipod Photo
Only a few keywords mention these.
Step4: Ipads and Tablets
Followed the same approach as for smartphones.
Results.
Total Entities Inserted: 82
Keywords: 4352
Keywords Replaced: almost all
..\Downloads\Data_wikipedia\enwiki-latest-pages-articles.xml\output7_ipad_ipod.xls
Step5: Future Work
Extend this to cover all kinds of electronic products.
Fill up missing entities.
Add month to “release date” attribute.
THANK YOU
Any questions?