Natural Language Processing with Graph Databases and Neo4j
Post on 21-Apr-2017
Natural Language Processing With Graph Databases
DataDay Texas
January 2016
William Lyon
@lyonwj
About
Software Developer @Neo4j
william.lyon@neo4j.com
@lyonwj
lyonwj.com
William Lyon
Agenda
• Brief intro to graph databases / Neo4j
• Representing text as a graph
• NLP tasks
• Mining word associations
• Graph based summarization and keyword extraction
• Content recommendation
Survey of NLP methods with graphs
Intro to Graph Databases / Neo4j
Charts Graphs
Neo4j
Graph Database
• Property graph data model
• Nodes and relationships
• Native graph processing
• Cypher query language
The Whiteboard Model Is the Physical Model
Relational Versus Graph Models
Figure: the same friendship data shown as a relational model (Person, Person-Friend, and Friend tables) and as a graph model (ANDREAS, TOBIAS, MICA, and DELIA joined by KNOWS relationships).
Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
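Figure: example property graph. Two PERSON nodes (name: "Dan", born: May 29, 1970, twitter: "@dan"; name: "Ann", born: Dec 5, 1975) and a CAR node (brand: "Volvo", model: "V70") are connected by LOVES, LIVES WITH, DRIVES, and OWNS relationships. One relationship carries the property since: Jan 10, 2011.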
Cypher: Graph Query Language
CREATE (:Person { name: "Dan" })-[:LOVES]->(:Person { name: "Ann" })
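Figure: the CREATE statement annotated. Each (:Person { name: ... }) pattern is a NODE with a LABEL (Person) and a PROPERTY (name); the two nodes, Dan and Ann, are joined by a LOVES relationship.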
“So what does this have to do with NLP?”
“Am I in the wrong talk?”
“I thought this was going to be about text processing….”
Natural Language Processing With Graphs
Uncovering meaning from text using a graph data model.
Representing Text As A Graph
“Nearly all text processing starts by transforming text into vectors.”
- Matt Biddulph, www.hackdiary.com
Representing text as a graph
Text Adjacency Graph
My cat eats fish on Saturday.
Convert to array of words
Iterate with counter variable i, from 0 to number of words - 2
Get or create nodes for the words at index i and i+1
Create a :NEXT relationship between them
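The four steps above can be sketched in plain Python. This is an in-memory stand-in, not the Neo4j code from the talk: the function name `build_adjacency_graph` and the dict-of-sets representation are assumptions made for illustration, with each dict key playing the role of a word node and each set entry a :NEXT relationship.

```python
def build_adjacency_graph(sentence):
    """Return {word: set of words that directly follow it} for one sentence."""
    # Step 1: convert to an array of words (lowercased, punctuation stripped).
    words = [w.strip(".,!?;:").lower() for w in sentence.split()]
    words = [w for w in words if w]

    graph = {}
    # Step 2: i runs from 0 to number of words - 2.
    for i in range(len(words) - 1):
        # Step 3: get or create the nodes for the words at index i and i+1.
        followers = graph.setdefault(words[i], set())
        graph.setdefault(words[i + 1], set())
        # Step 4: record the NEXT relationship from word i to word i+1.
        followers.add(words[i + 1])
    return graph


next_words = build_adjacency_graph("My cat eats fish on Saturday.")
for word, followers in next_words.items():
    print(word, "->", sorted(followers))
```

In a graph database the get-or-create step would typically be a single upsert per word, so repeated words across sentences resolve to the same node.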
Representing A Text Corpus As A Graph
Add followship frequency (how often one word directly follows another)
Add word counts
Query word frequency
Query word pair frequencies (collocation)
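A minimal sketch of the same bookkeeping outside the database, assuming two `Counter` objects stand in for the node and relationship properties. The helper name `build_corpus_graph` and the second example sentence are hypothetical.

```python
from collections import Counter


def build_corpus_graph(sentences):
    """Count words (node property) and adjacent word pairs (relationship property)."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = [w.strip(".,!?;:").lower() for w in sentence.split()]
        words = [w for w in words if w]
        word_counts.update(words)                  # word counts
        pair_counts.update(zip(words, words[1:]))  # followship frequency
    return word_counts, pair_counts


# The second sentence is made up for illustration.
corpus = ["My cat eats fish on Saturday.", "My dog eats turkey on Tuesday."]
word_counts, pair_counts = build_corpus_graph(corpus)

# "Query word frequency" and "query word pair frequencies" become lookups.
print(word_counts.most_common(3))
print(pair_counts.most_common(3))
```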
NLP Tasks
Mining Word Associations
Word Associations
• Paradigmatic
  • words that can be substituted
  • “Monday” <-> “Thursday”
  • “cat” <-> “dog”
• Syntagmatic
  • words that can be combined with each other
  • “cold”, “weather”
  • collocations
Computing Paradigmatic Similarity
1. Represent each word by its context
2. Compute context similarity
3. Words with high context similarity likely have a paradigmatic relation
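The three steps can be sketched as follows. The slides do not name a similarity measure, so this example assumes Jaccard overlap of the left-neighbour and right-neighbour sets, summed; the function names and the second corpus sentence are hypothetical.

```python
def build_contexts(sentences):
    """Step 1: represent each word by the sets of words seen directly left and right of it."""
    left, right = {}, {}
    for sentence in sentences:
        words = [w.strip(".,!?;:").lower() for w in sentence.split()]
        words = [w for w in words if w]
        for i, word in enumerate(words):
            left.setdefault(word, set())
            right.setdefault(word, set())
            if i > 0:
                left[word].add(words[i - 1])
            if i < len(words) - 1:
                right[word].add(words[i + 1])
    return left, right


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def paradigmatic_similarity(w1, w2, left, right):
    """Step 2: context similarity as left-context overlap plus right-context overlap."""
    return jaccard(left[w1], left[w2]) + jaccard(right[w1], right[w2])


corpus = ["My cat eats fish on Saturday.", "My dog eats turkey on Tuesday."]
left, right = build_contexts(corpus)

# Step 3: word pairs with a high score are candidates for a paradigmatic relation.
print(paradigmatic_similarity("cat", "dog", left, right))
print(paradigmatic_similarity("cat", "fish", left, right))
```

On this toy corpus "cat" and "dog" share both neighbours, while "cat" and "fish" share none.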
Paradigmatic Similarity
1. Represent each word by its context
Left1 (the word immediately to the left), Right1 (the word immediately to the right)
Paradigmatic Similarity
2. Compute context similarity
www.lyonwj.com/2015/06/16/nlp-with-neo4j/
Paradigmatic Similarity
3. Find words with high context similarity
CEEAUS corpus
http://earthlab.uoi.gr/theste/index.php/theste/article/viewFile/55/37
Paradigmatic Similarity
Example
http://www.lyonwj.com/2015/06/16/nlp-with-neo4j/
https://github.com/johnymontana/nlp-graph-notebooks
https://class.coursera.org/textanalytics-001
Graph Based Summarization and Keyword Extraction
image credit: https://en.wikipedia.org/wiki/PageRank
https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
https://github.com/summanlp/textrank
Keyword Extraction
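The TextRank paper linked above ranks words by running PageRank over a word graph. The sketch below is a simplified illustration under stated assumptions: the graph is undirected and unweighted, words are linked only when adjacent, and there is no part-of-speech filtering. The function names and the toy word list are hypothetical.

```python
def cooccurrence_graph(words):
    """Link each pair of adjacent, distinct words with an undirected edge."""
    graph = {}
    for a, b in zip(words, words[1:]):
        if a != b:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration over {node: set of neighbours}; every node has a neighbour."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour m shares its rank equally among its own neighbours.
            incoming = sum(rank[m] / len(graph[m]) for m in graph[n])
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


words = ["graph", "model", "graph", "query", "graph", "database"]
rank = pagerank(cooccurrence_graph(words))
keywords = sorted(rank, key=rank.get, reverse=True)
print(keywords[0])
```

Words that co-occur with many other words collect rank from all of them, which is why the highest-ranked nodes are taken as keyword candidates.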
Summarization
Opinion mining
• Summarize major opinions
• Concise and readable
• Major complaints / compliments
http://kavita-ganesan.com/opinosis
1. Graph based representation of review corpus
2. Find and score candidate summaries
3. Select top scoring candidates as summary
Opinion Mining - Example
• Best Buy API
• Product reviews by SKU
Opinion Mining - Example
Find highest ranked paths of 2-5 words
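A rough sketch of finding ranked paths of 2 to 5 words in a word adjacency graph. It scores a path by the sum of its edge counts, which is a deliberately simple stand-in and not the Opinosis scoring function; the review texts and function names are made up for illustration.

```python
from collections import Counter


def build_pair_counts(reviews):
    """Count how often each word directly follows another across all reviews."""
    pair_counts = Counter()
    for review in reviews:
        words = review.lower().split()
        pair_counts.update(zip(words, words[1:]))
    return pair_counts


def ranked_paths(pair_counts, min_len=2, max_len=5):
    """Enumerate simple paths of min_len..max_len words, highest score first."""
    successors = {}
    for (a, b), count in pair_counts.items():
        successors.setdefault(a, []).append((b, count))

    results = []

    def extend(path, score):
        if len(path) >= min_len:
            results.append((score, tuple(path)))
        if len(path) == max_len:
            return
        for nxt, count in successors.get(path[-1], []):
            if nxt not in path:  # do not revisit a word within one path
                extend(path + [nxt], score + count)

    for start in successors:
        extend([start], 0)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(results, key=lambda r: (-r[0], r[1]))


# Hypothetical reviews, not data from the Best Buy API.
reviews = ["easy to use", "easy to read", "easy to use and clean"]
paths = ranked_paths(build_pair_counts(reviews))
best_score, best_path = paths[0]
print(best_score, " ".join(best_path))
```

Summing edge counts favours longer paths, so a real scorer would need some normalization or redundancy handling.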
Opinion Mining - Demo
“Easy to read in sunlight”
“Comfortable great sound quality”
“I love this washer”
“Bought this smart TV for the price”
“Easy to use this vacuum”
Opinion Mining - Demo
• iPython notebook
https://github.com/johnymontana/nlp-graph-notebooks
Content Recommendation
“Networks give structure to the conversation while content mining gives meaning.”
- Preriit Souda
http://breakthroughanalysis.com/2015/10/08/ltapreriitsouda/
Using Data Relationships for Recommendations
Content-based filtering: recommend items based on what users have liked in the past.
Collaborative filtering: predict what users like based on the similarity of their behaviors, activities and preferences to others.
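Figure: two Person nodes and a Movie node. A RATED relationship (rating: 7) links a Person to the Movie, and a SIMILARITY relationship (value: .92) links the two Person nodes.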
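The slides do not say how the SIMILARITY value between two people is computed. One common choice for collaborative filtering is cosine similarity over the movies both people have rated. The sketch below assumes that choice, and the ratings in it are invented.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two {movie: rating} dicts over their shared movies."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)
    norm_a = sqrt(sum(ratings_a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(ratings_b[m] ** 2 for m in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical ratings on a 1-10 scale.
ann = {"Movie A": 7, "Movie B": 3}
dan = {"Movie A": 7, "Movie B": 3, "Movie C": 9}
eve = {"Movie D": 5}

print(cosine_similarity(ann, dan))
print(cosine_similarity(ann, eve))
```

The resulting value could be stored on a SIMILARITY relationship between the two Person nodes, as in the figure.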
The article graph - data model
Building the article graph
• Articles users have shared
• Extract keywords using the newspaper3k Python library
• Insert in the graph
• Scrape additional articles
https://github.com/johnymontana/nlp-graph-notebooks
The article graph - example
What are the keywords of the articles I liked?
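A content-based recommendation can be built on that question: collect the keywords of liked articles, then rank other articles by how strongly they share them. The sketch below assumes keywords have already been extracted and uses plain dicts in place of the article graph; all titles, keywords, and the `recommend` helper are hypothetical.

```python
from collections import Counter


def recommend(liked, candidates, top_n=3):
    """Rank candidate articles by overlap with the keywords of liked articles.

    liked / candidates: {article_title: set of keywords}.
    """
    # "What are the keywords of the articles I liked?"
    liked_keywords = Counter(k for kws in liked.values() for k in kws)

    scores = {}
    for title, keywords in candidates.items():
        if title in liked:
            continue  # do not recommend what the user already liked
        score = sum(liked_keywords[k] for k in keywords)
        if score > 0:
            scores[title] = score
    # Highest score first, ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


liked = {
    "Intro to Neo4j": {"graph", "database", "cypher"},
    "Text mining basics": {"nlp", "text", "graph"},
}
candidates = {
    "Graph algorithms": {"graph", "pagerank"},
    "Cooking pasta": {"pasta", "sauce"},
    "NLP with Cypher": {"nlp", "cypher", "graph"},
}
recommendations = recommend(liked, candidates)
print(recommendations)
```

In the graph version this corresponds to traversing from a user to liked articles, on to their keywords, and back out to other articles that have those keywords.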
Summary
• Property graph model
• Represent text as a graph
• Word associations
• Opinion mining
• Content recommendation
Resources
• http://kavita-ganesan.com/opinosis
• http://jexp.de/blog/2015/01/natural-language-analytics-made-simple-and-visual-with-neo4j/
• https://github.com/johnymontana/nlp-graph-notebooks
Opinion Mining
• “Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions”
  - Kavita Ganesan, ChengXiang Zhai, Jiawei Han, University of Illinois at Urbana-Champaign
• “Multi-sentence compression: Finding shortest paths in word graphs”
  - Katja Filippova. Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10), Beijing, China, Aug 23-27, 2010.