Entity Queries
Seminar by Pankaj Vanwari Under guidance of Dr. S. Sudarshan
Overview of Presentation
- Introduction to Entity Queries
- Keyword search on structured data
- Querying over unstructured data
- Entity queries using ontology-based extraction
- Entity-relationship queries
- Conclusion and future work
Entity Queries by Pankaj Vanwari under guidance of Dr. S. Sudarshan
Introduction
- Query on database using keyword search
- Restricted to retrieving pages/documents
- Entity search on the World Wide Web
- Annotations and semantic links to text
- Wikipedia, WordNet, etc. as sources
- Entity near queries, indexing and ranking
- Entity-relationship search to find relationships between the entities
Keyword search over graph-structured data
- Simple searching and browsing of data.
- The user types a few keywords and then follows the hyperlinks interactively.
- The database is modeled as a graph.
- Uses proximity-based ranking, based on foreign key and other similar links.
- Useful for searching an enterprise database for information without a query language.
BANKS (Browsing ANd Keyword Searching)
- RDB tuples constitute the nodes of the graph.
- Each foreign key to primary key link is a directed edge (to avoid “hubs”).
- A link with higher importance is given a lower weight.
- The query result is a rooted directed tree.
- A backward edge (v, u) is given a weight based on the number of links to v from nodes of the same type as u.
Formal database model of BANKS
- $s(R(u), R(v))$ denotes the similarity between the relations $R(u)$ and $R(v)$ of nodes u and v.
- If edge (u, v) exists but (v, u) does not, then $w(u, v) = s(R(u), R(v))$.
- If (u, v) does not exist but (v, u) does, then $w(u, v) = IN_v(u) \cdot s(R(v), R(u))$, where $IN_v(u)$ is the number of links into u from nodes of the same type as v.
- If both exist, the weight is the minimum of the two values above.
- The overall relevance score is obtained from the normalized edge and node scores.
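The weight rules above can be sketched in Python. This is a minimal illustration, not the BANKS implementation: the data layout (a set of forward edges plus a `relation_of` map) and the constant similarity of 1.0 are assumptions made for the example.

```python
from collections import defaultdict

def edge_weights(forward_edges, relation_of, similarity=lambda r1, r2: 1.0):
    """Compute w(u, v) for every forward and backward edge.

    forward_edges: set of (u, v) pairs, one per foreign key -> primary key link.
    relation_of:   maps each node (tuple id) to its relation name, R(u).
    similarity:    s(R(u), R(v)); the constant 1.0 default is an assumption.
    """
    # in_count[(u, rel)] = number of links into u from nodes of relation rel,
    # i.e. IN_v(u) for any v with R(v) == rel.
    in_count = defaultdict(int)
    for src, dst in forward_edges:
        in_count[(dst, relation_of[src])] += 1

    weights = {}
    for a, b in forward_edges:
        for u, v in ((a, b), (b, a)):
            candidates = []
            if (u, v) in forward_edges:  # forward rule
                candidates.append(similarity(relation_of[u], relation_of[v]))
            if (v, u) in forward_edges:  # backward rule
                candidates.append(in_count[(u, relation_of[v])]
                                  * similarity(relation_of[v], relation_of[u]))
            weights[(u, v)] = min(candidates)  # minimum when both rules apply
    return weights

# Hypothetical data: three paper tuples referencing one author tuple.
edges = {("p1", "a1"), ("p2", "a1"), ("p3", "a1")}
relation_of = {"p1": "Paper", "p2": "Paper", "p3": "Paper", "a1": "Author"}
w = edge_weights(edges, relation_of)
print(w[("p1", "a1")], w[("a1", "p1")])
```

In this toy graph the forward edges keep weight 1.0, while each backward edge out of the author node costs 3.0, so paths through a heavily referenced node become more expensive.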
Querying over unstructured data
- The World Wide Web supports keyword search but not entity search.
- Entities are treated as first-class citizens, as opposed to pages.
- Web documents carry no schema information to browse, unlike in BANKS.
- Statistics from a large corpus, with scoring and ranking from IR, can be useful.
- Challenges: indexing and annotations.
CSAW
- Scaling entity search to the World Wide Web
- Major components: Catalog, Corpus and Query Processor.
- Data model of CSAW
- Indexes used in the CSAW system: the stem and full atype indexes, the reachability index and the forward index.
- Scoring in CSAW: selector energy, gap and decay, and aggregation.
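To make the three scoring ingredients concrete, here is a rough sketch of how selector energy, gap-based decay and aggregation could fit together. The exponential decay, the `decay_rate` parameter and the sum over selectors are assumptions chosen for illustration; the slides do not specify the exact functions CSAW uses.

```python
import math

def proximity_score(candidate_pos, selector_hits, energy, decay_rate=0.1):
    """Score one candidate entity mention at token offset candidate_pos.

    selector_hits: list of (keyword, token_offset) matches in the same document.
    energy:        maps keyword -> selector energy (for example an IDF-like value).
    decay_rate:    hypothetical parameter of an exponential decay in the gap.
    """
    best = {}
    for keyword, pos in selector_hits:
        gap = abs(pos - candidate_pos)  # token distance to the candidate
        contribution = energy[keyword] * math.exp(-decay_rate * gap)
        # Keep only the strongest occurrence of each selector.
        best[keyword] = max(best.get(keyword, 0.0), contribution)
    # Aggregation step: sum over distinct selectors (an assumption).
    return sum(best.values())

# Hypothetical keyword matches and energies.
hits = [("invent", 3), ("television", 10)]
energy = {"invent": 2.0, "television": 3.0}
near = proximity_score(5, hits, energy)
far = proximity_score(50, hits, energy)
```

A candidate close to the matched keywords (`near`) scores higher than one far away (`far`), which is the behaviour the gap and decay terms are meant to capture.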
Entity Search with Dual-Inversion Index
- Dual-inversion index: a document-inverted index and an entity-inverted index.
- Document-inverted index: given an entity type E, maps to the documents where entities of type E occur.
- Entity-inverted index: takes keywords as input and returns entity instances as output.
- Comparison of document-inverted and entity-inverted indices.
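The two indexes can be contrasted with a small in-memory sketch. The token format (plain strings for keywords, `(type, instance)` tuples for annotated entities), the `window` parameter and the sample document are all assumptions for illustration, not the layout of the actual system.

```python
from collections import defaultdict

def build_indexes(docs, window=3):
    """docs: {doc_id: list of tokens}. A keyword token is a plain string;
    an entity token is a tuple (entity_type, instance)."""
    # key (keyword or entity type) -> [(doc_id, position, instance or None)]
    doc_inverted = defaultdict(list)
    # keyword -> {(entity_type, instance)} seen within `window` tokens of it
    entity_inverted = defaultdict(set)
    for doc_id, tokens in docs.items():
        for pos, tok in enumerate(tokens):
            if isinstance(tok, tuple):
                etype, instance = tok
                doc_inverted[etype].append((doc_id, pos, instance))
                lo, hi = max(0, pos - window), pos + window + 1
                for near in tokens[lo:hi]:
                    if isinstance(near, str):
                        entity_inverted[near].add((etype, instance))
            else:
                doc_inverted[tok].append((doc_id, pos, None))
    return doc_inverted, entity_inverted

# Hypothetical annotated document; the phone number is a placeholder.
docs = {"d1": ["amazon", "customer", "service", ("phone", "555-0100")]}
doc_inv, ent_inv = build_indexes(docs)
```

With the document-inverted index, a query still has to intersect keyword and entity-type postings at query time; with the entity-inverted index, `ent_inv["amazon"]` already yields candidate entity instances directly.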
Entity Rank (Searching directly and holistically)
- Integrates both local and global information in ranking.
- Example query: ow(amazon customer service #phone)
- Entity search needs to be contextual, holistic, uncertain, associative, and discriminative.
- Three-layer model: Access (Global), Recognition (Local) and Validation (Hypothesis Testing).
Entity queries using ontology-based extraction
- Knowledge representation models such as RDFS, with a general-purpose ontology on top of these representations.
- Two ways of extracting knowledge structures automatically from text corpora: NLP/machine learning or human annotations.
- YAGO, YAGO2 and ESTER are all based on the second approach, with differences.
YAGO (Yet Another Great Ontology)
- YAGO combines Wikipedia categories with the WordNet ontology.
- Extracts facts based on fixed relations.
- A fact is a triple with a fact identifier: $y : I \to (I \cup C \cup R) \times R \times (I \cup C \cup R)$, where I is the set of fact identifiers, C the set of common entities and R the set of relation names.
- Compatible with RDF.
- Relations: Type, SubClassOf, Means, …
- Other relations: BornInYear, PoliticianOf, …
- Meta relations: Describes, Context, …
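Because fact identifiers may themselves appear as subject or object, facts about facts can be stored in the same structure. The sketch below shows this idea; the `FactStore` class, the `#n` identifier scheme, the `FoundIn` relation name and the `some_wikipedia_page` value are illustrative assumptions rather than part of the slides.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)

class FactStore:
    """Maps fact identifiers to triples, mirroring
    y : I -> (I ∪ C ∪ R) x R x (I ∪ C ∪ R)."""

    def __init__(self) -> None:
        self.facts: Dict[str, Fact] = {}

    def add(self, subject: str, relation: str, obj: str) -> str:
        fact_id = f"#{len(self.facts) + 1}"  # hypothetical identifier scheme
        self.facts[fact_id] = (subject, relation, obj)
        return fact_id

    def about(self, subject: str) -> List[Tuple[str, Fact]]:
        """All (identifier, fact) pairs whose subject is the given entity or fact id."""
        return [(fid, f) for fid, f in self.facts.items() if f[0] == subject]

store = FactStore()
f1 = store.add("AlbertEinstein", "BornInYear", "1879")
# A fact identifier used as the subject of another fact (reification).
# "FoundIn" is an illustrative relation name not listed on the slide.
f2 = store.add(f1, "FoundIn", "some_wikipedia_page")
```

Calling `store.about(f1)` returns the provenance fact attached to the first triple, which is the kind of statement a plain subject-relation-object model without identifiers cannot express directly.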
YAGO2 (extension of YAGO)
- Focus on temporal and spatial knowledge.
- Declarative rules stored in text files.
Temporal dimension
- A fact can hold only a time point; a time span is represented by two relations.
- 4 entity types (people, groups, artifacts and events)
- 9 relations generalized to 2 relations (StartsExistingOn and EndsExistingOn).
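The idea that a time span is assembled from two point-valued relations can be sketched as follows. The entity `ExampleBand`, its dates and the helper names are hypothetical; treating a missing bound as open-ended is also an assumption.

```python
from datetime import date

# Hypothetical facts about a hypothetical group; each fact holds one time point.
facts = [
    ("ExampleBand", "StartsExistingOn", date(1990, 1, 1)),
    ("ExampleBand", "EndsExistingOn", date(2000, 12, 31)),
]

def existence_span(entity, facts):
    """Return (start, end) assembled from the two time-point relations."""
    start = end = None
    for subject, relation, value in facts:
        if subject != entity:
            continue
        if relation == "StartsExistingOn":
            start = value
        elif relation == "EndsExistingOn":
            end = value
    return start, end

def existed_on(entity, day, facts):
    """True if day falls inside the span; a missing bound is treated as open."""
    start, end = existence_span(entity, facts)
    return (start is None or start <= day) and (end is None or day <= end)

span = existence_span("ExampleBand", facts)
```

With these facts, `existed_on("ExampleBand", date(1995, 6, 1), facts)` holds while the same check for a day in 2005 does not.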
YAGO2 continued…
Spatial dimension
- Harvests geo-entities from two sources: Wikipedia and GeoNames.
- The class yagoGeoEntity groups all geo-entities, which are related by hasGeoCoordinates to yagoGeoCoordinates.
- 3 entity types (events, groups and artifacts).
- 2 relations generalized to placedIn.
- The relation occursIn holds between a fact and a geo-entity.
ESTER (Efficient Search on Text, Entities and Relations)
- Combined full-text and ontology search system.
- Input is a corpus and an ontology.
- Three components: an entity recognizer, a query engine, and a user interface.
- Entity recognition adds, at position 0, the artificial word <c>:<x> for each top-level category c of which x is an instance.
- For a fact (x, r, y) from YAGO, the following artificial words are added: at position 1, <r>:<p>, and at position p, entity:<y>.
ESTER continued…
- The query engine produces lists of word-in-document occurrences; each item consists of a document id, a word id, a score, and a position within the document.
- Two basic operations: prefix search and join.
- Given two occurrence lists produced by prefix search, the join operation computes a single list of all items whose word ids occur in both lists, sorted by document id.
- Proactive interface to the user.
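The two basic operations described above can be sketched over small in-memory lists. The `Occurrence` record follows the four fields named on the slide; the sorted-vocabulary layout, the use of the vocabulary index as word id, and the sample data are assumptions, and the real system's index structure is not shown in the slides.

```python
from bisect import bisect_left
from typing import Dict, List, NamedTuple

class Occurrence(NamedTuple):
    doc_id: int
    word_id: int
    score: float
    position: int

def prefix_search(vocab: List[str], postings: Dict[int, List[Occurrence]],
                  prefix: str) -> List[Occurrence]:
    """vocab is a sorted word list; a word's id is its index in vocab."""
    lo = bisect_left(vocab, prefix)
    hi = bisect_left(vocab, prefix + "\uffff")  # just past the last word with this prefix
    hits = [occ for wid in range(lo, hi) for occ in postings.get(wid, [])]
    return sorted(hits, key=lambda o: (o.doc_id, o.position))

def join(a: List[Occurrence], b: List[Occurrence]) -> List[Occurrence]:
    """Keep items whose word id occurs in both lists, sorted by document id."""
    common = {o.word_id for o in a} & {o.word_id for o in b}
    merged = [o for o in a + b if o.word_id in common]
    return sorted(merged, key=lambda o: (o.doc_id, o.position))

# Hypothetical vocabulary and postings.
vocab = ["entity:john_lennon", "entity:paul_mccartney", "singer", "song"]
postings = {
    0: [Occurrence(1, 0, 1.0, 4)],
    1: [Occurrence(2, 1, 1.0, 7)],
    2: [Occurrence(1, 2, 0.5, 5)],
}
entities = prefix_search(vocab, postings, "entity:")
other = [Occurrence(5, 0, 1.0, 2)]  # word 0 also occurs in document 5
joined = join(entities, other)
```

Here `entities` contains the occurrences of both `entity:` words, and `joined` keeps only the items for word id 0, the one word present in both lists.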
Entity-relationship queries over the annotated web
- Example query: “Find cities and countries in Europe where cities are capitals of respective countries”.
- ERQ handles relationships among entities across several pages.
- High algorithmic complexity.
- Scoring entities individually and aggregating the scores.
WikiERQ: SSQ (Shallow Semantic Queries)
- ERQ directly over text
- Example query: “Find cities and countries in Europe where cities are capitals of respective countries”.
- Position-based BCM for ranking answers. Key components: proximity, ordering and mutual exclusion.
- Single predicate scoring
- Multiple predicate scoring
WikiBANKS
- The extended graph model combines the graph model of BANKS with the document model.
- Each Wikipedia page/document is represented by a node in the graph.
- Near query model: find C near (K)
- Query evaluation algorithm: evaluate the selection predicates individually as near queries, then use the resulting entity lists to evaluate the relation predicates (2 approaches).
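The two-phase evaluation described above can be illustrated with a naive nested-loop sketch for the capital-city example query. It is not claimed to be either of the two approaches mentioned on the slide: the function name, the score values, the `related` callback and the product used to combine scores are all assumptions.

```python
def evaluate_erq(cities, countries, related):
    """cities / countries: {entity: score} produced by the two near queries.
    related(a, b): score for the relation predicate; 0 means no evidence."""
    answers = []
    for city, city_score in cities.items():
        for country, country_score in countries.items():
            relation_score = related(city, country)
            if relation_score > 0:
                # Combine individual scores by product (an assumption).
                answers.append((city, country,
                                city_score * country_score * relation_score))
    return sorted(answers, key=lambda t: t[2], reverse=True)

# Hypothetical near-query results and relation evidence.
cities = {"Paris": 1.0, "Berlin": 0.5, "Munich": 0.7}
countries = {"France": 1.0, "Germany": 1.0}
evidence = {("Paris", "France"): 0.9, ("Berlin", "Germany"): 0.8}
result = evaluate_erq(cities, countries,
                      lambda a, b: evidence.get((a, b), 0.0))
```

Only pairs with evidence for the relation predicate survive, so Munich is dropped even though it scored well as a city on its own.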
WikiCSAW
- ERQ over the highly scalable CSAW system.
- Queries in a master-slave configuration
- Category keyword mapping.
- Optimizing ERQ over CSAW:
  - Entity-type and keyword pair postings to improve the merge step.
  - Compound Token-AND iterator.
- Scoring based on entity, relation and node prestige, with weights.
Conclusion and Future Work
- Challenges faced by the different approaches.
- Enterprises can add artificial words to link other pages (manually or by defining rules).
- Integration of data through standards like RDF.
- Domain-centric concept search to handle scalability.
- Ontology-based mapping of user keywords to domains for higher accuracy.
- Need for annotation of relations.
- Complex operations for ad hoc queries.
Questions?
Thank You