![Page 1: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/1.jpg)
Learning Semantic Context-sensitive Term Associations for Information Retrieval
Tamsin Maxwell, School of Informatics, University of Edinburgh
Dawei Song, School of Computing, The Robert Gordon University
![Page 2: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/2.jpg)
Outline
Motivation
Context-sensitive Information Inference and Semantics
Event Extraction Algorithm
Application in Information Retrieval
![Page 3: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/3.jpg)
Motivation
“Reagan” in different contexts:
T1 = “President Ronald Reagan” → US former president, administration, budget, tax, etc.
T2 = “President Reagan and Iran-Contra affair” → Iran arms sales scandal
T3 = “Reagan and Nakasone” → Japan trade war
![Page 4: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/4.jpg)
Motivation
T2 = “President Reagan and Iran-Contra affair” → Iran arms sales scandal
Information Inference: in the context of “Iran-Contra”, “Reagan” carries (implies) the information “arms sales scandal”.
![Page 5: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/5.jpg)
Context-sensitive Information Inference
Automatic derivation of implicit term associations from text:
Multi-dimensional representation of information
Concept combination
Information flow computation
![Page 6: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/6.jpg)
Multi-dimensional Representation of Information
Hyperspace Analogue to Language (HAL)
Columns: kemp, oppose, president, reagan, stock, tax, urges
kemp 3 5 4 2 1 6
oppose 6 5
president 5 6 4 4 2
reagan 6 5 4
stock 6
tax
urges 4 6 5 3 2
![Page 7: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/7.jpg)
Multi-dimensional Representation of Information
Collection: Reuters-21578
Reagan = < administration: 0.46, bill: 0.07, budget: 0.08, congress: 0.07, economic: 0.05, house: 0.09, officials: 0.05, president: 0.80, reagan: 0.09, senate: 0.05, tax: 0.06, trade: 0.09, veto: 0.08, white: 0.06, … >
![Page 8: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/8.jpg)
Handling Complex Sentences
“…presence on German soil. The Germans, given as they are to romanticism, pacifism and self-absorption, aren't sure whether they will allow American nuclear weapons to remain in Germany much longer.” (WSJ, 1990)
…presence (on) German soil. (The) Germans, given (as they are to) romanticism, pacifism… (words in parentheses are not counted)
Window size: 6; weight = window_size - distance + 1
HAL vector for “Germans”: soil: 6, given: 6, German: 5, romanticism: 5, presence: 4, pacifism: 4
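The weighting rule above can be checked with a minimal sketch. It assumes the words shown in parentheses are simply dropped before distances are measured, and it sums left and right neighbours into one vector to match the single list on the slide (the original HAL model keeps preceding and following contexts apart). The function name `hal_vectors` and the `skip` set are illustrative, and only the quoted fragment is used, so more context would add further neighbours.

```python
from collections import defaultdict

def hal_vectors(tokens, window_size=6, skip=frozenset()):
    """Symmetric HAL vectors: a neighbour at distance d (1 <= d <= window_size)
    adds window_size - d + 1 to both words' vectors. Words in `skip` are
    removed before distances are measured (an assumption for this sketch)."""
    words = [t for t in tokens if t.lower() not in skip]
    vectors = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for d in range(1, window_size + 1):
            if i + d >= len(words):
                break
            weight = window_size - d + 1
            vectors[word][words[i + d]] += weight  # neighbour to the right
            vectors[words[i + d]][word] += weight  # same pair, seen from the other side
    return {w: dict(v) for w, v in vectors.items()}

skip = {"on", "the", "as", "they", "are", "to"}
fragment = "presence on German soil The Germans given as they are to romanticism pacifism".split()
germans = hal_vectors(fragment, window_size=6, skip=skip)["Germans"]
print(sorted(germans.items(), key=lambda kv: -kv[1]))
# [('soil', 6), ('given', 6), ('German', 5), ('romanticism', 5), ('presence', 4), ('pacifism', 4)]
```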
![Page 9: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/9.jpg)
HAL vs Semantic HAL
“The Germans, given as they are to romanticism, pacifism and self-absorption, aren't sure whether they will allow American nuclear weapons to remain in Germany much longer.”
Events:
allow Germans weapons American nuclear
want not Germans missiles
seem Germans
believe Germans
Semantic HAL vector for “Germans”: allow: 6, weapons: 6, want: 6, missiles: 6, seem: 6, believe: 6, American: 5, nuclear: 4
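Semantic HAL applies the same window weighting to extracted events instead of raw word order. The sketch below assumes that windows never cross event boundaries and that the negation marker “not” is kept in the event but not counted as a position. Both assumptions were chosen because they reproduce the weights listed for “Germans”; the authors' exact handling is not stated. The name `semantic_hal` is hypothetical.

```python
from collections import defaultdict

def semantic_hal(events, window_size=6, skip=frozenset({"not"})):
    """HAL weights computed inside each event only; words in `skip` are not counted."""
    vectors = defaultdict(lambda: defaultdict(int))
    for event in events:
        words = [w for w in event if w not in skip]
        for i, word in enumerate(words):
            for d in range(1, window_size + 1):
                if i + d >= len(words):
                    break
                weight = window_size - d + 1
                vectors[word][words[i + d]] += weight
                vectors[words[i + d]][word] += weight
    return {w: dict(v) for w, v in vectors.items()}

events = [
    ["allow", "Germans", "weapons", "American", "nuclear"],
    ["want", "not", "Germans", "missiles"],
    ["seem", "Germans"],
    ["believe", "Germans"],
]
print(semantic_hal(events)["Germans"])
# {'allow': 6, 'weapons': 6, 'American': 5, 'nuclear': 4, 'want': 6, 'missiles': 6, 'seem': 6, 'believe': 6}
```

Compared with plain HAL, “Germans” is now associated with the predicates and arguments of its events (allow, weapons, missiles) rather than with its neighbours in the surface string (romanticism, pacifism).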
![Page 10: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/10.jpg)
Combining Vectors in HAL Space
A more general and flexible way to derive meaning from an arbitrary composition of related terms, not limited to syntactically valid phrases.
Information Flow
![Page 11: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/11.jpg)
Combining Vectors in HAL Space
Concepts are ordered by dominance values (based on IDF)
Scale up the dimensions of the dominant concept
Increase the weights of intersecting dimensions
Add the vectors
Normalize the composition vector and set a threshold to cut off low-weighted dimensions
For more than two concepts, this can be done recursively
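The combination steps above can be sketched as follows. This is a minimal illustration, not the published heuristic: the scaling factor `alpha`, the intersection boost `beta` and the cut-off `threshold` are hypothetical values, and dominance is taken directly from a caller-supplied IDF table. See the Song and Bruza references for the actual reweighting scheme.

```python
import math

def combine_pair(dominant, other, alpha=2.0, beta=1.5, threshold=0.05):
    """Combine two HAL vectors; `dominant` is the higher-ranked concept.
    alpha, beta and threshold are hypothetical parameter values."""
    dom = {p: alpha * w for p, w in dominant.items()}  # scale the dominant concept up
    sub = dict(other)
    for p in dom.keys() & sub.keys():                   # boost intersecting dimensions
        dom[p] *= beta
        sub[p] *= beta
    summed = {p: dom.get(p, 0.0) + sub.get(p, 0.0) for p in dom.keys() | sub.keys()}
    norm = math.sqrt(sum(w * w for w in summed.values()))
    if norm == 0:
        return {}
    # Normalize to unit length, then drop low-weighted dimensions.
    return {p: w / norm for p, w in summed.items() if w / norm >= threshold}

def combine_all(concepts, idf, **params):
    """Order (term, vector) pairs by IDF, then fold them in one at a time."""
    ordered = sorted(concepts, key=lambda c: idf[c[0]], reverse=True)
    result = ordered[0][1]
    for _, vec in ordered[1:]:
        result = combine_pair(result, vec, **params)
    return result

a = {"x": 1.0, "y": 0.5}
b = {"y": 0.5, "z": 1.0}
combined = combine_all([("b", b), ("a", a)], idf={"a": 3.0, "b": 1.0})
print(max(combined, key=combined.get))  # y
```

In the toy example, `y` starts with a lower weight than `x` in the dominant concept but ends up strongest because it is the only dimension shared by both concepts.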
![Page 12: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/12.jpg)
Combining Vectors in HAL Space
Reagan = < administration: 0.46, bill: 0.07, budget: 0.08, congress: 0.07, economic: 0.05, house: 0.09, officials: 0.05, president: 0.80, reagan: 0.09, senate: 0.05, tax: 0.06, trade: 0.09, veto: 0.08, white: 0.06, … >
Iran = < arms: 0.71, attack: 0.18, gulf: 0.21, iran: 0.33, iraq: 0.31, missiles: 0.11, offensive: 0.13, oil: 0.18, reagan: 0.10, sales: 0.20, scandal: 0.25, war: 0.20, … >
Reagan ⊕ Iran = < administration: 0.11, affair: 0.06, arms: 0.72, attack: 0.08, contra: 0.14, deal: 0.08, diversion: 0.07, gulf: 0.11, house: 0.10, initiative: 0.06, iran: 0.22, november: 0.06, policy: 0.07, president: 0.26, profits: 0.08, reagan: 0.23, sales: 0.15, scandal: 0.31, secret: 0.06, senate: 0.06, war: 0.12 >
![Page 13: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/13.jpg)
Combining Vectors in HAL Space with Semantics
Concepts can be ordered by semantic dominance (based on IDF)
Use the modification dictionary in the event parser
Proceed as for normal HAL space
Example: Pred=allow Arg0=they modArg0a=weapons modArg0b=American modArg0c=nuclear
weapons dominates American, which dominates nuclear
![Page 14: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/14.jpg)
HAL-based “information flow”
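$i_1, \ldots, i_n \vdash j$ iff $\mathrm{degree}(c_i \lhd c_j) > \lambda$
Example: $\text{reagan}, \text{iran} \vdash \text{scandal}$
Barwise & Seligman (1997)
Information described by tokens $i_1, \ldots, i_n$ carries information described by $j$, with respect to a given collection, iff the concept $c_i$ combined from $i_1, \ldots, i_n$ is included in the concept $c_j$ to a degree greater than the threshold $\lambda$.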
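The inclusion test can be sketched under one reading of the Song and Bruza papers. Here $\mathrm{degree}(c_i \lhd c_j)$ is taken as the share of $c_i$'s weight on its "quality properties" (dimensions weighted above a cut-off $\mu$) that also appears as a non-zero dimension of $c_j$. The default $\mu$ (the mean weight of the vector) and the value of $\lambda$ are assumptions, and the function names are hypothetical.

```python
def quality_properties(vec, mu=None):
    """Dimensions whose weight exceeds mu (assumed default: the mean weight)."""
    if not vec:
        return set()
    if mu is None:
        mu = sum(vec.values()) / len(vec)
    return {p for p, w in vec.items() if w > mu}

def degree(ci, cj, mu=None):
    """Share of c_i's quality-property weight that is also present in c_j."""
    qp = quality_properties(ci, mu)
    total = sum(ci[p] for p in qp)
    if total == 0:
        return 0.0
    shared = sum(ci[p] for p in qp if cj.get(p, 0.0) > 0)
    return shared / total

def carries(ci, cj, lam=0.5, mu=None):
    """i_1..i_n |- j iff degree(c_i <| c_j) > lambda (lambda value is hypothetical)."""
    return degree(ci, cj, mu) > lam

ci = {"a": 0.6, "b": 0.3, "c": 0.1}
print(degree(ci, {"a": 0.5, "d": 0.5}, mu=0.2))  # about 0.67
print(carries(ci, {"b": 1.0}, mu=0.2))           # False (degree about 0.33)
```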
![Page 15: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/15.jpg)
Event Extraction Algorithm
Preprocessing
Combined syntactic-semantic parsing: semantic role labeling and dependency parsing
Trace the dependency tree from predicates and arguments to identify event structure
Event or modifier pruning
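The tracing step can be illustrated with a toy sketch. Everything here is hypothetical: the `Token` structure, the dependency labels treated as modifiers, the pruning list, and a hand-built parse of the toy clause “Germans allow American nuclear weapons” with one semantic-role frame (predicate at index 1, arguments at indices 0 and 4). A real pipeline would take these from its SRL system and dependency parser. The output order (predicate, argument heads, then their modifiers) mirrors the event “allow Germans weapons American nuclear” shown on the Semantic HAL slide.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    head: int  # index of the syntactic head; -1 marks the root
    dep: str   # dependency label (hypothetical label set)

# Hypothetical labels treated as modifiers of an argument head.
MODIFIER_DEPS = {"amod", "compound", "neg"}
# Hypothetical pruning list: words dropped from extracted events.
PRUNE = {"the", "a", "that"}

def modifiers(tokens, idx):
    """Depth-first collection of modifier dependents of token `idx`."""
    found = []
    for i, tok in enumerate(tokens):
        if tok.head == idx and tok.dep in MODIFIER_DEPS:
            found.append(tok.text)
            found.extend(modifiers(tokens, i))
    return found

def extract_event(tokens, pred_idx, arg_idxs):
    """Predicate first, then argument heads, then their modifiers, after pruning."""
    heads = [tokens[a].text for a in arg_idxs]
    mods = [m for a in arg_idxs for m in modifiers(tokens, a)]
    return [w for w in [tokens[pred_idx].text] + heads + mods if w.lower() not in PRUNE]

sentence = [
    Token("Germans", 1, "nsubj"),
    Token("allow", -1, "root"),
    Token("American", 4, "amod"),
    Token("nuclear", 4, "amod"),
    Token("weapons", 1, "dobj"),
]
print(extract_event(sentence, 1, [0, 4]))
# ['allow', 'Germans', 'weapons', 'American', 'nuclear']
```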
![Page 16: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/16.jpg)
Semantic Role Labeling
![Page 17: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/17.jpg)
SRL for Event Representation
![Page 18: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/18.jpg)
Semantic-Syntactic Parsing
Not all predicates indicate events.
Events are interpreted using dependencies.
![Page 19: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/19.jpg)
Event Extraction
The defendant replied that no City permit was necessary as defendant lands enjoy interjurisdictional immunity…
Extracted events:
replied defendant permit not
replied defendant enjoy lands
![Page 20: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/20.jpg)
Application in Information Retrieval
IR can be viewed as a reasoning process that captures an information transformation
Query expansion: Q → Q′
Use information flow to derive an improved query
![Page 21: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/21.jpg)
Information Flow for Query Expansion
space program ⊢ program:1.00 space:1.00 nasa:0.97 U.S.:0.96 agency:0.95 shuttle:0.95 national:0.95 soviet:0.95 aeronautics:0.87 satellite:0.87 scientists:0.83 flights:0.78 pentagon:0.78
Submit Q as the initial query to a search system
Apply information flow computation to a number (e.g., 30) of pseudo-relevant documents
The top-ranked information flows derived from Q, with their associated weights, form an expanded query
Submit the expanded query back to the retrieval system and evaluate the average precision of the newly retrieved documents
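The expansion loop above can be sketched end to end on toy data. Several simplifications are assumptions. The HAL vectors are a hand-written dictionary standing in for vectors built from the pseudo-relevant documents. The query concept is the normalized sum of its term vectors rather than the dominance-based combination. The cut-off `mu`, the number of flows `k`, and the weight of 1.0 for original terms are hypothetical choices.

```python
import math

def normalize(vec):
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {p: w / norm for p, w in vec.items()} if norm else {}

def degree(ci, cj, mu=0.0):
    """Share of c_i's weight (on dimensions above mu) also present in c_j."""
    qp = [p for p, w in ci.items() if w > mu]
    total = sum(ci[p] for p in qp)
    if total == 0:
        return 0.0
    return sum(ci[p] for p in qp if cj.get(p, 0.0) > 0) / total

def expand_query(query, hal, k=10, mu=0.0):
    """Return {term: weight}: original terms at 1.0 plus the k strongest flows."""
    # Simplification: sum and normalize the query-term vectors.
    summed = {}
    for q in query:
        for p, w in hal.get(q, {}).items():
            summed[p] = summed.get(p, 0.0) + w
    query_concept = normalize(summed)
    flows = {t: degree(query_concept, vec, mu) for t, vec in hal.items() if t not in query}
    top = sorted(flows.items(), key=lambda kv: kv[1], reverse=True)[:k]
    expanded = {q: 1.0 for q in query}
    expanded.update(top)
    return expanded

# Toy HAL vectors standing in for vectors built from ~30 pseudo-relevant documents.
hal = {
    "space":   {"nasa": 1.0, "shuttle": 1.0},
    "program": {"nasa": 1.0, "budget": 1.0},
    "nasa":    {"nasa": 1.0, "shuttle": 1.0, "budget": 1.0},
    "recipe":  {"flour": 1.0},
}
print(expand_query(["space", "program"], hal, k=1))
# {'space': 1.0, 'program': 1.0, 'nasa': 1.0}
```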
![Page 22: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/22.jpg)
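Aspect Hidden Markov Model
[Figure: graphical model linking query subsets $Q_j, Q_{j+1}$ to documents $d_i, d_{i+1}, d_{i+2}$ and a term $w$, with edges labelled $P(Q_j)$, $P(d_i|Q_j)$ and $P(w|d_i,Q_j)$, and output $P(w|Q)$]
$$P(w|Q) = \sum_{Q_j \subseteq Q} \sum_{d_i \in D} P(w|d_i,Q_j)\,P(d_i|Q_j)\,P(Q_j); \quad Q_j = \{q_{j_1}, q_{j_2}, \ldots, q_{j_k}\} \subseteq Q$$
$$P(w|d_i,Q_j) = P(w|Q_j)\,P(w|d_i)$$
$P(w|Q_j)$: information flow; $P(Q_j)$: importance of $Q_j$ in $Q$
Example: $Q$ = {space program} yields the subsets {{space}, {program}, {space program}}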
Huang, Q., and Song, D. (2008) A Latent Variable Model for Query Expansion Using the Hidden Markov Model. ACM 17th Conference on Information and Knowledge Management (CIKM 2008), poster, pp. 1417-1418.
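The mixture above can be evaluated directly once its component distributions are known. In this sketch, every non-empty subset of the query terms is a $Q_j$, following the space program example. All probabilities are hypothetical toy numbers: how the poster estimates $P(w|Q_j)$, $P(d_i|Q_j)$ and $P(Q_j)$ is not shown on the slide, and the function names are illustrative.

```python
from itertools import combinations

def query_subsets(query):
    """All non-empty subsets Q_j of the query terms."""
    return [frozenset(c) for r in range(1, len(query) + 1)
            for c in combinations(query, r)]

def p_w_given_q(w, query, p_w_d, p_w_qj, p_d_qj, p_qj):
    """P(w|Q) = sum_{Q_j} sum_{d_i} P(w|Q_j) P(w|d_i) P(d_i|Q_j) P(Q_j)."""
    total = 0.0
    for qj in query_subsets(query):
        for d, dist in p_w_d.items():
            p_w_d_qj = p_w_qj[qj].get(w, 0.0) * dist.get(w, 0.0)  # P(w|d_i, Q_j)
            total += p_w_d_qj * p_d_qj[qj].get(d, 0.0) * p_qj[qj]
    return total

query = ["space", "program"]
subsets = query_subsets(query)                         # {space}, {program}, {space, program}
p_qj = {s: 1 / len(subsets) for s in subsets}          # hypothetical: uniform P(Q_j)
p_w_d = {"d1": {"nasa": 0.2}, "d2": {"nasa": 0.1}}     # hypothetical P(w|d_i)
p_d_qj = {s: {"d1": 0.6, "d2": 0.4} for s in subsets}  # hypothetical P(d_i|Q_j)
p_w_qj = {                                             # hypothetical P(w|Q_j)
    frozenset({"space"}): {"nasa": 0.5},
    frozenset({"program"}): {"nasa": 0.25},
    frozenset({"space", "program"}): {"nasa": 0.9},
}
print(p_w_given_q("nasa", query, p_w_d, p_w_qj, p_d_qj, p_qj))  # about 0.088
```

By hand: $(0.5 + 0.25 + 0.9) \times (0.2 \cdot 0.6 + 0.1 \cdot 0.4) / 3 = 1.65 \times 0.16 / 3 = 0.088$.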
![Page 23: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/23.jpg)
Evaluation
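Retrieval effectiveness (relative improvement over the baseline in parentheses):

| Collection | Topics | Baseline | Relevance Model | Information Flow | AHMM |
|---|---|---|---|---|---|
| AP89 | 1-50 | 0.1991 | 0.2270 (+14%) | 0.2677 (+34.5%) | 0.2778 (+39.3%) |
| AP88-89 | 101-150 | 0.2338 | 0.3069 (+31.3%) | 0.3193 (+36.6%) | 0.3259 (+39.4%) |
| AP88-89 | 151-200 | 0.3135 | 0.3471 (+10.7%) | 0.3965 (+26.5%) | 0.4081 (+30.2%) |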
![Page 24: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/24.jpg)
Food for Thought
Can incorporation of semantic word dependencies consistently enhance IR precision/performance?
Can they be incorporated into existing IR systems?
![Page 25: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/25.jpg)
References
Dawei Song and Peter Bruza (2001), Discovering Information Flow Using a High Dimensional Conceptual Space. SIGIR 2001: 327-333.
Dawei Song and Peter Bruza (2003), Towards Context Sensitive Information Inference. JASIST 54(4): 321-334.
K. Tamsin Maxwell, Jon Oberlander and Victor Lavrenko (2008), Evaluation of Semantic Events for Legal Case Retrieval. ESAIR 2008: 39-41.
Q. Huang and Dawei Song (2008), A Latent Variable Model for Query Expansion Using the Hidden Markov Model. CIKM 2008 (poster): 1417-1418.
![Page 26: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/26.jpg)
Questions?
Thank you!
![Page 27: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/27.jpg)
Aspect Hidden Markov Model Evaluation

| Collection (size) | Topics | Baseline | Relevance Model | AHMM |
|---|---|---|---|---|
| AP88-90 (730MB) | 151-200 | 0.2077 | 0.2639 (+27.1%) | 0.2806 (+35.1%) |
| ROBUST (1.9GB) | 601-700 | 0.2920 | 0.3143 (+7.1%) | 0.3660 (+25.3%) |
| WT10G (10.9GB) | 501-550 | 0.2032 | 0.2134 (+5%) | 0.2370 (+16.6%) |
![Page 28: Learning Semantic Context-sensitive Term Associations for Information Retrieval](https://reader036.vdocuments.mx/reader036/viewer/2022062409/568150a2550346895dbea3b1/html5/thumbnails/28.jpg)
Sample Legal Query
Query: “What is the liability of the United States under the Federal Tort Claims Act for injuries sustained by employees of an independent contractor working under contract with an agency of the United States government?”
Document: “The DEFENDANT replied that no City permit was necessary because DEFENDANT lands enjoy interjurisdictional immunity as public property within the meaning of STATUTE of the Constitution Act, 1867, or because the management of those lands is vital to the DEFENDANT's federal undertaking pursuant to the federal STATUTE jurisdiction over navigation and shipping.”