![Page 1: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/1.jpg)
Redeeming Relevance for Subject Search in
Citation Indexes
Shannon Bradshaw
The University of Iowa
![Page 2: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/2.jpg)
Citation Indexes
Valuable tools for research
Examples: SCI, CiteSeer, arXiv, CiteBase
Permit traversal of citation networks
Identify significant contributions
Subject search is often the entry point
![Page 3: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/3.jpg)
Subject search
Query similarity
Citation frequency
![Page 4: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/4.jpg)
Citation frequency
PageRank
Example: 2 papers similar in terms of relevance, published at roughly the same time
Paper A cited only by its author
Paper B cited 10 times by other authors
Paper B likely to have greater priority for reading
![Page 5: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/5.jpg)
Problem
Boolean retrieval metrics
Many top documents are not relevant
Effective for Web searches
Any one of several popular pages will do
Not so for users of citation indexes
![Page 6: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/6.jpg)
Reference Directed Indexing (RDI)
Objective: To combine strong measures of both relevance and significance in a single metric
Intuition: The opinions of authors who cite a document effectively distinguish both what a document is about and how important a contribution it makes
Similar to the use of anchor text to index Web documents
![Page 7: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/7.jpg)
Example
Paper by Ron Azuma and Gary Bishop
On tracking the heads of users in augmented reality systems
Head tracking is necessary in order to generate the correct perspective view
![Page 8: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/8.jpg)
A single reference to Azuma
Azuma et al. [2] developed a 6DOF tracking system using linear accelerometers and rate gyroscopes to improve the dynamic registration of an optical beacon ceiling tracker.
![Page 9: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/9.jpg)
Summarizes Azuma paper as…
A six degrees of freedom tracking system
With additional details:
Improves dynamic registration
Optical beacon ceiling tracker
Linear accelerometers
Rate gyroscopes
![Page 10: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/10.jpg)
Leveraging multiple citations
For any document cited more than once…
We can compare the words of all authors
Terms used by many referrers make good index terms for a document
![Page 11: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/11.jpg)
Repeated use of “tracking” and “augmented reality”
Whereas several augmented reality environments are known (cf. State et al. [1], Azuma and Bishop [3])
… e.g. landmark tracking for determining head pose in augmented reality [2, 3, 4, 5]
Azuma and Holloway analyze sources of registration and tracking errors in AR systems [2, 11, 12].
Azuma et al. [2] developed a 6DOF tracking system using linear accelerometers
![Page 12: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/12.jpg)
A voting technique
RDI treats each citing document as a voter
The presence of a query term in referential text is a vote of “yes”
The absence of that term, a “no”
The documents with the most votes for the query terms rank highest
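The voting idea can be sketched in a few lines of Python. This is a minimal illustration only, not Rosetta's implementation: the function name `count_votes`, the whitespace tokenizer, the lowercase query terms, and the toy citation contexts are all assumptions.

```python
from collections import defaultdict


def count_votes(referential_texts, query_terms):
    """Count, per cited document, how many citing texts use each query term.

    referential_texts: dict mapping a cited document id to a list of
    citation-context strings, one per citing document (hypothetical layout).
    query_terms: lowercase single-word terms (an assumption of this sketch).
    """
    votes = defaultdict(int)
    for doc_id, contexts in referential_texts.items():
        for context in contexts:
            words = set(context.lower().split())
            # Each citing document votes "yes" once per query term it uses.
            votes[doc_id] += sum(1 for term in query_terms if term in words)
    # Most votes first; ties broken by document id for a stable order.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Toy data for illustration, loosely based on the citation contexts shown earlier.
contexts = {
    "azuma": [
        "landmark tracking for determining head pose in augmented reality",
        "developed a 6DOF tracking system using linear accelerometers",
    ],
    "other": ["a survey of rendering hardware"],
}
ranking = count_votes(contexts, ["tracking", "augmented"])
print(ranking)
```

A document that is never described with the query terms still appears in the ranking with zero votes; a real system would likely drop such documents.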
![Page 13: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/13.jpg)
Related Work
McBryan – World Wide Web Worm
Brin & Page – Google
Chakrabarti et al. – CLEVER
Mendelzon et al. – TOPIC
Bharat et al. – Hilltop
Craswell et al. – Effective Site Finding
![Page 14: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/14.jpg)
Contributions
Application to scientific literature
“Anchor text” for unrestricted subject search
“Anchor text” for combining measures of relevance and significance
![Page 15: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/15.jpg)
Rosetta
Experimental system in which we implemented RDI
Term weighting metric:
$$w_{id} = \frac{n_{id}}{1 + \log N_i}$$
Ranking metric:
$$s_d = n_d + \sum_{i=1}^{q} w_{id}$$
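As a rough sketch, the two metrics can be read as w_id = n_id / (1 + log N_i) and s_d = n_d + Σ w_id over the q query terms. The slide does not define its symbols, so the meanings used below are assumptions: n_id as the number of citing documents that use term i about document d, N_i as the number of documents term i is used to reference, and n_d as the number of query terms matched. The natural logarithm and the function names are also assumptions.

```python
import math


def term_weight(n_id, N_i):
    """w_id = n_id / (1 + log N_i), with the natural log as an assumption."""
    return n_id / (1.0 + math.log(N_i))


def score(doc_term_counts, term_doc_counts, query_terms):
    """s_d = n_d + sum of w_id over the query terms found for the document.

    doc_term_counts: {term: n_id} for one document d (assumed meaning:
        number of citing documents using the term about d).
    term_doc_counts: {term: N_i} (assumed meaning: number of documents the
        term is used to reference).
    """
    matched = [t for t in query_terms if doc_term_counts.get(t, 0) > 0]
    n_d = len(matched)  # number of query terms matched
    return n_d + sum(
        term_weight(doc_term_counts[t], term_doc_counts[t]) for t in matched
    )


# Hypothetical counts, chosen so that log N_i = 0 and the arithmetic is easy.
s = score({"tracking": 4, "augmented": 2}, {"tracking": 1, "augmented": 1},
          ["tracking", "augmented", "reality"])
```

Any normalization of n_id, which the slide does not show, would change the balance between the two summands.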
![Page 16: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/16.jpg)
![Page 17: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/17.jpg)
Experiments
10,000 research papers
Gathered from CiteSeer
Each document cited at least once
Evaluated:
Retrieval precision
Impact of search results
![Page 18: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/18.jpg)
Comparison system
We compared Rosetta to a traditional content-based retrieval system
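Comparison system uses TFIDF for term weighting:
$$w_{ik} = tf_{ik} \cdot (\log_2 N - \log_2 df_k)$$
And the Cosine ranking metric:
$$\mathrm{COSINE}(Doc_i, Query_j) = \frac{\sum_{k=1}^{t} (TERM_{ik} \cdot QTERM_{jk})}{\sqrt{\sum_{k=1}^{t} (TERM_{ik})^2 \cdot \sum_{k=1}^{t} (QTERM_{jk})^2}}$$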
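For reference, a minimal Python sketch of the baseline, reading the term weight as w_ik = tf_ik · (log2 N − log2 df_k) and the ranking metric as the standard cosine between document and query weight vectors. The function names and the sparse dictionary representation are assumptions made for illustration, not details of the comparison system.

```python
import math


def tfidf_weight(tf, df, n_docs):
    """w_ik = tf_ik * (log2 N - log2 df_k)."""
    return tf * (math.log2(n_docs) - math.log2(df))


def cosine(doc_weights, query_weights):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * query_weights.get(t, 0.0) for t, w in doc_weights.items())
    doc_norm = math.sqrt(sum(w * w for w in doc_weights.values()))
    query_norm = math.sqrt(sum(w * w for w in query_weights.values()))
    if doc_norm == 0.0 or query_norm == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (doc_norm * query_norm)


# Toy values: a term occurring 3 times, found in 2 of 8 documents.
w = tfidf_weight(tf=3, df=2, n_docs=8)
same = cosine({"force": 1.0, "feedback": 2.0}, {"force": 1.0, "feedback": 2.0})
orthogonal = cosine({"force": 1.0}, {"diagrams": 1.0})
```

Identical vectors score close to 1 and vectors with no shared terms score 0, which is why this baseline measures content similarity but carries no signal about a document's impact.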
![Page 19: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/19.jpg)
Indexing
Indexed collection in both Rosetta and the TFIDF/Cosine system
Rosetta indexed documents based on references to them
The TFIDF/Cosine system indexed documents based on words used within them
Required that each document was cited at least once to ensure that both systems indexed the same set of documents
![Page 20: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/20.jpg)
As referential text, Rosetta used CiteSeer’s “contexts of citation”
![Page 22: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/22.jpg)
Queries
32 queries in our test set
Queries were key terms extracted from “Keywords” sections of documents
Queries extracted from sample of 24 documents
Document from which key term was extracted established the topic of interest
![Page 23: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/23.jpg)
Queries
![Page 24: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/24.jpg)
Relevance assessments
The topic of interest for a query was the idea identified by the corresponding key term
Relevant documents directly addressed this same topic
Example:
Query: “force feedback”
Relevant: Work on providing a sense of touch in VR applications or other computer simulations
![Page 25: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/25.jpg)
Retrieval interface
Meta-interface
Queried both systems
Used top 10 search results from each system
Integrated all 20 search results
Presented them in random order
No way to determine the source of a retrieved document
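The blind pooling step can be sketched as follows. This is an illustration under assumptions, not the actual meta-interface: the function name `pool_results` and the system labels "A" and "B" are hypothetical, and merging documents returned by both systems into a single entry is a choice made here that the slide does not specify.

```python
import random


def pool_results(results_a, results_b, k=10, seed=None):
    """Merge the top-k results of two systems and shuffle them.

    Returns the shuffled list shown to the assessor and a separate key
    recording which system(s) returned each document.
    """
    source = {}
    for name, results in (("A", results_a), ("B", results_b)):
        for doc in results[:k]:
            # A document returned by both systems is pooled once (assumption).
            source.setdefault(doc, set()).add(name)
    pooled = list(source)
    random.Random(seed).shuffle(pooled)
    return pooled, source


pooled, source = pool_results(["d1", "d2", "d3"], ["d3", "d4"], k=10, seed=0)
```

Keeping the source key apart from the shuffled list is what lets precision be computed per system after the assessor has judged each document without knowing where it came from.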
![Page 26: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/26.jpg)
Experimental summary
32 queries drawn from document key terms
Document identified the topic of interest
Relevant documents addressed the same topic
Used a meta-search interface
Evaluated top 10 from both systems
Origin of search results hidden
![Page 27: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/27.jpg)
Precision at top 10
On average RDI provided a 16.6% improvement over TFIDF/Cosine
1 or 2 more relevant documents in the top 10
Result is significant
t-test of the mean paired difference
Test statistic = 3.227
Significant at a confidence level of 99.5%
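A paired t statistic of this kind is computed from the per-query differences in precision. The sketch below shows the calculation only; the precision values are made up for illustration and are not the study's data, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(xs, ys):
    """t = mean(d) / (stdev(d) / sqrt(n)) for paired differences d = x - y."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


# Made-up precision-at-10 values for four queries, NOT the study's data.
rdi = [0.6, 0.5, 0.9, 0.4]
baseline = [0.5, 0.3, 0.6, 0.4]
t = paired_t_statistic(rdi, baseline)
```

With the 32 queries used in the study, the statistic would be compared against a t distribution with 31 degrees of freedom.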
![Page 28: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/28.jpg)
Precision at top 10 (cont’d)
[Chart: precision at top 10 (y-axis, 0 to 1) for each query (x-axis), comparing RDI and TFIDF/Cosine]
![Page 29: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/29.jpg)
Many retrieval errors avoided
Example: software architecture diagrams
Most papers about software architecture frequently use the term “diagrams”
Few are about tools for diagramming
TFIDF/Cosine system -- 0/10 relevant
Rosetta -- 4/10 relevant (3 in top 5)
Rosetta made the correct distinction more often
![Page 30: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/30.jpg)
Rosetta Shortcomings
Retrieval metric sorts search results by number of query terms matched
Some authors reuse portions of text in which other documents are cited
![Page 31: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/31.jpg)
Impact of search results
A look at the number of citations to documents retrieved for each query
Compared RDI to a baseline provided by the TFIDF/Cosine system
TFIDF/Cosine includes no measure of impact
Seeking only a measure of the relative impact of documents retrieved by RDI on a given topic
![Page 32: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/32.jpg)
Experiment
For each query…
Calculated the average citations/year for each document
Average publication year: Rosetta – 1994, TFIDF/Cosine – 1995
Found the median number of citations/year for each set of search results
Found the difference between the median for Rosetta and the median for TFIDF/Cosine
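The per-query impact comparison can be sketched as below. The citation counts, publication years, the `current_year` value, and the function names are hypothetical; counting at least one year since publication is also an assumption, made to avoid division by zero.

```python
from statistics import median


def citations_per_year(citations, pub_year, current_year):
    """Average citations per year since publication (at least one year)."""
    return citations / max(current_year - pub_year, 1)


def median_impact_difference(results_a, results_b, current_year):
    """Median citations/year of result set A minus that of result set B.

    Each result is a (total_citations, publication_year) pair.
    """
    rate_a = [citations_per_year(c, y, current_year) for c, y in results_a]
    rate_b = [citations_per_year(c, y, current_year) for c, y in results_b]
    return median(rate_a) - median(rate_b)


# Made-up numbers for illustration only.
rosetta = [(80, 1994), (40, 1993), (10, 1998)]
baseline = [(8, 1995), (2, 1997), (16, 1995)]
diff = median_impact_difference(rosetta, baseline, current_year=2003)
```

Using the median rather than the mean keeps a single very highly cited paper from dominating the figure for a query.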
![Page 33: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/33.jpg)
Difference in impact
On average the median citations/year…
8.9 for Rosetta
1.5 for the baseline
![Page 34: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/34.jpg)
Difference in impact (cont’d)
[Chart: median citations per year (y-axis, 0 to 25) of the search results for each query (x-axis), comparing RDI and TFIDF/Cosine]
![Page 35: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/35.jpg)
Summary of Experiments
Small study – results are tentative
Surpassed retrieval precision of a widely used relevance-based approach
Consistently retrieved documents that have had a significant impact
![Page 36: Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu](https://reader035.vdocuments.mx/reader035/viewer/2022062518/56649e905503460f94b95649/html5/thumbnails/36.jpg)
Future Work
Retrieval metric that eliminates Boolean component
Large scale implementation with CiteSeer data
Studies with more sophisticated relevance-based retrieval systems
Comparison with popularity-based retrieval techniques