Information Retrieval – and projects we have done
![Page 1: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/1.jpg)
Information Retrieval – and projects we have done
Group Members:
Aditya Tiwari (08005036)
Harshit Mittal (08005032)
Rohit Kumar Saraf (08005040)
Vinay Surana (08005031)
Guided by Prof. Pushpak Bhattacharyya
![Page 2: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/2.jpg)
Motivation
The Web, documents and encyclopedias all hold a tremendous amount of data and information. As published, that information serves only the intent of the creator or collector of the data.
However, the same data and information can have other uses as well. The need is to mine the right information from the data and use it appropriately.
![Page 3: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/3.jpg)
Information Retrieval
![Page 4: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/4.jpg)
Applications
Web search: Google, Yahoo
Querying/QA systems like Watson (developed by IBM)
Spam filtering
Automatic summarization
Cross-lingual retrieval
en.wikipedia.org/wiki/Information_retrieval_applications
![Page 5: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/5.jpg)
Information Retrieval
IR is the area of study concerned with searching for documents and for metadata about documents, as well as with searching relational databases and the WWW.
The data objects that are collected can be images, documents, videos, mind maps or music.
en.wikipedia.org/wiki/Information_Retrieval
![Page 6: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/6.jpg)
Wiki Mind Mapping
Harshit Mittal (IIT-B)
Aditya Tiwari (IIT-B)
Akhil Bhiwal (VIT University)
![Page 7: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/7.jpg)
Project Idea
Represent textual information in a graphical form that is easier to understand and more intuitive to read. The visual representation should be able to summarize the text.
![Page 8: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/8.jpg)
Research Goal
Use of phrases to represent semantic information.
Hierarchical representation of the information in a given text.
![Page 9: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/9.jpg)
Mind maps
A mind map is a diagram used to represent words, ideas, tasks, or other items linked to and arranged around a central key word or idea.
An example mind map is on the next slide.
http://en.wikipedia.org/wiki/Mind_maps
![Page 10: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/10.jpg)
Mind map
http://www.spicynodes.org/blog/2010/05/21/stuff-we-like-climate-change-mind-maps/
![Page 11: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/11.jpg)
What’s the difficult part?
We can’t put the information from an article into a mind map as it is. That would make the mind map incoherent and clumsy.
Phrase extraction
General rules of grammar don’t apply here.
![Page 12: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/12.jpg)
Possible Solution
Develop new linguistic rules for representing text in visual form.
Use existing summarization tools to generate a summary and try to represent that summary in a mind map.
![Page 13: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/13.jpg)
How we did it
Pulling out the article section-wise from the Wikipedia page.
Parsing each section sentence-wise using the Stanford parser.
Extracting “relevant” phrases using Tregex (another Stanford tool).
Putting these phrases into a mind map, section-wise.
http://nlp.stanford.edu/software/tregex.shtml
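The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the `== Heading ==` section markup, the naive sentence splitter, and the names `split_sections`, `split_sentences` and `build_mind_map` are all hypothetical, and the phrase extractor is passed in as a plain function standing in for the Stanford parser plus Tregex step.

```python
import re
from typing import Callable, Dict, List


def split_sections(article: str) -> Dict[str, str]:
    """Split wiki-style text on '== Heading ==' lines into {heading: body}.

    Text before the first heading is filed under 'Introduction' (an assumption).
    """
    sections: Dict[str, str] = {}
    current = "Introduction"
    for line in article.splitlines():
        stripped = line.strip()
        match = re.fullmatch(r"==\s*(.+?)\s*==", stripped)
        if match:
            current = match.group(1)
            sections.setdefault(current, "")
        elif stripped:
            sections[current] = (sections.get(current, "") + " " + stripped).strip()
    return sections


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_mind_map(
    title: str,
    article: str,
    extract_phrases: Callable[[str], List[str]],
) -> dict:
    """Build a three-level tree: article title -> section -> extracted phrase."""
    root = {"label": title, "children": []}
    for heading, body in split_sections(article).items():
        section_node = {"label": heading, "children": []}
        for sentence in split_sentences(body):
            for phrase in extract_phrases(sentence):
                section_node["children"].append({"label": phrase, "children": []})
        root["children"].append(section_node)
    return root
```

Passing the extractor in as a function keeps the section and sentence handling independent of whichever parser and pattern matcher produce the phrases.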
![Page 14: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/14.jpg)
Extraction of relevant information
Identifying the subtrees of a sentence's parse tree that are important.
This was done using a few heuristics like:
◦ Presence of a superlative adjective in a noun phrase
http://nlp.stanford.edu/software/tregex.shtml
![Page 15: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/15.jpg)
Extraction of relevant information
◦ Presence of a cardinal number in a noun phrase
http://nlp.stanford.edu/software/tregex.shtml
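The two noun-phrase heuristics (a superlative adjective or a cardinal number inside a noun phrase) can be illustrated without the Stanford tools. The sketch below is an assumption-laden stand-in: it reads a Penn Treebank style bracketed parse and looks for `NP` nodes with a direct `JJS` or `CD` child, roughly what Tregex patterns such as `NP < JJS` and `NP < CD` would match. The slides do not give the actual patterns used, and the parse strings in the usage note are hand-written, not parser output.

```python
from typing import Iterator, List, Tuple

# A tree node is (label, children); a leaf word is a plain str.
Node = Tuple[str, list]


def parse_tree(text: str) -> Node:
    """Read one bracketed parse such as '(NP (DT the) (NN river))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children: list = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return read()


def subtrees(tree: Node) -> Iterator[Node]:
    """Yield the node itself and every labelled node below it."""
    yield tree
    for child in tree[1]:
        if not isinstance(child, str):
            yield from subtrees(child)


def leaves(tree: Node) -> List[str]:
    """Return the words under a node, left to right."""
    words: List[str] = []
    for child in tree[1]:
        words.extend([child] if isinstance(child, str) else leaves(child))
    return words


def noun_phrases_with(tree: Node, tag: str) -> List[str]:
    """Text of every NP that has a direct child labelled `tag` (e.g. JJS, CD)."""
    hits = []
    for node in subtrees(tree):
        label, children = node
        if label == "NP" and any(
            not isinstance(c, str) and c[0] == tag for c in children
        ):
            hits.append(" ".join(leaves(node)))
    return hits
```

For example, with the hand-written parse `(S (NP (NNP India)) (VP (VBZ is) (NP (DT the) (JJS largest) (NN democracy))) (. .))`, calling `noun_phrases_with(parse_tree(...), "JJS")` returns `["the largest democracy"]`.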
![Page 16: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/16.jpg)
Extraction of relevant information
◦ Matching a verb against the bag of verbs considered relevant for a particular section of an article. For example, for the history section, verbs like find, discover, settle and decline were considered “more useful”, whereas verbs like derive and deduce were considered useful for some other section.
![Page 17: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/17.jpg)
Extraction of relevant information
Ex: The name India is derived from Indus.
http://nlp.stanford.edu/software/tregex.shtml
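The verb heuristic amounts to a per-section lookup. In the sketch below the history verbs come from the slides, while the section name `etymology`, the small lemma table, and the function name `relevant_verbs_in` are assumptions made purely for illustration. The input is a list of (word, POS tag) pairs, which a parser would normally supply.

```python
from typing import Dict, List, Set, Tuple

# Hand-coded bags of verbs per section. "history" follows the slides;
# "etymology" is a hypothetical home for derive/deduce.
SECTION_VERBS: Dict[str, Set[str]] = {
    "history": {"find", "discover", "settle", "decline"},
    "etymology": {"derive", "deduce"},
}

# Tiny illustrative lemma table; a real system would use a lemmatizer.
LEMMAS: Dict[str, str] = {
    "found": "find",
    "discovered": "discover",
    "settled": "settle",
    "declined": "decline",
    "derived": "derive",
    "deduced": "deduce",
}


def relevant_verbs_in(section: str, tagged: List[Tuple[str, str]]) -> List[str]:
    """Return lemmas of verbs in the sentence that belong to the section's bag."""
    bag = SECTION_VERBS.get(section.lower(), set())
    hits = []
    for word, tag in tagged:
        if tag.startswith("VB"):  # Penn Treebank verb tags: VB, VBD, VBN, ...
            lemma = LEMMAS.get(word.lower(), word.lower())
            if lemma in bag:
                hits.append(lemma)
    return hits


# The example sentence from the slide, tagged by hand.
EXAMPLE = [
    ("The", "DT"), ("name", "NN"), ("India", "NNP"), ("is", "VBZ"),
    ("derived", "VBN"), ("from", "IN"), ("Indus", "NNP"), (".", "."),
]
```

Under these assumptions the example sentence is picked up when it appears in the hypothetical etymology section but not in the history section, since "derive" is only in the former bag.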
![Page 18: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/18.jpg)
Code Generated Mind Map
![Page 19: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/19.jpg)
Evaluation
http://en.wikipedia.org/wiki/Precision_and_recall
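For reference, the standard definitions of precision and recall can be written in a few lines. This is a generic sketch: the slides do not say how the sets of extracted and relevant items were built from the survey, so the function name and the sample data in the usage note are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = |extracted & relevant| / |extracted|,
    recall = |extracted & relevant| / |relevant|.

    An empty denominator yields 0.0 rather than raising.
    """
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall
```

With made-up data, four extracted phrases of which three are among ten relevant ones give a precision of 3/4 and a recall of 3/10.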
![Page 20: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/20.jpg)
Evaluation
Survey based:
Asking a person to generate 10 questions from a given article.
Asking another person to answer those questions with the help of the mind map.
Repeating the same exercise in reverse for another article.
![Page 21: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/21.jpg)
Observations
Pros:
◦ Extraction of the right information with high accuracy.
◦ The concept of phrase extraction works well.
◦ High precision values were obtained (between 0.5 and 0.75).
![Page 22: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/22.jpg)
Observations
Cons:
◦ Information presented in a mind map of low depth is clumsy.
◦ Low recall values (0.2 to 0.4).
◦ Linking of node phrases with their apt descriptions.
◦ Heuristics defining “important phrases” need to be refined.
![Page 23: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/23.jpg)
Limitations
The bags of words and the Tregex expressions are hand-coded instead of machine learned.
Garbage phrases are being generated.
The level of hierarchy is limited to 3.
![Page 24: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/24.jpg)
Future work
Using machine learning to determine the important keywords for a given sentence.
Exploring the possibility of finding patterns in subtree expressions using a machine-learned approach.
Refinement of generated phrases.
![Page 25: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/25.jpg)
References
http://en.wikipedia.org/wiki/Mind_maps
http://en.wikipedia.org/wiki/Precision_and_recall
Tool: Stanford Parser and Stanford Tregex Match http://nlp.stanford.edu/software/tregex.shtml
![Page 26: Information Retrieval – and projects we have done](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813d17550346895da6d496/html5/thumbnails/26.jpg)
Vision Based Attribute Segmentation from lists in Web Pages
by Rohit Kumar Saraf