Special Topics in Computer Science
The Art of Information Retrieval
Chapter 1: Introduction
Alexander Gelbukh
www.Gelbukh.com
Motivation
Info: representation, storage, organization, access
Search Engines (IR systems)
User information need
o Plain English description -> query
First for libraries, but now: the WWW!
Modern IR:
o modeling
o classification, categorization, filtering
o system architecture
o user interfaces, visualization, query languages
Data vs. Information Retrieval
Data Retrieval
o Precise description
o Well-structured data
o Precise results
o Yes-or-no results
o Science
Information Retrieval
o Vague information need
o Natural language, images, ...
o Semantic interpretation
o Approximate results
o Relevance ranking
o Art!
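The contrast on this slide can be sketched in a few lines of Python. This is an illustrative toy, not material from the slides: the records, the documents, and the word-overlap score are all assumptions made up for the example. Data retrieval answers yes or no for each record, while information retrieval scores every document and returns them in order.

```python
# Toy illustration (all data and the scoring rule are assumptions).

# Data retrieval: a precise condition over well-structured records.
records = [
    {"author": "Smith", "year": 1999},
    {"author": "Jones", "year": 2001},
]

def data_retrieval(records, author):
    """Return exactly the records that satisfy the condition (yes or no)."""
    return [r for r in records if r["author"] == author]

# Information retrieval: a vague need over free text, answered by ranking.
docs = {
    "d1": "fast indexing techniques for text search",
    "d2": "cooking recipes with fresh tomatoes",
    "d3": "text search engines and relevance ranking",
}

def information_retrieval(docs, query):
    """Rank documents by how many query words they share (a crude relevance score)."""
    query_words = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        overlap = len(query_words & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]
```

Calling `information_retrieval(docs, "ranking text search")` returns an ordered list rather than a yes-or-no set, which is the "approximate results, relevance ranking" side of the slide.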
Basic Concepts
User task (search)
o Can formulate what they need: Retrieval (classical)
o Can't (or don't know how): Browsing (new to IR)
Still not very well integrated
o Filtering (user passive, contents active)
Logical view of docs
o ... (Added linguistic info)
o Full text
o Text operations: reduce complexity to index terms
Keywords, stopwords
Stemming, noun groups. Linguistic processing!
o Categories
Top of this list: slow, good
Bottom of this list: fast, bad
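The "text operations" step above, which reduces full text to index terms, can be sketched as follows. This is a minimal sketch under stated assumptions: the stopword list and the suffix rules are invented for the example and are far cruder than a real stemmer such as Porter's.

```python
# Toy text operations (the stopword list and suffix rules are assumptions).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for"}
SUFFIXES = ("ing", "es", "s")  # checked in this order

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Reduce full text to a sorted list of distinct index terms."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    kept = [w for w in words if w and w not in STOPWORDS]
    return sorted({stem(w) for w in kept})
```

The smaller the set of index terms, the faster the later steps become, at the cost of losing detail from the full text. That is the tradeoff the slide points at.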
Past, Present, and Future
Since clay tablets
o Alphabetical index (formal)
o Table of Contents (by order)
o Classifications (by meaning)
Libraries
o Automation of classical techniques. Catalogs.
o Search by fields (author, title, keywords)
Web. Digital Libraries: interactive
o Cheaper -> huge amount of data
o Networks -> remote access, wider audience
o Free publishing -> unprepared, heterogeneous data
Artificial Intelligence and Linguistic methods
Main concerns
Open audience
o Help people to formulate their information need
o Improve retrieval quality. Intelligent methods
Efficiency (speed)
o Development of fast techniques
Interaction
o Watch user behavior to improve quality
o Privacy!
Open content
o Legal issues. Copyright. Responsibility for info quality
o Intelligent methods
Retrieval process
Database
o Define the logical view: text operations, text model
Index (e.g., inverted file)
User query
o Query operations (users are not good at this!)
Retrieved docs
o Ranked by likelihood (relevance)
Feedback cycle
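The index, query, and ranking steps of this process can be sketched in Python. This is a sketch under stated assumptions, not the method of the slides or the book: the three documents are made up, the text operations are reduced to lowercasing and whitespace splitting, and the score is a plain sum of term frequencies. The feedback cycle is not modeled here.

```python
from collections import Counter, defaultdict

def build_inverted_file(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index

def search(index, query):
    """Score documents by summed query-term frequencies and rank them."""
    scores = Counter()
    for term in query.lower().split():
        # .get avoids creating empty entries for unknown terms.
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical document collection for the example.
docs = {
    "d1": "information retrieval ranks documents",
    "d2": "data retrieval returns exact matches",
    "d3": "retrieval of information needs an index of information",
}
index = build_inverted_file(docs)
```

With this data, `search(index, "information retrieval")` orders the documents by score instead of filtering them, so even `d2`, which matches only one query term, is retrieved but ranked last.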
The Book
Topics
o Text IR
o Interfaces
o Multimedia IR
o Applications
We will not consider:
o Parallel and Distributed IR
o Multimedia IR: Models and Languages
o Multimedia IR: Indexing and Searching
Chapters: Text IR
Models and Evaluation
o Modeling (basic concepts)
o Retrieval Evaluation
Improvements on Retrieval
o Query Languages
o Query Operations
o Text Languages and Properties
o Text Operations
Efficiency
o Indexing and Searching
Chapters: Interfaces, Applications
Interfaces
o User Interfaces and Visualization
Applications
o Searching the Web
o Libraries and Bibliographical Systems
o Digital Libraries
Book’s web page
sunsite.dcc.uchile.cl/irbook/
o Errata
o Test data
o Other courses, papers, and a lot more
Korean version is NOT recommended.
Read in English!
Conferences
General conferences on text processing
o ACL
o COLING
o CICLing
o DEXA (databases)
o NLDB
Conferences on IR
o ACM SIGIR
o TREC
o SPIRE
Conclusions
User Information Need
o Vague
o Semantic, not formal
Document Relevance
o Order, not retrieve
Huge amount of information
o Efficiency concerns
o Tradeoffs
Art more than science
Thank you!
Till September 25