WMES3103 INFORMATION RETRIEVAL WEEK 1 AND 2

Posted on 21-Dec-2015

WHAT IS INFORMATION RETRIEVAL?

Information Retrieval (IR)

Lancaster (1968):

"An information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request."

IR: the process of getting/retrieving information.

Now: there is a lot of information, both print and electronic.

Requirement: obtain information quickly and accurately.

IR aims to provide fast, effective and efficient methods of representing, managing, searching, retrieving and presenting such information.

IR = the representation, storage and organization of, and access to, information items.

Computer science perspective: design and build a large-scale system that will store, manipulate, retrieve and display electronic information of any kind.

Text, audio, images and graphics are stored in such a way that they are available for interaction with humans or machines.

Library and information perspective: search features (au = author, ti = title, su = subject, keywords) and the relevance of retrieved items.

Examples of IRS

Three challenges for IR researchers and practitioners

Technical challenge: what tools should IR systems provide to allow effective and efficient manipulation of information within such diverse media as text, image, video and audio?

Interaction challenge: what features should IR systems provide in order to support a wide variety of users in their search for relevant information?

Evaluation challenge: how can we evaluate which tools and features are effective and usable, given the increasing diversity of end-users and information-seeking situations?

Three basic areas of research

Content analysis: describing the contents of the documents in a form suitable for computer processing.

Information structures: exploiting relationships between documents to improve the efficiency and effectiveness of retrieval strategies.

Evaluation: measuring the effectiveness of retrieval.
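Precision and recall are two widely used measures of retrieval effectiveness, although this slide does not name them. The sketch below assumes a single query for which we already know which documents are relevant; the function name `precision_recall` and the document IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs of the documents the system returned
    relevant:  IDs of the documents judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgements: 4 documents returned, 2 of them relevant,
# out of 5 relevant documents in the whole collection.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7", "d8", "d9"])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

Precision asks how much of what was retrieved is useful; recall asks how much of what is useful was retrieved. A system can usually raise one at the cost of the other.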

Information Retrieval System

Information Retrieval System = IRS

Before: index documents and retrieve them, e.g. the OPAC of a library (cataloguing).

Now: modelling, document classification and categorization, system architecture, user interface, data visualization, filtering languages, e.g. the WWW.

Basic Information Retrieval Process

1. Question, or full description of the user's information need.

2. Translate it into a query or keywords that summarize the description of the user's information need.

3. The query is processed by a search engine or IRS.

4. The IRS retrieves information that is useful/relevant to the user.
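The four steps of the basic process can be made concrete with a minimal sketch. It assumes a hypothetical three-document collection and a hand-picked list of filler words; the question is reduced to keywords, and a document is retrieved only if it contains every keyword. Real systems match far more flexibly, so treat this only as an illustration of the flow.

```python
# Hypothetical toy collection standing in for the documents held by an IRS.
COLLECTION = {
    "d1": "Marine organisms of the Great Barrier Reef",
    "d2": "A history of Australian Aborigines",
    "d3": "Deep sea marine life and coral organisms",
}

# Assumed list of filler words dropped when turning a question into keywords.
FILLER = {"i", "want", "information", "about", "on", "the", "of", "a", "and", "find"}


def to_query(question):
    """Step 2: translate the user's question into a set of keywords."""
    words = question.lower().replace("?", " ").split()
    return {w for w in words if w not in FILLER}


def search(query, collection):
    """Steps 3-4: return IDs of documents that contain every query keyword."""
    results = []
    for doc_id, text in collection.items():
        if query <= set(text.lower().split()):  # all keywords present
            results.append(doc_id)
    return results


need = "I want information about marine organisms"  # step 1: the information need
query = to_query(need)
print(sorted(query))               # ['marine', 'organisms']
print(search(query, COLLECTION))   # ['d1', 'd3']
```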

Basic Concepts in Information Retrieval

User task

Logical view of documents

User Task

A user has to translate his information needs into a query in the language provided by the system, i.e. specify a set of words.

English language statement:

I want a book by J. K. Rowling titled The Chamber of Secrets

Query entered in a computer system:

Au = Rowling

Ti = Chamber of Secrets

"Chamber of Secrets"

Rowling AND Stone

Au rowling ti chamber of secrets ti stone
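The query forms above (field search, quoted phrase, Boolean AND) can be imitated over a few hypothetical catalogue records. This sketch assumes each record has only author (`au`) and title (`ti`) fields, matches whole words case-insensitively, and treats a quoted phrase as an exact substring; the records and helper names are made up for illustration and do not reflect any particular OPAC.

```python
# Hypothetical catalogue records with author (au) and title (ti) fields only.
RECORDS = [
    {"id": 1, "au": "Rowling, J. K.", "ti": "Harry Potter and the Chamber of Secrets"},
    {"id": 2, "au": "Rowling, J. K.", "ti": "Harry Potter and the Philosopher's Stone"},
    {"id": 3, "au": "Tolkien, J. R. R.", "ti": "The Hobbit"},
]


def words(text):
    """Lower-case a field value and split it into words, ignoring commas and dots."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def field_search(field, term):
    """Au = Rowling style: the term must appear as a word in the named field."""
    return {r["id"] for r in RECORDS if term.lower() in words(r[field])}


def phrase_search(phrase):
    """Quoted-phrase style: the exact phrase must appear in the author or title."""
    p = phrase.lower()
    return {r["id"] for r in RECORDS if p in r["au"].lower() or p in r["ti"].lower()}


def keyword_search(term):
    """Plain keyword: the term may appear in any field."""
    return field_search("au", term) | field_search("ti", term)


print(sorted(field_search("au", "Rowling")))                        # [1, 2]
print(sorted(phrase_search("Chamber of Secrets")))                 # [1]
print(sorted(keyword_search("Rowling") & keyword_search("Stone")))  # Rowling AND Stone: [2]
```

Boolean AND is simply set intersection of the matching record IDs; OR would be set union.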

Two User Tasks

There are two user tasks: browsing and retrieval.

Browsing: the process of retrieving information whereby the main objective is not clearly defined from the beginning, and whose purpose might change during the interaction with the system.

E.g. a user searches the Internet for information about marine organisms, then looks for information about Australian aborigines. The user is said to be browsing in the collection, not searching.

E.g. looking for a book along the library shelves.

Retrieval: the process of retrieving information whereby the main objective is clearly defined from the onset of the searching process, e.g. searching for a particular book on the library shelves.

Two actions when a user interacts with an IRS

Two actions can be identified when a user interacts with an IRS: pulling and pushing.

Pulling action: the user requests information in an interactive way, e.g. browsing and retrieval.

Pushing action: information is pushed towards the user periodically through the use of specified or specially designed software; this is also known as filtering.

Examples: the Yahoo Messenger service, which alerts the user each time a new message arrives; an online stock exchange.

Interaction of the user with the IRS through distinct tasks

(Diagram: the USER reaches the DB through two distinct tasks, IR (retrieval) and Browsing.)

Logical View of Documents

Documents in a collection are represented by a set of index terms or keywords. The representation may be based on:

Keywords

Abstract

Full text

Indexing process: documents go through an indexing process in which keywords are either assigned by humans or extracted from the text of the document. The resulting keywords/subject headings are the logical view of the document.

LISANET: search by abstract

MJLIS: e-journal

If full text:

Each word in the text is a keyword.

This is the most complex form, and it is expensive.

If the full text is too large, there are mechanisms built into the IRS to reduce the number of keywords:

Logical view of documents (continued)

1. Stop words: removal of words such as articles and connectives (a, the, an, and, of, etc.)

2. Stemming (reducing distinct words to their common grammatical root), e.g. diary** will find diary or diaries

3. Truncation, e.g. catalog* will retrieve catalog, catalogs, catalogue, catalogues

4. Noun words (eliminating adjectives, adverbs and verbs), e.g. run will represent runs, running

5. Compression

Together, steps 1 to 5 make up the conversion process.
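Three of these operations (stop-word removal, stemming and truncation) can be sketched in a few lines. The stop-word list and the suffix rules below are assumptions chosen for illustration; `crude_stem` is a deliberately rough stand-in, not the Porter stemmer or any stemmer a real IRS uses.

```python
# Assumed, deliberately short stop-word list (real systems use much longer ones).
STOP_WORDS = {"a", "an", "the", "and", "of", "in", "to", "is"}


def crude_stem(word):
    """Very rough suffix stripping, NOT the Porter stemmer: e.g. 'diary' and
    'diaries' both become 'diari'. Keeps at least three letters of the word."""
    for suffix, replacement in (("ies", "i"), ("y", "i"), ("ing", ""), ("s", "")):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


def index_terms(text):
    """Full text -> index terms: tokenise, remove stop words, then stem."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return [crude_stem(t) for t in tokens if t and t not in STOP_WORDS]


def truncation_match(pattern, vocabulary):
    """Right truncation such as catalog*: keep every word starting with the stem."""
    prefix = pattern.rstrip("*").lower()
    return [w for w in vocabulary if w.lower().startswith(prefix)]


print(index_terms("The diary and the diaries of a cataloguer"))
# ['diari', 'diari', 'cataloguer']
print(truncation_match("catalog*", ["catalog", "catalogs", "catalogue", "cat", "dialog"]))
# ['catalog', 'catalogs', 'catalogue']
```

Stemming works on the index side (both forms are stored as one term), while truncation works on the query side (one pattern expands to many stored words).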

Logical view of documents (continued)

This conversion process is known as text operation or transformation.

It reduces the complexity of the document representation and allows the logical view to shift from that of a full text to a set of index terms.

On the other hand, human-assigned keywords provide the most concise logical view of a document, but might lead to retrieval of poor quality because of different interpretations, and a limited set of keywords if a thesaurus is used.

Two modes of retrieval

Ad hoc: the documents in the IRS remain static but new queries are submitted to the system, e.g. a CD-ROM database.

Filtering: the queries remain relatively static but new documents come into the IRS, e.g. the stock market.

Filtering

Construct a user profile that reflects the user's preferences; the profile is matched against incoming documents to find a match, or hit.

Retrieve only documents of interest to the user, as specified in the user profile.

The user selects relevant documents from the list. Filtered documents can also be ranked to further assist the user in judging relevance.

Construction of a user profile: the user provides the necessary keywords, or the system collects information about the user's preferences and uses it to construct the user profile dynamically.
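A minimal sketch of filtering, assuming the user profile is a hypothetical set of weighted keywords and that a document's score is simply the sum of the weights of the profile keywords it contains. The profile stays fixed while new documents arrive (the filtering mode), and hits at or above an assumed threshold are returned ranked, highest score first.

```python
# Hypothetical user profile: keywords supplied by the user, with assumed weights.
PROFILE = {"stock": 2.0, "shares": 1.5, "dividend": 1.0}


def score(profile, text):
    """Sum the weights of the profile keywords that occur in the document."""
    words = set(text.lower().split())
    return sum(weight for term, weight in profile.items() if term in words)


def filter_stream(profile, incoming, threshold=1.0):
    """Match each incoming document against the fixed profile and return the
    IDs of the hits, highest score first."""
    hits = []
    for doc_id, text in incoming:
        s = score(profile, text)
        if s >= threshold:  # a match, or "hit"
            hits.append((s, doc_id))
    hits.sort(key=lambda h: h[0], reverse=True)
    return [doc_id for _, doc_id in hits]


incoming = [
    ("n1", "Stock and shares rise after dividend news"),
    ("n2", "Weather forecast for the weekend"),
    ("n3", "Analysts expect a higher dividend"),
]
print(filter_stream(PROFILE, incoming))  # ['n1', 'n3']
```

Raising `threshold` makes the filter stricter; a dynamically built profile would update the weights from the user's behaviour instead of fixing them by hand.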

INFORMATION RETRIEVAL PROCESS

A. DEFINE TEXT DATABASE

The text database has to be defined before the retrieval process begins.

This is done by the database manager, who specifies the documents to be used, the operations to be performed on the text, and the text model.

The original documents are transformed into a logical view of the documents via the various text operations.

The database manager then builds the index of the text, either manually or computer-generated.

The retrieval system is tested.
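A minimal sketch of the computer-generated indexing step. It assumes the chosen text operations are just lower-casing, punctuation stripping and stop-word removal, and that the index is an inverted index mapping each term to the IDs of the documents containing it. This is one common structure; the slides do not prescribe a specific one.

```python
from collections import defaultdict

# Assumed stop-word list standing in for the database manager's chosen text operations.
STOP_WORDS = {"a", "an", "the", "and", "of", "in", "to"}


def text_operations(text):
    """Turn a document into its logical view: lower-cased terms, no stop words."""
    tokens = (t.strip(".,;:!?").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def build_index(documents):
    """Inverted index: each term points to the sorted IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text_operations(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "Information retrieval in the library",
    2: "The library catalogue and the OPAC",
    3: "Retrieval of images and audio",
}
index = build_index(docs)
print(index["library"])    # [1, 2]
print(index["retrieval"])  # [1, 3]
```

Testing the system can start with simple checks like these: a term known to occur in certain documents should point to exactly those documents.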

B. RETRIEVAL PROCESS

The IRS can be used once the document database has been indexed.

The user presents his question/user need to the IRS.

The question is converted into a logical view, like that of the documents, via the text operations.

The query operation then presents this to the system in a form the system can understand.

The query is processed to obtain the retrieved documents.

Retrieval process (continued)

The retrieved documents are ranked according to relevance.

The retrieved documents are sent to the user.

The user looks through the ranked documents and can modify the question/user need/query via the user feedback cycle.

The same process is repeated.
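The ranking and user feedback cycle can be sketched as follows. The relevance score here (how often the query terms occur in a document) and the feedback rule (add the terms of a document the user marks as relevant) are simple assumptions chosen for illustration, not the ranking method of any particular IRS. Note that the same text operation is applied to the question and to the documents.

```python
from collections import Counter

# Assumed stop-word list, applied to documents and questions alike.
STOP_WORDS = {"a", "an", "the", "and", "of", "in", "for", "on"}


def terms(text):
    """Text operation: lower-case, strip punctuation, drop stop words."""
    tokens = (w.strip(".,?").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def rank(query_terms, docs):
    """Score each document by how often the query terms occur in it (a simple
    assumed relevance measure) and return matching IDs, best first."""
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(terms(text))
        s = sum(counts[t] for t in query_terms)
        if s > 0:
            scores[doc_id] = s
    return sorted(scores, key=scores.get, reverse=True)


def feedback(query_terms, relevant_text):
    """User feedback cycle: expand the query with terms from a document the
    user marked as relevant."""
    return set(query_terms) | set(terms(relevant_text))


docs = {
    "d1": "Marine organisms and coral reefs",
    "d2": "Coral reefs of Australia",
    "d3": "Marine law and shipping",
}
query = set(terms("marine organisms"))
print(rank(query, docs))                # ['d1', 'd3']
query = feedback(query, docs["d1"])     # the user marks d1 as relevant
print(rank(query, docs))                # ['d1', 'd2', 'd3']
```

After one round of feedback, d2 (which shares "coral" and "reefs" with the document the user liked) enters the ranked list even though it matched none of the original keywords.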

DEVELOPMENT

For the past 4000 years, people have been organizing information for retrieval and use.

It started out with a table of contents for a book.

Then the amount of information extended over a number of books.

A specialized data structure is needed to ensure faster access to the stored information.

The oldest and most popular form of data structure for fast IR is a collection of words or concepts with associated pointers to the related information = INDEX.

Previously, indexes were built manually.

Development (continued)

Now, with the advent of computers, large indexes can be generated automatically. These automatic indexes provide the logical view of the document as perceived by the system, not by the user.

Two different views of the IR problem:

Computer-centered: building efficient indexes, processing user queries with high performance, and developing ranking algorithms that improve the quality of the answer set.

Human-centered: studying the behavior of the user, understanding his main needs, and determining how such understanding affects the organization and operation of the IRS.

IR in the Library

Libraries were the first users of IR systems to retrieve information.

These systems were usually developed by academic institutions and later by commercial vendors.

1st generation: automation of the card catalog, allowing searches based on author and title.

2nd generation: increased search functionality, such as searching by subject headings, keywords and complex queries (OPAC).

3rd generation: graphical interfaces, electronic forms, hypertext features, open system architecture (Digital Libraries).

The Web and Digital Libraries

Search engines on the web still use indexes similar to the ones used by libraries years ago.

So, what has changed? Advances in computer technology have led to:

Cheaper access to various sources of information

Greater access to networks due to advances in all kinds of digital communication

Freedom to post information on the web

Problems

People still find it difficult to retrieve information relevant to their information needs from the web.

Issues to address:

The dynamic world of the web

Demand for access and quick response

The quality of the retrieval task is affected by user interaction with the system

THANK YOU