hypertext
Web-Based Information Retrieval
Patrick Alfred Waluchio Ongwen
Knowledge Management Officer
African Research and Resource Forum
Web as Agent of Change
"ICT is not an end in itself or an agent of change by itself but when incorporated into a well managed change process it is a powerful enabler and amplifier."
Bryn Jones,
Effective use of the Web therefore requires content, content management, hyperlinks and navigation tools, and retrieval approaches, among others.
Introduction
A critical goal of successful information retrieval on the web is to identify which pages are of high quality and relevance to a user’s query.
Each search engine indexes web pages, representing each page by a set of weighted keywords.
A crawler (robot or spider) traverses the web with the goal of fetching high-quality pages for indexing and retrieval.
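The traversal a crawler performs can be sketched as a breadth-first walk over links. This is a minimal illustration only: `TOY_WEB` is a hypothetical in-memory link graph standing in for real pages, and `crawl` and `max_pages` are names assumed for this example. A production crawler would also fetch pages over the network, respect robots.txt, and prioritise pages by quality.

```python
from collections import deque

# Hypothetical in-memory "web": each page name maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "research"],
    "about": ["home"],
    "research": ["paper1", "paper2", "home"],
    "paper1": [],
    "paper2": ["paper1"],
}

def crawl(seed, web, max_pages=10):
    """Breadth-first traversal from `seed`; returns pages in fetch order."""
    fetched = []
    seen = {seed}
    queue = deque([seed])
    while queue and len(fetched) < max_pages:
        page = queue.popleft()
        fetched.append(page)        # a real crawler would download and index here
        for link in web.get(page, []):
            if link not in seen:    # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return fetched

order = crawl("home", TOY_WEB)
```

The `seen` set is what keeps the crawler from looping forever on pages that link back to each other.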
Search engines must filter the most relevant information matching a user’s query and present retrieved information in a way a user will understand.
Hyperlinks provide a valuable source of information for web retrieval.
Hypertext and hypermedia enable searching the web in a non-sequential manner.
Challenges of Web Information Retrieval
Managing a huge number of hyperlinked pages
Crawling the web to find appropriate web sites to index
Accessing documents
Measuring the quality or authority of available information
Hypertext
A non-linear arrangement of textual material is called hypertext.
The term “hyper” means extension to other dimensions.
Converting text into a multidimensional space
The term was invented by Ted Nelson in 1965.
Hypertext: “non-sequential writing” (Nelson, T. 1987. Literary Machines).
Non-linear sequences of information (dictionary, encyclopaedia, newspaper)
Hypertext systems manage collections of information that can be accessed non-sequentially.
Consists of a network of nodes and logical links between nodes
A node refers to a chunk of content or a web page
The variety of nodes and links makes hypertext a flexible structure in which information is provided both by what is stored in the nodes and by the links to each node.
Hypertext retrieval systems are products of an emerging technology that offers an alternative approach to retrieving information from the web.
Hypertext and Non-Linearity
Allows a user to follow their own path in a non-sequential manner to access information
Hypertext is not usually read linearly
Links encourage branching off
History and back button permit backtracking
The immediacy of following links by clicking creates a different experience from traditional non-linearity
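Backtracking through history can be pictured as a stack of visited nodes. The sketch below is illustrative only; the `History` class and its method names are assumptions for this example, not the design of any particular browser.

```python
class History:
    """Minimal back-button model: a stack of previously visited nodes."""

    def __init__(self, start):
        self.current = start
        self.back_stack = []

    def follow(self, target):
        # Following a link pushes the current node so we can return to it.
        self.back_stack.append(self.current)
        self.current = target

    def back(self):
        # Backtrack one step; stay put if there is nothing to go back to.
        if self.back_stack:
            self.current = self.back_stack.pop()
        return self.current
```

For example, after following links from "home" to "a" and then to "b", calling `back()` twice returns to "a" and then to "home".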
Structure of Hypertext
In a hypertext system, objects in a database, called nodes, are connected to one another by machine-supported links. Users follow links to access information.
Text augmented with links
Link: pointer to another piece of text in the same or a different document
Hypertext systems can be accessed:
By selecting links or icons and following links from node to node
By searching the database for some keyword in the normal way
Use of a browser to view and navigate hypertext
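The two access routes described above, following links and searching by keyword, can be illustrated with a tiny node store. Everything here is a hypothetical sketch: the `Node` fields, the sample nodes, and the helper names `follow_links` and `keyword_search` are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    text: str
    links: list = field(default_factory=list)  # names of the nodes this node points to

# Hypothetical three-node hypertext database, keyed by node name.
STORE = {
    n.name: n
    for n in [
        Node("home", "Welcome to the hypertext course", ["history", "structure"]),
        Node("history", "Ted Nelson coined the term hypertext", ["home"]),
        Node("structure", "Nodes are connected by links", ["home", "history"]),
    ]
}

def follow_links(store, name):
    """Access by navigation: return the names of nodes reachable in one click."""
    return list(store[name].links)

def keyword_search(store, keyword):
    """Access by search: return names of nodes whose text contains the keyword."""
    needle = keyword.lower()
    return sorted(n.name for n in store.values() if needle in n.text.lower())
```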
Hierarchical Structure
Hierarchy is the basis of almost all websites as well as hypertext
Hierarchies are orderly and provide ample navigational freedom
Users start at the home page, descend the branch that most interests them, and continue making further choices as the branch divides
Web-like Structures
Relatively unsystematic and difficult to navigate
Mostly used in short stories and other fiction, in which artistic considerations may override the desire for efficient navigation
Multipath Structures
Largely linear and to some extent hierarchical, but they offer alternative pathways, hence the name multipath structures
Hypertext Component
Hypertext model:
The run-time layer, which controls the user interface
The storage layer, which is a database containing a network of nodes connected by links
The within-component layer, which is the content structure inside the node
Hypermedia
(Hypermedia = Hypertext + Multimedia)
Hypermedia integrates text, images, video, graphics, and sound within a web page or node
Hyper-representation of textual and non-textual information in a non-sequential manner
Allows embedding bitmapped images (GIF, JPEG, PNG)
History of Hypertext
1945: Vannevar Bush describes “memex” (Atlantic Monthly)
1965: Ted Nelson coins the term “hypertext”
1985: Peter Brown, University of Kent, develops first commercially available hypertext - Guide
1986-1990: More sophisticated hypertext systems developed
1991: Tim Berners-Lee builds IP-based distributed hypertext system at CERN
Develops UDI/URI, HTTP, and HTML…
1993: Mosaic, the first widely used graphical Web browser, released
2002: Work begins on Semantic Web
Hypertext and Hypermedia in Information Retrieval
Browsing – retrieve information by association
Follow links, backtrack
Maintain history, bookmarks
Searching – retrieve information by content
Construct indexes of URLs
Search by keyword/description of page
Hypertext and hypermedia, together with new standards such as HTML and XHTML, have brought a tremendous revolution in the creation and delivery of content, as well as in the access and processing of information.
Allows users to navigate within or across a range of documents from several computer networks.
Allows browsers and other software to interpret and process information for different purposes
Search engines use the links among pages to select information resources from the Internet.
Google uses link data to rank pages in order of their relevance to a query.
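One publicly described way to turn link data into a ranking is PageRank-style link analysis, in which a page scores highly when other well-scored pages link to it. The slides do not name a specific algorithm, so the following is a simplified sketch under that assumption. The link graph `LINKS`, the damping value 0.85, and the fixed iteration count are illustrative choices, and the function expects every link target to appear as a key.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a dict: page -> outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: "c" is linked from three pages, "d" from none.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(LINKS)
best = max(ranks, key=ranks.get)
```

In this toy graph the page with the most incoming links ends up ranked first, and the page nobody links to ends up last.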
Underlying Principles and Challenges of IR
Information retrieval is a complex task
A query-based IR system must be able to accept a query about any topic and find texts that contain the information the query specifies.
IR systems are required to operate in real time, which demands that they be fast and efficient.
Most searches are conducted on natural language text, which is inherently ambiguous and imprecise.
The following are some of the challenges IR systems face in natural language processing:
Synonyms
Synonymy occurs when different words or phrases mean essentially the same thing.
For example, the words “finance”, “fund”, and “support” may be related depending on the context of the inquiry.
Natural language is filled with many words and phrases that have similar meanings, and it is often impossible for users to provide all the words which might be relevant to the query.
To address this problem, some IR systems expand the query to include the synonyms of a given word with the help of a thesaurus.
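Thesaurus-based query expansion can be sketched as a simple lookup. The `THESAURUS` entries below are a hypothetical toy table built from the example words on this slide, and `expand_query` is a name assumed for illustration. A real system would use a curated thesaurus and would usually weight added terms lower than the original ones.

```python
# Hypothetical thesaurus: each word maps to words treated as synonyms.
THESAURUS = {
    "finance": ["fund", "support"],
    "fund": ["finance", "support"],
}

def expand_query(terms, thesaurus):
    """Return the query terms plus their synonyms, without duplicates, in order."""
    expanded = []
    for term in terms:
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

expanded = expand_query(["finance", "research"], THESAURUS)
```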
Polysemy
Polysemy occurs when a single word has more than one meaning. For example, the word “shot” can have the following meanings:
A shooting, as in: He shot at a tiger.
An attempt, as in: I took a shot at playing.
A photograph, as in: He took a nice shot.
Phrases in Information Retrieval
Expressions consisting of multiple words often have a meaning that is substantially different from the meaning of the individual words.
The phrase “Artificial Intelligence” is different from the individual words “Artificial” and “Intelligence”, and “Operating System” is different from “Operating” and “System”.
One method for phrase-based indexing is to use proximity measures to specify the acceptable distance between the words, for example with the WITH or NEAR operators.
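A proximity test in the spirit of a NEAR operator can be sketched as a check on word positions. This is an illustrative assumption about how such an operator might behave: the function name `near`, the `max_distance` parameter, and the choice to ignore word order are not taken from any specific search system.

```python
import re

def near(text, first, second, max_distance=1):
    """True if `first` and `second` occur within `max_distance` words of each other."""
    words = re.findall(r"[a-z]+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )
```

With `max_distance=1` the two words must be adjacent, which approximates phrase matching; larger values loosen the match.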
Object Recognition
Certain types of information require special procedures to identify them. For example, dates come in various forms, such as July 3, 2001, 3.7.2001, and 7.3.2001 (American system). Once recognised, such values can be compared using greater-than and less-than logic (<, >).
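Recognising the date forms listed above might look like the sketch below, which normalises each form to a single date value. The function name `parse_date` and the `american` flag are assumptions for illustration. Note that a numeric form such as 3.7.2001 is ambiguous on its own, so the caller has to say which convention applies, and the month-name format assumes an English locale.

```python
from datetime import date, datetime

def parse_date(text, american=False):
    """Try the known date forms in turn; return a date, or None if none match."""
    numeric_format = "%m.%d.%Y" if american else "%d.%m.%Y"
    for fmt in ("%B %d, %Y", numeric_format):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # this form did not match, try the next one
    return None
```

All three forms from the slide then resolve to the same value, which is what makes later comparison possible.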
Semantics and Role Relationships
Some information can only be identified through semantics.
Suppose a user is interested in finding out the names of lecturers teaching courses in the area of “Artificial Intelligence”.
First, the system must know which courses are related to Artificial Intelligence. These can be AI, Fuzzy Logic, Genetic Algorithms, Neural Networks, and Machine Learning.
These should then be linked to the lecturers allocated to the courses.
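The two-step lookup described here, from area to courses and then from courses to lecturers, can be sketched with two small tables. The course list comes from the slide; the lecturer names and the "Databases" entry are hypothetical placeholders, as is the function name `lecturers_for_area`.

```python
# Step 1 knowledge: which courses belong to which area (from the slide).
COURSES_BY_AREA = {
    "Artificial Intelligence": [
        "AI", "Fuzzy Logic", "Genetic Algorithms", "Neural Networks", "Machine Learning",
    ],
}

# Step 2 knowledge: who teaches each course (hypothetical placeholder names).
LECTURER_BY_COURSE = {
    "AI": "Lecturer A",
    "Fuzzy Logic": "Lecturer B",
    "Neural Networks": "Lecturer A",
    "Databases": "Lecturer C",
}

def lecturers_for_area(area):
    """Resolve an area to its courses, then the courses to their lecturers."""
    courses = COURSES_BY_AREA.get(area, [])
    return sorted({LECTURER_BY_COURSE[c] for c in courses if c in LECTURER_BY_COURSE})
```

A plain keyword match on "Artificial Intelligence" would miss the Fuzzy Logic and Neural Networks lecturers, which is the point of the semantic step.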
Computable Values
Determining whether information is relevant sometimes depends on a specific calculation.
Suppose a user is interested in newspaper articles about corporate mergers that occurred after January 1995.
The IR system must identify the documents using the search logic operators LESS THAN (<) or GREATER THAN (>).
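The merger example reduces to a greater-than comparison on dates. In the sketch below the article titles and dates are invented sample data, and "after January 1995" is read as after 31 January 1995, which is an assumption about the intended cutoff.

```python
from datetime import date

# Hypothetical sample records: (title, date the merger occurred).
ARTICLES = [
    ("Merger A announced", date(1994, 11, 2)),
    ("Merger B completed", date(1995, 1, 15)),
    ("Merger C completed", date(1995, 6, 30)),
    ("Merger D announced", date(1996, 3, 1)),
]

def occurred_after(articles, cutoff):
    """Keep the titles whose date is strictly GREATER THAN the cutoff."""
    return [title for title, when in articles if when > cutoff]

matches = occurred_after(ARTICLES, date(1995, 1, 31))
```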
Text Representation Techniques
The purpose of an IR system is to search the text database for relevant documents in real time.
Consequently, the text database is preprocessed and stored in a structure that supports fast searching.
This preprocessed form is called the text representation.
Inverted File Approach
The inverted file approach is used in text representation. It allows an IR system to quickly determine what documents contain a given set of words, and how often each word appears in the document.
In an inverted file system, each database contains two files:
Text file: the normal form in which documents appear in a database
Inverted file: contains all index terms drawn automatically from the document records
Provides indirect file access
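An inverted file can be sketched as a mapping from each term to the documents containing it, together with how often it appears in each. The sample documents and the function names below are assumptions for illustration, and the tokenisation (lowercase runs of letters and digits) is deliberately simple.

```python
import re
from collections import defaultdict

def build_inverted_file(documents):
    """documents: dict of doc_id -> text. Returns term -> {doc_id: occurrence count}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def docs_containing_all(index, terms):
    """Return the sorted ids of documents that contain every term in `terms`."""
    postings = [set(index.get(term.lower(), {})) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

# Hypothetical text file: three short documents keyed by id.
DOCS = {
    1: "Hypertext links connect nodes",
    2: "Search engines index web pages",
    3: "Web links help search engines rank web pages",
}
index = build_inverted_file(DOCS)
```

Answering a query then means intersecting short posting lists rather than scanning every document, which is what makes the lookup fast.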
Using Probability Methods
All IR systems draw conclusions about the content of a document by examining a representation of the source.
An IR system must base its conclusions on document features, such as the presence or absence of particular words or phrases.
Because these features are only uncertain evidence of relevance, an IR system must take that uncertainty into account to determine how strongly a document is relevant to a particular request.
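The slides do not name a particular probabilistic model. One classical choice that matches the "presence or absence" description is the binary independence model, where each query term present in a document adds a weight of log((N - n + 0.5) / (n + 0.5)), with N the collection size and n the number of documents containing the term. The sketch below assumes that model; the collection statistics and documents are invented sample values.

```python
import math

def term_weight(num_docs, docs_with_term):
    """Rare terms get high weights; very common terms get low or negative ones."""
    return math.log((num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))

def score(query_terms, doc_terms, doc_freq, num_docs):
    """Sum the weights of the query terms that are present in the document."""
    return sum(
        term_weight(num_docs, doc_freq[t])
        for t in query_terms
        if t in doc_terms and t in doc_freq
    )

# Hypothetical collection statistics: 10 documents in total.
NUM_DOCS = 10
DOC_FREQ = {"hypertext": 2, "retrieval": 1, "web": 6}
DOCS = {
    "d1": {"hypertext", "web"},
    "d2": {"web"},
    "d3": {"hypertext", "retrieval"},
}

query = ["hypertext", "retrieval"]
ranking = sorted(DOCS, key=lambda d: score(query, DOCS[d], DOC_FREQ, NUM_DOCS), reverse=True)
```

The score is a strength of evidence rather than a yes/no answer, so documents can be presented in decreasing order of estimated relevance.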
Relevance feedback
Relevance feedback is a technique used by some IR systems to improve performance on a query by asking the user for feedback about the retrieved texts.
Evaluation forms can also be given to users to seek their views on the performance of information retrieval systems.
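A well-known way to use such feedback is Rocchio's method, which moves the query towards documents the user marked relevant and away from those marked non-relevant. The slides do not specify a method, so this is a sketch under that assumption. Queries and documents are represented as term-weight dictionaries, and the values of `alpha`, `beta`, and `gamma` are commonly quoted defaults rather than anything prescribed by the source.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated term-weight dict; terms whose weight is not positive are dropped."""
    terms = set(query)
    for doc in relevant + non_relevant:
        terms.update(doc)

    def centroid(docs, term):
        # Average weight of `term` across `docs` (0.0 for an empty list).
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    updated = {}
    for term in terms:
        weight = (
            alpha * query.get(term, 0.0)
            + beta * centroid(relevant, term)
            - gamma * centroid(non_relevant, term)
        )
        if weight > 0:
            updated[term] = weight
    return updated

# Hypothetical feedback round: one document judged relevant, one judged not.
new_query = rocchio(
    {"hypertext": 1.0},
    relevant=[{"hypertext": 1.0, "links": 1.0}],
    non_relevant=[{"fiction": 1.0}],
)
```

Here the updated query gains the term "links" from the relevant document, while "fiction" never enters it.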
CAT TWO: Search Engines
Compare the assigned search engines and competently comment on the following:
Structure of the search engines
Indexing techniques used
Information resources offered
Links between nodes
Search facilities and retrieval approaches used
Ease of use by novice and experienced users
Interface design and display of screen layout
Search Engines
Group 1: Yahoo and Lycos
Group 2: Google and Alta Vista
Group 3: Ask Jeeves and Excite
Group 4: All the Web and HotBot
Group 5: WebCrawler and MSN
Group 6: Dogpile and eBay