Computer Science 1000 Information Searching I Permission to redistribute these slides is strictly prohibited without permission

Upload: vivien-may

Post on 29-Dec-2015


Page 1

Computer Science 1000

Information Searching I

Permission to redistribute these slides is strictly prohibited without permission

Page 2

World Wide Web – The Basics

- our next topic examines how to find information on the web
- we consider a few basic terms here (which you’re probably familiar with):
  - page/web page
  - link/hyperlink
  - site/web site
- later in the semester, we will revisit web technologies in much more detail

Page 3

World Wide Web

- a system of linked documents accessed via the internet
- often simply referred to as the web
- sometimes used interchangeably with the internet, but this isn’t exactly correct
  - the internet is the global network of interconnected devices (computers, routers, etc.) that exchange data
  - the web refers to the documents being stored, the software that serves and retrieves them, and the protocols used for transmission

Page 4

Web Page

- a document stored and accessed on the web
- identified by a unique URL (Uniform Resource Locator)
- often referred to simply as a page
- today’s web pages are very rich in content
  - text
  - images
  - hyperlinks
  - videos

Page 5

Web Site

- a collection of related web pages on the internet
- typically belonging to a common organization or event
- example
  - all pages served by the University of Lethbridge make up its website

Page 6

Hyperlink

- a part of a web page that refers to a different location
- often just called a link
- hyperlinks can reference:
  - another place on the same page
  - another web page
- hypertext: text containing hyperlinks
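
The two kinds of hyperlink target can be told apart from the link text itself. The following is a minimal sketch, not part of the original slides: the function name `classify_link` and the `example.edu` addresses are made up for illustration, and it only uses Python's standard `urllib.parse` module.

```python
from urllib.parse import urljoin, urldefrag

def classify_link(page_url, href):
    """Resolve href against the page it appears on and report where it points."""
    # urljoin turns a relative reference (e.g. "#top" or "index.html")
    # into a full URL, using the current page as the base.
    target = urljoin(page_url, href)
    # urldefrag splits off the "#fragment" part, which names a place within a page.
    target_page, _fragment = urldefrag(target)
    current_page, _ = urldefrag(page_url)
    if target_page == current_page:
        return "same page", target
    return "another page", target

# Hypothetical page address, used only for illustration.
page = "https://www.example.edu/cs1000/notes.html"
print(classify_link(page, "#hierarchy"))
print(classify_link(page, "index.html"))
```

A link such as `#hierarchy` only names a place within the current document, while `index.html` resolves to a different URL and therefore a different page.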

Page 7

The Age of Information

- the computer, internet, and web have changed how we interact with information
- information storage
  - the amount of available information is significantly greater than even a generation ago (and growing rapidly)
- information transmission
  - large amounts of information are available with a single mouse click, and transfer almost immediately

Page 8

Information Age – Rapid Onset

- the situation has transformed tremendously in your lifetimes
- consider the global information capacity:
  - in 1986: 2.6 exabytes (< 1 CD per person)
  - in 1993: 15.8 exabytes
  - in 2000: 54.5 exabytes
  - in 2007: 295 exabytes (61 CDs per person)
- how does one successfully navigate such a mountain of digital content?

Hilbert and López. The World’s Technological Capacity to Store, Communicate, and Compute Information. Science 332(6025):60-65, 2011
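
To get a feel for how fast these figures grow, the arithmetic can be sketched in a few lines. This is only an illustration using the four numbers quoted above; the function names are made up, and the "average yearly growth" assumes smooth compound growth between the two years, which is a simplification.

```python
# Storage capacity in exabytes, exactly as quoted on the slide.
capacity_eb = {1986: 2.6, 1993: 15.8, 2000: 54.5, 2007: 295.0}

def growth_factor(start_year, end_year):
    """How many times larger the capacity is in end_year than in start_year."""
    return capacity_eb[end_year] / capacity_eb[start_year]

def annual_growth_rate(start_year, end_year):
    """Average yearly growth, assuming steady compound growth (a simplification)."""
    years = end_year - start_year
    return growth_factor(start_year, end_year) ** (1 / years) - 1

print(f"1986 to 2007: about {growth_factor(1986, 2007):.0f} times larger")
print(f"average growth: about {annual_growth_rate(1986, 2007):.0%} per year")
```

Under that assumption, capacity grew a little over a hundredfold in 21 years, or roughly a quarter more every year.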

Page 9

Information Access

- even in pre-internet days, there was a wealth of information
  - large-scale: library
  - medium-scale: encyclopaedia set
  - small-scale: newspaper
- strategies developed to manage information
  - categories
  - hierarchies
  - indices

Page 10

Classification

- "systematic arrangement in groups or categories according to established criteria" – Merriam-Webster
- in other words, the information is categorized according to relevant features
- consider our course notes:
  - terminology (4 sets of slides)
  - information searching (2-3 sets of slides)
  - etc.

Page 11

Classification

- classification is not specific to digital information
- library classification:
  - Dewey Decimal Classification
  - Library of Congress Classification

Page 12

Classification

- classification is not specific to digital information
- newspaper classification

Page 13

Classification

- classification level of detail leads to tradeoffs
- consider a coarse level of detail
  - e.g. taxonomy of living organisms
  - classify organisms according to Domain (Archaea, Bacteria, Eukarya)
  - advantage: small number of groups
  - disadvantage: each group is massive

Page 14

Classification

- classification level of detail leads to tradeoffs
- consider a fine level of detail
  - e.g. taxonomy of living organisms
  - classify organisms according to Genus (Canis, Felis)
  - advantage: each group reasonably small
  - disadvantage: massive number of groups
- solution: hierarchy
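
The coarse-versus-fine tradeoff can be seen even on a tiny data set. The sketch below is an illustration only: the seven organisms are a hand-picked sample (not from the slides), and the helper names `group_by` and `summarize` are made up.

```python
from collections import defaultdict

# A tiny hand-picked sample: (species, domain, genus).
organisms = [
    ("Canis lupus", "Eukarya", "Canis"),
    ("Canis latrans", "Eukarya", "Canis"),
    ("Felis catus", "Eukarya", "Felis"),
    ("Felis silvestris", "Eukarya", "Felis"),
    ("Escherichia coli", "Bacteria", "Escherichia"),
    ("Bacillus subtilis", "Bacteria", "Bacillus"),
    ("Halobacterium salinarum", "Archaea", "Halobacterium"),
]

def group_by(level):
    """Group species names by 'domain' (coarse) or 'genus' (fine)."""
    column = {"domain": 1, "genus": 2}[level]
    groups = defaultdict(list)
    for record in organisms:
        groups[record[column]].append(record[0])
    return dict(groups)

def summarize(level):
    """Return (number of groups, size of the largest group)."""
    groups = group_by(level)
    return len(groups), max(len(members) for members in groups.values())

print("by domain:", summarize("domain"))  # few groups, larger groups
print("by genus: ", summarize("genus"))   # more groups, smaller groups
```

With millions of real species the effect is far more extreme: a handful of enormous groups at the Domain level, and a very long list of small groups at the Genus level.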

Page 15

Hierarchy

- a decomposition of classifications according to detail
- hierarchies contain levels
  - at the top (root) level, there is typically a small number of broad categories
  - each category is decomposed into smaller categories
  - a classification group is defined by categorization at each level

Page 16

Hierarchy

- organism taxonomy hierarchy:
  - each Domain categorized into Kingdoms

Domain: Eukarya

Kingdom: Fungi, Plantae, Animalia, Protista

Page 17

Hierarchy

- organism taxonomy hierarchy:
  - each Kingdom classified into Phyla
  - each Phylum classified into Classes
  - and so on ...

http://ag.arizona.edu/pubs/garden/mg/entomology/intro.html

Page 18

Hierarchy

- an object is still categorized, but by multiple levels (instead of one)

http://schoolworkhelper.net/scientific-taxonomy/
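
One way to picture "categorized by multiple levels" is as a path from the root of the hierarchy down to the object. The sketch below is an illustration, not part of the original slides: it stores a small, deliberately incomplete slice of the taxonomy as nested dictionaries, and `find_path` is a made-up helper name.

```python
# A small, incomplete slice of the organism taxonomy as nested dictionaries.
# Each key is a category; its value holds the categories one level below.
taxonomy = {
    "Eukarya": {
        "Animalia": {
            "Chordata": {
                "Mammalia": {
                    "Carnivora": {
                        "Canidae": {"Canis": {"Canis lupus": {}}},
                        "Felidae": {"Felis": {"Felis catus": {}}},
                    }
                }
            }
        },
        "Fungi": {},
        "Plantae": {},
        "Protista": {},
    },
    "Bacteria": {},
    "Archaea": {},
}

def find_path(tree, target, path=()):
    """Return the chain of categories leading to target, or None if absent."""
    for name, children in tree.items():
        here = path + (name,)
        if name == target:
            return here
        found = find_path(children, target, here)
        if found is not None:
            return found
    return None

print(" > ".join(find_path(taxonomy, "Canis lupus")))
```

The path lists one category per level (Domain, Kingdom, Phylum, Class, Order, Family, Genus) before reaching the species itself.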

Page 19

Hierarchy

- facilitates efficient searching through exclusion
- example (text):
  - suppose you have a collection of a million items
  - these items are organized into 10 equal-sized groups
  - each top-level group is also organized into 10 equal subgroups
  - choosing the first category eliminates 900,000 items
  - choosing the second category eliminates 90,000 items
  - and so on …
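
The arithmetic in this example can be checked with a short sketch. It assumes, as the example does, that every group splits into equally sized subgroups at every level; the function name `narrow_down` is made up for illustration.

```python
def narrow_down(total_items, groups_per_level, levels):
    """List (level, items eliminated, items remaining) for each choice made.

    Assumes every group splits into equally sized subgroups.
    """
    remaining = total_items
    steps = []
    for level in range(1, levels + 1):
        kept = remaining // groups_per_level  # items in the chosen group
        steps.append((level, remaining - kept, kept))
        remaining = kept
    return steps

for level, eliminated, remaining in narrow_down(1_000_000, 10, 6):
    print(f"choice {level}: eliminates {eliminated:>7,} items, {remaining:>7,} remain")
```

Six choices among 10 categories are enough to single out one item in a million, whereas an unorganized collection could require looking at every item.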

Page 20

Hierarchy

- hierarchies are very popular
- consider our previous examples:
  - Library of Congress Classification

Page 21

Hierarchy

- hierarchies are very popular
- consider our previous examples:
  - Newspaper

Page 22

Index

- a detailed list of words, phrases, and/or topics indicating place of occurrence
- in essence, it maps keywords of interest to their location
  - e.g. a page number
- a bottom-up approach to information organization
  - as opposed to the top-down structure of a hierarchy
- particularly popular in printed material
  - books, magazines, volumes, etc.
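
The idea of mapping keywords to locations can be sketched as a dictionary from keyword to page numbers. This is an illustration only: the three "pages" of text and the keyword list are made up, `build_index` is a hypothetical helper, and the matching is deliberately naive (whole lowercase words, no punctuation handling).

```python
from collections import defaultdict

def build_index(pages, keywords):
    """Map each keyword to the sorted list of page numbers where it occurs."""
    index = defaultdict(list)
    for page_number, text in sorted(pages.items()):
        words = set(text.lower().split())  # naive word matching
        for keyword in keywords:
            if keyword in words:
                index[keyword].append(page_number)
    # Alphabetical order, as in the index at the back of a book.
    return dict(sorted(index.items()))

# Made-up page contents, for illustration only.
pages = {
    1: "the web is a system of linked documents",
    2: "a hyperlink refers to a different location on the web",
    3: "an index maps keywords to a location",
}
keywords = ["web", "location", "index", "hyperlink"]

for keyword, page_numbers in build_index(pages, keywords).items():
    print(f"{keyword}: {page_numbers}")
```

The index is built bottom-up: it starts from the words that actually occur in the content, rather than from a set of categories decided in advance.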

Page 23

Index - Example

Page 24

Index

- typically used on a small scale
  - books and volumes vs. libraries
- made efficient through an organizational scheme
  - alphabetical is very common
- some overlap with hierarchies
  - e.g. subtopics
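
Alphabetical order is what makes an index quick to use: a reader (or a program) can jump to the middle and discard half of the entries at each step. The sketch below illustrates this with Python's standard `bisect` module. The terms and page numbers are the slide titles and page numbers of this deck; the `lookup` helper is made up for illustration.

```python
import bisect

# Index entries in alphabetical order, with the page where each topic starts
# in these slides. The two lists are parallel: terms[i] is found on pages[i].
terms = ["classification", "hierarchy", "hyperlink", "index", "partition", "web page"]
pages = [10, 15, 6, 22, 28, 4]

def lookup(term):
    """Binary-search the sorted terms; return the page number or None."""
    position = bisect.bisect_left(terms, term)
    if position < len(terms) and terms[position] == term:
        return pages[position]
    return None

print(lookup("index"))      # found by halving the list, not scanning it
print(lookup("newspaper"))  # not in the index
```

This only works because the terms are kept sorted; an index in arbitrary order would have to be scanned entry by entry.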

Page 25

Finding Information – The Web

- as discussed, the amount of information on the web is immense
- many of the discussed techniques for information finding also apply digitally
  - classification/hierarchies
  - indexing

Page 26

Classification

- many commercial websites have a classification structure
- navigation bars

Page 27

Hierarchies

- many websites, especially large ones, will also arrange their categories in hierarchical fashion

Page 28

Partition

- a hierarchy where every object occurs only once
  - organism taxonomy – every species appears only once
- some hierarchies are necessarily partitions
  - e.g. a particular book will only occur at one point in a library classification
- however, a partition in some cases is not natural
  - an object might have an inherent fit in more than one classification

Page 29

Partitions

- digital content is often stored using overlapping hierarchies (non-partition)
  - potentially more intuitive
  - with hyperlinking, it’s easy to accomplish (two links to the same page)
- example (text):
  - Three Books for Frugal Fashionistas was stored on NPR’s website under:
    - Home > Arts & Life > Books > Three Books for Frugal Fashionistas
    - Home > Listen > Latest Program > Three Books for Frugal Fashionistas
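
Whether a hierarchy is a partition can be checked by counting how many paths lead to each object. The sketch below is an illustration: it writes each location as a tuple of category names ending in the object itself, and the helper names are made up. The NPR paths are the two quoted above; the taxonomy paths are a shortened Domain > Kingdom > species sample.

```python
from collections import Counter

def occurrences(paths):
    """Count how many paths end at each object (the last element of a path)."""
    return Counter(path[-1] for path in paths)

def is_partition(paths):
    """True if every object is reachable by exactly one path."""
    return all(count == 1 for count in occurrences(paths).values())

# The two locations of the same article, as quoted in the example.
npr_paths = [
    ("Home", "Arts & Life", "Books", "Three Books for Frugal Fashionistas"),
    ("Home", "Listen", "Latest Program", "Three Books for Frugal Fashionistas"),
]

# Shortened taxonomy paths: each species appears exactly once.
taxonomy_paths = [
    ("Eukarya", "Animalia", "Canis lupus"),
    ("Eukarya", "Animalia", "Felis catus"),
]

print(is_partition(npr_paths))       # overlapping hierarchy
print(is_partition(taxonomy_paths))  # partition
```

On the web, the second path costs almost nothing to add: it is just another hyperlink pointing at the same page.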

Page 30

Indexes for the Web

- unlike hierarchies, indexes are much less common on individual websites
  - site maps might be considered an index of sorts
- however, there are analogous technologies to indexes that pertain to the web as a whole
- Search Engines!