Post on 21-Dec-2015
Mining the
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
For
Western New York Library Resources Council
Member Libraries’ Staff
Sponsored by the Western New York Library Resources Council
For today . . .
From Web to Deep Web
Search Services: Genres and Differences
The Topography of the Internet
Mining the Deep Web: Techniques and Tips
Hands-on Session
Evaluating Deep Web Resources
Using Proprietary Software
Web to Deep Web
1991 – Gopher
• Menu-based text only
• You had to KNOW the sites
1992 – Veronica
• Menus of menus
• Difficult to access
Web to Deep Web
1991 - Hyper-Text Markup Language
• Linkage capability leads you to related information elsewhere
“Classic” Web Site
• Relatively stable content of static, separate documents or files
• Typically no larger than 1,000 documents navigated via static directory structures
Web to Deep Web
1994 – Lycos launched
• First crawler-based search engine with database of 54,000 html documents (CMU)
Growth of html documents unprecedented and unanticipated
• 2000 (April) “The Web is doubling in size every 8 months” (FAST)
Web to Deep Web
1996 – Three phenomena pivotal for the development of the Deep Web:
HTML-based database technology introduced
• Bluestone’s Sapphire/Web, Oracle
Commercialization of the Web
• Growth of home PC-users and e-commerce
Web Servers adapted to embrace “dynamic” serving of data
• Microsoft’s ASP, Unix PHP and others
Web to Deep Web
1998 – Deep Web comes of Age
Larger sites redesigned with a database orientation rather than static directory structure
• U.S. Bureau of the Census
• Securities and Exchange Commission
• Patent and Trademark Office
Search Services: Genres and Differences
Exclusively crawler-created
• Search engines
• Meta search engines
Human created and/or influenced
• Directories
• Specialized search engines
• Subject metasites
• Deep Web gateway sites
Search Services: Exclusively Crawler Created
Database compiled through automated, link-dependent crawling and site submission
Unable to access
• Dynamically-created pages
• Proprietary, non-html filetypes
• Multimedia
• Software
• Password-protected sites
• Sites prohibiting crawlers (robots.txt exclusion)
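The robots.txt exclusion in the last item can be sketched with Python's standard `urllib.robotparser`. This is only an illustration: the robots.txt text, the crawler name `ExampleCrawler` and the `library.example.org` URLs are hypothetical and do not come from the presentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not from the presentation).
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/
Disallow: /cgi-bin/
"""


def allowed_paths(robots_txt, user_agent, urls):
    """Return {url: bool} saying whether user_agent may fetch each URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


urls = [
    "http://library.example.org/index.html",
    "http://library.example.org/catalog/search?q=maps",
]
result = allowed_paths(ROBOTS_TXT, "ExampleCrawler", urls)
for url, ok in result.items():
    print("crawl" if ok else "skip ", url)
```

A well-behaved crawler would skip the catalog URL, so anything served only from that path stays out of the search engine's database.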
Dynamically-created Web pages
Created at the moment of the query using the most recent version of the database.
Database-driven
Require interaction
• Amazon.com: What titles are available? At what price? Are there recent reviews? What about shipping?
Used widely in e-commerce, news, statistical and other time-sensitive sites.
Dynamically-created Web pages
Why can’t crawlers download them?
Technically they can interact, within limits of programming capability
Very costly and time-consuming for general search services
Dynamically-created Web pages
How can a crawler detect a dynamically-created page?
• From any of the following in the URL
? , % , $ , = , ASP , PHP , CFM and others
proquest.umi.com/pqdweb?Did=000000209668731&Fmt=1&Deli=1&Mtd=1&Idx=5&Sid=1&RQT=309
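A minimal sketch of the URL test described above, assuming a crawler simply looks for the listed characters and file extensions. The function name `looks_dynamic` and the `example` URLs are hypothetical, and real crawlers may apply different or additional rules (the slide's "and others" is left open).

```python
from urllib.parse import urlsplit

# Markers listed on the slide; "and others" is deliberately not guessed at.
DYNAMIC_CHARS = ("?", "%", "$", "=")
DYNAMIC_EXTENSIONS = (".asp", ".php", ".cfm")


def looks_dynamic(url):
    """Heuristic: True if the URL carries any marker named on the slide."""
    if any(ch in url for ch in DYNAMIC_CHARS):
        return True
    # Add "//" so urlsplit treats scheme-less input as host + path.
    path = urlsplit(url if "//" in url else "//" + url).path.lower()
    return path.endswith(DYNAMIC_EXTENSIONS)


proquest = ("proquest.umi.com/pqdweb?Did=000000209668731&Fmt=1&Deli=1"
            "&Mtd=1&Idx=5&Sid=1&RQT=309")
print(looks_dynamic(proquest))                                   # query string
print(looks_dynamic("http://www.example.com/products/list.asp"))  # extension
print(looks_dynamic("http://www.example.edu/about/staff.html"))   # static page
```

The ProQuest address is flagged because of its `?` and `=` characters, which is why a general crawler following this rule would pass it by.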
Proprietary Filetypes
PDF
Spreadsheets
Word-processed documents
Google does it! Why can’t you?
Google’s Deep Web Components: Non-html filetypes (1.75%)
SEARCH SYNTAX
“california power shortage” filetype:pdf
Adobe Portable Document Format (pdf)
Adobe PostScript (ps)
Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wk
Lotus WordPro (lwp)
MacWrite (mw)
Microsoft Excel (xls)
Microsoft PowerPoint (ppt)
Microsoft Word (doc)
Microsoft Works (wks, wps, wdb)
Microsoft Write (wri)
Rich Text Format (rtf)
Text (ans, txt)
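The search syntax above can be assembled mechanically. The sketch below builds the query string only and does not contact any search service; the helper name `filetype_query` is hypothetical, and the extension set holds just the extensions that appear complete in the list above.

```python
# Extensions copied from the list above (only those shown in full).
KNOWN_EXTENSIONS = {
    "pdf", "ps", "wk1", "wk2", "wk3", "wk4", "wk5", "wki", "lwp", "mw",
    "xls", "ppt", "doc", "wks", "wps", "wdb", "wri", "rtf", "ans", "txt",
}


def filetype_query(phrase, extension):
    """Return a quoted-phrase query restricted to one file type."""
    ext = extension.lower().lstrip(".")
    if ext not in KNOWN_EXTENSIONS:
        raise ValueError(f"extension not in the slide's list: {extension!r}")
    return f'"{phrase}" filetype:{ext}'


print(filetype_query("california power shortage", "PDF"))
```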
Google Non-html Filetypes: Warning!
FOR NON-HTML FILES
• Clicking on a title in the results list opens the application as well, involving risk of a virus or worm that may be attached to the file
• INSTEAD, click the “View as HTML” option; no applications will be opened and no risk of virus or worm
• NOTE: Titles for non-html files are frequently not descriptive of content
Search Services: Human created or influenced
Directories – general and specialized
Specialized search engines
Subject metasites or gateways
Deep Web gateways
Search Services: Human created or influenced
Content of sites is examined and categorized or crawling is human-focused and refined
CAN include sites with dynamically created pages
CAN be limited to database-driven sites (Deep Web)
CAN include non-html files
NOTE: Some specialized search engines may include little human influence, e.g. Search.edu
The Topography of the Internet or The Layers of the Web
Mapping the web is challenging
• Unregulated in nature
• Influences from all over the globe
• Fulfills many purposes, from personal to commercial
• Changes rapidly and unexpectedly
Divisions and terminology are inherently ambiguous, e.g. “Deep” vs “Invisible” Web
May I suggest a biological, nautical metaphor, perhaps the ocean?
SURFACE WEBSURFACE WEB
SHALLOW WEBSHALLOW WEB
OPAQUE WEBOPAQUE WEB
DEEP WEBDEEP WEB
Surface Web
Static html documents
Crawler-accessible
Shallow Web
Static html documents loaded on servers that use ColdFusion or Lotus Domino or other similar software
A different URL for the same page is created each time it is served.
Crawlers skip these to avoid multiple copies of the same page in their database
Technically human accessible via directories, Deep Web gateways or links from other sites
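To see why these pages cause trouble, consider a crawler that stores pages by URL: the same document arriving under two session-style URLs would be stored twice. The sketch below recognizes such copies by hashing the page content. Content hashing is an assumption chosen for illustration (the slide says only that crawlers skip these pages), and the URLs and page text are hypothetical.

```python
import hashlib


def content_key(html):
    """Fingerprint a page by its content rather than by its URL."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def unique_pages(fetched):
    """fetched: iterable of (url, html). Keep the first URL seen per content."""
    seen = {}
    for url, html in fetched:
        seen.setdefault(content_key(html), url)
    return list(seen.values())


fetched = [
    ("http://example.org/page.cfm?session=A1", "<html>Annual report</html>"),
    ("http://example.org/page.cfm?session=B2", "<html>Annual report</html>"),
    ("http://example.org/contact.html", "<html>Contact us</html>"),
]
kept = unique_pages(fetched)
print(kept)
```

The two `page.cfm` URLs differ, yet they yield one stored entry because the content is identical.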
Opaque Web
Static html documents
Technically crawler accessible
2 types:
• Downloaded and indexed by crawler
• Not downloaded or indexed by crawler
Opaque Web
Downloaded and indexed by crawler
• Buried in search results you never look at
• A casualty of “relevance” ranking
Not downloaded or indexed by crawler due to programmed download limits
• Document buried deep in the site
• Part of a large document that did not get downloaded (Typical crawl per page is 110 K or less)
• Document added since last crawler visit (Even the best revisit on an average of every 2 weeks, depending on amount of change at a site)
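The per-page download limit can be illustrated with a toy index that sees only the first part of each document. The 110 K figure is from the slide; reading it as 110 × 1024 bytes, and the sample document itself, are assumptions made for this sketch.

```python
# "110 K" from the slide, interpreted here as 110 * 1024 bytes (assumption).
CRAWL_LIMIT_BYTES = 110 * 1024


def indexed_portion(document, limit=CRAWL_LIMIT_BYTES):
    """Return only the part of the document a limited crawler downloads."""
    return document[:limit]


def term_is_indexed(document, term, limit=CRAWL_LIMIT_BYTES):
    """True if the term falls inside the downloaded portion."""
    return term in indexed_portion(document, limit)


# Hypothetical long document: an intro, 120 K of filler, then an appendix.
doc = b"intro " + b"x" * (120 * 1024) + b" appendix on groundwater"
print(term_is_indexed(doc, b"intro"))        # near the top of the file
print(term_is_indexed(doc, b"groundwater"))  # beyond the download limit
```

A search for a word that occurs only in the appendix would never match this page, even though the page itself was crawled.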
Opaque Web
Access to the Opaque Web
• Specialized search engines
• General and specialized directories
• Subject metasites
These services typically index more thoroughly and more often than large, general search engines
Deep Web: Two Categories
Technically inaccessible to crawlers
Technically accessible to crawlers
Deep Web
Technically inaccessible to crawlers
• Dynamically created pages
• Databases
• Non-textual files
• Password protected sites
• Sites prohibiting crawlers
Deep Web
Technically accessible to crawlers
• Textual files in non-html formats (Google does it!)
• Pages excluded from crawler by editorial policy or bias
How large is the Deep Web?
White Paper by Michael K. Bergman published in the Journal of Electronic Publishing in 2000.
• http://www.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp
Currently a scarcity of unbiased research due to its fluid nature, dynamic content and multiple points of access
How large is the Deep Web? Bergman Study
Over 150,000 databases
Over 95% publicly available
Perhaps 500 times larger than the Surface Web
Growth rate currently greater than the Surface Web
What’s in the Deep Web?
Information likely to be stored in a database
• People, address, phone number locators
• Patents
• Laws
• Dictionary definitions
• Items for sale or auction
• Technical reports
• Other specialized data
What’s in the Deep Web?
Information that is new and dynamically changing
• News
• Job postings
• Travel schedules and prices
• Financial data
• Library catalogs and databases
Topical coverage is extremely varied.
Mining the Deep Web: A world different from search engines . . .
Hunter’s Maxim for Searching the Deep Web
Plan to first locate the category of information you want, then browse. Don’t be too specific in your searches. Cast a wide net.
Brush up on your Gopher-type search skills (if you were searching the ‘Net back then). We’ve become accustomed to search engine free-text searching. This is a different world.
Basic Strategies for Mining the Deep Web
Using directories, general and specialized
Using general search engines
Using specialized (subject-focused) search engines
Using subject metasites (link-oriented)
Using Deep Web gateway sites (database-oriented)
NOTE: Many sites contain elements of all of the above, in varying degrees and combinations
Using directories
Yahoo! > “web directories” > 840 category matches
Yahoo! > database > 22 categories and 7423 site matches
Google Directory > link collections > 493,000
Databases may also be found under general subject categories
Also use research directories such as Infomine, LII, WWWVL and others
Using general search engines
Combine subject terms with one or more of these possibilities:
• directory
• crawler
• search engine
• database
• webring or web ring
• link collection
• blog
Using general search engines
Google (11/4/02)
“toxic chemicals database” > 45
“punk rock search engine” > 77
“science fiction webring” > 97
(web rings are cooperative subject metasites, maintained by experts or aficionados)
Remember, when using a search engine you must match words on the page.
Using specialized (subject-focused) search engines
AKA
• Limited-area engines
• Targeted search engines
• Expert search services
• Vertical Portals
• Vortals
Using specialized (subject-focused) search engines
Non-html textual files
• http://searchpdf.adobe.com/
• Google
Non-textual files
• Image, MP3 search engines
• Media search at Google, et al.
Software
Blogs
• Blogdex http://blogdex.media.mit.edu/
Web logs or blogs
Online personal journals
Postings are often centered around a particular topic or issue and may contain links to recent relevant information
Frequently updated
Differ from newsgroups in that they are generally by one author
Web logs or blogs
How do you search them?
• Blogdex http://blogdex.media.mit.edu
• Open Directory http://dmoz.org
Computers / Internet / On the Web / Weblogs
Are they part of the Deep Web?
• Yes and No
Web logs or blogs
Google (5/23/02 and 11/4/02)
allinurl:blogspot 171,000 | 301,000 53%
mostly blog home pages
allinurl:oxblog 2 | 39 1900%
home page and 1 posting
FAST (5/23/02 and 11/4/02)
URL:blogspot > 355,671 | 2,434,871 146%
mostly blog home pages
URL:oxblog > 0 | 5,510
Start your own at http://blogspot.com
Using subject metasites (link-oriented)
Locate subject metasites via
• Directories
• Professional Organizations home pages
• Specialized search engine gateways (handout)
• Colleagues/Researchers
Once into a subject metasite scan the page for search boxes and determine if they search the “surface web” of the site only or embedded databases. (This is often not clearly indicated)
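Scanning a page for search boxes can be partly automated. The sketch below uses Python's standard `html.parser` to list the `action` target of every form that contains a text or search input. The sample page and its paths are hypothetical, and the action URL is only a hint about what a box searches: a script-style target such as `.cfm` suggests an embedded database but does not prove it.

```python
from html.parser import HTMLParser


class SearchBoxFinder(HTMLParser):
    """Collect the action URL of every form containing a text/search input."""

    def __init__(self):
        super().__init__()
        self._in_form = False
        self._current_action = ""
        self._has_text_input = False
        self.search_forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self._current_action = attrs.get("action") or ""
            self._has_text_input = False
        elif tag == "input" and self._in_form:
            # An input with no type attribute defaults to a text box.
            if (attrs.get("type") or "text").lower() in ("text", "search"):
                self._has_text_input = True

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            if self._has_text_input:
                self.search_forms.append(self._current_action)
            self._in_form = False


def find_search_forms(html):
    finder = SearchBoxFinder()
    finder.feed(html)
    return finder.search_forms


# Hypothetical metasite page with two search boxes and one non-search form.
PAGE = """
<form action="/site-search"><input type="text" name="q"></form>
<form action="/cgi-bin/records.cfm"><input type="text" name="title">
  <input type="submit"></form>
<form action="/subscribe"><input type="checkbox" name="ok"></form>
"""
forms = find_search_forms(PAGE)
print(forms)
```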
Using Deep Web gateway sites (database-oriented)
Become familiar with several (see handout)
Most search only the home pages of the databases they include. A few will actually enter your search terms and display results
Explore their subject areas; some subjects may not be included at all.
Deep Web gateways are still in an early stage of development, seeking broad appeal rather than a narrow focus.
Using serendipity
Sometimes the Deep Web “comes to you”!
Mine your bookmarks/favorites and add Deep Web resources when you come across them by chance.
Evaluating Deep Web Information
Embedded databases
Non-html textual files and password protected sites
Non-textual files
Software
Embedded Databases
Typically targeted, focused information
Content usually generated and used by knowledgeable parties
Database creation and maintenance requires expertise and commitment
Site location is usually stable
Embedded Databases
Check author and/or sponsor
Check for freshness
Check for breadth or range of coverage
Compare with other Deep Web sources offering similar information, especially for online shopping or other e-commerce uses.
Non-html textual files and password protected sites
Evaluate as you would any other information from the Internet
BEWARE: If using Google, open non-html textual files as html when possible. Opening the file and its application may transmit a virus.
Image, audio, multimedia files
Check for image/audio quality
Check for plug-in requirements
Check for depth of coverage in the area of your query
FEE or FREE???
Software
Check for sponsor/source/maintainer
• Is there a contact person?
Check for freshness
• Latest versions available?
Check for stability and reliability
• Has any virus scanning been done?
Check for breadth
• Are programs available for all operating systems?
FEE or FREE???
Directed Query Engines or Intelligent Agents
Designed to access distributed Deep Web resources
Can be configured to search specific URLs
• Databases
• Subject metasites
• Report collections
• Dynamic pages
• Online newsletters
Directed Query Engines or Intelligent Agents
Several DQEs can be “nested” – one query launches several others in a cascading fashion
Publicly-available examples:
• PubMed
• Department of Energy’s Information Bridge
• NASA’s Technical Report Server
Apple’s Sherlock (bundled with Mac OS 8.5 or higher)
• Searches Deep Web databases that you specify
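The nesting idea can be sketched in a few lines. This is not how PubMed, Sherlock or any named product works internally; the in-memory "sources", their records and the helper names `make_source` and `make_dqe` are all hypothetical stand-ins for real Deep Web databases.

```python
def make_source(name, records):
    """Hypothetical stand-in for one Deep Web database with a search form."""
    def search(query):
        q = query.lower()
        return [f"{name}: {r}" for r in records if q in r.lower()]
    return search


def make_dqe(*engines):
    """A directed query engine: forwards one query to each configured engine.

    A DQE is called the same way as a source, so DQEs can be nested.
    """
    def search(query):
        results = []
        for engine in engines:
            results.extend(engine(query))
        return results
    return search


reports = make_source("reports", ["Solar cell efficiency", "Wind turbine noise"])
patents = make_source("patents", ["Solar water heater"])
news = make_source("news", ["Solar eclipse schedule"])

science = make_dqe(reports, patents)   # inner DQE
everything = make_dqe(science, news)   # outer DQE launches the inner one

hits = everything("solar")
for hit in hits:
    print(hit)
```

One call to `everything` cascades into three separate source queries, which is the behavior the slide describes as nesting.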
Directed Query Engines for purchase
Simultaneous search of Deep Web and other resources with many additional features
Lexibot http://www.lexibot.com
• If you complete survey: $189 upgrades $15
• If you don’t: $289 upgrades $50
BullsEye http://info.intelliseek.com
• BullsEye Pro: $199 with free upgrades for 6 months
How does the Deep Web fit into my overall search strategy?
What types of queries are well-suited to the Deep Web?
Information stored in databases
• “One of many similar things”
• Statistics, census data
• City, county, state, national and international public records, data and laws
• Online reference books
What types of queries are well-suited to the Deep Web?
Information that is new and dynamically changing
• News
• Pricing and availability of goods and services
• Financial data, national and international
• Job postings
• Travel schedules and pricing
• Library catalogs and databases
What types of queries are well-suited to the Deep Web?
Non-html textual files
Non-textual files
Software
Searching blogs
A few words from Sherman and Price …
Authors of The Invisible Web, Cyber Age Books, 2000
Datamine your Bookmark/Favorites Collection
Explore reviewed sites thoroughly
• They often contain Deep Web resources not mentioned by the reviewer
Subscribe to lists that are focused and relevant to your needs
• No main Deep Web list exists
• Resources appear in subject-based lists
A few words from Sherman and Price …
Create your own “monitoring service”
• Identify “What’s New” pages and key sites you find valuable
• Use C4U to alert you to changes at these sites. Gives you the type of change and keywords from the new text. Enables you to determine whether it’s worth checking or not
• Available FREE at http://www.c4u.com
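The monitoring idea can be approximated in a few lines. How C4U itself detects changes is not described here, so the approach below (compare a hash of the saved page text with the current text, then list words that are new) is an assumption, and the sample page text is hypothetical. Fetching the page is left out.

```python
import hashlib
import re


def fingerprint(text):
    """Hash of the page text; any edit to the page changes the hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def new_keywords(old_text, new_text, min_length=4):
    """Words present in the new text but not the old, skipping short words."""
    added = words(new_text) - words(old_text)
    return sorted(w for w in added if len(w) >= min_length)


def check_page(old_text, new_text):
    """Report whether the page changed and which keywords are new."""
    if fingerprint(old_text) == fingerprint(new_text):
        return {"changed": False, "keywords": []}
    return {"changed": True, "keywords": new_keywords(old_text, new_text)}


old = "What's New: census tables for 1990"
new = "What's New: census tables for 1990 and patent database added"
report = check_page(old, new)
print(report)
```

The keyword list is what lets you decide whether a change is worth a visit, which is the point the slide makes about C4U.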
Remember Hunter’s Maxim for the Deep Web
Plan to first locate the category of information you want, then browse.
Don’t be too specific in your searches.
Cast a wide net.
Thank you and best of luck in discovering and taming this new Cyber Frontier!!!
Michael Hunter
Reference Librarian
Warren Hunting Smith Library
Hobart and William Smith Colleges
Geneva, NY 14456
(315) 781-3552
[email protected]@hws.edu