ILS 501 Unit 3 Searching Issues ILS 501 / Dr. Liu, ILS SCSU

Post on 29-Dec-2015

TRANSCRIPT

  • The Rise of Search

    Search is the second most popular online activity, after email. The percentage of Internet users who search on a typical day grew 70% from 2002 to 2009.

    (Source: Pew Internet and American Life Project)

  • What is a search engine?

    A program that searches documents for specified keywords and returns a list of the documents where the keywords were found.

    Typically, a search engine works by sending out a spider to fetch as many documents as possible. Another program, called an indexer, then reads these documents and creates an index based on the words contained in each document.
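The spider/indexer division of labor can be illustrated with a toy inverted index. This is a minimal sketch, assuming the "fetched" documents are three hard-coded strings (a real spider would download them) and that a keyword search returns the documents containing every keyword; the names `DOCUMENTS`, `build_index`, and `search` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical documents standing in for pages a spider has fetched.
DOCUMENTS = {
    1: "Cats and dogs are popular pets.",
    2: "Dogs need daily walks.",
    3: "The moon rose over the river.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Indexer: map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the sorted IDs of documents containing every query keyword."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

index = build_index(DOCUMENTS)
print(search(index, "dogs"))       # [1, 2]
print(search(index, "cats dogs"))  # [1]
```

Looking a word up in the index is a dictionary access, so the search never has to re-read the documents themselves; that is the point of building the index ahead of time.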

  • What is a web search engine?

    A web search engine is a huge database of web page files that have been assembled automatically by machine.

  • What does a search engine do?

    It uses software agents (spiders or "robots") to crawl around the Web and build indexes based on what they find in available Web pages.

  • How Do Search Engines Work?

    Crawling: A spider or robot explores your site, following links from page to page.

    Indexing: Data from the crawl is stored in the search engine index. The stored copy of a page is referred to as the cached page.

    Ranking: The search engine's algorithm looks at a variety of factors (over 200) to determine the importance of a web page and where it should rank for any given keyword phrase.
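The crawling step ("following links from page to page") can be sketched as a breadth-first traversal. This is a minimal sketch, assuming a hypothetical in-memory link graph (`LINKS`) in place of real HTTP fetching and HTML link extraction; the optional `max_depth` parameter is also an assumption of this sketch, standing in for a crawler that only goes a fixed number of links deep.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "/": ["/about", "/products"],
    "/about": ["/", "/team"],
    "/products": ["/products/a", "/products/b"],
    "/team": [],
    "/products/a": ["/products"],
    "/products/b": [],
}

def crawl(start, links, max_depth=None):
    """Breadth-first crawl from `start`, following links from page to page.

    Returns pages in the order they were discovered. If max_depth is given,
    links are not followed from pages that are already max_depth links
    away from `start`.
    """
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not follow links any deeper
        for target in links.get(page, []):
            if target not in seen:  # skip pages already discovered
                seen.add(target)
                visited.append(target)
                queue.append((target, depth + 1))
    return visited

print(crawl("/", LINKS))
# ['/', '/about', '/products', '/team', '/products/a', '/products/b']
print(crawl("/", LINKS, max_depth=1))
# ['/', '/about', '/products']
```

The `seen` set is what keeps the crawler from looping forever on cycles such as "/" and "/about" linking to each other.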

  • How do search engines work?

    Crawler-Based Search Engines: They "crawl" or "spider" the web, then people search through what they have found.

    Human-Powered Directories

    Hybrid Search Engines

    (Source: http://www.searchenginewatch.com)

  • Web site to explain PageRank (diagram: link graph among pages a1, b1, b2, b3, b4, c1, d1, d2, e1, e2)

  • PageRank: Motivation

    The number of incoming links to a page is a measure of the importance and authority of the page.

    PageRank also takes into account the quality of each recommendation: a page is more important if the sources of its incoming links are themselves important.
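The motivation above can be turned into the textbook PageRank computation by power iteration. This is a minimal sketch, not Google's production algorithm; it assumes the commonly cited damping factor of 0.85, a fixed number of iterations, and a hypothetical four-page graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Each page splits its score evenly among the pages it links to, so a
    link from a high-scoring page is worth more than a link from a
    low-scoring one. Pages with no outgoing links ("dangling" pages)
    spread their score evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first gets the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # Score held by dangling pages is shared out evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new_rank[p] += damping * dangling / n
        # Each page passes its score along its outgoing links.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical graph: A links to B and C, B and D link to C, C links to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph C collects the most incoming links and ends up on top. A and B each have exactly one incoming link, yet A scores well above B, because A's single link comes from the important page C: the "quality of recommendation" effect described on the slide.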

  • Expanding the Root Set

  • PageRank

  • Three elements of a crawler-based search engine

    The spider (crawler). The spider visits a web page, reads it, and then follows links to other pages within the site. The spider returns to the site on a regular basis, such as every month or two, to look for changes.

    The index. It is like a catalog containing a copy of every web page that the spider finds. If a web page changes, the index is updated with the new information.

    Search engine software. It is the program that sifts through the millions of pages recorded in the index to find matches to a search and ranks them in order of what is most relevant.

    (Source: http://www.searchenginewatch.com)

  • A search engine is an index compiler

    Search engines compile their databases by employing "spiders" or "robots" to crawl through web space from link to link, identifying and [...] pages.

    Once the spiders get to a web site, they typically index most of the words on the publicly available pages at the site.

  • Two Search Methods

    The Searchable Subject Index: searches titles and meta data, e.g., Yahoo.

    The Full-Text Search Engine: uses a spider to index not only titles but also content, e.g., Google.
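The difference between the two methods comes down to which fields are searched. This is a minimal sketch with two hypothetical pages; the case-insensitive substring match is a simplification assumed here, not how either Yahoo or Google actually matched terms.

```python
# Hypothetical pages: a title (standing in for title/meta data) and body text.
PAGES = [
    {"url": "a.html", "title": "Pet Care Guide",
     "body": "Feeding and grooming dogs and cats."},
    {"url": "b.html", "title": "Dog Training Tips",
     "body": "Teach your puppy to sit."},
]

def subject_index_search(pages, keyword):
    """Subject-index style: match the keyword against titles only."""
    kw = keyword.lower()
    return [p["url"] for p in pages if kw in p["title"].lower()]

def full_text_search(pages, keyword):
    """Full-text style: match the keyword against titles and body content."""
    kw = keyword.lower()
    return [p["url"] for p in pages
            if kw in p["title"].lower() or kw in p["body"].lower()]

print(subject_index_search(PAGES, "cats"))  # []
print(full_text_search(PAGES, "cats"))      # ['a.html']
```

A page about cats whose title never mentions cats is invisible to the title-only search but is found by the full-text search.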

  • What were the top 10 search providers in 2009?

    Ranked by Nielsen MegaView Search: Top 10 Search Providers for August 2009, Ranked by Searches (U.S.)

  • How many types of search engines exist?

    Three common types of search engines:

    Directory: subject search
    Individual search engine: keyword search
    Metasearch engine: meta search through multiple engines

  • DIRECTORY by Subjects

    Galaxy, GoGuides, LookSmart, NexTag, Open Directory, Yahoo, Zeal

  • INDIVIDUAL SEARCH ENGINES by Keywords

    AllTheWeb, AltaVista, Entireweb, Google, WiseNut, HotBot, Lycos, Yahoo, NexTag, Overture

  • What is a metasearch engine?

    A metasearch engine does not crawl the web to compile its own searchable database. Instead, it searches the databases of multiple individual search engines simultaneously.

    It provides a quick way of finding out which engines are retrieving the best results for your search.

  • What Are "Meta-Search" Engines? How Do They Work?

    In a meta-search engine, you submit keywords in its search box, and it transmits your search simultaneously to several individual search engines and their databases of web pages. Within a few seconds, you get back results from all the search engines queried. Meta-search engines do not own a database of Web pages; they send your search terms to the databases maintained by search engine companies.

    From http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/MetaSearch.html
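The "transmit simultaneously, then combine" idea can be sketched with a thread pool. This is a minimal sketch under stated assumptions: the three engines are hypothetical local functions returning canned result lists (a real meta-search engine would send network requests), and merging by summed reciprocal rank is one simple choice assumed here, not how any particular meta-search engine combines results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for individual search engines.
def engine_a(query):
    return ["p1", "p2", "p3"]

def engine_b(query):
    return ["p2", "p4"]

def engine_c(query):
    return ["p3", "p2", "p5"]

def metasearch(query, engines):
    """Send `query` to every engine concurrently, then merge the results.

    A URL earns 1/rank from each engine that returns it, so URLs ranked
    high by several engines rise to the top; ties break alphabetically.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    return sorted(scores, key=lambda url: (-scores[url], url))

print(metasearch("mustang", [engine_a, engine_b, engine_c]))
# ['p2', 'p3', 'p1', 'p4', 'p5']
```

"p2" wins even though only one engine ranked it first, because all three engines returned it near the top.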

  • Better Meta-Searchers (UC Berkeley - Teaching Library Internet Workshops)

    What each tool searches is as of the date at the bottom of the original page; these change often.

    Clusty (clusty.com)
      What's searched: Currently searches a number of free search engines and directories, not Google or Yahoo.
      Complex search ability: Accepts and "translates" complex searches with Boolean operators and field limiting.
      Results display: Results accompanied with subject subdivisions based on words in search results, intended to give the major themes. Click on these to search within results on each theme.

    Dogpile (www.dogpile.com)
      What's searched: Searches Google, Yahoo, LookSmart, Ask.com, MSN Search, and more. Sites that have purchased ranking and inclusion are blended in. Watch for "Sponsored by..." links below search results.
      Complex search ability: Accepts Boolean logic, especially in advanced search modes.

  • Meta-Search Engines for SERIOUS Deep Digging (UC Berkeley - Teaching Library Internet Workshops)

    What each tool searches is as of the date at the bottom of the original page; these change often.

    SurfWax (www.surfwax.com)
      What's searched: A better than average set of search engines. Can mix with educational, US Govt tools, and news sources, or many other categories.
      Complex search ability: Accepts " ", +/-. Default is AND between words. I recommend fairly simple searches, allowing SurfWax's SiteSnaps and other features to help you dig deeply into results.
      Results display: Click on source link to view complete search results there. Click on the icon to view a helpful "SiteSnap" extracted from most sites in a frame on the right. Many additional features for probing within a site.

    Copernic Agent (www.copernic.com)
      What's searched: Select from a list of search engines by clicking the Properties button following the Advanced Search search box.
      Complex search ability: ALL, ANY, Phrase, and more. Also Boolean searching within results under Refine (powerful!).
      Results display: Must be downloaded and installed, but the Basic version is free of charge. Table comparing versions.

  • METASEARCH ENGINES

    Dogpile, Clusty, Ixquick, Mamma, MetaCrawler, Metor, Profusion, qbSearch, SurfWax, Vivisimo

  • Search engine watch

    Find major search engines, meta search engines, news search engines and more at:
    1. http://www.searchenginewatch.com/
    2. http://www.searchengineguide.com/

  • Free search engines for your site?

    For your website: FreeFind, Atomz

    For your desktop: Google Desktop 5, Microsoft desktop search engine, Copernic Desktop Search Professional 3.1, Everything

  • Atomz Search

    Add site search to your site in minutes:
    1. Create an account.
    2. Crawl your site.
    3. Add a search box to your site.

  • http://www.freefind.com/

    Add a site search engine to your website. Easy to install:
    1. Enter your website address.
    2. Enter your email address.
    3. Click the button. You're done!

  • Why so many search engines?

    Because of different ...

  • Why so many search engines?

    Different Coverage: They vary in coverage. In fact, coverage is very incomplete; even the largest search engine provides access to only a minor portion of the web.

  • Why so many search engines?

    Different Search Capabilities: They have different tools and capabilities. Some support NEAR as an operator, some can search by different parameters, and so forth.

  • Why so many search engines?

    Different Spiders or Crawlers: They use different spiders or crawlers to index the web. The crawlers go out at different intervals, crawl to different depths (only the first page, the first three pages, or perhaps all pages), and differ in their indexing techniques.

  • Why so many search engines?

    Different Ways of Ranking: They differ in how they rank items for display after the items are retrieved. Most rank on the basis of how many times the terms you search for are found, or where they are found (higher placement gets more weight), in the target websites.
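Ranking by "how many times" and "where" a term appears can be sketched as a weighted count. This is a minimal sketch: the position weight 1 / (1 + position / 100) is an arbitrary choice assumed for illustration, and the three pages are hypothetical; real engines combine many more factors.

```python
import re

def score(text, terms):
    """Score a page: each occurrence of a search term adds a weight that is
    larger the closer the occurrence is to the top of the page."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    wanted = {t.lower() for t in terms}
    return sum(1.0 / (1 + pos / 100)
               for pos, word in enumerate(words) if word in wanted)

def rank(pages, terms):
    """Return page names sorted from highest to lowest score."""
    return sorted(pages, key=lambda name: score(pages[name], terms),
                  reverse=True)

# Hypothetical pages that differ in how often and where "mustang" appears.
PAGES = {
    "once-at-bottom": "filler " * 200 + "mustang",
    "once-at-top": "mustang " + "filler " * 200,
    "twice-at-top": "mustang mustang " + "filler " * 200,
}
print(rank(PAGES, ["mustang"]))
# ['twice-at-top', 'once-at-top', 'once-at-bottom']
```

Both factors from the slide show up: more occurrences beat fewer, and for the same count, a term near the top beats one buried at the bottom.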

  • Why so many search engines?

    Different Ranking Protocols: They differ in their protocols. Google, for example, uses an algorithm that ranks output on the basis of the number of other websites that have linked to the websites your search retrieves.

  • Portals and ISPs

    General & niche portals: for certain interests/communities

    ISPs (Internet Service Providers): provide domains of online services and portals

  • Portal definitions

    A portal is a Web site that is commonly used as a gateway to other Web sites. (Source: http://www.searchenginewatch.com)

  • What is a Portal?

    A portal is a client-server application (including web-based interface pages, related Java applets, configuration files, and Perl and C CGI scripts) for use on an organization's web server.

    It is a set of support materials for target community members.

    It is designed to facilitate substantive communication between members in the community(ies).

  • Top general portals (by WebHancer.com)

    Yahoo, AltaVista, MSN, Excite, Infoseek, Lycos, America Online

  • Top niche portals (by clienthelpdesk.com)

    Top food portals, websites
    Top health and parenting portals, websites
    Top baby boomer portals
    Top portals for seniors
    Top teen portals and popular Web sites
    Top women's portals, types of Web sites

  • Boolean Search

    Boolean search is named after the 19th-century mathematician George Boole, who developed theories for working with sets of information.

    Boolean search allows you to specify the relationships among your keywords and phrases.

  • What are the Boolean search commands?

    AND, OR, NOT, NEAR, NESTING

  • Boolean AND command

    The Boolean AND command is used to require that all search terms be present on the web pages listed in results.

    Example: Cats AND dogs

  • Boolean OR command

    The Boolean OR command is used to allow any of the specified search terms to be present on the web pages listed in results.

    Example: house OR home

  • Boolean NOT command

    The Boolean NOT command is used to require that a particular search term NOT be present on web pages listed in results.

    Examples: Cats NOT dogs; canine NOT dog
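Because Boolean search works on sets of matching pages, AND, OR, and NOT correspond directly to set intersection, union, and difference. This is a minimal sketch with hypothetical page IDs for each term.

```python
# Hypothetical result sets: the IDs of pages that contain each term.
pages_with = {
    "cats": {1, 2, 3},
    "dogs": {2, 3, 4},
    "house": {5, 6},
    "home": {6, 7},
}

cats_and_dogs = pages_with["cats"] & pages_with["dogs"]  # AND: intersection
house_or_home = pages_with["house"] | pages_with["home"]  # OR: union
cats_not_dogs = pages_with["cats"] - pages_with["dogs"]   # NOT: difference

print(sorted(cats_and_dogs))  # [2, 3]
print(sorted(house_or_home))  # [5, 6, 7]
print(sorted(cats_not_dogs))  # [1]
```

AND can only shrink the result set and OR can only grow it, which is why AND narrows a search and OR broadens it.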

  • Be careful using the NOT Boolean operator.

    If you seek documents on the Mustang automobile, many of the documents retrieved might be about the mustang horse. Should you search "Mustang NOT horse"?

    What's the problem? This search strategy would reject articles or websites that mention the term "horse power."
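The pitfall is easy to reproduce. This is a minimal sketch with three hypothetical pages and whole-word matching (an assumption of the sketch); the car review is wrongly discarded because it mentions "horse power".

```python
import re

# Hypothetical pages.
PAGES = {
    "car-review": "The new Mustang delivers impressive horse power.",
    "wild-horses": "A mustang is a free-roaming horse of the American West.",
    "car-history": "The Ford Mustang debuted in 1964.",
}

def words(text):
    """Return the set of lowercase words in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search_not(pages, required, excluded):
    """Evaluate the query  required NOT excluded  over `pages`."""
    return sorted(
        name for name, text in pages.items()
        if required in words(text) and excluded not in words(text)
    )

print(search_not(PAGES, "mustang", "horse"))  # ['car-history']
```

The query correctly drops the wild-horse page, but it also drops "car-review", a relevant page about the automobile.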

  • The NEAR command

    The NEAR command specifies how close terms should appear to each other.

    Example: moon NEAR river
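A NEAR check compares word positions. This is a minimal sketch; the 10-word default distance is an assumption, since engines that support NEAR each choose their own distance.

```python
import re

def near(text, term1, term2, max_distance=10):
    """True if term1 and term2 occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == term1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == term2.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions1 for j in positions2)

print(near("The moon shone on the river.", "moon", "river"))  # True
print(near("moon " + "word " * 20 + "river", "moon", "river"))  # False
```

Both texts satisfy "moon AND river", but only the first satisfies "moon NEAR river", because in the second the terms are 21 words apart.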

  • Boolean nesting command

    Nesting with parentheses ( ) allows you to build complex queries.

    Example: impeachment AND (clinton OR johnson)
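Nested queries can be evaluated with a small recursive parser over an index of term sets. This is a minimal sketch under assumptions of its own: operators must be written in uppercase, AND and NOT bind tighter than OR (precedence varies between engines), "a NOT b" means "a but not b", and the index contents are hypothetical.

```python
import re

def evaluate(query, index):
    """Evaluate a nested Boolean query against {word: set of doc IDs}.

    Assumed grammar:
        expr   := term ("OR" term)*
        term   := factor (("AND" | "NOT") factor)*
        factor := word | "(" expr ")"
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = take()
        if token == "(":
            result = expr()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        if token in (")", "AND", "OR", "NOT"):
            raise ValueError("unexpected " + token)
        return set(index.get(token.lower(), set()))

    def term():
        result = factor()
        while peek() in ("AND", "NOT"):
            if take() == "AND":
                result = result & factor()  # both terms required
            else:
                result = result - factor()  # exclude the second term
        return result

    def expr():
        result = term()
        while peek() == "OR":
            take()
            result = result | term()  # either term allowed
        return result

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result

INDEX = {"impeachment": {1, 2, 3, 4}, "clinton": {1, 5}, "johnson": {2, 6}}
print(sorted(evaluate("impeachment AND (clinton OR johnson)", INDEX)))  # [1, 2]
print(sorted(evaluate("impeachment AND clinton OR johnson", INDEX)))    # [1, 2, 6]
```

The two queries differ only in the parentheses. Without them, this sketch reads the query as "(impeachment AND clinton) OR johnson" and lets in page 6, which mentions Johnson but not impeachment.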

  • Advanced Search: Google

    Search with quotes for better phrase matching.
    keyword site:www.site.com (search only a specific site)
    keyword site:www.site.com/folder/ (search a folder)
    intitle:keyword phrase (titles only, containing the keyword)
    keyword filetype:ppt/doc/mp3/pdf/etc. (search by file type)
    keyword + site: + folder + filetype (starting to see the power!)
    keyword + site: + folder + filetype + DownThemAll + prefs = research powerhouse: check a specific folder on a website for a specific file type, then show all of them and download everything in the folder with one click.
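Combining the operators above is just string assembly. This is a minimal sketch: `build_query` is a hypothetical helper, `www.site.com/folder/` is the slide's own placeholder, and the `https://www.google.com/search?q=` URL form is assumed for illustration; `urllib.parse.quote_plus` percent-encodes the query for use in a URL.

```python
from urllib.parse import quote_plus

def build_query(keywords, site=None, filetype=None, intitle=None):
    """Combine keywords with Google-style operators into one query string."""
    parts = [keywords]
    if intitle:
        parts.append("intitle:" + intitle)
    if site:
        parts.append("site:" + site)
    if filetype:
        parts.append("filetype:" + filetype)
    return " ".join(parts)

# Quoted phrase + specific folder + specific file type.
query = build_query('"search strategy"', site="www.site.com/folder/",
                    filetype="pdf")
url = "https://www.google.com/search?q=" + quote_plus(query)
print(query)  # "search strategy" site:www.site.com/folder/ filetype:pdf
print(url)
```

Keeping each operator as a separate keyword argument makes it easy to add or drop one restriction at a time while refining a search.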

  • Why do we need a search strategy?

    A well-designed search strategy:
    saves you time in the long run;
    allows you to search for information in many different places;
    helps you to find a larger amount of relevant information.

  • Definition

    SEO = Search Engine Optimization: using targeted keywords and phrases so a website's pages will rank high on SERPs. Note that SEO also stands for Search Engine Optimizer.

    SERP = Search Engine Results Page

  • What do you need to do before searching?

    Find the focus of your question.
    Clarify the key concepts.
    Determine the key terms for the concepts.
    Prepare alternative terms to describe these concepts.
    Choose a way to start looking.

  • Search strategy 1

    Use key search terms: subject search or keyword search. Remember to keep an eye out for synonyms (other related keywords).

  • Search strategy 2

    Use alternative terms: experts know how to find and use alternative terms.
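Synonyms and alternative terms combine naturally with Boolean operators: alternatives for one concept are ORed together, and the separate concepts are ANDed. This is a minimal sketch; `expand_query` is a hypothetical helper, and quoting multi-word terms as phrases is an assumption of the sketch.

```python
def expand_query(concepts):
    """Build a Boolean query from a list of concepts.

    Each concept is a list of alternative terms. Alternatives are joined
    with OR and wrapped in parentheses; concepts are joined with AND.
    Multi-word terms are quoted so they are searched as phrases.
    """
    groups = []
    for terms in concepts:
        quoted = ['"' + t + '"' if " " in t else t for t in terms]
        group = " OR ".join(quoted)
        groups.append("(" + group + ")" if len(terms) > 1 else group)
    return " AND ".join(groups)

print(expand_query([["car", "automobile"], ["repair", "maintenance"], ["Mustang"]]))
# (car OR automobile) AND (repair OR maintenance) AND Mustang
```

Writing the concepts down as lists first mirrors the pre-search steps: identify the key concepts, then prepare alternative terms for each.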

  • Search strategy 3

    Use Boolean operators:
    AND finds items containing both terms, and reduces the size of the set.
    OR finds items containing e...