The Players The Majors Dead Search Engines International Search Engines Metasearch Engines

Post on 22-Dec-2015


The Players

The Majors
Dead Search Engines
International Search Engines
Metasearch Engines

Google

Developed as BackRub by Stanford University students Larry Page and Sergey Brin

Became a private company and changed its name to Google in 1998

One of the largest databases: more than 8 billion pages (the count includes pages its robots have visited, even if its indexing program hasn’t fully indexed them)

Indexes 3 billion pages every 28 days; 3 million every day

Makes money by powering search for over 130 portals and corporate Web sites, and through AdWords

Google

Google Spidering
Uses its own ’bots to spider the Web
Generally ignores meta keywords and description tags

Google

Google Indexing
Descriptions (snippets) are formed automatically by extracting the most relevant portions of pages
Finds the first instance of the search term on a page, then includes the words that appear around this term
Only indexes the first 100 KB or so of a page
Some pages don’t have a description - Google will include a “botted” page even if it has not been “indexed”
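
The snippet behavior described above (find the first instance of the search term, then keep the words around it) can be illustrated with a minimal sketch. This is not Google's implementation; the function name `make_snippet` and the `context_words` parameter are hypothetical, and the sketch handles only a single-word term.

```python
def make_snippet(text, term, context_words=5):
    """Return the words around the first occurrence of `term`, or None.

    Illustrative only: single-word terms, whitespace tokenization.
    """
    words = text.split()
    # Compare case-insensitively and ignore surrounding punctuation.
    lowered = [w.strip('.,;:!?"\'()').lower() for w in words]
    try:
        i = lowered.index(term.lower())
    except ValueError:
        return None  # term not on the page: no snippet can be built

    start = max(0, i - context_words)
    end = min(len(words), i + context_words + 1)
    snippet = " ".join(words[start:end])
    # Mark text that was cut off on either side.
    if start > 0:
        snippet = "... " + snippet
    if end < len(words):
        snippet = snippet + " ..."
    return snippet


page = "Gigablast was built to index billions of pages with little hardware"
print(make_snippet(page, "index", context_words=2))
```

A page with no occurrence of the term returns `None`, which loosely mirrors the case of a listing that appears without a description.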

Google

Indexes:
Web - Indexed Web pages and other file types
Ads - Paid advertisements appear on the right side or above search results under a "Sponsored Links" heading
Images - 880 million+ images searched
Groups - 845 million+ Usenet messages searched
News
Directory - A ranked version of the Open Directory using Google's PageRank
Froogle - Shopping and product search
Catalog Search - Scanned, searchable retail catalogs

Google

Web index subsets:
Government sites
Military sites
University sites
Linux sites
Apple/Macintosh sites
Microsoft sites

Google

New! “Google teams with the libraries of Harvard, Stanford, the University of Michigan, the University of Oxford, and The New York Public Library to digitally scan books from their collections so that users worldwide can search them in Google… Users searching with Google will see links in their search results page when there are books relevant to their query. Clicking on a title delivers a Google Print page where users can browse the full text of public domain works and brief excerpts and/or bibliographic data of copyrighted material. Library content will be displayed in keeping with copyright law.”

http://www.google.com/press/pressrel/print_library.html

Yahoo! Search

Originally just a subject directory
Search engine launched Feb. 2004
Indexes first 500 KB of a Web page
Includes some pay-for-inclusion sites

Teoma

Founded in 2000 by a team of scientists from Rutgers University

Teoma means "expert" in Gaelic
Acquired by Ask Jeeves, Inc. in September 2001

Teoma

More than 2 billion English-only web documents
Spam, duplicates and pornographic results removed from index
Indexes whole page; no stop words
Considers meta-tag descriptions
Aims to re-index every month (freshness)
Sponsored links from Google AdWords

Teoma

Establishing authority and relevancy:
Refine - organizes sites into naturally occurring communities that are about the subject of each search query

Results - analyzes the relationship of sites within a community, ranking a site based on the number of same-subject pages that reference it (Subject-Specific Popularity)

Resources - identifies expert resources about a particular subject
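
The "Subject-Specific Popularity" idea above (rank a site by how many same-subject pages reference it) can be sketched as a simple in-community link count. This is an illustration under stated assumptions, not Teoma's algorithm: the community is taken as already known, and the function name and the data shapes are hypothetical.

```python
from collections import Counter


def subject_specific_popularity(links, community):
    """Rank community pages by inbound links from other community pages.

    links:     iterable of (source_page, target_page) pairs
    community: pages assumed to be about the query's subject
    """
    members = set(community)
    counts = Counter({page: 0 for page in members})
    # set() drops repeated links so each referring page counts once.
    for source, target in set(links):
        if source in members and target in members and source != target:
            counts[target] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


links = [("a", "c"), ("b", "c"), ("x", "c"), ("c", "a"), ("x", "b")]
print(subject_specific_popularity(links, {"a", "b", "c"}))
```

In this toy example the links from page `x` are ignored because `x` is outside the community, which is the difference from a plain count of all inbound links.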

Gigablast

Founded in 2000
Built and operated by sole proprietor Matt Wells
Created to index up to 200 billion pages with the least amount of hardware possible
Currently indexes 650 million pages
Provides "Gigabits" to help searchers refine their search based upon related topics from search results
Makes money by selling search services to private companies

Wisenut

Newer database (~2001)
850 million pages indexed
1.5 billion pages identified but not crawled/indexed
Few advanced search features
Spider capable of fetching more than 100 million pages a day
Often months out of date
Smart/Relevant: considers all words on the page, text of referring links and the words around them, and the significance and content of the pages with the links
Generates automatic semantic searches called WiseGuide categories

MSN Search

New, improved
~4.2 billion pages searched/indexed?
Formerly used Inktomi; now has proprietary robots, indexer, and retrieval engine

Dead Search Engines

What ever happened to…?

Direct Hit - defunct, redirecting to Teoma
Infoseek - defunct, redirecting to Go
Magellan - dead, redirects to WebCrawler
Northern Light - defunct
Openfind - under "reconstruction" as of 2003
WebTop - dead

Dead Search Engines

The search engine formerly known as…
AlltheWeb - uses Yahoo! database
AltaVista - uses Yahoo! database
Excite - uses an InfoSpace meta search
Go - took over Infoseek, but now just uses Overture
iWon - now uses Google "sponsored" ads, web, and image databases
LookSmart - uses Wisenut search engine
Lycos - uses Yahoo!/Inktomi database and LookSmart directory
NBCi (formerly Snap) - uses metasearch engine Dogpile
WebCrawler - uses an InfoSpace meta search

International Search Engines

There are hundreds of search engines all over the world. We will not be investigating any of these very closely, but you can use the resources below to locate and master international search engines:

All Search Engines: foreign search engines
Search Engines Worldwide
Search Engine Colossus
Country-specific Search Engines

Metasearch Engines

A search engine that queries other search engines and then combines the results received from all of them

The user is not relying on just one search engine but on a combination of many search engines at once to optimize Web searching
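
As a rough illustration of the definition above, the sketch below sends one query to several engines, caps the number of records taken from each, and merges the results while dropping duplicates. The engine callables are hypothetical stand-ins (no real search API is called), and the URL normalization is deliberately crude.

```python
def metasearch(query, engines, per_engine=10):
    """Query every engine and merge the results, dropping duplicate URLs.

    engines: dict mapping an engine name to a callable that takes a query
             string and returns a ranked list of URLs (hypothetical).
    Returns a list of (url, first_engine_that_returned_it) pairs.
    """
    merged = []
    seen = set()
    for name, search in engines.items():
        for url in search(query)[:per_engine]:
            # Crude normalization for de-duping: ignore case and a
            # trailing slash. Real systems need more careful rules.
            key = url.lower().rstrip("/")
            if key in seen:
                continue
            seen.add(key)
            merged.append((url, name))
    return merged


# Stand-in engines with canned results, for illustration only.
fake_engines = {
    "alpha": lambda q: ["http://example.com/", "http://example.org/a"],
    "beta": lambda q: ["http://EXAMPLE.com", "http://example.net"],
}
for url, source in metasearch("search engines", fake_engines):
    print(source, url)
```

The `per_engine` cap corresponds to the "number of records from each search engine" setting that varies among metasearch engines.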

Metasearch Engines

The difference among them:
Engines covered (many pay-for-placement)
Number of engines that can be searched at once
Sophistication of search query
Number of records from each search engine
Length of time it will search each search engine
Whether duplicates are deleted (de-duping)

Metasearch Engines

Dogpile
Metacrawler
Mamma
KartOO
Clusty
Surfwax
Ixquick
Fazzle
InfoGrid
Gimenei

Metasearch Engines

Good for getting a lay of the land:
What is out there?
Is there anything out there?
Who covers a topic best?
Learning the names of new or emerging search engines

Metasearch Engines

Otherwise, you are usually better off searching multiple SE’s individually:

Syntax varies among search engines, and metasearch engines may not allow you to make use of all search engines

A metasearch engine may not translate your query well into the different SE’s

Metasearch Engines

Check out some cool, value-adding features emerging in metasearch engines

Clusty

Clusty (using Vivisimo clustering engine):
Clustering: uses an algorithm to put search results together based on textual and linguistic similarity.
Groups are further refined using heuristics (i.e., human knowledge) designed to show what users wish to see when they examine clustered documents.

Clusty

“Vivísimo's Clustering Engine lets you see deeper and farther--with less effort--into a large number of search results to:

Get a quick overview of the main themes that relate to the query.

See similar results grouped together for faster access.
Find results that are buried in the ranked list and would otherwise be missed.
Discover unexpected results and relationships between items.”

Mamma

rSort
Considers each listing duplicated in more than one SE as a “vote” for that page.
Uses votes to rank pages per the “Condorcet Method”
One of the big advantages of this ranking method is the elimination of search engine spam.
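
To make the voting idea above concrete, the sketch below treats each search engine's ranked list as a ballot and ranks pages by pairwise wins (Copeland scoring, one simple Condorcet-style rule). This is not Mamma's actual rSort; the function name is hypothetical, and treating unlisted pages as ranked below listed ones is an assumption of this sketch.

```python
from itertools import combinations


def condorcet_rank(ballots):
    """Rank pages by the number of pairwise contests they win.

    ballots: list of ranked URL lists, one per search engine.
    A page wins a contest against another page when more engines
    rank it higher; a page an engine did not list is treated as
    ranked below every page that engine did list (assumption).
    """
    pages = sorted({page for ballot in ballots for page in ballot})

    def prefers(ballot, a, b):
        # True when this ballot places a above b.
        if a in ballot and b in ballot:
            return ballot.index(a) < ballot.index(b)
        return a in ballot and b not in ballot

    wins = {page: 0 for page in pages}
    for a, b in combinations(pages, 2):
        a_votes = sum(prefers(ballot, a, b) for ballot in ballots)
        b_votes = sum(prefers(ballot, b, a) for ballot in ballots)
        if a_votes > b_votes:
            wins[a] += 1
        elif b_votes > a_votes:
            wins[b] += 1
    # Most pairwise wins first; ties broken alphabetically.
    return sorted(pages, key=lambda page: (-wins[page], page))


ballots = [["x", "y", "z"], ["y", "x"], ["x", "spam"]]
print(condorcet_rank(ballots))
```

A page pushed into only one engine's results wins few pairwise contests, which suggests why a vote-based scheme of this kind tends to bury search engine spam.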

KartOO

Interactive mapping display for results
Uses proprietary algorithm to sort pages
Relevance of results is displayed as different-sized pages
When you move the pointer over these pages, the relevant keywords are illuminated and a brief description of the site appears on the left side of the screen
Click keywords to refine the search
Refined or further results also displayed on a map

Surfwax

Targeted multi-source searching
Searches only sources from specific domains or topics determined as relevant
SurfWax can spider deeper in any public site, including pages or parts that are invisible to traditional search engines
Uses a site's existing search syntax to uncover “deeper” content

Ixquick

Understands and translates, when possible, complex syntax

Complete Boolean searching
Truncation/wildcard searching

Fazzle

Meta-searches SE’s, plus unique searches in news and other invisible web resources

Ranks everything together
Delivers timely resources from news sources
Delivers dynamic content missing from other metasearch engines