EFFICIENT ONLINE INFORMATION SEARCHING 251111 Internet And Online Community Week 3

Upload: oscar-thomas

Post on 26-Dec-2015


REVIEW

• Computer Technologies & The Modern World
• Evolution of Communication & Technology
• Telecommunication
• Input Devices
• Output Devices
• Future Technology
• Context Aware Computing
• Breakthrough Technologies


10 BREAKTHROUGH TECHNOLOGIES 2014

1. Agricultural Drones

2. Ultraprivate Smartphones

3. Brain Mapping

4. Neuromorphic chips

5. Genome Editing

6. Microscale 3-D Printing

7. Mobile Collaboration

8. Oculus Rift

9. Agile Robots

10. Smart Wind & Solar Power

http://www.technologyreview.com/lists/technologies/2014/

THIS WEEK

• Efficient Online Information Searching
• How do you search for information?
• Search Engines
• Search Engine Optimisation (SEO)


A SEARCH ENGINE

SEARCH

[Figure: search process diagram with the elements Documents, Indexing, Document Representations, Query, Matching, Retrieved Documents, and Relevance / Feedback]
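
The indexing and matching stages named in the diagram can be illustrated with a toy inverted index. This is only a minimal sketch, not how any real search engine is built: the sample documents and the function names `build_index` and `match` are assumptions made up for illustration, and relevance ranking and feedback are left out.

```python
from collections import defaultdict


def build_index(documents):
    """Indexing: map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def match(index, query):
    """Matching: return ids of documents containing every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents, invented for this example.
documents = {
    1: "cheap hotels in chiang mai",
    2: "chiang mai travel guide",
    3: "bangkok travel guide",
}
index = build_index(documents)
retrieved = match(index, "chiang mai")
```

The index is the "document representation": queries are matched against it rather than against the raw documents.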

MEASURING SEARCH EFFICIENCY

• Recall
  • (a.k.a. Sensitivity)
  • Fraction of relevant instances retrieved

• Precision
  • (a.k.a. Positive Predictive Value)
  • Fraction of retrieved instances that are relevant

RECALL & PRECISION

[Figure: recall and precision diagram (Walber)]

RETURNED RESULTS

• The “Blue” area represents all the relevant articles

• The “Orange” area represents other articles that could be returned

[Figure: diagram of returned results with regions labelled A, B and C]

RECALL

• A = Relevant Returned Articles

• C = Relevant Unreturned Articles

• Recall = A / (A + C)

PRECISION

• A = Relevant Returned Articles

• B = Irrelevant Returned Articles

• Precision = A / (A + B)


RECALL & PRECISION

• Suppose there are 200 relevant articles

• A search engine returns 40 articles, of which 25 are relevant…

• What is the recall?

• What is the precision?
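
The exercise can be worked through in a few lines of Python. This is a small sketch using the A, B and C regions defined on the earlier slides; the function names are illustrative, not part of the course material.

```python
def recall(a, c):
    """Fraction of all relevant articles that were returned: A / (A + C)."""
    return a / (a + c)


def precision(a, b):
    """Fraction of returned articles that are relevant: A / (A + B)."""
    return a / (a + b)


relevant_total = 200   # relevant articles that exist
returned_total = 40    # articles the search engine returned
a = 25                 # relevant returned articles
b = returned_total - a # irrelevant returned articles
c = relevant_total - a # relevant unreturned articles

recall_value = recall(a, c)        # 25 / 200
precision_value = precision(a, b)  # 25 / 40
```

Here the recall is 0.125 and the precision is 0.625: most of what was returned is relevant, but most of the relevant articles were missed.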

GOOGLE SEARCH REFINEMENT

• Quotes! “…”
  • Force Google to look for something
  • Star Wars I vs Star Wars “I”
  • Jobs in central LA vs Jobs in central “LA”

• -
  • Stop Google from looking for something
  • Dolphins -football

• ~
  • “Is Similar to” – look for synonyms
  • ~inexpensive

GOOGLE SEARCH REFINEMENT

• OR or “|”
  • Bangkok | Chiangmai

• ..
  • Specify a range

• *
  • Replaces one or more words
  • google * my life

GOOGLE SEARCH REFINEMENT

• allintitle:
  • makes sure the search terms appear in the title
  • allintitle: ken cosh

• cache:
  • returns cached copy of page

• link:
  • returns pages that link to the specified page

• site:
  • restrict results to a particular website
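
The operators on these slides are just text in the query string, so they can be combined programmatically. The following is a rough sketch: `build_query` and `search_url` are hypothetical helper names, `example.com` is a placeholder site, and which operators a search engine actually honours may change over time.

```python
from urllib.parse import urlencode


def build_query(terms="", phrase=None, any_of=(), exclude=(), site=None):
    """Combine plain terms with the quote, OR, minus and site: operators."""
    parts = []
    if terms:
        parts.append(terms)
    if phrase:
        parts.append(f'"{phrase}"')          # exact phrase in quotes
    if any_of:
        parts.append(" OR ".join(any_of))    # alternatives
    parts.extend(f"-{word}" for word in exclude)  # words to leave out
    if site:
        parts.append(f"site:{site}")         # restrict to one website
    return " ".join(parts)


def search_url(query):
    """Encode the query into a search URL (URL shape is an assumption)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


dolphins = build_query(terms="dolphins", exclude=["football"])
cities = build_query(any_of=["Bangkok", "Chiangmai"])
person = build_query(phrase="ken cosh", site="example.com")
url = search_url(dolphins)
```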

SEARCH ENGINE MARKET SHARE

• Which Search Engine do you use?
• Which is the most popular?

VIDEO BREAK!

• How Search Works
  • https://www.youtube.com/watch?v=BNHR6IQJGZs

SEARCH ENGINES

• A great source of traffic for your site.

• But how do they decide which sites to display, and in which order to display them on their SERPs?
  • SERPs = Search Engine Results Pages

• Obviously being #1 in Google for a popular search term will bring you lots of traffic.

RANKING ALGORITHM

• We don’t know, but it takes plenty of factors into account:
  • Page Content
  • Meta tags
  • Age
  • Keyword density
  • Links

• And the algorithm appears to evolve over time.

GOOGLE’S MAGIC

• Gone are the days when you could just say what your page is about; now it’s much more technical…

• Much of Google’s magic comes from their patented “PigeonRank” algorithm
  • http://www.google.com/technology/pigeonrank.html


PIGEON -> PAGERANK

• PageRank is a numeric value that represents how important a page is on the web.

PAGERANK

• Google figures that when one page links to another page, it is effectively casting a vote for the other page.
  • The more votes that are cast for a page, the more important the page must be.

• The importance of the page that is casting the vote determines how important the vote itself is.
  • Google calculates a page's importance from the votes cast for it.
  • How important each vote is is taken into account when a page's PageRank is calculated.


PAGERANK

• PageRank is Google's way of deciding a page's importance.

• It matters because it is one of the factors that determines a page's ranking in the search results.

• It isn't the only factor that Google uses to rank pages, but it is an important one.


LINK FARMS ETC.

• Not all links are counted by Google. For instance, they filter out links from known link farms. Some links can cause a site to be penalized by Google. They rightly figure that webmasters cannot control which sites link to their sites, but they can control which sites they link out to. For this reason, links into a site cannot harm the site, but links from a site can be harmful if they link to penalized sites. So be careful which sites you link to. If a site has PR0, it is usually a penalty, and it would be unwise to link to it.

CALCULATING PAGERANK

• To calculate the PageRank for a page, all of its inbound links are taken into account. These are links from within the site and links from outside the site.
  • PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))
  • That's the equation that calculates a page's PageRank. It's the original one that was published when PageRank was being developed, and it is probable that Google uses a variation of it but they aren't telling us what it is. It doesn't matter though, as this equation is good enough.


CALCULATING PAGERANK

• PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))

• 't1 - tn' are pages linking to page A

• 'C' is the number of outbound links that a page has

• 'd' is a damping factor, usually set to 0.85.
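
Because each page's PageRank depends on the PageRank of the pages linking to it, the equation is usually evaluated by repeated passes over all pages. The sketch below applies the published formula to a tiny three-page web; the link graph, the starting value of 1.0 and the fixed number of iterations are assumptions chosen for illustration, not a description of what Google actually runs.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(t)/C(t)) over all pages.

    links maps each page to the list of pages it links out to.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # assumed starting value
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum the "share" from every page t that links to this page,
            # where C(t) is the number of outbound links on t.
            incoming = sum(
                pr[t] / len(links[t]) for t in pages if page in links[t]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical link graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
```

In this toy graph C ends up with the highest value, since it receives all of B's vote and half of A's, while B, which only receives half of A's vote, ends up lowest.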

PAGERANK SIMPLIFIED

• We can think of it in a simpler way:
  • a page's PageRank = 0.15 + 0.85 * (a "share" of the PageRank of every page that links to it)

• “share” = the linking page’s PageRank divided by the number of outbound links on the page.
  • A page "votes" an amount of PageRank onto each page that it links to. The amount of PageRank that it has to vote with is a little less than its own PageRank value (its own value * 0.85). This value is shared equally between all the pages that it links to.


PAGERANK

• From this, we could conclude that a link from a page with PR4 and 5 outbound links is worth more than a link from a page with PR8 and 100 outbound links.

• The PageRank of a page that links to yours is important but the number of links on that page is also important.

• The more links there are on a page, the less PageRank value your page will receive from it.
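
The comparison can be checked with simple arithmetic. This sketch assumes, as the conclusion above does, that PR4 and PR8 can be used directly as PageRank values; `link_value` is an illustrative name.

```python
def link_value(pagerank, outbound_links, d=0.85):
    """PageRank passed along one link: d * PR / number of outbound links."""
    return d * pagerank / outbound_links


from_pr4 = link_value(4, 5)     # 0.85 * 4 / 5
from_pr8 = link_value(8, 100)   # 0.85 * 8 / 100
```

Under that assumption the PR4 link passes on about 0.68 and the PR8 link about 0.068, so the PR4 link is worth roughly ten times more.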


OR PERHAPS NOT…

• If the PageRank value differences between PR1, PR2,.....PR10 were equal then that conclusion would hold up, but many people believe that the values between PR1 and PR10 (the maximum) are set on a logarithmic scale, and there is very good reason for believing it.

• Nobody outside Google knows for sure one way or the other, but the chances are high that the scale is logarithmic, or similar.

• If so, it means that it takes a lot more additional PageRank for a page to move up to the next PageRank level than it did to move up from the previous PageRank level.

• The result is that it reverses the previous conclusion, so that a link from a PR8 page that has lots of outbound links is worth more than a link from a PR4 page that has only a few outbound links.
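
To see how a logarithmic scale could reverse the conclusion, the same arithmetic can be repeated with each PageRank level standing for a power of some base. The base of 5 used here is purely a guess for illustration; nobody outside Google knows the real scale.

```python
def link_value(level, outbound_links, base, d=0.85):
    """Value of one link if level N corresponds to base**N actual PageRank."""
    actual_pagerank = base ** level
    return d * actual_pagerank / outbound_links


base = 5  # hypothetical base of the logarithmic scale
from_pr4 = link_value(4, 5, base)     # few outbound links
from_pr8 = link_value(8, 100, base)   # many outbound links
```

With this assumed base the PR8 link is worth far more than the PR4 link, despite its 100 outbound links. The reversal holds for any base above roughly 2.1, because the PR8 page needs more than 20 times the PR4 page's actual PageRank to make up for having 20 times as many links.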


EITHER WAY…

• Whichever scale Google uses, we can be sure of one thing. A link from another site increases our site's PageRank. Just remember to avoid links from link farms.

SEO

• Search Engine Optimisation
  • Has become an important job for website owners

WHAT IS SEO?

• Search Engine Optimisation
  • Making webpages more search engine friendly.

• SEO should be considered from the start.
  • Domain Name
  • Site Structure
  • Site Design
  • Site Navigation
  • Site Topics
  • Headings
  • Subheadings
  • Content
  • Links
  • Usability
  • Accessibility

WHY IS IT IMPORTANT?

• 24% of marketers said that >75% of their traffic comes from search engines
• 60% of students use search engines to find online retailers
• 55% of online purchases were made on sites found through search engines
• 80% of users reach sites through search engines
• 48% of websites depend on search engines for the majority of their traffic

(Various sources)

WHY IS IT IMPORTANT?

• Following Search Engine rules.
  • If your webpage fits the criteria for a certain search term, you’ll get top ranking.

• Search Engine Optimisers
  • Modify webpages to fit the criteria to give a page a better chance of being selected.


DESIGN WITH SEO IN MIND

• It’s tempting to build a website, and then think about SEO.

• Better to design with SEO in mind

DOMAIN NAME

• Get a domain name that contains your keywords

• But make sure it is still memorable…

• www.AAA1-Chiang-Mai-Travel-Hotel-Guide-Bookings-Tourist.com
  • Is not a good domain name!

WEBSITE STRUCTURE

• Usability
  • It doesn’t matter how good the content is if the site is frustrating to use.

• Linkability
  • Remember the internal linking structure, and its effect on PageRank

WEBSITE DESIGN

• Flash?
  • NO! Search Engines rely on keywords to classify pages, while Flash is mostly for entertainment.
  • Search Engines do not index Flash files.

• HTML
  • Yes! It’s easy and spiders have no problem indexing it.
  • But PHP etc. is fine so long as you use search engine friendly URLs & links

WEBPAGE CONTENT

• Spiders use the content to know where to categorise each page.
  • A page with no text (Flash site)
    • Where should it be put?
  • A page with lots of text on lots of topics
    • Where should it be put? There are too many competing keywords.
  • The amount of content is also important.

LINKS

• After content, links are the most important thing…
  • Some would even argue it’s the opposite way around.
  • PageRank

• The link text is just as important as the link.
  • It is tempting to use an attractive graphical button for the link – but how can the spider associate keywords with the link?

HOW MANY KEYWORDS?

• Keyword Frequency
  • The number of times a keyword, or phrase, appears within a page.

• Keyword Density
  • The ratio of keyword occurrences to the total number of indexable words on the page
  • Perhaps 1-3%
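
Both measures are easy to compute for a single keyword. The sketch below is illustrative only: the sample text, the tokenising rule and the stop-word list are assumptions, and since each search engine decides for itself which words count as indexable, its own figure may differ.

```python
import re


def tokenize(text):
    """Split text into lowercase words (a simplistic, assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_frequency(text, keyword):
    """Number of times the keyword appears in the text."""
    return tokenize(text).count(keyword.lower())


def keyword_density(text, keyword, stop_words=frozenset()):
    """Keyword occurrences divided by the number of indexable words."""
    words = [w for w in tokenize(text) if w not in stop_words]
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# Hypothetical page text, invented for this example.
page = "Chiang Mai hotels: book a hotel in Chiang Mai and see the best of the city."
stop_words = frozenset({"a", "in", "and", "the", "of"})  # assumed list

frequency = keyword_frequency(page, "chiang")
density_all_words = keyword_density(page, "chiang")
density_no_stop_words = keyword_density(page, "chiang", stop_words)
```

The same page gives a density of 2/16 when every word is counted and 2/10 when the assumed stop words are ignored, which shows how much the figure depends on what is treated as indexable.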

KEYWORD DENSITY

• Is more complicated than that.
  • Different search engines have different preferences
  • Different search engines will also calculate a different density for your page:
    • Stop words?
    • Word Stemming?
    • Keywords in particular HTML tags

KEYWORD PROMINENCE

• As well as frequency and density, prominence is also a factor
  • Words appearing near the beginning of the page, paragraph, sentence.
  • Certain HTML tags (title)


KEYWORD PROXIMITY

• How close keywords are together could also be a factor.

• Consider a search for ‘dog biscuits’
  • “We sell delicious biscuits for all breeds of dogs!”
  • “We sell the most delicious dog biscuits in the world!”
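
One simple way to put a number on proximity is the smallest gap, in words, between the two keywords. This is a rough sketch rather than any search engine's actual measure; the crude plural-stripping `stem` function is an assumption added so that "dogs" matches "dog".

```python
import re


def stem(word):
    """Very crude stemming: drop a trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def tokens(text):
    """Lowercase, split into words, and stem each one."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def min_distance(text, first, second):
    """Smallest number of word positions between the two keywords."""
    words = tokens(text)
    first, second = stem(first.lower()), stem(second.lower())
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    if not pos_first or not pos_second:
        return None  # at least one keyword is missing
    return min(abs(i - j) for i in pos_first for j in pos_second)


far = min_distance("We sell delicious biscuits for all breeds of dogs!",
                   "dog", "biscuits")
near = min_distance("We sell the most delicious dog biscuits in the world!",
                    "dog", "biscuits")
```

In the first sentence the keywords are 5 words apart; in the second they are adjacent (distance 1), so the second sentence would score better on proximity.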