searching the internet - what patent searchers should know
TRANSCRIPT
![Page 1: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/1.jpg)
searching the internet what patent searchers should know
Eric Sieverts
WON, 11-12-2012
UB Utrecht HvA-MIC GO Opleidingen
![Page 2: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/2.jpg)
agenda
• searching the web
• the volatile google landscape
• smart searching
• dating and back to the past
• reliability
• google options
• beyond google
• beyond general web search
• the social landscape
![Page 3: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/3.jpg)
agenda
general web search: how to …
specific material search: how to …
the general web ?=? everything
importance of specific material types? when & why
![Page 4: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/4.jpg)
an ever-changing google landscape
• unreliable numbers
• irreproducible results
• disappearing functions
• changing interfaces
![Page 5: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/5.jpg)
"coping" with numbers of results
in structured databases, the way you combine terms affects the number of results as you would expect, but:
• with Google (and other web search engines) the numbers are unstable, irreproducible and unreliable, with inexplicable effects
– refining with an AND relation may increase the number of results
– expanding with an OR relation may decrease the number of results
– numbers are only extrapolations from a small part of the search index
– they depend on the distribution of the index over servers
– they depend on Google version, browser, whether logged in, history, ...
– not just Google: Bing results also depend on geographic setting
• Danny Sullivan explains why Google cannot count: "Why Google Can’t Count Results Properly", http://searchengineland.com/why-google-cant-count-results-properly-53559
![Page 6: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/6.jpg)
Google as a vanishing machine
some services and options disappear completely
– timeline, wonder wheel, toolbar, ...
– + operator
– real time results, code search
– google buzz, google wave, google directory, ...
others are only hidden
– links for advanced search and for settings hidden under the “cog wheel” (sometimes dependent on browser)
– Scholar, Patents and Groups no longer mentioned in menus
– backlink search no longer in advanced search
– search for "similar" pages & "cache" link are hidden in the "invisible" pop-up page preview
– …
![Page 7: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/7.jpg)
![Page 8: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/8.jpg)
![Page 9: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/9.jpg)
like faceted search in for instance Scopus
![Page 10: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/10.jpg)
refinements and additional functions, like in modern "web scale discovery" systems
but meanwhile this is already an "old" interface!
![Page 11: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/11.jpg)
google.nl [until 2 weeks ago]
tools & facets: from a clear left column to a blurry top menu (for mobile's sake?)
google.com
![Page 12: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/12.jpg)
all options by material type, in old interface
![Page 13: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/13.jpg)
Google tries outsmarting us
Google tries to improve and to broaden your queries
• automatic spelling corrections (Dutch: veilgheid >> veiligheid)
• searches for words with the same word stem (singular/plural, verb conjugation, inflection, …)
• expands acronyms (jfk >> john f kennedy | wwii >> world war II)
• adds synonyms (vaccination >> immunization)
• transforms separate words into a compound term & vice versa (Dutch: veiligheid maatregel >> veiligheidsmaatregel | catfood >> cat food)
• may treat a term as optional and leave it out if it is not differentiating enough
• personalises search, based on previous search behaviour
this happens more often and more elaborately in English than in Dutch; you are never sure what is applied, or when
and if you don't like all of this ........ >> "verbatim"
![Page 14: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/14.jpg)
![Page 15: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/15.jpg)
new option introduced early 2012: verbatim
on google.nl: "woord voor woord"
option recently moved to top menu
![Page 16: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/16.jpg)
![Page 17: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/17.jpg)
![Page 18: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/18.jpg)
standard semantic coding ("embedded metadata") allowed Google to make a recipe search engine
standardisation of property descriptions in the HTML of recipe pages, with "microformats" / "rich snippets markup"
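To make "embedded metadata" concrete, here is a minimal sketch of how a program could read property descriptions out of a recipe page. It assumes the page uses microdata-style `itemprop` attributes; the sample HTML, the property names and the `extract_itemprops` helper are hypothetical illustrations, not Google's actual pipeline.

```python
from html.parser import HTMLParser

# Hypothetical fragment of a recipe page with embedded property descriptions.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Apple pie</h1>
  <span itemprop="cookTime" content="PT1H">1 hour</span>
  <span itemprop="recipeIngredient">apples</span>
  <span itemprop="recipeIngredient">flour</span>
</div>
"""


class ItempropParser(HTMLParser):
    """Collects itemprop values: the content attribute if present, else the text."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # property name waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if attrs.get("content"):
            # Machine-readable value given explicitly; ignore the visible text.
            self.properties.setdefault(name, []).append(attrs["content"])
            self._current = None
        else:
            self._current = name

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.properties.setdefault(self._current, []).append(text)
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def extract_itemprops(html):
    """Return a dict mapping each property name to the list of values found."""
    parser = ItempropParser()
    parser.feed(html)
    return parser.properties


if __name__ == "__main__":
    for prop, values in extract_itemprops(SAMPLE_HTML).items():
        print(prop, "=", values)
```

Because the properties are labelled in a standard way, a search engine can filter on them (for example on cooking time or ingredient) instead of guessing from free text.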
![Page 19: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/19.jpg)
Google's "Knowledge Graph" knows 500 million objects with 3.5 billion properties (but only in English)
![Page 20: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/20.jpg)
dates
??no
![Page 21: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/21.jpg)
publication dates
• limitation while searching google
– before search: only "past day/week/month/year"
– after search: also limitation on custom range "from .. to .."
search tools:
![Page 22: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/22.jpg)
publication dates
• how reliable are google's dates? NOT
• how else to determine the date?
– look at the page text (especially top and bottom, or the blogging date)
– look in the page source (HTML) for metadata
– try entering javascript in the browser URL bar: javascript:alert(document.lastModified)
  but this does NOT work for CMS generated pages
– look for the indexing date in the Google cache
– try to find a recent time-stamped version in the Web Archive (waybackmachine)
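The "look in page source for metadata" step can be sketched in code. The example below scans an HTML string for `<meta>` tags whose name suggests a date. The list of hints, the sample page and the `find_date_metadata` helper are assumptions for illustration; real pages use many different conventions, and a date found this way is only as trustworthy as the site that wrote it.

```python
from html.parser import HTMLParser

# Assumed keywords that often appear in date-related meta tag names.
DATE_HINTS = ("date", "modified", "published", "created")

# Hypothetical page source used for the demonstration.
SAMPLE_PAGE = """
<html><head>
<meta name="description" content="some page">
<meta name="DC.date.modified" content="2012-11-30">
<meta property="article:published_time" content="2012-10-01T09:00:00Z">
</head><body>text</body></html>
"""


class MetaDateParser(HTMLParser):
    """Collects (name, content) pairs of meta tags that look date related."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("name") or attrs.get("property") or attrs.get("http-equiv") or ""
        value = attrs.get("content")
        if value and any(hint in key.lower() for hint in DATE_HINTS):
            self.found.append((key, value))


def find_date_metadata(html):
    """Return date-like meta tags from an HTML string, in document order."""
    parser = MetaDateParser()
    parser.feed(html)
    return parser.found


if __name__ == "__main__":
    for name, value in find_date_metadata(SAMPLE_PAGE):
        print(name, "->", value)
```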
![Page 23: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/23.jpg)
![Page 24: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/24.jpg)
![Page 25: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/25.jpg)
![Page 26: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/26.jpg)
disappeared / old versions of pages
• recently disappeared: try search engine cache
– not just google! also: Bing, Yahoo, Exalead
![Page 27: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/27.jpg)
disappeared / old versions of pages
for older versions: try the web archive (waybackmachine): http://archive.org
• links within same site are mostly working
• if particular page has not been crawled, they show which other pages on that site have been crawled
• some pages/sites have only recently been crawled
• other pages/sites go far back in time
• if domain name has changed, you must use the old name
• some sites don't want to be crawled
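Looking up an archived version can also be done by constructing the address directly. The sketch below builds two URLs: a snapshot address of the form `web.archive.org/web/<timestamp>/<url>` and a query for the archive's availability endpoint. The helper names are made up for this example, and the exact behaviour for partial timestamps (being taken to the closest capture) is my understanding of the service rather than something stated in the slides. No network request is made.

```python
from urllib.parse import urlencode


def wayback_snapshot_url(page_url, timestamp):
    """Address of an archived copy; timestamp is YYYYMMDD (or a longer/shorter prefix)."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


def wayback_availability_query(page_url, timestamp=None):
    """Query URL asking whether a capture exists, optionally near a timestamp."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


if __name__ == "__main__":
    # Remember: if the domain name has changed, use the old name here.
    print(wayback_snapshot_url("http://example.com/", "20121211"))
    print(wayback_availability_query("example.com/page.html", "20121211"))
```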
![Page 28: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/28.jpg)
![Page 29: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/29.jpg)
![Page 30: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/30.jpg)
![Page 31: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/31.jpg)
but sometimes:
![Page 32: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/32.jpg)
intermezzo about trust and integrity
![Page 33: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/33.jpg)
reliability & integrity - general
general website assessment criteria
• professional layout
• indication of author/organisation (“about us”)
• data about organisation: address, telephone, map/driving directions
• indication of targeted audience
• not too many advertisements and pop-ups (although every site has them)
• clear navigation
• internal search option
• speed of web server
• backlinks from well known organisations **
• up to date-ness (with date given)
• language use
• interpret the URL/domain-name (eg: edu, edu.au, edu.sg, edu.ng, edu.lb, ac.uk, gov, gov.uk, gov.hk, gov.au, gov.on.ca, gob.es, gob.mx, gob.ve, gob.ec, ...)
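Interpreting the domain name can be partly automated. The sketch below checks a host name against the suffixes listed in the last criterion; the labels "education" and "government" and the `interpret_host` helper are assumptions for illustration. A matching suffix is only a hint about who runs the site, not proof of reliability.

```python
# Suffixes taken from the examples in the list above; the category labels are assumed.
SUFFIX_HINTS = {
    "edu": "education", "edu.au": "education", "edu.sg": "education",
    "edu.ng": "education", "edu.lb": "education", "ac.uk": "education",
    "gov": "government", "gov.uk": "government", "gov.hk": "government",
    "gov.au": "government", "gov.on.ca": "government",
    "gob.es": "government", "gob.mx": "government",
    "gob.ve": "government", "gob.ec": "government",
}


def interpret_host(hostname):
    """Return a category hint for a host name, or 'unknown' if no suffix matches."""
    host = hostname.lower().rstrip(".")
    # Try the longest suffixes first so 'gov.on.ca' is matched as a whole.
    for suffix in sorted(SUFFIX_HINTS, key=len, reverse=True):
        if host == suffix or host.endswith("." + suffix):
            return SUFFIX_HINTS[suffix]
    return "unknown"


if __name__ == "__main__":
    for host in ("www.mit.edu", "www.ox.ac.uk", "www.gob.mx", "www.uu.nl"):
        print(host, "->", interpret_host(host))
```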
![Page 34: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/34.jpg)
reliability & integrity - organisation
Information about organisation
• Google pagerank (backlinks); use for instance:
– http://www.prchecker.info/
– http://www.checkpagerank.net/
• Alexa rank (web traffic); see for instance:
– http://www.alexa.com/
– http://www.seomastering.com/alexa-rank-checker.php
• domain owner; use for instance:
– http://centralops.net/co/DomainDossier.aspx
– http://whois.domaintools.com/
• search for "backlinks"
![Page 35: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/35.jpg)
![Page 36: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/36.jpg)
![Page 37: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/37.jpg)
reliability & integrity - backlinks
search backlinks to a particular web page or site
• Google: link:http://www.domain.zz/folder/file.html
– very incomplete result
• Yahoo site explorer: died last year
• DuckDuckGo: link:http://www.domain.zz/folder/file.html
– often more than google; no total numbers given
• OpenSiteExplorer: linking pages + linking domains
– very complete; also domain & page authority
– paid subscription if more than 3 queries/day
• Exalead: link:http://www.domain.zz/
– no backlinks to specific page, but to whole site
• Alexa: 100 most important domains backlinking to site
![Page 38: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/38.jpg)
after 9: no more results
the 35 sites mentioned under "reputation"
![Page 39: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/39.jpg)
![Page 40: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/40.jpg)
![Page 41: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/41.jpg)
total list: 30 results
![Page 42: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/42.jpg)
![Page 43: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/43.jpg)
![Page 44: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/44.jpg)
backlinks - variable ratios
| reported # backlinks | google | DDG | OSE |
|---|---|---|---|
| homepage1 | 17 | 9 | 2016 |
| deeppage1 | 4 | 0 | 30 |
| deeppage2 | 9 | 30 | 224 |
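The "variable ratios" can be made explicit with a little arithmetic. The sketch below divides each tool's reported count by Google's count for the same page, using only the numbers from the table above; the `ratio_to_google` helper is a name chosen for this example.

```python
# Reported backlink counts copied from the table above.
BACKLINKS = {
    "homepage1": {"google": 17, "DDG": 9, "OSE": 2016},
    "deeppage1": {"google": 4, "DDG": 0, "OSE": 30},
    "deeppage2": {"google": 9, "DDG": 30, "OSE": 224},
}


def ratio_to_google(page, tool):
    """How many times more (or fewer) backlinks a tool reports than Google."""
    counts = BACKLINKS[page]
    return counts[tool] / counts["google"]


if __name__ == "__main__":
    for page in BACKLINKS:
        ddg = ratio_to_google(page, "DDG")
        ose = ratio_to_google(page, "OSE")
        print(f"{page}: DDG/google = {ddg:.1f}, OSE/google = {ose:.1f}")
```

The ratio between two tools differs strongly from page to page, so a count from one tool cannot simply be scaled to estimate what another would report.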
![Page 45: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/45.jpg)
some more "how to"
• domain search:
– site:edu OR site:edu.* [for all edu (sub)domains]
– site:shell.com OR site:philips.com
• url search: inurl:novelty
• title search: intitle:catalytic
• filetype search:
– filetype:pdf
– filetype:xls OR filetype:xlsx
– filetype:doc OR filetype:docx
– filetype:rss
• exact search: "greenhouses" [or VERBATIM for all words]
more than shown in the advanced search drop-down menu
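The operators above can be combined into one query string. The following is a minimal sketch of a helper that assembles such a query and turns it into a search URL; the function names and parameters are invented for this example, and how a search engine interprets the combination may change over time.

```python
from urllib.parse import urlencode


def or_join(operator, values):
    """Build e.g. 'site:shell.com OR site:philips.com' from a list of values."""
    return " OR ".join(f"{operator}:{value}" for value in values)


def build_query(terms=(), exact=(), intitle=(), inurl=(), sites=(), filetypes=()):
    """Combine plain terms, exact phrases and field operators into one query."""
    parts = list(terms)
    parts += [f'"{phrase}"' for phrase in exact]
    parts += [f"intitle:{word}" for word in intitle]
    parts += [f"inurl:{word}" for word in inurl]
    if sites:
        parts.append(or_join("site", sites))
    if filetypes:
        parts.append(or_join("filetype", filetypes))
    return " ".join(parts)


def search_url(query):
    """URL-encode the query for use in a browser address bar."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_query(
        exact=["greenhouses"],
        intitle=["catalytic"],
        sites=["shell.com", "philips.com"],
        filetypes=["pdf"],
    )
    print(query)
    print(search_url(query))
```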
![Page 46: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/46.jpg)
![Page 47: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/47.jpg)
![Page 48: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/48.jpg)
general search engines besides google
• Bing: microsoft, large
• Yahoo!: content = Bing, large
• Blekko: uses slashtags to search more [domain-]selectively; also many predefined slashtags, e.g. /likes for Facebook
• DuckDuckGo: assures privacy, no personalisation, no filter bubble, rather small; !Bang function offers many extras
• Gigablast: green search engine, rather small, some unique functions
• Exalead: French, many advanced functions, primarily a demo system
• Millionshort: leaves out results from the most popular sites, showing the long tail
• WolframAlpha: knowledge engine, facts, calculations
together, these others have 30% market share in the US; in NL only 3%
• Yandex: in Russia more popular than Google
• Baidu: in China more popular than Google
• Naver, Daum: in South Korea more popular than Google
• Seznam: in Czechia more popular than Google
![Page 49: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/49.jpg)
material type specific search
• blogs: google blogs, icerocket, technorati
• rss: CTRLQ, RSS SearchHub
• video: google video, youtube, youtube edu channel, bing video, blinkx, voxalead-news
• images: google image, yahoo image, bing image, flickr, tineye (ip-check), panoramio (geo-search)
• science: google scholar, microsoft academic, scirus, oaister, scientific commons, science.gov
• news: google news, yahoo news, bing news, cnn, bbc, historische kranten KB (historic Dutch newspapers), historic american newspapers (LOC)
• tweets: twitter search, topsy, tweetzi, postpost, snapbird
• social: socialsearcher, socialmention, samepoint, whostalkin, kurrently
• forums: google groups, omgili, boardtracker
![Page 50: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/50.jpg)
![Page 51: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/51.jpg)
![Page 52: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/52.jpg)
![Page 53: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/53.jpg)
![Page 54: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/54.jpg)
![Page 55: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/55.jpg)
![Page 56: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/56.jpg)
![Page 57: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/57.jpg)
tweets & social search
• Twitter in 140 characters
– often with shortened links
– often with photo or video link
– often with hashtags (#agreeduponkeyword)
• search (often limited to the last 1 - 2 weeks, and .... to those 140 characters)
– twitter-search (also advanced search), tweetzi, …
– topsy (also older messages)
– postpost (your own timeline, i.e. everything you're following)
– snapbird (full tweet history of 1 person, by his/her twitter name)
– twicsy (photos on twitter)
– ...
• overview/review of tools: "All the easiest ways to search old tweets"
![Page 58: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/58.jpg)
![Page 59: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/59.jpg)
![Page 60: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/60.jpg)
![Page 61: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/61.jpg)
![Page 62: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/62.jpg)
![Page 63: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/63.jpg)
![Page 64: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/64.jpg)
![Page 65: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/65.jpg)
![Page 66: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/66.jpg)
tweets & social search
• “Real time / social search engines”
– socialsearcher, socialmention, samepoint, whostalkin, kurrently, … (tweets + blogs + facebook + …)
– Google personal results / Google+ ("search plus your world")
– real-time pictures: skylines
• Forum discussions
– omgili, boardtracker, ...
– Google groups (also old newsgroup discussions)
• for research methods:
– advice from Henk van Ess (Dutch): "de digitale detective" (2012)
– How to: use social media in newsgathering (2012)
– 100+ Social Media Monitoring Tools (2010)
![Page 67: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/67.jpg)
![Page 68: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/68.jpg)
![Page 69: Searching the internet - what patent searchers should know](https://reader033.vdocuments.mx/reader033/viewer/2022052823/55579a57d8b42aa3378b5077/html5/thumbnails/69.jpg)
the end
any questions?