Google Hacking
University of Sunderland
CSEM02
Harry R Erwin, PhD
Peter Dunne, PhD
Basics
• Web Search
• Newsgroups
• Images
• Preferences
• Language Tools
Google Queries
• Non-case sensitive
• * in a query stands for a word
• ‘.’ in a query is a single character wildcard
• Automatic stemming
• Ten-word limit
• AND (+) is assumed, OR (|) and NOT (-) must be entered
• "" for a phrase
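As a minimal sketch, the rules above can be composed into a single query string. The helper names build_query and word_count are hypothetical, and counting whitespace-separated words is only an assumption about how the ten-word limit is applied.

```python
def build_query(required=(), phrase=None, any_of=(), excluded=()):
    """Compose a Google-style query string from its parts."""
    parts = list(required)                         # AND is implied between terms
    if phrase:
        parts.append('"' + phrase + '"')           # quotes mark an exact phrase
    if any_of:
        parts.append("|".join(any_of))             # | is OR
    parts.extend("-" + term for term in excluded)  # - is NOT
    return " ".join(parts)


def word_count(query):
    """Rough word count, for checking against the ten-word limit."""
    return len(query.replace("|", " ").replace('"', " ").split())


query = build_query(required=["merlot"], phrase="seventh moon",
                    any_of=["review", "rating"], excluded=["chardonnay"])
print(query)                    # merlot "seventh moon" review|rating -chardonnay
print(word_count(query) <= 10)  # True
```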
More Queries
• You can control the language of the pages and the language of the reports
• You can restrict the search to specific countries
Controlling Searches
• Intitle, allintitle
• Inurl, allinurl
• Filetype
• Allintext
• Site
• Link
• Inanchor
• Daterange
• Cache
• Info
• Related
• Phonebook
• Rphonebook
• Bphonebook
• Author
• Group
• Msgid
• Insubject
• Stocks
• Define
Controlling Searches (II)
• These operators can be used to restrict searches.
• To restrict the search to the university: site:sunderland.ac.uk
• Or to search for seventh moon merlot in the UK: "seventh moon" merlot site:uk
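The two restrictions above can be sketched in code. This is an illustration only: restricted_query and search_url are hypothetical helpers, and the URL shape (the q parameter on www.google.com/search) is taken from the lynx example later in these slides.

```python
from urllib.parse import urlencode


def restricted_query(terms, **operators):
    """Join free-text terms with operator:value restrictions such as site or filetype."""
    restrictions = [f"{name}:{value}" for name, value in operators.items()]
    return " ".join(([terms] if terms else []) + restrictions)


def search_url(query):
    """URL-encode the query into a search URL (q carries the query text)."""
    return "http://www.google.com/search?" + urlencode({"q": query})


print(restricted_query("", site="sunderland.ac.uk"))
# site:sunderland.ac.uk
print(restricted_query('"seventh moon" merlot', site="uk"))
# "seventh moon" merlot site:uk
print(search_url(restricted_query('"seventh moon" merlot', site="uk")))
# http://www.google.com/search?q=%22seventh+moon%22+merlot+site%3Auk
```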
Typical Filetypes
• Pdf
• Ps
• Xls
• Ppt
• Doc
• Rtf
• Txt
Why Google
• You access Google, not the original website.
• Most crackers access any site, even Google, via a proxy server.
• Why? If you access the cached web page and it contains images, your browser fetches those images from the original site, which can then log your address.
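To see which hosts a cached page would still contact, one can list where its images point. This is a sketch using only the standard library; the class name ImageHosts and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageHosts(HTMLParser):
    """Collect the hostnames that a page's <img> tags would be fetched from."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src") or ""
            host = urlparse(src).netloc   # empty for relative URLs
            if host:
                self.hosts.add(host)


# Hypothetical cached page whose image still lives on the original server.
cached_page = '<p>Cached copy</p><img src="http://www.example.com/logo.png">'
finder = ImageHosts()
finder.feed(cached_page)
print(finder.hosts)  # {'www.example.com'}
```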
Directory Listings
• Search for intitle:index.of
• Or intitle:index.of "parent directory"
• Or intitle:index.of name size
• Or intitle:index.of filename
• Or intitle:index.of inurl:admin
• This can then lead to a directory traversal
• Look for filetype:bak, too, particularly if you want to expose SQL data generated on the fly
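The same markers these queries match can be used to check your own pages. The following is a rough heuristic sketch, assuming listings titled "Index of ..." with a "Parent Directory" link, as the queries above imply; the function name and sample HTML are hypothetical.

```python
import re


def looks_like_directory_listing(html):
    """Heuristic: title starts with 'Index of' and the page mentions a parent directory."""
    title = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    has_index_title = bool(title) and title.group(1).strip().lower().startswith("index of")
    has_parent_link = "parent directory" in html.lower()
    return has_index_title and has_parent_link


sample = ("<html><head><title>Index of /admin</title></head>"
          "<body><a href='/'>Parent Directory</a></body></html>")
print(looks_like_directory_listing(sample))                   # True
print(looks_like_directory_listing("<title>Welcome</title>"))  # False
```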
Commonly Available Sensitive Information
• HR files
• Helpdesk files
• Job listings
• Company information
• Employee names
• Personal websites and blogs
• E-mail and e-mail addresses
Network Mapping
• Site:domain name
• Site crawling, particularly by indicating negative searches for known domains
• Lynx is convenient if you want lots of hits:
    lynx -dump "http://www.google.com/search?q=site:name+-knownsite&num=100" > test.html
• Or use a Perl script with the Google API
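The negative-search loop can be sketched offline as follows. Excluding known hosts with -site: is an assumption about what the -knownsite placeholder stands for; mapping_url, hosts_in, and the sample text are hypothetical, and no request is actually sent.

```python
import re
from urllib.parse import urlencode


def mapping_url(domain, known_hosts=(), num=100):
    """Search URL for pages on `domain`, excluding hosts already found."""
    query = " ".join([f"site:{domain}"] + [f"-site:{h}" for h in known_hosts])
    return "http://www.google.com/search?" + urlencode({"q": query, "num": num})


def hosts_in(text, domain):
    """Return the sorted, de-duplicated hostnames under `domain` mentioned in `text`."""
    pattern = re.compile(r"\b((?:[a-z0-9-]+\.)*" + re.escape(domain) + r")\b",
                         re.IGNORECASE)
    return sorted({match.lower() for match in pattern.findall(text)})


# Hypothetical text standing in for a saved results dump such as test.html.
dump = ("1. http://www.example.com/ 2. http://mail.example.com/login "
        "3. http://WWW.example.com/a")
known = hosts_in(dump, "example.com")
print(known)  # ['mail.example.com', 'www.example.com']
print(mapping_url("example.com", known))
```

Each pass adds the newly found hosts to the exclusion list, so the next query surfaces hosts that have not been seen yet.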
Link Mapping
• Explore the target site to see what it links to. The owners of the linked sites may be trusted and yet have weak security.
• The link operator supports this kind of search.
• Also check the newsgroups for questions from people at the organization.
Web-Enabled Network Devices
• The Google webspider often encounters web-enabled devices. These allow an administrator to query their status or manage their configuration using a web browser.
• You may also be able to access network statistics this way.
Searches to Worry About
• Site:
• Intitle:index.of
• Error|warning
• Login|logon
• Username|userid|employee.ID|"your username is"
• Password|passcode|"your password is"
• Admin|administrator
• -ext:html -ext:htm -ext:shtml -ext:asp -ext:php
• Inurl:temp|inurl:tmp|inurl:backup|inurl:bak
• Intranet|help.desk
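For a self-audit, each of these patterns can be paired with a site: restriction for your own domain. A minimal sketch; the list mirrors the slide, while audit_queries and example.com are hypothetical.

```python
WORRYING_PATTERNS = [
    "intitle:index.of",
    "error|warning",
    "login|logon",
    'username|userid|employee.ID|"your username is"',
    'password|passcode|"your password is"',
    "admin|administrator",
    "-ext:html -ext:htm -ext:shtml -ext:asp -ext:php",
    "inurl:temp|inurl:tmp|inurl:backup|inurl:bak",
    "intranet|help.desk",
]


def audit_queries(domain):
    """Pair each pattern with a site: restriction for the given domain."""
    return [f"site:{domain} {pattern}" for pattern in WORRYING_PATTERNS]


for query in audit_queries("example.com"):
    print(query)
```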
Protecting Yourselves
• Solid security policy
• Public web servers are Public!
• Disable directory listings
• Block crawlers with robots.txt
• <META NAME="ROBOTS" CONTENT="NOARCHIVE">
• NOSNIPPET is similar.
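A robots.txt rule set can be checked before deployment with Python's standard urllib.robotparser. In this sketch the file contents and paths are hypothetical examples, not a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler would skip the first URL and may fetch the second.
print(parser.can_fetch("Googlebot", "http://www.example.com/admin/users.html"))  # False
print(parser.can_fetch("Googlebot", "http://www.example.com/index.html"))        # True
```

Note that robots.txt only asks well-behaved crawlers to stay away, and the file itself is publicly readable.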
More Protection
• Passwords
• Delete anything you don’t need from the standard webserver configuration
• Keep your system patched.
• Hack yourself
• If sensitive data gets into Google, use the URL removal tools to delete it.