Reluctant-Entrepreneur.com

Mining the “Deep Web”

Mary Ellen Bates
Reluctant-Entrepreneur.com
August 16, 2017

What’s the “Dark Web”?

Accessible only through anonymous networks

Black-market content (drugs, hacking software, porn)
Free-speech forums
Drop-sites for leaks

Search engines are blocked by…

The “Deep Web” is…

Info that search engines can’t easily find, get into or read

Databases
Images, multimedia, statistics
Books, articles
Facebook and other social media*

*tomorrow at 10:15

Deep Web strategies

Look for the next lead, not The Answer

Treat it as a treasure hunt
Watch for clues, lists of resources

Build your own “library” of sources

Horticultural resources

plants.usda.gov

garden.org

catalog.extension.org

Business resources

SCORE.org

SBA.gov

inc.com/grow

Pixabay.com (free images)

Finding deep web content

Use search engines for leads:
Keywords (database OR dataset OR archive)
Keywords (portal OR resources OR “online tool”)
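
The two lead-finding query patterns on this slide can be sketched in a few lines of Python. This is only an illustration: the helper names `or_group` and `lead_queries` are hypothetical, and "horticulture" stands in for whatever keywords you are researching.

```python
def or_group(terms):
    """Join terms into a parenthesized OR group, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def lead_queries(keywords):
    """Build the two query patterns from the slide for the given keywords."""
    containers = ["database", "dataset", "archive"]
    gateways = ["portal", "resources", "online tool"]
    return [
        f"{keywords} {or_group(containers)}",
        f"{keywords} {or_group(gateways)}",
    ]


for query in lead_queries("horticulture"):
    print(query)
```

Paste either printed line into a search engine as-is; the OR must stay capitalized to be read as an operator.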

Finding deep web content

Start with one known source (ass’n, agency, non-profit, library, etc.)

Then find their links to other resources

Look for mentions OF that site
e.g. “consumerhort.org”

Find “similar” sites

SimilarSites.com
Based on content, link analysis, user behavior, etc.
Use to find other good sites

We librarians

Librarians build “libguides”
Road map for better research on a topic
Loaded with deep web links

inurl:libguides ("garden center" OR gardening OR horticulture)
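
If you want to save or share a libguides search like the one above, it can be turned into a link. The sketch below assumes Google's usual `https://www.google.com/search?q=` address format; the function names `libguides_query` and `google_search_url` are hypothetical helpers, not part of any library.

```python
from urllib.parse import quote_plus


def libguides_query(terms):
    """Build an inurl:libguides query, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "inurl:libguides (" + " OR ".join(quoted) + ")"


def google_search_url(query):
    """Percent-encode the query so it is safe to use in a link."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = libguides_query(["garden center", "gardening", "horticulture"])
print(query)
print(google_search_url(query))
```

Swap in your own topic terms to build a bookmarkable libguides search for another subject.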

Libraries have deep web!

Stick around for “Hidden Databases: Accessing Priceless Market Research… Without Spending a Dime”

Insights from other shows

ID relevant conferences
("garden show" OR "horticultural show") trends

Be sure to limit the search to this year
Then…
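
One possible way to limit the show search to a single year is to add date operators to the query itself. The slide does not say how to apply the limit, so this is an assumption: the sketch uses Google's `after:` and `before:` operators, and the date filter under the search tools menu would do the same job. The function name `show_trends_query` is hypothetical.

```python
from datetime import date


def show_trends_query(year, show_phrases=("garden show", "horticultural show")):
    """Build the conference-trends query, restricted to one calendar year.

    Assumes the search engine understands after:/before: date operators.
    """
    shows = " OR ".join(f'"{phrase}"' for phrase in show_phrases)
    return f"({shows}) trends after:{year}-01-01 before:{year}-12-31"


# "This year" is whatever year the search is run in.
print(show_trends_query(date.today().year))
```

Pass different `show_phrases` to reuse the same pattern for another industry's conferences.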

Insights from other shows

Find the web site for that show
Scan workshops, keynotes
Look up those speakers, see their web pages

Insights from expen$ive reports

ID the title of a useful report

Google mentions of that report and the word trends

"according to the 2017 National Gardening Survey" trends

Has a page disappeared?

Try the Wayback Machine (archive.org)

Copies of the page over time
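
Wayback Machine copies can also be reached directly by address. The sketch below builds three kinds of link: the calendar of all captures, the capture closest to a given date, and the Internet Archive's availability lookup. The function names are hypothetical helpers, and plants.usda.gov is used only as a sample page; the code builds the addresses without contacting archive.org.

```python
from urllib.parse import urlencode


def wayback_calendar_url(page_url):
    """Link to the calendar view listing every capture of the page."""
    return f"https://web.archive.org/web/*/{page_url}"


def wayback_snapshot_url(page_url, yyyymmdd):
    """Link that resolves to the capture closest to the given date."""
    return f"https://web.archive.org/web/{yyyymmdd}/{page_url}"


def wayback_availability_url(page_url, yyyymmdd=None):
    """Link to the availability lookup, which answers in JSON."""
    params = {"url": page_url}
    if yyyymmdd:
        params["timestamp"] = yyyymmdd
    return "https://archive.org/wayback/available?" + urlencode(params)


print(wayback_calendar_url("plants.usda.gov"))
print(wayback_snapshot_url("plants.usda.gov", "20170816"))
print(wayback_availability_url("plants.usda.gov", "20170816"))
```

Open the first two links in a browser; the third is meant for scripts that need to check whether any copy exists.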

Has a page disappeared?

Try the Google cached copy
Copy the URL into Google’s search box
Click the down arrow next to the link

The Deep Web = 2nd page of Google search results

Go deeper!

Change your settings to 100 results

“Hidden” search results

Use Millionshort.com to see other results

“Long-tail” search engine
Eliminates the most popular sites
Find obscure and less-commercial sites

Slide deck is at

Reluctant-Entrepreneur.com/extras
(or just text me)

Mary Ellen Bates
+1 303 772 [email protected]: @mebs
