semantic searchmonkey

Post on 15-May-2015


DESCRIPTION

Semantic Search + SearchMonkey talk given at the Yahoo! HackU event. http://developer.yahoo.com/hacku http://developer.yahoo.com/searchmonkey

TRANSCRIPT

Monkey with the Semantic Web

SearchMonkey

Presentation by:

Paul Tarjan, Chief Technical Monkey

(ptarjan@yahoo-inc.com)

Online at:

http://www.slideshare.net/ptarjan/semantic-searchmonkey

The web was / is fragmented

University event page

Friend’s website

Cool bookmarks

Super secret military site

Funny pictures

So we added search to find stuff

University event page

Friend’s website

Cool bookmarks

Super secret military site

Funny pictures

Google Yahoo

But there are many similar sites

Facebook Events Evite Events Upcoming Events

Youtube Metacafe Vimeo

Digg Reddit Technorati

Let’s treat these as “views” onto “objects”

Wouldn’t it be cool if you could do:

• object:video creator:"Paul Tarjan" length<=60s
• object:video creator:http://paulisageek.com/ length<=60s
• object:game name:"Desktop Tower Defense" version:1.5 publishdate:"May 2 2005"
• object:video author:"The Escapist" game:"Left 4 Dead"

It gets even cooler

Aggregation:

• object:review type:camera make:canon model:D40
• object:event date:"May 16, 2008" type:party price<$5
• object:photo person:"Paul Tarjan"
• object:photo person:http://paulisageek.com
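The slides only sketch this query syntax, so the following Python is a hypothetical illustration: a regex splits a query into field/operator/value constraints (`:` for equality; `<`, `<=`, `>`, `>=` for numbers) and filters a list of objects represented as plain dicts. The field names, the unit stripping in `to_number`, and the sample data are assumptions, not part of any Yahoo! service.

```python
import operator
import re

# Hypothetical: this field:value syntax is the slides' wish list, not a real
# Yahoo! search API. Tokens look like key:value, key:"quoted value", key<=N.
TOKEN = re.compile(r'(\w+)(<=|>=|<|>|:)("([^"]*)"|\S+)')
NUMERIC_OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}


def parse_query(query):
    """Split a query string into (field, operator, value) constraints."""
    constraints = []
    for m in TOKEN.finditer(query):
        # group(4) is the text inside quotes; group(3) is the raw unquoted token.
        value = m.group(4) if m.group(4) is not None else m.group(3)
        constraints.append((m.group(1), m.group(2), value))
    return constraints


def to_number(text):
    """Strip units and currency symbols: '60s' -> 60.0, '$5' -> 5.0."""
    return float(re.sub(r"[^0-9.]", "", str(text)))


def matches(obj, constraints):
    """True if the object (a dict of fields) satisfies every constraint."""
    for field, op, value in constraints:
        if field not in obj:
            return False
        if op == ":":
            # Case-insensitive equality, so make:canon matches "Canon".
            if str(obj[field]).lower() != value.lower():
                return False
        elif not NUMERIC_OPS[op](to_number(obj[field]), to_number(value)):
            return False
    return True


def search(objects, query):
    """Return the objects that satisfy every constraint in the query."""
    constraints = parse_query(query)
    return [obj for obj in objects if matches(obj, constraints)]
```

Treating each result as a dict of fields is what makes `creator:"Paul Tarjan"` answerable at all; a page seen as a bag of words has no `creator` field to compare against.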

The Semantic What?

• Web pages are views of data for people to read

• Search engines are a hack
• They treat pages as a bucket of words
• Let’s turn the web into a database
• APIs are good, but there is no “web” of APIs
• If you figure out a good way of doing that, let me know

Ok, I want to do it. Now what?

Recommendation: µF

• If there is a microformat for your data, use it
  – hCard
  – hReview
  – hResume
  – hCalendar
  – rel-tag
  – rel-license
  – XFN
  – hAtom
  – geo

µF in a nutshell

• Change your @class to something that is known:

  <div>
    <span class="name">Paul Tarjan</span>
    <span class="email">spam@paulisageek.com</span>
  </div>

  BECOMES

  <div class="vcard">
    <span class="fn">Paul Tarjan</span>
    <span class="email">spam@paulisageek.com</span>
  </div>
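To show why the known class names matter, here is a minimal Python sketch, using only the standard library's `html.parser`, that reads `fn` and `email` out of an element marked `class="vcard"`. `HCardParser` is a hypothetical name and handles only these two properties; a real microformats parser covers the full hCard vocabulary and its nesting rules.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Collect 'fn' and 'email' values from elements inside class="vcard".

    Simplified sketch: no nested vcards, no other hCard properties, and void
    tags such as <br> (which never get an end tag) would unbalance the stack.
    """

    PROPERTIES = {"fn", "email"}

    def __init__(self):
        super().__init__()
        self.cards = []  # one dict per element with class="vcard"
        self._open = []  # for each open element: its hCard property name, or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self.cards.append({})
        prop = next((c for c in classes if c in self.PROPERTIES), None)
        self._open.append(prop)

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        if not self.cards:
            return  # text outside any vcard is ignored
        # Attach text to the innermost open element that carries a known property.
        prop = next((p for p in reversed(self._open) if p), None)
        if prop:
            card = self.cards[-1]
            card[prop] = card.get(prop, "") + data
```

Fed the first snippet (`class="name"`, no `vcard`), this parser finds nothing; fed the second, it returns `fn` and `email`. That is the point of the slide: the same data only becomes machine-readable once it uses the agreed class names.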

Recommendation: RDFa

• If you have data that doesn’t really fit in a µF
• Examples:
  – API documentation (YUI, Javadoc, etc.)
  – Media (audio, videos, games, presentations)
  – Job postings

RDFa in a nutshell

• Make a namespace
• Use @property, @rel and @resource
• For DATA: @property makes the node contents into the value
• For URLs: @rel makes the @resource into the value

Normal HTML

  <html>
  …
  <div class="private">
    private static String <strong>_createCookieHash</strong>(hash)
  …

RDFa: example

  <html xmlns:yui="http://yuilibrary.com/rdf/1.0/yui.rdf#">
  …
  <div class="private" rel="yui:method" resource="#method__createCookieHash">
    private static String <strong property="yui:name">_createCookieHash</strong>(hash)
  …
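The two rules from the "RDFa in a nutshell" slide can be sketched as a tiny Python extractor built on `html.parser`. `MiniRDFa` is a hypothetical name and this is not a conforming RDFa processor: it ignores `@about`, does not expand prefixes such as `yui:`, and assumes well-nested markup. Under those assumptions, the `@rel`/`@resource` pair yields a URL-valued triple, and the resource then becomes the subject of the nested `@property`.

```python
from html.parser import HTMLParser


class MiniRDFa(HTMLParser):
    """A deliberately tiny subset of RDFa, for illustration only.

    Rules implemented, following the slide:
      * @rel + @resource -> (subject, rel, resource); the resource then becomes
        the subject for the element's children.
      * @property        -> (subject, property, text content of the element).
    Prefixes like 'yui:' are not expanded and @about is ignored.
    """

    def __init__(self, base="self"):
        super().__init__()
        self.triples = []
        self._subjects = [base]  # subject in effect inside each open element
        self._literals = []      # open @property elements: [subject, predicate, text, depth]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = self._subjects[-1]
        child_subject = subject
        if a.get("rel") and a.get("resource"):
            self.triples.append((subject, a["rel"], a["resource"]))
            child_subject = a["resource"]
        if a.get("property"):
            self._literals.append([subject, a["property"], "", len(self._subjects)])
        self._subjects.append(child_subject)

    def handle_endtag(self, tag):
        if len(self._subjects) > 1:
            self._subjects.pop()
        # Close the @property literal that was opened at this nesting depth, if any.
        if self._literals and self._literals[-1][3] == len(self._subjects):
            subject, predicate, text, _ = self._literals.pop()
            self.triples.append((subject, predicate, " ".join(text.split())))

    def handle_data(self, data):
        # Text counts toward every @property element that is still open.
        for literal in self._literals:
            literal[2] += data
```

On a well-nested version of the snippet above, `triples` comes out as `("self", "yui:method", "#method__createCookieHash")` followed by `("#method__createCookieHash", "yui:name", "_createCookieHash")`.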

That’s it!

• Automatically picked up by semantic parsers / crawlers
• Can build a SearchMonkey app on it
• Makes a mashup far easier than screen scraping
• Can get the data from Yahoo! BOSS

an open platform for using structured data to build more useful and relevant search results

[Screenshots: a search result before and after SearchMonkey]

What is SearchMonkey?

Enhanced Result: Zagat

[Screenshot callouts: Key/Value Pairs or Abstract, Links, Image]

Infobar: Wikipedia Preview

[Screenshot callout: Summary Blob]

Part of the puzzle

SearchMonkey

Semantic markup on web pages

Semantic vocabularies

Vocabularies

• Need to speak the same language
• I like to see girls of that... caliber.
• English, French, Spanish, Esperanto?
• URLs to the rescue
  – Dublin Core (http://purl.org/dc/elements/1.1/)
  – Friend of a Friend (http://xmlns.com/foaf/0.1/)
  – XHTML Friends Network (http://gmpg.org/xfn/11/)
  – … (many more)

Syntax

• Nouns, verbs, and adjectives, oh my!
• All phrases become lots of triples
  – (Subject, Verb / Adj. / Prep. / etc., Object)
• Key / value pairs ++
  – Everything is a URL or a string
  – The subject doesn’t have to be the document

Syntax 2

• Key / value pairs
  – Title = Awesome SearchMonkey Presentation
  – Homepage = http://search.yahoo.com/searchmonkey
• Triples
  – (self, http://purl.org/dc#title, “Awesome SearchMonkey Presentation”)
  – (self, http://vcard#url, http://search.yahoo.com/searchmonkey)
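Mechanically, going from key/value pairs to triples is a matter of attaching a subject to each pair. The Python sketch below uses the predicate URLs exactly as they appear on the slide (treat them as illustrative, not checked against the Dublin Core or vCard specs) and the placeholder subject `"self"` for the page; `to_triples` and `describe` are hypothetical helper names.

```python
# Predicate URLs copied from the slide; illustrative only.
PREDICATES = {
    "Title": "http://purl.org/dc#title",
    "Homepage": "http://vcard#url",
}


def to_triples(subject, pairs, predicates=PREDICATES):
    """Rewrite key/value pairs about one subject as (subject, predicate, object) triples."""
    return [(subject, predicates[key], value) for key, value in pairs.items()]


def describe(triples, subject):
    """Back to (predicate, object) pairs, for any subject, not just the page itself."""
    return [(p, o) for s, p, o in triples if s == subject]
```

The "++" shows up in `describe`: it works for any subject, and it returns a list, so the same predicate may appear more than once, which a single flat dictionary of key/value pairs cannot express.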

Decompose to triples

• My friend “Bob” is an idiot.
  – (self, http://xmlns.com/foaf/0.1/knows, genid:Ui__152310312_366)
  – (genid:Ui__152310312_366, http://www.w3.org/2001/vcard-rdf/3.0#fn, “Bob”)
  – (genid:Ui__152310312_366, http://example.org/ptarjan/isInstanceOf, http://example.org/ptarjan/idiot)

• Unnamed nodes are O.K.
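The same three triples can be held as plain Python tuples, with a freshly minted ID standing in for the unnamed friend. The `_:bN` naming, the helper names `bnode`, `objects` and `friend_names`, and `"self"` as the page's subject are assumptions for illustration; the point is that the blank node never needs a real URL for the triples to be joined on it.

```python
import itertools

_ids = itertools.count(1)


def bnode():
    """Mint a fresh blank-node ID (any unique string works, like genid:... on the slide)."""
    return f"_:b{next(_ids)}"


FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
VCARD_FN = "http://www.w3.org/2001/vcard-rdf/3.0#fn"
IS_INSTANCE_OF = "http://example.org/ptarjan/isInstanceOf"
IDIOT = "http://example.org/ptarjan/idiot"

# "My friend Bob is an idiot." Bob has no URL of his own, only a blank-node ID.
bob = bnode()
graph = [
    ("self", FOAF_KNOWS, bob),
    (bob, VCARD_FN, "Bob"),
    (bob, IS_INSTANCE_OF, IDIOT),
]


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def friend_names(graph, who="self"):
    """Follow foaf:knows from `who`, then read each friend's vcard:fn."""
    return [name
            for friend in objects(graph, who, FOAF_KNOWS)
            for name in objects(graph, friend, VCARD_FN)]
```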

Writing URLs takes a lot of work!

• xmlns:foaf=http://xmlns.com/foaf/0.1/
• xmlns:vcard=http://www.w3.org/2001/vcard-rdf/3.0#
• xmlns:junk=http://example.org/ptarjan/
• My friend “Bob” is an idiot.
  – (self, foaf:knows, genid:Ui__152310312_366)
  – (genid:Ui__152310312_366, vcard:fn, “Bob”)
  – (genid:Ui__152310312_366, junk:isInstanceOf, junk:idiot)

•  Unnamed nodes are O.K.
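Prefix expansion is plain string substitution. A minimal Python sketch, assuming the three prefix declarations above and a hypothetical `expand` helper, turns the short triples back into the long ones from the previous slide. A real RDF toolkit also tracks which values are literals; this sketch simply leaves any term whose prefix is undeclared (such as `genid:`, `http:` or `"Bob"`) unchanged.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "junk": "http://example.org/ptarjan/",
}


def expand(term, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URL if the prefix is declared; else return term as-is."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


short = [
    ("self", "foaf:knows", "genid:Ui__152310312_366"),
    ("genid:Ui__152310312_366", "vcard:fn", "Bob"),
    ("genid:Ui__152310312_366", "junk:isInstanceOf", "junk:idiot"),
]
full = [tuple(expand(term) for term in triple) for triple in short]
```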

RDFa

  <html xmlns:foaf="http://xmlns.com/foaf/0.1/"
        xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
        xmlns:junk="http://example.org/ptarjan/">
    <div rel="foaf:knows">
      <span property="vcard:fn">Bob</span>
      <span rel="junk:isInstanceOf" resource="junk:idiot" />
    </div>
  </html>

•  </SemanticWeb>

• Questions?

Innards of SearchMonkey

• You build a web service inside our framework
• When a search page renders:
  – We check which SearchMonkey (SM) apps are enabled
  – We call them
    • 50 ms for in-page
    • Much longer for AJAX
  – They return data in our template
  – We render them (and cache)

Prototyping with XSLT

• What if I don’t have structured data?
  – I don’t own the site
  – I do own the site, but I want to prototype first
• Build an XSLT custom data service first
  – Write some XSLT to extract the data and transform it into DataRSS
  – Mostly about finding the right XPath (use Firebug or XPather)
  – Quick to implement, but brittle
  – Can’t do a good Enhanced Result
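The XPath-hunting part of this workflow can be tried out before writing any XSLT. The following Python sketch uses the limited XPath subset in the standard library's `xml.etree.ElementTree` on a made-up, well-formed page fragment; it stands in for the extraction step only and does not produce DataRSS. The page structure, class names and `extract` helper are assumptions.

```python
import xml.etree.ElementTree as ET

# Made-up page fragment. Real pages usually need an HTML-tolerant parser first,
# since ElementTree only accepts well-formed XML.
PAGE = """<html><body>
  <div class="listing">
    <h2 class="name">Example Cafe</h2>
    <span class="rating">4.5</span>
  </div>
  <div class="listing">
    <h2 class="name">Sample Diner</h2>
    <span class="rating">3.9</span>
  </div>
</body></html>"""


def extract(page):
    """Pull (name, rating) records out of the page using XPath-style paths."""
    root = ET.fromstring(page)
    records = []
    for listing in root.findall(".//div[@class='listing']"):
        name = listing.find("h2[@class='name']")
        rating = listing.find("span[@class='rating']")
        if name is None or rating is None:
            continue  # brittle: a markup change silently drops the record
        records.append({"name": name.text.strip(), "rating": float(rating.text)})
    return records
```

Changing `class="listing"` to `class="listing featured"` makes `extract` return an empty list, because the path matches the attribute value exactly; that is the brittleness the slide warns about.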

Do it for real

• Demo

Examples

• Rubik’s Cube
• VTA Bus
• API Monkey
• BugMeNot
• RetailMeNot
• Amazon

Questions?
