semweb install-fest presentation

Upload: andraz-tori

Post on 16-Apr-2017

Zemanta Getting Personal

Building upon the Zemanta API

Andraz Tori, [email protected]: andraz

Overview

General purpose

Functionality

Examples, demos & use-cases

What does it do?

A Stargate

Computer-processable data = Human-understandable text

Initial design

Input: a chunk of text

Domain agnostic!

Avoid proprietary entity identifiers or taxonomies

Standard response formats: JSON, XML, RDF/XML

Cross-domain: we didn't start with the financial or health domain and then expand our algorithms; we started with cross-domain capabilities from day one.

What gives?

Most used

Most interesting

Most obvious

Tags

Categories

Concepts and entities

Related articles

Related images

Tags

Words, phrases

Interesting tags

Explicitly mentioned

What the text is about as a whole

What concepts were not mentioned, but could be relevant (for SEO)

Tags have no background meaning: they are not tied to any database and are not normalized in any way. They are what you would expect from a human who does not care about standardization or normalization. For example, a text mentioning Apple, Android and Google might get "iPhone" as a tag, and "mobile web" as a tag, even when neither was mentioned anywhere.

Categories

Deep hierarchy (100k categories)

Customized smaller taxonomies

Good for content organization, ad-targeting, etc

Categories example

Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credits roll, it is clear Watchmen is not your typical superhero movie.

An ageing vigilante, The Comedian, is attacked in his high-rise apartment before being hurled 10 storeys to his death... in graphic slow motion. What follows is a two-and-three-quarter hour epic that centres on an outlawed group of deeply flawed former heroes as a Cold War Doomsday clock inches ever closer to midnight and nuclear apocalypse.

First published in 12 parts by DC Comics in 1986, Watchmen was written by the British team of Alan Moore and illustrator Dave Gibbons.

Categories

Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War (0.11)

Top/Arts/Comics/Reviews (0.10)

Top/Society/History/By_Time_Period (0.08)

Top/Arts/Comics (0.08)

Top/Society/History/By_Time_Period/Twentieth_Century (0.08)

Top/Society/History (0.08)

Top/Shopping/Publications/Books (0.08)

Top/Shopping/Publications/Books/Fiction (0.08)
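The category list above pairs a DMOZ-style path with a score, and several entries are just ancestors of a more specific one. A minimal Python sketch of parsing such lines and keeping only the most specific categories follows. It assumes the "path (score)" text layout shown on the slide; the function names are hypothetical and the API's actual JSON field names are not shown in these slides.

```python
import re

# Matches the "path (score)" layout from the slide, e.g. "Top/Arts/Comics (0.08)".
CATEGORY_LINE = re.compile(r"^(?P<path>\S+)\s+\((?P<score>\d+(?:\.\d+)?)\)$")

def parse_categories(lines):
    """Return (path, score) tuples for each well-formed category line."""
    parsed = []
    for line in lines:
        match = CATEGORY_LINE.match(line.strip())
        if match:
            parsed.append((match.group("path"), float(match.group("score"))))
    return parsed

def most_specific(categories, min_score=0.0):
    """Drop categories that are ancestors of another kept category."""
    kept = [(p, s) for p, s in categories if s >= min_score]
    paths = [p for p, _ in kept]
    return [(p, s) for p, s in kept
            if not any(other.startswith(p + "/") for other in paths)]

slide_lines = [
    "Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War (0.11)",
    "Top/Arts/Comics/Reviews (0.10)",
    "Top/Society/History/By_Time_Period (0.08)",
    "Top/Arts/Comics (0.08)",
    "Top/Society/History/By_Time_Period/Twentieth_Century (0.08)",
    "Top/Society/History (0.08)",
    "Top/Shopping/Publications/Books (0.08)",
    "Top/Shopping/Publications/Books/Fiction (0.08)",
]
leaves = most_specific(parse_categories(slide_lines))
for path, score in leaves:
    print(path, score)
```

Keeping only the leaf paths is one possible design choice for content organization; for ad-targeting a shallower level of the hierarchy may be more useful.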

Concepts and entities

Identify relevant concepts and entities

All disambiguated!

At least one URL for each concept, possibly more

Disambiguation is done using background knowledge. For example, we distinguish between London, the city in the UK; London in Ohio or Texas; and Jack London, the writer.

How we disambiguate

Use knowledge from Wikipedia, Freebase, Dmoz, third party databases...

Mine the web

Use knowledge from choices of our users

Use both semantic data and statistics based methods
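To make the idea of disambiguating with background knowledge concrete, here is a toy Python sketch. It is not Zemanta's algorithm: the hand-written keyword sets are a hypothetical stand-in for knowledge mined from sources such as Wikipedia or Freebase, and the scoring is a plain word-overlap count.

```python
import re

# Hypothetical, hand-written background knowledge: each candidate meaning of
# "London" is described by a few context words. A real system would derive
# this from much larger sources.
CANDIDATES = {
    "London (city, UK)": {"england", "uk", "thames", "british", "capital"},
    "London, Ohio": {"ohio", "madison", "county", "columbus"},
    "Jack London (writer)": {"writer", "novel", "author", "klondike", "wrote"},
}

def disambiguate(text, candidates=CANDIDATES):
    """Pick the candidate whose context words overlap most with the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {name: len(words & tokens) for name, words in candidates.items()}
    best = max(scores, key=scores.get)
    # No overlap at all means the text gives no evidence for any candidate.
    return best if scores[best] > 0 else None

print(disambiguate("London wrote the novel White Fang after the Klondike gold rush."))
print(disambiguate("The Thames flows through London, the capital of the UK."))
```

A statistics-based component, as mentioned in the list above, would replace the fixed keyword sets with weights learned from data.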

Linking to...

Traditional

Semantic

How to build upon this

Step 1: We give you exact identifiers

Step 2: Then you look up the information about them (connections, images, ...) in your own or third-party databases

Step 3: ?

Step 4: Profit!

We are big fans of Freebase and the Linking Open Data project

Discovery example

A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.

Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.

The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.

It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike.

You get

A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.

Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.

The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.

It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike.

entities, concepts

Or more precisely...

LaGuardia Airport http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654
LaGuardia Airport http://dbpedia.org/resource/LaGuardia_Airport
Federal Aviation Administration http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000017df0
Federal Aviation Administration http://dbpedia.org/resource/Federal_Aviation_Administration
Hudson River http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000005ebb5
Hudson River http://dbpedia.org/resource/Hudson_River
Airbus A320 family http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000012f918
Airbus A320 family http://dbpedia.org/resource/Airbus_A320_family
Bird strike http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000004744df
Bird strike http://dbpedia.org/resource/Bird_strike
US Airways http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000001b4dc5
US Airways http://dbpedia.org/resource/US_Airways
New York http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000054dd5d
New York http://dbpedia.org/resource/New_York
Charlotte, North Carolina http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000006e148
Charlotte, North Carolina http://dbpedia.org/resource/Charlotte%2C_North_Carolina
Ferry http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000063292
Ferry http://dbpedia.org/resource/Ferry
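Each entity in the list above comes with one URL per knowledge base, which is what makes the "look it up in your own or third-party databases" step possible. As a rough Python sketch, the pairs could be grouped per entity and keyed by source host. The function name and the in-memory list are hypothetical; only the labels and URLs come from the slide.

```python
from collections import defaultdict
from urllib.parse import urlparse

def group_links(pairs):
    """Group (label, url) pairs into {label: {source_host: url}}."""
    grouped = defaultdict(dict)
    for label, url in pairs:
        grouped[label][urlparse(url).netloc] = url
    return dict(grouped)

pairs = [
    ("Hudson River", "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000005ebb5"),
    ("Hudson River", "http://dbpedia.org/resource/Hudson_River"),
    ("Bird strike", "http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000004744df"),
    ("Bird strike", "http://dbpedia.org/resource/Bird_strike"),
]
entities = group_links(pairs)
print(entities["Hudson River"]["dbpedia.org"])
```

With the links grouped this way, a consumer can pick whichever source its own data store already uses as a key.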

You can query relationships

http://test.infoblow.zemanta.com/infoblow/galaxy/
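The slides do not say how the demo above queries relationships. One way to do it with exact identifiers, assuming a SPARQL endpoint that knows the resource URIs (for example one serving DBpedia data), is to ask for every predicate that links two resources in either direction. The Python helper below only builds the query string; its name is hypothetical.

```python
def relationship_query(subject_uri, object_uri):
    """Build a SPARQL query listing predicates that link two resources."""
    for uri in (subject_uri, object_uri):
        # Reject characters that cannot appear inside a SPARQL IRI reference.
        if any(ch in uri for ch in '<>"{}|^`\\ '):
            raise ValueError("not a valid IRI: %r" % uri)
    return (
        "SELECT DISTINCT ?predicate WHERE {\n"
        "  { <%s> ?predicate <%s> } UNION { <%s> ?predicate <%s> }\n"
        "}" % (subject_uri, object_uri, object_uri, subject_uri)
    )

print(relationship_query(
    "http://dbpedia.org/resource/LaGuardia_Airport",
    "http://dbpedia.org/resource/New_York",
))
```

Sending the query to an endpoint is left out on purpose, since the choice of endpoint and client library is outside what the slides cover.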

Or more complex ones...

Concepts and entities
use cases

Quick 'overviews' of topics

Discovery-supporting user interfaces

Automatic deep information delivery (hovers, widgets)

Balloons example

Deliver deep information on exact concepts and entities

Fantastic public graph

Information about concepts/entities

Types: human, building, location...

Relationships with other entities

Hard data: dates, places, amounts

Connected Dream?
September 2008

Connected Dream?
July 2009

Opportunities in leveraging linked data

There are internal and external benefits to linking into a larger pool of exact data

Pulling together custom data becomes orders of magnitude easier

However, we are still missing strong success stories

Related articles

20k blogs and media sites

You can provide your own list of feeds to recommend from

Or use our 'global whitelisted pool'

Related articles use cases

Better experience for the readers

Information discovery (for authors)

Creating interlinked mini-communities (example: bloggers using our tool to discover others in their niche)

Related images

From Wikipedia, Flickr, Daylife, Amazon, Last.fm, Snooth, social networks

We filter out totally unacceptable licenses and keep the rest

Each image has its license spelled out; the developer/author chooses
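Since each image carries its license and the choice is left to the developer or author, a consumer would typically filter the suggestions itself. A small Python sketch follows. The record fields "url" and "license", the example.com URLs and the license names are assumptions for illustration, not the documented response format.

```python
def filter_by_license(images, allowed):
    """Keep only images whose license is in the allowed set (case-insensitive)."""
    allowed_lower = {name.lower() for name in allowed}
    return [img for img in images
            if img.get("license", "").lower() in allowed_lower]

# Hypothetical records standing in for image suggestions.
images = [
    {"url": "http://example.com/a.jpg", "license": "CC BY"},
    {"url": "http://example.com/b.jpg", "license": "CC BY-NC"},
    {"url": "http://example.com/c.jpg"},  # license missing: never accepted
]
usable = filter_by_license(images, allowed={"cc by", "public domain"})
print([img["url"] for img in usable])
```

Treating a missing license as unacceptable is the conservative choice made here.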

Zemanta API

http://developer.zemanta.com

Examples in Java, Javascript, Python, Ruby, PHP, Perl, C#...

JavaScript SDK for quick custom CMS integration

Up to 10,000 requests/day free!

Ease of API use

import json, pprint
import urllib.parse, urllib.request

# Request parameters for the zemanta.suggest method
args = {'format': 'json',
        'method': 'zemanta.suggest',
        'api_key': 'np9cbnby9x8tsc47recwuhqm',
        'return_categories': 'dmoz',
        'return_rdf_links': 1,
        'text': ''' Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credit An ageing vigilante, The Comedian, is attacked ...
'''}

# Form-encode the parameters; passing them as data makes this a POST request
# (Python 3 syntax; note that the endpoint URL must be a quoted string)
args_enc = urllib.parse.urlencode(args).encode('utf-8')
response_raw = urllib.request.urlopen('http://api.zemanta.com/services/rest/0.0/', args_enc).read()

# Decode the JSON response and pretty-print it
response = json.loads(response_raw)
pprint.pprint(response)
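Once the response is parsed, the next step is usually to pull out the pieces you need. The slides do not show what the response looks like, so the Python sketch below runs against a stub with an assumed shape: the keys "keywords", "categories", "name" and "confidence", and all the values in the stub, are assumptions for illustration. The helper name is hypothetical as well.

```python
def summarize(response, min_confidence=0.0):
    """Collect keyword and category names from a parsed JSON response."""
    def names(key):
        return [item["name"] for item in response.get(key, [])
                if item.get("confidence", 0.0) >= min_confidence]
    return {"keywords": names("keywords"), "categories": names("categories")}

# Stub standing in for json.loads(response_raw); its structure is assumed,
# not taken from the slides.
stub = {
    "status": "ok",
    "keywords": [{"name": "Watchmen", "confidence": 0.9},
                 {"name": "Cold War", "confidence": 0.2}],
    "categories": [{"name": "Top/Arts/Comics", "confidence": 0.08}],
}
print(summarize(stub, min_confidence=0.5))
```

Using `.get` with defaults keeps the helper from failing when a section is absent, which matters if some return options are switched off in the request.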

Works for

All kinds of texts (not just financial or journalistic articles)

Tweets!

Wherever you need to go from text documents to something structured to put into your algorithm/data store

Some API users

How is the API used?

Place extraction and disambiguation used by Outside.in

Analysis of tweets used by Klout.net

Custom categorization used by Slideshare

Semantic tagging used by Faviki

CommonTag

Initiative by AdaptiveBlue, DERI (NUI Galway), Faviki, Freebase, Yahoo!, Zemanta, and Zigtag

Exact tagging

RDFa as a transport layer

Freebase & LOD as vocabularies

Full-circle ecosystem from day one (publishers, services, better search, better browsing)

Zigtag, Faviki, AdaptiveBlue, Zemanta, Yahoo, Freebase
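To illustrate "RDFa as a transport layer", here is a Python sketch that renders one exact tag as RDFa markup. The namespace and the terms ctag:tagged, ctag:Tag, ctag:means and ctag:label are written from memory of the Common Tag vocabulary and should be checked against its specification; the function name is hypothetical.

```python
from html import escape

# Namespace assumed for the Common Tag vocabulary; verify against the spec.
CTAG_NS = "http://commontag.org/ns#"

def common_tag_rdfa(label, resource_uri):
    """Render one tag as an RDFa span pointing at an exact concept URI."""
    return (
        '<span xmlns:ctag="%s" rel="ctag:tagged">'
        '<span typeof="ctag:Tag" rel="ctag:means" resource="%s" '
        'property="ctag:label" content="%s"/>'
        '</span>'
    ) % (CTAG_NS, escape(resource_uri, quote=True), escape(label, quote=True))

print(common_tag_rdfa("Hudson River", "http://dbpedia.org/resource/Hudson_River"))
```

The resource URI is what makes the tag exact: two publishers who tag with the same URI mean the same thing, whatever label they display.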

The next web

"... the next web will be like a great party host, introducing us to each other and bringing us together into meaningful conversation."

Marta Strickland, Organic

The future?

Zemify me up, Scotty!

Andraz [email protected]: andraz

Image attributions

http://www.flickr.com/photos/constanzavolare/2475833775/in/photostream/
CC by Constanza Volare
