Semantic Wiki, Great Candidate for Knowledge Acquisition

From Text and Data to Knowledge: Via Semantic Wikis. The Social Semantic Web in the Small. Jesse Wang

Upload: jesse-wang

Post on 29-Oct-2014

Category: Technology


DESCRIPTION

This is a high-level pitch deck for knowledge acquisition (KA), complementing the textual part. We have already decided that we need low-level, textual-entailment-based KA, while the high-level part, which involves more human computation, was partially ignored at the time of the presentation. This deck introduces the social semantic web and shows how it can help with our KA tasks.

TRANSCRIPT

Page 1: Semantic Wiki, Great Candidate for Knowledge Acquisition

From Text and Data to Knowledge: Via Semantic Wikis

The Social Semantic Web in the Small

Jesse Wang

Page 2: Semantic Wiki, Great Candidate for Knowledge Acquisition

The Bottleneck of AI is Knowledge Acquisition

Human Intelligence

Computer Intelligence

Page 3: Semantic Wiki, Great Candidate for Knowledge Acquisition

COMPUTER INTELLIGENCE IS IN THE CONNECTIONS

Page 4: Semantic Wiki, Great Candidate for Knowledge Acquisition

Connecting both Information and People

[Figure: timeline chart of Web eras. Axes: connections between people; connections between information.]

Time periods: 1980 - 1990 (PC Era), 1990 - 2000, 2000 - 2010, 2010 - 2020, 2020 - 2030

Eras: PC Era, Web 1.0, Web 2.0, Web 3.0, Web 4.0

Stages: The PC, The Internet, The Web, Social Web, Semantic Web, Intelligent Web, Web OS

Technologies shown: PC’s, Windows, MacOS, File Systems, File Servers, Databases, SQL, Email, FTP, IRC, USENET, BBS, Gopher, SGML, HTML, HTTP, Websites, Directory Portals, Keyword Search, Groupware, Java, Javascript, Flash, XML, SOAP, P2P, RSS, ATOM, AJAX, Weblogs, Wikis, Widgets, Mashups, SaaS, Office 2.0, Social Networking, Social Media Sharing, Lightweight Collaboration, MMO’s, VR, OpenID, RDF, OWL, SPARQL, SWRL, Semantic Search, Semantic Databases, Distributed Search, Intelligent personal agents

Page 5: Semantic Wiki, Great Candidate for Knowledge Acquisition

At Multiple Levels of Understanding

Signal entity (Words)

Signal form (Syntax)

Signal semantics (Concepts)

Categories (taxonomy)

Statements

Models

Decision-making

Page 6: Semantic Wiki, Great Candidate for Knowledge Acquisition

HOW DO WE CAPTURE ALL? At least, the semantics?

Page 7: Semantic Wiki, Great Candidate for Knowledge Acquisition

Two Paths for Semantics (>>KB Construction)

“Bottom-Up”
– Add semantic metadata to pages and databases all over the Web
  • Alternatively, train models to extract the above info (machine-assisted)
– Every website becomes semantic
  • except for those not tagged or trained, or those with errors

“Top-Down”
– Experts build models and rules for semantics
– Create services that provide this as an overlay to the non-semantic Web
– Every website becomes semantic
  • except for those not covered

-- Alex Iskold

Page 8: Semantic Wiki, Great Candidate for Knowledge Acquisition

Five Approaches to Semantics

Tagging

Statistics

Linguistics

Semantic Web

Artificial Intelligence

Page 9: Semantic Wiki, Great Candidate for Knowledge Acquisition

The Tagging Approach

Pros
– Easy for users to add and read tags
– Tags are just strings
– No algorithms or ontologies to deal with
– No technology to learn

Cons
– Easy for users to add and read tags
– Tags are just strings
– No algorithms or ontologies to deal with
– No technology to learn

Technorati

Del.icio.us

Flickr

Wikipedia

YouTube

Page 10: Semantic Wiki, Great Candidate for Knowledge Acquisition

The Statistical Approach

Pros:
– Pure mathematical algorithms
– Massively scalable with good training data
– Language independent

Cons:
– No understanding of the content
– Hard to craft good queries
– Best for finding really popular things; not good at finding needles in haystacks
– Limited by data (esp. quality training data)
– Not great for sparse structured data with strong inherent semantics

Google

Lucene

Autonomy

Farecast (Bing Travel)

Page 11: Semantic Wiki, Great Candidate for Knowledge Acquisition

The Linguistic Approach

Pros:
– Almost-true language understanding
– Extract knowledge from text
– Best for searching for particular facts or relationships
– More precise queries

Cons:
– Computationally intensive
– Difficult to scale
– Lots of special-case and other errors
– Language-dependent

Powerset

Hakia

Inxight, Attensity, and others…

Page 12: Semantic Wiki, Great Candidate for Knowledge Acquisition

The Semantic Web Approach

Pros:
– More precise queries
– Smarter apps with less work
– Not as computationally intensive
– Share & link data between apps
– Works for both unstructured and structured data

Cons:
– Lack of tools
– Difficult to scale
– Who makes all the metadata?

Radar Networks

DBpedia Project

Metaweb (Freebase)

Page 13: Semantic Wiki, Great Candidate for Knowledge Acquisition

The Artificial Intelligence Approach

Pros:
– Smart in narrow domains
– Answer questions intelligently
– Reasoning and learning

Cons:
– Computationally intensive
– Difficult to scale
– Extremely hard to program
– Does not work well outside of narrow domains
– Training takes a lot of work

Cycorp

AURA (Project Halo)

Page 14: Semantic Wiki, Great Candidate for Knowledge Acquisition

The Approaches Compared

Make the software smarter

Make the Data Smarter

Statistics

Linguistics

Semantic Web

A.I.

Tagging

Page 15: Semantic Wiki, Great Candidate for Knowledge Acquisition

In Practice

Tagging

Semantic Web

Statistics

Linguistics

Artificial intelligence

Page 16: Semantic Wiki, Great Candidate for Knowledge Acquisition

From Tagging to AI

Data Structure

Intelligence

Page 17: Semantic Wiki, Great Candidate for Knowledge Acquisition

The Semantic Web is a Key Enabler

Moves the “intelligence” out of applications, into the data

Data needs special structures

Data becomes self-describing; the meaning of data becomes part of the data

Apps can become smarter with less work, because the data carries knowledge about what it is and how to use it

Data can be shared and linked more easily

Page 18: Semantic Wiki, Great Candidate for Knowledge Acquisition

The Semantic Web = Open Database Layer for the Web

User Profiles

Web Content

Data Records

Apps & Services

Ads & Listings

Open Data Mappings

Open Data Records

Open Rules

Open Ontologies

Open Query Interfaces

Page 19: Semantic Wiki, Great Candidate for Knowledge Acquisition

And The Web IS the Database!

[Figure: graph of typed nodes joined by labeled relationships, shared between Application A and Application B.]

Nodes (name, type): Coldplay (Band), Palo Alto (City), IBM (Company), Jane (Person), Dave (Person), Bob (Person), Sue (Person), Joe (Person), Design Team (Group), Stanford Alumnae (Group), IBM.com (Web Site), 123.JPG (Photo), Dave.com (Weblog), Dave.com (RSS Feed)

Relationship labels: Lives in, Publisher of, Friend of, Depiction of, Member of, Married to, Fan of, Subscriber to, Source of, Author of, Employee of

Page 20: Semantic Wiki, Great Candidate for Knowledge Acquisition

BUT THERE IS STILL SOMETHING MISSING

Page 21: Semantic Wiki, Great Candidate for Knowledge Acquisition

We Need to Put Ourselves Into the Semantic Web!

Page 22: Semantic Wiki, Great Candidate for Knowledge Acquisition

In Every Part or Layer of the Semantic Web, We Need

People’s Involvement (Wisdom of the Crowd)

Page 23: Semantic Wiki, Great Candidate for Knowledge Acquisition

Now a Complete Web

Social Semantic Web

Human Machine Web

Page 24: Semantic Wiki, Great Candidate for Knowledge Acquisition

Crowd Wisdom to Best Map Human Knowledge for Humans

Page 25: Semantic Wiki, Great Candidate for Knowledge Acquisition

Clear Semantics for Machines to Understand Knowledge

Page 26: Semantic Wiki, Great Candidate for Knowledge Acquisition

Semantic Wikis: the Social Semantic Web in Action!

Semantic Wikis

Page 27: Semantic Wiki, Great Candidate for Knowledge Acquisition

What is a Wiki? A Key Feature of Wikis is

Consensus

This distinguishes wikis from other publication tools

Page 28: Semantic Wiki, Great Candidate for Knowledge Acquisition

Consensus in Wikis Comes from

Collaboration
– ~17 edits/page on average in Wikipedia (with high variance)
– Wikipedia’s Neutral Point of View

Convention
– Users follow customs and conventions to engage with articles effectively

Page 29: Semantic Wiki, Great Candidate for Knowledge Acquisition

Software Support Makes Wikis Successful

– Trivial to edit by anyone
– Tracking of all changes, one-step rollback
– Every article has a “Talk” page for discussion
– Notification facility allows anyone to “watch” an article
– Sufficient security on pages, logins can be required
– A hierarchy of administrators, gardeners, and editors
– Software bots recognize certain kinds of vandalism and auto-revert, or recognize articles that need work and flag them for editors

Page 30: Semantic Wiki, Great Candidate for Knowledge Acquisition

Success of Wikis

One of humanity’s greatest inventions

[Figure: Actual number of articles on en.wikipedia.org (thick blue line) compared with a Gompertz model that eventually levels off at a maximum of about 4.4 million articles (thin green line)]
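
The Gompertz model named in the caption is a growth curve that rises quickly and then flattens toward a ceiling. A minimal sketch follows, assuming the common form ceiling * exp(-displacement * exp(-growth_rate * t)). Only the ceiling of about 4.4 million comes from the slide; the displacement and growth-rate values are hypothetical placeholders, not the parameters fitted to Wikipedia.

```python
import math

def gompertz(t, ceiling, displacement, growth_rate):
    """Gompertz curve: ceiling * exp(-displacement * exp(-growth_rate * t))."""
    return ceiling * math.exp(-displacement * math.exp(-growth_rate * t))

CEILING = 4.4e6      # asymptote quoted on the slide (about 4.4 million articles)
DISPLACEMENT = 5.0   # hypothetical value, not taken from the slide
GROWTH_RATE = 0.3    # hypothetical value, not taken from the slide

def projected_articles(years_since_start):
    """Projected article count under the assumed parameters."""
    return gompertz(years_since_start, CEILING, DISPLACEMENT, GROWTH_RATE)

if __name__ == "__main__":
    for year in (0, 5, 10, 20):
        print(year, round(projected_articles(year)))
```

Whatever the fitted parameters are, the curve never exceeds the ceiling, which is why the model predicts that article growth saturates.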

Page 31: Semantic Wiki, Great Candidate for Knowledge Acquisition

Summary: What Wiki Is Really About

Quick and Easy – No download

Layered Community Authoring

Interlinked Hierarchical Content

Revision Control

Notification

Software Support

Page 32: Semantic Wiki, Great Candidate for Knowledge Acquisition

What is a Semantic Wiki

A wiki that has an underlying model of the knowledge described in its pages.

To allow users to make their knowledge explicit and formal

Semantic Web compatible

Semantic Wiki

Page 33: Semantic Wiki, Great Candidate for Knowledge Acquisition

Combining Human Knowledge and Data Structures

Wikis for Metadata

Metadata for Wikis

Page 34: Semantic Wiki, Great Candidate for Knowledge Acquisition

Basics of Semantic Wikis

Still a wiki, with regular wiki features
– E.g. Category/Tags, Namespaces, Title, Versioning, ...

Typed Content
– E.g. Page/Card, Date, Number, URL/Email, String, …

Typed Links
– E.g. “capital_of”, “contains”, “born_in”…

Querying Interface Support
– E.g. “[[Category:Person]] [[Age::<30]]”
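
The ideas on this slide (categories, typed content, typed links, and a query over them) can be sketched as a small in-memory model. This is an illustration only, not the API of any wiki engine: the `Page` class, the `ask` helper, and the sample pages are all hypothetical. The query mirrors “[[Category:Person]] [[Age::<30]]”, reading `<30` as "below 30"; a real engine may treat that comparator as inclusive.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    """Hypothetical wiki page with semantic annotations."""
    title: str
    categories: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)  # typed content, e.g. Age -> number
    links: list = field(default_factory=list)       # typed links: (relation, target title)

def ask(pages, category, prop, predicate):
    """Return titles of pages in `category` whose `prop` value satisfies `predicate`."""
    return sorted(
        page.title
        for page in pages
        if category in page.categories
        and prop in page.properties
        and predicate(page.properties[prop])
    )

# Made-up sample data for illustration.
pages = [
    Page("Alice", {"Person"}, {"Age": 27}, [("born_in", "Paris")]),
    Page("Bob", {"Person"}, {"Age": 45}, [("born_in", "Berlin")]),
    Page("Paris", {"City"}, {}, [("capital_of", "France")]),
]

# Rough analogue of [[Category:Person]] [[Age::<30]]
young_people = ask(pages, "Person", "Age", lambda age: age < 30)

if __name__ == "__main__":
    print(young_people)
```

Because the age is stored as a number and not as free text, the comparison is meaningful, which is the point of typed content.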

Page 35: Semantic Wiki, Great Candidate for Knowledge Acquisition

Advanced Semantic Wiki Features

– Semantic forms or templates
– Auto-completion based on semantics
– Powerful visualizations based on semantics/structures/types
– Rules and reasoning support
– Advanced search and queries (faceted search, SPARQL, etc.)
– Semantic notifications (personalized information filtering)
– Import and export of semantic data
– Data integration: identification, disambiguation, merging, trust, security/privacy, …

Page 36: Semantic Wiki, Great Candidate for Knowledge Acquisition

Characteristics of Semantic Wikis

Semantic Wikis

Page 37: Semantic Wiki, Great Candidate for Knowledge Acquisition

What is the Promise of Semantic Wikis?

Semantic Wikis facilitate Consensus over Data (Knowledge)

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed, user-maintained, user-defined

Easy to use as an extension of text authoring

The ultimate Knowledge aggregator

Page 38: Semantic Wiki, Great Candidate for Knowledge Acquisition

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are “Schema-Last”

Databases require DBAs and schema design;

Semantic Wikis develop and maintain the schema in the wiki
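
One way to picture “schema-last” is that nobody declares the schema up front; it is read off from whatever properties users have already put on pages. The sketch below assumes a hypothetical page representation (plain dictionaries) and a hypothetical `infer_schema` helper; it is not how any specific semantic wiki implements this.

```python
from collections import defaultdict

def infer_schema(pages):
    """Derive {category: {property: [value type names]}} from annotated pages."""
    schema = defaultdict(lambda: defaultdict(set))
    for page in pages:
        for category in page["categories"]:
            for prop, value in page["properties"].items():
                schema[category][prop].add(type(value).__name__)
    # Convert nested defaultdicts and sets into plain, sorted structures.
    return {
        category: {prop: sorted(types) for prop, types in props.items()}
        for category, props in schema.items()
    }

# Made-up pages; the second one introduces a new property without any schema change.
pages = [
    {"title": "Alice", "categories": ["Person"], "properties": {"Age": 27}},
    {"title": "Bob", "categories": ["Person"], "properties": {"Age": 45, "Employer": "IBM"}},
]

schema = infer_schema(pages)

if __name__ == "__main__":
    print(schema)
```

Adding the Employer property to one page is enough for it to appear in the derived schema, with no migration step in between.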

Page 39: Semantic Wiki, Great Candidate for Knowledge Acquisition

Great Candidate for Knowledge Acquisition

– Combining both unstructured and semi-structured data
– High connectivity on both information and social dimensions
– Collaboration with sophisticated software support
– Expected low cost for crowd-sourcing
– Evolving category and template systems

But…

Page 40: Semantic Wiki, Great Candidate for Knowledge Acquisition

BUT – Plain Wikis Are Not Good Enough for Deep Knowledge Acquisition

Knowledge is represented MOSTLY in unstructured and semi-structured ways
• Plain text
• Templates
• Infoboxes
• Tables
• Section headers
• Links
• References
• Redirects
• …

Page 41: Semantic Wiki, Great Candidate for Knowledge Acquisition

Software/Feature Enhancements Are Needed

Quick and easy way to view and edit schema

Machine assistance (NLP, Auto-suggest…)

Better visualizations with structured data

More user layers for better KB construction

Better targeted (semantic) notifications

Page 42: Semantic Wiki, Great Candidate for Knowledge Acquisition

Best Bet for Knowledge Acquisition?

K.A. is the well-known Artificial Intelligence problem
– AI authoring is too expensive, too slow, not scalable

Three Possible Solutions
– Automatic Machine Parsing (e.g. NELL, ReVerb)
  • Quality (depth) not good enough for textbook sentences
  • Error rates are too high
  • Still need humans in the loop for training data
– Crowd-Sourced Authoring (e.g. AMT)
  • Biology and Knowledge Engineering expertise is difficult to get
  • Mechanical Turk uses individuals, but the Knowledge Entry tasks appear to require coordination, judgment, discussion, and working together
– Social Authoring and Crowdsourcing with Intelligent Software Assistance
  • Wikipedia showed this could work for text
  • Semantic Wiki software R&D to make it work for more structured knowledge

Page 43: Semantic Wiki, Great Candidate for Knowledge Acquisition

With All These Features…

Effective Knowledge Acquisition

via Semantic Wikis

Combine the strengths of humans and machines

Connecting Humans and Machines

High quality at low cost

Page 44: Semantic Wiki, Great Candidate for Knowledge Acquisition

Conclusion: To Bridge Machine and Human Intelligence

We Need Social Semantic Web

Page 45: Semantic Wiki, Great Candidate for Knowledge Acquisition

To Dive Into Social Semantic Web

Semantic Wiki is a Great Candidate

Page 46: Semantic Wiki, Great Candidate for Knowledge Acquisition

THANK YOU!

Credits: some slides are originally from the following people, with little or no modifications:

Nova Spivack, Denny Vrandecic, Mark Greaves, Bao Jie