Content Management, Metadata and Semantic Web


Upload: amit-sheth

Post on 06-May-2015


DESCRIPTION

Keynote given at the NetObjectDays conference, Erfurt, September 11, 2001. One of the earliest keynotes discussing commercial semantic web technologies and semantic web applications (including semantic search, semantic targeting, and semantic content management). Prof. Sheth started a Semantic Web company, Taalee, Inc., in 1999 (its product was the MediaAnywhere A/V search engine), which merged to become Voquette in 2001 (product called SCORE), Semagix in 2004 (product called Semagix Freedom), and then Fortent in 2006 (products included Know Your Customers). Additional details can be found in U.S. Patent #6311194, 30 Oct. 2001 (filed 2000). Note: the commercial system used the term "WorldModel" because, at the time, business customers were not yet warm to "Ontology"; the concept and intent are the same. More recent information is at http://knoesis.org

TRANSCRIPT

Page 1: Content Management, Metadata and Semantic Web

Confidential HP

Content Management, Metadata & Semantic Web

Keynote Address
Net.ObjectDAYS 2001, Erfurt, Germany, September 11, 2001

Amit Sheth
CTO/SrVP, Voquette (www.voquette.com)

[formerly Founder/CEO, Taalee, www.taalee.com]

Director, Large Scale Distributed Information Systems Lab, University Of Georgia (lsdis.cs.uga.edu)

[email protected]

Metadata Extraction is a patent-pending technology of Taalee, Inc. Semantic Engine and WorldModel are trademarks of Taalee, Inc.

Page 2: Content Management, Metadata and Semantic Web


Agenda

What is Traditional Content Management

New Content Management Challenges faced by Enterprises

Semantic Content Management

Metadata

Metadata Descriptions and Standards

(Automated) Metadata Creation/Extraction/Tagging

Metadata Usage/Applications

Semantics (and Semantic Web)

Current and Future

Page 3: Content Management, Metadata and Semantic Web


Traditional Content Management: Core Objectives and Features

Primary Objective: Effectively create, manage and publish internal content, with
   Existing content creation applications (MS-Office, Notes), plus some new capabilities (speech to text)
   (Basic, syntactic) metadata
   Workflow or lifecycle support (from author to Web publication or distribution)
   Versioning and rollback
   (Keyword-based/syntactical) search and personalization
   Internal distribution
   Web publishing

Stages: Content Creation and Edition -> Content Management -> Content Personalization and Services -> Content Delivery

Page 4: Content Management, Metadata and Semantic Web


Technology/Product Provider Landscape

Traditional Content Management Companies
   Interwoven, Vignette, Broadvision, Enprise, Documentum, Open Market

Three of several upcoming companies focusing on metadata, semantics and/or semantic web
   Applied Semantics, Voquette (Taalee), Ontoprise
   See http://business.semanticweb.org for more

Page 5: Content Management, Metadata and Semantic Web


Enterprise Content Management – sample user requirements (from a large Financial Svcs Company)

“If a new bond comes into inventory, then we should get a message, an alert...and be able to refine to say that I only have California, Oregon and Washington clients...."

“In the month of July, I received 95 e-mails from my subscriptions. These e-mails included 61 that had 143 attachments that had 67 more attachments. In total therefore, I received almost 400 documents including 5 different types (HTML,PDF, Word, Rich Media, …). Even with this volume, I had subscribed to only 10 categories in the Equities area. There are a total of 26 Equity Subscription areas and a total of 166 categories to which a user can subscribe across all Product Areas.”

Professional users of a traditional Content Management Product/Solution

Page 6: Content Management, Metadata and Semantic Web


Enterprise Content Management – sample user requirements (from a large Financial Svcs Company)

The real question is, "Which sales ideas may have significant relevance to my book of business?" For example, an earnings warning on an equity rated Hold or Lower and not owned by any of my clients may not be of high relevance to me. Ideally, a relevance analysis would:
   Greatly reduce the volume of Product Area Ideas sent to every FA, hopefully to perhaps 10% to 20% or less of today's volume, with ideas that are potentially actionable for that FA and his/her client
   Result in FAs reading and evaluating the Product Area Ideas, taking appropriate actions, and generating sales because the Product Area Ideas would be relevant
   Result in customer satisfaction because clients would understand FAs are paying attention to their needs and developing focused ideas

Professional users of a traditional Content Management Product/Solution
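The relevance analysis asked for above can be sketched as a simple metadata filter. This is only an illustration of the rule quoted in the requirement (an earnings warning on a Hold-or-lower equity that no client owns is not relevant); the `Idea` class, the rating scale, and the tickers are hypothetical and are not taken from any product.

```python
from dataclasses import dataclass

# Ratings treated as "Hold or lower" (assumed rating scale).
HOLD_OR_LOWER = {"Hold", "Sell"}


@dataclass(frozen=True)
class Idea:
    ticker: str
    kind: str    # e.g. "earnings_warning" (hypothetical label)
    rating: str  # e.g. "Buy", "Hold", "Sell"


def is_relevant(idea: Idea, holdings: set) -> bool:
    """Drop earnings warnings on Hold-or-lower equities that no client owns."""
    owned = idea.ticker in holdings
    low_value = idea.kind == "earnings_warning" and idea.rating in HOLD_OR_LOWER
    return owned or not low_value


def filter_ideas(ideas, holdings):
    """Keep only the ideas worth sending to this FA."""
    return [idea for idea in ideas if is_relevant(idea, holdings)]


ideas = [
    Idea("AAA", "earnings_warning", "Hold"),  # not owned, low rating: dropped
    Idea("BBB", "earnings_warning", "Hold"),  # owned by a client: kept
    Idea("CCC", "upgrade", "Buy"),            # not an earnings warning: kept
]
book_of_business = {"BBB"}
relevant = filter_ideas(ideas, book_of_business)
```

A real relevance engine would need far richer metadata (client profiles, product areas, regions); the point here is only that the filter operates on metadata about the idea, not on its text.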

Page 7: Content Management, Metadata and Semantic Web


Enterprise Content Management – sample product requirements (from a large Financial Svcs Company)

“Content generation is a more complex and probably costly problem to solve ... we reportedly create about 9 million messages a month for field delivery. On average, this would mean 1,000 messages per month per ‘big user’ or perhaps only 500 to 600 per ‘little user’. … I strongly believe an analysis is in order of the nature and necessity of generated content, the establishment of content generation standards, the movement towards development and implementation of a relevance engine, …”

Director (Product Management) of a large company that uses a leading Content Management Product

Page 8: Content Management, Metadata and Semantic Web


New Enterprise Content Management Challenges

1. More variety and complexity
   More formats (MPEG, PDF, MS Office, WM, Real, AVI, etc.)
   More types (docs, images -> audio, video, variety of text: structured, unstructured)
   More sources (internal, extranet, internet, feeds)

2. Information Overload
   Too much data, precious little information (Relevance)

3. Creating Value from Content
   How to distribute the right content to the right people as needed? (Personalization: book of business)
   Customized delivery for different consumption options (mobile/desktop, devices)
   Insight, Decision Making (Actionable)

Page 9: Content Management, Metadata and Semantic Web


New Enterprise Content Management Technical Challenges

1. Aggregation
   Feed handlers/agents that understand content representation and media semantics
   Push-pull, Web-DB-files, structured, semi-structured and unstructured data of different types

2. Homogenization and Enhancement
   Enterprise-wide common view
   Domain model, taxonomy/classification, metadata standards
   Semantic metadata, created automatically if possible

3. Semantic Applications
   Search, personalization, directory, alerts, etc. using metadata and semantics (semantic association and correlation), for improved relevance, intelligent personalization, customization

Page 10: Content Management, Metadata and Semantic Web


Creating and Serving Metadata to Power the Life-cycle of Content

Content life-cycle stages and the question metadata answers at each stage:
   Produce/Aggregate: Where is the content? Whose is it?
   Catalog/Index: What is this content about?
   Integrate/Syndicate: What other content is it related to?
   Personalize: What is the right content for this user?
   Interactive Marketing: What is the best way to monetize this interaction?

Delivery channels: Broadcast, Wireline, Wireless, Interactive TV
Diagram labels: Semantic Metadata, Applications, Back End

"A Web content repository without metadata is like a library without an index." (Jack Jia, IWOV)
"Metadata increases content value in each step of content value chain." (Amit Sheth)

Page 11: Content Management, Metadata and Semantic Web


A Metadata Classification

Data (Heterogeneous Types/Media)
Content Independent Metadata (creation-date, location, type-of-sensor...)
Content Dependent Metadata (size, max colors, rows, columns...)
Direct Content Based Metadata (inverted lists, document vectors, LSI)
Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
Domain Specific Metadata: area, population (Census); land-cover, relief (GIS); metadata concept descriptions from ontologies
Ontologies, Classifications, Domain Models
User

More Semantics for Relevance to tackle Information Overload!!

Page 12: Content Management, Metadata and Semantic Web


Semantics

“meaning or relationship of meanings, or relating to meaning” (Webster)

is concerned with the relationship between the linguistic symbols and their meaning or real-world objects

meaning and use of data (Information System)

Example: Palm -> Company, Product, Technology, Tree Name, part of location (Palm Springs, Palm Beach)

Semantics, Ontologies (Domain Models), Metamodels, Metadata, Content/Data
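The "Palm" example can be illustrated with a toy disambiguation step: the same symbol is mapped to different meanings depending on the words around it. The sketch below is a deliberately naive cue-word overlap, not the technique used by any system in this talk; the sense names and cue words are assumptions chosen for illustration.

```python
from typing import Optional

# Hypothetical cue words for a few senses of the term "Palm".
SENSES = {
    "Company": {"shares", "earnings", "stock", "ceo"},
    "Product": {"handheld", "pda", "device", "stylus"},
    "Tree": {"leaves", "coconut", "tropical", "frond"},
    "Location": {"springs", "beach"},
}


def disambiguate(context: str) -> Optional[str]:
    """Return the sense whose cue words overlap most with the context."""
    words = set(context.lower().split())
    scores = {sense: len(cues & words) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # No overlap at all means the context gives no evidence for any sense.
    return best if scores[best] > 0 else None


company_sense = disambiguate("Palm shares fell after earnings")
tree_sense = disambiguate("a coconut palm with tropical leaves")
unknown_sense = disambiguate("palm")
```

An ontology-backed system would replace the cue-word sets with entities and relationships from a domain model, which is the direction the following slides take.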

Page 13: Content Management, Metadata and Semantic Web


“The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what the data means to process it. . . . Imagine what computers can understand when there is a vast tangle of interconnected terms and data that can automatically be followed.” (Tim Berners-Lee, Weaving the Web, 1999)

A Content Management centric definition of Semantic Web: The concept that Web-accessible content can be organized and utilized semantically, rather than through syntactic and structural methods.

Semantics: The Next Step in the Web’s Evolution

Page 14: Content Management, Metadata and Semantic Web


Next Generation:

Semantic Content Management

Page 15: Content Management, Metadata and Semantic Web


Organizing Content

Different and Related Objectives: Search, Browse, Summarization, Association/Relationships

Indexing
Clustering
Classification
Controlled Vocabulary, Reference Data/Dictionary/Thesaurus
Metadata
Knowledge Base (Entities/Objects and Relationships)

Page 16: Content Management, Metadata and Semantic Web


Traditional Text Categorization (Statistical/AI Techniques)

Flow: article feed (article 4715) -> statistical/AI techniques, using a customer training set -> classification of article 4715 (place in a taxonomy) -> routing/distribution

Standard metadata for the article:
   Feed Source: iSyndicate
   Posted Date: 11/20/2000

Most traditional Content Management products support categorization of unstructured content.

Page 17: Content Management, Metadata and Semantic Web


Voquette/Taalee’s Categorization & Automatic Metadata Creation (Knowledge-base & Statistical/AI Techniques)

Flow: article feed (article 4715) -> Semantic Engine™, using the Customer Training Set & KB and the Taalee Training Set & KB -> classification of article 4715 (place in a taxonomy, map to another taxonomy) -> Metadata Catalog -> routing/distribution and precise personalization/syndication/filtering

Article 4715 Metadata
   Standard metadata:
      Feed Source: iSyndicate
      Posted Date: 11/20/2000
   Semantic metadata:
      Company Name: France Telecom, Equant
      Ticker Symbol: FTE, ENT
      Exchange: NYSE
      Topic: Company News

Automated Content Enrichment (ACE)
   FTE: Company Analysis, Conference Calls, Earnings, Stock Analysis
   ENT: Company Analysis, Conference Calls, Earnings, Stock Analysis
   NYSE: Member Companies, Market News, IPOs
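The enrichment shown for article 4715 can be approximated by a knowledge-base lookup: once a company is recognized in the text, attributes the text never mentions (ticker symbol, exchange) are added from the knowledge base. The sketch below is a minimal illustration under that assumption; the `KNOWLEDGE_BASE` structure, the `enrich` function, and the sample sentence are hypothetical, and the matching is plain substring search rather than anything the Semantic Engine is documented to do.

```python
# Tiny stand-in for a knowledge base; values are the ones shown on the slide.
KNOWLEDGE_BASE = {
    "France Telecom": {"ticker": "FTE", "exchange": "NYSE"},
    "Equant": {"ticker": "ENT", "exchange": "NYSE"},
}


def enrich(article: dict) -> dict:
    """Return a copy of the article with semantic metadata added from the KB."""
    enriched = dict(article)
    companies = [name for name in KNOWLEDGE_BASE if name in article["text"]]
    enriched["company_name"] = companies
    enriched["ticker_symbol"] = [KNOWLEDGE_BASE[c]["ticker"] for c in companies]
    enriched["exchange"] = sorted({KNOWLEDGE_BASE[c]["exchange"] for c in companies})
    return enriched


article_4715 = {
    "feed_source": "iSyndicate",   # standard metadata from the feed
    "posted_date": "11/20/2000",
    "text": "France Telecom and Equant announced ...",  # invented sample text
}
metadata = enrich(article_4715)
```

The standard metadata passes through unchanged; everything under `company_name`, `ticker_symbol`, and `exchange` is value added by the lookup.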

Page 18: Content Management, Metadata and Semantic Web


Technologies for Organizing Content

Information Retrieval/Document Indexing: TF-IDF/statistical, Clustering, LSI
Statistical learning/AI: Machine learning, Bayesian, Markov Chains, Neural Network
Lexical, Natural language
Thesaurus, Reference data, Domain models (Ontology)
Information Extractors
Reasoning/Inferencing: Logic based, Knowledge-based, Rule processing and

The most powerful solutions combine several of these, addressing more of the objectives

Page 19: Content Management, Metadata and Semantic Web


Multiple competing standards!

Multiple heterogeneous metadata models with different tag names for the same data in the same GIS domain (example: Kansas State)

FGDC Metadata Model                        UDK Metadata Model      Value
Theme keywords                             Search terms            digital line graph, hydrography, transportation...
Title                                      Topic                   Dakota Aquifer
Online linkage                             Adress Id               http://gisdasc.kgs.ukans.edu/dasc/
Direct Spatial Reference Method            Measuring Techniques    Vector
Horizontal Coordinate System Definition    Co-ordinate System      Universal Transverse Mercator
...                                        ...                     ...
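Homogenizing the two models amounts to mapping each model's tag names onto one shared vocabulary. The sketch below uses the tag names and values from the FGDC/UDK comparison above; the common field names (`keywords`, `title`, and so on) and the `homogenize` helper are assumptions made for illustration, not part of either standard.

```python
# Tag names as shown for each model; common names on the right are invented.
FGDC_TO_COMMON = {
    "Theme keywords": "keywords",
    "Title": "title",
    "Online linkage": "url",
    "Direct Spatial Reference Method": "spatial_method",
    "Horizontal Coordinate System Definition": "coordinate_system",
}
UDK_TO_COMMON = {
    "Search terms": "keywords",
    "Topic": "title",
    "Adress Id": "url",  # spelling as it appears on the slide
    "Measuring Techniques": "spatial_method",
    "Co-ordinate System": "coordinate_system",
}


def homogenize(record: dict, mapping: dict) -> dict:
    """Rename a record's tags to the common vocabulary, dropping unknown tags."""
    return {mapping[tag]: value for tag, value in record.items() if tag in mapping}


fgdc_record = {
    "Title": "Dakota Aquifer",
    "Online linkage": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Direct Spatial Reference Method": "Vector",
}
udk_record = {
    "Topic": "Dakota Aquifer",
    "Adress Id": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Measuring Techniques": "Vector",
}
common_from_fgdc = homogenize(fgdc_record, FGDC_TO_COMMON)
common_from_udk = homogenize(udk_record, UDK_TO_COMMON)
```

After the mapping, the two records describe the same resource in identical terms, which is what an enterprise-wide common view requires.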

Page 20: Content Management, Metadata and Semantic Web


Basis for Semantics

A. Facts/Concepts/Terms/Entities
   Dictionary, Thesaurus, Reference Data, Vocabulary

B. Facts with Relationships
   Taxonomy/(Categories), Ontology
   Domain Modeling (e.g., Golf = golfer, tournament name, golf course, event)
   Knowledge Base

Page 21: Content Management, Metadata and Semantic Web


Ontology

Standardizes meaning, description, representation of involved concepts/terms/attributes

Captures the semantics involved via domain characteristics, resulting in semantic metadata

“Ontological Commitment” forms basis for knowledge sharing and reuse

Ontology provides semantic underpinning.

Page 22: Content Management, Metadata and Semantic Web


An Ontology

Hierarchies:
   Disaster
      Natural Disaster: Volcano, Earthquake
      Man-made Disaster: NuclearTest

Terms/Concepts (Attributes): eventDate, description, site, latitude, longitude, damage, numberOfDeaths, damagePhoto, magnitude, bodyWaveMagnitude, conductedBy, explosiveYield

Functional Dependencies (FDs):
   site => latitude, longitude

Domain Rules:
   0 < magnitude < 10
   0 < bodyWaveMagnitude < 10
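The parts of this ontology (hierarchy, attributes, functional dependency, domain rules) can be sketched in code as a class hierarchy with validation. This is an illustration only: the assignment of attributes to particular classes is an assumption, since the slide text does not say which attribute belongs to which concept, and the site name and coordinates are made-up placeholders.

```python
from dataclasses import dataclass

# Functional dependency site => latitude, longitude, as a lookup table.
# The site name and coordinates are invented placeholders.
SITE_COORDINATES = {"Example Site": (10.0, 20.0)}


@dataclass
class Disaster:
    event_date: str
    description: str
    site: str

    @property
    def coordinates(self):
        """Latitude and longitude are determined by the site alone."""
        return SITE_COORDINATES.get(self.site)


@dataclass
class NaturalDisaster(Disaster):
    pass


@dataclass
class ManMadeDisaster(Disaster):
    pass


@dataclass
class Earthquake(NaturalDisaster):
    magnitude: float  # assumed to belong to Earthquake

    def __post_init__(self):
        # Domain rule from the slide: 0 < magnitude < 10
        if not 0 < self.magnitude < 10:
            raise ValueError("domain rule violated: 0 < magnitude < 10")


@dataclass
class NuclearTest(ManMadeDisaster):
    body_wave_magnitude: float  # assumed to belong to NuclearTest
    conducted_by: str
    explosive_yield: float

    def __post_init__(self):
        # Domain rule from the slide: 0 < bodyWaveMagnitude < 10
        if not 0 < self.body_wave_magnitude < 10:
            raise ValueError("domain rule violated: 0 < bodyWaveMagnitude < 10")


quake = Earthquake("2001-01-01", "illustrative event", "Example Site", 6.5)
```

Metadata that violates a domain rule is rejected when the object is created, which is one practical benefit of committing to an ontology before extracting metadata.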

Page 23: Content Management, Metadata and Semantic Web


Controlled Vocabularies/ Classifications/Taxonomies/Ontologies

WordNet
Cyc
The Medical Subject Headings (MeSH): NLM's controlled vocabulary used for indexing articles, for cataloging books and other holdings, and for searching MeSH-indexed databases, including MEDLINE. MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts. Year 2000 MeSH includes more than 19,000 main headings, 110,000 Supplementary Concept Records (formerly Supplementary Chemical Records), and an entry vocabulary of over 300,000 terms.

Page 24: Content Management, Metadata and Semantic Web


Open Directory Project (ODP): Classification/Taxonomy & Directory

Page 25: Content Management, Metadata and Semantic Web


Metadata Specifications (MetaModels)

Metadata
   Domain Independent (Dublin Core, RDF, DAML+OIL)
   Frameworks/Infrastructures (XCM, XMI)
   Function Specific: ICE (Syndication)
   Domain (Application) Specific: MARC (Library), FGDC and UDK (Geographic), PRISM (Publishing), FXML (Financial Transactions), RIXML (Buy-Sell Research/Financial Services), IMS Learning Resource (Distance Learning), ...
   Media Specific: MPEGx, VoiceXML
   NewsML (News exchange)

Page 26: Content Management, Metadata and Semantic Web


Types of Specs and Standards (or MetaModels)

Domain Independent: (MCF), RDF, (MOF), DublinCore
Media Specific: MPEG4, MPEG7, VoiceXML
Domain/Industry Specific (metamodels): MARC (Library), FGDC and UDK (Geographic), NewsML (News), PRISM (Publishing), RIXML (Buy-Sell Research/Financial Services)
Application Specific: ICE (Syndication), IMS Learning Resource (Distance Learning)
Exchange/Sharing: XCM, XMI
Orthogonal/(Other): RDFS, namespaces, ontologies, domain models, (DAML, OIL)

Page 27: Content Management, Metadata and Semantic Web


Dublin Core Metadata Initiative

Simple element set designed for resource description
International, inter-discipline, W3C community consensus
“Semantic” interface among resource description communities (very limited form of semantics)

Source: www.desire.org

Page 28: Content Management, Metadata and Semantic Web


Dublin Core RDF

<xml>
<?namespace href = "http://w3.org/rdf-schema" as = "RDF">
<?namespace href = "http://metadata.net/DC" as = "DC">
<RDF:Abbreviated>
  <RDF:Assertion RDF:HREF = "http://www.mysite.com/mydoc.html"
    DC:Title = "I've Never Metadata I've Never Liked"
    DC:Creator = "Mary Crystal"
    DC:Subject = "Metadata, Dublin Core, Stuff"/>
</RDF:Abbreviated>
</xml>
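The markup on this slide appears to follow an early draft notation. As a hedged illustration, the same three Dublin Core statements can be produced in the RDF/XML form of the 1999 RDF Recommendation using Python's standard library. The `describe` helper is hypothetical; the two namespace URIs are the standard RDF and Dublin Core element-set namespaces.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def describe(url: str, **elements: str) -> str:
    """Serialize Dublin Core elements about `url` as an RDF/XML string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": url}
    )
    for name, value in elements.items():
        ET.SubElement(description, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = describe(
    "http://www.mysite.com/mydoc.html",
    title="I've Never Metadata I've Never Liked",
    creator="Mary Crystal",
    subject="Metadata, Dublin Core, Stuff",
)
```

Each keyword argument becomes one `dc:` child element of a single `rdf:Description`, which is the simplest way to attach Dublin Core properties to a resource.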

Page 29: Content Management, Metadata and Semantic Web


NewsML

The content provider supplies NewsML packaged media content to the operator. The content can be categorized as current events, finance, sport, etc. (but no standard is specified) and updated hourly.

The operator receives NewsML data from the content provider. The content server automatically pushes updated news articles to all news service subscribers.

Consumers sign up for the news service directly on the device. When using the news service, the user browses through the categories and reads the news articles. The news articles are presented in a continuous flow (one after the other) without end-user interaction.

Source:http://www.mediabricks.com

Page 30: Content Management, Metadata and Semantic Web


NewsML

Content-descriptive metadata:
<HeadLine>Seattle attacked by Godzilla-like creature, Microsoft closes HQ</HeadLine>
<DateLine>Seattle, Was., Aug 30, 2009 /AthensWire via COMTEX/ --</DateLine>
<CopyrightLine>Copyright (C) 2009 AthensWire. All rights reserved.</CopyrightLine>

Administrative metadata:
<Provider><Party FormalName="Comtex" /></Provider>
<Source><Party FormalName="AthensWire" /></Source>

Rights metadata:
<CopyrightDate>2009</CopyrightDate>

Descriptive metadata:
<Language FormalName="en" />
<Property FormalName="Location" Value="Seattle, Washington, United States, North America" />
<Property FormalName="PublicCompany" Vocabulary="urn:newsml:comtexnews.net:20010201:DomesticPublicCompanies:1">
  <Property FormalName="CompanyName" Value="Microsoft Corp." />
  <Property FormalName="StockSymbol" Value="MSFT" />
  <Property FormalName="StockExchange" Value="Nasdaq" />
</Property>
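To show how an application might consume the descriptive metadata above, the following sketch flattens nested `Property` elements into dotted keys using Python's standard XML parser. The wrapper element name and the `flatten_properties` helper are assumptions made so that the fragment parses on its own; only the `Property` elements and their values come from the slide.

```python
import xml.etree.ElementTree as ET

# Fragment from the slide, wrapped in a single root element so it is well-formed.
FRAGMENT = """
<DescriptiveMetadata>
  <Language FormalName="en" />
  <Property FormalName="Location"
            Value="Seattle, Washington, United States, North America" />
  <Property FormalName="PublicCompany">
    <Property FormalName="CompanyName" Value="Microsoft Corp." />
    <Property FormalName="StockSymbol" Value="MSFT" />
    <Property FormalName="StockExchange" Value="Nasdaq" />
  </Property>
</DescriptiveMetadata>
"""


def flatten_properties(element, prefix=""):
    """Collect Property values, naming nested ones as Parent.Child."""
    result = {}
    for prop in element.findall("Property"):
        name = prefix + prop.get("FormalName")
        if prop.get("Value") is not None:
            result[name] = prop.get("Value")
        # Recurse into nested Property elements, if any.
        result.update(flatten_properties(prop, name + "."))
    return result


news_metadata = flatten_properties(ET.fromstring(FRAGMENT))
```

The flattened dictionary is the kind of uniform record that a search or personalization application can index directly.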

Page 31: Content Management, Metadata and Semantic Web


RIXML

Financial metadata for Buy/Sell sides
Highly domain-specific
Schema (see next slide) [from UserGuide, p. 31]
Example: MorningCall.xml

Page 32: Content Management, Metadata and Semantic Web


RIXML Schema

Page 33: Content Management, Metadata and Semantic Web


Metadata Creation and Semanticization

Automatic Content Classification/Categorization

Metadata Creation/Extraction: Types of metadata created

Semantic Engine and WorldModel are trademarks of Taalee, Inc. Metadata Extraction is a patented technology of Taalee, Inc.

Page 34: Content Management, Metadata and Semantic Web


Content Handling/Ingest

Infrastructure/Exchange

Feed Handlers

Crawlers/Screen Scrapers/Bots

Software Agents

Centralized, Distributed, or Mobile/Migratory

Page 35: Content Management, Metadata and Semantic Web


Information Extraction for Metadata Creation

Sources: WWW, Enterprise Repositories, Nexis, UPI, AP, Feeds/Documents, Digital Maps, Digital Audios, Digital Videos, Digital Images, Data Stores, ...

Sources -> METADATA EXTRACTORS -> METADATA

Key challenge: Create/extract as much (semantic) metadata automatically as possible

Page 36: Content Management, Metadata and Semantic Web


Extracting a Text Document: Syntactic approach

INCIDENT MANAGEMENT SITUATION REPORT

Friday August 1, 1997 - 0530 MDT

NATIONAL PREPAREDNESS LEVEL II

CURRENT SITUATION: Alaska continues to experience large fire activity. Additional fires have been staffed for structure protection.

SIMELS, Galena District, BLM. This fire is on the east side of the Innoko Flats, between Galena and McGr... The fire is active on the southern perimeter, which is burning into a continuous stand of black spruce. The fire has increased in size, but was not mapped due to thick smoke. The slopover on the eastern perimeter is 35% contained, while protection of the historic cabin continues.

CHINIKLIK MOUNTAIN, Galena District, BLM. A Type II Incident Management Team (Wehking) is assigned to the Chiniklik fire. The fire is contained. Major areas of heat have been mopped up. All crews and overhead will mop up where the fire burned beyond the meadows. No flare-ups occurred today. Demobilization is planned for this weekend, depending on the results of infrared scanning.

LAYOUT

Date => day month int ‘,’ int
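The layout rule `Date => day month int ',' int` maps naturally onto a regular expression. The sketch below is one possible reading of that rule, applied to the date line of the report above; the group names and the choice of a regular expression are assumptions, not the extractor's documented implementation.

```python
import re

DAYS = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Date => day month int ',' int
DATE_RULE = re.compile(
    rf"(?P<day>{DAYS}) (?P<month>{MONTHS}) (?P<date>\d{{1,2}}), (?P<year>\d{{4}})"
)


def extract_date(line: str):
    """Return the date fields found in a line, or None if the rule does not match."""
    match = DATE_RULE.search(line)
    return match.groupdict() if match else None


report_date = extract_date("Friday August 1, 1997 - 0530 MDT")
```

A purely syntactic rule like this finds where the date is, but says nothing about what the date means for the incident; that gap is what the semantic approaches in the following slides address.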

Page 37: Content Management, Metadata and Semantic Web


Extraction Agent

Web Page Enhanced Metadata Asset

Taalee Extraction and Knowledgebase Enhancement

Page 38: Content Management, Metadata and Semantic Web


Automatic Categorization & Metadata Tagging (unstructured text/transcript of A/V)

ABSOLUTE CONTROL OF THE SENATE IS STILL IN QUESTION. AS OF TONIGHT, THE REPUBLICANS HAVE 50 SENATE SEATS AND THE DEMOCRATS 49. IN WASHINGTON STATE, THE SENATE RACE REMAINS TOO CLOSE TO CALL. IF THE DEMOCRATIC CHALLENGER UNSEATS THE REPUBLICAN INCUMBENT THE SENATE WILL BE EVENLY DIVIDED. IN MISSOURI, REPUBLICAN SENATOR JOHN ASHCROFT SAYS HE WILL NOT CHALLENGE HIS LOSS TO GOVERNOR MEL CARNAHAN WHO DIED IN A CRASH THREE WEEKS AGO. GOVERNOR CARNAHAN'S WIFE IS EXPECTED TO TAKE HIS PLACE. IN THE HIGHEST PROFILE SENATE EVENT OF THE NIGHT, HILLARY CLINTON WON THE NEW YORK SENATE SEAT. SHE IS THE FIRST FIRST LADY TO RUN MUCH LESS WIN.

Diagram labels: Video Segment with Associated Text, Segment Description, Auto Categorization, Semantic Metadata

Page 39: Content Management, Metadata and Semantic Web


Automatic Categorization & Metadata Tagging (Web page)

Diagram labels: Video with Editorialized Text on the Web, Auto Categorization, Semantic Metadata

Page 40: Content Management, Metadata and Semantic Web


Automatic Categorization & Metadata Tagging (Feed)

Diagram labels: Text from Bloomberg, Auto Categorization, Semantic Metadata

Page 41: Content Management, Metadata and Semantic Web

Crawler provided text for indexing vs Agent provided semantic metadata

Metadata from Typical Cataloging of Football Assets (Virage search on "football touchdown"):
   Jimmy Smith Interview Part Seven: Jimmy Smith explains his philosophy on showboating. URL: http://cbs.sportsline...
   Brian Griese Interview Part Four: Brian Griese talks about the first touchdown he ever threw. URL: http://cbs.sportsline...

Taalee Metadata on Football Assets (Rich Media Reference Page):
   Baltimore 31, Pit 24 (http://www.nfl.com)
   Quandry Ismail and Tony Banks hook up for their third long touchdown, this time on a 76-yarder to extend the Ravens’ lead to 31-24 in the third quarter.
   League: Professional
   Teams: Ravens, Steelers
   Score: Bal 31, Pit 24
   Players: Quandry Ismail, Tony Banks
   Event: Touchdown
   Produced by: NFL.com
   Posted date: 2/02/2000

Page 42: Content Management, Metadata and Semantic Web


One Approach to Extending Traditional CM: Voquette’s Semantic Engine Technology

Diagram elements:
   Content sources: Feeds (proprietary formats, standards-based, NewsML), Corporate Repositories, Web Sites
   Traditional Content Management
   Agents (Push, Pull) and Information Extraction Agents
   Aggregation & Metadata Extraction
   Dynamic KB, Custom WorldModel
   Relevant Metadata Enhancement
   Knowledge Management (Knowledge Base, Domain Model, Metadata)
   Front End: Portal
   Voquette Semantic Applications: Search, Personalization, Alerts, Notifications, Custom “research” applications
   Flows between elements are labeled Content and Metadata

Page 43: Content Management, Metadata and Semantic Web


Taalee/Voquette Semantic Platform Architecture

Content of all formats and media, push/pull:
   Web sites/pages: static, dynamic
   Content feeds (unstructured, semi-structured/docs, tagged/XML)
   Corporate repositories/databases

Homogenization/integration:
   with taxonomy (categorization)
   contextually relevant metadata with respect to the domain model, automatically generated from content and inferenced

© Taalee Inc.

Page 44: Content Management, Metadata and Semantic Web


Semantic Content for the End-User =
   Content which does contain the words the user asked for (Extractor Agents)
   + Content which does not contain the words the user asked for, but is about what he asked for (Value-added Metadata)
   + Content the user did not think to ask for, but which he needs to know (Semantic Associations)
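The three kinds of content that make up "semantic content" can be sketched as three passes over a small asset list: keyword match, metadata match, and association lookup. This is a toy illustration only; the assets, the `semantic_search` function, and the data layout are invented, and the single association used (Lucent as a competitor of Cisco) is the example given later in this talk.

```python
# Invented sample assets; "metadata" stands for value-added metadata.
ASSETS = [
    {"id": 1, "text": "Cisco reports quarterly earnings", "metadata": {"company": "Cisco"}},
    {"id": 2, "text": "CSCO shares rise after analyst call", "metadata": {"company": "Cisco"}},
    {"id": 3, "text": "Lucent announces new optical switch", "metadata": {"company": "Lucent"}},
    {"id": 4, "text": "Weather update for Atlanta", "metadata": {}},
]
COMPETITORS = {"Cisco": {"Lucent"}}  # semantic association from a knowledge base


def semantic_search(company: str) -> dict:
    """Group asset ids by how they relate to the queried company."""
    keyword = [a["id"] for a in ASSETS if company.lower() in a["text"].lower()]
    by_metadata = [
        a["id"]
        for a in ASSETS
        if a["metadata"].get("company") == company and a["id"] not in keyword
    ]
    related = COMPETITORS.get(company, set())
    associated = [a["id"] for a in ASSETS if a["metadata"].get("company") in related]
    return {"keyword": keyword, "metadata": by_metadata, "associated": associated}


results = semantic_search("Cisco")
```

Asset 2 never mentions the word "Cisco" and asset 3 is about a different company, yet both are returned; a keyword-only index would have found asset 1 alone.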

Page 45: Content Management, Metadata and Semantic Web

Metadata and Semantic Technology enabled Applications

Page 46: Content Management, Metadata and Semantic Web


Taalee’s Semantic Search

Highly customizable, precise and freshest A/V search

Context and Domain Specific Attributes Uniform Metadata for Content from Multiple Sources, Can be sorted by any field

Delightful, relevant information, exceptional targeting opportunity

Page 47: Content Management, Metadata and Semantic Web

Creating a Web of related information

What can a context do?

Page 48: Content Management, Metadata and Semantic Web


Example (test on http://directory.mediaanywhere.com)

Search for company ‘Commerce One’

Links to news on companies that compete against Commerce One

Links to news on companies Commerce One competes against (to view news on Ariba, click on the link for Ariba)

Crucial news on Commerce One’s competitors (Ariba) can be accessed easily and automatically

Page 49: Content Management, Metadata and Semantic Web

What else can a context do? (a commercial perspective)

Semantic Enrichment

Semantic Targeting

Page 50: Content Management, Metadata and Semantic Web


Semantic/Interactive Targeting

Buy Al Pacino Videos
Buy Russell Crowe Videos
Buy Christopher Plummer Videos
Buy Diane Venora Videos
Buy Philip Baker Hall Videos
Buy The Insider Video

Precisely targeted through the use of Structured Metadata and integration from multiple sources
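Targeted offers of this kind follow directly from structured metadata: once an asset is known to be about a film, its title and cast fields drive the offers. The sketch below reproduces the offers listed on this slide from a metadata record; the record layout and the `targeted_offers` function are hypothetical.

```python
def targeted_offers(asset: dict) -> list:
    """Build one offer per cast member, plus one for the title itself."""
    offers = [f"Buy {person} Videos" for person in asset["cast"]]
    offers.append(f"Buy {asset['title']} Video")
    return offers


# Structured metadata for the asset, with values as listed on the slide.
the_insider = {
    "title": "The Insider",
    "cast": [
        "Al Pacino",
        "Russell Crowe",
        "Christopher Plummer",
        "Diane Venora",
        "Philip Baker Hall",
    ],
}
offers = targeted_offers(the_insider)
```

None of these offers could be generated reliably from keywords in a page; they depend on the cast being available as a structured field.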

Page 51: Content Management, Metadata and Semantic Web


Example 1 – Snapshots (“Jamal Anderson”)

1. Search for ‘Jamal Anderson’ in ‘Football’
2. Click on first result for Jamal Anderson
3. View metadata. Note that Team name and League name are also included in the metadata
4. View the original source HTML page. Verify that the source page contains no mention of Team name and League name. They were Taalee’s value-additions to the metadata to facilitate easier search.

Page 52: Content Management, Metadata and Semantic Web


Example 2 – Snapshots (“Gary Sheffield”)

1. Search for ‘Gary Sheffield’ in ‘Baseball’
2. Click on first result for Gary Sheffield
3. View metadata. Note that Team name and League name are also included in the metadata
4. View the original source HTML page. Verify that the source page contains no mention of Team name and League name. They were Taalee’s value-additions to the metadata to facilitate easier search.

Page 53: Content Management, Metadata and Semantic Web


Semantic Web – Intelligent Content (supported by Taalee Semantic Engine)

Diagram nodes around COMPANY: Related Stock News, Industry News, Technology Products, Competition, Regulations (SEC, EPA)

Association labels:
   COMPANIES in Same or Related INDUSTRY
   COMPANIES in INDUSTRY with Competing PRODUCTS
   Impacting INDUSTRY or Filed By COMPANY
   Important to INDUSTRY or COMPANY

Intelligent Content = What You Asked for + What you need to know!

Page 54: Content Management, Metadata and Semantic Web


Semantic Application – Equity Dashboard
   Focused relevant content organized by topic (semantic categorization)
   Automatic content aggregation from multiple content providers and feeds
   Related news not specifically asked for (Semantic Associations)
   Competitive research inferred automatically
   Automatic 3rd party content integration

Page 55: Content Management, Metadata and Semantic Web


Metadata centric Content Management Architecture

Diagram elements: Internal Source 1 (Research), Internal Source 2, External feeds/Web (e.g. Reuters), Extractor Agents 1 to 3, Voquette Metabase, World Model, Semantic Engine, Third-party Content Mgmt and Syndication, Semantic Application (ASP/Enterprise hosted). Metadata is exchanged as XCM-compliant metadata, XML or other format.

Example flow (Story on Cisco, Story on Lucent):
1. Cisco story from Source 1 passed on to add semantic associations
2. Consults Knowledge Base for Cisco’s competition
3. Returns result: Lucent is a competitor of Cisco
4. Lucent story from external feeds picked for publishing as “semantically related” to Cisco story, passed on to Dashboard

Page 56: Content Management, Metadata and Semantic Web


Wireless Application of Semantic Metadata and Automatic Content Enrichment

Wireless menu screens (example):
   MyMedia: MyStocks, News, Sports, Music
   My Stocks: CSCO, NT, IBM, Market
   CSCO: Analyst Call, Conf Call, Earnings
   CSCO Analysis: 11/08 ON24 Payne; 11/07 ON24 H&Q; 11/06 CBS Langlesis
   Icon: $

Clicking on the link for Cisco Analyst Calls displays a listing sorted by date. Semantic filtering uses just the right metadata to meet screen and other constraints. E.g., Analyst Call focuses on the source and analyst name or company. The icon denotes additional metadata, such as “Strong Buy” by H&Q Analyst.
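Semantic filtering for a small screen can be sketched as choosing, per content type, which metadata fields to show and in what order. The listing values below are the ones shown for CSCO analyst calls; the field names, the `FIELDS_BY_TYPE` table, the 20-character limit, and the `render_listing` function are assumptions for illustration.

```python
# Which metadata fields matter for each content type (assumed configuration).
FIELDS_BY_TYPE = {"Analyst Call": ["date", "source", "analyst"]}


def render_listing(items, content_type, max_chars=20):
    """Render one short line per item, newest first, using only the chosen fields."""
    fields = FIELDS_BY_TYPE[content_type]
    # Dates are MM/DD strings from the same year, so string order is date order.
    newest_first = sorted(items, key=lambda item: item["date"], reverse=True)
    return [" ".join(item[f] for f in fields)[:max_chars] for item in newest_first]


analyst_calls = [
    {"date": "11/06", "source": "CBS", "analyst": "Langlesis"},
    {"date": "11/08", "source": "ON24", "analyst": "Payne"},
    {"date": "11/07", "source": "ON24", "analyst": "H&Q", "rating": "Strong Buy"},
]
screen_lines = render_listing(analyst_calls, "Analyst Call")
```

The `rating` field is carried in the metadata but left off the line; on the slide it is surfaced through an icon instead, which is one way to respect the screen constraint without discarding the information.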

Page 57: Content Management, Metadata and Semantic Web


Metadata’s role in emerging iTV infrastructure

Diagram elements: Video, MPEG Encoder, Create Scene Description Tree, Scene Description Tree (Node = AVO Object), Voquette/Taalee Semantic Engine, Metadata-rich Value-added Node, Enhanced XML Description, MPEG-2/4/7, Enhanced Digital Cable, MPEG Decoder, Retrieve Scene Description Track, GREAT USER EXPERIENCE

Object Content Information (OCI) for the “NSF Playoff” node:
   Produced by: Fox Sports
   Creation Date: 12/05/2000
   League: NFL
   Teams: Seattle Seahawks, Atlanta Falcons
   Players: John Kitna
   Coaches: Mike Holmgren, Dan Reeves
   Location: Atlanta

Channel sales through Video Server Vendors, Video App Servers, and Broadcasters

License metadata decoder and semantic applications to device makers

Page 58: Content Management, Metadata and Semantic Web


Metadata for Automatic Content Enrichment

Interactive Television

This segment has embedded or referenced metadata that is used by personalization application to show only the stocks that user is interested in.

This screen is customizable with interactivity feature using metadata such as whether there is a new Conference Call video on CSCO.

Part of the screen can be automatically customized to show conference call specific information, including transcript, participation, etc., all of which are relevant metadata

Conference Call itself can have embedded metadata to support personalization and interactivity.

Page 59: Content Management, Metadata and Semantic Web


Semantic Technology Features

Unstructured Text Content
Semi-Structured Content
Structured Content
Audio/Video Content with associated text (transcript, journalist notes)
Create a Customized "World Model" (Taxonomy Tree with customized domain attributes)
Automatically homogenize content feed tags
Automatically categorize unstructured text
Automatically create tags based on text itself
Create and maintain a Customized Knowledge Base for any domain
Automatically enhance content tags based on information beyond text
Build contextually relevant custom research applications
Contextual Search (an order of magnitude better than keyword-based search)
Support push or pull delivery/ingestion of content
Personalization/Alerts/Notifications
Real Time Indexing (stories indexed for search/personalization within a minute)
Provide the user with relevant information not explicitly asked for (Semantic Associations)

Page 60: Content Management, Metadata and Semantic Web

Along with the evolution of metadata and semantic technologies enabling the next generation of the Web, Content Management has entered the next generation of Enhanced Content Management.

Page 61: Content Management, Metadata and Semantic Web

Resources/References

RDF: www.w3.org/TR/REC-rdf-syntax/
ICE: www.icestandard.org
Meta Object Facility (MOF) Specification, Version 1.3, September 27, 1999: http://cgi.omg.org/cgi-bin/doc?ad/99-09-05
XML Metadata Interchange (XMI) Specification, Version 1.1, October 25, 1999: http://cgi.omg.org/cgi-bin/doc?ad/9910-02 http://cgi.omg.org/cgi-bin/doc?ad/99-10-03
DAML: www.daml.org
NEWSML: newsshowcase.reuters.com
PRISM: www.prismstandard.org/techdev/prismspec1.asp
RIXML: www.rixml.org
XCM: www.vignette.com
OIL: www.ontoknowledge.org/oil
SEMANTICWEB: www.semanticweb.org, business.semanticweb.org
VOICEXML: www.voicexml.org
MPEG7: www.darmstadt.gmd.de/mobile/MPEG7/
Taalee: www.taalee.com
Applied Semantics: www.appliedsemantics.com
Ontoprise: www.ontoprise.com

Page 62: Content Management, Metadata and Semantic Web

Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media, Amit Sheth & Wolfgang Klas, Eds., McGraw Hill, ISBN: 0-07-057735-8, 1998.

Information Brokering, Vipul Kashyap & Amit Sheth, Kluwer Academic Publishers, 2001.

Voquette Semantic Technology White Paper.

Mysteries of Metadata, Speaker – Amit Sheth, Workshop at Content World 2001.

Infoquilt Project, LSDIS lab.

http://www.taalee.com http://lsdis.cs.uga.edu/~amit