semantic integration patterns

Patterns of Semantic Integration, Dan McCreary, President, Dan McCreary & Associates, [email protected], (952) 931-9198, Metadata Solutions

Upload: dan-mccreary-associates

Post on 10-May-2015


DESCRIPTION

How software developers need to manage metadata and data dictionaries to make software integration faster and more cost effective. This presentation is a general overview of the concepts around data semantics for college-level students. This presentation was originally created for a seminar at Carleton College.

TRANSCRIPT

Page 1: Semantic Integration Patterns

Patterns of Semantic Integration

Dan McCreary
President, Dan McCreary & Associates
[email protected]
(952) 931-9198

Metadata Solutions

Page 2: Semantic Integration Patterns

Licensed Under Creative Commons 3.0

Creative Commons 3.0

• Attribution. You must attribute the work in the manner specified by the author or licensor.

• Noncommercial. You may not use this work for commercial purposes.

• Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.


Page 3: Semantic Integration Patterns

Patterns of Semantic Integration

Our ever-increasing understanding of solid-state physics has allowed Moore’s Law to proceed unabated for the last 40 years. Exciting developments in quantum physics, nanotechnology and molecular self-assembly will continue this trend for the foreseeable future. But why is it that an instructor can’t quickly import a database of 10,000 subject-appropriate lesson plans and quiz items into their learning-management system and dynamically adjust classroom content and assessments to individual student learning styles and interests? The key to this and other computer-to-computer interoperability challenges lies in the difficulty computer systems have in finding and precisely exchanging data. Enter the Semantic Web. The designers of the current World Wide Web realized that the gateway to this does not require faster computers and networks but instead lies in the careful publishing and exchange of data semantics (or meaning) and the precise publishing of data that describes data (metadata) in a machine-readable structure. This presentation will review patterns that researchers around the world are using to make the job of computer integration easier, allowing even ultimate frisbee™ coaches access to vast amounts of structured information.

Page 4: Semantic Integration Patterns

Background for Dan McCreary
• Carleton Class of ’82
• Physics Major
• First year of “Computer Science Concentrations” ever granted to a Carleton graduate
• Worked in computer center and Carleton Library with Les Lacroix doing VMS/RMS programming to create first on-line card catalog for science library
• Helped blow up lab equipment for Bruce Thomas
• Semantic Solutions Consultant in Minneapolis

Page 5: Semantic Integration Patterns


Page 6: Semantic Integration Patterns

Physics 123
• … intended to give students some perspective on the kinds of work done by people with a physics background … discuss their work and work-related experiences
• Physics taught me how to create and use precise models of the world and to discover underlying patterns
• Computer-to-computer communication also requires precise models and the discovery of underlying patterns

Page 7: Semantic Integration Patterns

Agenda
• The steps required for precise exchange of information between computer systems
• Define “semantics” and key concepts in the semantic web
– HTML, XML, RDF
• Discuss limitations of current HTML web and XML
• Show how Semantic Web technologies attempt to solve many of these problems
• Semantic patterns
• Predictions
• References

Page 8: Semantic Integration Patterns

Bruce’s Integration Challenge

[Diagram components: Gamma Ray Spectrometer; 1024 Channel Accumulator; The PDP-8; 8-bit teletype port; Ohio Scientific 6502; RS-232 port; Carleton VAX; FFT (Fortran); Tektronix 4014 Terminal]

Uranium samples from Columbia mines

Page 9: Semantic Integration Patterns

1970 Sci-Fi Classic: “The Forbin Project”

A New Intersystem Language!

Lesson: Before you take over the world you must exchange semantically precise metadata!

Page 10: Semantic Integration Patterns

Moore’s Law

Creative Commons 1.0 Courtesy of Ray Kurzweil and Kurzweil Technologies, Inc

Note: Log Scale

Page 11: Semantic Integration Patterns

Thesis: We Need Semantics
• For the next revolution in computing
– We don’t need faster CPUs
– We don’t need larger hard drives
– We don’t need faster networks
– We don’t need more HTML linking
• We need to link our concepts using semantic technologies
• There are standard patterns that are used to solve these problems

Page 12: Semantic Integration Patterns

Patterns

• “Design Patterns” were developed by Christopher Alexander in 1979 in the building architecture domain
• Applied by “Gang of Four” to object-oriented software in 1994
• Each pattern has:
– Name, Icon
– Problem Description
– Solution Description
– Diagrams
– Examples
– Related Patterns

Page 13: Semantic Integration Patterns

The Agent Vision

The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.

The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities
By Tim Berners-Lee, James Hendler and Ora Lassila

Page 14: Semantic Integration Patterns

Overlapping Terminology

[Diagram of overlapping terms: Data Warehouse; Data Mining; Enterprise Application Integration (EAI); Metadata Discovery; Statistical Analysis; Pattern Discovery; Relational Database Metadata; Semantic Web; Business Semantics; Data Dictionary; HTML Web]

Page 15: Semantic Integration Patterns

Computer Science Is About Abstraction

[Chart: level of abstraction plotted against time]
• Machine Language: 10100101
• Assembly Language: MOV R0, A1 / BNE F32C
• FORTRAN: DO I=1, 100 / I=I+1
• Structured Programming: Proc(i1, i2, o1)
• Object-oriented Programming
• XML
• GUI

Page 16: Semantic Integration Patterns

Person to Person Dialog

[Diagram: matching layers on each side, listed toward higher abstraction: Sound; Words; Concepts; Sentences; Conversation; Problem Solving]

Page 17: Semantic Integration Patterns

Computer to Computer Dialog

[Diagram: matching layers on each side, listed toward higher abstraction: Internet; XML Tags; Documents/XML Schema; Graphs/Ontologies/RDF/OWL; Semantic Integration; Agents. A “You Are Here” marker appears on the diagram.]

Page 18: Semantic Integration Patterns

Semantic Triangle

[Diagram: triangle with corners Symbol, Concept and Referent; edges labeled Symbolizes, Refers To and Stands For]
• Symbol: “cat”, “katze” (German), “gato” (Spanish)
• Concept: A pattern of neural activity in our brain
• Referent: Physical Objects

Ogden, C. K., & Richards, I. A. (1923) The Meaning of Meaning

Page 19: Semantic Integration Patterns

Symbols Can Only Directly Link to Concepts

Ogden, C. K., & Richards, I. A. (1923) The Meaning of Meaning

[Diagram: Symbol (“cat”), Concept, Referent]

• The link between a symbol and its referent is an INDIRECT link
• The link to the referent MUST pass through the Concept
• Only symbols can be transmitted between computers

Page 20: Semantic Integration Patterns


The Problem of Semantic Ambiguity

Did you say you were looking for mixed nuts?

context=food context=hardware

People use context to derive the correct meaning.
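The "mixed nuts" example can be sketched in a few lines of Python. This is only an illustration: the `SENSES` table, its glosses and the `resolve` helper are assumptions made for this note, not part of the original slides.

```python
# Hypothetical sense table keyed by (symbol, context); the glosses are illustrative.
SENSES = {
    ("nuts", "food"): "edible kernels such as almonds and cashews",
    ("nuts", "hardware"): "threaded fasteners that screw onto bolts",
}


def resolve(symbol, context):
    """Return the meaning of a symbol in a given context, or None if unknown."""
    return SENSES.get((symbol.lower(), context.lower()))


print(resolve("nuts", "food"))
print(resolve("nuts", "hardware"))
```

The same symbol yields a different meaning once the context is supplied, which is the part a computer does not get for free.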

Page 21: Semantic Integration Patterns

59 meanings of "run"

[Diagram: "run" splits into 18 noun "senses" and 41 verb "senses"; Context selects among them]

Sense labels: tally, test, footrace, streak, play, move fast, scat, go, operate, has form

Example usages:
• "the kids ran to the store"
• "the Yankees scored a run in the bottom of the 9th"
• "The experiment ran for over an hour"
• "her run of luck was just starting"
• "she broke the mile run record"
• "the football 3rd down play was a run"
• "13 other noun meanings…"
• "I would run from a ticking bomb."
• "The path runs up the hill."
• "you need training to run this machine."
• "the movie plot runs like this."
• "36 other verb meanings…"

Source: WordNet at http://wordnet.princeton.edu/

Page 22: Semantic Integration Patterns


Analogy: English Dictionary

Term

Metadata (data about data)

Definitions

source: www.m-w.com

Note: people use context to find the correct meaning.

Page 23: Semantic Integration Patterns

Word Senses

[Diagram: the word “run” linked to the senses tally, test, footrace, streak, play, move fast, scat, go, operate, has form, duration]

A single word maps to many concepts

Page 24: Semantic Integration Patterns

Synonym Ring

<Person>Joe Smith</Person>
<Individual>Joe Smith</Individual>
<Human>Joe Smith</Human>

Joe Smith

Many symbols for the same object

Refers To

Symbolizes

Stands For

Page 25: Semantic Integration Patterns

I’m Thinking of an Animal…

• It has four legs
• It has fur
• It has whiskers
• It chases mice
• It goes “meow”

If you describe enough of the properties of a concept, you can have reasonable assurance that two descriptions refer to the same concept

Note: since “concepts” are neural patterns in the brain, the concept of “exact” is difficult to measure

Page 26: Semantic Integration Patterns

Concept Linking

Question: How can you tell if two concepts are the same if two systems don’t share the same symbol?

Answer: If they have the same properties (and relationships) you can assume with reasonable probability they are the same concepts
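One way to make "same properties, reasonable probability" concrete is to score the overlap between two property sets. The sketch below uses Jaccard similarity, which is a swapped-in technique and not something the slides name; the property sets and the 0.8 cutoff are assumptions chosen for illustration.

```python
def property_overlap(a, b):
    """Jaccard similarity: shared properties divided by all properties."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical property sets from two systems that use different symbols.
system_a_cat = {"has four legs", "has fur", "has whiskers", "chases mice", "goes meow"}
system_b_katze = {"has four legs", "has fur", "has whiskers", "chases mice", "goes meow"}
system_b_dog = {"has four legs", "has fur", "chases cars", "goes woof"}

SAME_CONCEPT_THRESHOLD = 0.8  # assumed cutoff, not from the slides


def probably_same(a, b, threshold=SAME_CONCEPT_THRESHOLD):
    """Treat two concepts as the same when their properties overlap enough."""
    return property_overlap(a, b) >= threshold


print(probably_same(system_a_cat, system_b_katze))
print(probably_same(system_a_cat, system_b_dog))
```

A threshold keeps the answer probabilistic, which matches the caveat that "exact" is difficult to measure.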

Page 27: Semantic Integration Patterns

Concept Overlap

[Diagram: overlapping concepts Cat, Robo-Cat and Kitten]

Page 28: Semantic Integration Patterns

Semantics is About Concept Linking
• Wouldn’t it be nice…
– If computers could name things internally or on a web site however they liked (keep using the current web)
– But we could always link those names back to a centralized database of concepts
– Computers could do this automatically just like they translate domain names (www.google.com) into IP addresses (64.233.187.99)
– Then we could communicate precisely without dictating the names that are used inside a computer system or on a web page

Page 29: Semantic Integration Patterns

HTML Sample

<title>The Problem of Semantics</title>
<p>This is a standard document that is sent between two computers using the <a href="http://w3c.org/Protocols">HTTP</a> protocol. Note that other than the markup tags like <b>bold</b> there is very little that a computer can do to understand the meaning of the text.</p>

Unless computers "understand" the words in the English language it will be very difficult for them to understand the meaning or semantics of the web.

Page 30: Semantic Integration Patterns

What Computers "See" Today

<title>The Problem of Semantics</title>
<p>This is a standard document that is sent between two computers using the <a href="http://w3c.org">HTTP</a> protocol. Note that other than the markup tags like <b>bold</b> there is very little that a computer can do to understand the meaning of the text.</p>

• Today computers see the web as linked opaque strings with keywords

• Unless computers "understand" the words in the English language it will be very difficult for them to understand the meaning or semantics of the web

Page 31: Semantic Integration Patterns

XML allows you to create new “tags”

<PersonGivenName>Joe</PersonGivenName>
<PersonFamilyName>Smith</PersonFamilyName>
<Address>123 Main Street</Address>
<City>Anytown</City>
<State>Minnesota</State>
<Phone>(651) 555-1234</Phone>

Without a data dictionary, it is difficult to know what the meaning of the data elements is. The tags appear in patterns but what they "mean" is still a mystery to a computer.

[Diagram: <tag>data</tag>]

Page 32: Semantic Integration Patterns

Which external computers may not understand

<PersonGivenName>Dan</PersonGivenName>
<PersonFamilyName>McCreary</PersonFamilyName>
<Address>123 Main Street</Address>
<City>Minneapolis</City>
<Phone>(651) 555-1234</Phone>

Without a “data dictionary”, it is difficult to know what the meaning of the data elements is. The tags appear in patterns but what they mean is still a mystery to a computer.
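A short Python sketch shows the gap: a parser can list every tag, but a tag only acquires a definition when a data dictionary supplies one. The `<Person>` wrapper element, the `DATA_DICTIONARY` entries and the `describe` helper are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# The <Person> wrapper is an assumption added so the fragment parses as one document.
RECORD = """<Person>
  <PersonGivenName>Dan</PersonGivenName>
  <PersonFamilyName>McCreary</PersonFamilyName>
  <City>Minneapolis</City>
</Person>"""

# Hypothetical data dictionary; the definitions are illustrative only.
DATA_DICTIONARY = {
    "PersonGivenName": "The first (given) name of a person",
    "PersonFamilyName": "The last (family) name of a person",
}


def describe(xml_text, dictionary):
    """Pair each element with its dictionary definition, or None if undefined."""
    return [(el.tag, el.text, dictionary.get(el.tag)) for el in ET.fromstring(xml_text)]


for tag, value, definition in describe(RECORD, DATA_DICTIONARY):
    print(tag, value, definition)
```

Here `City` parses without error yet comes back with no definition, which is the "mystery to a computer" case.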

Page 33: Semantic Integration Patterns

Metadata & Ontologies
• Metadata is any data that describes other data
• Metadata is itself data and is stored in specialized structures (directed graphs) to aid comparison with other metadata
• A controlled store of metadata is called a “registry”
• Complex directed graphs can evolve into “ontologies”

[Diagram: Metadata describes Data. Labels: RDBMS, tables, columns, document keywords, web navigation, source-code, org-chart, product-specs]

Page 34: Semantic Integration Patterns

Hypertext Links and Data Element Links

The Semantic Web
[Diagram: Metadata Registry A linked to Metadata Registry B]
The semantic web is about linking conceptual data elements in published metadata registries

The Hypertext Web
The current HTML web is focused on linking published documents with HTML

Page 35: Semantic Integration Patterns


Enter the URI…

• Today's web allows documents to be accessed by people if people put links in between documents – the hypertext web

• But it is very difficult for machines to "understand" what we are saying and what we mean and what to do with the data

• But machines CAN determine if two URIs match:

<SurName>Smith</SurName> <LastName>Smith</LastName>

http://www.shared_dictionary.com/PersonGivenName

[Diagram: both tags resolve to the same URI in a metadata registry (MDR)]

Hey, you both “mean” the same thing!
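The URI comparison itself is plain string equality, as this minimal sketch suggests. The shared URI is copied from the slide; the two mapping tables, the `Town` entry with its URI, and the `same_meaning` helper are assumptions for illustration.

```python
SHARED_URI = "http://www.shared_dictionary.com/PersonGivenName"  # as shown on the slide

# Each system publishes a mapping from its local tag names to registry URIs.
SYSTEM_A = {"SurName": SHARED_URI}
SYSTEM_B = {
    "LastName": SHARED_URI,
    "Town": "http://www.shared_dictionary.com/City",  # assumed extra entry
}


def same_meaning(tag_a, tag_b, map_a=SYSTEM_A, map_b=SYSTEM_B):
    """Two local tags mean the same thing when both map to the same URI."""
    uri_a, uri_b = map_a.get(tag_a), map_b.get(tag_b)
    return uri_a is not None and uri_a == uri_b


print(same_meaning("SurName", "LastName"))
print(same_meaning("SurName", "Town"))
```

Neither system had to rename anything; each only had to publish where its local name points.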

Page 36: Semantic Integration Patterns

Subject-Verb-Object Triple

[Diagram: Person (subject), Has-a-Given-Name (verb), “Joe” (object)]

The person is named “Joe”.

<PersonGivenName>Joe</PersonGivenName>

Page 37: Semantic Integration Patterns

Triples are Almost all URIs

[Diagram: subject http://MyDictionay/DataElement/Person; link http://MyDictionay/DataElement/PersonGivenName; object “Dan”]

URIs can point to a standard location in a metadata registry.

The “type” of link.
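As a rough sketch, a triple can be modeled as a three-field tuple in Python. The URIs are copied from the slide, spelling included; the `Triple` type, the second statement and the helper functions are assumptions for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


BASE = "http://MyDictionay/DataElement/"  # spelling copied from the slide

triples = [
    Triple(BASE + "Person", BASE + "PersonGivenName", "Dan"),
    Triple(BASE + "Person", BASE + "PersonFamilyName", "McCreary"),  # assumed second statement
]


def is_uri(value):
    """Crude check that separates URIs from literal values such as "Dan"."""
    return value.startswith(("http://", "https://"))


def objects(store, subject, predicate):
    """Return every object linked to a subject by the given predicate."""
    return [t.obj for t in store if t.subject == subject and t.predicate == predicate]


print(objects(triples, BASE + "Person", BASE + "PersonGivenName"))
print([is_uri(part) for part in triples[0]])
```

The subject and the link type are URIs while the object here is a literal, hence "almost all URIs".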

Page 38: Semantic Integration Patterns

Sample RDF Document

<?xml version="1.0"?>
<RDF>
  <Description about="http://www.danmccreary.com/Training/Classes/Semantic_Web">
    <author>Dan McCreary</author>
    <created>2006-01-01</created>
    <modified>2006-03-15</modified>
  </Description>
</RDF>
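The sample above is simplified (it omits the namespace declarations a full RDF/XML file would carry), but it is still well-formed XML, so the standard library can flatten it into triples. The `to_triples` helper below is a hypothetical sketch for this note, not a general RDF parser.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<RDF>
  <Description about="http://www.danmccreary.com/Training/Classes/Semantic_Web">
    <author>Dan McCreary</author>
    <created>2006-01-01</created>
    <modified>2006-03-15</modified>
  </Description>
</RDF>"""


def to_triples(xml_text):
    """Turn each child of a Description into a (subject, predicate, object) tuple."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall("Description"):
        subject = description.get("about")  # the resource being described
        for prop in description:
            triples.append((subject, prop.tag, (prop.text or "").strip()))
    return triples


for triple in to_triples(SAMPLE):
    print(triple)
```

Each property element becomes one statement about the resource named in the `about` attribute.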

Page 39: Semantic Integration Patterns

Massive Databases of "Triple Stores"

[Diagram: RDF "Triple Store" table with columns Subject, Predicate, Object]

Triple store is:
- A database with just 3 columns
- but millions/billions of rows
May require specialized hardware
Key Metrics:
- Time to load triples into application
- Time to save triples into database
- Time to browse to an element
- Time to configure system
Sample Projects:
• Kowari
• 3Store
• Sesame

See: http://simile.mit.edu/reports/stores/
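The "just 3 columns" idea can be tried with an in-memory SQLite table. This is a toy sketch, not how Kowari, 3Store or Sesame work internally; the table name, helper functions and sample rows are assumptions.

```python
import sqlite3


def make_store():
    """Create an in-memory triple store: one table, three columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    return conn


def add(conn, subject, predicate, obj):
    conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (subject, predicate, obj))


def match(conn, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("subject", subject), ("predicate", predicate), ("object", obj)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    query = "SELECT subject, predicate, object FROM triples" + where
    return conn.execute(query, params).fetchall()


store = make_store()
add(store, "person:1", "givenName", "Dan")        # illustrative rows
add(store, "person:1", "familyName", "McCreary")
add(store, "person:2", "givenName", "Joe")

print(match(store, predicate="givenName"))
print(match(store, subject="person:1"))
```

Every query is a pattern over the same three columns, which is why load and browse times dominate the metrics listed above.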

Page 40: Semantic Integration Patterns

Semantic Web Standards Stack

[Diagram: layered stack, listed in source order]
• URI/IRI, Unicode
• XML, Namespaces
• XML Query, XML Schema
• RDF Model & Syntax
• Ontology (OWL)
• Rules/Query
• Logic
• Proof
• Trusted Semantic Web
Vertical labels: Signature, Encryption

Source: Tim Berners-Lee www.w3c.org

http://www.w3.org/Consortium/Offices/Presentations/SemanticWeb/34.html

Page 41: Semantic Integration Patterns


Example of Metadata Registry

Page 42: Semantic Integration Patterns

Hub and Spokes
• Goal: create semantic maps to a few metadata standards, not many standards

[Diagram, first panel: registries R1, R2, R3, … RN mapped directly to one another]
Mapping each metadata registry to N other metadata registries: The O(N²) problem

[Diagram, second panel: registries R1, R2, R3, … RN each mapped to a central ESB]
Mapping to one metadata registry: The O(N) problem (ESB: Enterprise Service Bus)
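The difference in growth is easy to check numerically. The sketch below counts one mapping per pair of registries for the point-to-point case and one mapping per registry for the hub case; counting mappings as undirected is an assumption (directed mappings would double both figures without changing the comparison).

```python
def point_to_point_mappings(n):
    """Every registry mapped to every other registry: n * (n - 1) / 2 pairs."""
    return n * (n - 1) // 2


def hub_mappings(n):
    """Every registry mapped once to a shared hub."""
    return n


for n in (3, 7, 50):
    print(n, point_to_point_mappings(n), hub_mappings(n))
```

With 7 registries the pairwise approach needs 21 mappings against 7 for the hub, and the gap widens quadratically from there.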

Page 43: Semantic Integration Patterns

Metaphor: The Translator Agent

[Diagram: Customer (Spanish Only), Translation Service (Speaks Spanish and English), Internal Server (English Only)]
• Customer: “Me gustaría una cerveza”
• Translation Service: “May I have a beer?”
• Internal Server: “Coming right up!”

Page 44: Semantic Integration Patterns

Semantic Mappers and Semantic Brokers

[Diagram labels: Report Request in Model A; Metadata Translation Service; XML Response in Model A; TDS in Model B; Metadata Registry (Model A, Model B, Metadata Mappings); RDF Queries; XML Results; Data Warehouse (RDBMS); SQL or XMLA Queries in Model B]

Gartner: Vocabulary-based transformation

XMLA: XML for Analysis
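A vocabulary-based transformation can be sketched as a lookup through shared concept URIs: a field name in Model A resolves to a URI, and the URI resolves to the matching name in Model B. The registry URL, both models and the `translate` helper are hypothetical and stand in for whatever a real metadata registry would publish.

```python
REGISTRY = "http://registry.example/DataElement/"  # hypothetical registry

# Each model maps its own field names to shared concept URIs.
MODEL_A = {
    "FirstName": REGISTRY + "PersonGivenName",
    "SurName": REGISTRY + "PersonFamilyName",
}
MODEL_B = {
    "GivenName": REGISTRY + "PersonGivenName",
    "LastName": REGISTRY + "PersonFamilyName",
}


def translate(record, source_model, target_model):
    """Rename fields from the source model to the target model via shared URIs."""
    uri_to_target = {uri: name for name, uri in target_model.items()}
    translated = {}
    for name, value in record.items():
        target = uri_to_target.get(source_model.get(name))
        if target is not None:  # fields without a shared concept are dropped
            translated[target] = value
    return translated


request_in_a = {"FirstName": "Joe", "SurName": "Smith", "ShoeSize": "9"}
print(translate(request_in_a, MODEL_A, MODEL_B))
```

Dropping unmapped fields is one possible policy; a broker could equally flag them for a human to map.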

Page 45: Semantic Integration Patterns

Wikipedia Rocks!
• Knowledge is growing at an exponential rate
• The more there is out there, the more need there is to re-use rather than reinvent knowledge
• Tools can extract 50M RDF triples
• How many instructors share their database of exam questions and the effectiveness of each question?

See: Wikipedia: “Semantic Wiki”

Page 46: Semantic Integration Patterns


Open Source Learning Mgmt. System

Page 47: Semantic Integration Patterns

Retrieving Data: An Evolution

• Shorten the time-to-report interval
• Allow users to "browse" data sets interactively
• Remove programmers with "backlogs" of reports
• Users frequently waited days, weeks or months to get a custom report created

[Diagram: Increasing Responsiveness, from Monthly “Green Bar” Reports to a Browseable Graphical Interface (PivotTables, Cognos)]

Page 48: Semantic Integration Patterns


Metadata Discovery

• Tools that “scan” data sources and create new ontologies or mappings to existing ontologies

Metadata Registry

Data Source Mappings

Relational Database
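A very small version of such a scan can be written against SQLite, whose catalog lists tables and whose `PRAGMA table_info` lists columns. The sample table, the `KNOWN_CONCEPTS` lookup and the registry URIs are assumptions; real discovery tools do far more than match column names.

```python
import sqlite3


def discover(conn):
    """Scan a SQLite database and return {table: [(column, declared_type), ...]}."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]  # (name, type)
    return schema


# Hypothetical lookup from column names to registry concepts.
KNOWN_CONCEPTS = {
    "given_name": "http://registry.example/DataElement/PersonGivenName",
    "family_name": "http://registry.example/DataElement/PersonFamilyName",
}


def propose_mappings(schema, concepts=KNOWN_CONCEPTS):
    """Suggest a registry URI for every column whose name is already known."""
    return {
        (table, column): concepts[column]
        for table, columns in schema.items()
        for column, _type in columns
        if column in concepts
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (given_name TEXT, family_name TEXT, shoe_size INTEGER)")
print(discover(conn))
print(propose_mappings(discover(conn)))
```

Columns with no match, such as `shoe_size` here, are the ones that would need a new entry in the registry.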

Page 49: Semantic Integration Patterns

Classification and Categorization

• Whenever we break the continuous observable world into a predefined list of categories, each with its own label, we call each label a categorical value. These will then become the "dimensions" of our cube.
• Discrete breaks in continuous values become “rules”

[Diagram: categories "red", "green", "blue". Note: NO OVERLAP!]

[Diagram: expense scale with breaks at $0 and $500, labeled “normal expense” and “large expense” (requires supervisor approval)]

George Lakoff: Women, Fire, and Dangerous Things: What Categories Reveal about the Mind
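The expense example amounts to a one-line rule. In this sketch the break sits at $500 as on the slide, but treating exactly $500 as a large expense is an assumption, since the slide does not say which side the boundary belongs to.

```python
APPROVAL_THRESHOLD = 500  # dollars; the break shown on the slide


def classify_expense(amount):
    """Map a continuous dollar amount onto one of two non-overlapping categories."""
    if amount < 0:
        raise ValueError("expense amounts start at $0")
    # Assumption: an expense of exactly $500 counts as large.
    return "normal expense" if amount < APPROVAL_THRESHOLD else "large expense"


def requires_supervisor_approval(amount):
    return classify_expense(amount) == "large expense"


for amount in (0, 499.99, 500, 1200):
    print(amount, classify_expense(amount), requires_supervisor_approval(amount))
```

Because every amount falls into exactly one category, the categories do not overlap.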

Page 50: Semantic Integration Patterns


Federated Ontologies

What do you do when you have more than one Ontology?

1) Combine

2) Map

3) Federate

⇒ Tools for combination and federation
⇒ “Linking is Power”

Multiple Overlapping Ontologies

Page 51: Semantic Integration Patterns

Cost of Poor Semantics
• Information Technology Departments can spend 40-60% of their costs on Integration
• 90% of integration costs are due to poor semantics
• If every application used and "published" a machine readable ontology with mappings to published ontologies integration could be almost "automatic"

Page 52: Semantic Integration Patterns

Gartner

Metadata cast into formal logics will drive interoperability, automation, cost cutting, better search capabilities and new business opportunities.

Semantic Web Drives Data Management, Automation and Knowledge and Discovery
Alexander Linder, March 2005, G00125145

Page 53: Semantic Integration Patterns

Semantic Spectrum

[Chart: semantic precision, from Weak Semantics to Strong Semantics (High Semantic Precision), plotted against Time/Money]
Labels on the chart: UML, XMI; Taxonomies; Ontologies; Thesaurus; RDF; XML, XSLT; Glossaries; OWL; Controlled Vocabularies; Word/HTML; Concept Maps; Enterprise Data Models

See also: Wikipedia/semantic spectrum

Page 54: Semantic Integration Patterns

Structures for Increased Semantics

HTML PDF Word PowerPoint Excel Access Server XML RDBMS RDF Taxonomies Ontologies SOA WSDL

Increased Semantic Precision

Source: Network Inference

Page 55: Semantic Integration Patterns

Friend of a Friend
• A "Proof of Concept for RDF"
• Requires each person to put an RDF file on their web pages
• System in place to prevent spammers from getting e-mail accounts
• Sample RDF vocabulary
• Sample FoaF file:

<foaf:Person>
  <foaf:name>Dan McCreary</foaf:name>
  <foaf:knows>
    <foaf:Person>
      <foaf:name>Bill Titus</foaf:name>
    </foaf:Person>
  </foaf:knows>
</foaf:Person>
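To read a file like this with a standard XML parser, the fragment needs a root element and namespace declarations. The sketch below wraps the slide's sample in an `rdf:RDF` element (the wrapper is an assumption about what the full file would contain) and extracts who knows whom; `knows_pairs` is a hypothetical helper, not part of any FOAF tooling.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The rdf:RDF wrapper and xmlns declarations are added so the fragment parses.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <foaf:Person>
    <foaf:name>Dan McCreary</foaf:name>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Bill Titus</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""

NS = {"foaf": FOAF}


def knows_pairs(xml_text):
    """Return (person, acquaintance) name pairs from a FOAF document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for person in root.findall("foaf:Person", NS):
        name = person.findtext("foaf:name", namespaces=NS)
        for friend in person.findall("foaf:knows/foaf:Person", NS):
            pairs.append((name, friend.findtext("foaf:name", namespaces=NS)))
    return pairs


print(knows_pairs(DOC))
```

Merging the pairs from many people's files is what turns individual FOAF documents into a social graph.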

Page 56: Semantic Integration Patterns

Ontology Architectures
• One "big" ontology (see Cycorp cyc.com)
– Using a single "Uber-Ontology"
– Akin to "Boiling the Ocean"
• Compared to:
– Many smaller ontologies
– Micro-formats (RDF/A)
– How to combine?

CYC contains over 3 Million "assertions"

Source: cyc.com

Page 57: Semantic Integration Patterns


If You Give A Kid A Hammer…

…the whole world becomes a nail

• People solve problems with the tools they know

• Semantics are new tools for solving computer-to-computer communication problems

• Intelligent agents will be prevalent when we teach organizations to publish their metadata

• Example: Procedural vs. Declarative Programming

Page 58: Semantic Integration Patterns

Cognitive Styles

The way we solve problems is dependent on the tools we know how to use.

Shoshana Zuboff (1988)
In the Age of the Smart Machine

Technology creates:
- new ways of thinking
- new ways of approaching and solving problems
- new sets of "Cognitive Styles"

It is only if we share these cognitive styles that we will be able to create a coherent technology strategy that everyone understands

Page 59: Semantic Integration Patterns

Agents

Open The Door To The Semantic Web!

• Metadata publishing is hard
• It is a foundation upon which the Semantic Web will be built
• The benefits are indirect and need strong executive sponsorship
• Metadata publishing is no “silver bullet”
• I believe it is the most direct way to get to the Semantic Web
• This will be the most practical way to build intelligent agents

Page 60: Semantic Integration Patterns


Top AI Researchers Agree…

If software is ever going to be able to effectively inter-operate (in ways that were not explicitly preconceived and engineered), it will be because applications share enough of the semantics of their data elements.

Doug Lenat, Cycorp
Semantic Technology Conference 2005

Page 61: Semantic Integration Patterns

Copyright Dan McCreary & Associates

Thank You
• Questions…