The Semantic Web Barry Smith http://ontologist.com

Page 1: The Semantic Web

The Semantic Web

Barry Smith

http://ontologist.com

Page 2: The Semantic Web

The problem of ontology

human beings can integrate highly heterogeneous information

Page 3: The Semantic Web

Consider how the human mind

copes with complex phenomena in the social realm (e.g. speech acts of promising) which involve:
– experiences (speaking, perceiving),
– intentions,
– language,
– action (and tendencies to action),
– deontic powers, obligations, claims, authority
…
– background habits,
– mental competences,
– records and representations

Page 4: The Semantic Web

understanding how computers can effect the same sort of integration

is a difficult problem

Page 5: The Semantic Web

A new silver bullet

Page 6: The Semantic Web

The Semantic Web

designed to integrate the vast amounts of heterogeneous online data and services

via dramatically better support at the level of metadata

designed to yield the ability to query and integrate across different conceptual systems

Page 7: The Semantic Web

Tim Berners-Lee, inventor of the World Wide Web

‘sees a more powerful Web emerging, one where documents and data will be annotated with special codes allowing computers to search and analyze the Web automatically. The codes … are designed to add meaning to the global network in ways that make sense to computers’

Page 8: The Semantic Web

hyperlinked vocabularies, called ‘ontologies’, will be used by Web authors ‘to explicitly define their words and concepts as they post their stuff online.’

‘The idea is the codes would let software "agents" analyze the Web on our behalf, making smart inferences that go far beyond the simple linguistic analyses performed by today's search engines.’

Page 9: The Semantic Web

Exploiting tools such as:

XML

OWL (Web Ontology Language)

RDF (Resource Description Framework)

DAML+OIL (DARPA Agent Markup Language + Ontology Inference Layer)

(? confusing syntactic integration with semantic integration)

Page 10: The Semantic Web

University Ontology

Person*

Employee*

Faculty

Professor

AssistantProfessor

AssociateProfessor

FullProfessor

VisitingProfessor

Lecturer

PostDoc

Assistant

ResearchAssistant

TeachingAssistant

AdministrativeStaff

Director

Chair

Dean

ClericalStaff

SystemsStaff

Student

Undergraduate

GraduateStudent

Organization*

Department

Institute

Program

ResearchGroup

School

University

Publication*

Article*

BookArticle*

ConferencePaper*

JournalArticle*

WorkshopPaper*

Book*

Periodical*

Journal*

Magazine*

Proceedings*

Thesis*

Page 11: The Semantic Web

University Ontology Relations

advisor(Student, Professor)
affiliateOf(Organization, Person)*
affiliatedOrganization(Organization, Organization)*
alumnus(Organization, Person)*
containedIn(Document, Document)*
doctoralDegreeFrom(Person, University)
emailAddress(Person, .STRING)*
head(Organization, Person)*
listedCourse(Schedule, Course)
mastersDegreeFrom(Person, University)
member(SocialGroup, Person)*

Page 12: The Semantic Web

University Ontology Relations

offers(University, Course)
publicationAuthor(Document, Person)*
publicationDate(Document, .DATE)*
publicationOrg(Document, Organization)*
publicationResearch(Publication, Research)
publisher(Document, Organization)*
researchInterest(Person, Research)
researchProject(ResearchGroup, Research)
subOrganizationOf(Organization:"suborganization", Organization:"superorganization")*
takesCourse(Student, Course)
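A minimal sketch of what such relation signatures amount to inside a program, assuming a small fragment of the class list and three of the relations above. The parent links in `PARENT` are an assumed reading of the flattened hierarchy, and the helper names `is_a` and `well_typed` are hypothetical.

```python
# Hypothetical sketch: a fragment of the University Ontology as plain data.
# The parent links are an assumed reading of the class list on the slide.
PARENT = {
    "Professor": "Faculty",
    "Faculty": "Employee",
    "Employee": "Person",
    "Student": "Person",
    "GraduateStudent": "Student",
    "University": "Organization",
}

# relation name -> (class of first argument, class of second argument),
# copied from the relation list on the slides
SIGNATURES = {
    "advisor": ("Student", "Professor"),
    "doctoralDegreeFrom": ("Person", "University"),
    "head": ("Organization", "Person"),
}


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or inherits from it via PARENT."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def well_typed(relation, first_cls, second_cls):
    """Check two argument classes against the relation's declared signature."""
    expected_first, expected_second = SIGNATURES[relation]
    return is_a(first_cls, expected_first) and is_a(second_cls, expected_second)
```

A call such as `well_typed("advisor", "GraduateStudent", "Professor")` passes and the reversed arguments fail. The check is purely about argument types; it says nothing about what advising actually is.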

Page 13: The Semantic Web

Defining ‘gene’

GDB: a gene is a DNA fragment that can be transcribed and translated into a protein

GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype

Page 14: The Semantic Web

Example: The Enterprise Ontology

A Sale is an agreement between two Legal-Entities for the exchange of a Product for a Sale-Price.

A Strategy is a Plan to Achieve a high-level Purpose.

A Market is all Sales and Potential Sales within a scope of interest.

Page 15: The Semantic Web

Example: Statements of Accounts

Company financial statements may be prepared under either the (US) GAAP or the (European) IASC standards.

These allocate cost items to different categories depending on the laws of the countries involved.

Page 16: The Semantic Web

Job:

to develop an algorithm for the automatic conversion of income statements and balance sheets between the two systems.

Not even this relatively simple problem has been satisfactorily resolved

… why not?

Because the very same terms mean different things and are applied in different ways in different cultures

Page 17: The Semantic Web

Verizon

The promise of Web Services, augmented with the Semantic Web, is to provide THE major solution for integration, the largest IT cost / sector, at $ 500 BN/year. The Web Services and Semantic Web trends are heading for a major failure (i.e., the most recent Silver Bullet). In reality, Web Services, as a technology, is in its infancy. ... There is no technical solution (i.e., no basis) other than fantasy for the rest of the Web Services story. Analyst claims of maturity and adoption (...) are already false. ... Verizon must understand it so as not to invest too heavily in technologies that will fail or that will not produce a reasonable ROI.

Dr. Michael L. Brodie, Chief Scientist, Verizon IT

OntoWeb Meeting, Innsbruck, December 16-18, 2002

Page 18: The Semantic Web

Assumptions

Communication / compatibility problems should be solved automatically

(by machine)

Hence ontologies must be applications running in real time

Page 19: The Semantic Web

Application ontology:

Ontologies are inside the computer

thus subject to severe constraints on expressive power

(effectively the expressive power of Description Logic)

Page 20: The Semantic Web

The Semantic Web Initiative

The Web is a vast edifice of heterogeneous data sources

Needs the ability to query and integrate across different conceptual systems

Page 21: The Semantic Web

How to resolve incompatibilities?

enforce terminological compatibility via standardized term hierarchies, with standardized definitions of terms, which

1. satisfy the constraints of a description logic (DL)

2. are applied as meta-tags to the content of websites

Page 22: The Semantic Web

Clay Shirky

The Semantic Web is a machine for creating syllogisms.

Humans are mortal
Greeks are human
Therefore, Greeks are mortal
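A minimal sketch of the syllogism as a machine would run it, assuming each "X are Y" statement is stored as a subclass link. The dictionary `SUBCLASS_OF` and the function `entails` are hypothetical names, not part of any Semantic Web toolkit.

```python
# Hypothetical encoding: "X are Y" is stored as a subclass link X -> Y.
SUBCLASS_OF = {
    "Greek": "Human",   # Greeks are human
    "Human": "Mortal",  # Humans are mortal
}


def entails(sub, sup):
    """Follow subclass links upward from sub; True if sup is reached."""
    seen = set()
    current = sub
    while current is not None and current not in seen:
        if current == sup:
            return True
        seen.add(current)  # guard against accidental cycles in the links
        current = SUBCLASS_OF.get(current)
    return False
```

Here `entails("Greek", "Mortal")` holds because the two links chain together, while `entails("Mortal", "Greek")` does not.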

Page 23: The Semantic Web

Lewis Carroll

- No interesting poems are unpopular among people of real taste
- No modern poetry is free from affectation
- All your poems are on the subject of soap-bubbles
- No affected poetry is popular among people of real taste
- No ancient poetry is on the subject of soap-bubbles

Therefore: All your poems are bad.
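The chain of premises above can be sketched as forward chaining over if-then rules. This is an illustrative encoding only: each premise is rewritten by hand as an implication about "your poems", the step from "not ancient" to "modern" is an added assumption that the premises leave implicit, and the chain ends at "not interesting", which the slide paraphrases as "bad".

```python
# Hypothetical hand encoding of the premises as (if, then) pairs.
RULES = [
    ("about_soap_bubbles", "not_ancient"),  # no ancient poetry is on soap-bubbles
    ("not_ancient", "modern"),              # ASSUMPTION: poetry is ancient or modern
    ("modern", "affected"),                 # no modern poetry is free from affectation
    ("affected", "unpopular_with_people_of_taste"),
    ("unpopular_with_people_of_taste", "not_interesting"),
]


def forward_chain(facts, rules):
    """Apply rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# "All your poems are on the subject of soap-bubbles" is the starting fact.
conclusions = forward_chain({"about_soap_bubbles"}, RULES)
```

The mechanical part is trivial. The work lies in the hand translation of each sentence into a rule, which is exactly the step the machine does not perform.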

Page 24: The Semantic Web

the promise of the Semantic Web

it will improve all the areas of your life where you currently use syllogisms

Page 25: The Semantic Web

most of the data we use is not amenable to recombination in syllogistic form

because it is partial, inconclusive, context-sensitive

So we guess, extrapolate, intuit, we do what we did last time, we do what we think our friends would do … but we almost never use syllogistic logic.

Page 26: The Semantic Web

We Describe the World in Generalities

People who live in Brooklyn speak with a Brooklyn accent

People who live in France speak French

Page 27: The Semantic Web

Merging Databases

Merging databases simply becomes a matter of recording in RDF somewhere that "Person Name" in your database is equivalent to "Name" in my database, and then throwing all of the information together and getting a processor to think about it. [http://infomesh.net/2001/swintro/]

Is your "Person Name = John Smith" the same person as my "Name = John Q. Smith"? Who knows? Not the Semantic Web
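A minimal sketch of the merge described in the quotation, assuming each database is a list of dictionaries. The field equivalence stands in for the RDF statement; `normalize`, `merge` and `same_person_by_name` are hypothetical helpers.

```python
# Stand-in for the RDF statement: "Person Name" (yours) is equivalent to "Name" (mine).
FIELD_EQUIVALENCE = {"Person Name": "Name"}


def normalize(record, equivalence):
    """Rename fields according to the declared equivalences."""
    return {equivalence.get(key, key): value for key, value in record.items()}


def merge(yours, mine, equivalence):
    """Throw both databases together under one field vocabulary."""
    return [normalize(record, equivalence) for record in yours] + list(mine)


def same_person_by_name(a, b):
    """Naive identity test: exact string match on the shared field."""
    return a["Name"] == b["Name"]


merged = merge(
    [{"Person Name": "John Smith"}],
    [{"Name": "John Q. Smith"}],
    FIELD_EQUIVALENCE,
)
```

After the merge both records carry a `Name` field, so the schemas agree. Whether the two records describe one person is still undecided: the exact-match test says no, and nothing in the field mapping can say otherwise.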

Page 28: The Semantic Web

XML-syntax does not help

<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <JOBTITLE>Business Manager</JOBTITLE>
  <TEL>+32(0)3.471.99.60</TEL>
  <FAX>+32(0)3.891.99.65</FAX>
  <GSM>+32(0)465.23.04.34</GSM>
  <WEBSITE>www.newco.com</WEBSITE>
  <ADDRESS>
    <STREET>Dendersesteenweg 17</STREET>
    <ZIP>2630</ZIP>
    <CITY>Aartselaar</CITY>
    <COUNTRY>Belgium</COUNTRY>
  </ADDRESS>
</BUSINESS-CARD>

Page 29: The Semantic Web

Page 30: The Semantic Web

and with correct XML-syntax:

<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <JOBTITLE>Business Manager</JOBTITLE>
  <TEL>+32(0)3.471.99.60</TEL>
  <FAX>+32(0)3.891.99.65</FAX>
  <GSM>+32(0)465.23.04.34</GSM>
  <WEBSITE>www.newco.com</WEBSITE>
  <ADDRESS>
    <STREET>Dendersesteenweg 17</STREET>
    <ZIP>2630</ZIP>
    <CITY>Aartselaar</CITY>
    <COUNTRY>Belgium</COUNTRY>
  </ADDRESS>
</BUSINESS-CARD>

Is "Jules" the first name of the person, or of the business-card?

Page 31: The Semantic Web

Is Jules or Newco the member of XTC Group?

Page 32: The Semantic Web

Do the phone numbers and address belong to Jules or to the business?
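A minimal sketch of what a parser can recover from the business card, using Python's standard `xml.etree.ElementTree` on a shortened copy of the markup. The helper `child_to_parent` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the business card from the slides.
CARD = """<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <TEL>+32(0)3.471.99.60</TEL>
</BUSINESS-CARD>"""


def child_to_parent(xml_text):
    """Map each element tag to the tag of the element that contains it."""
    root = ET.fromstring(xml_text)
    return {child.tag: parent.tag for parent in root.iter() for child in parent}


nesting = child_to_parent(CARD)
```

Every field reports `BUSINESS-CARD` as its parent. The parser knows that `FIRSTNAME`, `MEMBEROF` and `TEL` sit inside the card, and nothing more: it cannot tell whether they describe Jules, Newco, or the card itself.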

Page 33: The Semantic Web

Metadata: the new Silver Bullet

agree on a metadata standard for washing machines as concerns size, price, etc.

create machine-readable databases and put them on the net

consumers can query multiple sites simultaneously and search for highly specific, reliable, context-sensitive results

Page 34: The Semantic Web

Shirky:

The Semantic Web's philosophical argument -- the world should make more sense than it does -- is hard to argue with. The Semantic Web, with its neat ontologies and its syllogistic logic, is a nice vision. However, like many visions that project future benefits but ignore present costs, it requires too much coordination and too much energy to be effective in the real world …

Page 35: The Semantic Web

Shirky

Much of the proposed value of the Semantic Web is coming, but it is not coming because of the Semantic Web. The amount of meta-data we generate is increasing dramatically, and it is being exposed for consumption by machines as well as, or instead of, people. But it is being designed a bit at a time, out of self-interest and without regard for global ontology.

Page 36: The Semantic Web

Semantic Web effort

thus far devoted primarily to developing systems for standardized representation of web pages and web processes

(= ontology of web typography)

not to the harder task of developing ontologies (term hierarchies) for the content of such web pages

Page 37: The Semantic Web

Cory Doctorow

A world of exhaustive, reliable metadata would be a utopia.

Page 38: The Semantic Web

Problem 1: People lie

Meta-utopia is a world of reliable metadata. But poisoning the well can confer benefits to the poisoners

Metadata exists in a competitive world.

Some people are crooks. Some people are cranks. Some people are French philosophers.

Page 39: The Semantic Web

Practical problems

of the semantic web:

who will police the coding?

Page 40: The Semantic Web

Problem 2: People are lazy

Half the pages on Geocities are called “Please title this page”

Page 41: The Semantic Web

Problem 3: People are stupid

The vast majority of the Internet's users (even those who are native speakers of English) cannot spell or punctuate.

Will internet users learn to accurately tag their information with whatever DL-hierarchy they're supposed to be using?

Page 42: The Semantic Web

Problem 4: Multiple descriptions

“Requiring everyone to use the same vocabulary denudes the cognitive landscape, enforces homogeneity in ideas.” (Cory Doctorow)

Page 43: The Semantic Web

Problem 5: Ontology Impedance

= semantic mismatch between ontologies being merged

This problem is recognized in the Semantic Web literature:

http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf

Page 44: The Semantic Web

Solution 1: treat it as (inevitable) ‘impedance’

and learn to find ways to cope with the disturbance which it brings

Suggested here:

http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf

Page 45: The Semantic Web

Solution 2: resolve the impedance problem on a case-by-case basis

Suppose two databases are put on the web.

Someone notices that "where" in the friends table and "zip" in the places table mean the same thing.

http://www.w3.org/DesignIssues/Semantic.html
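A minimal sketch of the case-by-case fix, assuming the two tables are lists of dictionaries. All rows are invented for illustration, and `join_on_equivalent` is a hypothetical helper.

```python
# Invented example rows; only the column names "where" and "zip" come from the slide.
friends = [
    {"name": "Ann", "where": "12345"},
    {"name": "Bob", "where": "at the lake"},  # "where" need not hold a zip code
]
places = [{"zip": "12345", "town": "Exampletown"}]


def join_on_equivalent(left, right, left_col, right_col):
    """Join two tables on a pair of columns someone has declared equivalent."""
    index = {}
    for row in right:
        index.setdefault(row[right_col], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[left_col], []):
            joined.append({**row, **match})
    return joined


joined = join_on_equivalent(friends, places, "where", "zip")
```

Ann's row joins and Bob's silently drops out, because his `where` value was never a zip code. The declared equivalence holds for some rows only, and somebody has to notice that for each pair of columns in turn.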

Page 46: The Semantic Web

We can use the Semantic Web to prove that Joe loves Mary

we found two documents on a trusted site, one of which said that ":Joe :loves :MJS", and another of which said that ":MJS daml:equivalentTo :Mary". We also got the checksums of the files in person from the maintainer of the site.

To check this information, we can list the checksums in a local file, and then set up some FOPL rules that say "if file 'a' contains the information Joe loves mary and has the checksum md5:0qrhf8q3hfh, then record SuccessA", "if file 'b' contains the information MJS is equivalent to Mary, and has the checksum md5:0892t925h, then record SuccessB", and "if SuccessA and SuccessB, then Joe loves Mary". [http://infomesh.net/2001/swintro/]
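The procedure in the quotation can be sketched as follows, assuming the two documents are available as strings and the trusted checksums were obtained beforehand. The placeholder checksums in the quotation are not reproduced; the sketch computes real MD5 digests with Python's `hashlib`. The function names and the sample documents are hypothetical.

```python
import hashlib


def md5_hex(text):
    """MD5 digest of a document, as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def prove_joe_loves_mary(doc_a, doc_b, trusted_checksums):
    """Mirror the three rules: SuccessA, SuccessB, then the conclusion."""
    success_a = (":Joe :loves :MJS" in doc_a
                 and md5_hex(doc_a) == trusted_checksums["a"])
    success_b = (":MJS daml:equivalentTo :Mary" in doc_b
                 and md5_hex(doc_b) == trusted_checksums["b"])
    return success_a and success_b


# Invented sample documents; the checksums play the role of those
# collected in person from the site maintainer.
doc_a = ":Joe :loves :MJS ."
doc_b = ":MJS daml:equivalentTo :Mary ."
trusted = {"a": md5_hex(doc_a), "b": md5_hex(doc_b)}
```

The checksums establish only that the files are the ones the maintainer vouched for. Whether the maintainer was right about Joe is outside the scope of the proof.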

Page 47: The Semantic Web

Both solutions fail

1. treating mismatches as ‘impedance’ ignores the problem of error propagation

(and is inappropriate in an area like medicine)

2. resolving impedance on a case-by-case basis defeats the very purpose of the Semantic Web

Page 48: The Semantic Web

Clinicians

often do not use category systems at all – they use unstructured text

from which usable data has to be extracted in a further step

Why?

Because every case is different, much patient data is context-dependent

Page 49: The Semantic Web

Problem 5: Ontology Impedance

= semantic mismatch between ontologies

‘gene’ used in websites issued by

biotech companies involved in gene patenting

medical researchers interested in role of genes in predisposition to smoking

insurance companies

Page 50: The Semantic Web

Other problems with DL-based ontologies

DL poor when dealing with context-dependent information/usages of terms, e.g. Severe Acute Respiratory Syndrome

and when it comes to dealing with time

and when it comes to dealing with information about instances (rather than concepts or classes)

Page 51: The Semantic Web

SARS

is NOT

Severe Acute Respiratory Syndrome

it is THIS collection of instances of

Severe Acute Respiratory Syndrome

associated with THIS coronavirus and ITS mutations

Page 52: The Semantic Web

Experience shows

that there can be no mechanical solution to the problems of data integration

in domains like medicine or genetics,

or in the domain of really existing commercial transactions

Page 53: The Semantic Web

The problem in every case

is one of finding an overarching framework for good definitions,

definitions which will be adequate to the nuances of the domain under investigation

Page 54: The Semantic Web

For DL

Ontologies are software tools

thus limited in their expressive power

and in their effectiveness as quality controls

Page 55: The Semantic Web

IFOMIS idea:

distinguish two separate tasks:

- developing computer applications capable of running in real time

- developing an expressively rich ontology of a sort which will allow sophisticated quality control

Page 56: The Semantic Web

Problem 4: Multiple descriptions

Requiring everyone to use the same vocabulary to describe their material is not always practicable

and this is especially so in the medical domain

Page 57: The Semantic Web

Basic Formal Ontology

BFO

The Vampire Slayer

Page 58: The Semantic Web
Page 59: The Semantic Web

BFO

ontology not the ‘standardization’ or ‘specification’ of concepts

(not a branch of knowledge or concept engineering)

but an inventory of the types of entities existing in reality

Page 60: The Semantic Web

BFO goal:

to remove ontological impedance by constraining terminology systems with good ontology

Page 61: The Semantic Web

BFO not a computer application

but a reference ontology

in the sense of Aristotelian philosophy

-- it sacrifices tractability for the sake of expressive power

Page 62: The Semantic Web

Defining ‘gene’

GDB: a gene is a DNA fragment that can be transcribed and translated into a protein

GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype

Page 63: The Semantic Web

Ontology

‘fragment’, ‘region’, ‘name’, ‘carry’, ‘trait’, ‘type’

... ‘part’, ‘whole’, ‘function’, ‘inhere’, ‘substance’ …

are ontological terms in the sense of traditional (philosophical) ontology

Page 64: The Semantic Web

Two basic BFO oppositions

Granularity

(of molecules, genes, cells, organs, organisms ...)

SNAP vs. SPAN

getting time right of crucial importance for medical informatics

Page 65: The Semantic Web

MedO: medical domain ontology

theory of granularity relations between
– molecule ontology
– gene ontology
– cell ontology
– anatomical ontology
– etc.

Will serve as basis for new, validated Medical WordNet

Page 66: The Semantic Web

BFO

not just a system of categories

but a formal theory with definitions, axioms, theorems

designed to provide formal resources for the building of reference ontologies for specific domains

the latter should be of sufficient richness that terminological incompatibilities can be resolved intelligently rather than by brute force

Page 67: The Semantic Web

The Reference Ontology Community

IFOMIS (Leipzig)
Laboratories for Applied Ontology (Trento/Rome, Turin)
Foundational Ontology Project (Leeds)
Ontology Works (Baltimore)
Ontek Corporation (Buffalo/Leeds)
Language and Computing (L&C) (Belgium/Philadelphia)

Page 68: The Semantic Web

Domains of Current Work

IFOMIS Leipzig: Medicine, Bioinformatics
Laboratories for Applied Ontology
  Trento/Rome: Ontology of Cognition/Language
  Turin: Law
Foundational Ontology Project: Space, Physics
Ontology Works: Genetics, Molecular Biology
Ontek Corporation: Biological Systematics
Language and Computing: Natural Language Understanding
MOG (Melbourne Ontology Group)(?)

Page 69: The Semantic Web

The End