Research Issues & Challenges in Semantic Web
Jinsoo Park, Ph.D.
Assistant Professor, College of Business Administration
Korea University
http://ids.korea.ac.kr
Jinsoo Park2
Self-Introduction
Short Bio
1999, Ph.D. in Information Systems, The University of Arizona
1999.9 – 2002.8, Assistant Professor, Carlson School of Management, University of Minnesota
2002.9 – Present, Assistant Professor, Business School, Korea University
Research Areas
Content and Metadata Management in Intra- and Inter-organizational Information Systems
Semantic Interoperability and Integration
Knowledge Sharing and Coordination
Ontology
Teaching Areas
PhD – Research Methods
MBA/MS – IS Development Methodologies, AI, Databases, Data Structures & Algorithms, Java
Undergraduate – Systems Analysis and Design, MIS, IT Infrastructure
e-Business – Technical Challenges
Communication Security & Reliability
System Heterogeneity
Data/Information Heterogeneity
Business Process Heterogeneity
Dynamic Business and Technology Heterogeneity
Inter-organizational Interoperability
Interaction with diverse, complex enterprises
[Figure: Buyers and Suppliers, each running Enterprise Applications (ERP, SCM, CRM, SFA), connected through Interoperability]
Shim et al. (2000)
Motivations
Vast and escalating amounts of data
Highly heterogeneous – a plethora of semantic conflicts
Data types, data formats, structures, community, …
Considerable amount of legacy data with no associated metadata
The growth in existing data far exceeds our abilities to locate and analyze the relevant data
“Enterprise data integration is the top item on every CIO’s wish list. So what are we doing about it?”
“Are We Working On The Right Problems?,” Plenary Panel led by Michael Stonebraker, 1998 ACM SIGMOD Conf.
“The diversity of information content and formats is a salient factor in nearly all distributed systems, and the major challenge is to make diverse information systems interoperate at the semantic level while retaining their difference.”
March, Hevner & Ram, Information Systems Research, 2000
98% of companies recently interviewed say that integration is either “extremely important” or “very important” to their firm’s IT strategy
Forrester Research, 2001
Sponsors: NSF, NASA, and NIH
Interoperability
Syntactic Level (Application Level)
Interface, Message, Transport, Protocol
Technology Solution
Semantic Level (Knowledge Level)
Metadata, Ontology, Context, Agent
Linguistic, Social, and Philosophical Solution
Ram, Park and Lee (1999)
What is Semantics?
The meaning and the use of data (Woods, 1975)
“Meaning or relationship of meanings, or relating to meaning” (Webster)
Weak vs. Deep Semantics (Sheth, 1995)
Weak semantics - semantics that can be identified based on structural, syntactic, and value/extensional information in databases
Deep semantics - semantics that involve the issues of human cognition, perception, or interpretation
Example
47
Apples are expensive
A ⇔ 100 ~ 91?
Semantics bring information closer to human thinking and decision-making
Semantics-based Communication
Theory of communication that links results from semiotics, linguistics and philosophy into actual information technology
Meaning Triangle (Ogden and Richards, 1923)
[Figure: triangle with corners Concept, Symbol, and Thing]
Symbol evokes Concept
Concept refers to Thing
Symbol stands for Thing
Semantic Interoperability Problems
Contextual differences between source and target information systems
Different vocabularies, taxonomies, schemas
Implicit semantics – tacit knowledge
Lack of separation between content, intent and process
Embedded rules
Consistency between different versions of the same schema
Research on Semantic Interoperability
Data-Level Analysis
Analysis of the differences in data domains caused by the multiple representations and interpretations of similar data
DeMichiel (1989), Yu et al. (1991), Ventrone & Heiler (1991), Sciore et al. (1994), Kahng & McLeod (1998), Goh et al. (1999)
Schema-Level Analysis
Analysis of differences in logical structures and/or inconsistencies in metadata (i.e., schemas) of the same application domain
Batini & Lenzerini (1984), Navathe et al. (1986), Geller et al. (1992), Garcia-Solaco et al. (1995), Lakshmanan et al. (1997)
Little research has been done on both levels at the same time
An Example – Data-Level Conflicts
DISCLOSURE
COMPNO   Attribute      Value
3842     CF             19,860,228
3842     NI             146,502
3842     NS             2,909,574
3842     NRCEX (ROE)    0.11

DATALINE
CODE     Attribute                       Value
HOND     PERIOD END                      28-02-86
HOND     EARNED FOR ORDINARY             146,502
HOND     TOTAL SALES                     2,909,574
HOND     RETURN ON SHAREHOLDER EQUITY    19.57
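One way to read the two tables is that they describe the same reporting period in two representations. The sketch below works under that reading and is only an illustration: it assumes that CF holds a date written as yyyymmdd with thousands separators, that PERIOD END is dd-mm-yy in the 1900s, and that attributes with identical values (NI and EARNED FOR ORDINARY, NS and TOTAL SALES) name the same thing. None of these assumptions is stated on the slide, and the function names are hypothetical.

```python
from datetime import date, datetime

# Assumed attribute correspondences, inferred only from the identical values.
ASSUMED_SYNONYMS = {"NI": "EARNED FOR ORDINARY", "NS": "TOTAL SALES"}

def parse_disclosure_date(raw: str) -> date:
    """Parse a DISCLOSURE value such as '19,860,228' (assumed yyyymmdd)."""
    digits = raw.replace(",", "")
    return datetime.strptime(digits, "%Y%m%d").date()

def parse_dataline_date(raw: str) -> date:
    """Parse a DATALINE value such as '28-02-86' (assumed dd-mm-yy, 1900s)."""
    day, month, year = (int(part) for part in raw.split("-"))
    return date(1900 + year, month, day)

cf = parse_disclosure_date("19,860,228")
period_end = parse_dataline_date("28-02-86")

# After normalization the two sources can be compared directly.
same_period = cf == period_end
```

Once both values are mapped to a common date type, the data-level conflict reduces to an ordinary equality check.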
An Example – Schema-Level Conflicts
DB 1
TAX
YEAR   TAX-TYPE   AMOUNT
1999   Property   250.34
1999   Water      38.99
2000   Property   234.98
2000   Water      59.05

DB 2
TAX-AMOUNT
YEAR   PROPERTY   WATER
1999   250.34     38.99
2000   234.98     59.05

DB 3
PROPERTY
YEAR   AMOUNT
1999   250.34
2000   234.98
WATER
YEAR   AMOUNT
1999   38.99
2000   59.05
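The three databases store the same facts, with the tax type appearing as a data value (DB 1), as a column name (DB 2), and as a table name (DB 3). A minimal sketch of reconciling them, assuming a canonical form keyed by (year, tax type); the function names and the canonical form are illustrative choices, not part of the original example.

```python
# DB 1: tax type is a data value.
db1_tax = [
    (1999, "Property", 250.34),
    (1999, "Water", 38.99),
    (2000, "Property", 234.98),
    (2000, "Water", 59.05),
]

# DB 2: tax type is a column name.
db2_tax_amount = [
    {"YEAR": 1999, "PROPERTY": 250.34, "WATER": 38.99},
    {"YEAR": 2000, "PROPERTY": 234.98, "WATER": 59.05},
]

# DB 3: tax type is a table name.
db3_tables = {
    "PROPERTY": [(1999, 250.34), (2000, 234.98)],
    "WATER": [(1999, 38.99), (2000, 59.05)],
}

def from_db1(rows):
    """Map DB 1 rows to {(year, TAX_TYPE): amount}."""
    return {(year, tax_type.upper()): amount for year, tax_type, amount in rows}

def from_db2(rows):
    """Map DB 2 rows to the same form by unpivoting the non-YEAR columns."""
    return {
        (row["YEAR"], column): amount
        for row in rows
        for column, amount in row.items()
        if column != "YEAR"
    }

def from_db3(tables):
    """Map DB 3 tables to the same form by turning table names into keys."""
    return {
        (year, table_name): amount
        for table_name, rows in tables.items()
        for year, amount in rows
    }

canonical = from_db1(db1_tax)
all_equal = canonical == from_db2(db2_tax_amount) == from_db3(db3_tables)
```

The point of the sketch is that each source needs its own structural transformation before any value-level comparison is possible.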
The Revolution of the Web
Trusted Web Resources
1990: HyperText Markup Language (HTML), HyperText Transfer Protocol (HTTP)
Formatted Documents, Foundation of the Current Web
2000: Resource Description Framework (RDF), eXtensible Markup Language (XML)
Self-Describing Documents
2010: Proof, Logic and Ontology Languages (e.g., DAML+OIL)
Shared terms/terminology, Machine-Machine communication
Berners-Lee and Hendler (2001), Nature
The Current Web
Global information space for human consumption.
Information and its presentations are mixed up.
Accessible merely by keywords: high recall, low precision
A keyword search for “Rose” does not distinguish among these concepts:
Rational Rose, Guns N’ Roses, Rose (flower), Rose (Titanic), England’s Rose.
Difficult for machines to automatically comprehend, process, communicate and interoperate.
Problems in information finding, extracting, representing, interpreting, and maintaining.
The Semantic Web
“The Semantic Web is the representation of data on the World Wide Web (based on the RDF standards and other standards to be defined).” (http://www.w3.org/2001/sw/)
Envisioned by Tim Berners-Lee and researched by DARPA team and others
“A web of data that can be processed directly or indirectly by machines”
Tim Berners-Lee, Weaving the Web, HarperBusiness, 2000.
The “Next Generation Web” with well-established infrastructure for expressing information in a
precise,
human-readable, and
machine-interpretable form.
The Vision
Agents Web Services
Grid Computing
e-Business
e-Science
[Source: C. Goble, “Information Grids, the Semantic Web & Why Ontologies Matter”]
Current Research and Technologies
Semantic Web technologies are still very much in their infancy
Little consensus about the likely direction of the Semantic Web
No widespread agreement on exactly what the Semantic Web is
Infrastructure
XML(S), RDF(S)
Ontology language
DAML+OIL, OWL, …
Two paradigms in semantic interoperability
Data warehousing (eager) approach
On-demand driven (lazy) approach
Benefits of XML over HTML
(a) HTML
<html>
<body topmargin=20 leftmargin=10>
<font size=3>
<table width="389" border="1">
<tr>
<td height="82" valign="middle">
<pre>
              Regular   Our
              Price     Price
LaserJet1150  380,000   357,000   In stock
</pre>
</td>
</tr>
</table>
...
</font>
</body>
</html>

(b) XML
<?xml version="1.0"?>
<document>
  <productInfo>
    <product>LaserJet1150</product>
    <regularPrice>380,000</regularPrice>
    <ourPrice>357,000</ourPrice>
    <inStock>yes</inStock>
  </productInfo>
</document>
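The practical benefit of the XML version is that a program can address each field by name instead of guessing at column positions inside a <pre> block. A minimal sketch using Python's standard xml.etree.ElementTree module on the XML listing; the helper to_int and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

xml_doc = """<?xml version="1.0"?>
<document>
  <productInfo>
    <product>LaserJet1150</product>
    <regularPrice>380,000</regularPrice>
    <ourPrice>357,000</ourPrice>
    <inStock>yes</inStock>
  </productInfo>
</document>"""

def to_int(text: str) -> int:
    """Convert a price such as '380,000' to an integer."""
    return int(text.strip().replace(",", ""))

root = ET.fromstring(xml_doc)
info = root.find("productInfo")

# Each value is reached through its tag name, not its position on the page.
product = info.findtext("product")
regular_price = to_int(info.findtext("regularPrice"))
our_price = to_int(info.findtext("ourPrice"))
in_stock = info.findtext("inStock") == "yes"
discount = regular_price - our_price
```

Note that the tags say nothing about the currency or whether tax is included, so the extracted numbers are still only as meaningful as the reader's assumptions about them.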
But XML faces the following problems …
Multiple Standards
Need for consistent and standardized tags
There are so many XML standards
“there are more than a dozen XML protocols - for Financial Trading applications alone”
(Chairman of a Financial Services XML Working Group)
e.g., (price, cost), (subject, theme, title), (car, automobile) ...
Implicit Semantics
Agreement upon the precise meaning of each tag
e.g., How precisely defined is the notion of “price”?
Is it in dollars ($) or won (₩)?
Even if it is “dollars”, is it US dollars, Canadian dollars, or Hong Kong dollars?
Does the “price” include sales tax? Does it include the value added tax (VAT)?
About the notion of “title”
Is it a movie title or a drama title?
About the notion of “bank”
Is it a financial institution or a river embankment?
Modeling Conflicts
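The “price” questions above can be made concrete by carrying the context along with the value. The following is a rough sketch, assuming a hypothetical Price type with a currency code and a tax flag; the KRW currency attached to the earlier printer prices is an assumption made purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Price:
    amount: float
    currency: str        # ISO 4217 code such as "KRW" or "USD"
    includes_tax: bool   # whether sales tax / VAT is already included

def comparable(a: Price, b: Price) -> bool:
    """Two prices are comparable only if their contexts agree."""
    return a.currency == b.currency and a.includes_tax == b.includes_tax

def cheaper(a: Price, b: Price) -> Price:
    """Return the lower price, refusing to compare mismatched contexts."""
    if not comparable(a, b):
        raise ValueError("prices have different currency or tax context")
    return a if a.amount <= b.amount else b

regular = Price(380_000, "KRW", includes_tax=True)   # currency assumed
ours = Price(357_000, "KRW", includes_tax=True)      # currency assumed
other = Price(357_000, "USD", includes_tax=False)    # hypothetical source

best = cheaper(regular, ours)

try:
    cheaper(ours, other)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

A bare <price> tag carries only the amount; the remaining fields are exactly the implicit semantics that the parties would otherwise have to agree on out of band.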
But XML faces the following problems
Evolution of Semantics
Problem of evolution
e.g., Conversion from using local currency to using Euros in Europe
e.g., GM Daewoo, Renault Samsung
Multiple Purposes
Different purposes necessitate different interpretations of the information
e.g., Student
Professor – Taking courses
Staff – Registration
e.g., Corporate household/family structure
Financial – Risk (credit - bankruptcy)
Accounting – Account consolidation
Legal – Liability (insurance)
… and these are dynamic, changing over time
RDF & RDF Schema
RDF (Resource Description Framework)
Represents metadata about Web resources
e.g., title, author, and modification date of a Web page …
Data model → resource, property, property value
rdf:Description, rdf:ID, rdf:type
Intended to provide interoperability between applications that exchange machine-understandable information on the Web
RDF Schema
Provides semantics about RDF
a.k.a. RDF Vocabulary Description Language
XML schema: about syntax
Defines an appropriate RDF vocabulary (classes, properties and constraints) for each specific domain
Extension of data model → class and property hierarchy
rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain and rdfs:range
Logical connectives such as conjunction, disjunction, and negation are not provided
Not full-fledged ontological modeling and reasoning
RDF & RDF Schema
RDF Schema
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://ids.korea.ac.kr/student.rdfs">
  <rdfs:Class rdf:ID="universityStudent"/>
  <rdfs:Class rdf:ID="undergraduateStudent">
    <rdfs:subClassOf rdf:resource="#universityStudent"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="graduateStudent">
    <rdfs:subClassOf rdf:resource="#universityStudent"/>
  </rdfs:Class>
  <rdf:Property rdf:ID="degree">
    <rdfs:domain rdf:resource="#graduateStudent"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>
</rdf:RDF>

[Figure: class hierarchy with universityStudent as the superclass of undergraduateStudent and graduateStudent; graduateStudent has the property degree]

RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://ids.korea.ac.kr/student.rdfs#">
  <rdf:Description rdf:about="http://ids.korea.ac.kr/graduateStudent.rdf#Honggildong">
    <rdf:type rdf:resource="http://ids.korea.ac.kr/student.rdfs#graduateStudent"/>
    <degree>MIS</degree>
  </rdf:Description>
</rdf:RDF>
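What the schema adds over the plain RDF description is that a program can infer facts that were never stated, for example that Honggildong is a universityStudent. The sketch below illustrates this with plain Python tuples instead of an RDF library; the shortened names (without the namespace URIs) and the helper functions are simplifications for illustration only.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# The statements from the listings, written as (subject, property, value).
triples = {
    ("undergraduateStudent", SUBCLASS_OF, "universityStudent"),
    ("graduateStudent", SUBCLASS_OF, "universityStudent"),
    ("Honggildong", RDF_TYPE, "graduateStudent"),
    ("Honggildong", "degree", "MIS"),
}

def superclasses(cls, statements):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, prop, value in statements:
            if subject == current and prop == SUBCLASS_OF and value not in found:
                found.add(value)
                frontier.append(value)
    return found

def types_of(resource, statements):
    """Return the stated types of a resource plus the inferred supertypes."""
    direct = {v for s, p, v in statements if s == resource and p == RDF_TYPE}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, statements)
    return direct | inferred

honggildong_types = types_of("Honggildong", triples)
```

This covers only the subclass hierarchy; as noted on the previous slide, RDF Schema has no way to express conjunction, disjunction, or negation.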
OWL
Web Ontology Language
RDF Schema lacks some desirable expressiveness
People use different words to represent the same thing
cardinality constraints, conjunction, disjunction …
OWL extends RDF Schema
Uses all RDF Schema’s basic notions of Class, Property, domain, and range
Adds more vocabulary for describing properties and classes
relations between classes (e.g., disjointness)
cardinality (e.g., “exactly one”)
richer typing of properties
characteristics of properties (e.g., symmetry)
enumerated classes
OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms
CREAM
Conflict Resolution Environment for Autonomous Mediation
An integrated and collaborative facility for achieving semantic interoperability among the participating heterogeneous information sources.
Agent-based Mediation Architecture
SCROL – Semantic Conflict Resolution OntoLogy
Domain independent
Semantic Query Transformation
Park & Ram (2004), ACM Transactions on Information Systems
Research Questions
What kinds of semantic conflicts are typically found in a heterogeneous environment?
How do we recognize & resolve such conflicts?
To what extent can we automate the process of conflict identification & resolution using mediators?
[Figure: layered architecture diagram]
Users, QBS Interface, Information Integrator
Semantic Mediation Layer: Semantic Mediators
Metadata Layer: Common Repository, SCROL, Schema Designer, Schema Mapper, Ontology Mapper, Wrapper Generator
Data Exchange Layer: XML Generator, XSL Generator, DTD Generator (Output Generators), Semantic Filter
Data Access Layer: RMI Wrapper, Content Wrapper
Sources: Database source, Web source
Semantic Integration
SCROL
Federated Schema
XML Schema
XML DTD
Business Doc/Schema
Users
Semantic Mediation Service Layer
DB Schema
Metadata – Semantic Model
Ram, Park and Ball (1999), IEEE Computer.
SCROL – Semantic Conflict Resolution OntoLogy
L = (OC, OI, RP, RS, RM, u)
OC - concepts
OI - instances
RP - parenthood relationship (subconcept-of/superconcept-of, instance-of)
RS - sibling relationship (disjoint, peer, part-of, is-a)
RM - (domain instance value) mapping relationship (one-one, one-many, many-many, total, partial, none)
u - root
Ram & Park (2004), IEEE Transactions on Knowledge and Data Engineering
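The six components of L can be pictured as a small data structure. The following is a loose sketch, not the published SCROL implementation: the class name, the field layout, and the use of a conversion function for RM are assumptions. The Area concept with Square Meter and Acre is borrowed from the mapping example later in the deck, and the factor 4046.8564224 is the standard definition of the international acre in square meters.

```python
from dataclasses import dataclass, field

@dataclass
class Scrol:
    root: str                                      # u
    concepts: set = field(default_factory=set)     # OC
    instances: set = field(default_factory=set)    # OI
    parenthood: set = field(default_factory=set)   # RP: (child, kind, parent)
    sibling: set = field(default_factory=set)      # RS: (a, kind, b)
    mapping: dict = field(default_factory=dict)    # RM: (source, target) -> function

onto = Scrol(root="Root")
onto.concepts |= {"Root", "Area"}
onto.instances |= {"Square Meter", "Acre"}
onto.parenthood |= {
    ("Area", "subconcept-of", "Root"),
    ("Square Meter", "instance-of", "Area"),
    ("Acre", "instance-of", "Area"),
}
# Treating the two units as peers is an illustrative choice.
onto.sibling.add(("Square Meter", "peer", "Acre"))
onto.mapping[("Acre", "Square Meter")] = lambda value: value * 4046.8564224

def convert(ontology: Scrol, value: float, source: str, target: str) -> float:
    """Convert a value between two instances using the RM mapping."""
    if source == target:
        return value
    return ontology.mapping[(source, target)](value)

area_m2 = convert(onto, 2.0, "Acre", "Square Meter")
```

A mediator that finds one source reporting acres and another reporting square meters could look up the mapping under the shared Area concept rather than hard-coding the conversion per source.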
SCROL – Graphical Illustration
[Figure: tree rooted at a Root Concept, with Concept nodes at the upper levels and Instance nodes at the leaves; edges are labeled disjoint, peer, is-a, part-of, and mapping]
SCROL Interface
Ontology-Schema Mapping Example
[Figure: SCROL concepts linked to schema attributes; only the labels below are recoverable]
Image: Picture, Map, Drawing; Raster (BMP, TIFF, JPEG, GIF); Vector (VSD, DWG)
Scale: 10^-3, 10^-2, 10^-1, 10^0, 10^1, 10^2, 10^3
Temporal_Format: Date, Day; Julian Date Type; String Type (mm/dd/yy; yyyy/mm/dd; Month Day, Year; Date, Month Day, Year)
Cardinal_Number, String
Area: Square Meter, Acre
Location: Graphical Location, Descriptive Location, Coordinate, UTM, Code, String
Duration: Day, Week, Month
Region: Country, State/Province, City, Town
Relationship labels: peer, part-of
Schemas: County-Population, City-Population
Schema attributes: name, size, census-starting-date, census-ending-date, location, area, image, duration, area-size, map
Ontology-Schema Mapper
Semantic Mediators
[Figure: message flow among the semantic mediators]
Components: User, Coordinator, Message Generator, Conflict Detector, Query Generator, Selector, Data Collector, Conflict Resolver, SCROL, Metadata Directory, RMI Servers, Remote Systems
(1) Ask Query
(2) Generate Message
(3) Ask Semantic Conflicts in the Requested Query
(4) Traverse SCROL
(5) Identify Conflicts
(6) Reply Searching Results
(6) Resolvable?
NO: (7a) Report to Domain Experts
YES: (7b) Ask Directory Service and Local Query Statements
(8a) Retrieve Directory Information
(8b) Ask Local Query Statements
(9a) Reconcile Condition Statements
(9b) Reply Generated Query Sets
(10) Ask RMI to get Query Result Sets
(11) Retrieve Query Result
(12) Tell Query Results
(13) Ask Semantic Reconciliation
(14) Reply Resolved Results
(15) Display Query Results
Semantic Mediator Communication Protocol
Theory of Speech Acts (Austin 1962, Searle 1969)
Performatives
ASK-ALL(QID, Query) - asking the collection of local queries.
ASK-IF(Query) - asking if Query holds.
DELIVER(QueryResults) - reporting the query results.
DETECT(Query) - traversing the SCROL to check semantic conflicts.
GENERATE(Query) - requesting local query generation.
LOCATE(Query) - requesting directory service to retrieve directory information.
RECONCILE(QueryResults) - requesting semantic reconciliation for the query results.
REPLY-ALL(QID, QueryResults) - replying all the query results being asked.
REPLY-IF(Query, Answer) - replying with the Answer upon being asked if Query holds.
REPORT(QueryResults) - reporting the query results.
RESOLVE-IF(Query, Answer) - reporting the Answer upon being asked if Query is resolvable.
TELL(Query) - notifying and updating the query request.
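The performatives listed above can be modeled as typed messages exchanged between mediators. The sketch below is only an illustration of that idea: the Message fields, the pairing of ASK-ALL with REPLY-ALL and ASK-IF with REPLY-IF, and the choice of which mediator sends which message are assumptions drawn from the descriptions, not the protocol's actual definition.

```python
from dataclasses import dataclass
from enum import Enum

class Performative(Enum):
    ASK_ALL = "ASK-ALL"
    ASK_IF = "ASK-IF"
    DELIVER = "DELIVER"
    DETECT = "DETECT"
    GENERATE = "GENERATE"
    LOCATE = "LOCATE"
    RECONCILE = "RECONCILE"
    REPLY_ALL = "REPLY-ALL"
    REPLY_IF = "REPLY-IF"
    REPORT = "REPORT"
    RESOLVE_IF = "RESOLVE-IF"
    TELL = "TELL"

@dataclass(frozen=True)
class Message:
    performative: Performative
    sender: str
    receiver: str
    content: tuple

# Assumed request/response pairing, inferred from the descriptions.
EXPECTED_REPLY = {
    Performative.ASK_ALL: Performative.REPLY_ALL,
    Performative.ASK_IF: Performative.REPLY_IF,
}

def reply_to(request: Message, *content) -> Message:
    """Build the reply to a request, swapping sender and receiver."""
    performative = EXPECTED_REPLY[request.performative]
    return Message(performative, request.receiver, request.sender, content)

# Hypothetical exchange between two mediators named in the architecture.
ask = Message(Performative.ASK_IF, "Coordinator", "ConflictDetector", ("Q1",))
answer = reply_to(ask, "Q1", True)
```

Keeping the performative separate from the content, as speech act theory suggests, lets a receiving mediator decide how to handle a message before it inspects the query itself.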
Key Issues and Potential Research Directions …
Integration vs. Interoperability
Integration-based approach
attempts to build a monolithic view of the enterprise
integrates processes and applications at the event and message levels so multiple systems become one logical unit
Interoperability-based approach
focuses on the exchange of meaningful, context-driven information between autonomous systems
Key Issues and Potential Research Directions …
Machine Understandable Semantics
How can a software agent learn something about the meaning of a term that it has never encountered before?
Semantic Mediation and Semantic Query Processing
Conflict Detection and Resolution
Semantic Normalization
C (e1) = C (e2)
Semantic Mapping and Translation
Semantic Association
Dynamic Evolution
Key Issues and Potential Research Directions
Ontology Heterogeneity
Different knowledge representation formalisms
Language heterogeneity – when ontologies are expressed using different ontology languages
Naming conflicts
e.g., synonyms, homonyms, etc.
Modeling conflicts
e.g., differences in Total Number of Employees could be attributed to the inclusion or exclusion of Temporary Employees
Temporal conflicts
Arise when entity values or definitions belong to different times, or time intervals
Conceptualization conflicts
e.g., time intervals vs. time points
Ontology Learning
Q & A