Semantic Interoperability and Information Brokering in Global Information Systems
Bethesda, Maryland, April 6, 1999
Amit Sheth
Large Scale Distributed Information Systems Lab
University of Georgia
http://lsdis.cs.uga.edu
Three perspectives to GlobIS
• Information Integration Perspective: distribution, autonomy, heterogeneity
• Information Brokering Perspective: data, meta-data, semantic (terminological, contextual)
• "Vision" Perspective: data, connectivity, computing, information, knowledge
Evolving targets and approaches in integrating data and information (a personal perspective)
• Generation I (1980s): Mermaid, DDTS, Multibase, MRDSM, ADDS, IISS, Omnibase, ...
• Generation II (1990s): InfoSleuth, KMed, DL-I projects, Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, ..., VisualHarness, InfoHarness
• Generation III (1997...): DL-II projects, ADEPT, InfoQuilt
a society for ubiquitous exchange of (tradeable) information in all digital forms of representation;
information anywhere, anytime, in any form
Generation I
• Data recognized as corporate resource — leverage it!
• Data predominantly in structured databases with different data models, transitioning from network and hierarchical to relational DBMSs
• Heterogeneity (system, modeling and schematic) and the need to support autonomy posed the main challenges; the major issues were data access and connectivity
• Information integration through a federated architecture
• Support for corporate IS applications as the primary objective; updates often required; data integrity important
Generation I (heterogeneity in FDBMSs)
• Communication
• Hardware/System: instruction set, data representation/coding, configuration
• Operating System: file system; naming, file types, operations; transaction support; IPC
• Database System: semantic heterogeneity; differences in DBMS: data models (abstractions, constraints, query languages) and system-level support (concurrency control, commit, recovery)
(Timeline: 1970s to 1980s)
Generation I (Federated Database Systems: Schema Architecture)
[Figure: each Component DBS has a Local Schema, which schema translation maps to a Component Schema; each component defines one or more Export Schemas; schema integration combines the Export Schemas into a Federated Schema, over which multiple External Schemas are defined.]
• Model heterogeneity: handled through a common/canonical data model and schema translation
• Information sharing while preserving autonomy
• Dimensions for interoperability and integration: distribution, autonomy and heterogeneity
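To make the schema architecture concrete, here is a minimal Python (3.9+) sketch. Everything in it is illustrative: a schema is assumed to be a plain mapping from record types to attribute names, the hypothetical `Relation` class stands in for the canonical data model, and the correspondences that drive schema integration are supplied by hand, although discovering them is the hard part in practice. External Schemas, which would be views over the Federated Schema, are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation in a hypothetical canonical (common) data model."""
    name: str
    attributes: tuple[str, ...]


def translate(local_schema: dict[str, list[str]]) -> dict[str, Relation]:
    """Schema translation: Local Schema (native model) -> Component Schema (canonical model)."""
    return {rtype: Relation(rtype, tuple(fields)) for rtype, fields in local_schema.items()}


def export(component: dict[str, Relation], shared: dict[str, set[str]]) -> dict[str, Relation]:
    """Export Schema: only the part of the Component Schema the site chooses to share.

    Attributes not listed in `shared` stay private, preserving the component's autonomy.
    """
    return {
        name: Relation(name, tuple(a for a in component[name].attributes if a in attrs))
        for name, attrs in shared.items()
    }


def integrate(*exports: dict[str, Relation], correspondences: dict[str, str]) -> dict[str, Relation]:
    """Schema integration: merge export relations that denote the same real-world entity.

    `correspondences` maps an export relation name to a federated name (assumed given).
    """
    federated: dict[str, Relation] = {}
    for exp in exports:
        for name, rel in exp.items():
            fed_name = correspondences.get(name, name)
            known = federated[fed_name].attributes if fed_name in federated else ()
            merged = known + tuple(a for a in rel.attributes if a not in known)
            federated[fed_name] = Relation(fed_name, merged)
    return federated


# Two autonomous component DBSs with different local schemas.
site_a = translate({"EMP": ["id", "name", "salary", "ssn"]})
site_b = translate({"Staff": ["id", "name", "dept"]})
federated = integrate(
    export(site_a, {"EMP": {"id", "name", "salary"}}),
    export(site_b, {"Staff": {"id", "name", "dept"}}),
    correspondences={"EMP": "Employee", "Staff": "Employee"},
)
print(federated["Employee"].attributes)
```

The private `ssn` attribute never reaches the federated `Employee` relation, which is how the export step models information sharing while preserving autonomy.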
(characterization of schematic conflicts in multidatabase systems)
Schematic Conflicts (Sheth & Kashyap, Kim & Seo)
• Domain Definition Incompatibility: Naming Conflicts, Data Representation Conflicts, Data Scaling Conflicts, Data Precision Conflicts, Default Value Conflicts, Attribute Integrity Constraint Conflicts
• Entity Definition Incompatibility: Database Identifier Conflicts, Naming Conflicts, Schema Isomorphism Conflicts, Missing Data Items Conflicts
• Data Value Incompatibility: Known Inconsistency, Temporal Inconsistency, Acceptable Inconsistency
• Abstraction Level Incompatibility: Generalization Conflicts, Aggregation Conflicts
• Schematic Discrepancies: Data Value Attribute Conflict, Entity Attribute Conflict, Data Value Entity Conflict
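Several of the domain and entity definition conflicts above are commonly resolved with per-source mapping rules into the federated schema. The Python sketch below assumes two hypothetical component databases, `A` and `B`, and shows a naming conflict (`salary` vs. `pay_k`), a data scaling conflict (dollars vs. thousands of dollars) and a database identifier conflict (identifiers unique only within their own database). A mapping table is one simple option chosen for illustration, not the technique of the cited papers.

```python
from decimal import Decimal

# Hypothetical export data from two component databases describing employees.
SOURCE_ROWS = {
    "A": [{"emp_id": 1, "salary": Decimal("52000")}],  # salary in US dollars
    "B": [{"staff_no": 1, "pay_k": Decimal("61.5")}],  # pay in thousands of US dollars
}

# Per-source mapping rules: source attribute -> (federated attribute, conversion).
MAPPINGS = {
    "A": {
        "emp_id": ("id", lambda v: ("A", v)),  # qualify ids with their source database
        "salary": ("annual_salary_usd", lambda v: v),
    },
    "B": {
        "staff_no": ("id", lambda v: ("B", v)),
        "pay_k": ("annual_salary_usd", lambda v: v * 1000),  # rescale to dollars
    },
}


def to_federated(source, rows):
    """Rename and convert each attribute according to its source's mapping rules."""
    rules = MAPPINGS[source]
    return [
        {rules[attr][0]: rules[attr][1](value) for attr, value in row.items()}
        for row in rows
    ]


federated = [row for src, rows in SOURCE_ROWS.items() for row in to_federated(src, rows)]
for row in federated:
    print(row)
```

Both employees carry the local identifier `1`, but qualifying it with the source keeps them distinct in the federated view.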
BUT these techniques for dealing with schematic heterogeneity do not carry over directly to the much larger variety of heterogeneous media
Generation I → Generation II
• Significant improvements in computing and connectivity (standardized protocols, public networks, Internet/Web); remote data access taken as a given
• Increasing diversity in data formats, with a focus on a variety of textual data and semi-structured documents
• Many more, and more heterogeneous, information sources, but not necessarily a better understanding of the data
• Use of data beyond traditional business applications: mining + warehousing, marketing, e-commerce
• Web search engines for keyword-based querying against HTML pages; attribute-based querying available in a few search systems
• Use of metadata for information access; early work on ontology support, applied to metadata in some cases
• Mediator architecture for information management
Generation II (limited types of metadata, extractors, mappers, wrappers)
[Figure: metadata EXTRACTORS populate METADATA in Global/Enterprise Web Repositories from heterogeneous sources: digital maps; Nexis, UPI, AP; documents; digital audio; data stores; digital video; digital images; ...]
Find Marketing Manager positions in a company that is within 15 miles of San Francisco and whose stock price has been growing at a rate of at least 25% per year over the last three years
Junglee, SIGMOD Record, Dec. 1997
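In a mediator architecture, a request like the one above is answered by a mediator that combines the results of wrappers around each source. The Python sketch below is illustrative only: the three wrapper functions and all company names, coordinates and stock prices are made up, the distance is computed with the haversine formula, and "growing at least 25% per year" is read as a compound annual growth rate of at least 0.25 over three years.

```python
import math

SAN_FRANCISCO = (37.7749, -122.4194)  # (latitude, longitude)
EARTH_RADIUS_MILES = 3958.8


# Hypothetical wrappers: each hides one source's native format behind plain Python records.
def job_listings_wrapper():
    return [
        {"title": "Marketing Manager", "company": "Acme"},
        {"title": "Marketing Manager", "company": "Globex"},
        {"title": "Marketing Manager", "company": "Initech"},
        {"title": "Engineer", "company": "Acme"},
    ]


def company_location_wrapper():
    # headquarters (latitude, longitude)
    return {"Acme": (37.80, -122.27), "Globex": (38.58, -121.49), "Initech": (37.78, -122.41)}


def stock_price_wrapper():
    # (price three years ago, price today)
    return {"Acme": (10.0, 22.0), "Globex": (10.0, 30.0), "Initech": (10.0, 12.0)}


def miles_between(a, b):
    """Great-circle distance in miles using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def annual_growth(start_price, end_price, years=3):
    """Compound annual growth rate over the given number of years."""
    return (end_price / start_price) ** (1 / years) - 1


def mediator(title="Marketing Manager", max_miles=15, min_growth=0.25):
    """Ask every wrapper for its part of the answer and join the results on company name."""
    locations = company_location_wrapper()
    prices = stock_price_wrapper()
    return [
        job for job in job_listings_wrapper()
        if job["title"] == title
        and miles_between(locations[job["company"]], SAN_FRANCISCO) <= max_miles
        and annual_growth(*prices[job["company"]]) >= min_growth
    ]


print(mediator())
```

No single source can answer the request: the job listing, the location and the stock history each come from a different wrapper, and only the mediator's join produces the answer.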
Generation II (a metadata classification: the information pyramid)
From the base of the pyramid upward:
• Data (heterogeneous types/media)
• Content-independent metadata (creation-date, location, type-of-sensor...)
• Content-dependent metadata (size, max colors, rows, columns...)
• Direct content-based metadata (inverted lists, document vectors, WAIS, Glimpse, LSI)
• Domain-independent (structural) metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
• Domain-specific metadata (area, population (Census), land-cover, relief (GIS), metadata concept descriptions from ontologies)
• Ontologies, classifications, domain models
• User
Move in this direction to tackle information overload!!
Metadata standards:
• General purpose: Dublin Core, MCF
• Domain/industry specific: Geographic (FGDC, UDK, …), Library (MARC, …)
VisualHarness – an example
Query processing and information requests
• NOW: traditional keyword-based queries, attribute-based queries, content-based queries
• NEXT: 'high-level' information requests involving ontology-based, iconic, mixed-media, and media-independent information requests; user-selected ontology; use of profiles
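The difference between the first two NOW query types can be shown on a few metadata records. In this Python sketch the records and both function names are hypothetical; content-based queries (for example on image features) and the ontology-based requests listed under NEXT are not modeled.

```python
# Hypothetical metadata records for three items in a repository.
RECORDS = [
    {"id": 1, "text": "Dakota aquifer hydrography map", "type": "map", "year": 1998},
    {"id": 2, "text": "Aquifer recharge report", "type": "document", "year": 1995},
    {"id": 3, "text": "Kansas transportation network", "type": "map", "year": 1997},
]


def keyword_query(records, *keywords):
    """Keyword-based: every keyword must occur as a word of the free text."""
    terms = [k.lower() for k in keywords]
    return [r["id"] for r in records if all(t in r["text"].lower().split() for t in terms)]


def attribute_query(records, **conditions):
    """Attribute-based: every condition is a predicate on a named metadata attribute."""
    return [
        r["id"] for r in records
        if all(attr in r and pred(r[attr]) for attr, pred in conditions.items())
    ]


print(keyword_query(RECORDS, "aquifer"))
print(attribute_query(RECORDS, type=lambda t: t == "map", year=lambda y: y >= 1997))
```

The keyword query cannot express "maps from 1997 on", while the attribute-based query can, provided the metadata has been extracted into named attributes first.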
What’s next (after comprehensive use of metadata)?
GIS Data Representation – Example
Multiple heterogeneous metadata models with different tag names for the same data in the same GIS domain:

FGDC Metadata Model
• Theme keywords: digital line graph, hydrography, transportation...
• Title: Dakota Aquifer
• Online linkage: http://gisdasc.kgs.ukans.edu/dasc/
• Direct Spatial Reference Method: Vector
• Horizontal Coordinate System Definition: Universal Transverse Mercator
• …

UDK Metadata Model
• Search terms: digital line graph, hydrography, transportation...
• Topic: Dakota Aquifer
• Adress Id: http://gisdasc.kgs.ukans.edu/dasc/
• Measuring Techniques: Vector
• Co-ordinate System: Universal Transverse Mercator
• …

Kansas State
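Because both models describe the same data, a broker can translate between them with a tag-level correspondence table. The Python sketch below assumes the correspondences are exactly the pairings visible in the example above and that values need no conversion; the name `udk_to_fgdc` is hypothetical. Some pairings (such as Measuring Techniques and Direct Spatial Reference Method) are semantic rather than purely syntactic, which is why such a table has to be built with domain knowledge.

```python
# Correspondences between UDK and FGDC tag names, read off the example above.
# "Adress Id" is spelled as on the slide.
UDK_TO_FGDC = {
    "Search terms": "Theme keywords",
    "Topic": "Title",
    "Adress Id": "Online linkage",
    "Measuring Techniques": "Direct Spatial Reference Method",
    "Co-ordinate System": "Horizontal Coordinate System Definition",
}


def udk_to_fgdc(record):
    """Rename UDK tags to FGDC tags; return the translated record and any unmapped tags."""
    translated, unmapped = {}, []
    for tag, value in record.items():
        if tag in UDK_TO_FGDC:
            translated[UDK_TO_FGDC[tag]] = value
        else:
            unmapped.append(tag)  # no known counterpart: report it, don't guess
    return translated, unmapped


udk_record = {
    "Search terms": "digital line graph, hydrography, transportation",
    "Topic": "Dakota Aquifer",
    "Adress Id": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Measuring Techniques": "Vector",
    "Co-ordinate System": "Universal Transverse Mercator",
}
fgdc_record, unmapped = udk_to_fgdc(udk_record)
print(fgdc_record["Title"], unmapped)
```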
Generation III
• Increasing information overload and a broader variety of information content (video content, audio clips, etc.), with an increasing amount of visual information and scientific/engineering data
• Continued standardization related to the Web for representational and metadata issues (MCF, RDF, XML)
• Changes in Web architecture; distributed computing (CORBA, Java)
• Users demand simplicity, but complexities continue to rise
• The Web is no longer just another information source but a basis for decision support through “data mining and information discovery, information fusion, information dissemination, knowledge creation and management”, “information management complemented by cooperation between the information system and humans”
• Information Brokering Architecture proposed for information management
Information Brokering: An Enabler for the Infocosm
[Figure: INFORMATION/DATA OVERLOAD. Information providers (newswires, universities, corporations, research labs) offer information systems and data repositories; information consumers (corporations, universities, people, government programs) pose user queries; INFORMATION BROKERING sits between them and issues information requests to the providers' information systems and data repositories.]
• Arbitration between information consumers and providers for resolving information impedance
• Dynamic reinterpretation of information requests for determination of relevant information services and products
• Dynamic creation and composition of information products
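A broker's dynamic reinterpretation of requests can be sketched as mapping the consumer's own words onto a domain ontology and then matching the resulting concepts against what each provider advertises. In this Python sketch the vocabulary, the provider names and their advertised concepts are all hypothetical, and composing the providers' answers into an information product is left out.

```python
# Hypothetical vocabulary of one domain ontology: consumer's word -> ontology concept.
VOCABULARY = {
    "aquifer": "Aquifer",
    "groundwater": "Aquifer",
    "river": "Hydrography",
    "stream": "Hydrography",
    "road": "TransportationNetwork",
    "highway": "TransportationNetwork",
}

# Hypothetical providers and the ontology concepts each one advertises to the broker.
PROVIDERS = {
    "state_geological_survey": {"Aquifer", "Hydrography"},
    "dept_of_transportation": {"TransportationNetwork"},
    "newswire": {"News"},
}


def reinterpret(request):
    """Reinterpret the consumer's words as ontology concepts; unknown words are dropped."""
    return {VOCABULARY[w] for w in request.lower().split() if w in VOCABULARY}


def broker(request):
    """Select the relevant providers and the concepts each should be asked about."""
    concepts = reinterpret(request)
    return {
        name: advertised & concepts
        for name, advertised in PROVIDERS.items()
        if advertised & concepts
    }


print(broker("groundwater near highway"))
```

The consumer never has to know which provider holds aquifer data or what it calls it; that knowledge lives in the broker's ontology and provider registry.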
Information Brokering: Three Dimensions
[Figure: THREE DIMENSIONS of information brokering: levels of interoperability (SYSTEM, SYNTAX, STRUCTURE, SEMANTICS) × participants (CONSUMERS, BROKERS, PROVIDERS) × what is brokered (DATA, METADATA, VOCABULARY)]
Objective: reduce the problem of knowing the structure and semantics of data in a huge number of information sources on a global scale to understanding and navigating a significantly smaller number of domain ontologies
WWW
• A confusing heterogeneity of media and formats (Tower of Babel)
• Information correlation using physical (HREF) links at the extensional data level
• Location-dependent browsing of information using physical (HREF) links
• User has to keep track of information content!!

WWW + Information Brokering
• Domain-specific ontologies as “semantic conceptual views”
• Information correlation using concept mappings at the intensional concept level
• Browsing of information using terminological relationships across ontologies
• Higher level of abstraction, closer to the user's view of information!!
What else can Information Brokering do?
Concepts, tools and techniques to support semantics
[Figure: context; media-independent information correlations; semantic proximity; inter-ontological relations; ontologies (esp. domain-specific); profiles; domain-specific metadata]
Tools to support semantics
• Context, context, context
• Media-independent information correlations
• Multiple ontologies
– Semantic Proximity (relationships between concepts within and across ontologies) using domain, context, modeling/abstraction/representation, state
– Characterizing Loss of Information incurred due to differences in vocabulary
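One simple way to characterize the loss of information caused by a vocabulary mismatch, assumed here for illustration rather than taken from the slides, is to compare what a term denotes in the user's ontology with what its translation denotes in the target ontology, using precision and recall. In this Python sketch the term names and their extensions (sets of document ids) are hypothetical; in practice extensions are rarely fully known and would have to be estimated.

```python
# Hypothetical extensions: the sets of items each term denotes in its own source.
SOURCE_EXTENSION = {"d1", "d2", "d3"}  # items meant by "Aquifer" in the user's ontology

# The target ontology has no "Aquifer"; two candidate translations of it.
TARGET_EXTENSIONS = {
    "GroundwaterBody": {"d1", "d2", "d3", "d4", "d5", "d6"},  # broader term (hypernym)
    "ConfinedAquifer": {"d1", "d2"},  # narrower term (hyponym)
}


def information_loss(intended, retrieved):
    """Precision and recall of a translated term's answers against the intended answers."""
    hits = len(intended & retrieved)
    return {
        "precision": hits / len(retrieved) if retrieved else 0.0,
        "recall": hits / len(intended) if intended else 0.0,
    }


for term, extension in TARGET_EXTENSIONS.items():
    print(term, information_loss(SOURCE_EXTENSION, extension))
```

Translating to the broader term keeps every intended answer but halves precision; the narrower term is precise but misses a third of the answers. Making that trade-off explicit is what lets a broker choose between translations.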
BIG challenge: identifying relationships or similarity between objects of different media, developed and managed by different people and systems
Heterogeneity... is a Babel Tower!!
[Figure: SEMANTIC HETEROGENEITY is bridged to SEMANTIC INTEROPERABILITY through metadata, ontologies and contexts]