2nd Semantic Web Mining Workshop at ECML/PKDD-2002, August 2002, Helsinki, Finland
Data Fusion and Semantic Web: Meta-Models of Distributed Data and Decision Fusion
Project Report
Vladimir Gorodetski, Oleg Karsaev, Vladimir Samoilov
Intelligent System Laboratory of the
St. Petersburg Institute for Informatics and Automation
E-mail: {gor, ok, samovl}@mail.iias.spb.su
http://space.iias.spb.su/ai/english/gorodetski.htm
Title of the Project
“Autonomous Information Collection, Knowledge Discovery Techniques and Software Tool
Prototype for Knowledge-Based Data Fusion”
Project from the European Office of Aerospace Research and Development (EOARD), AFRL/IF (USA)
(December 2000 - December 2003)
Outline of the Project Presentation
1. Outline of the Data and Information Fusion problems
2. Project research objectives
3. Examples of case studies and applications used
4. Ontology-centered meta-model of data sources
5. Meta-model of decision fusion
6. Multi-agent architecture
7. Conclusion
Tasks and Applications of Data and Information Fusion
Application Fields
Critical areas of human society: security, life support, security of critical state infrastructures, large-scale logistics, natural and man-made disasters, etc.
Examples of Applications
• Assessment and prediction of situations,
• Resource management and rescue operation planning in large-scale natural and man-made disasters,
• Decision making and planning of rescue operations in systems like US 911,
• Situational awareness and prediction for terrorist intents and anti-terrorist activity planning,
• Military situation assessment,
• Safeguard of critical plants like nuclear power stations, electrical power grids, etc.
Information Fusion: Definition
“…data fusion is a formal framework in which means and tools for the alliance of data originating from different sources are expressed. It aims at obtaining information of greater quality; the exact definition of “greater quality” will depend on the application” (JDL-Joint Directors of Laboratories model, USAF)
[Figure: data fusion levels (Erik Blasch, Fusion-2002, July 2002, Annapolis, USA). Areas of the current and future research projects are highlighted in yellow.]
Inputs: Sensor 1, Sensor 2, ..., Sensor N; distributed data sources; distributed information sources
Level 0: Pre-processing of sensor data
Level 1: Object assessment
Level 2: Situation assessment
Level 3: Impact assessment
Level 4: Process refinement
Level 5: User refinement
Other blocks: Human-Computer interface; Sensor management, resource management; Data Base Management System (Support DB, Fusion DB)
Project Research Objectives
Development of a DF software tool supporting the design (first of all, learning!) and implementation of a broad spectrum of DF applications, in particular providing support for:
• Development of ontology-based meta-models of data sources, a meta-model of decision fusion, and a conceptual model of the DF software tool,
• Development of a multi-agent architecture, and
• Design and implementation of a broad spectrum of applications.
Examples of Case Studies and Applications Used in the Project
Case studies
- KDD Cup99 dataset: preprocessed relational data specifying an intrusion detection task
  http://kdd.ics.uci.edu/databases/kddcup99.html
- Landsat Multi-Spectral Scanner image dataset
  http://www.dfc-grss.org/data/grss_dfc_0010.zip
- STULONG dataset: Longitudinal Study of Atherosclerosis Risk Factors
  http://euromise.vse.cz/challenge/en/projekt/index.php
Application
- Intrusion detection learning system, to be used in debugging and validation of MAS DK-DF (project also funded by EOARD/AFRL)
Subtasks of the Project Matching the Semantic Web Mining Area
1. Design and implementation of a meta-model of data sources, made necessary by the heterogeneity and distribution of the data to be fused.
2. Design and implementation of a meta-model of distributed learning.
Multiplicity of Data Sources Presenting User’s Activity in an Intrusion Detection System
[Figure: data sources of an intrusion detection system, grouped into network-based sources and host-based sources. SPP = statistical processing program.]
Network traffic: network packets (IP header; UDP/TCP header or ICMP header; FTP, TELNET, SMTP, HTTP, DNS data, ...)
- TCPDUMP (WINDUMP) -> Tcpdump -> SPP -> Tcpdump statistical data
- FTP service -> FTP log -> SPP -> FTP statistical data
- Telnet service -> Telnet log -> SPP -> Telnet statistical data
- Mail service -> Mail log -> SPP -> Mail statistical data
- HTTP service -> HTTP log -> SPP -> HTTP statistical data
- DNS service -> DNS log -> SPP -> DNS statistical data
- Auditing subsystem of OS -> Filtered OS audit trail -> SPP -> OS audit trail statistical data
- System programs 1, 2, 3 -> logs (log of commands run by users plus resource; log of all login failures; log of all user logins/logouts and system startups and shutdowns) -> SPP -> statistical data sets 1, 2, 3
Interrelation of Semantic Web and Ontology-oriented Research within the Project
The Semantic Web considers the development and standardization of ontology specification languages (XML, RDF, DAML+OIL), ontology-based query languages, ontology editors, etc.
Semantic Web Mining considers specific problems of ontology design technology for (Web-based) Data Mining systems.
Any DF system technology presupposes (Web-based) distributed Data Mining and KDD, which is why it is a sub-area of Semantic Web Mining.
Ontology-based Data and Information Fusion system design poses a number of specific technological problems. The most important of them is a technology for the distributed design of a distributed ontology.
What is distributed design of distributed ontology? Data Sources Meta-model
[Figure: data sources meta-model]
- Data sources (sensors), each with a Data Source Manager and a Data Source management agent
- "KDD Master" agent (meta-data manager)
- Ontology-based meta-model of data sources
Meta-model = Ontology + data source models at the meta-level, supporting a unified view of the data of particular sources.
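The definition above can be sketched as a small data structure. This is only an illustration: the class and field names (`Ontology`, `DataSourceModel`, `MetaModel`, `unified_view`) and the sample attributes are assumptions, not part of the project. The idea shown is that each source model maps local column names to ontology attributes, so records from any source can be read in ontology terms.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # Attribute names shared by all sources (illustrative only).
    attributes: set[str]


@dataclass
class DataSourceModel:
    # Meta-level description of one source: local column -> ontology attribute.
    name: str
    column_to_attribute: dict[str, str]


@dataclass
class MetaModel:
    ontology: Ontology
    sources: dict[str, DataSourceModel] = field(default_factory=dict)

    def add_source(self, model: DataSourceModel) -> None:
        # Reject source models that refer to attributes missing from the ontology.
        unknown = set(model.column_to_attribute.values()) - self.ontology.attributes
        if unknown:
            raise ValueError(f"unknown ontology attributes: {sorted(unknown)}")
        self.sources[model.name] = model

    def unified_view(self, source: str, record: dict) -> dict:
        # Rename the local columns of a record into ontology terms.
        mapping = self.sources[source].column_to_attribute
        return {mapping[col]: val for col, val in record.items() if col in mapping}


meta = MetaModel(Ontology({"duration", "protocol"}))
meta.add_source(DataSourceModel("tcpdump", {"dur": "duration", "proto": "protocol"}))
view = meta.unified_view("tcpdump", {"dur": 12, "proto": "tcp", "raw": "..."})
```

Columns that have no counterpart in the ontology (here `raw`) simply stay out of the unified view.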
DF system ontology
[Figure: tower of DF application ontology components]
- DF problem ontology
- Shared component of application ontology
- Private components of application ontology of data sources 1, 2, ..., k
Distributed Ontology and Protocols for Distributed Ontology Design
[Figure: distribution of ontology components among agents 1, ..., k]
- "KDD Master" agent (meta-level KDD agent): problem and shared components of application ontology
- Data Source i (i = 1, ..., k): DS-i management agent and KDD agent of source i, holding the shared component and the private component i of application ontology
- Links between the agents: protocols, functions
Particular Tasks to Be Solved on the Basis of the Meta-model of Data Sources
• Providing a monosemantic understanding of the terminology used in data specification by distributed analysts;
• Solving the entity identification problem;
• Providing consistency of data representation (in case the same attributes are presented differently in different data sources);
• Providing a gateway between the ontology and the distributed databases that makes their interaction possible, and several other tasks.
Meta-model of Data Sources: Ontology + Protocols => Monosemantic Understanding of Terminology
Monosemantic understanding of terminology among DF system components is provided by a shared vocabulary that the distributed entities of the DF system use for communication. This excludes different naming of the same entities and their properties in different sources, as well as identical naming of different entities in different data sources, and thus provides integrity and consistency of the shared vocabulary.
Protocols support the distributed collaborative design of a coherent ontology by distributed analysts.
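The two naming conflicts mentioned above (one entity under several names, one name for several entities) can be detected mechanically. The following sketch assumes hypothetical local vocabularies and entity names; it is not the project's protocol, only an illustration of the check a shared vocabulary has to pass.

```python
from collections import defaultdict

# Hypothetical local vocabularies: source -> {local term: shared entity it denotes}.
local_terms = {
    "source_1": {"conn_time": "ConnectionDuration", "host": "Host"},
    "source_2": {"duration": "ConnectionDuration", "host": "HostGroup"},
}


def find_conflicts(vocabularies):
    """Return (synonyms, homonyms) found across the local vocabularies."""
    names_of_entity = defaultdict(set)   # entity -> terms used for it
    entities_of_name = defaultdict(set)  # term -> entities it denotes
    for terms in vocabularies.values():
        for term, entity in terms.items():
            names_of_entity[entity].add(term)
            entities_of_name[term].add(entity)
    # Same entity named differently in different sources.
    synonyms = {e: n for e, n in names_of_entity.items() if len(n) > 1}
    # Same name used for different entities.
    homonyms = {t: e for t, e in entities_of_name.items() if len(e) > 1}
    return synonyms, homonyms


synonyms, homonyms = find_conflicts(local_terms)
```

A vocabulary is consistent in the sense of the slide when both returned dictionaries are empty.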
Example of Application Ontology: High-level Part of Intrusion Detection Domain Ontology
[Figure: high-level part of the intrusion detection domain ontology. Notions are linked by "Part of" and "Subclass of" relationships; abbreviations are given in parentheses.]
Upper-level notions:
- Network attack (A)
- Reconnaissance (R)
- Implantation and threat realization (I)
- Collection of Information (CI)
- Identification of hosts (IH)
- Identification of services (IS)
- Identification of OS (IO)
- Resource Enumeration (RE)
- Users and Groups Enumeration (UE)
- Applications and Banners Enumeration (ABE)
- Getting Access to Resources (GAR)
- Escalating Privilege (ER)
- Gaining Additional Data (GAD)
- Threat Realization (TR)
- Covering Tracks (CT)
- Creating Back Doors (CBD)
- Confidentiality destruction (CD)
- Integrity destruction (ID)
- Denial of Service (DOS)
Notions of micro-layer and lower levels:
- Network Ping Sweeps (DC)
- Port Scanning (SPIH)
- TCP connect scan (ST)
- TCP SYN scan (SS)
- TCP FIN scan (SF)
- TCP Xmas Tree scan (SX)
- TCP Null scan (SN)
- UDP scan (SU)
- Half scan (HS)
- Proxy scanning (PS)
- Dumb host scan (DHS)
- Scanning 'FTP Bounce' (SFB)
The Simplest ("top-down") Meta-protocol for Collaborative Ontology Design
[Figure: top-down meta-protocol]
Participants: application domain expert; meta-data description agent; for each source 1, ..., N, a data preparation agent and a local source expert
Steps:
1. Forming the basic variant of ontology
2. Sending the basic variant to each source
3. Analysis of the suggested basic variant at each source
4. Modifying and expanding the ontology at each source
5. Synchronization of modifications by the basic protocol
Ontology Synchronization Protocol Represented in Terms of a UML Sequence Diagram
[Figure: UML sequence diagram of the ontology synchronization protocol]
Legend (lifelines):
1. Local source expert
2. Local source data managing agent
3. Local source ontology
4. Local source: buffer of temporary changes
5. KDD master (meta-data description agent)
6. Shared ontology
7. Meta-level agent: buffer of temporary changes
8. Application expert (meta-level)
9. Local source determining the modified ontology part
Messages (in extracted order):
- Current state reading
- Request for required ontology descriptions
- Unconfirmed changes buffer query
- Representation of current state of ontology
- Changes of ontology
- Recording the changes
- Forming the current representation of ontology
- Sending current changes to the shared ontology
- Periodic request for suggested changes
- Verification of changes
- Introducing changes
- Confirmation/rejection of suggested changes
- Deletion of verified changes
- Adding changes to ontology
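The core of the synchronization protocol (record a change locally, send it to the shared ontology, have it verified, then confirm or reject it and clear the buffers) can be sketched as follows. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, the ontology is reduced to a set of concept names, and the application expert is modelled as a plain `approve` callback.

```python
from dataclasses import dataclass, field


@dataclass
class SharedOntology:
    concepts: set = field(default_factory=set)
    pending: list = field(default_factory=list)  # meta-level buffer of temporary changes

    def submit(self, source, concept):
        # A local source sends a suggested change to the shared ontology.
        self.pending.append((source, concept))

    def review(self, approve):
        # The application expert verifies each suggested change; confirmed
        # changes are added, and every verified change leaves the buffer.
        verdicts = []
        for source, concept in self.pending:
            accepted = approve(concept)
            if accepted:
                self.concepts.add(concept)
            verdicts.append((source, concept, accepted))
        self.pending.clear()
        return verdicts


@dataclass
class LocalSource:
    name: str
    concepts: set = field(default_factory=set)
    unconfirmed: list = field(default_factory=list)  # local buffer of temporary changes

    def propose(self, shared, concept):
        # Record the change locally, then send it to the shared ontology.
        self.unconfirmed.append(concept)
        shared.submit(self.name, concept)

    def apply_verdicts(self, verdicts):
        # Introduce confirmed changes and delete all verified ones from the buffer.
        for source, concept, accepted in verdicts:
            if source == self.name:
                self.unconfirmed.remove(concept)
                if accepted:
                    self.concepts.add(concept)


shared = SharedOntology({"NetworkAttack"})
src = LocalSource("source_1")
src.propose(shared, "PortScanning")
src.propose(shared, "portscan_tmp")
# Toy verification rule: accept only capitalized concept names.
verdicts = shared.review(approve=lambda c: c[0].isupper())
src.apply_verdicts(verdicts)
```

Keeping unconfirmed changes in separate buffers means neither the local nor the shared ontology ever contains a change that has not been verified.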
Meta-model of Data Sources: Entity Identification Problem
[Figure: explanation of the entity identification problem. Each data source stores its own attributes for its own subset of cases.]
Data Source 1 (attributes of data source 1), # of case: 1, 3, 4, 7, 9, 11, 15, 19
Data Source 2 (attributes of data source 2), # of case: 1, 4, 5, 9, 11, 12, 14, 15, 17, 19
Data Source 3 (attributes of data source 3), # of case: 1, 2, 4, 8, 9, 11, 14, 15
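The case numbers on this slide already show why identification matters: few cases are present in every source. The short sketch below computes that overlap. It assumes that equal case numbers denote the same entity across sources, which is precisely what entity identification has to establish in a real system.

```python
# Case numbers present in each data source, as listed on the slide.
cases = {
    "Data Source 1": {1, 3, 4, 7, 9, 11, 15, 19},
    "Data Source 2": {1, 4, 5, 9, 11, 12, 14, 15, 17, 19},
    "Data Source 3": {1, 2, 4, 8, 9, 11, 14, 15},
}

# Cases described by every source: only these can be fused from all three.
in_all = set.intersection(*cases.values())

# For each case, the sources that hold a fragment of it.
holders = {}
for source, ids in cases.items():
    for case in ids:
        holders.setdefault(case, []).append(source)

# Cases that are missing from at least one source.
partial = sorted(c for c, s in holders.items() if len(s) < len(cases))
```

Here `in_all` contains cases 1, 4, 9, 11 and 15; every other case is only partially described.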
Demonstration of Entity Identification Problem: Intrusion Detection Application
[Figure: entity identification in the intrusion detection application. Each case (Case 1, ..., Case N) corresponds to one connection (Connection 1, ..., Connection N) whose data are scattered over network-based and host-based sources. SPP = statistical processing program.]
- Connection 1: network packets with IP and TCP headers (SYN, ACK, ..., FIN) carrying SMTP data
- Connection N: network packets with IP and TCP headers (SYN, ACK, ..., FIN) carrying FTP data
- TCPDUMP (WINDUMP) -> Tcpdump -> SPP -> Tcpdump statistical data on Connection 1, ..., Connection N
- Mail service -> Mail log -> SPP -> Mail statistical data on Connection 1
- FTP service -> FTP log -> SPP -> FTP statistical data on Connection N
- Auditing subsystem of OS -> Filtered OS audit trail -> SPP -> OS audit trail statistical data on Connection 1, ..., Connection N
- System programs 1, 2, 3 -> logs (log of commands run by users plus resource; log of all user logins/logouts and system startups and shutdowns) -> SPP -> statistical data on Connection 1, ..., Connection N
A Technique for the Entity Identification Problem
In the DF problem ontology, for each instance of an object to be classified, the notion of entity identifier ("ID entity") is introduced. This entity identifier plays the role of the primary key of the instance (in analogy with the primary key of a table).
For each such identifier, a rule is defined as a component of the shared part of the application ontology; it can be used to calculate the value of the instance key. A rule is a function whose arguments are chosen from the set of this entity's attributes. A rule is defined for each local data source to uniquely connect the entity identifier and the local primary key in this source. This rule specifies:
• how to derive the local primary key of an instance from the entity identifier value;
• how to derive the entity identifier value from the value of the local primary key of an instance of the source.
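A minimal sketch of such two-way rules is given below. All names and formats are assumptions made for illustration: the entity identifier is taken to be a string `conn-<number>`, source A is assumed to key its table by the integer number, and source B by a zero-padded string. The `KeyRule` class and the `fuse` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class KeyRule:
    # Pair of functions linking the entity identifier and one source's local key.
    to_local: Callable[[str], object]
    to_entity: Callable[[object], str]


# One rule per local data source (formats are assumed, see the text above).
rules = {
    "source_a": KeyRule(lambda eid: int(eid.split("-")[1]), lambda k: f"conn-{k}"),
    "source_b": KeyRule(lambda eid: eid.split("-")[1].zfill(5),
                        lambda k: f"conn-{int(k)}"),
}

# Toy local tables, each indexed by its own local primary key.
tables = {
    "source_a": {17: {"duration": 12}},
    "source_b": {"00017": {"failed_logins": 3}},
}


def fuse(entity_id: str) -> dict:
    # Collect the fragments of one entity from every source via its key rule.
    merged = {}
    for source, rule in rules.items():
        row = tables[source].get(rule.to_local(entity_id))
        if row is not None:
            merged.update(row)
    return merged


record = fuse("conn-17")
entity_from_b = rules["source_b"].to_entity("00017")
```

Because each rule works in both directions, a fragment found in any source can be traced back to the entity identifier and from there to the matching fragments in the other sources.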
Meta-model of Data Sources: Diversity of Measurement Scales of the Same Attributes in Different Data Sources
Let X be an attribute in the application ontology that is measured differently in different sources.
1. In the shared component of application ontology, the type and the measurement unit of the attribute X are determined. Selection of attribute X specification within shared part of application ontology is made by experts during negotiations according to a synchronization protocol.
2. In all the sources where X is present, expressions are determined for this attribute, through which it can further be converted into the same scale in all the sources.
This allows using the values of attributes on the meta-level regardless of the data source from which they originated.
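As an illustration of the two steps, the sketch below fixes a shared unit for X and attaches one conversion expression to each source. The attribute (a connection duration), the units and the source names are assumptions, not taken from the project.

```python
# Shared specification of attribute X (hypothetical): connection duration in seconds.
SHARED_UNIT = "seconds"

# Per-source conversion expressions into the shared scale (assumed local units).
to_shared = {
    "source_1": lambda ms: ms / 1000.0,          # source 1 stores milliseconds
    "source_2": lambda minutes: minutes * 60.0,  # source 2 stores minutes
    "source_3": lambda s: float(s),              # source 3 already uses seconds
}


def duration_at_meta_level(source: str, raw_value: float) -> float:
    # Convert a local value of X into the scale fixed in the shared ontology.
    return to_shared[source](raw_value)


values = [
    duration_at_meta_level("source_1", 1500),
    duration_at_meta_level("source_2", 0.5),
    duration_at_meta_level("source_3", 30),
]
```

After conversion, the meta-level sees all three values in seconds and does not need to know which source produced them.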
Meta-model of Data Sources: Interaction of Ontology and Databases of Sources
The task arises due to the fact that application ontology entities are specified in terms of ontology notions but their instances are represented in terms of database language.
To provide interaction of the ontology and the databases of sources (accessibility of data requested in ontology terms), a special gateway is developed.
[Figure: three-level hierarchy of access to the database objects]
Components: DF problem ontology; DF application ontology; local source data properties; application; client-gateway; access via VIEW objects; database objects of the local data source
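The role of VIEW objects in such a gateway can be illustrated with SQLite from the Python standard library. This is a sketch under assumptions: the table, view and column names are invented, and the `query` function only stands in for the client-gateway, whose actual design is not described on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Local database object with source-specific column names (hypothetical).
conn.execute("CREATE TABLE tcpdump_stat (conn_no INTEGER, dur_ms INTEGER)")
conn.execute("INSERT INTO tcpdump_stat VALUES (1, 1500), (2, 250)")
# VIEW object exposing the same data under ontology attribute names.
conn.execute(
    "CREATE VIEW connection AS "
    "SELECT conn_no AS entity_id, dur_ms / 1000.0 AS duration FROM tcpdump_stat"
)

# Ontology notions and their attributes, as known to the gateway.
ONTOLOGY_ATTRIBUTES = {"connection": {"entity_id", "duration"}}


def query(concept: str, attributes: list[str]) -> list[tuple]:
    # Client-gateway: accept a request in ontology terms, validate it against
    # the ontology, and run it against the VIEW rather than the base table.
    unknown = set(attributes) - ONTOLOGY_ATTRIBUTES[concept]
    if unknown:
        raise KeyError(f"not in ontology: {sorted(unknown)}")
    sql = f"SELECT {', '.join(attributes)} FROM {concept} ORDER BY entity_id"
    return conn.execute(sql).fetchall()


rows = query("connection", ["entity_id", "duration"])
```

The caller never sees `tcpdump_stat` or `dur_ms`; only names validated against the ontology reach the SQL text.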
Meta-model of Distributed Learning
Components of the meta-model of distributed learning:
• Meta-model of decision making and combining decisions of multiple base-level classifiers;
• Model of distributed data management (allocation of training and testing data sets for learning particular classifiers; management of the computation of meta-data for upper-level example-based learning, etc.);
• Approaches and formal techniques used for combining decisions.
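The idea of combining decisions of base-level classifiers can be sketched as follows. The slides do not name a specific combining technique, so simple majority voting is used here purely as an assumption; the classifiers, thresholds and attribute names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical base-level classifiers, one per data source: each maps the
# attributes visible in its own source to a class label.
base_classifiers = {
    "tcpdump": lambda x: "attack" if x["syn_rate"] > 100 else "normal",
    "os_audit": lambda x: "attack" if x["failed_logins"] > 3 else "normal",
    "ftp_log": lambda x: "attack" if x["ftp_errors"] > 5 else "normal",
}


def meta_data(case: dict) -> dict:
    # Decisions of the base-level classifiers form the meta-level record.
    return {name: clf(case) for name, clf in base_classifiers.items()}


def combine(decisions: dict) -> str:
    # Simple majority vote over base-level decisions.
    return Counter(decisions.values()).most_common(1)[0][0]


case = {"syn_rate": 250, "failed_logins": 1, "ftp_errors": 9}
decisions = meta_data(case)
final = combine(decisions)
```

The `decisions` record is the kind of meta-data on which an upper-level, example-based learner could be trained in place of the fixed voting rule.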
Conclusion: Future Work
1. Development of a sophisticated ontology editor supporting the distributed design of a distributed ontology.
2. Further design and implementation of the Data Fusion System software tool for the development and implementation of particular distributed applications in the Data Fusion area.
For more information and related publications please contact
E-mail: [email protected]
http://space.iias.spb.su/ai/english/gorodetski.htm
Acknowledgement
This research is funded by
AFRL/IF (EOARD), 1999-2003
Thank you!