2nd Semantic Web Mining Workshop at ECML/PKDD-2002, August 2002, Helsinki, Finland

Data Fusion and Semantic Web: Meta-Models of Distributed Data and Decision Fusion

Project Report

Vladimir Gorodetski, Oleg Karsaev, Vladimir Samoilov

Intelligent System Laboratory of the St. Petersburg Institute for Informatics and Automation

E-mail: {gor, ok, samovl}@mail.iias.spb.su

http://space.iias.spb.su/ai/english/gorodetski.htm

Title of the Project

“Autonomous Information Collection, Knowledge Discovery Techniques and Software Tool Prototype for Knowledge-Based Data Fusion”

Project from European Office of Aerospace Research and Development (EOARD) - AFRL/IF (USA) (December 2000 - December 2003)

Outline of the Project Presentation

1. Outline of the Data and Information Fusion problems

2. Project research objectives

3. Examples of case studies and applications used

4. Ontology-centered meta-model of data sources

5. Meta-model of decision fusion

6. Multi-agent architecture

7. Conclusion

Tasks and Applications of Data and Information Fusion

Application Fields

Critical areas of human society: security, life support, security of critical state infrastructures, large-scale logistics, natural and man-made disasters, etc.

Examples of Applications

• Assessment and prediction of situations,

• Resource management and rescue operation planning in large-scale natural and man-made disasters,

• Decision making and planning of rescue operations in systems like US 911,

• Situational awareness and prediction for terrorist intents and anti-terrorist activity planning,

• Military situation assessment,

• Safeguard of critical plants like nuclear power stations, electrical power grids, etc.

Information Fusion-Definition

“…data fusion is a formal framework in which means and tools for the alliance of data originating from different sources are expressed. It aims at obtaining information of greater quality; the exact definition of “greater quality” will depend on the application” (JDL-Joint Directors of Laboratories model, USAF)

Figure: JDL data fusion model (Erik Blasch, Fusion-2002, July 2002, Annapolis, USA). Areas of the current and future research projects are highlighted in yellow.

Inputs: Sensor 1, Sensor 2, ..., Sensor N; distributed data sources; distributed information sources.

Level 0 - Pre-processing of sensor data

Level 1 - Object assessment

Level 2 - Situation assessment

Level 3 - Impact assessment

Level 4 - Process refinement

Level 5 - User refinement

Other blocks: Data Base Management System (Support DB, Fusion DB); Human-Computer interface; Sensor management, resource management.

Project Research Objectives

Development of a DF software tool that supports the design (first of all, learning!) and implementation of a broad spectrum of DF applications, in particular providing support for:

• Development of ontology-based meta-models of data sources, a meta-model of decision fusion, and a conceptual model of the DF software tool,

• Development of a multi-agent architecture, and

• Design and implementation of a broad spectrum of applications.

Examples of Case Studies and Applications Used in the Project

Case studies

- KDD Cup99 dataset: preprocessed relational data specifying an Intrusion Detection task
  http://kdd.ics.uci.edu/databases/kddcup99.html

- Landsat Multi-Spectral Scanner image dataset
  http://www.dfc-grss.org/data/grss_dfc_0010.zip

- STULONG dataset: Longitudinal Study of Atherosclerosis Risk Factors
  http://euromise.vse.cz/challenge/en/projekt/index.php

Application to be used in debugging and validation of MAS DK-DF

- Intrusion detection learning system (project also funded by EOARD/AFRL)

Subtasks of the Project Matching the Semantic Web Mining Area

1. Design and implementation of a meta-model of data sources, made necessary by the heterogeneity and distribution of the data to be fused.

2. Design and implementation of a meta-model of distributed learning.

Multiplicity of Data Sources Presenting User’s Activity in Intrusion Detection System

Figure: data sources of the intrusion detection application.

Network-based sources: network traffic captured by TCPDUMP (WINDUMP) as a stream of network packets. A network packet consists of an IP header, a UDP/TCP header or an ICMP header, and application data (FTP, TELNET, SMTP, HTTP, DNS, ...).

Host-based sources:

• Auditing subsystem of OS: filtered OS audit trail;

• System programs 1, 2, 3: log of commands run by users plus resource; log of all login failures; log of all user logins/logouts and system startups and shutdowns;

• Services and their logs: FTP service (FTP log), Telnet service (Telnet log), Mail service (Mail log), HTTP service (HTTP log), DNS service (DNS log).

SPP - Statistical processing program. An SPP is attached to each raw source and produces the corresponding statistical data: Tcpdump statistical data, OS audit trail statistical data, statistical data sets 1-3 (system programs), and FTP, Telnet, Mail, HTTP and DNS statistical data.

Interrelation of Semantic Web and Ontology-oriented Research within the Project

• Semantic Web considers the development and standardization of ontology specification languages (XML, RDF, DAML+OIL), ontology-based query languages, ontology editors, etc.

• Semantic Web Mining considers specific problems of ontology design technology for (Web-based) Data Mining systems.

• Any DF system technology presupposes (Web-based) distributed Data Mining and KDD, which is why it is a sub-area of Semantic Web Mining.

• Ontology-based Data and Information Fusion system design poses a number of specific technological problems. The most important of them is a technology for distributed design of a distributed ontology.

What is distributed design of distributed ontology? Data Sources Meta-model

Figure: ontology-based meta-model of data sources. Each data source (sensor) is served by its own Data Source Manager and Data Source management agent. At the meta-level, the Meta-data manager and the “KDD Master” agent work with the ontology-based meta-model of data sources.

Meta-model = Ontology + Data source models at meta-level supporting a unified view of data of particular sources

DF system ontology

Figure: tower of DF application ontology components.

• DF Problem ontology

• Shared component of Application ontology

• Private component of application ontology of data source 1, data source 2, ..., data source k

Distributed Ontology and Protocols for Distributed Ontology Design

Figure: agents 1, 2, 3, ..., k interact through protocols and functions.

• “KDD Master” agent (meta-level KDD agent): holds the problem and shared components of the application ontology.

• Data Source i (i = 1, 2, 3, ..., k): served by the DS-i management agent and the KDD agent of source i; holds the shared component of the application ontology and the private component of application ontology-i.

Particular Tasks to Be Solved on the Basis of Meta-model of Data Sources

• Providing for monosemantic understanding of the terminology used in data specification by distributed analysts;

• Solution of the entity identification problem;

• Providing consistency of data representation (in case the same attributes are represented differently in different data sources);

• Providing a gateway that makes interaction between the ontology and the distributed databases possible;

and several other tasks.

Meta-model of Data Sources: Ontology + Protocols => Monosemantic understanding of terminology

Monosemantic understanding of terminology among DF system components is provided by a shared vocabulary that the distributed entities of the DF system use for communication. This excludes different naming of the same entities and their properties in different sources, as well as identical naming of different entities in different data sources, thus providing integrity and consistency of the shared vocabulary.

Protocols support the distributed collaborative design of a coherent ontology by distributed analysts.
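The two kinds of naming conflict that a shared vocabulary is meant to exclude can be illustrated with a minimal sketch. The source names, local terms and concept names below are hypothetical and are not taken from the project; the function only shows one possible way to detect the conflicts, not the tool's actual mechanism.

```python
from collections import defaultdict


def find_naming_conflicts(source_terms):
    """Detect naming conflicts across local vocabularies.

    source_terms maps a source name to {local term: shared concept it denotes}.
    Returns (synonyms, homonyms):
      synonyms: concept -> set of different local terms used for it
      homonyms: local term -> set of different concepts it denotes
    """
    terms_by_concept = defaultdict(set)
    concepts_by_term = defaultdict(set)
    for terms in source_terms.values():
        for term, concept in terms.items():
            terms_by_concept[concept].add(term)
            concepts_by_term[term].add(concept)
    synonyms = {c: t for c, t in terms_by_concept.items() if len(t) > 1}
    homonyms = {t: c for t, c in concepts_by_term.items() if len(c) > 1}
    return synonyms, homonyms


# Hypothetical local vocabularies of two data sources.
sources = {
    "tcpdump": {"src_ip": "SourceAddress", "duration": "ConnectionDuration"},
    "ftp_log": {"client": "SourceAddress", "duration": "SessionLength"},
}
synonyms, homonyms = find_naming_conflicts(sources)
print(synonyms)  # the same entity named differently in different sources
print(homonyms)  # the same name used for different entities
```

In this example "src_ip" and "client" are reported as two names for one concept, and "duration" is reported as one name for two concepts; both cases would have to be resolved by the analysts before the term enters the shared vocabulary.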

Example of Application Ontology: High-level Part of Intrusion Detection Domain Ontology

Figure: hierarchy of notions (diagram codes in parentheses) linked by "Part of" and "Subclass of" relationships; the diagram distinguishes notions of the micro-layer from notions of lower levels.

Network attack (A)

• Reconnaissance (R): Collection of Information (CI), Identification of hosts (IH), Identification of services (IS), Identification of OS (IO), Resource Enumeration (RE), Users and Groups Enumeration (UE), Applications and Banners Enumeration (ABE)

• Implantation and threat realization (I): Getting Access to Resources (GAR), Escalating Privilege (ER), Gaining Additional Data (GAD), Threat Realization (TR), Covering Tracks (CT), Creating Back Doors (CBD)

• Threat Realization: Confidentiality destruction (CD), Integrity destruction (ID), Denial of Service (DOS)

Lower-level notions: Network Ping Sweeps (DC), Port Scanning (SPIH), TCP connect scan (ST), TCP SYN scan (SS), TCP FIN scan (SF), TCP Xmas Tree scan (SX), TCP Null scan (SN), Half scan (HS), UDP scan (SU), Proxy scanning (PS), Dumb host scan (DHS), Scanning 'FTP Bounce' (SFB)

The Simplest ("top-down") Meta-protocol for Collaborative Ontology Design

Participants: Application domain expert; Meta-data description agent; for each source 1, ..., N, a Local source expert and a Data preparation agent.

1. Forming the basic variant of ontology

2. Sending the basic variant (to each source)

3. Analysis of the suggested basic variant (at each source)

4. Modifying and expanding the ontology (at each source)

5. Synchronization of modifications by the basic protocol (for each source)

Ontology Synchronization Protocol Represented in Terms of UML-sequence Diagram

Legend (lifelines 1-9):

1. Local source expert

2. Local source data managing agent

3. Local source ontology

4. Local source: buffer of temporary changes

5. KDD master (Meta-data description agent)

6. Shared ontology

7. Meta-level agent: buffer of temporary changes

8. Application expert (meta-level)

9. Local source determining the modified ontology part

Messages and actions shown in the diagram:

• Current state reading

• Request for required ontology descriptions

• Unconfirmed changes buffer query

• Representation of current state of ontology

• Changes of ontology

• Recording the changes

• Forming the current representation of ontology

• Sending current changes to the shared ontology

• Periodic request for suggested changes

• Verification of changes

• Introducing changes

• Confirmation/rejection of suggested changes

• Deletion of verified changes

• Adding changes to ontology
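The buffer-and-confirm idea behind the synchronization protocol can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's protocol: the class names, the `verify` rule (which stands in for the application expert's decision) and the example notions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    source: str       # local source that suggested the change
    concept: str      # ontology notion being introduced
    definition: str


@dataclass
class SharedOntology:
    concepts: dict = field(default_factory=dict)


@dataclass
class LocalSource:
    name: str
    ontology: dict = field(default_factory=dict)  # local view of the ontology
    buffer: list = field(default_factory=list)    # unconfirmed changes

    def suggest(self, concept, definition):
        """Record a change locally until the meta-level confirms or rejects it."""
        self.buffer.append(Change(self.name, concept, definition))


def verify(shared, change):
    """Hypothetical check: reject a name that is already defined differently."""
    existing = shared.concepts.get(change.concept)
    return existing is None or existing == change.definition


def synchronize(shared, sources):
    """One round: request suggested changes, verify them, confirm or reject,
    delete the processed changes, and send the current state back."""
    rejected = []
    for src in sources:
        for change in src.buffer:
            if verify(shared, change):
                shared.concepts[change.concept] = change.definition
            else:
                rejected.append(change)
        src.buffer.clear()
    for src in sources:
        src.ontology = dict(shared.concepts)
    return rejected


shared = SharedOntology({"Connection": "TCP session between two hosts"})
s1, s2 = LocalSource("tcpdump"), LocalSource("ftp_log")
s1.suggest("PortScan", "probing a range of ports")
s2.suggest("Connection", "FTP login session")  # clashes with the shared notion
rejected = synchronize(shared, [s1, s2])
print(sorted(shared.concepts), [c.source for c in rejected])
```

After one round, the non-conflicting notion is part of the shared ontology and visible to both sources, while the conflicting suggestion is returned to its author as rejected.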

Meta-model of Data Sources: Entity Identification Problem

Explanation of Entity Identification Problem

Each data source holds its own attributes for its own subset of cases:

Data source   | # of case
Data Source 1 | 1, 3, 4, 7, 9, 11, 15, 19
Data Source 2 | 1, 4, 5, 9, 11, 12, 14, 15, 17, 19
Data Source 3 | 1, 2, 4, 8, 9, 11, 14, 15
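Using the case numbers listed for the three data sources above, a short sketch shows why entity identification matters: only the cases present in every source can be fused from complete data. The variable names are illustrative.

```python
# Case numbers as listed for each data source on the slide.
cases = {
    "Data Source 1": {1, 3, 4, 7, 9, 11, 15, 19},
    "Data Source 2": {1, 4, 5, 9, 11, 12, 14, 15, 17, 19},
    "Data Source 3": {1, 2, 4, 8, 9, 11, 14, 15},
}

all_cases = set().union(*cases.values())        # every case seen anywhere
complete = set.intersection(*cases.values())    # cases present in all sources
missing = {src: sorted(all_cases - ids) for src, ids in cases.items()}

print(sorted(complete))  # [1, 4, 9, 11, 15]
for src, ids in missing.items():
    print(src, "lacks cases", ids)
```

Only 5 of the 14 distinct cases are described by all three sources, and even those can be matched only if each source's records can be tied to the same entity.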

Demonstration of Entity Identification Problem: Intrusion Detection Application

Figure: the entity to be identified is a connection (Case 1 = Connection 1, ..., Case N = Connection N).

Network-based sources: TCPDUMP (WINDUMP) captures the packets of each connection: IP Hdr + TCP Hdr (SYN), a series of IP Hdr + TCP Hdr + application data packets (SMTP Data for Connection 1, FTP Data for Connection N), IP Hdr + TCP Hdr (ACK), IP Hdr + TCP Hdr (FIN). SPP turns the Tcpdump output into Tcpdump statistical data on Connection 1, ..., Connection N.

Host-based sources:

• Mail service, Mail log: SPP produces Mail statistical data on Connection 1;

• FTP service, FTP log: SPP produces FTP statistical data on Connection N;

• Auditing subsystem of OS, filtered OS audit trail: SPP produces OS audit trail statistical data on Connection 1, ..., Connection N;

• System programs 1, 2, 3 (log of commands run by users plus resource; log of all user logins/logouts and system startups and shutdowns): SPP produces statistical data on Connection 1, ..., Connection N.

A Technique for Entity Identification Problem

In the DF problem ontology, the notion of an entity identifier ("ID entity") is introduced for each instance of an object to be classified. This entity identifier plays the role of the primary key of the instance (by analogy with the primary key of a table).

For each such identifier, a rule is defined as a component of the shared part of the application ontology; it is used to calculate the value of the instance key. A rule is a function whose arguments are chosen from the set of attributes of the entity. A rule is defined for each local data source to uniquely connect the entity identifier with the local primary key in this source. This rule specifies:

• how to derive the local primary key of an instance from the entity identifier value;

• how to derive the entity identifier value from the value of the local primary key of an instance of the source.
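The pair of derivations that such a rule specifies can be sketched as two functions per source. Everything concrete here is an assumption made for illustration: the entity identifier is taken to be the tuple (address, port, start time), and the two local key formats are invented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class KeyRule:
    """Rule connecting the shared entity identifier with one source's local key."""
    to_local: Callable   # entity identifier -> local primary key
    to_entity: Callable  # local primary key -> entity identifier


# Assumed entity identifier: (source address, source port, start time).
def tcpdump_to_local(entity_id):
    ip, port, start = entity_id
    return f"{ip}:{port}@{start}"           # assumed local key format


def tcpdump_to_entity(key):
    addr, start = key.split("@")
    ip, port = addr.rsplit(":", 1)
    return (ip, int(port), int(start))


def ftp_to_local(entity_id):
    ip, port, start = entity_id
    return (start, ip, port)                # assumed composite local key


def ftp_to_entity(key):
    start, ip, port = key
    return (ip, port, start)


RULES = {
    "tcpdump": KeyRule(tcpdump_to_local, tcpdump_to_entity),
    "ftp_log": KeyRule(ftp_to_local, ftp_to_entity),
}


def translate_key(local_key, from_source, to_source):
    """Find the record of the same entity in another source via the entity ID."""
    entity_id = RULES[from_source].to_entity(local_key)
    return RULES[to_source].to_local(entity_id)


ftp_key = translate_key("10.0.0.5:2121@1030000000", "tcpdump", "ftp_log")
print(ftp_key)  # (1030000000, '10.0.0.5', 2121)
```

Because each rule is invertible, a record found in one source can be matched to the records of the same entity in every other source without the sources knowing each other's key formats.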

Meta-model of Data Sources: Diversity of Measurement Scales of the Same Attributes in Different Data Sources

Let X be an attribute in the application ontology that is measured differently in different sources.

1. In the shared component of the application ontology, the type and the measurement unit of the attribute X are determined. The specification of attribute X within the shared part of the application ontology is selected by experts during negotiations according to a synchronization protocol.

2. In every source where X is present, an expression is defined for this attribute that converts its local values into the common scale.

This allows using the values of attributes at the meta-level regardless of the data source from which they originated.
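A minimal sketch of the two steps, assuming a hypothetical attribute "duration" whose shared unit is seconds and three invented sources that store it in milliseconds, minutes and seconds:

```python
# Step 1: type and unit fixed in the shared component (assumed values).
SHARED_SPEC = {"duration": {"type": float, "unit": "s"}}

# Step 2: one conversion expression per (source, attribute) pair.
TO_SHARED = {
    ("tcpdump", "duration"): lambda ms: ms / 1000.0,          # milliseconds
    ("ftp_log", "duration"): lambda minutes: minutes * 60.0,  # minutes
    ("os_audit", "duration"): lambda s: float(s),             # already seconds
}


def to_shared_scale(source, attribute, value):
    """Convert a local value to the scale defined in the shared ontology."""
    return TO_SHARED[(source, attribute)](value)


print(to_shared_scale("tcpdump", "duration", 1500))  # 1.5
print(to_shared_scale("ftp_log", "duration", 2))     # 120.0
```

Meta-level components then only ever see values in the shared unit and need no knowledge of the local scales.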

Meta-model of Data Sources: Interaction of Ontology and Databases of Sources

The task arises because application ontology entities are specified in terms of ontology notions, whereas their instances are represented in terms of a database language.

To provide interaction between the ontology and the databases of sources (that is, access to data requested in ontology terms), a special gateway is developed.

Figure: three-level hierarchy of access to the database objects: DF problem ontology, DF application ontology, local source data properties. The DF application reaches the database objects of a local data source through a client-gateway, with access via VIEW objects.
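The idea of reaching database objects through VIEW objects can be sketched with Python's standard `sqlite3` module. The table, view, notion and attribute names are hypothetical, and the project's actual gateway is not described in enough detail here to reproduce it; the sketch only shows a request in ontology terms being answered from a local table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Local database object with source-specific column names.
    CREATE TABLE ftp_sessions (sess_id INTEGER PRIMARY KEY, cli_addr TEXT, secs REAL);
    INSERT INTO ftp_sessions VALUES (1, '10.0.0.5', 12.5), (2, '10.0.0.9', 3.0);
    -- VIEW exposing the table under ontology attribute names.
    CREATE VIEW v_connection AS
        SELECT sess_id AS entity_id, cli_addr AS source_address, secs AS duration
        FROM ftp_sessions;
""")

# Gateway mapping: ontology notion -> (view name, attributes it offers).
ONTOLOGY_TO_VIEW = {
    "Connection": ("v_connection", {"entity_id", "source_address", "duration"}),
}


def query(connection, notion, attributes):
    """Answer a request phrased in ontology terms from the local database."""
    view, allowed = ONTOLOGY_TO_VIEW[notion]
    unknown = set(attributes) - allowed
    if unknown:
        raise KeyError(f"attributes not defined for {notion}: {sorted(unknown)}")
    # Identifiers come only from the whitelist above, so formatting is safe here.
    sql = f"SELECT {', '.join(attributes)} FROM {view} ORDER BY entity_id"
    return connection.execute(sql).fetchall()


rows = query(conn, "Connection", ["entity_id", "duration"])
print(rows)  # [(1, 12.5), (2, 3.0)]
```

The caller never sees `ftp_sessions`, `cli_addr` or `secs`; renaming them in the local database only requires redefining the view.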

Meta-model of Distributed Learning

Components of the meta-model of distributed learning:

• Meta-model of decision making and combining decisions of multiple base-level classifiers;

• Model of distributed data management (allocation of training and testing data sets for learning particular classifiers; management of the computation of meta-data for upper-level example-based learning, etc.);

• Approaches and formal techniques used for combining decisions.
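As an illustration of combining decisions of base-level classifiers, the sketch below uses plain majority voting. This is only one simple combining technique, chosen here for brevity; the slides do not state which techniques the project uses. The base classifiers, thresholds and attribute names are invented.

```python
from collections import Counter


def majority_vote(decisions):
    """Return the most frequent label; ties are broken alphabetically."""
    counts = Counter(decisions)
    top = max(counts.values())
    return sorted(label for label, c in counts.items() if c == top)[0]


# Hypothetical base-level classifiers, one per data source.
base_classifiers = {
    "tcpdump": lambda x: "attack" if x["syn_rate"] > 100 else "normal",
    "ftp_log": lambda x: "attack" if x["failed_logins"] > 3 else "normal",
    "os_audit": lambda x: "attack" if x["root_cmds"] > 0 else "normal",
}

# One entity (e.g. a connection) as seen by each source.
instance = {
    "tcpdump": {"syn_rate": 250},
    "ftp_log": {"failed_logins": 0},
    "os_audit": {"root_cmds": 2},
}

# The vector of base decisions is the meta-data passed to the upper level.
decisions = [base_classifiers[s](instance[s]) for s in sorted(base_classifiers)]
print(decisions, "->", majority_vote(decisions))
```

The same vector of base-level decisions could instead be used as a training example for a meta-level classifier, which is the example-based alternative mentioned in the list above.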

Conclusion: Future work

1. Development of a sophisticated ontology editor supporting distributed design of a distributed ontology.

2. Further design and implementation of the Data Fusion System software tool for the development and implementation of particular distributed applications in the Data Fusion area.


For more information and related publications please contact

E-mail: [email protected]

http://space.iias.spb.su/ai/english/gorodetski.htm

Acknowledgement

This research is funded by AFRL/IF (EOARD), 1999-2003

Thank you!