Current Advances in Data Mining: Multimedia Data Mining and

Information Management and Data Mining

Presented by: Dr. Herna L Viktor
Others: Dr. Iluju Kiringa, Dr. Thomas Tran, Dr. Liam Peyton

Post on 22-Apr-2015




Information overload: “The amount of knowledge in the world has doubled in the past ten (10) years and is doubling every 18 months” (American Society for Training and Development, ASTD)

- Massive petabyte (2^50 bytes) data repositories: e.g., it is estimated that Google maintains 4 petabytes of RAM.
- E-Commerce and the Web: a digital marketplace; eHealth
- Data sharing: data must be available “anywhere, any time, and in almost any form”
- The “Digital Rosetta Stone”: our digital heritage is in danger of being lost due to the silent obsolescence of current technology

OUR RESEARCH:
- How do we share/store/preserve this data?
- What information can we use to improve our decision making?
- How do we obtain/extract and explore the hidden knowledge?


Information Management and Data Mining Research: Five Themes

Data/Information Management:
- (T1) Dr. Iluju Kiringa: Data sharing
- (T2) Dr. Herna L Viktor: Relational and multimedia data mining
- (T3) Dr. Thomas Tran: Software agents for e-Commerce
- (T4) Dr. Herna L Viktor: Long-term preservation of data
- (T5) Dr. Liam Peyton: Accessible data warehousing for e-health


(T1) Data Sharing: Dr. Iluju Kiringa

Data must be available “anywhere, any time, and in almost any form”; thus we must cope with:
- very large networks of data sources
- complex heterogeneity among the sources
- inconsistent data across the sources
- data sharing and exchange between the sources
- etc.

Several applications illustrate this need:
- Genomic data
- E-health
- Enterprise alliances


Background and Goals: Dr. Iluju Kiringa

Background: data sharing on peer-to-peer (P2P) networks
- P2P networks are open-ended networks of distributed computational nodes (peers)
- Each peer can directly exchange data and/or services with a set of other peers
- Peers act autonomously, including when joining or leaving the network
- Peers are not subject to global control in the form of global registries, global services, global resource management, or a global schema and data repository
- Mostly used for sharing files (plain text, songs, movies, video, etc.); some examples are:
  - Napster, Gnutella, Kazaa: file-sharing applications
  - SETI@home: distributed computing application

Research goal: enhance data sharing on P2P networks to offer the same high-quality access to data that classical distributed relational DBMSs offer


Data Sharing Research Issues: Dr. Iluju Kiringa

- Heterogeneity management
  - Interoperability of peer databases
  - Syntactic and semantic heterogeneity
- Dynamics and scale management
  - Protocols for peer databases to join/leave networks
- Query processing via propagation
  - Query propagation through the network
  - Query optimization
- Data coordination using
  - update propagation
  - distributed triggers
- Transaction processing
  - Design non-classical transaction models and correctness criteria
  - Implement the models
- Service-oriented architecture
  - Design and compare several possible architectures for a peer DBMS
  - Implement some of these architectures
  - Deploy a real network
- Applications
- Theory behind data sharing
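
The "query propagation through the network" item can be pictured with a small sketch. This is a hypothetical illustration, not the group's peer DBMS: the `Peer` class, the `ttl` hop limit, the depth-first forwarding strategy, and the demo records are all assumptions chosen for brevity. Each peer answers from its local data and forwards the query to its acquaintances, with no global registry involved.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """A hypothetical peer holding local records and a list of acquaintances."""
    name: str
    records: list = field(default_factory=list)
    neighbours: list = field(default_factory=list)

    def query(self, predicate, ttl=3, seen=None):
        """Answer locally, then forward to neighbours until ttl runs out."""
        seen = set() if seen is None else seen
        if self.name in seen:          # this peer already answered the query
            return []
        seen.add(self.name)
        results = [(self.name, r) for r in self.records if predicate(r)]
        if ttl > 0:                    # hop limit keeps the flood bounded
            for peer in self.neighbours:
                results.extend(peer.query(predicate, ttl - 1, seen))
        return results


def build_demo_network():
    """Three peers in a line (A - B - C) with made-up genomic records."""
    a = Peer("A", records=[{"gene": "BRCA1"}])
    b = Peer("B", records=[{"gene": "TP53"}])
    c = Peer("C", records=[{"gene": "BRCA1"}, {"gene": "EGFR"}])
    a.neighbours = [b]
    b.neighbours = [a, c]
    c.neighbours = [b]
    return a, b, c
```

Calling `a.query(lambda r: r["gene"] == "BRCA1")` on the demo network collects matches from peers A and C; lowering `ttl` to 1 stops the query before it reaches C. A real peer DBMS would also have to translate the query between heterogeneous schemas at each hop, which this sketch leaves out.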


(T2) Data Mining: Dr. Herna L Viktor

Multi-relational data mining and link mining
- Aim to directly mine a relational database, without extensive preprocessing or “flattening”

[Figure: entity-relationship diagram linking Doctor, Patient, Medicine and Illness through the relationships consult, gives, has and takes; attributes shown include name, sin, patientid, address, MedId and Illid.]
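
To see why "flattening" is something to avoid, consider a minimal sketch using Python's built-in `sqlite3` module. The two-table schema (`patient`, `takes`) and its rows are hypothetical, loosely borrowing names from the diagram. Joining everything into a single table repeats each patient once per linked record, whereas working against the relational structure keeps one row per patient. The aggregate query below only stands in for that idea; it is not a multi-relational mining algorithm.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database with made-up patients and prescriptions."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE patient (patientid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE takes (patientid INTEGER, medid INTEGER);
        INSERT INTO patient VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO takes VALUES (1, 10), (1, 11), (1, 12), (2, 10);
    """)
    return con


def flattened_rows(con):
    """Join everything into one table: a patient appears once per medicine."""
    return con.execute(
        "SELECT p.patientid, p.name, t.medid "
        "FROM patient p JOIN takes t ON p.patientid = t.patientid "
        "ORDER BY p.patientid, t.medid"
    ).fetchall()


def relational_features(con):
    """Keep one row per patient and summarise the linked table instead."""
    return con.execute(
        "SELECT p.patientid, p.name, COUNT(t.medid) "
        "FROM patient p LEFT JOIN takes t ON p.patientid = t.patientid "
        "GROUP BY p.patientid, p.name ORDER BY p.patientid"
    ).fetchall()
```

With the demo data, the flattened view has four rows for two patients, so a learner that treats rows as independent examples would count Ann three times. The per-patient view keeps two rows and carries the link information as a derived feature.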


Data Mining: Dr. Herna L Viktor

Multimedia (2D and 3D) data mining
- Searching for similarities in multimedia databases
- Locating clusters of images, 3D objects
- Classifying images, 3D objects within a cluster

Applications
- Anthropometry (poster)
- Health care
- Cultural heritage
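
Similarity search in a multimedia database is commonly done by describing each image or 3D object with a numeric feature vector and ranking stored objects by their closeness to a query vector. The sketch below assumes that setup. The slides do not say which descriptors or distance measure the group uses, so the cosine measure, the three-dimensional vectors, and the object ids are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query, database, k=2):
    """Return the k object ids whose feature vectors are closest to the query."""
    ranked = sorted(
        database,
        key=lambda oid: cosine_similarity(query, database[oid]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical feature vectors; real descriptors would have many more dimensions.
FEATURES = {
    "scan_a": [1.0, 0.0, 0.2],
    "scan_b": [0.9, 0.1, 0.3],
    "scan_c": [0.0, 1.0, 0.0],
}
```

For example, `most_similar([1.0, 0.0, 0.25], FEATURES)` ranks `scan_a` and `scan_b` ahead of `scan_c`. The same similarity function could feed a clustering step, which corresponds to the "locating clusters" item above.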


(T3) Software Agents in E-Commerce: Dr. Thomas Tran

The concept of an agent provides a convenient and powerful way to describe a complex software entity that is capable of acting with a certain degree of autonomy in order to accomplish tasks on behalf of its user.

An agent is defined in terms of its behavior.


Supporting Decision Making: Dr. Thomas Tran

- Designing Intelligent Business Software Agents for E-Commerce
- Modeling Trust and Reputation in E-Commerce
- Developing Agent-Based Frameworks for Mobile Business
- Designing Recommender Systems for E-Commerce


(T4) Long-term preservation of data: Dr. Herna L Viktor

The “Digital Rosetta Stone”:
- The lifetime of a digital file is only a few decades
- We might need the digital file in 50+ years
- Our repositories may become “data morgues”, containing data in formats that cannot be interpreted by present and future generations

Towards a solution…


Long-term preservation of data: Dr. Herna L Viktor

Research issues
- scalability of information and infrastructure
- managing heterogeneous data sources
- handling updating of hardware and software
- transparent storage, management and retrieval

“to investigate effective ways to store, maintain and analyze digital objects over a very long period of time (50 years +)”

Approach:
- Detachment from original media
- Transparent migration to new technologies
- Emulate old software on new technologies


Long-term preservation of data: Dr. Herna L Viktor

Architectural framework

[Figure: architectural framework. Stages: data acquisition; archiving; retrieval, trend analysis; visualization, exploration. Steps: build metadata (index); store object and metadata; retrieve object, metadata (index); generate visual interface; generate data store. Components: DBA Agent; IBM DB2 Data Warehouse.]
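
The acquisition, metadata and retrieval steps named in the framework can be sketched in a few lines. This is a hypothetical stand-in: the slides name an IBM DB2 data warehouse, while the `Archive` class below keeps everything in Python dictionaries, and the metadata fields (`sha256`, `format`, `size`) are assumptions about what a preservation index might record.

```python
import hashlib


class Archive:
    """Hypothetical in-memory stand-in for the object store and its metadata index."""

    def __init__(self):
        self.objects = {}   # checksum -> raw bytes, detached from the original media
        self.index = {}     # object name -> metadata record

    def ingest(self, name, payload, fmt):
        """Data acquisition, metadata building and storage in one step."""
        digest = hashlib.sha256(payload).hexdigest()
        self.objects[digest] = payload
        self.index[name] = {"sha256": digest, "format": fmt, "size": len(payload)}
        return digest

    def retrieve(self, name):
        """Look the object up through its metadata and verify it is intact."""
        meta = self.index[name]
        payload = self.objects[meta["sha256"]]
        if hashlib.sha256(payload).hexdigest() != meta["sha256"]:
            raise ValueError(f"stored copy of {name!r} is corrupted")
        return payload, meta
```

Recording the format alongside a checksum is one plausible way to support the "transparent migration" approach: a later migration job can find every object in an obsolete format from the index and confirm that each stored copy is unchanged before converting it.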


(T5) Evolving E-Health Business Processes Around Accessible Data Warehouses: Dr. Liam Peyton

Goals
- Process improvement to take advantage of e-technologies and the data warehouse (DW)
- Methodology to specify, automate, manage, and analyze DW-oriented e-health processes
- Addresses privacy, confidentiality, quality, and consent, as well as heavy legacy (and often manual) processes and regulatory environments

Activities
- Simulation of Ottawa Hospital Data Warehouse and environment
- Business Intelligence prototype: infection control data mart, discharge process data mart
- Quality Assurance Framework and Portal


Assessment Framework Tied to Operational Systems, Performance Management & Data Warehouse Strategy

[Figure: assessment framework. Labels: Stakeholders; Goals; Tasks; Use Case Maps; Business Systems & Processes; Performance Mgt Systems & Processes; Data Warehouse; Reports; PIQ.]

PIQ measures the effectiveness of reports in measuring how well the organization is meeting its goals.


In Summary: Vast, evolving repositories…


Google in 2003 had between 2 and 5 petabytes of hard-disk storage. A more recent calculation, dated June 27, 2006, suggests that the Google cluster may now have 4 petabytes of RAM, on the same order of magnitude as the quantity of hard disk space that was estimated only three years earlier.

As of October 15, 2005, all the files being shared on Kazaa totaled around 54 petabytes.

15 petabytes of data will be generated each year in particle physics experiments using CERN’s Large Hadron Collider, due to be launched in 2007.

In 2007, NOAA maintains approximately 1 petabyte of climate data. NOAA expects that its Comprehensive Large Array-data Stewardship System (CLASS) library will hold 20 petabytes of data by 2011 and 140 petabytes by 2020.
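
A quick calculation puts the NOAA figures in perspective. The sketch below assumes smooth compound growth between the quoted milestones, which the source does not state, and uses the binary definition of a petabyte (2^50 bytes). The function name `annual_growth` is chosen for illustration.

```python
PETABYTE = 2 ** 50  # bytes, binary definition


def annual_growth(start, end, years):
    """Constant yearly growth factor that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years)


# NOAA figures quoted above, in petabytes.
growth_2007_2011 = annual_growth(1, 20, 2011 - 2007)
growth_2011_2020 = annual_growth(20, 140, 2020 - 2011)

print(f"2007-2011: x{growth_2007_2011:.2f} per year")
print(f"2011-2020: x{growth_2011_2020:.2f} per year")
```

Under that assumption the archive would slightly more than double every year between 2007 and 2011, then grow by roughly a quarter per year through 2020.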


Our research aims to develop new, efficient ways to manage, share and analyze such data


Graduate students: Dr. Thomas Tran

- Richong Zhang (PhD)
- Zhiyong Weng (MCS)
- Vikas Kumar (MCS)
- Xiaoguang Ma (MCS)
- Tapu Kumar Ghose (MCS)
- Catherine Cormier (MSc)
- Hong Chen (MSc)
- Bo Zhan (MCS, co-supervised with Prof. Liam Peyton)
- Yao Gu (MCS, part time)


Graduate students and their projects: Dr. Herna L Viktor

- Hongyu Guo (PhD): Multi-view learning
- Rana Awada (PhD): XML database mining (prelim)
- Nadia Azam (M.Sc.): Link-based clustering
- Bo Wang (M.Sc.): A storage resource broker agent for long-term preservation
- Divine Muhivu (M.Sc.): Data integration through link mining
- Isis Pena Sanchez (M.Sc.): Interestingness measurements for data mining
- Minjie Shao (M.Sc.): Mining the adverse effects of medication
- Xiaomei Xia (M.Sc.): Distributed data warehouse query processing

Joining us: Julie Doyle, PhD: Long-term preservation of data
Collaborations: NRC, Faculty of Management


Graduate students: Dr. Liam Peyton

Masters students:
- Sepideh Ghanavati
- Pierre Seguin
- Bo Zhan

Collaboration with:
- Prof. Daniel Amyot (Ottawa)
- Prof. Greg Richards (Ottawa)
- Prof. Michael Weiss (Carleton)
- Dr. Alan Forster (Ottawa Hospital)


Graduate students and collaborations: Dr. Iluju Kiringa

- Have implemented an experimental peer DBMS
- This is joint work with
  - Renee Miller (Toronto)
  - John Mylopoulos (Toronto & Trento)
  - Vasiliki Kantere (Athens, NTUA)
  - Anastasios Kementsietsidis (Edinburgh)
  - Several students in Toronto: Lei Jiang, Dan Zhao, Patricia Rodriguez
  - and Ottawa: Mehedi Masud, Anisur Rahman, Irfan Maki
  - Several alumni …
- More (strong) students are needed!
- Here is a link to visit: http://www.cs.toronto.edu/db/hyperion