Data Mining with AURA Jim Austin University of York & Cybula Ltd


Page 1: Data Mining with AURA Jim Austin University of York & Cybula Ltd

Data Mining with AURA

Jim Austin

University of York

&

Cybula Ltd

Page 2

22 Oct 2001 2

Overview

• AURA
– Background to AURA
– Brief overview of its components
– Its implementation

• AURA within UK e-Science
– What is e-Science
– The DAME pilot project
– Use of AURA in DAME

• GRID issues in DM

Page 3

The AURA Technology

• Neural network based associative storage

• Set of tools to build fast pattern recognition systems

• Aimed at unstructured data

• Aimed at large datasets

• Scalable technology

Page 4

AURA as a basis for search

• The game is to remove the chaff using AURA.

• Later processes find the exact match.
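
The two-stage idea above can be sketched as a cheap filter followed by an exact check. This is only an illustration: `coarse_filter` is a hypothetical stand-in for the AURA stage (here a crude shared-character test, not a CMM), and `exact_match` stands in for the later, slower process.

```python
def coarse_filter(query, records, min_shared):
    # Cheap stand-in for the AURA stage (assumption): keep any record that
    # shares at least `min_shared` distinct characters with the query.
    q = set(query)
    return [r for r in records if len(q & set(r)) >= min_shared]


def exact_match(query, candidates):
    # Stand-in for the "later process": an exact substring test,
    # run only on the records that survived the filter.
    return [r for r in candidates if query in r]


records = [
    "engine vibration high",
    "fuel flow normal",
    "a bit of variation",
    "oil pressure low",
]
candidates = coarse_filter("vibration", records, min_shared=8)
matches = exact_match("vibration", candidates)
```

The filter may let through false positives (here "a bit of variation"), which is acceptable as long as it discards most of the data and the exact stage removes what remains.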

Page 5

The storage system

• Correlation Matrix Memory based

• Exploits threshold logic methods

• Uses distributed encoding of information

• Implemented using binary ‘weights’ for efficient software and hardware implementation
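
As a rough illustration of the points above, a binary correlation matrix memory can be sketched in a few lines. The class and method names here are hypothetical and are not the AURA library API; the training rule shown (set a weight where an input bit and an output bit are both 1) is the usual binary CMM rule, assumed rather than taken from the slides.

```python
class BinaryCMM:
    """Toy binary correlation matrix memory (illustrative, not the AURA API)."""

    def __init__(self, n_in, n_out):
        self.n_in = n_in
        self.n_out = n_out
        # One row of binary weights per input line, all initially 0.
        self.rows = [[0] * n_out for _ in range(n_in)]

    def train(self, inp, out):
        # Set weight (i, j) to 1 wherever input bit i and output bit j are both 1.
        for i in range(self.n_in):
            if inp[i]:
                for j in range(self.n_out):
                    if out[j]:
                        self.rows[i][j] = 1

    def recall(self, inp, threshold):
        # Sum the rows selected by the input, then apply the threshold T.
        sums = [0] * self.n_out
        for i in range(self.n_in):
            if inp[i]:
                for j in range(self.n_out):
                    sums[j] += self.rows[i][j]
        return [1 if s >= threshold else 0 for s in sums]


cmm = BinaryCMM(6, 4)
cmm.train([1, 0, 1, 0, 0, 0], [1, 0, 0, 0])
cmm.train([0, 1, 0, 0, 1, 0], [0, 0, 1, 0])
recalled = cmm.recall([1, 0, 1, 0, 0, 0], threshold=2)
partial = cmm.recall([1, 0, 0, 0, 0, 0], threshold=1)
```

Lowering the threshold, as in the `partial` call, lets an incomplete input still retrieve its stored association.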

Page 6

[Figure: diagram of the correlation matrix memory. Labels: Inputs, weights, Threshold T, and the symbols M, P and R]

Page 7

Why is it fast?

• Access only rows that are activated by inputs.

• Inputs are made as sparse as possible and have fixed weight (a fixed number of set bits).

• Only need to sum over active rows (bit vectors) – ideal for most processors

• Great for bit vector machines (DAP!).
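
A minimal sketch of why sparse inputs help, assuming each matrix row is stored as a bit vector (here a Python integer, with bit j standing for output column j). The function name and data layout are illustrative assumptions, not the AURA implementation.

```python
def recall_sparse(rows, active_inputs, threshold, n_out):
    """Return the output columns whose sum over the active rows reaches threshold.

    rows: list of ints, one bit vector per input line (assumed layout).
    active_inputs: indices of the set bits in the input pattern.
    """
    sums = [0] * n_out
    for i in active_inputs:  # rows for inactive inputs are never read
        row = rows[i]
        j = 0
        while row:
            if row & 1:
                sums[j] += 1
            row >>= 1
            j += 1
    return [j for j, s in enumerate(sums) if s >= threshold]


rows = [0b0001, 0b0100, 0b0001, 0b0000, 0b0100, 0b0000]
active = [0, 2]  # a fixed-weight input with two set bits
hits = recall_sparse(rows, active, threshold=len(active), n_out=4)
```

The work done is proportional to the number of set input bits, not to the size of the matrix. With fixed-weight inputs, one natural choice is to set the threshold to the input weight, as done here.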

Page 8

Use of the CMM

[Figure: Data and Query → CMM system → Data subset → Slow algorithm → Final data]

Page 9

CMM system

[Figure: Operations. Pre-process (prepare data) → CMM system → Post process]

Page 10

Pre-processing

• Implements a number of pre-processors
– N-grams for text strings
– CMAC for numeric data
– Graphs for images and graphics
– Tokens for logical data
– Quantisation for time series
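
To make the first of these concrete, here is a hedged sketch of turning a text string into a binary vector via its n-grams. The slides do not say how AURA maps n-grams to bit positions; hashing each n-gram with `zlib.crc32` is an assumption made only for this example, as are the function names.

```python
import zlib


def ngram_code(text, n=3, n_bits=64):
    # Set one bit per n-gram; the bit position is a hash of the n-gram
    # (assumed mapping, chosen only to keep the sketch short).
    bits = [0] * n_bits
    for k in range(len(text) - n + 1):
        gram = text[k:k + n]
        bits[zlib.crc32(gram.encode("utf-8")) % n_bits] = 1
    return bits


def overlap(a, b):
    # Number of bit positions set in both codes.
    return sum(x & y for x, y in zip(a, b))


code_a = ngram_code("engine")
code_b = ngram_code("engines")
shared = overlap(code_a, code_b)
```

Because "engines" contains every trigram of "engine", every bit set in `code_a` is also set in `code_b`, so similar strings produce heavily overlapping codes.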

Page 11

Post processing

• Data selected by the CMM must be accessed quickly.

• Uses ‘best bit index’ method to match output data and recover stored data.

Page 12

Implementation

• The AURA C++ library

• Implemented on PC or workstation

• Beowulf parallel cluster

• Origin 2000 supercomputer

• Bespoke hardware

Page 13

AURA parallel implementation

28 dedicated PCI based processors

Beowulf configuration

3.5Gb memory size

Cortex-1

Page 14

UK e-Science

• Aims to build on the concept of Grids
– To make computing and data provision as direct and simple as electrical power delivery

• £110M initiative started 18 months ago

• DAME is a £3.5M pilot project to demonstrate its application in the engineering field.

Page 15

DAME Objectives

• DAME: Distributed Aircraft Maintenance Environment.

• Demonstrate diagnostic capability on the GRID

• Examine timeliness properties of the GRID

• Demonstrate on the RR Aeroengine diagnostic problem

Page 16

Rolls-Royce

University of Oxford, Lionel Tarassenko.

University of Leeds, Peter Dew, Alison McKay.

York, J Austin, J McDermid, A Wellings.

University of Sheffield, P Fleming.

Rolls-Royce, Derby.

Data Systems & Solutions.

Cybula Ltd.

Page 17

[Figure: Engine flight data. Labels: Airline office, Maintenance Centre, European data center, London Airport, New York Airport, American data center, Grid, Diagnostics centre]

Page 18

Diagnostic issues

• The system must analyse and report
– Novel engine operation
– Identify any cause of events
– Do this quickly

• Data
– Large (many Tb)

Page 19

Data – Zmod plots

Page 20

How does AURA contribute?

• Search technology for multi-media data

• Parallel pattern match engine based on neural networks.

• Built on Correlation Matrix Memories.

• High performance Beowulf and dedicated hardware implementations.

• Commercially sold by Cybula Ltd.

Page 21

[Figure: system diagram. Labels: Engine data, Data reduction processes, Features, Data used to identify novelty, Quote, Novelty indication, Data stores/data warehouse, Diagnostic station, Data to be searched for, Match requests, Pattern match results, AURA-G, GRID, Diagnosis]

Page 22

Simple example of processing chain

[Figure: Data sample → DM coding → CMM → Matching previous events]

Page 23

Typical pre-processing

DM coding: 01101111011110111 (1 up and 0 down)

• Fast

• Preserves information

• Produces a binary vector

[Figure: plot of frequency against time]
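
The "1 up and 0 down" coding can be sketched as follows, assuming that "DM" on this slide denotes delta modulation. The slides give no step size or variant, so the fixed `step` and the function name are assumptions made for illustration.

```python
def dm_encode(samples, step=1.0):
    # Track the signal with a running approximation; emit 1 when the
    # approximation must move up to follow the signal, 0 when it must move down.
    approx = samples[0]
    bits = []
    for s in samples[1:]:
        if s >= approx:
            bits.append(1)
            approx += step
        else:
            bits.append(0)
            approx -= step
    return bits


signal = [0, 1, 2, 1, 0, 1]
code = dm_encode(signal, step=1.0)
```

Each sample costs one comparison and one addition, and the result is already a binary vector of the kind a CMM takes as input.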

Page 24

AURA-G

• This is a Globus enabled AURA implementation.

• Developed under DAME

• Will be available end of 2002 for use in other problems.

Page 25

AURA-G

• Support of scalable pattern matching

• Supports distributed search, across multiple CMM engines at different sites

• OGSA compliant
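
As a loose sketch of distributed search, the same query can be sent to several engines in parallel and the partial results merged. Nothing here reflects the actual AURA-G or Globus interfaces: the engines are plain local functions created by a hypothetical `make_engine` helper, and a thread pool stands in for remote calls.

```python
from concurrent.futures import ThreadPoolExecutor


def make_engine(site, records):
    # Hypothetical stand-in for a CMM engine at one site: a substring search
    # over that site's records, tagging each hit with the site name.
    def search(query):
        return [(site, r) for r in records if query in r]
    return search


def distributed_search(query, engines):
    # Send the query to every engine concurrently, then merge the partial results.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        partials = list(pool.map(lambda engine: engine(query), engines))
    return [hit for part in partials for hit in part]


engines = [
    make_engine("site-a", ["vibration spike", "normal run"]),
    make_engine("site-b", ["oil warning", "vibration drift"]),
]
hits = distributed_search("vibration", engines)
```

Since each engine searches only its own data, the data stays where it is stored and only queries and matches cross the network.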

Page 26

Grid Issues in Data Mining

• Data provenance

• Standards:
– Data transparency independent of location
– Managing DB/Data mining link in distributed system
– OGSA DAI

Page 27

Conclusions

• AURA is a mature component for data search and retrieval

• Robust software and hardware implementation available

• Applications in e-Science for Grid applications underway

Page 28

Contacts

Jim Austin

Dept Computer Science, University of York, York,

YO10 5DD.

www.cs.york.ac.uk/arch

[email protected]

01904 432734

01904 432767

Cybula Ltd.

www.cybula.com

01377 236382

DAME : www.cs.york.ac.uk/dame