fair use agreement

Page 1: Fair Use Agreement

08/25/2004 KDD ‘04 1

Fair Use Agreement

This agreement covers the use of all slides on this CD-ROM; please read carefully.

• You may freely use these slides for teaching, if
  – You send me an email telling me the class number/university in advance.
  – My name and email address appear on the first slide (if you are using all or most of the slides), or on each slide (if you are just taking a few slides).

• You may freely use these slides for a conference presentation, if
  – You send me an email telling me the conference name in advance.
  – My name appears on each slide you use.

• You may not use these slides for tutorials, or in a published work (tech report/conference paper/thesis/journal etc.). If you wish to do this, email me first; it is highly likely I will grant you permission.

(c) Eamonn Keogh, [email protected]

Page 2: Fair Use Agreement

Visually Mining and Monitoring Massive Time Series

Jessica Lin*, Eamonn Keogh, Stefano Lonardi (UC Riverside)
Jeffrey Lankford, Donna Nystrom (The Aerospace Corp)

Page 3: Fair Use Agreement

Motivation

• Before the launch of any unmanned space vehicle, a critical “go/no go” decision must be made.
  – Data from past launches is available to assist in the decision-making.
  – Streaming telemetry must be constantly monitored to detect any potential problems.

• A single framework is needed to perform these two tasks.
  – Existing tools are inadequate for such tasks.

Page 4: Fair Use Agreement

Introduction

• We introduce VizTree
  – Mining archival data
    • Pattern discovery
      – repeated pattern discovery (motif discovery)
      – anomaly detection
      – query-by-content
  – Monitoring incoming streaming data

• Why visualization?
  – The human eye is often advocated as the ultimate data-mining tool
  – User interaction allows visual data exploration and hypothesis testing

Page 5: Fair Use Agreement

Outline

• Introduction
• Related Works
• VizTree Motivation
• VizTree Implementation
  – Time Series Discretization
• Experimental Evaluation
• Diff Tree
• Discussion/Conclusion

Page 6: Fair Use Agreement

Related Work 1: Time Series Spirals

• Spiral Axis = serial attributes are encoded as line thickness
• Radii = periodic attributes

Carlis & Konstan, UIST-98
Independently rediscovered by Weber, Alexa & Müller, InfoVis-01
But dates back to 1888!

[Figure: spiral plots of one year of power demand data. Labels: Jan 1, Dec 23; Monday 00:01, Friday 23:59; Monday through Sunday; both axes run from -400 to 400.]

Page 7: Fair Use Agreement

Related Work 2: TimeSearcher

Comments
• Simple and intuitive
• Highly dynamic exploration
• Query power may be limited and simplistic
• Limited scalability

Hochheiser and Shneiderman

Page 8: Fair Use Agreement

Related Work 3 – Calendar-based

The cluster and calendar-based visualization on employee working hours data. It shows six clusters, representing different working-day patterns.

Page 9: Fair Use Agreement

Motivation of VizTree

10001000101001000101010100001010100010101110111101011010010111010010101001110101010100101001010101110101010010101010110101010010110010111011110100011100001010000100111010100011100001010101100101110101

01011001011110011010010000100010100110110101110000101010111011111000110110110111111010011001001000110100011110011011010001011110001011010011011001101000000100110001001110000011101001100101100001010010

Here are two sets of bit strings. Which set is generated by a human and which one is generated by a computer?

Page 10: Fair Use Agreement

VizTree

10001000101001000101010100001010100010101110111101011010010111010010101001110101010100101001010101110101010010101010110101010010110010111011110100011100001010000100111010100011100001010101100101110101

01011001011110011010010000100010100110110101110000101010111011111000110110110111111010011001001000110100011110011011010001011110001011010011011001101000000100110001001110000011101001100101100001010010

“humans usually try to fake randomness by alternating patterns”

Let's put the sequences into a depth-limited tree, such that the frequencies of all triplets are encoded in the thickness of branches…

[Figure: depth-limited binary tree with branches labeled 0 and 1.]
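The triplet-counting idea can be sketched in a few lines of Python. This is only an illustration: the function name `triplet_frequencies` is hypothetical, and counting overlapping (sliding-window) triplets is an assumption, since the slide does not say how the triplets are extracted.

```python
from collections import Counter


def triplet_frequencies(bits, depth=3):
    """Count every overlapping substring of length `depth` in a bit string.

    Each distinct substring corresponds to one root-to-leaf path in a
    tree of depth `depth`; its count would set the branch thickness.
    """
    return Counter(bits[i:i + depth] for i in range(len(bits) - depth + 1))


# A strictly alternating sequence uses only two of the eight possible triplets.
freqs = triplet_frequencies("010101")
```

A sequence that over-alternates concentrates its counts on a few branches such as "010" and "101", while a truly random sequence spreads them roughly evenly over all eight.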

Page 11: Fair Use Agreement

VizTree

The “trick” on the previous slide only works for discrete data, but time series are real-valued.

But we can SAX up a time series to make it discrete!

VizTree
• Convert the time series to SAX
• Push the data into a depth-limited suffix tree
• Encode the frequencies as the line thickness

Overview, zoom & filter, details on demand

[Figure: VizTree screenshot with panels labeled Overview, Zoom in, Details 1, and Details 2.]
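The three steps can be sketched as a small counting trie. This is a minimal sketch, not the VizTree implementation: `Node`, `build_tree`, and `branch_thickness` are hypothetical names, the SAX words are assumed to be already computed, and the linear mapping from count to line width is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0                                 # words passing through this branch
    children: dict = field(default_factory=dict)   # symbol -> Node


def build_tree(words, depth):
    """Insert each SAX word into a tree truncated at `depth` symbols."""
    root = Node()
    for word in words:
        node = root
        for symbol in word[:depth]:
            node = node.children.setdefault(symbol, Node())
            node.count += 1
    return root


def branch_thickness(node, total_words, max_width=10.0):
    """Scale a branch's count to a drawing width (linear scaling assumed)."""
    return max_width * node.count / total_words


words = ["abdb", "abcc", "acbb"]
root = build_tree(words, depth=4)
```

Here the branch for "a" carries all three words and the branch for "ab" carries two, so "ab" would be drawn thicker than "ac".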

Page 12: Fair Use Agreement

SAX: Symbolic Aggregate approXimation

[Figure: a time series with its SAX word, baabccbc.]

Page 13: Fair Use Agreement

How do we obtain SAX?

First convert the time series to the PAA (Piecewise Aggregate Approximation) representation, then convert the PAA to symbols.

It takes linear time.

[Figure: a time series C (x-axis 0 to 120) and its PAA approximation, with segments mapped to the symbols a, b, and c; word shown: bccbaaba.]
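The two-step conversion can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `paa` and `sax` are hypothetical, the series is z-normalized first, the breakpoints are chosen so that each symbol is equally likely under a standard normal distribution, and the series length is assumed to be a multiple of the number of segments.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment.

    Assumes len(series) is a multiple of n_segments; any remainder is dropped.
    """
    size = len(series) // n_segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(n_segments)]


def sax(series, n_segments, alphabet="abc"):
    """Convert a real-valued series to a SAX word in a single linear pass."""
    mu = mean(series)
    sigma = pstdev(series) or 1.0          # avoid dividing by zero on flat series
    normalized = [(x - mu) / sigma for x in series]
    k = len(alphabet)
    # Breakpoints that split N(0, 1) into k equal-probability regions (assumed).
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)]
                   for v in paa(normalized, n_segments))


word = sax([0, 0, 0, 0, 5, 5, 5, 5, 10, 10, 10, 10], n_segments=3)
```

Each value is touched a constant number of times (normalization, segment mean, one lookup per segment), which is consistent with the linear-time claim on the slide.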

Page 14: Fair Use Agreement

Visual Comparison

A raw time series of length 128 is transformed into the word “aaaaaabbbccdeffdcbbdcdefffffdccbb.”
  – We can use more symbols to represent the time series since each symbol requires fewer bits than real numbers (float, double)

[Figure: the same time series (y-axis -3 to 3) approximated by DFT, PLA, Haar, and APCA, alongside the SAX representation with symbols a through f.]
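The storage argument can be made concrete with a little arithmetic. The sketch below assumes fixed-width symbol codes and 32-bit floats for the raw values; both are assumptions for illustration, and the helper names are hypothetical.

```python
from math import ceil, log2


def bits_per_symbol(alphabet_size):
    """Bits for one symbol under a fixed-width code (assumed encoding)."""
    return max(1, ceil(log2(alphabet_size)))


def storage_bits(n_values, bits_each):
    """Total bits needed to store n_values items of bits_each bits."""
    return n_values * bits_each


word = "aaaaaabbbccdeffdcbbdcdefffffdccbb"
sax_bits = storage_bits(len(word), bits_per_symbol(6))  # alphabet a..f
raw_bits = storage_bits(128, 32)                        # 128 single-precision floats
```

With a six-letter alphabet each symbol fits in 3 bits, so the symbolic word takes a small fraction of the space of the raw series, leaving room to spend more symbols for the same budget.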

Page 15: Fair Use Agreement

Subsequence Matching/Motif Discovery

This example demonstrates subsequence matching and motif discovery. We want to find a U-shaped pattern, so we’d try something that starts high, descends, and then ascends again. Clicking on “abdb” shows such patterns.

[Figure: VizTree screenshot with a “Zoom in” panel.]

Overview, zoom & filter, details on demand (Ben Shneiderman)

Page 16: Fair Use Agreement

Motif Discovery

Clicking on “abxx” shows these repeated patterns
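Clicking a branch amounts to looking up every subsequence whose SAX word follows that path. The sketch below is hypothetical: `index_words` and `query` are illustrative names, the words are assumed to come from a sliding window over the series (so a list position doubles as a subsequence offset), and reading the trailing "x" characters in "abxx" as wildcards is an assumption about the slide's notation.

```python
from collections import defaultdict


def index_words(words):
    """Map each SAX word to the offsets of the subsequences that produced it."""
    index = defaultdict(list)
    for offset, word in enumerate(words):
        index[word].append(offset)
    return index


def query(index, pattern, wildcard="x"):
    """Return offsets of words matching `pattern`; trailing wildcards match anything.

    Assumes the wildcard character is not itself a symbol of the alphabet.
    """
    prefix = pattern.rstrip(wildcard)
    hits = []
    for word, offsets in index.items():
        if len(word) == len(pattern) and word.startswith(prefix):
            hits.extend(offsets)
    return sorted(hits)


words = ["abdb", "acbb", "abcc", "abdb"]
index = index_words(words)
exact_hits = query(index, "abdb")    # exact word, as in the subsequence-matching example
branch_hits = query(index, "abxx")   # every word under the "ab" branch
```

A thick branch returns many offsets (a candidate motif), and the same lookup serves query-by-content when the user already has a shape in mind.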

Page 17: Fair Use Agreement

Anomaly Detection 1

Clicking on the branch “acxx” shows the anomalous heartbeat

Page 18: Fair Use Agreement

Anomaly Detection 2

Clicking on “bab” shows the anomalous week (Christmas). Instead of a normal week with 5 working days, it has only 3 working days during Christmas.
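Both anomaly examples presumably rely on the anomalous pattern sitting on a thin, rarely visited branch. A minimal sketch of that idea, assuming the SAX words are already computed and treating "rarest word" as a stand-in for "thinnest branch" (the function name `rarest_words` is hypothetical):

```python
from collections import Counter


def rarest_words(words, k=1):
    """Return the k least frequent SAX words, ties broken alphabetically."""
    counts = Counter(words)
    return sorted(counts, key=lambda w: (counts[w], w))[:k]


# Toy data: 51 ordinary weeks and one unusual week (values are made up).
weeks = ["bcb"] * 51 + ["bab"]
candidates = rarest_words(weeks)
```

Rarity alone only nominates candidates; the user still inspects the underlying subsequences, which is the point of the visual interface.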

Page 19: Fair Use Agreement

Diff Tree

• Convert the two time series to SAX
• Push the data into a depth-limited suffix tree
• Encode the difference of frequencies as the line thickness
• Encode the significance of difference as the line color intensity
• Rank the surprising patterns

Blue lines - pattern is more common in A
Green lines - pattern is more common in B
Red lines - surprising patterns
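The Diff Tree steps can be sketched as a comparison of two word-frequency tables. The slides do not define the significance measure, so the relative difference used below is purely an assumption, as are the names `diff_tree_scores` and `rank_surprising`.

```python
from collections import Counter


def diff_tree_scores(words_a, words_b):
    """For each SAX word, return (frequency difference, significance).

    Frequencies are normalized by the number of words in each series.
    Significance here is |difference| / larger frequency, an assumed measure.
    Both inputs are assumed to be non-empty.
    """
    freq_a, freq_b = Counter(words_a), Counter(words_b)
    scores = {}
    for word in freq_a.keys() | freq_b.keys():
        p_a = freq_a[word] / len(words_a)
        p_b = freq_b[word] / len(words_b)
        diff = p_a - p_b                      # > 0: more common in A, < 0: in B
        scores[word] = (diff, abs(diff) / max(p_a, p_b))
    return scores


def rank_surprising(scores):
    """Order words from most to least surprising."""
    return sorted(scores, key=lambda w: (-scores[w][1], -abs(scores[w][0]), w))


scores = diff_tree_scores(["ab", "ab", "ac", "ab"], ["ab", "ac", "ac", "ad"])
ranking = rank_surprising(scores)
```

In this toy input "ad" ranks first because it occurs only in the second series; the sign of the difference would pick the blue or green color and the significance its intensity.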

Page 20: Fair Use Agreement

Diff Tree 2

Page 21: Fair Use Agreement

Scalability

• The pixel space of the tree is determined solely by the number of segments and alphabet size.
  – Constant and independent of the size of the time series
  – The size of the dataset plays a role in memory space, since each node in the tree stores the offsets of its subsequences. However, SAX allows efficient numerosity reduction to reduce the number of subsequences included in the tree.

• Large amounts of dimensionality reduction do not greatly affect the accuracy of our results (for the power dataset, the dimensionality is reduced from 672 to 3, a compression ratio of 224-to-1).
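Both scalability points can be illustrated briefly. The sketch assumes that the tree has one leaf per possible word and that numerosity reduction means keeping a word only when it differs from the word of the preceding window; the slide names the technique without defining it, so that reading and the function names are assumptions.

```python
def tree_leaf_count(alphabet_size, n_segments):
    """Number of leaves in the full tree; independent of the series length."""
    return alphabet_size ** n_segments


def numerosity_reduce(words):
    """Keep (offset, word) only where the word changes from the previous window."""
    kept = []
    previous = None
    for offset, word in enumerate(words):
        if word != previous:
            kept.append((offset, word))
            previous = word
    return kept


leaves = tree_leaf_count(alphabet_size=4, n_segments=4)
reduced = numerosity_reduce(["ab", "ab", "ac", "ac", "ab"])
compression_ratio = 672 / 3   # the power dataset figures quoted above
```

The leaf count fixes the drawing area regardless of how long the series is, while the reduction step limits how many offsets each node has to store.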

Page 22: Fair Use Agreement

Conclusion

• We propose VizTree, a novel time series visualization tool for pattern discovery.
  – Frequently occurring patterns
  – Anomaly detection
  – Query-by-content

• A single framework that allows both mining of the archival data and monitoring of streaming data.

• Highly scalable.

Page 23: Fair Use Agreement

Future Work

• Researchers from other sectors of the industry can greatly benefit from our system as well.
  – It could potentially be used for indexing and editing video sequences.

• Problems that can be indirectly solved:
  – Subsequence Clustering
  – Time Series Rule Discovery

• While we mainly focus on the “mining” aspect in this paper, we will extend VizTree to accept online streaming data for monitoring purposes.