discovering lag interval for temporal dependencies


Page 1: Discovering Lag Interval For Temporal Dependencies

Discovering Lag Interval For Temporal Dependencies

Liang Tang, Tao Li, Larisa Shwartz

Liang Tang, Tao Li: {ltang002,taoli}@cs.fiu.edu

Larisa Shwartz: [email protected]

Page 2: Discovering Lag Interval For Temporal Dependencies

An Example for Time Lag

Disk_Capacity ⟶[5min,6min] Database, where [5min,6min] is the lag interval.

[Figure: event timeline. Time stamps (minutes): 3, 5, 7, 8, 9, 11, 13, 15, 17, 23. Rows: Disk_Capacity (events A), Database (events B), App_Heartbeat (events C). Lags of 5 and 6 minutes are marked between dependent A and B events.]

Why is the time lag important?

• If the time lag is close to 0, the database is writing a huge log.

• If the time lag is larger than 0, the disk is really full.

Page 3: Discovering Lag Interval For Temporal Dependencies

Problem Definition

Our problem: Given a temporal dependency A⟶B (when event A happens, B will also happen), what is the time lag between the dependent events A and B?

Why study this problem: The time lag indicates the cause of the temporal dependency.

Page 4: Discovering Lag Interval For Temporal Dependencies

Related Work

Existing methods typically do one of the following:

• Ask the user to predefine a time window for analyzing the event associations (the user may not know a suitable window).

• Assume the temporal dependency is not interleaved (two dependent events A and B have no other A or B between them).

[Figure: the same event timeline as in the time-lag example (Disk_Capacity, Database, App_Heartbeat), annotated "Overlap (Interleaved)" where dependent A and B pairs overlap.]

Page 5: Discovering Lag Interval For Temporal Dependencies

Relation with Other Temporal Patterns

Temporal Pattern      Description                              Temporal Dependency with Lag Interval
Mutually Dependent    {A,B}                                    A ⟶[0,t] B, B ⟶[0,t] A
Partial Periodic      A with period p and time tolerance δ     A ⟶[p-δ,p+δ] A
Frequent Episode      A->B->C                                  A ⟶[0,t] B, B ⟶[0,t] C
Loose Temporal        B follows A before t                     A ⟶[0,t] B
Stringent Temporal    B follows A about t                      A ⟶[t-δ,t+δ] B

These temporal patterns can be seen as temporal dependencies with particular constraints on the time lag.

Page 6: Discovering Lag Interval For Temporal Dependencies

Challenges for Finding Time Lag

Given a temporal dependency A⟶[t1,t2]B, what kind of lag interval [t1,t2] do we want to find?

• If the lag interval is too large, every A and every B would be “dependent”.

• If the lag interval is too small, truly dependent A and B might not be captured.

Time complexity is too high.

• In A⟶[t1,t2]B, t1 and t2 can each be the distance between any two time stamps.

• There are O(n^4) possible lag intervals.

Page 7: Discovering Lag Interval For Temporal Dependencies

What Is a Qualified Lag Interval

If [t1,t2] is qualified, we should observe many occurrences of A⟶[t1,t2]B.

Lag Interval    Number of Occurrences
[0,1]           3
[5,6]           4
[0,6]           4
[0,+∞]          4

[Figure: event timeline. Time stamps (minutes): 3, 5, 7, 8, 9, 11, 13, 15, 17, 23. Rows: Disk_Capacity (events A), Database (events B), App_Heartbeat (events C).]

As the length of the lag interval grows, the number of occurrences also grows.
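The counting behind the table can be sketched as follows. This is only an illustration under an assumption: an "occurrence" is taken to be an A that has at least one B whose time lag falls in [t1,t2]. The function name `count_occurrences` and the time stamps are hypothetical and are not the data plotted on the slide.

```python
from bisect import bisect_left


def count_occurrences(a_times, b_times, t1, t2):
    """Count the As that have at least one B with time lag in [t1, t2].

    Assumption: one occurrence per A, however many Bs fall in the interval.
    """
    b_sorted = sorted(b_times)
    count = 0
    for a in a_times:
        # First B at or after a + t1; it matches if it is also <= a + t2.
        i = bisect_left(b_sorted, a + t1)
        if i < len(b_sorted) and b_sorted[i] <= a + t2:
            count += 1
    return count


# Hypothetical time stamps (minutes).
a_times = [0, 10, 20]
b_times = [5, 16, 40]
print(count_occurrences(a_times, b_times, 5, 6))             # 2
print(count_occurrences(a_times, b_times, 0, 20))            # 3
print(count_occurrences(a_times, b_times, 0, float("inf")))  # 3
```

Widening the interval can only keep or increase the count, which is why a raw count alone cannot decide whether an interval is qualified.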

Page 8: Discovering Lag Interval For Temporal Dependencies

What Is a Qualified Lag Interval

Intuition:

• If B is randomly and independently distributed, how many occurrences would be observed in a time interval [t1,t2]?

• What is the minimum number of occurrences?

Consider the number of occurrences in a lag interval r to be a variable, n_r. Then use the chi-square test to judge whether it is caused by randomness or not:

\[
\chi^2_r = \frac{(n_r - n_A P_r)^2}{n_A P_r (1 - P_r)}, \qquad P_r = \frac{|r| \, n_B}{T}
\]

Here n_A is the number of As, n_B is the number of Bs, |r| is the length of the lag interval, T is the time frame of the event sequence, and n_A P_r is the expected value of n_r.
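A minimal numeric sketch of this test, assuming the statistic is (n_r - n_A P_r)^2 / (n_A P_r (1 - P_r)) with P_r = |r| n_B / T. The function name, the sample numbers, and the use of 3.84 (the standard 95% critical value of the chi-square distribution with one degree of freedom) as the threshold are illustrative assumptions, not values taken from the slides.

```python
def chi_square(n_r, n_a, n_b, interval_len, total_time):
    """Chi-square score of observing n_r occurrences in a lag interval.

    p_r is the chance that a randomly placed B falls in the interval after
    a given A; n_a * p_r is the count expected under pure randomness.
    """
    p_r = interval_len * n_b / total_time
    if not 0.0 < p_r < 1.0:
        raise ValueError("p_r must lie strictly between 0 and 1")
    expected = n_a * p_r
    return (n_r - expected) ** 2 / (expected * (1.0 - p_r))


# Hypothetical numbers: 50 As, 50 Bs, time frame 1000, interval length 2.
score = chi_square(n_r=40, n_a=50, n_b=50, interval_len=2, total_time=1000)
print(round(score, 1))   # about 272.2, since expected = 5 and p_r = 0.1
print(score > 3.84)      # True: 40 occurrences are unlikely to be random
```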

Page 9: Discovering Lag Interval For Temporal Dependencies

Brute-Force Algorithm

Algorithm: For A⟶[t1,t2]B, for every possible t1 and t2, scan the event sequence and count the number of occurrences.

Time Complexity

• The number of distinct time stamps is O(n).

• The number of possible values for each of t1 and t2 is O(n^2).

• The number of possible [t1,t2] is O(n^4).

• Each scan is O(n).

• The total cost is O(n^5).

Cannot handle large event sequences.
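The brute-force search can be sketched as below. Several simplifications are assumptions of this sketch rather than the authors' implementation: candidate endpoints are restricted to the non-negative A-to-B time lags, an interval is kept when its count reaches a hypothetical `min_count` (instead of the chi-square test), and each count is a naive scan over all A and B pairs.

```python
from itertools import combinations_with_replacement


def brute_force(a_times, b_times, min_count):
    """Try every candidate [t1, t2] and count occurrences by a full scan."""
    # Candidate endpoints: every distinct non-negative lag from an A to a B.
    lags = sorted({b - a for a in a_times for b in b_times if b >= a})
    results = []
    # The list is sorted, so each generated pair satisfies t1 <= t2.
    for t1, t2 in combinations_with_replacement(lags, 2):
        # Occurrences: As with at least one B whose lag lies in [t1, t2].
        n = sum(
            1 for a in a_times
            if any(t1 <= b - a <= t2 for b in b_times)
        )
        if n >= min_count:
            results.append((t1, t2, n))
    return results


# Hypothetical time stamps.
for t1, t2, n in brute_force([0, 10, 20], [5, 16, 40], min_count=3):
    print(f"[{t1},{t2}] -> {n} occurrences")
```

With O(n^2) candidate endpoints the pair loop alone visits O(n^4) intervals, which is the blow-up the slide describes.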

Page 10: Discovering Lag Interval For Temporal Dependencies

Maximum Length of Qualified Lag Interval

The length of a qualified lag interval cannot be very long: when you increase the length of the lag interval, the minimum threshold for the number of occurrences also increases.

Lemma 2: Any qualified lag interval’s length is less than T/N ∙ 1/minsup.

Event sample rate (the polling interval in system monitoring): a small constant.

Page 11: Discovering Lag Interval For Temporal Dependencies

STScan Algorithm

Idea: Avoid redundant scanning by storing all time lags in a sorted table.

[Figure: sorted table. A linked list of time lags in ascending order (E1 = 0, E2 = 20, E3 = 85, E4 = 120, then 230, 245, ...). Each entry Ei links to IAi, the indices of A, and IBi, the indices of B, of the event pairs with that time lag.]

Example:

Event sequence:  ...  A     A     B     ...
Time stamp:      ...  3010  3010  3030  ...
Index:           ...  3     4     5     ...

t(x5) - t(x3) = 3030 - 3010 = 20. E2 is 20, so insert 3 into IA2 and insert 5 into IB2.

Page 12: Discovering Lag Interval For Temporal Dependencies

STScan Algorithm

Every lag interval is represented as a sub-segment of the linked list. For example, [20,120] is E2E3E4, and its number of occurrences is |IA2 ∪ IA3 ∪ IA4|.

[Figure: the sorted table from the previous slide, with the linked list of time lags E1, E2, E3, E4, ... and the index sets IA1, IA2, IA3, ... and IB1, IB2, IB3, ...]

• Time cost for creating this table is O(n^2).

• The number of elements is O(3n^2) = O(n^2).

• Time cost for scanning is O(n^2).
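The sorted table can be sketched with two dictionaries keyed by time lag. This is an illustrative reading of the slides, not the authors' code: the names `build_sorted_table` and `occurrences` and the event sequence are hypothetical, except that events x3 (A at 3010) and x5 (B at 3030) mirror the slide's example, and a sorted Python list stands in for the linked list.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_sorted_table(events):
    """events: list of (time stamp, type), type 'A' or 'B', indexed from 1.

    Returns the sorted distinct time lags plus, for each lag, the index
    sets IA (indices of A) and IB (indices of B) of the pairs with that lag.
    """
    ia, ib = defaultdict(set), defaultdict(set)
    indexed = list(enumerate(events, start=1))
    for i, (t_a, kind_a) in indexed:
        if kind_a != "A":
            continue
        for j, (t_b, kind_b) in indexed:
            if kind_b == "B" and t_b >= t_a:
                lag = t_b - t_a
                ia[lag].add(i)
                ib[lag].add(j)
    return sorted(ia), ia, ib


def occurrences(lags, ia, t1, t2):
    """Size of the union of IA over the sub-segment of lags in [t1, t2]."""
    lo, hi = bisect_left(lags, t1), bisect_right(lags, t2)
    covered = set()
    for lag in lags[lo:hi]:
        covered |= ia[lag]
    return len(covered)


events = [(3000, "A"), (3005, "B"), (3010, "A"), (3010, "A"), (3030, "B")]
lags, ia, ib = build_sorted_table(events)
print(lags)                            # [5, 20, 30]
print(sorted(ia[20]), sorted(ib[20]))  # [3, 4] [5]
print(occurrences(lags, ia, 20, 30))   # 3
```

Once the table exists, any lag interval is answered from a contiguous slice of `lags`, so the event sequence never has to be rescanned.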

Page 13: Discovering Lag Interval For Temporal Dependencies

STScan* Algorithm

Problem of STScan: The O(n^2) space cost is too large, so it can run out of memory.

Observation: STScan scans only one sub-segment at a time and never goes back.

Solution: Create the sorted table incrementally while scanning.

Page 14: Discovering Lag Interval For Temporal Dependencies

STScan* Algorithm

Sort events by time stamps.

Event sequence:  A1  A2  B1  B2  ...
Time stamp:       0  21  23  31  ...
Index:            1   2   3   4  ...
(Ak: the k-th A; Bk: the k-th B.)

Time lag list of each A:

       B1    B2    B3    B4   ...
A1     23    31    45    61   ...
A2      2    10    24    40   ...
A3     -2     6    20    36   ...
A4    -14    -6     8    24   ...
...

Incremental sorted table: ..., 20, 23 (entries E4, E5)

• We have visited the lag interval of sub-segment E4E5.

• The next lag interval is sub-segment E5E6.

• We need to create E6 first.

Page 15: Discovering Lag Interval For Temporal Dependencies

STScan* Algorithm

[Figure: the time lag list of each A and the incremental sorted table from the previous slide, with the sorted table now extended by E6 = 24.]

• The time lags pointed to by A2 and A4 have the smallest value, 24, so E6 = 24.

• Move the pointers of A2 and A4 to the next position.

• Create links from E6 to A2 and A4.

Page 16: Discovering Lag Interval For Temporal Dependencies

STScan* Algorithm

[Figure: the time lag list of each A and the incremental sorted table (..., 20, 23, 24) from the previous slides.]

• For every A, only keep the pointer to the next index of B.

• Merge the time lag lists of each A (like merge sort).

• Only keep O(n·|r|max) links, so the space cost is O(n), where |r|max is the maximum length of a qualified lag interval.
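The incremental merge can be sketched with a heap that holds one pointer per A. This is a sketch under assumptions: `sorted_lags` and `max_lag` are hypothetical names, a binary heap replaces the slide's pointer scan, only non-negative lags are produced, and the time stamps of A3 and A4 are inferred from the time lag table rather than read off the slides.

```python
import heapq
from bisect import bisect_left


def sorted_lags(a_times, b_times, max_lag=float("inf")):
    """Yield (lag, A index, B index) in ascending lag order, lazily.

    Each A keeps one pointer into the sorted Bs, so at most one heap entry
    per A is held at any time, as in a k-way merge.
    """
    b_sorted = sorted(b_times)
    heap = []
    for ai, a in enumerate(a_times):
        p = bisect_left(b_sorted, a)   # first B not earlier than this A
        if p < len(b_sorted):
            heap.append((b_sorted[p] - a, ai, p))
    heapq.heapify(heap)
    while heap:
        lag, ai, p = heapq.heappop(heap)
        if lag > max_lag:              # every remaining lag is larger too
            return
        yield lag, ai, p
        if p + 1 < len(b_sorted):      # advance this A's pointer
            heapq.heappush(heap, (b_sorted[p + 1] - a_times[ai], ai, p + 1))


# Time stamps consistent with the time lag table (A3, A4 inferred).
a_times = [0, 21, 25, 37]
b_times = [23, 31, 45, 61]
print([lag for lag, _, _ in sorted_lags(a_times, b_times)])
# [2, 6, 8, 10, 20, 23, 24, 24, 31, 36, 40, 45, 61]
print([lag for lag, _, _ in sorted_lags(a_times, b_times, max_lag=10)])
# [2, 6, 8, 10]
```

The two entries with lag 24 come from A2 and A4 (indices 1 and 3 here, counting from 0), matching the step that creates E6 = 24. Stopping at `max_lag` plays the role of pruning by |r|max.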

Page 17: Discovering Lag Interval For Temporal Dependencies

Time Complexity Lower Bound

The problem of finding all qualified time intervals is 3SUM-hard, so a truly subquadratic algorithm, O(n^(2-ε)) for some ε > 0, is unlikely in the worst case.

3SUM problem: Given a set of n integers, are there three integers a, b, c in the set such that a+b=c?

No O(n^(2-ε)) algorithm is known to solve this problem in the worst case.
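For reference, the quadratic baseline for this form of 3SUM can be sketched as follows. The function name is hypothetical, and the requirement that a, b and c be three distinct elements is an assumption, since the slide does not say whether repeats are allowed.

```python
def has_three_sum(numbers):
    """Return True if distinct a, b, c in the set satisfy a + b = c.

    Tries every pair (a, b) and looks up a + b in a hash set, which takes
    O(n^2) expected time.
    """
    values = set(numbers)
    items = sorted(values)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            c = a + b
            # c must be a third element, not a or b themselves.
            if c in values and c != a and c != b:
                return True
    return False


print(has_three_sum([1, 2, 3]))    # True, because 1 + 2 = 3
print(has_three_sum([1, 5, 11]))   # False
```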

Page 18: Discovering Lag Interval For Temporal Dependencies

Evaluation

Evaluation Objectives:

• Effectiveness: Is the method able to find the interleaved temporal dependencies? Is the lag interval correct?

• Efficiency: run time cost and memory space cost.

Comparative Methods:

• inter-arrival: cluster the time lags between each A and its following B.

• brute-force: try every possible t1, t2 for the lag interval [t1,t2].

• brute-force*: brute-force with pruning by |r|max.

Testing Environment: Linux 2.6, Intel Xeon 2.5 GHz (8 cores), Java VM memory heap: 12 GB.
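The time lags that the inter-arrival baseline would cluster can be sketched as below. This only computes the lags and omits the clustering step. The function name and data are hypothetical, and "its following B" is assumed to mean the first B at or after each A.

```python
from bisect import bisect_left


def inter_arrival_lags(a_times, b_times):
    """Time lag from each A to the first B at or after it."""
    b_sorted = sorted(b_times)
    lags = []
    for a in a_times:
        i = bisect_left(b_sorted, a)
        if i < len(b_sorted):
            lags.append(b_sorted[i] - a)
    return lags


# Hypothetical interleaved data: every B occurs exactly 5 after its A.
a_times = [0, 3, 6]
b_times = [5, 8, 11]
print(inter_arrival_lags(a_times, b_times))   # [5, 2, 2]
```

Because the dependent pairs are interleaved, two of the three As are matched to the wrong B, so the true lag of 5 is mostly hidden from this baseline.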

Page 19: Discovering Lag Interval For Temporal Dependencies

Data Sets

Synthetic data: 7 data sequences, 8 event types. The average sample period is 100. Randomly generated with 3 embedded dependencies.

Embedded Dependency     Support
I1⟶[400,500]I2          0.1
I2⟶[1000,1100]I3        0.12
I4⟶[5500,5800]I5        0.15

Time lags are large. Dependent items are very likely to be interleaved.

Real data: Tivoli Monitoring system events from two large accounts in the IBM service center.

Dataset     Time Frame    #Events      #Event Types
Account1    54 days       1,124,834    95
Account2    32 days       2,076,408    104

Page 20: Discovering Lag Interval For Temporal Dependencies

Synthetic Data

Effectiveness:

• brute-force, brute-force*, STScan, and STScan* can find all embedded temporal dependencies if they can finish running.

• inter-arrival fails.

Efficiency:

Data size        10^3      10^4      5∙10^4    10^5
STScan           3∙10^4    3∙10^6    8∙10^7    OutOfMemory
STScan*          10^3      10^4      5∙10^4    10^5
Brute-Force      9∙10^2    10^4      5∙10^4    9∙10^4
Brute-Force*     9∙10^2    10^4      5∙10^4    9∙10^4
Inter-arrival    <10^2     <10^2     <10^2     <10^2

Page 21: Discovering Lag Interval For Temporal Dependencies

Tivoli Monitoring System Events

Dataset     Discovered Dependencies
Account1    MSG_Plat_APP ⟶[3600,3600] MSG_Plat_APP
            Linux_Process ⟶[0,96] Process
            SMP_CPU ⟶[0,27] Linux_Process
Account2    TEC_Error ⟶[0,1] Ticket_Retry
            TEC_Retry ⟶[0,1] Ticket_Error
            AIX_HW_ERROR ⟶[8,9] AIX_HW_ERROR

[Figure: Event Plot for Account2]

Inter-arrivals only find

Page 22: Discovering Lag Interval For Temporal Dependencies

Tivoli Monitoring System Events

[Figures: run times on Account1 data; run times on Account2 data]

Page 23: Discovering Lag Interval For Temporal Dependencies

Conclusion and Future Work

Conclusion

• Studied the problem of discovering interleaved temporal dependencies.

• Proposed two algorithms, STScan and STScan*, which are faster than brute-force search, although their time complexity is still high, O(n^2).

• Proved that the problem is 3SUM-hard.

Future work

• Develop an approximation algorithm that can solve the problem in linear time.

Page 24: Discovering Lag Interval For Temporal Dependencies

End

Thank you!

Any questions?