Using Shapes of Trends in Active Data Mining

Post on 31-Jan-2016

Page 1: Using Shapes of Trends in Active Data Mining

Using Shapes of Trends in Active Data Mining

Duy Lam

Norris Boothe

Page 2: Using Shapes of Trends in Active Data Mining

Shape Querying and Active Data Mining

Historical time sequences make up a large portion of the data stored in computers

Mining trends in histories is useful

Many applications, including observing trends in stock prices, online bids, and rule mining

Page 3: Using Shapes of Trends in Active Data Mining

Overview

Overview of SDL

SDL language

Applications to data mining

Page 4: Using Shapes of Trends in Active Data Mining

A (Very) Simple History

Page 5: Using Shapes of Trends in Active Data Mining

Shape Definition Language

SDL is a shape definition language used to query the “shapes” of histories

Small, powerful language that allows “blurry” matching

Designed to make it easy and natural to query

Easily implementable

Little non-determinism

Page 6: Using Shapes of Trends in Active Data Mining

Alphabet

SDL allows you to specify an “alphabet” defining transitions

Example:

Symbol       Description
up           Slightly increasing transition
Up           Highly increasing transition
down         Slightly decreasing transition
Down         Highly decreasing transition
appears      Transition from zero to non-zero
disappears   Transition from non-zero to zero
stable       The final value nearly equal to initial value
zero         Both initial and final value are zero
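
To make the alphabet concrete, here is a minimal Python sketch that turns a history of values into a sequence of transition symbols. The slides give no numeric thresholds, so `slight` and `steep` are assumed values, and the function names are hypothetical. The sketch also assigns exactly one symbol per transition, which is a simplification.

```python
def classify(initial, final, slight=0.05, steep=0.20):
    """Map one transition to an alphabet symbol.

    `slight` and `steep` are assumed relative-change thresholds;
    they are not taken from the slides.
    """
    if initial == 0 and final == 0:
        return "zero"
    if initial == 0:
        return "appears"
    if final == 0:
        return "disappears"
    change = (final - initial) / abs(initial)
    if abs(change) <= slight:
        return "stable"
    if change > 0:
        return "Up" if change >= steep else "up"
    return "Down" if change <= -steep else "down"


def to_symbols(history):
    """Classify every pair of consecutive values in a history."""
    return [classify(a, b) for a, b in zip(history, history[1:])]


if __name__ == "__main__":
    print(to_symbols([0, 0, 10, 10.2, 11, 15, 14, 7, 0]))
```

A history of n + 1 values yields n transitions, so the shapes on the following slides are matched against the symbol sequence rather than the raw values.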

Page 7: Using Shapes of Trends in Active Data Mining

Describing a shape

So with this alphabet we can describe a shape

Use such a description to query a history to produce all subsequences that match the shape

(shape name(parameters) descriptor)

Page 8: Using Shapes of Trends in Active Data Mining

(shape spike() (concat Up up down Down))

Page 9: Using Shapes of Trends in Active Data Mining

Derived Shapes

any: allows a shape to have multiple values

(any up Up)

concat: shapes can be concatenated together contiguously

(concat down up down up)
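
One way to read `any` and `concat` is as a small recursive matcher over a sequence of transition symbols. The sketch below is an illustration under that reading, not the implementation described by the authors. Shapes are written as nested Python tuples, and the names `ends` and `find` are hypothetical.

```python
def ends(shape, seq, start):
    """Return every index at which `shape` can end when it starts at `start`."""
    if isinstance(shape, str):  # elementary symbol from the alphabet
        if start < len(seq) and seq[start] == shape:
            return {start + 1}
        return set()
    op, *args = shape
    if op == "any":  # any one of the alternatives may match
        return set().union(*(ends(a, seq, start) for a in args))
    if op == "concat":  # alternatives must follow each other contiguously
        positions = {start}
        for a in args:
            positions = {e for p in positions for e in ends(a, seq, p)}
        return positions
    raise ValueError(f"unknown operator: {op}")


def find(shape, seq):
    """List (start, end) intervals of `seq` that match `shape`."""
    return [(s, e) for s in range(len(seq)) for e in sorted(ends(shape, seq, s))]


if __name__ == "__main__":
    seq = ["Up", "up", "down", "Down", "up"]
    print(find(("concat", "Up", "up", "down", "Down"), seq))  # the spike shape
    print(find(("any", "up", "Up"), seq))
```

Intervals are half-open, so `(0, 4)` covers the first four transitions.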

Page 10: Using Shapes of Trends in Active Data Mining

Multiple Occurrence Operators

Shapes made of multiple contiguous occurrences of the same shape

Resulting subsequences are such that they are neither preceded nor followed by a subsequence that matches the repeated shape P

(exact 5 (any up Up))

(atleast 3 stable)

(atmost 2 (concat disappears appears))
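
The "neither preceded nor followed" rule amounts to looking only at maximal runs. The sketch below illustrates this for the simple case where P is a set of elementary symbols, such as `(any up Up)`. Composite shapes like `concat` are not handled, `atmost` is assumed to need at least one occurrence, and all function names are hypothetical.

```python
from itertools import groupby


def runs(seq, symbols):
    """Return (start, end) for each maximal run of transitions in `symbols`."""
    found, pos = [], 0
    for hit, group in groupby(seq, key=lambda s: s in symbols):
        length = len(list(group))
        if hit:
            found.append((pos, pos + length))
        pos += length
    return found


def exact(n, symbols, seq):
    return [(s, e) for s, e in runs(seq, symbols) if e - s == n]


def atleast(n, symbols, seq):
    return [(s, e) for s, e in runs(seq, symbols) if e - s >= n]


def atmost(n, symbols, seq):
    # Assumption: a run must contain at least one occurrence to be reported.
    return [(s, e) for s, e in runs(seq, symbols) if e - s <= n]


if __name__ == "__main__":
    seq = ["up", "Up", "up", "down", "stable", "stable", "stable", "stable", "Up"]
    print(exact(3, {"up", "Up"}, seq))
    print(atleast(3, {"stable"}, seq))
```

Because only maximal runs are reported, a run of three rises does not also produce matches for `exact 2`, which is how these operators cut down output clutter.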

Page 11: Using Shapes of Trends in Active Data Mining

Bounded Occurrence Operators

in: permits “blurry” matching by allowing users to state an overall shape without specific details

within the specified time period length, we can have a specified number of occurrences of a shape

can have arbitrary gaps and can have overlap

(in 7 (nomore 5 up))

(precisely n P) (noless n P) (nomore n P)
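
For elementary symbols, `in` with a bounded-occurrence count can be pictured as a sliding window over the transition sequence. The following sketch assumes P is a set of elementary symbols, so overlapping occurrences of composite shapes are out of scope. The name `in_window` is hypothetical.

```python
import operator

# The three bounded-occurrence operators from the slide, as comparisons.
COMPARE = {
    "precisely": operator.eq,
    "noless": operator.ge,
    "nomore": operator.le,
}


def in_window(length, bound, n, symbols, seq):
    """Return windows of `length` transitions whose count of `symbols`
    satisfies the bound; gaps between occurrences are allowed."""
    compare = COMPARE[bound]
    matches = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        count = sum(1 for s in window if s in symbols)
        if compare(count, n):
            matches.append((start, start + length))
    return matches


if __name__ == "__main__":
    seq = ["up", "down", "up", "stable", "up", "down", "up"]
    print(in_window(7, "nomore", 5, {"up"}, seq))  # (in 7 (nomore 5 up))
```

The window only constrains how many occurrences appear, not where they fall, which is what makes the match "blurry".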

Page 12: Using Shapes of Trends in Active Data Mining

Bounded Occurrence Operators

inorder: specifies shapes that must appear in a specific order

(inorder P1 P2 ... Pn)

Page 13: Using Shapes of Trends in Active Data Mining

Shape Definition Examples

(in 5 (and (noless 2 (any up Up)) (nomore 1 (any down Down))))

Page 14: Using Shapes of Trends in Active Data Mining

Shape Definition Examples

(in 7 (inorder (atleast 2 (any up Up)) (in 4 (noless 3 (any down Down)))))

Page 15: Using Shapes of Trends in Active Data Mining

Parameterized Shapes

Can parameterize shape definitions instead of using concrete values

(shape spike(upcnt dncnt) (concat (exact upcnt (any up Up)) (exact dncnt (any down Down))))

(shape doublepeak(width ht1 ht2) (in width (inorder spike(ht1 ht1) spike(ht2 ht2))))

Page 16: Using Shapes of Trends in Active Data Mining

Advantages of SDL

natural and powerful language for expressing shape queries

capability of blurry matching

reduction of output clutter

efficient implementation

Page 17: Using Shapes of Trends in Active Data Mining

SDL’s Expressive Power

SDL is equivalent to regular expressions for regular matching

several features enhance its effectiveness, however

greedy matching and “lookahead” capabilities help reduce output clutter

Page 18: Using Shapes of Trends in Active Data Mining

SDL’s Expressive Power

“blurry” matching enables a much more natural and compact specification of certain shapes

For example, if we wanted precisely one occurrence of each ai in any order

in SDL:

(and (precisely 1 a1)(precisely 1 a2)...(precisely 1 an))

a regular expression requires at least exponential size to specify!
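
The size gap can be illustrated with a small Python sketch. It builds the naive regular expression that lists every ordering (n! alternatives) next to the SDL form (n clauses). Enumerating permutations is only one obvious construction and does not prove the lower bound. Both helper names are hypothetical.

```python
import re
from itertools import permutations


def permutation_regex(symbols):
    """Naive regex: one alternative per ordering of the symbols."""
    return "|".join("".join(p) for p in permutations(symbols))


def sdl_shape(symbols):
    """SDL form: one (precisely 1 ...) clause per symbol."""
    clauses = "".join(f"(precisely 1 {s})" for s in symbols)
    return f"(and {clauses})"


if __name__ == "__main__":
    for n in range(2, 7):
        letters = "abcdef"[:n]
        regex = permutation_regex(letters)
        print(n, len(regex.split("|")), len(sdl_shape(letters)))
    # The regex accepts each ordering exactly once.
    print(bool(re.fullmatch(permutation_regex("abc"), "bca")))
```

The SDL text grows linearly with n, while the enumerated regex already has 720 alternatives at n = 6.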

Page 19: Using Shapes of Trends in Active Data Mining

SDL Summary

SDL is a small, powerful language for naturally and intuitively expressing shapes found in histories

Equivalent in power to regular expressions, but much more effective

Permits “blurry” matching

Page 20: Using Shapes of Trends in Active Data Mining

Using SDL in Active Data Mining

Page 21: Using Shapes of Trends in Active Data Mining

Static Data Mining

Discovery of rules for: Associations, Sequences, Classification

Entire data set is mined

Inherent weakness: Rules are not static

Page 22: Using Shapes of Trends in Active Data Mining

Active Data Mining

Partition into time periods

Run data mining algorithm on each period

Gather rules into a ‘rulebase’

Create triggers to discover: Trends in rules, Associations between rules
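
The first three steps can be sketched in a few lines of Python. The miner here is a toy stand-in that only finds single-item rules a -> b, not the algorithm used in the original work. The `min_support` value, the data layout and all names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import permutations


def mine_period(transactions, min_support=0.3):
    """Toy miner: rules a -> b between single items (assumed stand-in)."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    rules = {}
    for a, b in permutations(items, 2):
        has_a = sum(1 for t in transactions if a in t)
        both = sum(1 for t in transactions if a in t and b in t)
        if n and has_a and both / n >= min_support:
            rules[(a, b)] = {"support": both / n, "confidence": both / has_a}
    return rules


def build_rulebase(periods):
    """Mine each period and gather per-rule histories into a rulebase."""
    rulebase = defaultdict(lambda: {"support": [], "confidence": []})
    for k, transactions in enumerate(periods):
        mined = mine_period(transactions)
        for rule in set(rulebase) | set(mined):
            stats = mined.get(rule, {"support": 0.0, "confidence": 0.0})
            history = rulebase[rule]
            for name in ("support", "confidence"):
                # Pad with zeros for periods before the rule first appeared.
                history[name].extend([0.0] * (k - len(history[name])))
                history[name].append(stats[name])
    return dict(rulebase)


if __name__ == "__main__":
    periods = [[{"a", "b"}, {"a"}], [{"a", "b"}, {"a", "b"}], [{"c"}]]
    print(build_rulebase(periods)[("a", "b")])
```

Recording a zero when a rule is absent in a period is a design choice of this sketch. It lets the `appears` and `disappears` symbols describe rules that come and go.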

Page 23: Using Shapes of Trends in Active Data Mining

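Active Data Mining Process

[Diagram: a large database is mined period by period (Period 1 rules, Period 2 rules, Period 3 rules); the results fill a table with one row per rule ID (1, 2, ...) and one column per period (1, 2, …, n) holding the rule's history (support, confidence, etc.)]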

Page 24: Using Shapes of Trends in Active Data Mining

Active Data Mining Process (cont.)

[Diagram: the rule history table (rule ID; history of support, confidence, etc. over periods 1, 2, …, n), the Shape Definition Language and the Trigger Definition Language feed Active Data Mining, which outputs the selected rules]

Page 25: Using Shapes of Trends in Active Data Mining

Active Data Mining Components

Shape definitions (SDL)

(shape name(parameters) descriptor)

Ex:

(shape spike(upcnt dncnt) (concat (atleast upcnt (any up Up)) (atleast dncnt (any down Down))))

Queries

Triggers

Page 26: Using Shapes of Trends in Active Data Mining

Queries

For rule selection

Syntax:

(query (shape (history-name start-time end-time)))

‘start’ and ‘end’ specify the end points of history

Result: rules that match the desired shape

Ex:

(shape ramp() (concat Up Up))

(query (ramp() (confidence start end)))

Page 27: Using Shapes of Trends in Active Data Mining

Larger Query Example

(shape upramp(len cnt) (in len (noless cnt (any up Up))))

(shape dnramp(len cnt) (in len (noless cnt (any down Down))))

(query (and (upramp(5 3) (support start 10)) (dnramp(5 3) (confidence start 10))))

Results: rules where support is increasing but confidence is decreasing
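
A rough Python rendering of this query, assuming the rulebase maps each rule to lists of support and confidence values per period. The thresholds in `symbol`, the reading of `(support start 10)` as "the first 10 periods", and all function names are assumptions made for the sketch.

```python
def symbol(a, b, slight=0.05, steep=0.20):
    """Classify one transition; thresholds are assumed, not from the slides."""
    delta = b - a
    if abs(delta) <= slight * abs(a):
        return "stable"
    if delta > 0:
        return "Up" if delta >= steep * abs(a) else "up"
    return "Down" if -delta >= steep * abs(a) else "down"


def ramp(history, length, count, wanted):
    """(in length (noless count (any ...))) over one history."""
    syms = [symbol(a, b) for a, b in zip(history, history[1:])]
    return any(
        sum(1 for s in syms[i:i + length] if s in wanted) >= count
        for i in range(len(syms) - length + 1)
    )


def query(rulebase, end=10):
    """Rules whose support ramps up while their confidence ramps down."""
    return [
        rule
        for rule, h in rulebase.items()
        if ramp(h["support"][:end], 5, 3, {"up", "Up"})
        and ramp(h["confidence"][:end], 5, 3, {"down", "Down"})
    ]


if __name__ == "__main__":
    rulebase = {
        "r1": {"support": [0.10, 0.12, 0.15, 0.19, 0.19, 0.20],
               "confidence": [0.9, 0.8, 0.7, 0.6, 0.6, 0.6]},
        "r2": {"support": [0.1] * 6,
               "confidence": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]},
    }
    print(query(rulebase))
```

Only `r1` is selected here: `r2` loses confidence but its support never rises.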

Page 28: Using Shapes of Trends in Active Data Mining

Triggers

Datastream type functionality

ECA (Event Condition Action) model used (Chakravarthy et al. 1989)

Syntax:

(trigger trigger-name
  (events events-spec)
  (condition (shape history-spec))
  (actions action-spec))

Events: Rule creation, History updates

Page 29: Using Shapes of Trends in Active Data Mining

Wave Execution Semantics

Stratified execution of triggers, similar to Datalog

[Diagram: Set of Events -> Triggers for those Events -> Queries for those Triggers -> Set of Actions/Events]

Page 30: Using Shapes of Trends in Active Data Mining

Trigger Example

Identifying rules where support is increasing, but confidence is decreasing

(trigger detect_up
  (events updatehistory)
  (condition (upramp(5 4) (support (- end 5) end)))
  (actions upward))

(trigger detect_dn
  (events upward)
  (condition (dnramp(5 4) (confidence (- end 5) end)))
  (actions notify))
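
The chaining of these two triggers can be simulated with a small wave loop: the actions produced in one wave become the events of the next. This is a simplified sketch of the ECA idea for a single rule's history, not the system's actual execution engine. The `Trigger` class, `run_waves`, `ramp` and the `max_waves` guard are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    name: str
    event: str                          # event the trigger listens for
    condition: Callable[[dict], bool]   # shape query over a rule's history
    action: str                         # event raised when the condition holds


def ramp(values, count, direction):
    """True if at least `count` transitions move in `direction` (+1 or -1)."""
    steps = [b - a for a, b in zip(values, values[1:])]
    return sum(1 for d in steps if d * direction > 0) >= count


def run_waves(triggers, events, history, max_waves=10):
    """Fire triggers wave by wave; each wave's actions feed the next wave."""
    log, wave = [], set(events)
    for _ in range(max_waves):  # guard against cyclic trigger definitions
        if not wave:
            break
        next_wave = set()
        for t in triggers:
            if t.event in wave and t.condition(history):
                log.append((t.name, t.action))
                next_wave.add(t.action)
        wave = next_wave
    return log


TRIGGERS = [
    # Six values give the last five transitions, as in (- end 5) .. end.
    Trigger("detect_up", "updatehistory",
            lambda h: ramp(h["support"][-6:], 4, +1), "upward"),
    Trigger("detect_dn", "upward",
            lambda h: ramp(h["confidence"][-6:], 4, -1), "notify"),
]

if __name__ == "__main__":
    history = {"support": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
               "confidence": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]}
    print(run_waves(TRIGGERS, {"updatehistory"}, history))
```

Splitting the check into two triggers means the confidence query only runs for rules that already passed the support query.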

Page 31: Using Shapes of Trends in Active Data Mining

Implementation

Implemented on AIX system

Part of IBM’s Quest project

Successfully tested:

Large set (5 years) of mail order data (2.9 million records)

Large set (3 years) of POS (point-of-sale) transactions (6.8 million records)

Page 32: Using Shapes of Trends in Active Data Mining

Future Work

At time of paper…

Integrate constructs into a SQL relational system

Improve incremental computations using partial results of current trigger queries

Since then…

Integrated into the Quest Data Mining System

Subsumed into IBM’s data mining products, including Intelligent Miner

Referenced for work in Active Data Mining and “blurry” pattern matching

Page 33: Using Shapes of Trends in Active Data Mining

References

“Querying Shapes of Histories”, by Rakesh Agrawal, Giuseppe Psaila, Edward L. Wimmers, and Mohamed Zait of the IBM Almaden Research Center, 1995

“Active Data Mining”, by Rakesh Agrawal and Giuseppe Psaila of the IBM Almaden Research Center, 1995

“The Quest Data Mining System”, by Rakesh Agrawal, Manish Mehta, John Shafer, and Ramakrishnan Srikant of the IBM Almaden Research Center in coordination with Andreas Arning and Toni Bollinger of the IBM German Software Laboratory, 1996

IBM Almaden Research Center Website: http://www.almaden.ibm.com/software/quest/