Continuous Data Stream Processing
Music Virtual Channel – extensions
Data Stream Monitoring – tree pattern mining
Continuous Query Processing – sequence queries
Date: 2005/10/21
Post-Excellence Project, Subproject 6
Continuous Data Stream Management
Music Virtual Channel Extensions
[Figure: system architecture. V.C. players 1 … N connect over the Internet. Components shown: music collections, music metadata, clustering engine, filtering engine, XML filtering engine, music channel simulator, interface, profile monitor, cluster monitor, channel monitor, favorite channel, cluster coordinator, peer search engine, profile database, MusicXML database.]
An Extension on Virtual Channel
After a player starts a range (or kNN) search:
• It updates its profile periodically
• The search results are continuously maintained
[Figure: genre-profile histograms (POP, BLUE, ROCK, LATIN, JAZZ, DANCE; vertical axis 0% to 50%) for the V.C. player issuing the query and for its peer V.C. players.]
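The continuous search described on this slide can be illustrated with a minimal sketch. It assumes that a profile is a vector of genre fractions and that similarity is Euclidean distance; the slides state neither, and the names `distance`, `range_search`, `knn_search` and the sample profiles are hypothetical. The sketch simply recomputes the result after a profile update, whereas the slides aim at an index that avoids such a full rescan.

```python
import math

def distance(p, q):
    """Euclidean distance between two equal-length genre profiles (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def range_search(query, peers, radius):
    """Ids of the peers whose profile lies within `radius` of the query profile."""
    return sorted(pid for pid, prof in peers.items() if distance(query, prof) <= radius)

def knn_search(query, peers, k):
    """Ids of the k peers closest to the query profile (ties broken by id)."""
    return sorted(peers, key=lambda pid: (distance(query, peers[pid]), pid))[:k]

# Hypothetical profiles over (POP, BLUE, ROCK, LATIN, JAZZ, DANCE).
query = [0.5, 0.1, 0.2, 0.0, 0.1, 0.1]
peers = {
    "p1": [0.4, 0.1, 0.3, 0.0, 0.1, 0.1],
    "p2": [0.0, 0.1, 0.1, 0.5, 0.2, 0.1],
}
before = range_search(query, peers, radius=0.2)

# A periodic profile update: p2 drifts toward the query, so the result changes.
peers["p2"] = [0.5, 0.1, 0.2, 0.0, 0.1, 0.1]
after = range_search(query, peers, radius=0.2)
nearest = knn_search(query, peers, k=1)
```

Because every player both issues a query and is a candidate answer for other players, each profile update can change many results at once, which is why the slides treat indexing as the key issue.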
An Extension on Virtual Channel
Compared with the clustering engine
• A flexible definition of “clusters”
• Update is more natural than insertion/deletion
• No need for parameter setting and re-clustering
• Indexing can relieve the pain of frequent updates
Compared with the problem of moving objects
• Movements in a high-dimensional feature space
• In most cases every object is also a query
• Prediction of object movement is possible
An Extension on Favorite Channel
When a music piece is played on a channel:
• The corresponding MusicXML file can be obtained
• A query can be a portion of MusicXML or an XQuery
An Extension on Favorite Channel
Compared with query segments
• More musical semantics in a query
• Does not interfere with the music playback
• Matching on complex tree structures
• Common subquery is still useful
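As a rough illustration of filtering a MusicXML document against stored queries, the sketch below uses the limited XPath subset of Python's `xml.etree.ElementTree` as a stand-in for XQuery. The MusicXML fragment is heavily simplified, and the query names (`has_C`, `has_G`) and the `matches` helper are hypothetical; this is not the XML filtering engine of the slides.

```python
import xml.etree.ElementTree as ET

# A heavily simplified MusicXML fragment (assumed for illustration only).
MUSICXML = """<score-partwise>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch></note>
      <note><pitch><step>E</step><octave>4</octave></pitch></note>
    </measure>
  </part>
</score-partwise>"""

def matches(document, path):
    """True if the XPath-like `path` selects at least one element of the document."""
    root = ET.fromstring(document)
    return root.find(path) is not None

# Hypothetical registered queries, each a small tree pattern over the score.
queries = {
    "has_C": ".//note/pitch[step='C']",
    "has_G": ".//note/pitch[step='G']",
}
matched = sorted(name for name, path in queries.items() if matches(MUSICXML, path))
```

Evaluating every query separately, as done here, ignores shared structure; both queries start with `.//note/pitch`, which is the kind of common subquery a filtering engine could evaluate once.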
Research Issues
Peer Search Engine
• An indexing method to support continuous query processing for high-dimensional moving objects
• A prediction-based bounding mechanism to reduce the frequency of profile update
XML Filtering Engine
• An online method to enable tree pattern mining over a data stream
• An indexing mechanism to support XML filtering
Discovering Frequent Tree Patterns over Data Streams
Submitted for publication
Problem Definition
As the query trees stream in, find the subtrees that occur more than θ·N times, where N is the number of trees received so far and 0 ≤ θ ≤ 1.
[Figure: query trees T1, T2, T3 stream into STMer, which outputs the frequent tree patterns.]
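Problem Definition (Cont.)
• Labeled ordered tree
• Induced subtree
[Figure: B with children D, C differs from B with children C, D. A tree pattern is shown beside a query tree; the query tree has root A with children B and E, and B has children C and D.]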
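The two notions on this slide can be made concrete with a small check. The sketch assumes the usual definitions: in a labeled ordered tree the left-to-right order of siblings matters, and an induced subtree keeps parent-child edges (no skipping of intermediate nodes). Trees are written as hypothetical `(label, children)` tuples, and the function names are illustrative.

```python
def matches_at(pattern, node):
    """True if `pattern` occurs as an induced subtree rooted exactly at `node`."""
    (p_label, p_children), (n_label, n_children) = pattern, node
    if p_label != n_label:
        return False
    # Pattern children must map, in order, to distinct children of the node.
    remaining = iter(n_children)
    return all(any(matches_at(pc, nc) for nc in remaining) for pc in p_children)

def is_induced_subtree(pattern, tree):
    """True if `pattern` occurs as an induced subtree anywhere in `tree`."""
    return matches_at(pattern, tree) or any(
        is_induced_subtree(pattern, child) for child in tree[1]
    )

def leaf(label):
    return (label, ())

# Query tree read from the slide: A with children B and E; B has children C and D.
query_tree = ("A", (("B", (leaf("C"), leaf("D"))), leaf("E")))

in_order = is_induced_subtree(("B", (leaf("C"), leaf("D"))), query_tree)
swapped = is_induced_subtree(("B", (leaf("D"), leaf("C"))), query_tree)
skips_level = is_induced_subtree(("A", (leaf("C"),)), query_tree)
```

Here `in_order` holds, `swapped` fails because sibling order is part of the tree, and `skips_level` fails because C is a grandchild of A rather than a child.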
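An Example
Given θ = 0.6
[Figure: three query trees enter STMer one after another.]
• After the first tree (A with children B, C), frequent tree patterns (occurrence > 0.6*1): A, B, C, A with child B, A with child C, A with children B, C
• After the second tree (B with children D, E), frequent tree patterns (occurrence > 0.6*2): B
• After the third tree (A with children B, F), frequent tree patterns (occurrence > 0.6*3): A, B, A with child B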
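The example can be reproduced with a brute-force sketch that enumerates every induced subtree of each arriving tree and keeps exact counts. This is not STMer: it is exponential in the tree size and stores every candidate. The three trees are our reading of the slide's figure, patterns are counted once per tree (an assumption), and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

def rooted_subtrees(node):
    """All induced subtrees rooted at `node`, as (label, children) tuples."""
    label, children = node
    # For each child: either skip it (None) or keep one subtree rooted at it.
    options = [[None] + rooted_subtrees(child) for child in children]
    return [(label, tuple(c for c in choice if c is not None))
            for choice in product(*options)]

def induced_subtrees(node):
    """All distinct induced subtrees of the tree rooted at `node`."""
    found = set(rooted_subtrees(node))
    for child in node[1]:
        found |= induced_subtrees(child)
    return found

def encode(tree, level=1):
    """DFS string of label+level pairs, e.g. A1B2C2."""
    label, children = tree
    return f"{label}{level}" + "".join(encode(c, level + 1) for c in children)

def frequent_after_each_tree(stream, theta):
    """Patterns with occurrence > theta * N after each of the N trees seen so far."""
    counts, history = Counter(), []
    for n, tree in enumerate(stream, start=1):
        counts.update(induced_subtrees(tree))  # one occurrence per tree
        history.append(sorted(encode(p) for p, c in counts.items() if c > theta * n))
    return history

def leaf(label):
    return (label, ())

stream = [
    ("A", (leaf("B"), leaf("C"))),
    ("B", (leaf("D"), leaf("E"))),
    ("A", (leaf("B"), leaf("F"))),
]
history = frequent_after_each_tree(stream, theta=0.6)
```

With these inputs `history` lists six patterns after the first tree, only `B1` after the second, and `A1`, `A1B2`, `B1` after the third, which agrees with the pattern sets on the slide.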
Main Difficulties
The properties of data streams:
• One pass: traditional tree mining methods fail
• Fast input rate: efficiency is critical
• Incremental: an incremental algorithm is required
• Unbounded: approximate counting is needed
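To show what approximate counting over an unbounded stream can look like, here is a minimal sketch of Lossy Counting, a standard one-pass technique. The slides do not say which counting scheme STMer uses, so treating this as related is an assumption; the class name `LossyCounter` and the sample stream are hypothetical. Each entry stores a count and the maximum possible undercount, and low-count entries are pruned at bucket boundaries.

```python
import math

class LossyCounter:
    """One-pass approximate frequency counts with error at most epsilon * N."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # items per bucket
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, max undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop entries that cannot be frequent yet.
            self.entries = {k: v for k, v in self.entries.items()
                            if v[0] + v[1] > bucket}

    def frequent(self, theta):
        """Items whose stored count is at least (theta - epsilon) * N."""
        threshold = (theta - self.epsilon) * self.n
        return sorted(k for k, (count, _) in self.entries.items()
                      if count >= threshold)

counter = LossyCounter(epsilon=0.25)
for item in "ABACADAE":
    counter.add(item)
result = counter.frequent(theta=0.5)
```

After the eight items only `A` is still stored, so memory depends on the error parameter rather than on the stream length; the price is that an item's reported count may be too low by at most epsilon times N.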
An Overview of Our Method
[Figure: query tree T1 enters STMer, which consists of subtree generation, subtree maintenance, and a candidate pool; requests are served on demand.]
String Representation
Traverse T in DFS order and record each node as a (label, level) pair to obtain the node sequence S.
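A short sketch of this encoding follows, assuming the root is at level 1 (as in strings such as A1B2C2 used on the later slides) and a hypothetical `(label, children)` tuple format for trees.

```python
def to_pairs(tree, level=1):
    """Node sequence S: (label, level) pairs in DFS (preorder) order."""
    label, children = tree
    pairs = [(label, level)]
    for child in children:
        pairs.extend(to_pairs(child, level + 1))
    return pairs

def to_string(tree):
    """Compact string form of S, e.g. A1B2C2."""
    return "".join(f"{label}{level}" for label, level in to_pairs(tree))

tree = ("A", (("B", ()), ("C", ())))
pairs = to_pairs(tree)
text = to_string(tree)
```

The level numbers are what make the sequence unambiguous: A1B2C2 (C is a child of A) and A1B2C3 (C is a child of B) describe different trees with the same labels in the same DFS order. The compact string form is only safe for single-character labels and single-digit levels; the pair list has no such limit.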
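Subtree Generation
[Figure: nodes arrive from the data stream as (label, level) pairs. At t1, node (A, 1) arrives; the buffer holds A1 and TD shows the subtree A1. At t2, node (B, 2) arrives; the buffer holds A1B2 and TD shows the subtrees B1 and A1B2.]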
Subtree Generation (Cont.)
[Figure: continuing from t1 and t2 (subtrees A1, B1, A1B2), node (C, 2) arrives at t3; the buffer holds A1B2C2 and TD shows the subtrees C1, A1C2, and A1B2C2.]
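The three steps in the figure suggest that each arriving node produces exactly the subtrees that contain it. The sketch below reproduces that behaviour naively: it keeps every subtree generated so far as a tuple of node positions and extends those that contain the new node's parent. It is an illustration under that reading, not the authors' generation technique (which, per the slides, uses a compact structure), and the names `generate` and `subtree_string` are hypothetical.

```python
def subtree_string(nodes, positions):
    """Encode a subtree with levels renumbered so that its root is at level 1."""
    root_level = nodes[positions[0]][1]
    return "".join(f"{nodes[i][0]}{nodes[i][1] - root_level + 1}" for i in positions)

def generate(stream):
    """For each arriving (label, level) node, yield the new subtrees containing it."""
    nodes = []      # buffer of (label, level) pairs in arrival (DFS) order
    subtrees = []   # every subtree so far, as a tuple of buffer positions
    for label, level in stream:
        v = len(nodes)
        nodes.append((label, level))
        # In DFS order, the parent is the most recent node one level up.
        parent = next((i for i in range(v - 1, -1, -1)
                       if nodes[i][1] == level - 1), None)
        new = [(v,)]  # the new node on its own
        if parent is not None:
            # Attach the new node to every earlier subtree containing its parent.
            new += [s + (v,) for s in subtrees if parent in s]
        subtrees += new
        yield [subtree_string(nodes, s) for s in new]

steps = list(generate([("A", 1), ("B", 2), ("C", 2)]))
```

For the stream (A, 1), (B, 2), (C, 2), `steps` holds A1, then B1 and A1B2, then C1, A1C2 and A1B2C2, matching the figure. Appending the new node is safe because, in DFS order, it is always the rightmost child of its parent at the time it arrives.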
Subtree Generation (Cont.)
[Figure: the APT, rooted at Φ, with nodes A1, B1, B2, C1, C2, D1, D2, D3, E1, E2, E3, E4. The buffer shows A1B2 followed by C2, D3, E4, and F2; TD shows a tree with nodes A, B, C, D, E.]
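Subtree Maintenance
[Figure: number of query trees received = 321; the buffer holds A1B2E2. The APT (rooted at Φ, with nodes A1, B1, E1, B2, E2, E2) is shown beside the GPT (rooted at Φ), whose nodes carry triples (A1, 5, 0), (B2, 4, 1), (C3, 2, 1), and (E2, 1, 3); three nodes are marked +1.]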
Experiments on Sensitivity
[Figures: sensitivity to the minimum support; sensitivity to the error parameter.]
Experiments on Comparison
StreamT (ICDM’02)
Conclusion
Contribution
• A novel technique is proposed for efficient subtree generation
• A compact structure is employed to reduce the memory requirement of the candidate pool
Current work
• Mining closed frequent subtrees over data streams
[Figure: four patterns with counts: A with children B, C (2); A with child B (5); A with child C (2); A (5).]