Post on 15-Jan-2016
Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers
Presented by Qian Wan, HKUST
Based on [1][2]
Introduction: Uncertain Data Management
Modeling Uncertain Data: Possible Worlds Model
Uncertain Data Management: Top-k, Join, kNN, Skyline, Indexing, etc.
Uncertain Data Mining: Clustering, Classification, Frequent Pattern, Outlier Detection
Introduction: Data Representation
A simple way to represent probabilistic data
Each tuple has a confidence
Pr(instance) = ∏ Pr(presence) × ∏ Pr(absence), taken over the tuples present in and absent from the instance
Mutual exclusion constraints for each tuple*
Scoring function*
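The instance probability above can be illustrated with a minimal Python sketch. It assumes independent tuples and ignores mutual exclusion constraints; the function name `possible_worlds` and the sample confidences are hypothetical, not taken from the slides.

```python
from itertools import product


def possible_worlds(probs):
    """Yield (presence_mask, probability) for every possible world.

    probs: list of tuple confidences. Tuples are assumed independent
    (no mutual exclusion rules), which is a simplification.
    """
    for mask in product([True, False], repeat=len(probs)):
        world_prob = 1.0
        for present, confidence in zip(mask, probs):
            # Present tuples contribute p, absent tuples contribute (1 - p).
            world_prob *= confidence if present else 1.0 - confidence
        yield mask, world_prob


worlds = dict(possible_worlds([0.5, 0.25]))
```

With two tuples there are four possible worlds, and their probabilities sum to 1.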
Introduction: Other Works
U-Topk: k tuples that co-exist in a possible world
U-kRanks and PT-k: return tuples according to the marginal distribution of top-k results
Introduction: Other Works (Example)
Introduction: Other Works (Drawback)
The top-k result may be atypical
The distribution of scores is not used
Introduction: c-Typical-Top k
The 3-Typical-Top 2 scores of this example are {118, 183, 235}
Expected distance is 6.6
The vectors are {(T2, T6), (T7, T6), (T7, T3)}
Algorithm
Distribution of top-2 tuples’ scores
Algorithm – Naïve approach
INPUT: tuples with membership probabilities
OUTPUT: top-k score distribution
IDEA: recursively go through all possible worlds to calculate all probabilities, until reaching a threshold
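A rough Python sketch of the naïve idea is shown here. It enumerates every possible world rather than stopping at a threshold, and it assumes independent tuples with no mutual exclusion rules. Worlds with fewer than k tuples are left out of the distribution, which is an assumption of this sketch. The name `naive_topk_distribution` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def naive_topk_distribution(tuples, k):
    """Return {total top-k score: probability} by enumerating all worlds.

    tuples: list of (score, membership_probability), assumed independent.
    Cost is exponential in the number of tuples.
    """
    dist = defaultdict(float)
    for mask in product([True, False], repeat=len(tuples)):
        world_prob = 1.0
        present_scores = []
        for present, (score, prob) in zip(mask, tuples):
            world_prob *= prob if present else 1.0 - prob
            if present:
                present_scores.append(score)
        # Skip impossible worlds and worlds with fewer than k tuples.
        if world_prob > 0.0 and len(present_scores) >= k:
            top = sorted(present_scores, reverse=True)[:k]
            dist[sum(top)] += world_prob
    return dict(dist)


naive_result = naive_topk_distribution([(10, 0.5), (8, 0.5), (5, 1.0)], k=2)
```

Because the loop visits 2^n worlds, this is only practical for very small inputs, which motivates a dynamic programming approach.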
Algorithm – a DP approach
D(i,j): score distribution of the top-j tuples starting at Ti.
The main problem is D(1,k) (?)
Algorithm – a DP approach
Transformation: D(i,j) = TF[D(i+1,j), D(i+1,j-1)]
D(i+1,j): for each (v,p) add (v, p(1-pi))
D(i+1,j-1): for each (v,p) add (v+si, p*pi)
Merge duplicate items
Bottom-up DP
Approximation
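The transformation can be sketched bottom-up in Python as follows. This is an illustrative reading of the recurrence, not the authors' implementation. It assumes independent tuples sorted by descending score, the boundary conditions D(i,0) = {(0,1)} and D(n+1,j) = {} for j > 0, and exact merging of equal scores with no approximation step. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def topk_score_distribution(tuples, k):
    """Return D(1,k) as {total score: probability}.

    tuples: list of (score, probability), assumed independent.
    Only two rows of the DP table are kept in memory.
    """
    ordered = sorted(tuples, key=lambda t: t[0], reverse=True)
    # Row for i = n+1: top-0 has score 0 with probability 1 (assumed boundary);
    # no top-j answer exists for j > 0 once the tuples run out.
    nxt = [{0: 1.0}] + [{} for _ in range(k)]
    for score, prob in reversed(ordered):      # i = n, n-1, ..., 1
        cur = [{0: 1.0}]                       # D(i,0)
        for j in range(1, k + 1):
            merged = defaultdict(float)
            # Ti absent: reuse D(i+1,j), scaled by (1 - pi).
            for v, p in nxt[j].items():
                merged[v] += p * (1.0 - prob)
            # Ti present: extend D(i+1,j-1) by si, scaled by pi.
            for v, p in nxt[j - 1].items():
                merged[v + score] += p * prob
            cur.append(dict(merged))           # duplicates merged by the dict
        nxt = cur
    return nxt[k]


dp_result = topk_score_distribution([(10, 0.5), (8, 0.5), (5, 1.0)], k=2)
```

On this three-tuple input the result agrees with a hand enumeration of the possible worlds: scores 18, 15, and 13 each have probability 0.25, and the remaining 0.25 belongs to the world that contains fewer than two tuples.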
Handling More Real Scenarios
Handling Mutually Exclusive Rules: compress the ME group; refine by lead tuple region
Handling Ties: when two tuples have the same score, rank them according to probability
Algorithm
[Figure: 3-Typical-Top 2 scores. Labeled points: (118, 0.2), (183, 0.15), (235, 0.12); horizontal axis 0 to 250, vertical axis 0 to 0.25]
c-Typical-Top k
The 3-Typical-Top 2 scores of this example are {118, 183, 235}
Expected distance is 6.6
The vectors are {(T2, T6), (T7, T6), (T7, T3)}
Computing c-Typical-Top k
Define F^a(j) to be the optimal objective over {sj, …, sn}, where a is the number of typical scores.
G^a(j) means the same
Computing c-Typical-Top k
Just solve the two-function optimization problem using DP
Boundary conditions
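For intuition, here is a simplified Python sketch of choosing c typical scores from a score distribution. It does not reproduce the F/G formulation from the slides. Instead it uses a plain partition-based DP, and it assumes the objective is the expected distance from a top-k score to its nearest typical score, with typical scores drawn from the distribution's own score values. The name `typical_scores` and the sample distribution are hypothetical.

```python
def typical_scores(dist, c):
    """Return (expected_distance, chosen_scores) for c typical scores.

    dist: {score: probability}. Sorted scores are split into c contiguous
    groups, each represented by the member that minimizes the weighted
    absolute distance within its group.
    """
    pts = sorted(dist.items())
    n = len(pts)
    if c >= n:
        return 0.0, [s for s, _ in pts]

    def group_cost(lo, hi):
        """Best (cost, representative) for pts[lo..hi], inclusive."""
        best = None
        for m in range(lo, hi + 1):
            total = sum(p * abs(s - pts[m][0]) for s, p in pts[lo:hi + 1])
            if best is None or total < best[0]:
                best = (total, pts[m][0])
        return best

    inf = float("inf")
    # table[a][r]: best (cost, reps) covering the first r points with a reps.
    table = [[(inf, [])] * (n + 1) for _ in range(c + 1)]
    table[0][0] = (0.0, [])
    for a in range(1, c + 1):
        for r in range(1, n + 1):
            for left in range(a, r + 1):   # last group is pts[left-1 .. r-1]
                prev_cost, prev_reps = table[a - 1][left - 1]
                if prev_cost == inf:
                    continue
                cost, rep = group_cost(left - 1, r - 1)
                if prev_cost + cost < table[a][r][0]:
                    table[a][r] = (prev_cost + cost, prev_reps + [rep])
    return table[c][n]


typical_result = typical_scores({1: 0.5, 2: 0.25, 10: 0.25}, c=2)
```

In this toy distribution the two typical scores are 1 and 10: the score 2 is one unit away from 1 and carries probability 0.25, so the expected distance is 0.25.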
Empirical Study: 3-Typical vs. U-Topk
Empirical Study
Empirical Study
Q&A
Reference
[1] Charu C. Aggarwal, Philip S. Yu. A Survey of Uncertain Data Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering, 2009.
[2] Tingjian Ge, Stan Zdonik, Samuel Madden. Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. SIGMOD, 2009.