![Page 1: Analysis of Uncertain Data: Smoothing of Histograms Eugene Fink Ankur Sarin Jaime G. Carbonell 10 20…](https://reader036.vdocuments.mx/reader036/viewer/2022062911/5a4d1bf97f8b9ab0599ead9f/html5/thumbnails/1.jpg)
Analysis of Uncertain Data: Smoothing of Histograms
Eugene Fink, Ankur Sarin, Jaime G. Carbonell
Density estimation problem
Convert a set of numeric data points to a smoothed approximation of the underlying probability density.
Example points: 11, 12, 17, 18, 19, 21, 22, 26, 27, 29
[Figure: the example points on an axis from 10 to 30]
Techniques
• Manual estimates
• Histograms
• Curve fitting
Generalized histograms
Example:
0.2 chance: [11 .. 12]
0.5 chance: [17 .. 22]
0.3 chance: [26 .. 29]
General form:
prob₁: [min₁ .. max₁]
prob₂: [min₂ .. max₂]
…
probₙ: [minₙ .. maxₙ]
• Intervals do not overlap
• Probabilities sum to 1.0
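The two constraints on a generalized histogram can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the `Interval` class and the `validate` function are hypothetical names, the intervals are assumed to be listed in increasing order, and intervals that merely share an endpoint are assumed not to count as overlapping.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    prob: float  # probability that the value falls into the interval
    lo: float    # minimum of the interval
    hi: float    # maximum of the interval


def validate(hist: List[Interval], tol: float = 1e-9) -> None:
    """Raise ValueError unless hist satisfies both constraints.

    Assumption: hist is sorted by position, and touching endpoints
    (left.hi == right.lo) are allowed.
    """
    for iv in hist:
        if iv.lo > iv.hi or iv.prob < 0:
            raise ValueError(f"malformed interval: {iv}")
    for left, right in zip(hist, hist[1:]):
        if left.hi > right.lo:
            raise ValueError(f"intervals overlap: {left} and {right}")
    if abs(sum(iv.prob for iv in hist) - 1.0) > tol:
        raise ValueError("probabilities do not sum to 1.0")


# The example from the slide passes the check.
example = [Interval(0.2, 11, 12), Interval(0.5, 17, 22), Interval(0.3, 26, 29)]
validate(example)
```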
Special cases
• Standard histogram
• Set of points
• Weighted points
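One way to read these special cases is as conversions into the general form. The sketch below is an illustration under stated assumptions: a point is encoded as a zero-width interval, a histogram is a list of `(prob, lo, hi)` tuples, and all three function names are hypothetical.

```python
from collections import Counter


def from_weighted_points(weighted):
    """weighted: iterable of (value, weight) pairs with positive weights.

    Each distinct value becomes a zero-width interval [value .. value];
    weights of repeated values are added, then normalized to sum to 1.
    """
    totals = Counter()
    for value, weight in weighted:
        totals[value] += weight
    norm = sum(totals.values())
    return [(w / norm, x, x) for x, w in sorted(totals.items())]


def from_points(points):
    """A plain point set is a weighted point set with equal weights."""
    return from_weighted_points((x, 1.0) for x in points)


def from_standard_histogram(edges, counts):
    """edges: k+1 increasing bin boundaries; counts: k bin counts.

    Empty bins are dropped, since they carry no probability.
    """
    total = sum(counts)
    return [(c / total, lo, hi)
            for c, lo, hi in zip(counts, edges, edges[1:]) if c > 0]


points = [11, 12, 17, 18, 19, 21, 22, 26, 27, 29]
print(from_points(points)[:2])
print(from_standard_histogram([10, 20, 30], [3, 1]))
```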
Smoothing problem
Given a generalized histogram, construct its coarser approximation.
Input
• Initial distribution: A point set or a fine-grained histogram
• Distance function: A measure of similarity between distributions
• Target size: The number of intervals in an approximation
Standard distance measures
• Simple difference: ∫ |p(x) − q(x)| dx
• Kullback-Leibler: ∫ p(x) · log (p(x) / q(x)) dx
• Jensen-Shannon: (Kullback-Leibler (p, (p+q)/2) + Kullback-Leibler (q, (p+q)/2)) / 2
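For piecewise-uniform distributions these integrals reduce to finite sums over the segments between interval endpoints. The sketch below illustrates that reduction under several assumptions that are not in the slides: histograms are lists of `(prob, lo, hi)` tuples, every interval has positive width (point masses have no finite density), the logarithm is natural, and all function names are hypothetical.

```python
import math


def density(hist, x):
    """Density of a piecewise-uniform histogram at x (0 outside all intervals)."""
    for prob, lo, hi in hist:
        if lo <= x < hi:
            return prob / (hi - lo)
    return 0.0


def _segments(p, q):
    """Yield (width, p_density, q_density) on each segment between endpoints."""
    cuts = sorted({b for _, lo, hi in p + q for b in (lo, hi)})
    for a, b in zip(cuts, cuts[1:]):
        mid = (a + b) / 2  # both densities are constant inside the segment
        yield b - a, density(p, mid), density(q, mid)


def simple_difference(p, q):
    return sum(w * abs(dp - dq) for w, dp, dq in _segments(p, q))


def kullback_leibler(p, q):
    total = 0.0
    for w, dp, dq in _segments(p, q):
        if dp > 0:
            if dq == 0:
                return math.inf  # p has mass where q has none
            total += w * dp * math.log(dp / dq)
    return total


def jensen_shannon(p, q):
    total = 0.0
    for w, dp, dq in _segments(p, q):
        m = (dp + dq) / 2  # density of the mixture (p+q)/2
        for d in (dp, dq):
            if d > 0:
                total += 0.5 * w * d * math.log(d / m)
    return total


p = [(1.0, 0.0, 1.0)]
q = [(1.0, 0.0, 2.0)]
print(simple_difference(p, q), kullback_leibler(p, q), jensen_shannon(p, q))
```

Note that the Kullback-Leibler measure is asymmetric and becomes infinite when the second distribution assigns zero density to a region where the first does not, whereas the Jensen-Shannon measure stays finite and symmetric.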
Smoothing algorithm
Repeat:
  Merge two adjacent intervals
Until the histogram has the right size
Interval merging
[Figure: adjacent intervals prob₁: [min₁ .. max₁] and prob₂: [min₂ .. max₂] merge into (prob₁ + prob₂): [min₁ .. max₂]]
• For each potential merge, calculate the distance
• Perform the smallest-distance merge
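The merge rule and the greedy loop can be sketched as follows. This is an illustration under assumptions, not the authors' code: histograms are lists of `(prob, lo, hi)` tuples sorted by position, the distance of a merge is taken to be the simple difference between the histogram before and after that merge (which only involves the two merged intervals and the gap between them), ties go to the leftmost pair, and the names `merge_cost` and `smooth` are hypothetical. The loop rescans all pairs on every step, so it is a straightforward quadratic version.

```python
from fractions import Fraction


def merge_cost(a, b):
    """Simple difference between two adjacent intervals and their merge.

    Written in terms of probability mass, so zero-width intervals (points)
    are handled as long as the merged interval has positive width.
    """
    (p1, lo1, hi1), (p2, lo2, hi2) = a, b
    d = (p1 + p2) / (hi2 - lo1)          # uniform density after the merge
    return (abs(p1 - d * (hi1 - lo1))    # change over the first interval
            + d * (lo2 - hi1)            # mass moved into the gap
            + abs(p2 - d * (hi2 - lo2)))  # change over the second interval


def smooth(hist, target_size):
    """Repeat the smallest-distance merge until target_size intervals remain."""
    hist = list(hist)
    while len(hist) > max(target_size, 1):
        i = min(range(len(hist) - 1),
                key=lambda k: merge_cost(hist[k], hist[k + 1]))
        (p1, lo1, _), (p2, _, hi2) = hist[i], hist[i + 1]
        hist[i:i + 2] = [(p1 + p2, lo1, hi2)]
    return hist


# Ten equally weighted points; Fraction keeps the arithmetic exact,
# so ties between equal costs are broken deterministically.
points = [11, 12, 17, 18, 19, 21, 22, 26, 27, 29]
initial = [(Fraction(1, 10), x, x) for x in points]
for prob, lo, hi in smooth(initial, 3):
    print(f"{float(prob)} chance: [{lo} .. {hi}]")
```

With these particular assumptions the ten points reduce to the intervals [11 .. 12], [17 .. 22], and [26 .. 29] with probabilities 0.2, 0.5, and 0.3; a different distance function or tie-breaking rule may produce a different result.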
Smoothing examples: Normal distribution
[Figure panels: 5000 points, 200 intervals, 50 intervals, 10 intervals]
Smoothing examples: Geometric distribution
[Figure panels: 5000 points, 200 intervals, 50 intervals, 10 intervals]
Running time
• Theoretical: O (n · log n)
• Practical: O (n)
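The slides state the bounds but not the data structures behind them. One standard way to reach O (n · log n) for a greedy merge is a priority queue of candidate merges with lazy invalidation; the sketch below shows that approach as an assumption, not as the authors' method. Histograms are lists of `(prob, lo, hi)` tuples, the cost function is an assumed simple-difference cost that must run in constant time, and all names are hypothetical.

```python
import heapq
from fractions import Fraction


def merge_cost(a, b):
    """Assumed cost: simple difference between two intervals and their merge."""
    (p1, lo1, hi1), (p2, lo2, hi2) = a, b
    d = (p1 + p2) / (hi2 - lo1)
    return abs(p1 - d * (hi1 - lo1)) + d * (lo2 - hi1) + abs(p2 - d * (hi2 - lo2))


def smooth_with_heap(hist, target_size, cost=merge_cost):
    items = list(hist)
    n = len(items)
    nxt = list(range(1, n + 1))   # right neighbour of each slot; n means none
    prv = list(range(-1, n - 1))  # left neighbour of each slot; -1 means none
    alive = [True] * n
    stamp = [0] * n               # bumped whenever items[i] changes
    heap = []

    def push(i):
        j = nxt[i]
        if j < n:
            entry = (cost(items[i], items[j]), i, j, stamp[i], stamp[j])
            heapq.heappush(heap, entry)

    for i in range(n - 1):
        push(i)
    size = n
    while size > max(target_size, 1) and heap:
        _, i, j, si, sj = heapq.heappop(heap)
        if not (alive[i] and alive[j] and nxt[i] == j
                and stamp[i] == si and stamp[j] == sj):
            continue              # stale entry: one side changed since the push
        (p1, lo1, _), (p2, _, hi2) = items[i], items[j]
        items[i] = (p1 + p2, lo1, hi2)
        stamp[i] += 1
        alive[j] = False
        nxt[i] = nxt[j]
        if nxt[i] < n:
            prv[nxt[i]] = i
        if prv[i] >= 0:
            push(prv[i])          # the left neighbour now faces a new interval
        push(i)
        size -= 1
    return [items[i] for i in range(n) if alive[i]]


points = [11, 12, 17, 18, 19, 21, 22, 26, 27, 29]
initial = [(Fraction(1, 10), x, x) for x in points]
print(smooth_with_heap(initial, 3))
```

Each merge pushes at most two new heap entries, so the heap holds O (n) entries in total and every push or pop costs O (log n), which gives O (n · log n) overall for this sketch.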
Running time
3.4 GHz Pentium, C++ code
(2.5 ± 0.5) · num-points microseconds
[Figure: log-log plot of time (microseconds) versus number of points; both axes run from 10² to 10⁶]
Visual smoothing
We convert a piecewise-uniform distribution to a smooth curve by spline fitting.
The user usually prefers a smooth probability density.
Main results
• Density estimation
• Lossy compression of generalized histograms
Advantages
• Explicit specification of
  - Distance measure
  - Compression level
• Effective representation for automated reasoning