FOCUS: Clustering Crowdsourced Videos by Line-of-Sight
![Page 1: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/1.jpg)
FOCUS: Clustering Crowdsourced Videos by Line-of-Sight
Puneet Jain, Justin Manweiler, Arup Acharya, and Kirk Beaty
![Page 2: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/2.jpg)
Clustered by shared subject
![Page 3: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/3.jpg)
CHALLENGES
![Page 4: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/4.jpg)
CAN IMAGE PROCESSING SOLVE THIS PROBLEM?
![Page 5: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/5.jpg)
Camera 1
Camera 2
Camera 3
Camera 4
LOGICAL similarity does not imply VISUAL similarity
![Page 6: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/6.jpg)
VISUAL similarity does not imply LOGICAL similarity
![Page 7: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/7.jpg)
CAN SMARTPHONE SENSING SOLVE THIS PROBLEM?
![Page 8: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/8.jpg)
Sensors are noisy, hard to distinguish subjects…
Why not triangulate?
![Page 9: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/9.jpg)
GPS-COMPASS Line-of-Sight
![Page 10: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/10.jpg)
INSIGHT
![Page 11: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/11.jpg)
Simplifying Insight 1
Don’t need to visually identify actual SUBJECT, can use background as PROXY
Subject: hard to identify
Background: easy to identify
![Page 12: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/12.jpg)
Simplifying Insight 2
Don’t need to directly match videos, can compare all to a predefined visual MODEL
Same basic structure persists
![Page 13: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/13.jpg)
Simplifying Insight 3
Line-of-sight (triangulation) is almost enough, just not via sensing (alone)
![Page 14: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/14.jpg)
FOCUS: Fast Optical Clustering of live User Streams
Sensing + Cloud + Vision
![Page 15: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/15.jpg)
FOCUS Cloud Video Analytics
• Video Streams (Android, iOS, etc.)
• Video Extraction
• Image processing, computer vision
• Hadoop/HDFS: failover, elasticity
• Clustered Videos
Users Select & Watch Organized Streams
• Watching Live (home: 2, away: 1)
• Change Angle
• Change Focus
![Page 16: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/16.jpg)
FOCUS Cloud Video Analytics
• Video Extraction
• Image processing, computer vision
• Pre-defined reference “model”
• Hadoop/HDFS: failover, elasticity
• Clustered Videos
Users Select & Watch Organized Streams
• Watching Live (home: 2, away: 1)
• Change Angle
• Change Focus
![Page 17: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/17.jpg)
Multi-view Stereo Reconstruction
Model construction technique based on Photo Tourism: Exploring Image Collections in 3D (Snavely et al., SIGGRAPH 2006)
Estimates camera POSE and content in field-of-view
• multi-view reconstruction
• keypoint extraction
![Page 18: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/18.jpg)
Visualizing Camera Pose
![Page 19: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/19.jpg)
• Given a pre-defined 3D model, align incoming video frames to the model
• Also known as camera pose estimation
~ 1 second at 90th percentile
~ 18 seconds at 90th percentile
• multi-view reconstruction
• keypoint extraction
• frame-by-frame video to model alignment
• sensory inputs
![Page 20: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/20.jpg)
• multi-view reconstruction
• keypoint extraction
• integration of sensory inputs
Gyroscope provides a “diff” from the vision initial position
[Figure: frame timeline (labels 0, 1, 2, 3, 4, t - 1, t - 2); legend: Sampled Frame, Gyroscope]
File size ≈ 1/Blur
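One way to read this slide is as a sketch like the one below: vision alignment supplies an anchor orientation on sampled frames, the gyroscope supplies the change since that anchor, and encoded file size serves as a cheap inverse proxy for blur when choosing which frame to sample. The function names, the yaw-only simplification, and the sample values are all assumptions for illustration, not the authors' implementation.

```python
def propagate_yaw(anchor_yaw_deg, gyro_samples):
    """Integrate gyroscope yaw rate forward from a vision-derived anchor.

    gyro_samples is an iterable of (dt_seconds, yaw_rate_deg_per_s)
    tuples recorded since the anchor frame was aligned by vision.
    """
    yaw = anchor_yaw_deg
    for dt, rate in gyro_samples:
        yaw += rate * dt  # simple rectangular integration of the "diff"
    return yaw % 360.0

def pick_sharpest(frames):
    """Pick the frame with the largest encoded size.

    frames is a list of (frame_id, encoded_size_bytes). Blurry frames tend
    to compress to fewer bytes, so size is used as an inverse blur proxy.
    """
    return max(frames, key=lambda frame: frame[1])[0]

# Hypothetical values: vision anchored the camera at 350 degrees, then the
# gyroscope reported 50 deg/s for four 0.1 s intervals.
current_yaw = propagate_yaw(350.0, [(0.1, 50.0)] * 4)
best_frame = pick_sharpest([("f0", 41000), ("f1", 52000), ("f2", 38000)])
```

Gyroscope integration drifts over time, so a scheme like this only makes sense between periodic vision re-anchoring.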
![Page 21: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/21.jpg)
Field-of-view
Using POSE + model POINT CLOUD, FOCUS geometrically identifies the set of model points in background of view
• multi-view reconstruction
• keypoint extraction
• pairwise model image analysis
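The field-of-view step can be sketched geometrically as follows. This is a simplified illustration under stated assumptions: the camera's view is approximated by a circular cone rather than a rectangular frustum, occlusion is ignored, and `points_in_view` and the toy point cloud are hypothetical names and data.

```python
import math

def points_in_view(camera_pos, view_dir, half_angle_deg, cloud):
    """Return the ids of model points that fall inside a viewing cone.

    camera_pos and view_dir are (x, y, z) tuples taken from the estimated
    pose; cloud maps point id -> (x, y, z) in model coordinates.
    """
    norm = math.sqrt(sum(c * c for c in view_dir))
    v = [c / norm for c in view_dir]
    cos_limit = math.cos(math.radians(half_angle_deg))
    visible = set()
    for pid, p in cloud.items():
        d = [p[i] - camera_pos[i] for i in range(3)]
        dist = math.sqrt(sum(c * c for c in d))
        if dist == 0:
            continue  # point coincides with the camera center
        cos_angle = sum(d[i] * v[i] for i in range(3)) / dist
        if cos_angle >= cos_limit:  # angle to view axis within the cone
            visible.add(pid)
    return visible

# Toy cloud: two points ahead of the camera, one to the side, one behind.
cloud = {"a": (0, 10, 0), "b": (10, 0, 0), "c": (1, 10, 0), "d": (0, -10, 0)}
seen = points_in_view((0, 0, 0), (0, 1, 0), 30.0, cloud)
```

The result is a set of point ids per frame, which is the form the later similarity step needs.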
![Page 22: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/22.jpg)
Finding the similarity across videos as size of point cloud set intersection
Similarity between image 1 & 2 = 18
Similarity between image 1 & 3 = 13
• multi-view reconstruction
• keypoint extraction
• pairwise model image analysis
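The set-intersection similarity is simple enough to show directly. In this minimal sketch, each video is represented by the set of model point ids in its field of view; the helper name `pairwise_similarity` and the toy sets are assumptions for illustration only.

```python
from itertools import combinations

def pairwise_similarity(visible_points):
    """Score every pair of videos by the number of shared model points.

    visible_points maps video id -> set of model point ids in view.
    Returns a dict mapping (id_a, id_b) -> size of the set intersection.
    """
    return {
        (a, b): len(visible_points[a] & visible_points[b])
        for a, b in combinations(sorted(visible_points), 2)
    }

# Toy example: v1 and v2 share two background points, v3 shares none.
views = {"v1": {1, 2, 3, 4}, "v2": {3, 4, 5}, "v3": {9}}
scores = pairwise_similarity(views)
```

Because the comparison runs on point ids rather than pixels, two videos with very different appearance can still score as similar when they face the same background structure.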
![Page 23: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/23.jpg)
Clustering “similar” videos
[Figure: similarity-score graph over videos 1, 2, 3]
Application of Modularity Maximization
High modularity implies:
• high correlation among the members of a cluster
• minor correlation with the members of other clusters
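A small sketch of modularity-based clustering on a similarity graph follows. The slides name modularity maximization but not a specific algorithm, so the greedy pairwise merging used here is an assumption chosen for brevity, and the edge weights are made-up toy values.

```python
from itertools import combinations

def modularity(weights, labels):
    """Weighted Newman modularity Q of a partition.

    weights maps (u, v) -> similarity for each undirected edge (u != v);
    labels maps node -> cluster label.
    """
    two_m = 2.0 * sum(weights.values())
    if two_m == 0:
        return 0.0
    degree = {n: 0.0 for n in labels}
    for (u, v), w in weights.items():
        degree[u] += w
        degree[v] += w
    q = 0.0
    for (u, v), w in weights.items():
        if labels[u] == labels[v]:
            q += 2.0 * w  # intra-cluster edge, counted in both directions
    for c in set(labels.values()):
        tot = sum(degree[n] for n in labels if labels[n] == c)
        q -= tot * tot / two_m  # expected intra-cluster weight by chance
    return q / two_m

def greedy_clusters(weights, nodes):
    """Merge the pair of clusters that most increases Q until none helps."""
    labels = {n: n for n in nodes}
    best = modularity(weights, labels)
    while True:
        candidate = None
        for a, b in combinations(sorted(set(labels.values())), 2):
            trial = {n: (a if c == b else c) for n, c in labels.items()}
            q = modularity(weights, trial)
            if q > best + 1e-12:
                best, candidate = q, trial
        if candidate is None:
            return labels, best
        labels = candidate

# Toy similarity graph: a, b, c overlap heavily; d, e overlap heavily;
# only a weak link joins the two groups.
edges = {("a", "b"): 18, ("a", "c"): 13, ("b", "c"): 15,
         ("d", "e"): 20, ("c", "d"): 1}
labels, q = greedy_clusters(edges, ["a", "b", "c", "d", "e"])
```

An appealing property of this formulation is that the number of clusters does not have to be chosen in advance; merging stops when no merge improves Q.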
![Page 24: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/24.jpg)
RESULTS
![Page 25: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/25.jpg)
Collegiate Football Stadium
• Stadium: 33K seats, 56K maximum attendance
• Model: 190K points, 412 images (2896 x 1944 resolution)
• Android app on Samsung Galaxy Nexus, S3
• 325 videos captured, 15-30 seconds each
![Page 26: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/26.jpg)
Line-of-Sight Accuracy (visual)
![Page 27: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/27.jpg)
Line-of-Sight Accuracy
In >80% of cases, line-of-sight estimation is off by < 40 meters
GPS/compass line-of-sight estimation error is < 260 meters for the same percentage of cases
![Page 28: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/28.jpg)
FOCUS Performance
75% true positives
Trigger GPS/Compass failover techniques
![Page 29: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/29.jpg)
Natural Questions
• What if a 3D model is not available?
– Online model generation from the first few uploads
• Don't stadiums look very different on a game day?
– Rigid structures in the background persist
• Where won't it work?
– Natural or dynamic environments are hard
![Page 30: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/30.jpg)
Conclusion
• Computer vision and image processing are often computation-hungry, restricting real-time deployment
• Mobile sensing provides powerful metadata that can often reduce the computational burden
• Computer vision + mobile sensing + geometry, along with the right set of big-data tools, can enable many real-time applications
• FOCUS demonstrates one such fusion, a ripe area for further research
![Page 31: FOCUS : Clustering Crowdsourced Videos by Line-of-Sight](https://reader036.vdocuments.mx/reader036/viewer/2022062501/5681685f550346895ddea4c7/html5/thumbnails/31.jpg)
Thank You
http://cs.duke.edu/~puneet