Characteristics of Streaming Media Stored on the Web
Mingzhe Li, Mark Claypool, Robert Kinicki and James Nichols
ACM Transactions on Internet Technology (TOIT)
Vol. 5, No. 4, November 2005
Introduction (1 of 2)
• Improvements to Internet enable users to stream from Web browsers
– Across national and cultural boundaries
• Web users expect “point and click” to stream
• 2001, RealNetworks says 350,000 hours [1]
• 2002, CAIDA says streaming is significant fraction of traffic
– Going to increase with cellular networks
• Concern drives new protocols, routers, etc. to deal with traffic better
Introduction (2 of 2)
• Much work characterizes streaming applications to better understand them
• Unfortunately, little shows what current streams stored on Web look like
• Previous study in 1997 [19]
– Looked at every video on the Web
– Found Internet could not support streaming
– RealPlayer and Media Player not yet created
• In 1985, papers by Ousterhout et al. [21] studied characteristics of files
– Fundamental in designing new file system
Need study of streaming media stored on the Web to help research today
Investigation (1 of 2)
• What are the most popular streaming media products?
– Previous studies [12] show very different results
– Earlier, prevalence of MPEG, AVI, QuickTime made it difficult for newcomers
• What is the ratio of streaming audio versus streaming video?
– Audio has lower bitrate cap (voice, music) than video
– Can give current bitrate expectations
• Are media durations long-tailed?
– Long-tailed can contribute to self-similarity
– Self-similar traffic difficult to manage
Investigation (2 of 2)
• What are typical streaming media target bitrates?
– Direct impact on network traffic
– Provides insight into frame resolution, frame rates, color depth
• What fraction of each streaming codec is being used?
– Codecs determine compression efficiency
– Knowledge of codec prevalence suggests how fast improvements are incorporated
Focus
• Focus on commercial products
– Big 3: Media Player, RealPlayer, QuickTime
• Other studies looked at server side or one client
– This study broader
• Have been p2p studies, but p2p not streamed (mostly)
– Instead downloaded, as in file transfer
• Build specialized crawler, crawl over 17 million URLs from different starting points, and analyze about 30 thousand clips
Teasers
• Volume and relative amount increased since 1997
• Proprietary most prevalent
– RealPlayer 1st, Media Player 2nd
• Most clips short, with long-tailed duration
• Encoded at low resolution, less than current monitors can handle
• Work useful for:
– Selecting clip workloads
– Generating streaming models
Outline
• Introduction (done)
• Methodology
• Analysis
• Sampling Issues
• Conclusions
Methodology (Mini-Outline)
• Media Crawler
• Starting Pages
• Measurement
Media Crawler
• Modify Larbin Web crawler
• Recursively traverses URLs
– Avoid loops by caching previously visited URLs
• Identify streaming media based on protocol type
– Ex: mms://, rtsp://
• Also examine extensions of HTTP URLs
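The crawler logic on this slide can be sketched roughly as follows. This is not Larbin's code or the authors' modification of it; it is a minimal illustration that uses an iterative queue instead of recursion. The extension list and the `fetch_links` callback are hypothetical, since the slide names only the `mms://` and `rtsp://` schemes.

```python
from collections import deque
from urllib.parse import urlparse

# Schemes named on the slide.
STREAMING_SCHEMES = {"mms", "rtsp"}
# Hypothetical extension list, for illustration only.
MEDIA_EXTENSIONS = (".rm", ".ram", ".asf", ".wmv", ".mov", ".mp3")


def is_streaming_url(url):
    """Classify a URL by protocol scheme, then by extension for HTTP URLs."""
    parsed = urlparse(url)
    scheme = parsed.scheme.lower()
    if scheme in STREAMING_SCHEMES:
        return True
    if scheme in ("http", "https"):
        return parsed.path.lower().endswith(MEDIA_EXTENSIONS)
    return False


def crawl(start_url, fetch_links, max_urls):
    """Traverse links from start_url, caching seen URLs to avoid loops.

    fetch_links is a caller-supplied (hypothetical) function that returns
    the list of URLs linked from a page.
    Returns the media URLs found and the number of distinct URLs seen.
    """
    seen = {start_url}
    queue = deque([start_url])
    media = []
    while queue:
        url = queue.popleft()
        if is_streaming_url(url):
            media.append(url)  # record the clip; do not parse it for links
            continue
        for link in fetch_links(url):
            if link not in seen and len(seen) < max_urls:
                seen.add(link)
                queue.append(link)
    return media, len(seen)
```

The `seen` set plays the role of the cache of previously visited URLs: a page that links back to an ancestor is never queued twice.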
Starting Pages
• Wanted international and popular
• International – chose 10 most wired countries
– Allow for cross-cultural analysis
– If Nielsen gave no additional info, chose domestic newspaper as starting point
• USA – chose 7 popular themes
– Allow for cross-content analysis
• Feb 13, 2003, crawl 1 million URLs from each
– Took 4 to 24 hours, based on RTT
Measurement of Content Characteristics
• Use specialized tools to access each media URL
– Collect: encoding, bitrate, duration, size, …
– Tools built from SDK, use player core
• RealNetworks:
– RealAnalyzer, TestPlay (could not do levels)
• Microsoft Media:
– Media Analyzer, Wmprop (could do levels)
• MPlayer
– Open source (could not do bitrate)
Outline
• Introduction (done)
• Methodology (done)
• Analysis
– Aggregate analysis
– Commercial products (video, audio)
– Codec
• Sampling Issues
• Conclusions
Aggregate Analysis (1 of 3)
• Remove duplicates, giving about 11 million unique URLs
– About 54,000 were streaming
• In 1997, about 25 million URLs
– About 22,000 were streaming
• Extrapolating: today, about 15 million total
– Increase from 0.09% to 0.47%
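The percentages on this slide can be checked with a few lines of arithmetic. The sketch below uses only the rounded counts shown above ("about 11 million", "about 54,000"), so it approximates the quoted 0.47% rather than reproducing it exactly; the function name is made up for illustration.

```python
def streaming_share(streaming_urls, total_urls):
    """Percentage of crawled URLs that point to streaming media."""
    return 100.0 * streaming_urls / total_urls


# Rounded counts taken from the slide.
share_1997 = streaming_share(22_000, 25_000_000)
share_2003 = streaming_share(54_000, 11_000_000)

print(f"1997: {share_1997:.2f}%")   # 0.09%
print(f"2003: {share_2003:.2f}%")   # 0.49% with these rounded inputs
print(f"ratio: {share_2003 / share_1997:.1f}x")
```

With the rounded inputs, the share of streaming URLs grows by a factor of a little over five between the two crawls.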
Aggregate Analysis (2 of 3)
Some “heavy hitters”, more so than typical Web servers
Aggregate Analysis (3 of 3)
- Real almost ½ of all streaming content
- In 1997, MPEG, AVI, QuickTime were all of the content, but now only 10% combined
- MP3 is most popular non-proprietary format
Commercial Product Analysis
• Run custom tools on commercial products
• Of original 39,000 only about 29,000 valid
– 50% “cannot find specified file”
– 25% “cannot connect to server”
– 10% “authorization failure”
• Can be from playlist
– But 97% only 1 clip
Live versus Pre-Recorded
- Most pre-recorded
- 98% is pre-recorded, 2% live
Percentage of Audio and Video
- More RealAudio than MP3 audio
- Proportionally less Windows Media is audio
- Almost no QuickTime is audio
Duration
- 1997, 90% only 45 seconds or less
- Still, today much shorter than T.V. show or movie
Self-Similar Analysis (1 of 2)
Definitive test: Is tail flat?
Looks flat, but that is not good enough [31]
Self-Similar Analysis (2 of 2)
• Measure curve of tail (1/16th of distribution, others same)
– Curve defined as 3-point estimate, take derivative
• Estimate Pareto (long-tailed) slope α
– Used aest tool
• Generate 1000 samples from Pareto with α
– Each sample has same number of points, n, as original
– Calculate curvature of each sample tail, mean µ
• Calculate difference (d) between µ and original
• Count number out of 1000 that differ from µ by at least d
– 495 (video) and 498 (audio), about ½
• Cannot reject null hypothesis
– May be long-tailed
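The procedure on this slide is a Monte Carlo test on tail curvature, and a rough sketch may make the steps concrete. Several details below are assumptions rather than the authors' method: the Hill estimator stands in for the aest tool, the tail is taken to be the largest 1/16 of the samples, and the three-point curvature is computed from the first, middle, and last tail points of the log-log complementary CDF. All function names are hypothetical.

```python
import numpy as np


def tail_curvature(samples, tail_fraction=1 / 16):
    """Three-point curvature estimate of the log-log CCDF tail.

    Assumes positive, distinct sample values (ties would give a zero
    denominator in the slope estimates).
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    ccdf = 1.0 - np.arange(1, n + 1) / n      # empirical P(X > x_i)
    lx = np.log10(x[:-1])                     # drop last point (CCDF = 0)
    ly = np.log10(ccdf[:-1])
    start = int(len(lx) * (1 - tail_fraction))
    tx, ty = lx[start:], ly[start:]
    i0, i1, i2 = 0, len(tx) // 2, len(tx) - 1
    slope_lo = (ty[i1] - ty[i0]) / (tx[i1] - tx[i0])
    slope_hi = (ty[i2] - ty[i1]) / (tx[i2] - tx[i1])
    # Change in slope per unit of log10(x): zero for a perfectly straight tail.
    return (slope_hi - slope_lo) / ((tx[i2] - tx[i0]) / 2)


def hill_alpha(samples, tail_fraction=1 / 16):
    """Hill estimate of the Pareto tail index (swapped in for aest)."""
    x = np.sort(np.asarray(samples, dtype=float))
    k = max(2, int(len(x) * tail_fraction))
    threshold = x[-k - 1]
    return k / np.sum(np.log(x[-k:] / threshold))


def long_tail_p_value(samples, trials=1000, seed=0):
    """Fraction of Pareto samples whose curvature is at least as extreme."""
    rng = np.random.default_rng(seed)
    x = np.asarray(samples, dtype=float)
    alpha = hill_alpha(x)
    observed = tail_curvature(x)
    # Generator.pareto draws Lomax variates; adding 1 gives a classical
    # Pareto with minimum 1. The curvature does not depend on the scale.
    sims = np.array([
        tail_curvature(rng.pareto(alpha, size=len(x)) + 1.0)
        for _ in range(trials)
    ])
    mu = sims.mean()
    d = abs(observed - mu)
    return float(np.mean(np.abs(sims - mu) >= d))
```

A returned value near ½, like the 495 and 498 out of 1000 reported above, means the observed tail curvature is typical of a Pareto sample, so the long-tailed hypothesis cannot be rejected; a value near zero would argue against it.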
Video Encoded Bitrate
In 1997, 1% stream for modem, 50% for broadband, 20% for T1+
- Said, modem could not support streaming
Note, today, broadband still not targeted
Streams Encoded Per Clip
Media Scaling will be difficult!
Note, earlier study [15] found Real at 65%
Audio is one stream
Aspect Ratios
Very uniform, but a few odd-balls
30% above or below
Take product for size (next)
Video Resolution
- Most much smaller than typical monitors (1024 x 768 would be 786,432 pixels)
- Room to grow!
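The size measure here is the product of width and height, compared against a 1024 x 768 monitor. A small sketch of that arithmetic follows; the 320 x 240 clip is a hypothetical example, not a figure from the study.

```python
def aspect_ratio(width, height):
    """Width divided by height, e.g. about 1.33 for a 4:3 frame."""
    return width / height


def pixel_count(width, height):
    """Frame size as the product of width and height."""
    return width * height


monitor = pixel_count(1024, 768)   # 786,432 pixels, as on the slide
clip = pixel_count(320, 240)       # hypothetical clip resolution

print(monitor)
print(f"clip covers {100 * clip / monitor:.1f}% of the monitor")
```

Both the monitor and the example clip have a 4:3 aspect ratio, yet the clip fills under a tenth of the screen area.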
Audio Encoded Bitrates
- Most for modems, but 10% for broadband
- In 1999, 100% found for modems
- Will likely increase (MP3 128 kbps), but with a cap
Video Codecs
v8 buffers differently than v9
- Newest versions, v9, still not deployed much
- Useful as snapshot in time
Outline
• Introduction (done)
• Methodology (done)
• Analysis (done)
• Sampling Issues
• Conclusions
Sampling Issues
• In 1997, could analyze all on Web
• Today, impractical
– Would take 16 years to crawl and analyze clips
• Is 17 million large “enough” sample?
– Is it possible to obtain same results with fewer starting points?
– Is it possible to obtain same results with fewer than 1 million URLs per starting point?
– How does sampling affect distributions?
– How does choice of starting point affect distribution?
Percentage of Media versus URLs
Took 200k from each, build set
Overall, above 400k from each is stable
½ million
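One way to picture this sampling experiment: for each candidate size k, take k URLs from every starting point, pool them, and measure the percentage that is media. The sketch below assumes each crawl is stored as a list in crawl order and that the subsample is simply the first k URLs; the slide does not say how the subsets were drawn, and the names are hypothetical.

```python
def media_percentage_by_size(crawls, sizes, is_media):
    """Percentage of media URLs when only k URLs per starting point are kept.

    crawls: dict mapping a starting point to its URLs in crawl order.
    sizes: iterable of per-starting-point sample sizes k.
    is_media: function deciding whether a URL is streaming media.
    """
    results = {}
    for k in sizes:
        total = 0
        media = 0
        for urls in crawls.values():
            subset = urls[:k]          # assumed: first k URLs crawled
            total += len(subset)
            media += sum(1 for url in subset if is_media(url))
        results[k] = 100.0 * media / total if total else 0.0
    return results
```

Calling it with sizes such as 200,000, 400,000, and so on, and checking where the percentage stops changing, corresponds to the stability observation on this slide.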
Duration of Video for Number of URLs
Can get away with far fewer and have same distribution of durations
Media Type versus Starting Points
9 Starting points sufficient
Duration for Number of Starting Points
Media Type in USA versus International
- International similar
- May be because cross-cultural Web
Duration for USA and Non-USA
Summary
• Many researchers worry about volume increase of Video
• Video characterizations based on old data
• Current data on media stored on Web
• Crawled 17 million URLs, analyzed 30k clips
Conclusions
• Streaming media increased 600% in past 5 years
• Real Media 1st, Microsoft Media 2nd
• Audio and video about equal
• Vast majority pre-recorded (not live)
• Most targets still for modem
• Potential for video to grow larger, since monitor resolutions much larger than video resolutions
Future Work?
Future Work
• Correlate to actual data streamed
• Congestion responsiveness
• P2P
• Future study (now ~5 years old!)