Mobile Video Delivery with HTTP



IEEE Communications Magazine • April 2011 • 0163-6804/10/$25.00 © 2010 IEEE

ACCEPTED FROM OPEN CALL

Kevin J. Ma, Azuki Systems, Inc. and University of New Hampshire

Radim Bartoš, University of New Hampshire

Swapnil Bhatia, Palo Alto Research Center and University of New Hampshire

Raj Nair, Azuki Systems, Inc.

ABSTRACT

Expansion in 3G cellular coverage and the emergence of more powerful mobile devices have increased demand for massively scalable mobile video delivery. The rapid adoption of the third screen as a primary screen for video has highlighted inefficiencies in the mobile delivery ecosystem and scalability issues in the mobile delivery infrastructure. This article provides an overview of the current mobile content delivery ecosystem and discusses the expanding role of HTTP-based mobile video delivery. A new class of HTTP-based mobile delivery protocols seeks to address existing quality and scalability issues by simplifying and standardizing mobile video delivery. This article shows how segment-based delivery has enabled HTTP-based live streaming and dynamic bitrate adaptation while increasing scalability through the use of existing CDN infrastructure.

INTRODUCTION

Over the past decade, Internet-based video delivery has become a fixture of modern life. Consumers have embraced the convenience, flexibility, and variety of Internet video, and content providers have acknowledged the earnings potential of this nascent market. The penetration of rich media into the mobile arena has fostered a new wave of innovation in mobile networking. Content providers and consumers alike see the third screen as a natural extension of the desktop video and television viewing experience. The wide availability of broadband access in the home and office combined with powerful desktop and laptop computers with ubiquitous support for the browser-based Adobe Flash® Player has enabled Internet-scale video delivery to the desktop. Challenges still exist, however, in migrating from traditional media outlets like TV and movies, and even from the desktop Internet, to the mobile environment. Limited cellular coverage with up to three orders of magnitude lower bandwidth and high latency, underpowered handsets with low screen resolutions, and a myriad of proprietary media player technologies continue to inhibit the scalable delivery of high quality video.

Content providers and content viewers demand and expect high quality video. Satisfying these demands is necessary for mass adoption, and mass adoption is required to make content delivery cost effective. The proliferation of 3G cellular access and the promise of 4G have enabled a new class of smart phones capable of providing high quality video to users. But this burgeoning market is hindered by the disparity in device capabilities and development environments. Smart phone hardware (e.g., CPU and memory), as well as operating systems, development tools, and third party libraries, have been evolving rapidly in recent years. Where PCs can be easily configured to support a broad array of delivery methods, protocol support varies widely across the major mobile device platforms, as shown in Table 1. Significant disparity between the desktop and mobile platforms makes it difficult to translate the common desktop solutions to the mobile domain. Device diversity among carriers further contributes to the lack of mobile platform convergence. The lack of convergence makes it difficult to optimize mass distribution of mobile content since custom approaches are often required for each platform. This need for customization has fostered a host of niche solution providers to handle each of the different cases. As each provider in the fragmented ecosystem seeks to differentiate itself, an inconsistent user experience results. Employing different solution providers for every platform is very inefficient. A convergence of mobile video delivery methods is crucial to consolidation and growth of the mobile content delivery ecosystem. Be it at the device, operating system, or protocol level, consolidation will ease the transition from desktop to mobile, simplify mass distribution, enhance the monetization potential of mobile content, and drive industry growth.

This article takes a macroscopic view of the end-to-end mobile content delivery ecosystem. We first discuss the current state of the content delivery ecosystem and the mobile delivery network infrastructure. We then discuss advances in application-layer video delivery protocols, specifically the renewed interest in HTTP, and compare them to traditional approaches. We examine the media preparation involved in supporting these delivery protocols, for both video on demand (VoD) and live streaming, while taking into account the impact of these schemes on the ecosystem partners. While there are many important video delivery topics related to data-link, network, and transport protocols, video codecs and containers, error correction, etc., we focus our discussion on issues related to content delivery ecosystem partner integration. Understanding the interactions between ecosystem partners is a key component for evaluating mobile video delivery architectures. Understanding the business motivations of ecosystem partners allows optimization of video delivery architectures using a combination of business metrics and traditional networking metrics. Formulating these hybrid types of metrics would allow us to evaluate video delivery schemes from both a qualitative and quantitative perspective.

MOBILE VIDEO DELIVERY ECOSYSTEM

The content delivery ecosystem is a complex network of partners and service providers. Figure 1 shows some of the primary entities, which interact to make Internet-based video delivery possible. There are many logistical hurdles involved in preparing content for mass distribution. In addition to producing content, the content provider must manage distribution, monetization, and delivery of the content. Figure 1 depicts this starting at the top with the content provider. Internet-based distribution partners are shown on the left, advertising partners are shown on the right, and the video preparation and delivery path is shown in the center.

Once the content is produced, it is placed in the content provider’s content management system (CMS) for distribution to its partners. The content provider then initiates advertising and distribution discussions with its partners. Advertisers and sponsors are signed on to support the content, and advertising media is generated by the ad producers. The advertising media (videos, banners, interstitials, jump pages, etc.) are then published through the advertising CMS to be combined later with the content, by the content distributors. At the same time, contractors are employed for desktop Web development, mobile Web development, and mobile smart phone application development. These three distribution paths each require special expertise. The two mobile paths are also affected by the fragmented technology landscape. Different operating system development environments (for Apple iPhone®, Blackberry®, Google Android, Nokia®, Symbian, and Windows Mobile®, etc.), different carriers, and differences in device capabilities (e.g., screen resolution, audio/video codec support) require more specialized resources.

Figure 1 also depicts the interactions of Web sites and apps with ad traffickers and analytics tracking services. The ad traffickers determine which ads to show, and track which users saw which ads and when. Though the sponsored ads (solicited by the content providers and produced specifically for the content) may be delivered through the third party ad traffickers, they are typically directly integrated into the site or app. The primary role of the ad traffickers is to deliver remnant inventory (ads that are not sold against any specific content). The analytics tracking services track non-ad page views and app usage. Both provide valuable user demographic information to the content providers. Advertisements provide direct revenue, while the analytics information provides justification to advertisers and valuable market research for future projects. Without this revenue, the ecosystem cannot exist.

MOBILE VIDEO TRANSCODING

Unlike desktop systems which may support a wide variety of audio and video codecs and containers, mobile phones typically support very limited subsets of codecs and containers, with different devices supporting different subsets. The optimal parameters for each device, on a given network (video frame rate, video bitrate, audio sampling rate, audio bit rate, resolution, etc.) also differ. As such, the source media from the content providers need to be transcoded into many different encodings and formats. Complex compression schemes coupled with the massive amounts of data represented in high definition source videos make transcoding very CPU intensive.

Table 1. Comparison of smart phone platforms.

| Delivery method | Apple iPhone® | Blackberry | Google Android | Nokia® | Windows Mobile® |
|---|---|---|---|---|---|
| HTTP Progressive Download | Yes | Version 4.3+ | Yes | No | Yes |
| RTSP Streaming | No | Version 4.3+ | Yes | Yes | Non-native |
| HTTP Live Streaming | Version 3.0+ | No | Not yet released | No | No |
| Windows Media® HTTP Streaming | No | No | No | No | Yes |
| Microsoft® Silverlight™ Smooth Streaming | No | No | No | Not yet released | Windows Phone 7 |
| RTMP Streaming | No | Not yet released | Not yet released | Flash® Lite™ | Flash® Lite™ |


Transcoding services are often implemented as “Software as a Service” (SaaS) and deployed in a compute cloud, as depicted in Fig. 2. Because transcoding is a computationally intensive operation, it is typically only performed once, but needs to be performed quickly, to minimize distribution latency. Compute clouds offer large quantities of computational resources and bandwidth on demand. Using compute cloud resources is typically cheaper than maintaining dedicated servers and network connections for low frequency operations like transcoding. This provides content providers with incentives to use one-time transcoding services to produce static files for HTTP delivery. Though dynamic (on-demand) transcoding solutions offer greater flexibility [1], they are currently not a cost-effective solution.

MOBILE VIDEO DELIVERY SERVERS

In order to meet the challenges of Internet-scale delivery, content delivery networks (CDNs) are typically employed by content providers. Managing a global network of data centers requires significant resources. CDNs take advantage of economies of scale to more efficiently deliver content. Once transcoding is complete, transcoded files are uploaded to the CDN and replicated to the thousands of edge caches for delivery to end users. CDNs may provide an origin server for uploading content to, or may pull from a content provider’s origin server and then manage the distribution to their edge servers. CDNs rely on distributed cache hierarchies for efficient synchronization of content. Delivery of relatively static content, via the simple and ubiquitous HTTP protocol, provides the most cost-effective solution for CDNs. Stable content requires less synchronization and less bandwidth, while optimized HTTP servers can handle extremely high load. Specialty streaming servers typically support fewer concurrent sessions due to CPU-intensive video processing, require specially trained support staff, and may incur licensing fees from the technology vendor. Strictly HTTP-based solutions limit operational expenses and allow the CDNs to pass cost savings on to their customers (i.e., the content providers).

Through DNS proximity and load balancing, CDNs attempt to ensure that end users are directed to the optimal edge server for high quality delivery. This works well for desktop users, as Web browsers can directly access the CDN content. The final stage of delivery from CDN to mobile client, however, must also traverse the mobile operator infrastructure. Mobile operators typically maintain a large private network that connects cell towers to the Internet through a series of gateways, across the mobile backhaul. All requests for media must go through a carrier gateway before crossing the public Internet to the CDN. The gateway obfuscates the actual mobile device location, invalidating any proximity-based delivery path optimizations. This, combined with the sparse coverage, over-subscription, and limited aggregate bandwidth of carrier networks, makes for a challenging environment.

Figure 1 and Fig. 2 show some of the many parties involved in making modern video delivery possible. Many of the partner categories shown in Fig. 1 may consist of multiple actual vendors (e.g., multiple advertisers and ad traffickers to maximize sponsorship and remnant revenues, and multiple mobile app developers and transcoding services, one for each mobile platform). The logistical cost of employing and managing such a large network of partners currently limits the financial viability of mobile video delivery, though content providers remain committed to delivering video to the third screen. Simplifying cross-platform delivery for mobile devices will help content providers pare down the number of partners on which they rely, lowering operational expenses. The key components of this consolidation are delivery protocol and file format. A single delivery protocol will simplify CDN integration. A single file format will simplify pre-transcoding requirements. Combining both with deterministic quality guarantees will help foster organic growth in the ecosystem. The following sections detail the evolution in mobile video delivery paradigms currently underway.

Figure 1. Video delivery ecosystem. [Diagram. Entities shown: content providers, content CMS, desktop Web developers, mobile Web developers, mobile app developers, advertisers/sponsors, ad producers, ad CMS, desktop ad traffickers, mobile ad traffickers, analytics tracking, desktop transcoding service, mobile transcoding service, CDN, desktop Web browser, mobile Web browser, and mobile app.]

MOBILE VIDEO DELIVERY METHODS

Early mobile video players had two basic options for video retrieval: download using the HyperText Transfer Protocol (HTTP) or streaming via the Real Time Streaming Protocol (RTSP). Initially HTTP was only used for download and play where the entire file had to be downloaded before playback could commence; progressive download, which allowed rendering while downloading, came later. Due to bandwidth limitations, downloading prior to playing incurred significant latency. Consequently, downloaded videos tended to be shorter and of low quality to reduce file sizes and shorten download times. This also encouraged adoption of RTSP by many mobile devices, since RTSP requires very little data to be buffered before playback can commence, allowing for lower playback latencies. Download and streaming represent two extremes in video delivery. Figures 3a and 3b depict the traffic patterns for these two schemes. Hybrid schemes offer a compromise between download and streaming, as depicted in Figs. 3c and 3d.
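To make the latency gap between download and play and progressive download concrete, the following is a minimal sketch of the arithmetic. The function name, the two-second prebuffer, and the example numbers are illustrative assumptions and do not come from the article; the progressive figure is only meaningful when the available bandwidth is at least the video bitrate.

```python
def startup_latency_s(file_bytes, bandwidth_bps, bitrate_bps, prebuffer_s=2.0):
    """Return (download_and_play, progressive) startup delays in seconds.

    All parameters are hypothetical illustration values:
      file_bytes    - size of the whole video file
      bandwidth_bps - sustained network throughput in bits per second
      bitrate_bps   - encoded video bitrate in bits per second
      prebuffer_s   - seconds of media buffered before rendering starts
    """
    # Download and play: the entire file must arrive before playback.
    download_and_play = file_bytes * 8 / bandwidth_bps
    # Progressive download: only the prebuffer must arrive first.
    progressive = prebuffer_s * bitrate_bps / bandwidth_bps
    return download_and_play, progressive


# A 3-minute clip encoded at 300 kb/s is 180 * 300000 / 8 bytes.
clip_bytes = 180 * 300_000 // 8
full, prog = startup_latency_s(clip_bytes, 500_000, 300_000)
print(f"download and play: {full:.1f} s, progressive: {prog:.1f} s")
```

Under these assumed numbers the viewer waits well over a minute for download and play, but only about a second for progressive download, which is consistent with the pressure toward shorter, lower quality downloaded clips described above.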

DOWNLOAD AND PLAY

Though numerous protocol options exist for downloading video files on desktop platforms, the most common protocol used in practice, especially for mobile platforms, is HTTP. The basic as-fast-as-possible delivery of HTTP was designed for delivering small amounts of data with minimal latency. Video files, however, tend to be much larger than HTML pages. HTTP is built upon TCP to ensure data integrity. For video, data integrity ensures that the intended picture quality is achieved, given timely delivery of data. In modern desktop environments, where broadband connections are typical, packet loss is relatively rare and bandwidth is relatively plentiful. But, in lossy, congested, high latency, low bandwidth 2G cellular networks, the relatively high number of high latency retransmissions required to support data integrity can disrupt playback continuity. The greedy delivery traffic pattern may also cause undue congestion due to premature delivery of data. In cases where the user access pattern involves partial viewing, bandwidth may be wasted if not all of the downloaded data is rendered [2]. Nonetheless, the primary advantage of HTTP is its ubiquity. The omnipresence of Web browsers in both desktop and mobile devices, the wide acceptance of HTTP for firewall traversal, the interchangeability of stateless HTTP servers, and the existing CDN data hosting and delivery infrastructures, make HTTP the de facto choice for most types of data delivery, including video.

STREAMING

Unlike download, streaming relies on just-in-time data delivery with just-in-time rendering. Though many proprietary streaming protocols exist, the most common standardized protocol is RTSP. RTSP is a control protocol that uses the Real-time Transport Protocol (RTP) to deliver individual video frames over unreliable UDP transport. Just-in-time delivery uses less instantaneous bandwidth than as-fast-as-possible delivery and typically requires less client buffer space to be reserved. Spreading out bandwidth usage over time can help prevent congestion, assuming the delivery rate does not exceed the available bandwidth. This paced delivery can also prevent unnecessary bandwidth usage when user access patterns include random seeks or incomplete viewing. The real-time nature of streaming also makes it suitable for delivering live video. Though streaming is more bandwidth efficient, because frames arrive at the last possible moment, there is no time for retransmits. As such, there is no advantage to using a reliable transport like TCP. UDP provides “graceful” degradation of picture quality with minimal playback stoppages; as individual frames are lost, pixelation or rendering distortions will be noticeable to the user. Frame-based delivery allows for intelligent dropping of frames (e.g., non-key frames, or frames from low motion scenes) [3], but this requires an intelligent network that knows which frames to drop. This level of intelligence is not generally found in the Internet today. Frame-based delivery combined with feedback from the Real-time Transport Control Protocol (RTCP) can also be used to implement dynamic video bitrate adaptation [4].

Figure 2. Mobile video delivery network. [Diagram. Entities shown: content providers, content CMS, ad producers, advertisers, ad CMS, transcoding service (compute cloud), origin (storage) server, CDN edge server, mobile Web service, mobile ad traffickers, analytics tracking, Internet, mobile operator gateway, carrier backhaul, base station, mobile browser, mobile app, and desktop browser. Legend: source media ingestion and transcoding; CDN upload of transcoded video; end user video retrieval.]

HYBRID SCHEMES

Over the past couple of years, as handset capabilities and mobile network infrastructure have steadily advanced, a new class of hybrid delivery schemes has emerged. Resource scarcity with first generation smart phones and second generation cellular networks limited the effectiveness of HTTP delivery optimizations, but recent advances have made optimized HTTP more viable. Hybrid schemes seek to strike a balance between the two extremes depicted by Figs. 3a and 3b. The former sends the entire video as fast as possible, while the latter sends the smallest possible increment of video as slowly as possible. Figures 3c and 3d show two alternatives for retrieving video in bursted segments. These schemes combine the video quality guarantees of TCP-based HTTP, with the bandwidth management and dynamic video bitrate adaptation capabilities of RTSP/RTP.

Figure 3c depicts the traffic pattern for a client-side pacing scheme, while Fig. 3d depicts the traffic pattern for a server-side pacing scheme. Client-side pacing over HTTP is typically implemented using Range requests or by requesting segmented files (which have been pre-chopped), to retrieve smaller amounts of data over time. Server-side pacing schemes typically rely on extracting metadata (i.e., the playout rate) from the video file and simply pacing the output. Many HTTP-based schemes also employ bursting data initially to preload the client-side buffer to decrease playback latency [5]. Both client and server pacing schemes typically base their pacing rates on the aggregate constant video bitrate. Schemes will often employ delivery rates that overestimate the video bitrate to prevent player underruns [2]. Most schemes also estimate available bandwidth and latency either through explicit checks, or implicitly through TCP ACKs. Though less network efficient than pure streaming, paced progressive download enables video quality guarantees, through reliable TCP delivery, not available in traditional streaming schemes.
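As an illustration of client-side pacing with Range requests, the sketch below builds a request schedule rather than performing any network I/O. It assumes a constant video bitrate, a fixed segment duration, and a delivery rate that overestimates the bitrate by a fixed factor; the function name, the parameter names, and the default values are hypothetical and are not taken from any particular player.

```python
def paced_ranges(file_bytes, bitrate_bps, segment_s=10.0, overestimate=1.25):
    """Return a list of (request_time_s, range_header_value) pairs.

    Each request fetches roughly segment_s seconds of video, sized from the
    constant bitrate. Requests are issued every segment_s / overestimate
    seconds, so the average delivery rate exceeds the video bitrate and the
    player buffer slowly grows instead of underrunning.
    """
    seg_bytes = max(1, int(bitrate_bps / 8 * segment_s))
    interval_s = segment_s / overestimate
    schedule = []
    start, t = 0, 0.0
    while start < file_bytes:
        # HTTP byte ranges are inclusive on both ends.
        end = min(start + seg_bytes, file_bytes) - 1
        schedule.append((round(t, 3), f"bytes={start}-{end}"))
        start = end + 1
        t += interval_s
    return schedule


for when, value in paced_ranges(1_000_000, 256_000):
    print(f"t={when:5.1f}s  Range: {value}")
```

A real client would also burst the first few requests to preload its buffer and would adjust the interval from its bandwidth estimate; both are omitted here to keep the pacing arithmetic visible.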

The differences between the four schemes shown in Fig. 3 are summarized in Table 2. The primary advantages of the two hybrid schemes are:

• Pacing provides more efficient bandwidth usage compared to straight download, especially for long-form content.

• Bursting segments containing many frames increases the timing margin for retransmitting a lost packet within the segment, compared to RTSP/RTP frame-based delivery.

• Defined segment boundaries provide convenient transition points at which to change the video bitrate, when performing dynamic bitrate adaptation.

The hybrid schemes rely on HTTP for data delivery, which makes them ideal for use with CDNs, whose infrastructure is already optimized for distributing mass quantities of data via HTTP. Though CDNs do support RTSP and other streaming protocols, the overhead of maintaining separate, more specialized servers makes supporting those protocols expensive and less desirable. Beyond the significant processing overhead, RTSP, in particular, also requires the use of four UDP channels (two RTP connections, one for audio, one for video, along with their corresponding RTCP connections) which further limits server scalability and complicates network design. For many networks, dynamic provisioning for a large range of UDP ports is undesirable as it typically requires real-time firewall “fixups” which tax the firewall and in many cases violate security policies. Figure 4 diagrams the network connections necessary for RTSP/RTP/RTCP-based streaming.

With RTSP/RTP, there is also the issue of gracefully degraded quality, due to random packet loss (e.g., network error or discard-based traffic shaping). Graceful degradation is non-deterministic and undesirable to content providers. Many interesting schemes have been shown to improve quality and predictability by limiting key frame loss [6], recovering from key frame loss [7], or proactively dropping non-key frames [3], but the non-deterministic nature of loss remains unchanged. With the hybrid schemes, TCP-based transport guarantees frame delivery, though not necessarily on-time delivery. Late delivery may result in playback stoppage, but stoppage is deterministic in terms of rendering (unlike pixelation due to frame loss).

Figure 3. Video delivery methods: a) HTTP straight download; b) simplified RTSP/RTP streaming (Fig. 4); c) HTTP paced range/segment download; d) HTTP paced download. [Four panels, (a) through (d), each showing the traffic pattern between a client and a server.]

DETERMINISTIC BITRATE ADAPTATION

Bitrate adaptation can take many forms. Dynamic transcoding may be used to provide continuously variable bitrate encodings, but dynamic transcoding is computationally and financially expensive. In most cases, small variations in bitrate are not discernible due to limitations in human perception, as well as limitations in compression techniques. As such, the benefits of dynamic transcoding are often not worth the cost. Deterministic bitrate adaptation relies on using a discrete number of pre-determined bitrates. Content providers may choose and approve bitrates based on previewed content. With guaranteed delivery, they can be confident that the quality of their delivered product will match that of the previewed content. Selection of which bitrate to use is typically based on bandwidth estimations, either implicitly by the server [8] or through direct client feedback [4, 9]. Deterministic bitrate adaptation minimizes congestion and reduces the possibility of late delivery. Most schemes require the availability of multiple pre-transcoded files. The following section compares some of the different video file organizations used to support deterministic bitrate adaptation.

MOBILE VIDEO FILE FORMATTING

Straight HTTP download was originally used in cases where the video was retrieved in its entirety before playback would begin. The simplicity of this scheme was that it required no file pre-processing other than transcoding to the correct formats and codecs for the target devices. For RTSP and some of the hybrid schemes, additional pre-processing is required, to support pacing and bitrate adaptation. For RTSP, videos must typically be “hinted” to aid in decoding the audio and video tracks in real-time, to support RTSP frame-based delivery. Some of the hybrid schemes rely on file segmentation. While file segmentation has been proposed for cache optimization [10, 11], in this context, file segmentation is used to support deterministic bitrate adaptation and near-live streaming.

Figures 5b–e show some of the common file transcoding options for supporting bitrate adaptation, given the source file shown in Fig. 5a. In all cases a small number of bitrates is used (e.g., three in the case of Fig. 5).

Figure 5b shows the simplest form of transcoding, where the source file is transcoded, in its entirety, into the lower bitrates. Given consistent bandwidth, these files could be used by any of the delivery protocols by selecting the highest bitrate that does not exceed the available bandwidth. This is the most common solution today. However, most packet switched networks are bursty and jittery and lack bandwidth consistency. This makes dynamic bitrate adaptation desirable. The files shown in Fig. 5b are not conducive to dynamic bitrate adaptation, given the overhead of retrieving the new bitrate file.
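The static selection rule described above (pick the highest pre-determined bitrate that does not exceed the available bandwidth) can be sketched as follows. The function name, the example bitrate ladder, and the optional safety factor are assumptions added for illustration; the article itself does not prescribe a safety margin.

```python
def select_bitrate(available_bps, ladder_bps, safety=0.8):
    """Pick the highest pre-transcoded bitrate that fits the bandwidth.

    available_bps - estimated available bandwidth in bits per second
    ladder_bps    - the discrete, pre-determined bitrates (any order)
    safety        - hypothetical headroom factor; 1.0 applies the plain
                    "does not exceed the available bandwidth" rule
    """
    usable = available_bps * safety
    fitting = [b for b in sorted(ladder_bps) if b <= usable]
    # If nothing fits, fall back to the lowest bitrate rather than failing.
    return fitting[-1] if fitting else min(ladder_bps)


ladder = [150_000, 300_000, 600_000]  # three bitrates, as in Fig. 5
print(select_bitrate(500_000, ladder))              # with 20% headroom
print(select_bitrate(600_000, ladder, safety=1.0))  # plain rule
```

Because the ladder is discrete and fixed in advance, every value this function can return corresponds to an encoding the content provider has already previewed and approved.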

For HTTP (assuming HTTP Range requests are supported), switching bitrates requires the client to issue a request to retrieve the headers from the new bitrate file, then determine the byte offset for the frame corresponding to where the player left off, then issue another request to start retrieving data from that offset. For high latency cellular networks, these two additional round trip times (RTT) can cause playback disruptions. RTSP is better suited to take advantage of these files, but typically requires the additional overhead of RTCP feedback channels to determine when to switch [4].
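The second step of that switch, mapping the current playback position to a byte offset in the new bitrate file, might look like the sketch below. The index format is a deliberately simplified, hypothetical stand-in for whatever the container headers actually provide: a sorted list of (key frame time in seconds, byte offset) pairs. Resuming at the last key frame at or before the playback position is also an assumption.

```python
import bisect


def resume_range(index, position_s):
    """Return a Range header value for resuming in a new bitrate file.

    index      - sorted list of (keyframe_time_s, byte_offset) pairs, assumed
                 to have been parsed from the new file's headers (request 1)
    position_s - playback position where the player left off
    """
    times = [t for t, _ in index]
    # Last key frame at or before the playback position (clamped to the first).
    i = max(bisect.bisect_right(times, position_s) - 1, 0)
    # Open-ended byte range: everything from that offset onward (request 2).
    return f"bytes={index[i][1]}-"


new_file_index = [(0.0, 1024), (2.0, 90_000), (4.0, 181_000)]
print(resume_range(new_file_index, 3.1))
```

The header fetch that produces the index and the ranged data fetch that uses the returned value are the two additional round trips noted above.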

One artifact of the streaming paradigm is that the server will continue to send data at the current bitrate until the client terminates the session or requests a rate switch. If the rate is too high, graceful degradation allows the network to drop packets as necessary. However, during periods of network congestion, a constant stream of RTP packets may actually prevent recovery. RTP packets may continue to flood the network at too high a bitrate, prolonging congestion. This may inhibit delivery of RTCP packets and further delay rate adaptation. With TCP-based approaches like HTTP, the built-in flow control mechanisms could help ease congestion sooner. TCP, though, is less well suited for real-time data delivery, e.g., live streaming. The lack of graceful degradation with progressive download requires more buffering, which introduces latency into the live streaming system, making it only near-live.

Table 2. Comparison of video delivery protocols.

| | Straight HTTP | RTSP/RTP | Client/Server Paced HTTP |
|---|---|---|---|
| Transport Protocol | TCP | TCP/UDP | TCP |
| Ports Required | 1 | 1/4 | 1 |
| Quality Assurance (TCP Retransmits) | Yes | No | Yes |
| Graceful Degradation (UDP Drops) | No | Yes | No |
| Bandwidth Management (Data Pacing) | No | Yes | Yes |
| Dynamic Bitrate Adaptation | No | Yes | Yes |

To mitigate the need for multiple requests when using HTTP to switch between the files shown in Fig. 5b, some hybrid schemes use segmented files, as shown in Fig. 5c. The source file in Fig. 5a is still transcoded into the same number of bitrates, but for each bitrate, a series of independently playable segments is created, rather than a single file, and the segments are played in succession to render the entire video. By creating segments, the header information is physically distributed, rather than being stored at the beginning of a monolithic file. Bitrate adaptation with segments may occur on segment boundaries without the need for separate header and data requests. Changing the segment duration allows for more or less dynamic bitrate adaptation. Though RTSP or dynamic transcoding may allow bitrate adaptation to occur on a shorter time scale, given the high RTT of most cellular networks, it is impractical to expect sub-second bitrate adaptation. Use of a reasonable segment duration can produce near-optimal bitrate adaptation in real world environments.
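Because switches happen only on segment boundaries, the segment duration bounds how quickly a rate change can take effect. The sketch below computes the next switch point for a given playback position. It assumes fixed-duration segments numbered from zero, which is a simplification; the function name is hypothetical.

```python
import math
from typing import Tuple


def next_switch_point(position_s: float,
                      segment_duration_s: float) -> Tuple[int, float]:
    """Return (index of the next segment, seconds until its boundary).

    Segments are assumed to be numbered from 0 and to share one duration.
    """
    next_index = math.floor(position_s / segment_duration_s) + 1
    wait_s = next_index * segment_duration_s - position_s
    return next_index, wait_s
```

For instance, 12.5 s into a stream of 10 s segments, the earliest switch is at segment 2, 7.5 s away. Shorter segments reduce that wait at the cost of more requests.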

Figure 5d represents a cumulative layered encoding. The encoding consists of a base layer and one or more additional layers that may be used to increase the quality of the base encoding [12]. Each layer is only useful given the base encoding and all previous layers. Layers can be sent independently, though this requires that the rendering engine issue parallel requests for each layer, as well as understand how to decode the data [13]. It has also been proposed that layers be sent from multiple senders [9], though most CDNs and clients have not yet reached that level of sophistication. For client-side bitrate adaptation, the client must always know where each layer resides and retrieve them in the proper combinations to reproduce the desired rate. Loss of a single layer can negate the value of all the higher layers; this makes cumulative encodings more complex to manage than non-cumulative encodings. Layered encodings can also be used in server-side optimizations (e.g., as part of an RTSP server) where each frame is constructed from as many layers as are appropriate for specific client bandwidth constraints.

Figure 5e represents an alternative to layered encodings: multiple description coding (MDC). MDC encoding can be either cumulative or non-cumulative [12], where non-cumulative encodings may be decoded independently. When distributed across multiple servers, MDC provides both data and path redundancy as well as allowing for bitrate adaptation, though the use of multiple descriptions incurs network header, video container, and data redundancy overhead. MDC schemes employ varying levels of redundancy between descriptions in order to provide higher or lower error resiliency [14]. There is a trade-off between limiting the distortion caused by the loss of a description and the increased data transmission overhead of the data redundancy. Higher data redundancy is also often required to produce non-cumulative encodings. The independently decodable nature of non-cumulative encodings improves resiliency and simplifies description selection. For client-side bitrate adaptation, the client may choose any descriptions it wishes from any of the servers. This flexibility reduces the data retrieval complexity compared to cumulative encodings, but it is still more complex than retrieving a single, pre-transcoded bitrate file.
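The difference between cumulative and non-cumulative encodings under loss can be illustrated with a small sketch. It models only which received pieces remain decodable, not the resulting video quality; layer and description indices, and both function names, are hypothetical.

```python
from typing import Set


def usable_layers(received: Set[int]) -> int:
    """Cumulative layered encoding: count contiguous layers from the base.

    Layer 0 is the base layer. A missing layer makes every higher
    layer useless, so counting stops at the first gap.
    """
    count = 0
    while count in received:
        count += 1
    return count


def usable_descriptions(received: Set[int]) -> int:
    """Non-cumulative MDC: each received description decodes on its own."""
    return len(received)
```

Losing layer 1 of three cumulative layers leaves only the base usable (`usable_layers({0, 2})` is 1), whereas losing one of three non-cumulative descriptions still leaves two usable.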

Figure 4. RTSP/RTP connection setup. [Diagram: the client and server exchange RTSP Options, Describe, Setup (video), Setup (audio), Play, and Teardown messages, with RTP and RTCP channels for the audio and video streams.]

MOBILE LIVE VIDEO STREAMING

Most file pre-processing schemes work well for video on demand (VoD) but are less suitable for live streaming situations where the video is not available a priori for pre-transcoding or stitching. Transcoding may be performed on site, by the live broadcaster. For desktop feeds, a single bitrate and format is typically sufficient, so there is no additional burden on the broadcaster. However, given the multitude of specialized formats required for mobile, deploying transcoders for each format and bitrate combination is impractical. Remote broadcast sites often lack sufficient bandwidth to upload multiple formats, and deploying multiple transcoders increases logistical complexity and operational cost.

Three commonly supported mobile live streaming protocols include: RTSP, Windows Media® HTTP Streaming (MS-WMSP), and HTTP Live Streaming [15]. Microsoft® Silverlight™ has not yet arrived for mobile, and Adobe Flash Lite continues to see limited market penetration. Windows Media®, Silverlight™ HTTP Streaming and Microsoft® Smooth Streaming, used by Windows Mobile® devices, and HTTP Live Streaming [15], used by Apple iPhone® devices, all rely on HTTP for data delivery. Adobe Flash® typically uses the Real Time Messaging Protocol (RTMP), which until recently, was an unpublished, proprietary protocol. Table 1 compares some of the current differences in the major smart phone platforms.

RTSP/RTP, with its frame-based processing, is ideal for processing live streams with minimal latency, as only the minimum amount of data (a single frame) needs to be processed before transmission can be continued. Segmentation schemes, though they incur a latency penalty for generating, transcoding, and uploading the initial segments, offer a near-live alternative to RTSP. Furthermore, segmentation schemes inherently record the live stream, which is not the case with RTSP. Since most live streams are recorded for subsequent VoD access, segmentation schemes simplify the recording and transcoding process for companion VoD.

Figure 6 depicts an example of HTTP Live Streaming [15] with dynamic bitrate adaptation. The HTTP Live Streaming scheme relies on a hierarchy of m3u8 playlist files. The m3u8 format is an extension to the m3u format used for mp3 audio playlists. The top level playlist contains static pointers to separate playlists for the individual bitrates. Each of the bitrate playlists contains a rolling list of pointers to segments. A segmenter is responsible for recording from the live stream and transcoding segments into the different target bitrates. Once new segments are available, the bitrate playlists are updated, adding the new segment and removing the oldest segment. The new segments and the updated playlist are pushed to the CDN for delivery, at regular intervals.
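The rolling update of a per-bitrate playlist can be sketched as a fixed-size window over segment URLs. The tag names (`#EXTM3U`, `#EXT-X-TARGETDURATION`, `#EXT-X-MEDIA-SEQUENCE`, `#EXTINF`) follow the HTTP Live Streaming draft [15]; the class itself, the window size, and the segment file names are hypothetical, and a production segmenter would emit additional tags.

```python
from collections import deque


class RollingPlaylist:
    """Sliding-window media playlist for one bitrate (illustrative only)."""

    def __init__(self, window: int, target_duration_s: int) -> None:
        # A bounded deque drops its oldest entry when a new one is appended.
        self.segments = deque(maxlen=window)
        self.target_duration_s = target_duration_s
        self.first_sequence = 0  # sequence number of the oldest listed segment

    def add_segment(self, url: str) -> None:
        """Append the newest segment, rolling the oldest one off if full."""
        if len(self.segments) == self.segments.maxlen:
            self.first_sequence += 1
        self.segments.append(url)

    def render(self) -> str:
        """Return the playlist text that would be pushed to the CDN."""
        lines = [
            "#EXTM3U",
            f"#EXT-X-TARGETDURATION:{self.target_duration_s}",
            f"#EXT-X-MEDIA-SEQUENCE:{self.first_sequence}",
        ]
        for url in self.segments:
            lines.append(f"#EXTINF:{self.target_duration_s},")
            lines.append(url)
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    playlist = RollingPlaylist(window=3, target_duration_s=10)
    for n in range(5):
        playlist.add_segment(f"hi_{n}.ts")  # hypothetical segment names
    print(playlist.render())
```

The media sequence number lets a polling client tell which entries are new after each reload, even though older entries have been removed.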

Figure 5. Methods for preparing a video file: a) source file; b) independent bitrate encodings; c) segmented bitrate encodings; d) multi-layer encoding; and e) multiple description codings.

The client in Fig. 6 is passed a link to the master playlist, from which it obtains a list of available bitrates. Once a bitrate is selected, the client begins polling the playlist file corresponding to the selected bitrate. After the initial read of the playlist, subsequent reads would ideally occur at a regular interval equal to the duration of the segments. For uninterrupted playback, the asynchronous upload of segments and playlists, polling for playlist updates, and download and rendering of segments would need to align properly. The HTTP Live Streaming specification provides guidelines for polling and polling retry delays, where the delay should be equal to the segment duration if the playlist has changed, or half the segment duration for the first retry if the playlist has not changed (with subsequent retries backing off to 1.5 and 3.0 times the segment duration) [15].
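The polling guideline above maps directly onto a small delay function. This sketch follows the multipliers quoted in the text (1.0 after a change, then 0.5, 1.5, and 3.0 for successive unchanged reloads); holding at 3.0 for any further retries is an assumption here, and the function name is hypothetical.

```python
def next_poll_delay_s(segment_duration_s: float, unchanged_reloads: int) -> float:
    """Delay before the next playlist reload.

    unchanged_reloads counts consecutive reloads that returned an
    unchanged playlist (0 means the last reload found a new segment).
    """
    if unchanged_reloads == 0:
        return segment_duration_s
    multipliers = [0.5, 1.5, 3.0]
    # Assumption: stay at the last multiplier once the list is exhausted.
    index = min(unchanged_reloads, len(multipliers)) - 1
    return multipliers[index] * segment_duration_s
```

With 10 s segments, a client that keeps seeing a stale playlist would wait 5 s, then 15 s, then 30 s between attempts.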

Alternate polling schemes could be used to minimize the playlist update latency, but higher polling rates may incur greater network overhead. In general, the polling interval must be shorter than the duration of the segments, but longer than the end-to-end latency between client and server. If the polling interval is longer than the segment duration, then segments would finish playing before new segments could be retrieved. If the polling interval is shorter than the RTT, then the minimum effective polling interval is still just the RTT. The same playlist and segmentation scheme can be used to provide bitrate adaptation for VoD. In the VoD case, playlists contain a complete list of segments and need only be downloaded once.
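The two bounds on the polling interval can be checked with a short sketch. It treats the RTT as a hard floor on how often a poll can complete and the segment duration as the ceiling; both function names are hypothetical.

```python
def effective_poll_interval_s(requested_s: float, rtt_s: float) -> float:
    """A poll cannot complete faster than one round trip."""
    return max(requested_s, rtt_s)


def poll_interval_ok(requested_s: float, rtt_s: float,
                     segment_duration_s: float) -> bool:
    """True when the effective interval is shorter than the segment duration."""
    return effective_poll_interval_s(requested_s, rtt_s) < segment_duration_s
```

Requesting a 100 ms interval over a link with a 400 ms RTT therefore yields an effective interval of 400 ms, and a 12 s interval fails the check for 10 s segments.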

When the client is initialized, it has two options: it can download all of the segments in the playlist or it can start by downloading the last (newest) segment in the playlist. The former option provides more data for protection against network error, but it incurs the largest viewing delay. The latter option minimizes viewing delay, providing a viewing experience as close to live as possible. Once the stream is playing, the client can monitor bandwidth and if it determines that a bitrate change is desired, it can begin polling a different playlist. The next segment retrieved will be for the new bitrate.
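The two startup options trade buffered data against viewing delay. The sketch below returns the segments to fetch first under each option and a rough lag estimate that counts only the playback time of the older segments; it ignores download and buffering time, and the function names are hypothetical.

```python
from typing import List


def initial_segments(playlist: List[str], minimize_delay: bool) -> List[str]:
    """Segments to fetch at startup, oldest first."""
    if not playlist:
        return []
    return playlist[-1:] if minimize_delay else list(playlist)


def startup_lag_s(playlist: List[str], minimize_delay: bool,
                  segment_duration_s: float) -> float:
    """Rough lag behind the newest listed segment when playback starts."""
    older = max(len(initial_segments(playlist, minimize_delay)) - 1, 0)
    return older * segment_duration_s
```

With three 10 s segments listed, starting from the oldest puts the viewer about 20 s behind the newest segment, while starting from the newest adds no such lag but leaves less data buffered.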

The polled playlist offers the flexibility to specify segments from different locations for redundancy, stream switching, or ad insertion. For live streams, stream switching (e.g., changing camera angles) or ad insertion is typically handled by the broadcaster. Ignoring the possible need for redundant servers, segmentation schemes can be implemented without playlists by using a well known file naming convention (e.g., using the segment number). Early versions of Silverlight™, as well as other commercial implementations, have taken this approach. Newer versions of Silverlight™ pack segments into a single file, and clients make requests providing indices which the server uses to extract the correct segment. This requires a specialized server to handle requests, which is expensive for CDNs and their customers. Another alternative is to provide an index file, where the index file maps segment boundaries to byte offsets, allowing the use of standard HTTP range requests. There are many options for implementing live streaming with HTTP, and we see significant commercial interest in exploring these options to find an optimal HTTP-based streaming solution.
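Two of the playlist-free alternatives above can be sketched briefly: deriving a segment URL from a naming convention, and turning an index of byte offsets into a standard HTTP Range header. The URL pattern, the index layout, and the function names are all hypothetical; they are not the conventions of any product named in the text.

```python
from typing import Dict, List


def segment_url(base_url: str, bitrate_label: str, segment_number: int) -> str:
    """Hypothetical naming convention: <base>/<bitrate>/seg_<number>.ts"""
    return f"{base_url}/{bitrate_label}/seg_{segment_number}.ts"


def segment_range_header(offsets: List[int], segment_number: int) -> Dict[str, str]:
    """Range header for one segment packed inside a single file.

    offsets[i] is the byte offset where segment i starts, and the final
    entry is the total file size. HTTP byte ranges are inclusive at
    both ends, hence the minus one on the end position.
    """
    start = offsets[segment_number]
    end = offsets[segment_number + 1] - 1
    return {"Range": f"bytes={start}-{end}"}
```

Because the range lookup happens on the client, an ordinary HTTP server or CDN cache can serve the packed file without any specialized request handling.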

THE ROAD AHEAD

Mobile rich media delivery is currently poised for a massive expansion, but lacks the platform uniformity and protocol convergence necessary for supporting a consistently high quality user experience. Disparities among handsets, handset manufacturers, operating system vendors, and mobile carriers all contribute to the ecosystem’s inability to achieve mass adoption. End users, developers, and mobile operators would all benefit from ecosystem consolidation and common understanding of a converged video distribution solution. The current movement toward HTTP-based video streaming is one step on the road ahead.

Content providers want the highest quality delivery for their videos to protect their brand integrity. Streaming protocols based on unreliable transports, like RTP over UDP, cannot by themselves provide deterministic quality guarantees. In addition, most carriers prefer not to open their networks to the dynamic UDP port allocation required by streaming protocols like RTP. The use of HTTP as a delivery protocol addresses both of these issues. The reliable TCP-based delivery, in conjunction with deterministic rate adaptation, provides deterministic quality guarantees, while the ubiquity of HTTP allows it to easily traverse firewalls and take advantage of optimized CDN caching infrastructures. Though HTTP was not designed for real-time applications and may not be optimal for video delivery, it provides the desired functionality using a broadly supported protocol. New protocols or enhancements to existing protocols, such as RTP, may eventually provide quality guarantees and ubiquity to surpass HTTP, but in the near term HTTP offers a pragmatic alternative. Likewise, client-side hardware support and server-side distribution support for layered and MDC encodings may eventually reduce complexity and make them more commercially viable solutions, but for now pre-transcoded single bitrate files provide natively supported solutions.

Figure 6. HTTP live streaming with dynamic bitrate adaptation setup. [Diagram: a segmenter produces high, medium, and low bitrate segments N, N+1, N+2, ... from the HTTP live stream; the master m3u8 playlist lists the URLs of the high, medium, and low bitrate m3u8 playlists, each of which lists its segment URLs; the client polls the playlists and downloads segments.]

HTTP has proven its versatility as a reliable data delivery protocol, exemplified by its use in CDNs. The unmatched scalability of CDNs has fundamentally changed media delivery in the Internet by bringing economies of scale to individual content providers. By pushing content to the network edge through well provisioned caching hierarchies, CDNs have increased media delivery scalability and improved the user experience. However, mobile carrier infrastructure still poses an obstacle to optimal HTTP delivery. The closed nature of carrier networks restricts the ecosystem’s ability to fully optimize mobile video delivery. Extending the reach of the CDN into the mobile operator domain is a logical next step on the road ahead. These wireless CDNs (wCDN) constitute the final missing partner in the mobile content delivery ecosystem. We are just beginning to see the emergence of HTTP-based mobile video delivery protocols, but this is a pivotal step in the evolution of the mobile content ecosystem.

REFERENCES

[1] B. Shen, W. Tan, and F. Huve, “Dynamic Video Transcoding in Mobile Environments,” IEEE Multimedia Mag., vol. 15, no. 1, Jan. 2008, pp. 45–51.

[2] L. Guo et al., “Analysis of Multimedia Workloads with Implications for Internet Streaming,” Proc. 14th ACM Int’l. Conf. World Wide Web (WWW 2005), May 2005, pp. 519–28.

[3] Y. Li et al., “Content-Aware Playout and Packet Scheduling For Video Streaming Over Wireless Links,” IEEE Trans. Multimedia, vol. 10, no. 5, Aug. 2008, pp. 885–95.

[4] P. Fröjdh et al., “Adaptive Streaming Within the 3GPP Packet-Switched Streaming Service,” IEEE Network Mag., vol. 20, no. 2, Mar. 2006, pp. 34–40.

[5] L. Guo et al., “Delving into Internet Streaming Media Delivery: A Quality and Resource Utilization Perspective,” Proc. 6th ACM SIGCOMM Conf. Internet Measurement (IMC 2006), Oct. 2006, pp. 217–30.

[6] Y. J. Liang, J. Apostolopoulos, and B. Girod, “Analysis of Packet Loss for Compressed Video: Effect of Burst Losses and Correlation Between Error Frames,” IEEE Trans. Circuits and Systems for Video Tech., vol. 18, no. 7, July 2008, pp. 861–74.

[7] Y. J. Liang and B. Girod, “Network-Adaptive Low-Latency Video Communication Over Best-Effort Networks,” IEEE Trans. Circuits and Systems for Video Tech., vol. 16, no. 1, Jan. 2006, pp. 72–81.

[8] B. Wang et al., “Multipath Live Streaming via TCP: Scheme, Performance and Benefits,” ACM Trans. Multimedia Computing Commun. and Applications, vol. 5, no. 3, Aug. 2009, article 25.

[9] J. Chakareski and P. Frossard, “Distributed Collaboration for Enhanced Sender-Driven Video Streaming,” IEEE Trans. Multimedia, vol. 10, no. 5, Aug. 2008, pp. 858–70.

[10] S. Chen et al., “Sproxy: A Caching Infrastructure to Support Internet Streaming,” IEEE Trans. Multimedia, vol. 9, no. 5, Aug. 2007, pp. 1064–74.

[11] X. Li, W. Tu, and E. Steinbach, “Dynamic Segment Based Proxy Caching for Video on Demand,” Proc. 2008 IEEE Int’l. Conf. Multimedia & Expo (ICME 2008), June 2008, pp. 1181–84.

[12] T. Kim and M. H. Ammar, “A Comparison of Heterogeneous Video Multicast Schemes: Layered Encoding or Stream Replication,” IEEE Trans. Multimedia, vol. 7, no. 6, Dec. 2005, pp. 1123–30.

[13] S. Chattopadhyay, L. Ramaswamy, and S. M. Bhandarkar, “A Framework for Encoding and Caching of Video for Quality Adaptive Progressive Download,” Proc. 15th ACM Int’l. Conf. Multimedia, 2007, pp. 775–78.

[14] Y. Wang, A. R. Reibman, and S. Lin, “Multiple Description Coding for Video Delivery,” Proc. IEEE, vol. 93, no. 1, Jan. 2005, pp. 57–70.

[15] R. Pantos, “HTTP Live Streaming,” Internet Engineering Task Force (IETF), Internet-Draft Version 5 (draft-pantos-http-live-streaming-05), Nov. 2010.

BIOGRAPHIES

KEVIN J. MA received his B.S. in computer science from the University of Illinois at Urbana-Champaign in 1998, his M.S. in computer science from the University of New Hampshire in 2004, and is currently a Ph.D. candidate at the University of New Hampshire. He is a founding engineer at Azuki Systems where he is responsible for software system architecture. Prior to Azuki, he was a software engineer at Cisco Systems, coming to Cisco via the Arrowpoint Communications acquisition. He holds three patents in the area of content networking with numerous patents pending including many in the area of mobile content delivery.

RADIM BARTOS received his Ph.D. degree in Mathematics and Computer Science from the University of Denver in 1997. He is currently an Associate Professor in the Department of Computer Science at the University of New Hampshire. Besides mobile media delivery, his research interests include mobile sensor networks, underwater networking and access networks.

SWAPNIL BHATIA is a postdoctoral researcher in the Bioinformatics group at the Palo Alto Research Center (formerly Xerox PARC). His research interests include modeling and analysis of networks and systems, theoretical work in Computer Science, and lately, proteomics. He received a B. Engg. degree in Computer Engineering from Bombay University's V.E.S. Institute of Technology and a Ph.D. degree in Computer Science from the University of New Hampshire.

RAJ NAIR received his B.Tech in Mechanical Engineering from the Indian Institute of Technology, Madras in 1983, his M.S. in Mechanical Engineering in 1986 and Computer Science in 1988 from Iowa State University, and his Ph.D. in computer science from the University of Maryland in 1991. He is the CTO and a founder of Azuki Systems responsible for product architecture. Prior to Azuki, he was at Cisco Systems through the Arrowpoint Communications acquisition. As a founding engineer at Arrowpoint he pioneered the concept of layer 4-7 content switching. Previously he was also a principal engineer at Fujitsu Network Communications via the Nexion acquisition and also worked at Netrix Corporation prior to that. He holds eight patents in the areas of content switching and ATM flow control with multiple other patents pending.
