The authors are solely responsible for the content of this technical presentation. The technical presentation does not necessarily reflect the official position of the Society of Motion Picture and Television Engineers (SMPTE), and its printing and distribution does not constitute an endorsement of views which may be expressed. This technical presentation is subject to a formal peer-review process by the SMPTE Board of Editors, upon completion of the conference. Citation of this work should state that it is a SMPTE meeting paper. EXAMPLE: Author's Last Name, Initials. 2011. Title of Presentation, Meeting name and location.: SMPTE. For information about securing permission to reprint or reproduce a technical presentation, please contact SMPTE at [email protected] or 914-761-1100 (3 Barker Ave., White Plains, NY 10601).

© 2015 Society of Motion Picture & Television Engineers® (SMPTE®)

SMPTE Meeting Presentation

Exploring New Technologies for Non-Destructive Adjustment of Running Times of File Based Content

Scott Matics Telestream LLC.

Written for presentation at the SMPTE 2015 Annual Technical Conference & Exhibition

Abstract. A generation ago, a 60 minute broadcast program slot was filled with a fairly static mix of program material, commercials, and other content. That all changed beginning in the late 1990's, when the number of minutes of non-program material began inching up, and by 2013 had increased to more than 14 minutes of every hour.

This change resulted in tens of thousands of previously produced episodes and other material no longer fitting into the program block, and left broadcasters with two not-so-great choices for adjusting the running time of original programming: cut scenes out, or use destructive baseband re-timing options.

Today, there are sophisticated algorithms that utilize modern GPU video processing and transcoding techniques that can re-time content with virtually no decrease in video or audio quality.

This paper will focus on the history of content re-timing for broadcast, and delve into the legacy and modern technologies for re-timing.

Keywords. Video retiming, audio retiming, multimedia retiming, file retiming, running time adjustment, movie retiming, editing for time

Introduction

Ask any director the question “What is the ideal running time for a feature film?” and you will likely get a range of answers.

Since the beginning of feature film creation, the actual definition of a work being a feature film / movie or motion picture was dictated by the running time. Originally, this was around 60 minutes; in fact the first film considered a feature film, 1906’s Australian movie, The Story of the Kelly Gang, ran right at 60 minutes. [1]

This minimum running time represented a “mile of film” as so designated by the National Center of Film Photography in France for a 35mm film that was 1,600 meters (5,200 feet) long, which was exactly 58 minutes and 29 seconds. [2]

Today, the minimum specified by the Screen Actors Guild is 80 minutes. [3] Although there is no standard for dictating the maximum length of a feature film, most modern films use the highly scientific calculation suggested by Alfred Hitchcock: "The length of a film should be directly related to the endurance of the human bladder." [4]

Ask any television executive about the ideal running time for something adapted to the TV screen, and you will probably hear “Any length that fits into our program block, and allows for the appropriate number of commercials.”

The TV program block is where the running time of content really matters. Movie content that ranges from an hour to multiple hours, and archives of made-for-TV content that was produced in a different era (where the program block was arranged much differently) need to fit into prescribed slots.

In the past, broadcasters have been faced with several unpleasant and destructive options for dealing with content that is too long for the prescribed program block: Edit the content by taking out scenes, or use legacy analog methods for shortening content by throwing away frames and re-recording shorter length programs.

Today, new technologies are available for analyzing and merging (or creating) frames, retiming and pitch correcting audio, and retiming captions and subtitles to match; and are making it much easier for broadcasters and editors to “edit to fit” their content in mostly non-destructive manners.

Early Retiming Requirements and Techniques

What do the 1950’s, 60’s, 70’s and most of the 80’s have in common? For U.S. broadcast television, it was the fact that an hour of commercial television contained 52 minutes of program and 8 minutes of non-programming content, such as advertisements. [5]

The relatively static sizes of program blocks began to change in the 1980’s with a greater mix of advertising. Today, most hours of commercial programming comprise 44 minutes of non-advertising content. [6] (See Figure 1)

Figure 1. Minutes of non-program content per hour since 1989

A film that has seen many replays on television over the years – The Wizard of Oz - had an original running time of 101 minutes. This was typically placed into a 2 hour program block in the 1960’s and 1970’s. However, with the current program blocks, broadcasting this movie in its entirety would require approximately 2 hours and 15 minutes. [7]

For this reason, The Wizard of Oz was shown for many years 15 minutes shorter than its original running time. [8]

This 15 minute reduction was accomplished using destructive trimming, with a technique often referred to as microcutting: selectively cutting camera pans and other non-critical sections. It is a time consuming and arbitrary exercise.

Destructive trimming has been performed on movies like The Wizard of Oz for many years, but the consensus is that this is not an ideal way to effect a duration change. In fact, most recent broadcasts of The Wizard of Oz have shown it in its entirety, either with more breaks, or in an odd program block of 2 hours and 15 minutes. [9]

Modern Destructive Techniques for Shortening Running Times

Non-linear editing introduced rudimentary potential for adjusting running time in a less destructive manner than deleting segments.

Time Remapping is one technique that was used in the movie The Matrix (1999) to slow down the fight scenes. This form of retiming uses a technique that allows the editor to vary the speed of portions of a clip by using keyframes. In The Matrix parts of the fight scenes were slowed down while other parts were sped up to create the desired effect. [10] (See Figure 2)

Figure 2. Slowing down a bullet in a fight scene from The Matrix

Time remapping was used in The Matrix to achieve a certain look. For The Wizard of Oz, which was shortened by cutting, one can see how a Time Remapping technique could instead be used to selectively speed up, rather than cut out, those camera pans and non-essential parts of the movie, in order to reach the time goal.

The downside of Time Remapping with keyframe manipulation is that some elements, like sound, are sped up incrementally along with the video, resulting in less than desirable quality.

By 1995, video engineers were working on adaptations of live television delay devices that could serve another purpose: Instead of delaying or speeding up a live show to censor offensive content, the same technology could be used to speed up or slow down an analog stream to subtract or add more time to the duration of the show.

The first complete systems for altering the running time of analog inputs involved playout from a source, such as a tape machine, into a retiming system, and then back out to a recording device, which would record the altered show. These early systems dropped frames where necessary, had a limit of about 30 seconds of time altering, and didn’t support captioning metadata. However, they could do in 60 minutes what previously would take a typical editor 6 hours to accomplish, so for anyone who needed to alter running times of content efficiently, these products were essential purchases. [11]

Intelligent Non-Destructive Retiming

Legacy retiming techniques relied on frame count manipulation for achieving shorter run times, and this most often involved throwing away frames. Since the early analog processing technology had no concept of total file length, it was often necessary to drop more frames in one section of a show, like at the beginning, and fewer frames toward the end of the show. This resulted in inconsistent quality across the length of the show.

Modern retiming methods also, naturally, produce fewer or more frames of content depending on whether the show is shortened or lengthened. The advancement is in the way that frames are subtracted or added.

When adjusting the duration of a program or clip, it is not always possible to find suitable frames to drop or repeat. When this situation occurs, and it occurs often, frame interpolation techniques are used to recreate sequences at effectively a new frame rate which is then played back at the target broadcast frame rate. This new frame rate occupies a shorter or longer period of time than the original source, thus running shorter, or longer as the case may be. (See Figure 3)

Figure 3 – Motion Compensated Frame Interpolation for Reducing Running Time

When we reduce or increase the running time of an original program, we are effectively telling the story in a shorter or longer period of time.

Let’s take the example of a 25 minute clip, which will consist of 44,955 frames at 29.97 fps. If we wish to shorten that clip by 2 minutes, or 8%, then the retimed output will consist of 41,359 frames, which will play out at 23 minutes.
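The arithmetic in this example can be checked with a short sketch. The function name `frame_count` is hypothetical, and the calculation uses the nominal 29.97 fps figure quoted above, rounded to whole frames.

```python
def frame_count(minutes: float, fps: float = 29.97) -> int:
    """Whole frames needed to play for `minutes` at `fps` (nominal rate, rounded)."""
    return round(minutes * 60 * fps)


source_frames = frame_count(25)   # the original 25 minute clip
target_frames = frame_count(23)   # the same clip shortened by 2 minutes

# Number of frames by which the output is shorter than the source.
frames_removed = source_frames - target_frames
print(source_frames, target_frames, frames_removed)
```

The difference between the two counts is the number of frames the retiming process has to absorb across the whole clip.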

To achieve that reduction in the number of frames in a non-destructive manner, no useful data can be deleted; rather, the number of frames is adjusted so that the whole story can be played out in the desired duration.

For this adjustment to be successful, and for input motion quality to be retained, Motion Compensated Frame Interpolation is used to avoid frame drops (when decreasing running time) or to avoid repeated frames (when increasing running time). A Motion Compensated process calculates the motion between frames in the content, and determines precisely where to move the objects to decrease or increase the number of frames used to get from point “A” to point “B”. (See Figure 4)
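One way to picture this is to ask where in the source timeline each output frame has to be sampled. The sketch below is only an illustration under stated assumptions: it spreads the output frames evenly over the source, and it reports the two neighbouring source frames and the fractional position between them. It does not perform motion estimation; a real motion compensated process would move objects along their estimated motion by that fraction rather than simply blending the two neighbours. The function name `source_positions` is hypothetical.

```python
import math
from fractions import Fraction


def source_positions(n_in: int, n_out: int):
    """For each output frame, return (left, right, frac).

    `left` and `right` are the neighbouring source frame indices and `frac`
    is how far between them the output frame falls (0 means exactly on `left`).
    Assumes the retiming is spread evenly over the whole clip.
    """
    step = Fraction(n_in, n_out)          # source frames advanced per output frame
    positions = []
    for i in range(n_out):
        pos = i * step                    # exact position in the source timeline
        left = math.floor(pos)
        right = min(left + 1, n_in - 1)   # clamp at the last source frame
        positions.append((left, right, pos - left))
    return positions


# Shortening: 10 source frames told in 8 output frames.
for left, right, frac in source_positions(10, 8):
    print(left, right, frac)
```

Only the output frames whose fraction is zero coincide with an original frame; every other output frame has to be synthesized, which is where the interpolation does its work.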

Figure 4. Motion Compensated Interpolation

Intelligent Lengthening of Running Times

There are times when it is necessary to lengthen the running time of content. One such use for this is to adjust the length of a clip produced in one market, but broadcast in another, in which the program blocks are different by several minutes. (See Figure 5)

Figure 5. Internationalized Content Retiming

Early nonlinear editing techniques for lengthening content involved adding in repeated multiple angles or taking multiple shots of simultaneous action and playing them sequentially. This was done in the movie Pulp Fiction where scenes were shown one after the other even though they occurred simultaneously or in a different order. [12]

This is fine for movie editing, where the additional shots and presumably audio are available to do the insert. Finished television shows, or longer form content sent from the U.S. to Europe, for example, have no such extra material, yet adding minutes to a show to fit into the longer program block is highly desirable.

Lengthening the running time of a show involves some particularly complex motion compensated interpolation techniques. Where shortening running time involves moving objects from point “A” to point “B” more quickly, lengthening requires using the interpolation analysis to inject more data between points “A” and “B” that follow the same path in a natural progression.

If we wish to lengthen our 25-minute clip to 27 minutes – we will go from 44,955 frames at 29.97 fps to 48,551 frames at the same frame rate. Where do these 3,596 additional frames come from? These additional frames come from interpolated data that is generated by a highly-capable algorithm that projects where objects would go if they took a bit longer to get there. (See Figure 6)

Figure 6. Interpolation for Sequence Expansion. This example shows a 10% increase in playback duration.

Let’s look at an example of shortening and lengthening running times in which a large number of objects are in opposing motion.

The picture below (See Figure 7) is a still from a scene in which a camera is filming a street crossing while different people are moving across the square in various directions in the foreground and in the background. [13]

Figure 7. Example of scene with objects moving in opposing directions

There are several considerations for retiming a scene like this:

• If the goal is to shorten this scene, the retiming algorithm must shorten the time that the man in the foreground takes to reach the other side of the street, but also track the much smaller objects (people in the background) that are also moving in both directions.

• If the goal is to lengthen this scene, the algorithm will need to calculate where it can add frames so that the man in the foreground takes longer to reach the other side of the street. This calculation must take the people in the background into consideration as they are moving at a different relative pace.

• Motion interpolation is not the only technology that is required to alter the running time of a scene like this. Analysis of a group of frames to determine where frames can be reduced or added without introducing artifacts or object merging is an intensive computational process.

• Although scenes like this can be successfully retimed with a minimum of artifacting, in some cases, depending on the amount of time alteration being applied to the total clip, the operator may choose to not retime this particular segment.

Quality Hurdles

In the early days of content retiming, options were limited and since most program material was lower resolution, resultant quality was not the most important consideration.

Today, however, HD and 4K content is commonplace, and viewers have access to much higher-quality broadcast signals. Consequently, anything that touches content today must take quality as a serious consideration.

One way that modern content retiming is able to improve quality is to apply the retiming of the file longitudinally across the length of the file. This was not possible with earlier analog technologies due to the fact that the retiming was being applied to an incoming stream rather than a complete file. Longitudinal retiming applies the retiming in small amounts across the length of the file, rather than affecting some segments more than others. [14]

Interlaced content can be especially troublesome to retime owing to the fact that each interlaced video frame is two fields captured at different points in time, so native retiming of an interlaced file is not viable. For this reason, automated retiming systems must either provide de-interlacing during the retiming process, or files need to be de-interlaced prior to retiming, and then re-interlaced for playback.
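The two-field structure can be illustrated with a minimal sketch in which a frame is simply a list of scan lines. The helper names `split_fields` and `weave` are hypothetical, the frame is assumed to have an even number of lines, and which field comes first in time depends on the field order of the material, which is not modelled here.

```python
def split_fields(frame):
    """Split an interlaced frame (a list of scan lines) into its two fields."""
    top = frame[0::2]      # even-numbered lines
    bottom = frame[1::2]   # odd-numbered lines, captured at a different instant
    return top, bottom


def weave(top, bottom):
    """Re-interlace two equal-height fields into a single frame."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.extend([top_line, bottom_line])
    return frame


frame = ["t0", "b0", "t1", "b1"]
top, bottom = split_fields(frame)
print(top, bottom, weave(top, bottom) == frame)
```

Because the two fields are separate moments in time, any interpolation has to treat them as separate pictures, which is why de-interlacing is needed before retiming and re-interlacing afterwards.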

Media that has gone through a telecine process, where motion picture film has been transferred to video, can be troublesome to retime without some additional processing due to issues with cadence abnormalities. Modern intelligent retiming relies on frame rate consistencies in order to manipulate playback at faster or slower speeds for a given frame rate. For this reason, the retiming process is normally applied with the input frame rate matching the output frame rate. Deviation from that condition would necessitate input cadence detection and possibly an external frame rate conversion, although some retiming processes can natively manage frame rate differences during the processing of the file.
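To show what a cadence looks like, the sketch below models the common 2:3 pulldown pattern used when film is transferred to interlaced video. The paper does not name a specific cadence, so 2:3 is an assumption chosen for illustration, and the sketch ignores field parity. The function name `pulldown_2_3` is hypothetical.

```python
def pulldown_2_3(film_frames):
    """Expand film frames into interlaced video frames using a 2:3 cadence.

    Film frames are held alternately for 2 and 3 fields, and consecutive
    fields are then paired into video frames.
    """
    fields = []
    for index, film_frame in enumerate(film_frames):
        repeat = 2 if index % 2 == 0 else 3
        fields.extend([film_frame] * repeat)
    # Pair consecutive fields; a trailing unpaired field is discarded.
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


video = pulldown_2_3(["A", "B", "C", "D"])
# A video frame is "mixed" when its two fields come from different film frames.
mixed = [first != second for first, second in video]
print(video)
print(mixed)
```

Four film frames become five video frames, two of which combine fields from different film frames. If the cadence is broken by edits, those mixed frames no longer fall where a retiming process expects them, which is why cadence detection may be needed first.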

CG and Animated content can be difficult to retime, due to the way that object movement is not continuous, so especially with subtracting frames, object motion can become jittery. Compensating for the discontinuous motion can introduce frame blending, which in turn can introduce undesirable artifacts. For mixed content (some CG or animated), it is often advisable to not retime the CG sequences in order to avoid potential artifacting.

What about Audio? Avoiding Alvin & The Chipmunks or Burl Ives

Time adjusting video content using modern frame analysis and interpolation techniques can result in quality that is indiscernible from the original source, when reasonable limits are applied.

However, one critical aspect of time adjustment is processing for the audio. When voices and music are sped up or slowed down, the pitch can be dramatically affected in such a way that the viewer notices audible changes much more so than visual changes in speed.

Early efforts for pitch correction on sped up or altered audio involved a separate process of re-sampling and time stretching, and then applying a technique known as Pitch Scaling, in which the original (now the retimed) pitch of the audio is raised or lowered. [15] This method requires a lot of time, and a highly skilled audio engineer with specialized equipment – fine for high budget movies, but not practical for the common speeding up or slowing down of TV shows.

Engineers designing automated media retiming systems faced a number of challenges with respect to technological hurdles for non-destructive audio retiming:

• Real-time speed of operation

• Automated synching with video elements

• Pitch correction relative to source, but synched with the retimed asset

In order to accurately retime audio, it must not only be sped up or slowed down, but must also be synchronized with the corresponding events that are occurring in the output’s new timeline.

Resampling of digital audio files is one method for speeding up or slowing down audio files. Pitch issues are common with this method, however, and viewers of content with a great deal of dialog or music will notice the increased or decreased pitch more readily than the video speed of playback.
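The size of the pitch change from plain resampling follows from the speed ratio: playing audio faster by a given factor raises every frequency by the same factor, and a doubling of frequency is 12 semitones. The sketch below applies that relationship to a 25 minute clip; the function name `pitch_shift_semitones` is hypothetical.

```python
import math


def pitch_shift_semitones(original_minutes: float, retimed_minutes: float) -> float:
    """Pitch change, in semitones, when audio is resampled to a new duration.

    Positive values mean the pitch rises (content was shortened);
    negative values mean it falls (content was lengthened).
    """
    speed = original_minutes / retimed_minutes
    return 12 * math.log2(speed)


print(round(pitch_shift_semitones(25, 23), 2))  # shortened by 2 minutes
print(round(pitch_shift_semitones(25, 27), 2))  # lengthened by 2 minutes
```

For the 2 minute adjustments used in this paper the shift is roughly one and a half semitones up or down, which is enough to be noticed on dialog and music.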

A better method of retiming audio uses a solution put forth in 1978 that works in the time domain and attempts to find the period, or fundamental frequency of a given section of the wave. This process, called Time Domain Harmonic Scaling uses a pitch detection algorithm and very fast cross-fading of one period with another, and works quite well for most audio, but can sometimes lack accuracy when dealing with a complex multi-frequency track, such as an orchestral recording. [16]
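The period-finding step can be sketched with a plain autocorrelation search, which is one common way to estimate the period of a waveform. This is an assumption made for illustration: the source does not say which pitch detection algorithm is used, and the cross-fading stage is not shown. The function name `estimate_period` and the test tone are hypothetical.

```python
import math


def estimate_period(samples, min_lag: int, max_lag: int) -> int:
    """Return the lag (in samples) at which the signal best matches itself."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# A pure tone that repeats every 50 samples.
tone = [math.sin(2 * math.pi * n / 50) for n in range(400)]
print(estimate_period(tone, 20, 200))
```

Once the period is known, whole periods can be removed or repeated and the joins cross-faded, which changes the duration without changing the pitch. A signal with many simultaneous frequencies has no single clean period, which is consistent with the accuracy limits noted above for orchestral material.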

Using an intelligent pitch detection algorithm solves The Chipmunks and Burl Ives issue of uncomfortably high or low pitch, but a necessary step after that is synchronizing the audio to the video track, so that lip-synch issues and “hearing the bat strike the ball when the bat strikes the ball” issues are eliminated. (See Figure 8)

Figure 8 – Aligning Retimed Audio with Retimed Video

Frame-accurate time-audio alignment is a critical aspect of non-destructive retiming due to the inherent quality issues of non-synchronized sound effects.

Metadata Retiming

So far, we have examined retiming video files in a non-destructive fashion, and then retiming and pitch-correcting the audio. Another element that needs to be considered is metadata - typically Closed Caption and Subtitle data, which needs to be time-adjusted to match the retimed video and audio file. (See Figure 9)

Figure 9 – Retiming Metadata during the time adjustment process

Once the media has been retimed, the last challenge is to align metadata, such as captions or subtitles, to the new timing of the media.

The concept here is simple – keep track of where and when things change in the media, compare the original source and the original metadata files with the altered output file, and then generate a revised metadata file that matches.

For example, a system might establish the new time codes that match the previous subtitle or caption time codes, and then generate a new STL or SCC file to be delivered along with the retimed asset. In addition, some systems have the capability to burn in the captions or subtitles to the output.
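A minimal sketch of the time code step is shown below. It assumes the retiming was applied uniformly across the clip, so every cue time scales by the same ratio, and it assumes non-drop-frame time codes at an integer frame rate (25 fps is used here). Caption formats that use 29.97 fps drop-frame time code would need additional handling that is not shown. All function names are hypothetical, and reading or writing actual STL or SCC files is outside the scope of the sketch.

```python
from fractions import Fraction


def tc_to_frames(tc: str, fps: int) -> int:
    """Convert a non-drop-frame HH:MM:SS:FF time code to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def frames_to_tc(frames: int, fps: int) -> str:
    """Convert a frame count back to a non-drop-frame HH:MM:SS:FF time code."""
    seconds, ff = divmod(frames, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def retime_timecode(tc: str, fps: int, original_minutes: int, retimed_minutes: int) -> str:
    """Scale a cue time by the ratio of retimed to original duration."""
    scale = Fraction(retimed_minutes, original_minutes)
    return frames_to_tc(round(tc_to_frames(tc, fps) * scale), fps)


# A caption that appeared 10 minutes into a 25 minute clip, after shortening to 23 minutes.
print(retime_timecode("00:10:00:00", 25, 25, 23))
```

If the retiming is not uniform, for example because some segments were deliberately left untouched, the single scale factor would have to be replaced by a mapping from original to retimed positions recorded during processing.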

Conclusion

Retiming of original content has been a requirement for decades in the broadcast industry. Some of the reasons for adjusting running times involved commercial considerations – either fitting content into established program blocks, like fitting long-form movies into blocks that are reasonable for over-the-air schedules, or for planning for new business models that required additional commercial time to support the channels, as is often the case with expanded cable channel proliferation. [17]

Commercial applications aside, there are a number of other uses for accurate, non-destructive, real-time re-timing technologies. For example, editors spend a great deal of time doing something called editing for time, which uses some of the manual techniques described in this paper. An editor’s time is valuable, and if automated systems can shave or add a few seconds here or there to edited content, this can save many hours of editing time for the typical editor, allowing them to concentrate on more important creative endeavors.

Today, more original content is shared among worldwide regions due to global media operations or partnerships among large media companies in different countries. Since the program blocks vary greatly by worldwide region – nearly everything that is shared needs to either be time-adjusted for broadcast in the local market, or have additional promotional content added. The ability to efficiently and non-destructively alter the running time of these assets is key to aligning the content to varying program blocks. In addition, the accurate retiming of metadata, such as subtitles, is all the more important in these applications.

Modern content retiming is driven, from a technological perspective, by the computing horsepower that can support sophisticated algorithms for analyzing and manipulating video, audio and metadata assets. This manipulation is to alter running time efficiently and on-demand in small and large editing and production facilities. Moore’s Law tells us that future hardware will enable more speed, and better accuracy for real-time media analysis and alteration.

Commercial interests for expanding channels and distribution, and many decades of archives of programming ensure that the demands for intelligent content time adjustment will continue to grow.

As the human bladder has not gotten any larger since Hitchcock’s time, that gauge may still dictate the maximum length of a movie or television show, but modern time adjustment technologies can help with achieving custom, non-destructive running times to meet today’s content requirements.

References

1 - "'Kelly Gang' Film Began Era Of 'Feature' Pictures." The Sunday Herald (Sydney: National Library of Australia). 9 October 1949. p. 9 Supplement: Features. http://trove.nla.gov.au/ndp/del/article/18471199

2 and 3 - "Screen Actors Guild Letter Agreement For Low-Budget Theatrical Pictures" (PDF). Screen Actors Guild. http://www.sagaftra.org/files/sag/Low_Budget_Ageement_1_5.pdf

4 - Chandler, Charlotte. It's Only A Movie: Alfred Hitchcock: A Personal Biography. Hal Leonard Corporation. (2006)

5 and 6 - "How Much Do Television Ads Costs? - A Primer On Television Advertising Costs - Resources For Entrepreneurs" - Gaebler Ventures - Chicago, Illinois. Gaebler.com. http://www.gaebler.com/Television-Advertising-Costs.htm

7 and 8 and 9 - John Fricke, Jay Scarfone, William Stillman. "The Wizard of Oz: The Official 50th Anniversary Pictorial History". Warner Books. (1989)

10 - http://www.mediacollege.com/video/editing/time/expansion.html and http://www.adobepress.com/articles/article.asp?p=2236041&seqNum=3

11 - http://www.primeimage.com/TimeTailorHistory.html and http://www.primeimage.com/pdfs/Prime%20Image%20Time%20Tailor%20vs%20Non%20Linear%20Editing%20Systems.pdf [PDF]

12 - http://www.mediacollege.com/video/editing/time/expansion.html

13 - Still from Video credit: tkmckamy.com - Tim McKamy http://tkmckamy.com

14 - "Tempo Time Adjustment" - http://www.telestream.net/pdfs/datasheets/dat-Tempo.pdf [PDF]

15 - "Time Stretching and Pitch Shifting Overview A comprehensive overview of current time and pitch modification techniques" by Stephan Bernsee - http://blogs.zynaptiq.com/bernsee/time-pitch-overview/

16 - Charpentier, F.; Stella, M. (Apr 1986). "Diphone synthesis using an overlap-add technique for speech waveforms concatenation". Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP'86. 11: 2015–2018. doi:10.1109/ICASSP.1986.1168657.

17 - "How Much Do Television Ads Costs? - A Primer On Television Advertising Costs - Resources For Entrepreneurs" - Gaebler Ventures - Chicago, Illinois. Gaebler.com. Retrieved 2013-09-01. http://www.gaebler.com/Television-Advertising-Costs.htm