video fingerprinting and encryption principles for digital rights management

15
Video Fingerprinting and Encryption Principles for Digital Rights Management DEEPA KUNDUR, SENIOR MEMBER, IEEE, AND KANNAN KARTHIK, STUDENT MEMBER, IEEE Invited Paper This paper provides a tutorial and survey of digital fin- gerprinting and video scrambling algorithms based on partial encryption. Necessary design tradeoffs for algorithm development are highlighted for multicast communication environments. We also propose a novel architecture for joint fingerprinting and decryption that holds promise for a better compromise between practicality and security for emerging digital rights management applications. Keywords—Digital fingerprinting tutorial, digital video en- cryption survey, joint fingerprinting and decryption (JFD), video scrambling. I. INTRODUCTION This paper investigates multimedia security algorithms that enable digital rights management (DRM) in resource constrained communication applications. Our focus is on the video-on-demand (VoD) business model, in which sub- scribers to a content-providing service request and receive video information at scheduled intervals. We consider situ- ations in which on the order of hundreds or even thousands of users may wish near-simultaneous access to the same video content. Thus, for superior scalability the network service provider must transmit the content by making use of a multicast distribution model. We focus on the problems of video fingerprinting and encryption. Fingerprinting, which was first introduced by Wagner [1] in 1983, is the process of embedding a distinct set of marks into a given host signal to produce a set of fingerprinted signals that each “appear” identical for use, but have a slightly different bit representation from one another. These differences can ideally be exploited in order to keep Manuscript received September 14, 2003; revised December 15, 2003. D. Kundur is with the Department of Electrical Engineering, Texas A&M University, College Station, TX 77843-3128 USA (e-mail: [email protected]). K. Karthik is with the Edward S. Rogers Sr. Dept. of Electrical and Com- puter Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: [email protected]). Digital Object Identifier 10.1109/JPROC.2004.827356 track of a particular copy of the fingerprinted signal. The marks, also called the fingerprint payload, are usually em- bedded through the process of robust digital watermarking. In digital watermarking, subtle changes are imposed on the host signal such that the perceptual content of the host remains the same, but the resulting composite watermarked signal can be passed through a detection algorithm that reliably extracts the embedded payload. In contrast, video encryption has the goal of obscuring the perceptual quality of the host signal such that access to the content is denied. In comparison to traditional cryptographic algorithms, those for video may often be “lightweight” in order to accommo- date computational complexity restrictions; the term “video scrambling” is often used to refer to such processes. The main objective of fingerprinting and encryption in a DRM context is to protect video content from a set of attacks applied by one or more attackers. We define an attacker as any individual who attempts to use a given piece of content beyond the terms, if any, negotiated with the content provider. Common attacks on video data include illegal access and tampering. Our work focuses on the problem of piracy, the illegal duplication and redistribution of content; we call such an attacker a pirate. Overall, the objectives of this paper are twofold: 1) to present a state-of-the-art review and tutorial of the emerging areas of video fingerprinting and en- cryption highlighting design challenges for multicast environments; 2) to propose the approach of joint fingerprinting and de- cryption (JFD) to establish a better compromise be- tween practicality and security for DRM applications. Section II introduces some general security architectures for video applications. Sections III and IV introduce and survey the areas of video fingerprinting and encryption, respectively, demonstrating the necessary compromises for algorithm and system design. Section V proposes a novel joint fingerprinting and encryption framework that overcomes many of the obstacles of previous architectures; 0018-9219/04$20.00 © 2004 IEEE 918 PROCEEDINGS OF THE IEEE, VOL. 92, NO. 6, JUNE 2004

Upload: iit-g

Post on 21-Feb-2023

0 views

Category:

Documents


0 download

TRANSCRIPT

Video Fingerprinting and Encryption Principlesfor Digital Rights Management

DEEPA KUNDUR, SENIOR MEMBER, IEEE, AND KANNAN KARTHIK, STUDENT MEMBER, IEEE

Invited Paper

This paper provides a tutorial and survey of digital fin-gerprinting and video scrambling algorithms based on partialencryption. Necessary design tradeoffs for algorithm developmentare highlighted for multicast communication environments. Wealso propose a novel architecture for joint fingerprinting anddecryption that holds promise for a better compromise betweenpracticality and security for emerging digital rights managementapplications.

Keywords—Digital fingerprinting tutorial, digital video en-cryption survey, joint fingerprinting and decryption (JFD), videoscrambling.

I. INTRODUCTION

This paper investigates multimedia security algorithmsthat enable digital rights management (DRM) in resourceconstrained communication applications. Our focus is onthe video-on-demand (VoD) business model, in which sub-scribers to a content-providing service request and receivevideo information at scheduled intervals. We consider situ-ations in which on the order of hundreds or even thousandsof users may wish near-simultaneous access to the samevideo content. Thus, for superior scalability the networkservice provider must transmit the content by making use ofa multicast distribution model.

We focus on the problems of video fingerprinting andencryption. Fingerprinting, which was first introduced byWagner [1] in 1983, is the process of embedding a distinctset of marks into a given host signal to produce a set offingerprinted signals that each “appear” identical for use, buthave a slightly different bit representation from one another.These differences can ideally be exploited in order to keep

Manuscript received September 14, 2003; revised December 15, 2003.D. Kundur is with the Department of Electrical Engineering, Texas

A&M University, College Station, TX 77843-3128 USA (e-mail:[email protected]).

K. Karthik is with the Edward S. Rogers Sr. Dept. of Electrical and Com-puter Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada(e-mail: [email protected]).

Digital Object Identifier 10.1109/JPROC.2004.827356

track of a particular copy of the fingerprinted signal. Themarks, also called the fingerprint payload, are usually em-bedded through the process of robust digital watermarking.In digital watermarking, subtle changes are imposed onthe host signal such that the perceptual content of the hostremains the same, but the resulting composite watermarkedsignal can be passed through a detection algorithm thatreliably extracts the embedded payload. In contrast, videoencryption has the goal of obscuring the perceptual qualityof the host signal such that access to the content is denied.In comparison to traditional cryptographic algorithms, thosefor video may often be “lightweight” in order to accommo-date computational complexity restrictions; the term “videoscrambling” is often used to refer to such processes.

The main objective of fingerprinting and encryption ina DRM context is to protect video content from a set ofattacks applied by one or more attackers. We define anattacker as any individual who attempts to use a given pieceof content beyond the terms, if any, negotiated with thecontent provider. Common attacks on video data includeillegal access and tampering. Our work focuses on theproblem of piracy, the illegal duplication and redistributionof content; we call such an attacker a pirate.

Overall, the objectives of this paper are twofold:

1) to present a state-of-the-art review and tutorial ofthe emerging areas of video fingerprinting and en-cryption highlighting design challenges for multicastenvironments;

2) to propose the approach of joint fingerprinting and de-cryption (JFD) to establish a better compromise be-tween practicality and security for DRM applications.

Section II introduces some general security architecturesfor video applications. Sections III and IV introduce andsurvey the areas of video fingerprinting and encryption,respectively, demonstrating the necessary compromisesfor algorithm and system design. Section V proposes anovel joint fingerprinting and encryption framework thatovercomes many of the obstacles of previous architectures;

0018-9219/04$20.00 © 2004 IEEE

918 PROCEEDINGS OF THE IEEE, VOL. 92, NO. 6, JUNE 2004

preliminary algorithmic ideas are discussed. General con-clusions are presented in Section VI.

II. SECURITY ARCHITECTURES

We consider a single transmitter which may be a VoDserver that we refer to as the source or server that communi-cates with receivers that we call users. In all situations,the source is responsible for embedding the global (i.e., staticfor all users) group watermark that may contain copy-right and ownership information, and is also responsible forencrypting the media content using secret key cryptographywith a group key that is common for all users. The useof a single group key for encryption under certain conditionscan enable multicast communications, but requires more so-phisticated key management.

At the receivers, each user must decrypt the content in-dividually. Fingerprinting can occur either at the transmitteror receiver, and separate or integrated with the cryptographicprocess which is the basis for our architecture classifications.Fingerprint detection is assumed to occur offline at a latertime outside the scope of the multicast communication setup.The reader should note that the watermarking process for(which may be optional) is distinct from fingerprinting.

A. Transmitter-Side Fingerprint Embedding

In this approach, introduced in [2], the fingerprint is em-bedded at the source. An optional copy control or ownershipwatermark is first embedded into the host media. Then adistinct fingerprint is marked in each copy of the media to bedelivered to each of the customers. Every watermarked andfingerprinted copy for is then encryptedseparately using the same group key (that is known at thesource and by all the users) to produce , .Fig. 1(a) summarizes the approach.

One characteristic of this method is that different copiesof the media have to be simultaneously transmitted, whichrepresents bandwidth usage of order . In addition, thenumber of copies of the media that must be encrypted andfingerprinted also increases linearly with . Thus, the overallmethod suffers from poor scalability and cannot exploit themulticast infrastructure. Many current methods for finger-printing implicitly make use of this architecture [3]–[5].

B. Receiver-Side Fingerprint Embedding

The next architecture, initially introduced in [6] withrespect to digital TV and more recently discussed in [2] and[7] for DRM in digital cinema, involves fingerprinting atthe receiver. In this scheme, shown in Fig. 1(b), the optionalcopyright watermark is embedded and the subsequent mediais encrypted with the group key to produce the encryptedcontent . Only one encryption (and no fingerprinting) isnecessary at the server, reducing latency and complexityfrom the previous architecture. In addition, because only onesignal needs to be transmitted to multiple users, multicastcommunications can be exploited.

At the receiver, the encrypted signal is decryptedby each user using and is immediately fingerprintedwith a mark that is distinct for each user to producethe fingerprinted media . For security, both decryptionand fingerprinting must be implemented on a single chipor application-specified integrated circuit (ASIC) so thatthe decrypted signal is not easily accessible before fin-gerprinting. Furthermore, tamperproof hardware, which isdifficult to achieve and still an open research problem, mustbe used in order to protect the purely decrypted host signalfrom eavesdropping.

The additional burden of fingerprinting at the receiver maybe problematic if the transmission is real time. Consumersare not willing to pay excessively for security features that donot directly benefit them. Therefore, either low-complexityalgorithms or nonreal-time implementations of fingerprintembedding are necessary, which may limit the use of this ar-chitecture for some applications.

C. Joint Fingerprinting and Decryption

In order to overcome the complexity issues of finger-printing at the receiver while preserving the bandwidth,complexity, and latency efficiencies at the source, wepropose the notion of integrating the decryption and fin-gerprinting processes. As discussed in the previous section,the server encrypts the media using the group key .However, at each receiver a single secret key that isunique for each user is employed for JFD. The process, inpart, mimics decryption. However, the use of fordecryption allows the introduction of a unique fingerprintfor each user, making each decrypted copy distinct. Theinformation carried by the fingerprint can be representedas the relative entropy between the source and decryptionkeys, that is, , where is theentropy of the fingerprint for user , and is theconditional entropy of the group key given the receiver’skey. We present this approach in Fig. 1(c) and discuss apreliminary implementation in Section V. The structureof the scheme does away with the need for tamperproofhardware, but raises issues with respect to the tradeoffbetween imperceptibility and robustness of the fingerprint,especially with respect to collusion attacks.

D. Other Scalable Fingerprinting Architectures

Other methods have been proposed to overcome, in part,the scalability challenges of fingerprinting by distributingthe fingerprinting process over a set of intermediate nodessuch as routers. This shift in trust from the network sourceand destination extreme points to intermediate nodes cre-ates a different set of challenges such as vulnerability to in-termediate node compromise and susceptibility to standardnetwork congestion and packet dropping. A more detaileddiscussion is beyond the scope of this paper; the reader isreferred to [8] for a comparative survey by Luh and Kundur.

In the next section, we focus on the fingerprinting problemand introduce the coding and signal processing challenges ofdata embedding.

KUNDUR AND KARTHIK: VIDEO FINGERPRINTING AND ENCRYPTION PRINCIPLES FOR DIGITAL RIGHTS MANAGEMENT 919

Fig. 1. Security models. (a) Transmitter-side fingerprint embedding. (b) Receiver-side fingerprintembedding (decryption and fingerprinting are decoupled). (c) Proposed JFD.

III. VIDEO FINGERPRINTING

A. General Formulation

We consider a video server that distributes fingerprintedcopies of media to users for where

represents the th user and is the set of all users at aspecific time in the media distribution system. A fingerprint

associated with user is a binary sequence of length. The set of all fingerprints associated for the users in

is denoted with cardinality

920 PROCEEDINGS OF THE IEEE, VOL. 92, NO. 6, JUNE 2004

Fig. 2. Fingerprint embedding. The binary fingerprint is channel coded for robustness and thenmodulated so that it can be embedded in the host media. Perceptual analysis is conducted on thehost in order to determine the depth or strength of embedding that provides a good tradeoffbetween imperceptibility and robustness.

and codeword length bits. The fingerprint codewordsare modulated, which involves the use of signal processingstrategies to form a signal , that can beadded imperceptibly to the host media with the help of humanperception models. Fig. 2 shows the process of fingerprintembedding. The modulated version of the fingerprint isoften called a watermark in the research literature, whichhas a distinct objective from the copyright watermarkpreviously discussed. For simplicity from now on our useof the term “watermark” unless otherwise specified willapply to the modulated fingerprint and not .

In this context, a pirate is a user who illegallyredistributes his/her copy of the distributed media either inmodified or unmodified form; an illegally distributed mediais termed a pirated copy. In order to make sure that a piratedcopy cannot be traced back to the pirate , he/she will tryto cover any tracks by attempting to erase the associated fin-gerprint or frame another user. It is also possible that asubset of pirates may combine different copies oftheir fingerprinted media to achieve their goal. This powerfulattack is called collusion. Collusion is also possible when asingle user requests different copies from the server underdifferent aliases (but this does not change the formulation ofthe problem). Fig. 3 summarizes where the general collusionattacks takes place in an overall media distribution system.

The server is responsible for secure video distribution,which is achieved through a combination of compression andsecurity processing that includes fingerprinting. The serverneeds to balance efficiency of transmission and robustnessof security without hindering the viewing experience of theusers.

Once the server transmits the secured media over the dis-tribution channel, it is received by all users consisting of twodisjoint groups: the lawful users and the pirates. Lawful usersconsume the media in the manner that was agreed upon. Pi-rates tamper with the decrypted media, potentially collude inorder to remove any fingerprints or frame other user(s), andinsert the pirated copy into illegal distribution channels. If,at a later time, such a pirated copy is discovered, then it issent to the source (or a party working with the source) andthe associated fingerprint(s) are detected in order to tracethe pirate(s). A fingerprinting scheme that consists of fin-gerprint generation, embedding, and detection stages must,

therefore, address several challenges that we discuss in thenext sections.

B. Fingerprinting Requirements

1) Fundamental Compromises: There is a basic tradeoffbetween fingerprint embedding and source coding. Com-pression attempts to remove redundancy and irrelevancyfor the purpose of reducing storage requirements whilemaintaining the perceptual fidelity of the media. In contrast,fingerprinting shapes the irrelevancy within the mediasignal to transparently communicate security codesalong with the media. Thus, as first observed by Andersonand Petitcolas, if perfect compression existed, it wouldannihilate the process of fingerprinting [9]. The restrictedstructure of compression algorithms and limited accuracy ofperceptual coding models, however, allows some irrelevantsignal bandwidth to be used for fingerprinting. In general,the lower the required bit rate after compression, the smallerthe length of the fingerprint that can be robustly embedded.

Another set of compromises exists between fingerprintcapacity, defined as the total number of unique fingerprintsthat can be embedded and successfully distinguished at thereceiver, and robustness, which reflects the inability for oneor more pirates to erase or forge the fingerprint withoutaffecting the commercial quality of the video. In general,the smaller the required fingerprint capacity, the easier itis through the effective use of redundancy to make theembedding robust. In general, the fingerprint capacity mustensure that all possible users can be serviced in the life cycleof the distribution system.

Similarly, there is a tradeoff between perceptual qualityand robustness (or capacity). Transparency requires that thefingerprint be embedded in either perceptually irrelevantcomponents such as high spatial frequencies or perceptuallysignificant components with severely limiting amplitude.This is contradictory to the goals of robustness that aimto ideally embed high-energy fingerprint watermarks inperceptually significant components so that they cannot beeasily removed [10].

Thus, one arrives at a set of compromises that must beresolved to design an effective media fingerprinting scheme.To summarize, these include the ability to work with existing

KUNDUR AND KARTHIK: VIDEO FINGERPRINTING AND ENCRYPTION PRINCIPLES FOR DIGITAL RIGHTS MANAGEMENT 921

Fig. 3. Overview of video security, distribution, and piracy process. Fingerprinting consists of threestages: fingerprint generation, fingerprint embedding, and fingerprint detection as highlighted withbold boxes. We do not include copyright watermarking in this figure, which may be added prior toencryption, for reasons of simplicity. The compression process is placed prior to fingerprinting forpracticality, although it may occur at other stages.

compression standards, robustness to signal processing andcollusion-based attacks, capacity to handle unique detectionof all possible users in the life cycle of the VoD system, andperceptual quality.

2) Practical Constraints: Other pragmatic issues involverestrictions on complexity, latency, quality of service (QoS)and bandwidth. For example, the server may receive hundredsof video requests within a short span of time, requiringreal-time algorithm complexity. Related to this concept ishardware complexity that refers to computational resources,power, and memory utilization that must be minimal toreduce implementation costs. Bandwidth consumption is anadditional concern for the content distributor, as Internetservice provider (ISP) utilization costs may be nonnegligible.

The final consideration must be that the superposition ofthe DRM architecture must not affect the end-to-end QoSof the distribution network.

From the perspective of the ISP, bandwidth efficiency isof primary concern. Therefore, fingerprinting schemes inwhich the associated watermark is embedded at the source,as discussed in Section II-A, do not scale well as the numberof users grow. In contrast, more efficient architectures,presented in Sections II-B and II-C, are more suitable.Another issue of importance to the ISP is the level of DRMprocessing required from intermediate nodes or routersin the network. Ideally, an ISP would prefer to avoid theneed for audit trails by using intermediate network nodes asdiscussed in Section II-D. Such strategies for DRM are not

922 PROCEEDINGS OF THE IEEE, VOL. 92, NO. 6, JUNE 2004

“network friendly,” since there are issues related to trust andcomputational complexity at the inner nodes that increasenetwork latency.

The end user, who has the most influence in determiningthe acceptance of a given media distribution and rightsmanagement system, requires a desired QoS for sufficientlylimited power constraints. The fingerprinting cannot causeperceptual discomfort, for example, through watermarkerror propagation as discussed in [2]. Furthermore, in re-ceiver-oriented fingerprinting schemes, the power consumedfor fingerprinting must be minimal, especially in wirelessenvironments.

C. Existing Work

The body of literature concerned with digital finger-printing for DRM applications can be classified into threebasic categories: coding theoretic, protocol enhancing, andalgorithm specific. In coding-theoretic work, the authorsview fingerprinting as a code design problem in which thecodebook often represents the unique set of fingerprints (ora closely related quantity) that can be used for embedding.These formulations allow a well-structured modeling ofthe problem in order to isolate issues such as the necessarylength of the fingerprint code, the number of users thatmay be supported by such a system, and the ability forusers to collude to erase or fabricate a given fingerprintcode. However, they make somewhat restrictive assump-tions on the fingerprinting process and attackers’ behavior.Early theoretical work by Blakely et al. [11] characterizesthe constraints on a pirate with access to several distinctfingerprinted copies of data when attempting to erase orfabricate fingerprints. Boneh and Shaw [3] investigate theproblem of fingerprint code design when one would liketo restrict a maximum coalition of users, each carrying adistinct fingerprinted copy of the content, from colluding tofabricate another fingerprint (and, hence, frame another userfor an unwanted act).

Protocol-enhancing methods provide protection againstattacks on innocent users by the fingerprinting source.Pfitzmann and Schunter [12] introduce the notion of asym-metric fingerprinting, in which they investigate protocolsto protect innocent users from being framed for piracy bythe source. Pfitzmann and Waidner [13] address privacyissues by proposing a protocol that allows users to maintaintheir anonymity during content purchase, although they canbe later identified if they distribute content illegally. Thefinal class of algorithm-specific methods is the focus of ourproposed work and is discussed in Section III-C3.

1) Fingerprint Generation, Embedding, and Detec-tion: The fingerprinting problem can be broken down intoseveral stages including fingerprint generation, embedding,and detection as highlighted in Fig. 3. Each of these compo-nents must be designed in order to keep the overall systemrobust to a given set of attacks. Fingerprint generationinvolves the design of a fingerprint “code” for every userthat makes it possible to uniquely identify the code and,hence, the user during detection. Previous literature on codeconstructions [1], [3], [11] is useful for this stage in order

to keep the fingerprints collusion-resistant under a set ofconditions.

In addition, error correction code (ECC) strategies maybe used to make the embedded codes more reliable in theface of attacks. The use of ECCs in the area of water-marking was motivated by the communication analogy ofwatermarking in which the processes of fingerprint embed-ding and detection are likened to modulation and signalreception in a communication system. Any attacks on thefingerprint characterize nonideal communication channelmodel often called the watermark channel. It then followsthat for improved performance, fingerprint generation mayuse elements of channel coding to reduce the probabilityof detection error by creating interdependencies within theembedded codes at the price of increased complexity andbandwidth. Given the success of ECCs in achieving near-capacity information transmission, it has been concurredthat such an approach is effective in improving the detectionof fingerprints, although the tradeoffs for practical water-marking applications remain, in part, unexplored. Work byBaudry et al. [14] investigates the use of repetition andBose–Chaudhuri–Hocquenghem codes (BCH codes) in im-proving the fingerprint robustness. The degree of tradeoffbetween robustness and code length is characterized as-suming the watermark channel is an additive white Gaussiannoise (AWGN) channel. Fernando et al. [15] investigate theeffects employing channel codes such as block, convolu-tional, and orthogonal codes prior to watermark embedding.Their analysis shows the superior performance of convolu-tion codes to the other coding approaches. One importantquestion that still remains is how a designer can matcha coding scheme with an embedding process to bolsterperformance over a broad class of attacks.

Fingerprint embedding employs signal processingmethods to insert the fingerprints into the host signal suchthat the fingerprint is imperceptible yet robust to attacksincluding collusion. Detection complements this and takesinto account the watermark channel characteristics to min-imize watermark detection error. Several archetypes havebeen proposed for embedding and detection. The two mainparadigms are based on spread spectrum signaling and quan-tization. Quantization methods [16], [17] are characterizedby their ability to avoid host signal interference. That is,watermark detection under ideal conditions is guaranteed.Spread spectrum methods [18] are popular for their higherresistance to attacks modeled as narrow-band interference.However, given the vulnerability of spread spectrum ap-proaches to fading-type attacks, other approaches such ascommunication diversity may be additionally adopted tosupplement performance [17]. As previously discussed, allmethodologies exhibit a compromise amongst fingerprintperceptibility, robustness, and capacity.

To be effective, the selection of the generation, embed-ding, and detection stages must match the type attack thatwill be applied to the marked media. We focus on the collu-sion attack, which is unique to fingerprinting applications;other watermarking applications in which only the staticwatermark is embedded in the host for all users are notsusceptible to collusion.

KUNDUR AND KARTHIK: VIDEO FINGERPRINTING AND ENCRYPTION PRINCIPLES FOR DIGITAL RIGHTS MANAGEMENT 923

2) Collusion: Many popular signal processing attacks onwatermarks have been modeled as random noise [18] or asfading [17].1 These operations on the watermarked signal at-tempt to remove the watermark while maintaining perceptualfidelity. The location or some other random characteristic ofthe watermark, often called the watermark key, is used forwatermark generation and embedding and is unknown to theattacker. This strategy, although not equivalent in security tocryptography, can provide basic protection for the embeddeddata. Security, however, may be significantly compromisedif two or more copies of distinctly watermarked signals areavailable to the attacker. In such a case, it is possible thatsome portions of the watermark key, previously assumed tobe unknown to the attacker, can be easily estimated. As thenumber of distinctly watermarked copies increases, it maybe possible that a growing degree of information about thewatermark key becomes known until the overall system istrivially broken.

Such an attack is called collusion. In collusion attacks, agroup of users compare their distinctly fingerprinted copiesto form another composite signal that either contains no fin-gerprint or frames an innocent user. We first focus on a pop-ular subclass of collusion attacks, called linear collusion thatworks as follows:

(1)

where is the colluded (or composite) copyand is the fingerprinted copy of the video received by theth user. It is possible that the originally fingerprinted copy

at the source of user (which we denote ) has undergonesome small incidental distortions to produce which is notexactly the same as .

In (1) it is clear that has elements of all fingerprintedcopies of the colluders. It is possible, depending on the fin-gerprint generation, embedding, and detection stages, thatone or more of the colluders can be identified from . In-tuitively, the fingerprinted copy of the user corresponding tothe largest value of has the most influence on and,hence, may be more easily identified from . In many for-mulations of collusion, is employed as a conditionof fairness among the colluders to equalize the probabilitythat any of them are caught. Research on linear collusion forspread spectrum watermarking approaches of fingerprintinghas been conducted and demonstrates how this attack is pow-erful when the correlation between watermarks is small andthe number of colluders is large [18].

General collusion attacks can be more deceptive and in-corporate the various fingerprinted video in a nonlinearfashion by, for example, taking the minimum, maximum, ormedian value of pixels in a given video stream location [19].The only constraints on the pirate is that the be percep-tually identical to the family of fingerprinted signals , andthat the attack be computationally feasible. This means thatthe cost of the attack to the pirate is lower than the value ofthe video content.

1The reader should note that we are not including geometric or protocolattacks among others in our discussion.

3) Collusion-Resistant Fingerprinting Techniques: Inthis section, we discuss three contributions that fall underthe class of algorithm-specific fingerprinting schemes. Oneof the first papers addressing collusion-resistant fingerprintdesign was by Dittmann et al. [20]. Their objective isto trace the exact subset of colluders from a total ofusers when the number of colluders is equal to or belowa prespecified threshold number . In the same spirit as[1], [3], and [11], the authors assume that collusion occurswhen the colluders compare distinctly fingerprinted signals,identify all the locations that differ for at least one pair ofcolluders, and then modify or remove the fingerprint fromthose positions of the media. Given this model, the taskof collusion-resistant design becomes one of embeddingfingerprints such that there are common elements among any

or fewer subsets of users so if any of these groups colludeto form , the common elements of their fingerprint (thatstill remain in the media by assumption) identifies them.Dittmann et al. demonstrated how judiciously developedwatermarks could be employed to achieve this task. Forinstance, for and , the goal is to identify one ortwo colluders from an overall set of three system users. Thiscan be achieved by having a common watermark locationsbetween each possible pair (since ) of usersfor and a unique location foreach user for . To design the collusion-resistantwatermark and its locations for each user, the authors of[20] make use of the concept of finite geometries andprojective spaces. For the case of and , this in-volves finding three intersecting lines in a two-dimensionalprojective space (in this case, a plane). Every point inthe projective spaces pseudorandomly maps to a locationin the media. The intersecting points of any two lines inthe projective space represents the common watermarklocations of a pair of users that identifies the colluding pair.The method is extended for general values of and inwhich higher dimensional projective spaces are used. Themain challenge with this approach is that a large numberof fingerprint marking locations are necessary for large .Thus, the scheme does not scale easily when involving shorthost video clips with fewer possible embedding locations.

Trappe et al. [5] investigate orthogonal and code divisionmultiple access (CDMA)-type modulation scenarios forfingerprinting in the presence of collusion. In orthogonalmodulation, each user is assigned a fingerprint that isorthogonal to all other fingerprints embedded in the signal;the fingerprint may be a pseudorandom noise (pn) sequence.Detection ideally requires the correlation of the receivedsignal with all possible fingerprints where the highestcorrelation result identifies the embedded fingerprint. Thus,detection complexity scales linearly with the number ofusers in the system. Trappe et al. [5] show that the computa-tional cost can be reduced at some expense of performancewith coded modulation. Here, CDMA-type modulationis used in which every user has the same pn sequencethat is modulated using a unique (for each user) binarycode that are generated to be collusion-resistant using thetheory of balanced incomplete block designs (BIBD) andthe resulting fingerprints are termed anticollusion codes

924 PROCEEDINGS OF THE IEEE, VOL. 92, NO. 6, JUNE 2004

(ACCs). Analysis and simulations verify the improvedperformance of the scheme to practical linear collusionattacks. The challenges involve overcoming the assumptionthat linear collusion is equivalent to a logical AND of theACCs, which is not the case for more than two colluders,and the nontrivial nature of designing BIBD-codes for ACCsof arbitrary system parameters.

In contrast, Su et al. [21], [22] analyze the necessaryconditions for the problem of resistance against linear col-lusion among frames within the same video sequence. Theauthors focus on two types of collusion: “Type I,” in whichthe colluded result is used to estimate the fingerprint in agiven host video (and, hence, subtract it out), and “Type II,”in which is an estimate of the host where the fingerprinthas been filtered out. The authors derive watermark designrules to ensure neither Type I and Type II collusion can beachieved by an attacker. The rules state that under a givenset of conditions, the watermarks embedded in each framemust have proportional correlations to their host video framecounterparts. Thus, as the similarity amongst the host videoframes changes temporally, the correlation of the associatedwatermarks must also follow a comparable evolution. Todemonstrate the utility of the design rules, the authorsdevelop the Spatially Localized Image-Dependent (SLIDE)video watermarking method [23]. A feature extractionalgorithm is designed which selects anchor points aroundwhich components of the watermark are embedded. As thehost video frame evolves in time, the locations of the anchorpoints also change such that similar frames contain similarwatermarks and diverse frames contain diverse watermarks.Under a specific set of assumptions, it is derived that theSLIDE algorithm keeps the correlation of the watermarksproportional to those of the host frames up to a certainresolution related to the total number of anchor points in aframe.

It should be noted that overall digital fingerprinting is apassive form of security and works only after the content hasbeen received and made available to the user. In the next sec-tion, we discuss video encryption, an active form of protec-tion that when applied prevents a user from content access.

IV. VIDEO ENCRYPTION

A. Partial Encryption

Encryption can be defined as a transformation that isparameterized by a numeric value called an encryption key

of a given input signal called the plaintext. The outputof the transformation, called the ciphertext, must ideally“appear” random to make estimation of the plaintext fromthe ciphertext computationally difficult without access to thedecryption key ; may be the same or different from

depending on the type of transformation employed.The process of decryption is the inverse transformation ofencryption.

Video encryption has gained interest in recent yearsbecause use of well-known and tested secret key encryp-tion algorithms, such as Triple Data Encryption Standard(3DES) and Advanced Encryption Standard (AES), are

Fig. 4. Overview of partial encryption of raw video.

considered computationally infeasible for high volumes ofplaintext in low-complexity devices or for near real-time ormassively parallel distribution of video flows. In order toovercome some limitations, Cheng and Li [24] discuss thenotion of partial encryption, in which a smaller subset ofthe plaintext is encrypted to lower computation and delaywhile integrating the overall process with compression.2

Fig. 4 summaries the basic idea. The raw video data ispartially compressed (e.g., the transform coefficients ofare quantized) and then separated into two components:the essential features (EF) that must be encryptedand the nonessential features (NEF) that are left inplaintext form. The output of the encryption stage denoted

is further compressed (which involves nonlossycompression such as entropy coding, for example) andthe result is a multiplexed with to produce the finalencrypted and compressed content.

Using this partial encryption framework, the authorsof this paper assert that the problem of designing an ef-fective video encryption algorithm involves selecting theappropriate EF and NEF of the video content for a givenapplication. The EF–NEF selection may be based on anumber of different criteria.

1) It is desirable that the fraction of the video stream thatneeds to be encrypted is as small as possible; thus,

2Another effort to address this problem, which we do not address in thispaper, involves the digital video broadcasting (DVB) scrambling systemsuitable for set-top boxes.

KUNDUR AND KARTHIK: VIDEO FINGERPRINTING AND ENCRYPTION PRINCIPLES FOR DIGITAL RIGHTS MANAGEMENT 925

Fig. 5. Video encryption by coefficient and sign scrambling.

the bit rate of should ideally be much smallerthan the bit rate of the after the first stage ofcompression.

2) The security in a video watermarking context isrelated, in part, to visual quality. Thus, the EF–NEFpartitioning should guarantee that the “visual quality”of the encrypted video is highly dissimilar to theplaintext. That is, should contain most of theperceptually critical content of . In some applica-tions, only the commercial quality of the signal needsto be degraded by scrambling allowing some of theoriginal content through.

3) It should not be possible for an attacker to estimatethe EF from the NEF. Otherwise, it may be possible toobtain , an estimate of from and thenuse and in a traditional cryptographicknown-plaintext attack.

4) If the compression stages are based on a coding stan-dard, then the and components must beeasily accessible elements related to the compressionprocess, such as discrete cosine transform (DCT) co-efficients in the case of MPEG-2.

B. Practical Considerations for a Video Encryption Scheme

In addition to the complexity and partial encryptionconsiderations of the previous sections, we highlight otherpractical requirements in video encryption design. Com-munication latency, which refers to the time lag betweenthe source of communication and its destination, must beminimized. This is a more serious issue if the encryptedtransmission has to be done in real time for applicationssuch as video conferencing as opposed to VoD; however,severe latency is a problem for, say, live Web broadcasts.Friedman [25] demonstrates through simulations that usingthe IP Sec protocol with encryption and authentication using3DES and SHA-1, respectively, for securing real-time voicecommunications introduced an end-to-end delay of 1.2 ms;the associated delays for video would be unacceptable.

In addition, transcodability must be possible if customersin a VoD system have different QoS constraints. Transcodingis the process of converting directly from one video formatto another with often lower quality requirements to facilitatecontent distribution to users with varying bandwidths.Practically, transcoders can be found at video gatewaysthat connect fast networks to slow ones. A straightforward

approach to integrating transcoding with encryption requiresa complete decryption of the video content, then standardtranscoding, and finally reencryption. Chang et al. [26]and Wee and Apostolopoulos [27] propose subdividing thevideo content into multiple components, each independentlyencrypted such that the transcoder can judiciously dropsome of the encrypted components without processing themand yet not affecting the decryption of the remaining ones.

Finally, the influence of encryption on the video compres-sion rate must be minimal. For overall efficiency, encryptionand compression are often integrated as discussed in Fig. 4where compression is broken up into two stages. Attemptsof compression at Stage II (after the encryption) are often in-effective because the scrambling increases entropy such thatfurther compression has diminishing returns. Thus, the com-pression stages should be designed such that much informa-tion reduction occurs in Stage I while leaving the data in aform where and with encryption-friendly char-acteristics can be easily separated and processed accordingly.

C. Existing Partial Encryption Approaches

Many of the techniques for partial encryption are proposedfor efficient MPEG encryption. Therefore, intermediatesignal elements related to the MPEG standards, such as theDCT or discrete wavelet transform (DWT) coefficients, areconvenient to use as and .

1) Coefficient Scrambling: The coefficient scramblingclass of methods encrypts some property of the video signalsuch as the overall value, position or sign of its DCT or DWTcoefficients (depending on the codec used for compression).As Fig. 5 demonstrates, the scrambling occurs either imme-diately before or after quantization. The advantages includelow computational complexity and easy integration withcompression. The primary disadvantage is that this approachis not necessarily secure if simple shuffling procedures areused to obscure the coefficients.

In [28] Tang proposes a DCT shuffling scheme forcompatibility with JPEG or MPEG-2, in which the DCTcoefficients within each 8 8 block are permuted using asecret key. Zigzag scanning has the same computation orderas shuffling, so no computational security overhead overcompression is incurred. The scheme, although simple toimplement, changes the statistical (run length) property ofthe DCT coefficients, making it less susceptible to entropycoding and, therefore, increases the bit rate of the encrypted

926 PROCEEDINGS OF THE IEEE, VOL. 92, NO. 6, JUNE 2004

stream. This problem, however, can be alleviated by re-stricting the scrambling to the lower DCT frequencies (thatcontains most of the signal energy) without significantlycompromising on security.

To overcome the increased bit rate, Shi and Bhargava [29]propose the MPEG Video Encryption Algorithm (VEA) thatencrypts the sign bits of all the DCT coefficients3 of anMPEG video stream. The main advantage is that VEA addsminimum overhead to the MPEG codec, because encryptioninvolves a bitwise XOR operation of each nonzero DCTcoefficient with the secret key.

2) MPEG Bit Stream Encryption: For increased securityover the shuffling procedures of the previous section, anumber of partial encryption proposals use algorithms suchas DES, 3DES, and AES. However, to sustain the same levelof complexity, a lower volume of video must be encrypted.The challenge with this method is that the sparse encryptedcomponents may be treated by an attacker as “errors” inthe transmitted video stream that can be corrected by usingthe natural redundancy in the video stream. Such errorcorrection capability is equivalent to breaking the partialencryption algorithm. Thus, care must be taken to select

such that it contains critical information that cannot beestimated from

Early attempts suggest encrypting only the I-frames orIntra-blocks of the MPEG video stream. However, Agi andGong [30] show that although it may appear that the P- or B-frames are visually meaningless without the correspondingI-frame, a series of P- and B- frames carries much moreperceptual information especially if their base I-frames arecorrelated, which can be used to increase fidelity from theencrypted video stream. One solution is to increase thefrequency of the I-frames and, hence, encrypted content,which in turn raises the bit rate. The proposal by Agi andGong involves using DES to encrypt the I-frames. How-ever, one problem with the use of block ciphers is that anerror in a single bit will render the entire decrypted blockunintelligible.

Qiao and Nahrstedt [31] proposed a video encryptionscheme that works exclusively on the data bytes and doesnot interpret the MPEG stream for selection of . In thisalgorithm the authors divide the MPEG stream into twocomponents composed of odd- and even-numbered bytes.Through statistical analysis, they show that there is norepetitive pattern in the even byte stream which has randomcharacteristics similar to that desirable for an encryptionkey. Using the even byte stream as a key for a one-time padencryption algorithm, the odd numbered bytes are XORedwith the even ones. To protect the identity of the evenstream, it is then protected by applying a standard algorithmsuch as DES. The result is a 47% reduction in computationas compared to the full encryption scheme.

3) Hierarchically Based Encryption: In compressionschemes based on multiresolution analysis of images such asquadtree decomposition or zerotrees wavelet compression,

3More exactly, the differential DC values of the DC coefficients areencrypted.

there is a correlation between sets of coefficients that existat various levels of the tree. Coefficients that have similarcharacteristics are grouped together by forming linked lists.Since each linked list is essentially a chain of pointers, onecan ensure secrecy by encrypting just the parent set (or node)and the corresponding parameters that describe the parentset, so that the subsequent links lack reference informationand are ideally useless in deciphering the video content.

Using such an approach, Cheng and Li [24] and Shapiro[32] propose partial encryption for quadtree compression andwavelet compression based on zerotrees, respectively. Themain advantage is that only 13%–27% and 2% of the com-pressed output of typical images need to be compressed for[24] and [32], respectively.

V. JOINT FINGERPRINTING AND DECRYPTION

As discussed in Section II-C, we propose a new architec-ture for both tracing pirates and securing multimedia acrossmulticast networks, in which we combine the process of fin-gerprinting and decryption at the receiver. Shifting the fin-gerprinting process to the receiver and, thus, integrating itwith decryption has the advantage that the media needs to beencrypted just once at the source before being multicast tothe different receivers, a tremendous saving in computationalpower, memory, latency, and bandwidth, over architecturesdiscussed in Sections II-A and II-D without requiring tam-perproofing hardware.

The discussions in Sections III and IV highlight a numberof important compromises for multimedia security design.In this section, we investigate possible advantages of usingstrategies to integrate on one or more processing stages.Specifically, we propose a new architecture for both tracingpirates and securing multimedia across multicast networks,in which we combine the process of fingerprinting and de-cryption at the receiver as discussed in Section II-C. Shiftingthe fingerprinting process to the receiver has the advantagethat the media needs to be encrypted just once at the sourcebefore being multicast to the different receivers, a tremen-dous saving in computational power, memory, latency, andbandwidth, over architectures discussed in Sections II-Aand II-D. Since every receiver can be a potential pirate, thefingerprint embedding process must be made inaccessible.The JFD architecture is a more computationally efficientand convenient way of fingerprinting at the receiver withoutthe need for expensive tamperproofing equipment.

A. Related Work

Our architecture is inspired by the work of Andersonand Manifavas on chameleon ciphers [33] developed formulticast- or broadcast-type channels. Chameleon ciphersare a clever way to combine encryption and fingerprintingthat offers computational efficiency. The idea is that encryp-tion is performed on the plaintext at the source using a groupkey to produce the ciphertext that is multicast to allthe users. Slightly different decryption keys from the set

are distributed to the users suchthat when decryption is performed on the ciphertext, the

KUNDUR AND KARTHIK: VIDEO FINGERPRINTING AND ENCRYPTION PRINCIPLES FOR DIGITAL RIGHTS MANAGEMENT 927

Fig. 6. Overview of the JFD architecture.

result is slightly different for each user. In their implementa-tion for raw audio streams, Anderson and Manifavas adapta block cipher for encryption in output feedback mode tooperate on raw audio pulse code modulation bits and ensurethat the least significant bits of the plaintext audio aloneare changed when the content is decrypted with a user’sdecryption key. This guarantees that the fingerprint doesnot result in perceptual degradation. Some of the challengesinvolving chameleon ciphers include robustness of thefingerprint due to its LSB nature, the large key size that isinappropriate for multicast scenarios, the assumption thatthe media signal is in raw format rather than in compressedform, and its vulnerability to collusion. It is shown throughsimulations that five or more users can produce a plaintextthat cannot be traced by using a bit-wise majority voting toerase the watermark from the LSBs.

More recently, a scheme by Parvianen and Parnes [34] hasbeen proposed in which the authors also adapt a stream ci-pher such that different users equipped with long and distinctdecryption keys receive fingerprinted copies of the plaintext.Their algorithm allows the use of more general fingerprintingtechniques because the process of embedding occurs sepa-rately from encryption at the source, but in such a way that thebandwidth usage is at most doubled over that of normal mul-ticast. Analysis is provided to demonstrate that if colludersrepresent small fractions of the overall users in the system,the detection likelihood of traitors is high.

B. JFD Architecture

The source extracts the perceptually relevant featuresfrom the multimedia content and selectively encryptsthem with as shown in Fig. 6. Based on this model,the source multicasts the encrypted content tousers. Each receiver upon subscription receives a decryptionkey from the source. The decryption key set denoted

is designed jointly with thesource key set so that an imperceptible and indeliblefingerprint is embedded in the content after decryption. Thereader should note that although we require , wedo not employ asymmetric encryption. The key asymmetry,as it will become clear, stems from the necessity of jointlydecryption and fingerprint embedding.

For user , the fingerprint information is essentially con-tained in the asymmetric key pair out of whichthe th receiver has access only to the decryption keyand the encrypted content . The receivers donot have knowledge of . The fingerprint is a function ofthe correlation between the encryption and decryption keys.We assume the encryption and decryption keys are correlatedrandom variables and an encryption and decryption structurewith limited diffusion capability is employed; the source of

the secrecy comes primarily from the process of confusion[35]. We define the fingerprint payload capacity, which is themaximum length of the fingerprint that can be embedded ina given media object, as

(2)

where , , and are the entropyof the encryption key, the mutual information between theencryption and decryption keys, and the conditional entropyof the encryption key given the decryption key, respectively.Since the fingerprint embedding is done within the decryp-tion framework, the perceptual quality of the fingerprintedframe can be described as a function of the correlation be-tween and . The mutual information is afunction of the correlation between the keys, which in turnaffects the perceptual quality of the decrypted/fingerprintedimage given our encryption structure. Intuitively one can saythat the higher the correlation, the smaller the perceptualdegradation. However, it is also clear that the larger the valueof due to the increased correlation, the lower thefingerprint payload capacity . Thus, (2) illustrates the in-herent tradeoff between fingerprint payload capacity and per-ceptual quality as a byproduct of combining watermarkingand decryption.

C. Design Challenges

Many interesting challenges arise in the JFD frameworkowing to the merging of the two seemingly orthogonalprocesses of watermarking and encryption. Traditionalfingerprinting leaves a triangular tradeoff among robustness,imperceptibility, and capacity. However, the JFD frameworkhas an additional fourth element of secrecy defined as thestrength of an encryption. Our focus in this paper is onissues related to fingerprint design within a JFD framework,so we do not discuss security extensively in the remainderof this section, which is the topic of another paper.

1) Imperceptibility Versus Secrecy: Imperceptibility ofthe fingerprint after decryption with is necessary toensure that users with valid decryption keys are able to de-crypt the content properly without degrading the perceptualquality. Given a key , which need not be in the key space

(since may not be a legitimate decryption key, butthe result of collusion), two necessary conditions for theasymmetric fingerprinting scheme are

if then

for (3)

else if then

for (4)

with high probability

928 PROCEEDINGS OF THE IEEE, VOL. 92, NO. 6, JUNE 2004

where is the perceptual distortion measure, is aglobal masking threshold for all the fingerprints, andis the minimum perceptual distance necessary to obscure theviewing experience if the wrong key is used. Condition (3)guarantees that a legitimate user can decrypt the content withreasonable perceptual quality, and condition (4) suggests thatif the wrong key is used, there is a severe degradation ofthe media quality; thus, content access is denied. The distor-tion measure is an application- and media-dependentquantity; contenders for this metric are the and normsand peak signal-to-noise ratio (PSNR) measure.

2) Collusion Attack Model and Countermeasures: Weperceive two types of collusion attacks in this framework:1) collusion of decryption keys, which is a protocol attack[Type (A)], and 2) collusion of the fingerprinted videoframes, which is a signal processing attack [Type (B)]. Thefingerprint must be robust to both types of attacks. We focuson Type (B) collusion and restrict the attack to be linear inthis preliminary formulation.

Definition 1 [Type (A): Collusion of DecryptionKeys]: We can break up the set of receivers intotwo mutually exclusive sets: colluders and innocentusers such that . Correspondingly, wecan split up the decryption key space into two dis-joint spaces, such that , where

. These unknown colludersin the set , can collude the keys using some function toobtain an estimate of a decryption key as follows:

(5)

The nature of the function depends specifically on thestructure of the keys and the type of encryption used.

Definition 2 (Attack—Framing): If constraints (3) and (4)are met successfully by the key designer, then the pirates willhave no choice but to choose the function such that

, i.e., the key is within the decryption key space of innocentusers. This attack is called framing.

Proposition 1 (Frameproof Coding): The decryption keysmust be designed in such a way that for any computationallypossible function , the key estimate : 1) must not mapto one of the keys in the decryption key space or 2) mustmap to one of the colluders.

Definition 3 [Type (B): Linear Collusion of FingerprintedCopies]: Given , the set of finger-printed copies available to the colluders, the colluded copyis generated as follows:

(6)

We assert that one way to mitigate the effect of a large-scale linear collusion is by introducing some common ele-ments (or invariant marks) in the fingerprints distributed tosubsets of users. In this paper we propose a scheme in whichany combination of or fewer colluding groups out of canbe detected with high probability. In this approach, which canbe considered a variation of the Dittmann et al. scheme [20],

for each unique combination of colluders, a different set ofmarks (or bits) are preserved that can be used to determinethe exact colluding members.

D. Joint Fingerprint Embedding and Decryption Based onCoefficient Set Scrambling

To demonstrate the practical compromises necessaryfor algorithm design, we implement a straightforward JFDscheme based on scrambling of the DCT coefficients. Forsimplicity, we focus on results for a single image rather thana video stream, but the same insights hold for both types ofmedia.

1) Fingerprint Embedding: Our scheme, in part, may beconsidered a merging of the method by Tang [28] with thenotion of chameleon ciphers by Anderson and Manifavas[33] to generate a set of fingerprints during decryptionwith the invariant character discussed by Dittmann et al.scheme [20]. The raw video frame is DCT encoded andquantized. The DCT is taken of the entire image, and it is notdivided into 8 8 blocks as in the case of [28] for reasonsof perceptibility of fingerprint during decryption. We thenidentify a set of coefficients in the low and midfrequencyregion that are perceptually significant. From this set, wepartition the coefficients into subsets. These subsets are allsign scrambled (i.e., the sign is arbitrarily flipped dependingon the key) during encryption and some are left scrambledduring decryption for the purpose of forming the fingerprintas in the case of [33].

There are two types of encryption keys associated witheach component; the first serving as pointers to subsets ofcoefficients to encrypt, and the second as the scrambling keythat dictates the order in which the coefficients are permutedwithin a given subset. The receiver is given a unique subset ofkeys for decrypting only a fraction of the encryptedsubsets. The remaining subsets are hidden from thereceiver and their unique sign bit signature along with theirconcealed positions in the frame constitutes the fingerprint.An overview of the system is provided in Fig. 7. The lengthof the receiver key set in relation to the source key setis a measure of the correlation between the two keys, since

.The number of coefficients in each subset denoted

(where is the subset index) is chosen to trade perceptualquality and robustness. As increases, there is a corre-sponding increase in diversity of the fingerprint embedded,which, consequently, increases the probability of retrievingthe mark. However, if a large number of coefficients are leftscrambled, this will increase the perceptual distortion in thefingerprinted frame.

The only way to meet the imperceptibility constraintwithout compromising on robustness would be by de-creasing the number of hidden subsets , which resultsin a decrease in the fingerprint payload. Thus, we, onceagain, have the triangular tradeoff scenario in traditionalwatermarking.

If is small as compared to , then the fingerprints willbe almost orthogonal to each other, and a maximum of

KUNDUR AND KARTHIK: VIDEO FINGERPRINTING AND ENCRYPTION PRINCIPLES FOR DIGITAL RIGHTS MANAGEMENT 929

Fig. 7. Overview of fingerprint embedding through subset scrambling.

Fig. 8. Original, encrypted, and fingerprinted images (for n = 25, k = 6,M = 200).

Table 1Results Demonstrating Tradeoff Between Payload, Robustness, and Perceptual Quality

fingerprints can be embedded. But as the number of col-luders increases in an orthogonal fingerprinting scheme, thefalse positive and the false negative rates increase, mainlybecause the distinguishing features between the fingerprintsassume negligible amplitude when the copies are averaged.If we introduce some “common” elements among subsetsof colluders, then these elements are likely to be preservedwith a higher probability. However, there is a sacrifice inresolution for reliable detection. This lays the foundationfor a group-based fingerprinting scheme by introducing atwo-level hierarchical structure to fingerprinting. Thus, inthe present scheme, the fingerprint has two components, thegroup ID and the user ID.

2) Fingerprint Detection and Simulation Results: Eachfingerprint resulting from partial decryption has a uniquesign bit signature. Fingerprint detection is carried out bycorrelating the subsets in the retrieved copy with a list of ref-erence signatures. Let be the signsignatures of the subsets in the encrypted frame and

be the retrieved copy from which the colluders (if any)have to be detected. In a similar fashion, we create a sign ma-trix from , where .We then form the correlation matrix

and an autocorrelation vector,, . is the average number of

coefficients in each subset. If , then we declare thatthe subset is encrypted and the corresponding video framecomponent is marked as “1”; otherwise, it is marked as “0.”The locations of the ones in the string constitute the finger-print, which is compared with the entries in the database toidentify potential colluders. The threshold is a function of thedesign parameters , , and can be adjusted experimen-tally to balance the false positive and false negative rates.

For a 256 256 grayscale Lena image, we set theparameters to be , , . The corre-sponding PSNR values of the encrypted and fingerprintedimages were found to be dB and dB,respectively. We set these as the lower and upper boundsfor obscurity (or secrecy) and imperceptibility respectively(better bounds can be obtained experimentally by averagingthe PSNR values over an ensemble of images with differenttextures). Fig. 8 shows the original, scrambled, and finger-printed images. Our intent is only to obscure the commercialquality of the video.

Table 1 illustrates the tradeoff between imperceptibilityPSNR , robustness, and payload ( ). A JPEG

930 PROCEEDINGS OF THE IEEE, VOL. 92, NO. 6, JUNE 2004

compression has been applied to the fingerprintedimage. We measure robustness as the fraction of hiddenmarks inaccurately detected. Payload can be increasedby increasing or and diversity by increasing .As seen from the table, increasing both and hasa severe impact on the perceptual quality of the finger-printed image. To maintain a minimum level of secrecyPSNR dB without compromising on

robustness, the width of the subset must be greater than 100.Beyond a particular value of , the error rate surprisinglyincreases because of overlapping subsets, which causes falsenegatives.

E. Work in Progress and Future Challenges

The preliminary scheme discussed here was developedheuristically and later adapted to meet design rules toillustrate the feasibility of the JFD approach. Although wehave proposed ways of tracing subsets of colluders in theface of linear collusion attacks, this scheme is susceptible tokey collusion attack. A group of pirates can compare the keysets to successfully identify and erase some of the subbandsthat have been hidden in the frame. However, framing isvery difficult, since accurate reconstruction of an innocentuser’s fingerprint would require exact knowledge of his/herdecryption key, which is improbable.

The receiver should not know the video features being de-crypted nor the features hidden; in the presented scheme, theformer is available to the receiver. Work in progress inves-tigates designing a system such that the receiver is not ableto derive information about the location of encrypted com-ponents without access to the source secret key. At present,we are also working on an analytical model for illustratingtradeoff issues in the JFD framework, which can then be usedto develop a more robust fingerprinting scheme.

VI. CONCLUSION

This paper, in part, provides an overview of the manyissues that must be addressed for video encryption andfingerprinting in a DRM context. Given the thrust toward se-curity for emerging resource constrained DRM applications,there is a need for solutions that provide a better compro-mise between security and complexity. This is resulting ina paradigm shift in the area of information protection, inwhich ideas from areas such as media processing are oftenincorporated to provide more lightweight solutions.

Low-complexity security solutions must take carefulaccount of the application-dependent restrictions and com-peting objectives. Current solutions often strip down thesecurity algorithm or protect a partial component of theinformation. Inspired by the chameleon cipher, we focus onthe approach of integration to potentially achieve a moreappropriate compromise. Given that perfect security maybe unattainable in a practical context, we hope that furtherresearch in this area will result in more effective yet efficientdesigns.

REFERENCES

[1] N. R. Wagner, “Fingerprinting,” in Proc. IEEE Symp. Security andPrivacy, 1983, pp. 18–22.

[2] F. Hartung and B. Girod, “Digital watermarking of MPEG-2 codedvideo in the bitstream domain,” in Proc. Int. Conf. Acoustics, Speechand Signal Processing, vol. 4, 1997, pp. 2621–2624.

[3] D. Boneh and J. Shaw, “Collusion-secure fingerprinting for digitaldata,” IEEE Trans. Inform. Theory, vol. 44, pp. 1897–1905, Sept.1998.

[4] W. Trappe, M. Wu, and K. J. R. Liu, “Collusion-resistant finger-printing for multimedia,” in Proc. IEEE Int. Conf. Acoustics, Speechand Signal Processing, vol. 4, 2002, pp. 3309–3312.

[5] W. Trappe, M. Wu, Z. J. Wang, and K. J. R. Liu, “Anti-collusionfingerprinting for multimedia,” IEEE Trans. Signal Processing, vol.51, pp. 1069–1087, Apr. 2003.

[6] B. M. Macq and J.-J. Quisquater, “Cryptology for digital TV broad-casting,” Proc. IEEE, vol. 83, pp. 944–957, June 1995.

[7] J. Bloom, “Security and rights management in digital cinema,” inProc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, vol.4, Apr. 2003, pp. 712–715.

[8] W. Luh and D. Kundur, “Media fingerprinting techniques andtrends,” in Multimedia Security Handbook, B. Furht and D.Kirovski, Eds. Boca Raton, FL: CRC, 2004, ch. 19.

[9] R. Anderson and F. A. P. Petitcolas, “On the limits of steganog-raphy,” IEEE J. Select. Areas Commun., vol. 16, pp. 474–481, May1998.

[10] I. J. Cox and M. L. Miller, “A review of watermarking and the im-portance of perceptual modeling,” Proc. SPIE, Human Vision andElectronic Imaging II, vol. 3016, pp. 92–99, 1997.

[11] G. R. Blakely, C. Meadows, and G. B. Purdy, “Fingerprinting longforgiving messages,” in Proc. Advances in Cryptology, 1985, pp.180–189.

[12] B. Pfitzmann and M. Schunter, “Asymmetric fingerprinting,” inProc. Eurocrypt 1996, pp. 84–95.

[13] B. Pfitzmann and M. Waidner, “Anonymous fingerprinting,” in Proc.Eurocrypt 1997, pp. 88–102.

[14] S. Baudry, J. F. Delaigle, B. Sankur, B. Macq, and H. Maitre,“Analyses of error correction strategies for typical communicationchannels in watermarking,” Signal Process., vol. 81, no. 6, pp.1239–1250, June 2001.

[15] P. Fernando, P. Gonzalez, J. R. Hernandez, and F. Balado, “Ap-proaching the capacity limit in image watermarking: a perspectiveon coding techniques for data hiding applications,” Signal Process.,vol. 81, no. 6, pp. 1215–1238, June 2001.

[16] B. Chen and G. W. Wornell, “Quantization index modulation: A classof provably good methods for digital watermarking and informationembedding,” IEEE Trans. Inform. Theory, vol. 47, pp. 1423–1443,May 2001.

[17] D. Kundur and D. Hatzinakos, “Diversity and attack characterizationfor improved robust watermarking,” IEEE Trans. Signal Processing,vol. 29, pp. 2383–2396, Oct. 2001.

[18] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, “Secure spreadspectrum watermarking for multimedia,” IEEE Trans. Image Pro-cessing, vol. 6, pp. 1673–1687, Dec. 1997.

[19] H. Zhao, M. Wu, Z. J. Wang, and K. J. R. Liu, “Nonlinear collusionattacks on independent fingerprints for multimedia,” in Proc. IEEEInt. Conf. Acoustics, Speech, and Signal Processing, vol. 5, 2003,pp. 664–667.

[20] J. Dittmann, A. Behr, M. Stabenau, P. Schmitt, J. Schwenk, andJ. Ueberberg, “Combining digital watermarks and collusion securefingerprints for digital images,” in In Proc. SPIE Conf. ElectronicImaging ’99, Security and Watermarking of Multimedia Contents,vol. 41, 1999, pp. 171–182.

[21] K. Su, D. Kundur, and D. Hatzinakos, “A novel approach tocollusion-resistant video watermarking,” in Proc. SPIE, Securityand Watermarking of Multimedia Content IV, vol. 4675, 2002,pp. 491–502.

[22] , “Statistical invisibility for collusion-resistant digital video wa-termarking,” IEEE Trans. Multimedia, to be published.

[23] , “Spatially localized image-dependent watermarking for statis-tical invisibility and collusion resistance,” IEEE Trans. Multimedia,to be published.

[24] H. Cheng and X. Li, “Partial encryption of compressed images andvideos,” IEEE Trans. Signal Processing, vol. 48, pp. 2439–2451,Aug. 2000.

KUNDUR AND KARTHIK: VIDEO FINGERPRINTING AND ENCRYPTION PRINCIPLES FOR DIGITAL RIGHTS MANAGEMENT 931

[25] A. Friedman, “Wireless and mobile communication: A real time se-curity solution over wireless IP,” Fortress Technol., White Paper,1999.

[26] Y. Chang, R. Han, C. Li, and J. Smith, “Secure transcoding of in-ternet content,” in Proc. Int. Workshop Intelligent Multimedia Com-puting and Networking (IMMCN), 2002, pp. 940–943.

[27] S. J. Wee and J. G. Apostolopoulos, “Secure scalable streaming en-abling transcoding without decryption,” in Proc. IEEE Int. Conf.Image Processing, vol. 1, 2001, pp. 437–440.

[28] L.Lei Tang, “Methods for encrypting and decrypting MPEG videodata efficiently,” in Proc. 4th ACM Int. Conf. Multimedia, 1996, pp.219–229.

[29] C. Shi and B. Bhargava, “A fast MPEG video encryption algorithm,”in Proc. 6th ACM Int. Multimedia Conf., 1998, pp. 81–88.

[30] I. Agi and L. Gong, “An empirical study of secure MPEG videotransmissions,” in Proc. Internet Society Symp. Network and Dis-tributed System Security, 1996, pp. 137–144.

[31] L. Qiao and K. Nahrstedt, “Comparison of MPEG encryption algo-rithms,” Comput. Graph., vol. 22, no. 4, pp. 437–448, 1998.

[32] J. M. Shapiro, “Embedded image coding using zerotrees of waveletcoefficients,” IEEE Trans. Acoust., Speech, Signal Processing, vol.41, pp. 3445–3462, Dec. 1993.

[33] R. Anderson and C. Manifavas, “Chameleon—A new kind of streamcipher,” in Lecture Notes in Computer Science, Fast Software En-cryption, E. Biham, Ed. Heidelberg, Germany: Springer-Verlag,1997, pp. 107–113.

[34] R. Parnes and R. Parviainen, “Large scale distributed watermarkingof multicast media through encryption,” in Proc. IFIP Int. Conf.Communications and Multimedia Security Issues of the New Cen-tury, 2001, p. 17.

[35] C. E. Shannon, “Communication theory of secrecy systems,” BellSyst. Tech. J., vol. 28, pp. 656–715, Oct. 1949.

Deepa Kundur (Senior Member, IEEE) wasborn in Toronto, ON, Canada. She received theB.A.Sc., M.A.Sc., and Ph.D. degrees in electricaland computer engineering in 1993, 1995, and1999, respectively, at the University of Toronto,Toronto.

From 1999 to 2002, she was an AssistantProfessor in the Edward S. Rogers Sr. Depart-ment of Electrical and Computer Engineering,University of Toronto, where she held the title ofBell Canada Junior Chair-holder in Multimedia.

In 2003, she joined the Electrical Engineering Department at Texas A&MUniversity, College Station, where she is a member of the Wireless Com-munications Laboratory and holds the position of Assistant Professor. Sheis the author of over 60 papers. Her research interests include multimediaand network security, video cryptography, sensor network security, datahiding and steganography, covert communications, and nonlinear andadaptive information processing algorithms.

Dr. Kundur is the recipient of several awards, including the 2002 GordonSlemon Teaching of Design Award. She has been on numerous technicalprogram committees and has given over 30 talks in the area of digital rightsmanagement, including tutorials at ICME 2003 and Globecom 2003.

Kannan Karthik (Student Member, IEEE)received the B.E degree in electrical engineeringfrom Bombay University in 1998 and the M.Engdegree in Power Electronics and Controls fromMemorial University of Newfoundland, St.John’s, Canada in 2001. He is currently workingtoward the Ph.D degree in Multimedia Securityat the University of Toronto, ON, Canada.

His research interests include multimedia secu-rity, video watermarking and cryptography, andstatistical signal processing.

932 PROCEEDINGS OF THE IEEE, VOL. 92, NO. 6, JUNE 2004